$ docker version
Client:
Version: 27.1.1
API version: 1.46
Go version: go1.21.12
Git commit: 6312585
Built: Tue Jul 23 19:54:12 2024
OS/Arch: darwin/arm64
Context: desktop-linux
Server: Docker Desktop 4.33.0 (160616)
Engine:
Version: 27.1.1
API version: 1.46 (minimum version 1.24)
Go version: go1.21.12
Git commit: cc13f95
Built: Tue Jul 23 19:57:14 2024
OS/Arch: linux/arm64
Experimental: false
containerd:
Version: 1.7.19
GitCommit: 2bf793ef6dc9a18e00cb12efb64355c2c9d5eb41
runc:
Version: 1.1.13
GitCommit: v1.1.13-0-g58aa920
docker-init:
Version: 0.19.0
GitCommit: de40ad0
$ docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
$ docker volume ls
DRIVER VOLUME NAME
$
$ git clone https://github.com/Zipstack/unstract unstract-test && cd unstract-test
I don't think this matters when using Docker, but run-platform.sh
uses Python internally, so it may be involved somehow.
$ which python
/Users/kun432/.pyenv/shims/python
$ cat .python-version
3.9.6
$ python --version
Python 3.9.6
$ ./run-platform.sh
Once the services are up, visit http://frontend.unstract.localhost in your browser.
See logs with:
docker compose -f docker/docker-compose.yaml logs -f
Configure services by updating corresponding <service>/.env files.
Make sure to restart the services with:
docker compose -f docker/docker-compose.yaml up -d
###################### BACKUP ENCRYPTION KEY ######################
Copy the value of ENCRYPTION_KEY in any of the following env files
to a secure location:
- backend/.env
- platform-service/.env
Adapter credentials are encrypted by the platform using this key.
Its loss or change will make all existing adapters inaccessible!
###################################################################
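The backup step requested in the message above could be scripted roughly as follows. This is only a sketch: `get_encryption_key` and `backup_encryption_key` are hypothetical helper names, and it assumes the key is stored as a plain, unquoted `ENCRYPTION_KEY=value` line in the env file.

```shell
#!/bin/sh
# Hypothetical helper: print the value of ENCRYPTION_KEY from an env file.
# Assumes a plain KEY=value line (no quoting, no "export" prefix).
get_encryption_key() {
    env_file="$1"
    # Strip the "ENCRYPTION_KEY=" prefix and keep only the first match.
    sed -n 's/^ENCRYPTION_KEY=//p' "$env_file" | head -n 1
}

# Hypothetical helper: write the key to a backup file that only the
# current user can read.
backup_encryption_key() {
    env_file="$1"
    backup_file="$2"
    key="$(get_encryption_key "$env_file")"
    if [ -z "$key" ]; then
        echo "ENCRYPTION_KEY not found in $env_file" >&2
        return 1
    fi
    # umask 077 in a subshell so the new file is created with mode 600.
    ( umask 077 && printf '%s\n' "$key" > "$backup_file" )
}
```

For example, `backup_encryption_key backend/.env "$HOME/unstract-encryption-key.txt"` would copy the key out of the repository directory; the destination path here is just an illustration.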
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
68f14c14c6d7 unstract/frontend:latest "/docker-entrypoint.…" 46 minutes ago Up 46 minutes 80/tcp, 0.0.0.0:3000->3000/tcp unstract-frontend
16c373bdb4e4 unstract/backend:latest "./entrypoint.sh mig…" 46 minutes ago Up 46 minutes 0.0.0.0:8000->8000/tcp unstract-backend
988bf2eb71d7 unstract/worker:latest "./entrypoint.sh" 46 minutes ago Up 46 minutes 0.0.0.0:5002->5002/tcp unstract-worker
499be651d9c4 unstract/prompt-service:latest "./entrypoint.sh" 46 minutes ago Up 46 minutes 0.0.0.0:3003->3003/tcp unstract-prompt-service
271229a6ee7a unstract/platform-service:latest ".venv/bin/gunicorn …" 46 minutes ago Up 46 minutes 0.0.0.0:3001->3001/tcp unstract-platform-service
b840e1f24ae1 unstract/backend:latest ".venv/bin/celery -A…" 46 minutes ago Up 46 minutes 8000/tcp unstract-execution-consumer
56d8d66d308a unstract/backend:latest ".venv/bin/celery -A…" 46 minutes ago Up 44 minutes 8000/tcp unstract-celery-beat
175c9b041457 unstract/x2text-service:latest ".venv/bin/gunicorn …" 46 minutes ago Up 46 minutes 0.0.0.0:3004->3004/tcp unstract-x2text-service
47cb9c1efc57 redis:7.2.3 "docker-entrypoint.s…" 46 minutes ago Up 46 minutes 0.0.0.0:6379->6379/tcp unstract-redis
5a0635356bfc pgvector/pgvector:pg15 "docker-entrypoint.s…" 46 minutes ago Up 46 minutes 0.0.0.0:5432->5432/tcp unstract-db
e624a063571b minio/minio:latest "/usr/bin/docker-ent…" 46 minutes ago Up 46 minutes 0.0.0.0:9000-9001->9000-9001/tcp unstract-minio
506d5437fcdb qdrant/qdrant:v1.8.3 "./entrypoint.sh" 46 minutes ago Up 46 minutes 0.0.0.0:6333->6333/tcp, 6334/tcp unstract-vector-db
304f2756a3ac flipt/flipt:v1.34.0 "./flipt" 46 minutes ago Up 46 minutes 0.0.0.0:8082->8080/tcp, 0.0.0.0:9005->9000/tcp unstract-flipt
adfd91dcdea9 traefik:v2.10 "/entrypoint.sh --ap…" 46 minutes ago Up 46 minutes 0.0.0.0:80->80/tcp, 0.0.0.0:8080->8080/tcp unstract-proxy
Choose OpenAI with the following settings:
Params | Value |
---|---|
Name | openai gpt-4o-mini |
API Key | ******** |
Model | gpt-4o-mini |
NOTES: All other settings are left at their default values.
Choose Qdrant with the following settings:
Params | Value |
---|---|
Name | qdrant cloud |
URL | https://XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX.europe-west3-0.gcp.cloud.qdrant.io:6333 |
API Key | ******** |
Choose OpenAI with the following settings:
Params | Value |
---|---|
Name | openai embedding |
API Key | ******** |
NOTES: All other settings are left at their default values.
Choose LlamaParse with the following settings:
Params | Value |
---|---|
Name | llama parse |
API Key | ******** |
NOTES: All other settings are left at their default values.
$ docker compose -f docker/docker-compose.yaml down
$ vi worker/.env
REMOVE_CONTAINER_ON_EXIT=False
$ ./run-platform.sh
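The manual `vi` edit above could also be done non-interactively. The following is a minimal sketch, assuming `worker/.env` uses plain `KEY=value` lines; `set_env_var` is a hypothetical helper, not part of the Unstract repository.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in an env file, replacing an existing
# line for KEY or appending a new one if KEY is not present.
set_env_var() {
    env_file="$1"
    key="$2"
    value="$3"
    tmp_file="$(mktemp)" || return 1
    if grep -q "^${key}=" "$env_file"; then
        # Rewrite only lines that start with "KEY="; awk is used instead of
        # "sed -i" because the -i flag differs between GNU and BSD (macOS) sed.
        awk -v k="$key" -v v="$value" '
            index($0, k "=") == 1 { print k "=" v; next }
            { print }
        ' "$env_file" > "$tmp_file"
    else
        cat "$env_file" > "$tmp_file"
        printf '%s=%s\n' "$key" "$value" >> "$tmp_file"
    fi
    # Write the contents back in place so the file keeps its permissions.
    cat "$tmp_file" > "$env_file" && rm -f "$tmp_file"
}
```

With this helper, the step between `docker compose ... down` and `./run-platform.sh` would be `set_env_var worker/.env REMOVE_CONTAINER_ON_EXIT False`.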
Params | Value | Notes |
---|---|---|
Tool Name | 請求書パース | means "invoice parser" |
Author/Org Name | kun432 | |
Description | 請求書をパースするツール | means "a tool for parsing invoices" |
Icon | 📄 | |
Params | Value |
---|---|
Name | 請求書パース プロファイル1 |
LLM | openai gpt-4o-mini |
Vector Database | qdrant cloud |
Embedding Model | openai embedding |
Text Extractor | llama parse |
Chunk Size | 0 |
Overlap | 0 |
NOTES: All other settings are left at their default values.
Your ability to extract and summarize this information accurately is essential for effective Japanese invoice analysis. Pay close attention to the invoice's language, structure, and any cross-references to ensure a comprehensive and precise extraction of information. Do not use prior knowledge or information from outside the context to answer the questions. Only use the information provided in the context to answer the questions.
Do not include any explanation in the reply. Only include the extracted information in the reply.
NOTES: the same as default
Upload the following 2 PDFs:
請求書サンプル1.pdf
: https://drive.google.com/file/d/1DZGvHPoPCOa6VlyLQsz7PZbOYzxLOHd2/view
請求書サンプル2.pdf
: https://drive.google.com/file/d/1kr8SunLLdzw9Az5Cw3kEA_1ElaKuhu7Z/view
Field | Prompts | Type |
---|---|---|
invoice_issuer_name | この請求書を発行した発行者または会社の名称は何ですか? | Text |
invoice_customer_name | この請求書に記載されているお客様名または会社の名前はなんですか?敬称は不要です。 | Text |
invoice_customer_address | 提供された文脈には複数の住所が記載されている可能性があるため、まずすべての住所を収集してください。次に、この請求書が誰宛てに送られているのか、つまり請求先お客様の名前を理解するようにしてください。そして、その名前が合致する住所を見つけてください。お客様の住所を常に返すようにしてください。他の住所を返さないでください。 お客様の住所については、以下のフィールドを持つシンプルなJSONオブジェクトを作成してください。 - full_address: お客様の完全な住所である必要がある - prefecture: 住所から取得した都道府県名のみである必要がある - city: 住所から取得した市区町村名のみである必要がある - zip: 郵便番号のみである必要がある | json |
invoice_payment_info | 請求金額は請求書において重要な部分であり、支払い方法・小計(税抜)・消費税額・合計請求額(税込)で構成される。 以下のフィールドを含むJSONオブジェクトを返してください。 - payment_method: 支払い方法。以下の3つから選択。"請求書"、"小切手"、"クレジットカード" - total_wo_tax: 税抜の小計金額 - tax:消費税額 - total_w_tax:税込の合計請求額 | json |
invoice_line_items | この請求書には請求内容の内訳が記載されており、内訳に記載された各請求項目は与えられたコンテキスト全体にわたって分割することができる。常に全体的なコンテキストを確認し、すべての請求項目の詳細を回答してください。 各請求項目について、以下のフィールドを含むシンプルなJSONオブジェクトを作成してください。 - item_name: 請求項目の項目名 - item_num: 請求項目の個数 - price_per_unit: 請求項目のユニットあたりの単価 - price_per_item: 請求項目ごとの金額 これらの項目を含むオブジェクトをJSON配列に格納し、それを返してください。 | json |
In English (for reference only):
Field | Prompts | Type |
---|---|---|
invoice_issuer_name | What is the name of the issuer or company that issued this invoice? | Text |
invoice_customer_name | What is the name of the customer or company on this invoice? Honorific titles are not required. | Text |
invoice_customer_address | First collect all addresses, as the context provided may contain more than one address. Next, try to understand to whom this invoice is being sent, i.e., the name of the billing customer. Then find the address that matches that name. Always return the customer's address. Do not return other addresses. For the customer's address, create a simple JSON object with the following fields: - full_address: Must be the complete address of the customer - prefecture: Must be only the name of the prefecture taken from the address - city: Must be only the name of the municipality taken from the address - zip: Must be the zip code only | json |
invoice_payment_info | The invoice amount is an important part of the invoice and consists of the payment method, subtotal (excluding tax), consumption tax amount, and total invoice amount (including tax). Return a JSON object containing the following fields: - payment_method: Payment method. Choose from the following three options: "Bill", "Check", or "Credit Card". - total_wo_tax: Subtotal amount excluding tax - tax: Amount of consumption tax - total_w_tax: Total billing amount including tax | json |
invoice_line_items | The invoice contains a breakdown of the billing details, and each billing item listed in the breakdown may be split across the entire given context. Always check the overall context and respond with details for all billing items. For each billing item, create a simple JSON object containing the following fields: - item_name: Item name of the billing item - item_num: Quantity of the billing item - price_per_unit: Unit price of the billing item - price_per_item: Amount per billing item Store the objects containing these fields in a JSON array and return it. | json |
Click "Run All" for each prompt and check the Output Analyzer.
Also check Raw View to confirm that the text extractor works. (In my opinion, this shows that LlamaParse works for Japanese documents.)
Then, export the Prompt Studio project as a tool.
Create a new workflow.
Params | Value |
---|---|
Workflow Name | 請求書パースAPIワークフロー |
Description | 請求書パースAPIワークフロー |
Workflow Settings:
Params | Value | Notes |
---|---|---|
Input Setting | API | |
Workflow Chain | 請求書パース | the tool exported in the previous section |
Output Setting | API | |
Then, "Deploy as API".
Params | Value | Notes |
---|---|---|
Display Name | 請求書パースAPI | |
Description | 請求書パースAPI | |
API Name | parse_japanese_invoice | |
The deployed API is as follows:
Params | Value |
---|---|
API Name | 請求書パースAPI |
API Endpoint | http://frontend.unstract.localhost/deployment/api/mock_org/parse_japanese_invoice/ |
API Key | XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX |
Test the API using Postman:
Params | Value |
---|---|
URL | http://frontend.unstract.localhost/deployment/api/mock_org/parse_japanese_invoice/ |
Method | POST |
Authorization
Params | Value |
---|---|
Auth Type | Bearer Token |
Token | XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX |
Body (form-data)
Key | Type | Value |
---|---|---|
files | File | 請求書サンプル3.pdf |
timeout | Text | 300 |
NOTES: the file used above is https://drive.google.com/file/d/1St9it9cj3SY0GnamZkBjDO3tyVMkA5GZ/view
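For reproducing without Postman, the same request could be sent with curl. This is a sketch built only from the URL, method, Bearer token, and form-data fields listed above; `call_invoice_api` and the `CURL_BIN` override are hypothetical names added for illustration, and the API key is the redacted placeholder, so it must be replaced with a real key.

```shell
#!/bin/sh
# Values copied from the tables above; the key is a redacted placeholder.
API_URL="http://frontend.unstract.localhost/deployment/api/mock_org/parse_japanese_invoice/"
API_KEY="XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX"

# Hypothetical wrapper: POST one file as multipart/form-data with a
# Bearer token, mirroring the Postman request.
# CURL_BIN is an assumption added here so the command can be swapped out.
call_invoice_api() {
    pdf_path="$1"
    timeout_sec="${2:-300}"
    "${CURL_BIN:-curl}" -sS -X POST "$API_URL" \
        -H "Authorization: Bearer $API_KEY" \
        -F "files=@$pdf_path" \
        -F "timeout=$timeout_sec"
}
```

Running `call_invoice_api 請求書サンプル3.pdf` from the directory containing the PDF should send the same multipart request as the Postman setup.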
results:
{
    "message": {
        "execution_status": "ERROR",
        "status_api": "/deployment/api/mock_org/parse_japanese_invoice/?execution_id=3c37e236-a1f0-49c8-8fb8-09960fffb72b",
        "error": null,
        "result": [
            {
                "file": "請求書サンプル3.pdf",
                "status": "Failed",
                "error": "Error fetching data and indexing: [Errno 2] No such file or directory: '/data/INFILE.pdf'"
            }
        ]
    }
}
logs during API call
How the state of the workflow_data dir changes is shown below:
before API call
during API call
after API call