I followed the procedure in the gist below:
https://gist.github.com/kun432/a8d7238c9c1fd738aed5f7d7771ba4a5
with one exception:
- LLMWhisperer is used as the text extractor instead of LlamaParse
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
9aa84bd0c233 unstract/frontend:latest "/docker-entrypoint.…" 25 hours ago Up 25 hours 80/tcp, 0.0.0.0:3000->3000/tcp unstract-frontend
d91cbc6302f3 unstract/backend:latest "./entrypoint.sh mig…" 25 hours ago Up 25 hours 0.0.0.0:8000->8000/tcp unstract-backend
c1a7ba34c88f unstract/x2text-service:latest ".venv/bin/gunicorn …" 25 hours ago Up 25 hours 0.0.0.0:3004->3004/tcp unstract-x2text-service
32b85581e254 unstract/platform-service:latest ".venv/bin/gunicorn …" 25 hours ago Up 25 hours 0.0.0.0:3001->3001/tcp unstract-platform-service
c28d054c6869 unstract/worker:latest "./entrypoint.sh" 25 hours ago Up 25 hours 0.0.0.0:5002->5002/tcp unstract-worker
a269e2548fae unstract/backend:latest ".venv/bin/celery -A…" 25 hours ago Up 25 hours 8000/tcp unstract-celery-beat
9fe29b9c09ab unstract/prompt-service:latest "./entrypoint.sh" 25 hours ago Up 25 hours 0.0.0.0:3003->3003/tcp unstract-prompt-service
f7efea0227e0 unstract/backend:latest ".venv/bin/celery -A…" 25 hours ago Up 25 hours 8000/tcp unstract-execution-consumer
d3debd931b0a redis:7.2.3 "docker-entrypoint.s…" 25 hours ago Up 25 hours 0.0.0.0:6379->6379/tcp unstract-redis
20d7ef9b2b64 pgvector/pgvector:pg15 "docker-entrypoint.s…" 25 hours ago Up 25 hours 0.0.0.0:5432->5432/tcp unstract-db
b4a52603a022 minio/minio:latest "/usr/bin/docker-ent…" 25 hours ago Up 25 hours 0.0.0.0:9000-9001->9000-9001/tcp unstract-minio
200d6d7c951c qdrant/qdrant:v1.8.3 "./entrypoint.sh" 25 hours ago Up 25 hours 0.0.0.0:6333->6333/tcp, 6334/tcp unstract-vector-db
d59b6e0f04de flipt/flipt:v1.34.0 "./flipt" 25 hours ago Up 25 hours 0.0.0.0:8082->8080/tcp, 0.0.0.0:9005->9000/tcp unstract-flipt
ccd35f916287 traefik:v2.10 "/entrypoint.sh --ap…" 25 hours ago Up 25 hours 0.0.0.0:80->80/tcp, 0.0.0.0:8080->8080/tcp unstract-proxy
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
unstract/backend latest 086b0b10b430 28 hours ago 3.01GB
unstract/frontend latest 940e8980cc7c 28 hours ago 305MB
unstract/prompt-service latest 0c54fad5890d 28 hours ago 2.69GB
unstract/worker latest 4138fa33df6d 28 hours ago 1.04GB
unstract/platform-service latest 507e52fa0589 28 hours ago 481MB
unstract/x2text-service latest d765e5bbb1f6 28 hours ago 413MB
unstract/tool-structure 0.0.39 f063fc2ce9e8 47 hours ago 2.97GB
minio/minio latest 6f23072e3e22 4 days ago 205MB
pgvector/pgvector pg15 6688455f2364 2 weeks ago 627MB
qdrant/qdrant v1.8.3 15bd3cee31b3 5 months ago 251MB
traefik v2.10 6341b98aec5e 6 months ago 193MB
flipt/flipt v1.34.0 369cf32903cb 7 months ago 89MB
redis 7.2.3 a7cee7c8178f 8 months ago 223MB
$ docker volume ls
DRIVER VOLUME NAME
local docker_minio_data
local docker_postgres_data
local docker_prompt_studio_data
local docker_qdrant_data
local docker_redis_data
$ docker compose -f docker/docker-compose.yaml down
$ docker ps -a | grep "unstract/" | awk '{ print $1 }' | xargs docker rm
$ docker images | awk '{ print $3 }' | grep -v "IMAGE" | xargs docker rmi
$ docker volume ls | awk '{ print $2 }' | grep -v "VOLUME" | xargs docker volume rm
Then I removed the cloned repo.
I also deleted the Qdrant Cloud cluster I had been using and created a new one.
Same as in the previous run.
Same as in the previous run, except:
Choose LLMWhisperer with the following settings:
| Params | Value |
|---|---|
| Name | llmwhisperer |
| Unstract Key | ******** |
| Processing Mode | ocr |

NOTE: All other settings are left at their defaults.
Same as in the previous run.
Same as in the previous run, except:
| Params | Value |
|---|---|
| Name | 請求書パース プロファイル1 |
| LLM | openai gpt-4o-mini |
| Vector Database | qdrant cloud |
| Embedding Model | openai embedding |
| Text Extractor | llmwhisperer |
| Chunk Size | 0 |
| Overlap | 0 |

NOTE: All other settings are left at their defaults.
Same as in the previous run.
Same as in the previous run.
result:
{
"message": {
"execution_status": "COMPLETED",
"status_api": "/deployment/api/mock_org/parse_japanese_invoice/?execution_id=bf11826a-f129-4e84-911a-f21abd3dbde7",
"error": null,
"result": [
{
"file": "請求書サンプル3.pdf",
"status": "Success",
"result": {
"output": {
"invoice_customer_address": {
"city": "千代田区",
"full_address": "〒 100-0001 東 京 都 千 代 田 区 見 本 町 1-1",
"prefecture": "東京都",
"zip": "100-0001"
},
"invoice_customer_name": "範 例 工 業 株 式 会 社",
"invoice_issuer_name": "模範商事株式会社",
"invoice_line_items": [
{
"item_name": "特選和紙 (A4サイズ)",
"item_num": 1000,
"price_per_item": 50000,
"price_per_unit": 50
},
{
"item_name": "高級墨 (松煙)",
"item_num": 20,
"price_per_item": 40000,
"price_per_unit": 2000
},
{
"item_name": "筆セット (各種)",
"item_num": 50,
"price_per_item": 50000,
"price_per_unit": 1000
}
],
"invoice_payment_info": {
"payment_method": "請求書",
"tax": 14000,
"total_w_tax": 154000,
"total_wo_tax": 140000
}
}
},
"metadata": {
"source_name": "請求書サンプル3.pdf",
"source_hash": "0a362e7b1825f8c507b2306d88451ef83e5a7390065a770184865824cce55e7b",
"organization_id": "mock_org",
"workflow_id": "1b9fa3f9-bbf2-4565-a121-4fd51c9695e0",
"execution_id": "bf11826a-f129-4e84-911a-f21abd3dbde7",
"total_elapsed_time": 56.497058,
"tool_metadata": [
{
"tool_name": "structure_tool",
"elapsed_time": 56.497032,
"output_type": "JSON"
}
]
}
}
]
}
}
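As a quick sanity check on the extracted values, the arithmetic in the response can be verified mechanically. The sketch below is my own addition, not part of Unstract: `check_invoice` is a hypothetical helper, and it assumes (judging only from the values above) that `price_per_item` is the line total, i.e. `item_num` × `price_per_unit`, and that `total_w_tax` is `total_wo_tax` plus `tax`.

```python
# Hypothetical helper: checks that the extracted invoice numbers are consistent.
# Assumption: price_per_item is the line total (item_num * price_per_unit).
def check_invoice(output):
    """Return a list of inconsistency messages (empty list if everything adds up)."""
    problems = []
    items = output["invoice_line_items"]
    for item in items:
        expected = item["item_num"] * item["price_per_unit"]
        if item["price_per_item"] != expected:
            problems.append(
                f'{item["item_name"]}: line total {item["price_per_item"]} != {expected}'
            )
    payment = output["invoice_payment_info"]
    subtotal = sum(item["price_per_item"] for item in items)
    if subtotal != payment["total_wo_tax"]:
        problems.append(f'subtotal {subtotal} != total_wo_tax {payment["total_wo_tax"]}')
    if payment["total_wo_tax"] + payment["tax"] != payment["total_w_tax"]:
        problems.append("total_wo_tax + tax != total_w_tax")
    return problems


# The numeric fields copied from the "output" object in the response above.
sample_output = {
    "invoice_line_items": [
        {"item_name": "特選和紙 (A4サイズ)", "item_num": 1000,
         "price_per_item": 50000, "price_per_unit": 50},
        {"item_name": "高級墨 (松煙)", "item_num": 20,
         "price_per_item": 40000, "price_per_unit": 2000},
        {"item_name": "筆セット (各種)", "item_num": 50,
         "price_per_item": 50000, "price_per_unit": 1000},
    ],
    "invoice_payment_info": {
        "payment_method": "請求書",
        "tax": 14000,
        "total_w_tax": 154000,
        "total_wo_tax": 140000,
    },
}

print(check_invoice(sample_output))  # prints []
```

For this response the list is empty: the three line totals sum to 140000, and 140000 + 14000 = 154000. This only shows the numbers are internally consistent; it does not prove they match the PDF.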
The state of the workflow_data directory changed as follows:
Before the API call
During the API call (just before it finished)
After the API call