Requesting Example for Structured Information Extraction via cURL

#10

by andynoodles - opened 5 days ago

Description
The model card for GLM-OCR mentions a specific capability for Information Extraction where prompts must follow a strict JSON schema (e.g., for personal ID information). However, the standard examples primarily show simple text recognition.

I am trying to implement structured extraction against a vLLM (or other OpenAI-compatible) server endpoint, but I need clarification on how to combine the image input with the schema instructions in a single curl request.

Current Implementation Attempt
I am currently using the following curl command, but I want to ensure I am correctly passing the JSON schema prompt to trigger the "Information Extraction" mode rather than standard OCR.

curl -s http://localhost:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
          "model": "zai-org/GLM-OCR",
          "messages": [
               {
                    "role": "user",
                    "content": [
                         {
                              "type": "image_url",
                              "image_url": {
                                   "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                              }
                         },
                         {
                              "type": "text",
                              "text": "请按下列JSON格式输出图中信息:\n{\n    \"id_number\": \"\",\n    \"last_name\": \"\",\n    \"first_name\": \"\",\n    \"date_of_birth\": \"\",\n    \"address\": {\n        \"street\": \"\",\n        \"city\": \"\",\n        \"state\": \"\",\n        \"zip_code\": \"\"\n    },\n    \"dates\": {\n        \"issue_date\": \"\",\n        \"expiration_date\": \"\"\n    },\n    \"sex\": \"\"\n}"
                         }
                    ]
               }
          ],
          "max_tokens": 2048,
          "temperature": 0.0
     }'
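
Hand-escaping the schema inside a shell-quoted JSON string is easy to get wrong, so the same request body can also be generated programmatically. The following is a minimal sketch, not an official GLM-OCR or vLLM recipe: the helper name `build_extraction_payload` and the constant `PROMPT_PREFIX` are hypothetical, and the prompt wording is simply copied from the curl command above.

```python
import json

# Prompt prefix copied verbatim from the curl command above.
# Assumption: this wording is what the model expects; it is not verified here.
PROMPT_PREFIX = "请按下列JSON格式输出图中信息:"


def build_extraction_payload(image_url, schema, model="zai-org/GLM-OCR"):
    """Return an OpenAI-style chat completion body with one image and one schema prompt.

    Hypothetical helper for illustration; not part of any library.
    """
    # indent=4 reproduces the layout of the hand-written schema string;
    # ensure_ascii=False keeps non-ASCII characters readable.
    schema_text = json.dumps(schema, ensure_ascii=False, indent=4)
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": PROMPT_PREFIX + "\n" + schema_text},
                ],
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.0,
    }


# Same empty-valued schema as in the curl command.
ID_SCHEMA = {
    "id_number": "",
    "last_name": "",
    "first_name": "",
    "date_of_birth": "",
    "address": {"street": "", "city": "", "state": "", "zip_code": ""},
    "dates": {"issue_date": "", "expiration_date": ""},
    "sex": "",
}

payload = build_extraction_payload(
    "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png",
    ID_SCHEMA,
)

# Serialized request body; json.dumps handles all quote and newline escaping.
body = json.dumps(payload, ensure_ascii=False)
```

If `body` is written to a file such as `payload.json`, it can be sent with `curl -H "Content-Type: application/json" -d @payload.json http://localhost:8000/v1/chat/completions`, which avoids nesting JSON escapes inside shell quotes.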

Questions
Is the provided curl structure the recommended way to pass the schema?

Should the system prompt or the user prompt contain specific trigger words (e.g., "Information Extraction:") similar to how "Text Recognition:" is used for standard OCR?

Does the model require a specific stop sequence to ensure the JSON is valid and terminates correctly?
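
Whatever the answer on stop sequences turns out to be, the reply can also be checked on the client side. The sketch below is one possible approach under stated assumptions: the function names are hypothetical, and the fence-stripping step only guards against the possibility that the reply is wrapped in a Markdown code fence, which is not confirmed behavior for GLM-OCR.

```python
import json

# Three backticks, built indirectly so this example stays easy to embed.
FENCE = "`" * 3


def strip_code_fence(text):
    """Remove a surrounding Markdown code fence if one is present."""
    text = text.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    return text


def missing_keys(schema, data, prefix=""):
    """Return dotted paths of schema keys that are absent from data."""
    missing = []
    for key, sub in schema.items():
        path = prefix + key
        if not isinstance(data, dict) or key not in data:
            missing.append(path)
        elif isinstance(sub, dict):
            # Recurse into nested objects such as "address".
            missing.extend(missing_keys(sub, data[key], path + "."))
    return missing


def parse_extraction(reply_text, schema):
    """Parse the reply as JSON and list the schema keys it left out.

    Raises json.JSONDecodeError when the reply is not valid JSON, for
    example when generation was cut off by max_tokens.
    """
    data = json.loads(strip_code_fence(reply_text))
    return data, missing_keys(schema, data)


# Illustrative reply text; this is made-up data, not real model output.
schema = {"id_number": "", "address": {"street": "", "city": ""}}
reply = FENCE + 'json\n{"id_number": "X1", "address": {"street": "1 Main St"}}\n' + FENCE
data, missing = parse_extraction(reply, schema)
print(missing)  # ['address.city']
```

In an actual call, `reply_text` would be the `choices[0].message.content` field of the chat completion response.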
