Requesting Example for Structured Information Extraction via cURL

#10, opened by andynoodles

Description
The model card for GLM-OCR mentions a specific capability for Information Extraction where prompts must follow a strict JSON schema (e.g., for personal ID information). However, the standard examples primarily show simple text recognition.

I am trying to implement structured extraction against a vLLM or other OpenAI-compatible server endpoint, but I need clarification on how to combine the image input with the schema instructions in a single curl request.

Current Implementation Attempt
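I am currently using the following curl command, but I want to make sure I am passing the JSON schema prompt correctly, so that it triggers the "Information Extraction" mode rather than standard OCR. The Chinese instruction at the start of the text field translates to "Please output the information in the image in the following JSON format:".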

curl -s http://localhost:8000/v1/chat/completions \
     -H "Content-Type: application/json" \
     -d '{
          "model": "zai-org/GLM-OCR",
          "messages": [
               {
                    "role": "user",
                    "content": [
                         {
                              "type": "image_url",
                              "image_url": {
                                   "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
                              }
                         },
                         {
                              "type": "text",
                              "text": "请按下列JSON格式输出图中信息:\n{\n    \"id_number\": \"\",\n    \"last_name\": \"\",\n    \"first_name\": \"\",\n    \"date_of_birth\": \"\",\n    \"address\": {\n        \"street\": \"\",\n        \"city\": \"\",\n        \"state\": \"\",\n        \"zip_code\": \"\"\n    },\n    \"dates\": {\n        \"issue_date\": \"\",\n        \"expiration_date\": \"\"\n    },\n    \"sex\": \"\"\n}"
                         }
                    ]
               }
          ],
          "max_tokens": 2048,
          "temperature": 0.0
     }'
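
To rule out shell-quoting problems as a cause, the same request can also be sent from a file instead of an inline string. This is only a minimal sketch, assuming a POSIX shell: `write_payload`, `send_request`, and the payload path are hypothetical names of mine, the schema is shortened for brevity, and nothing here is confirmed as the recommended GLM-OCR usage.

```shell
#!/bin/sh
# Sketch: keep the request body in a file so shell quoting cannot alter the JSON.
# write_payload / send_request are hypothetical helper names, not part of vLLM or GLM-OCR.

write_payload() {
  # The quoted delimiter ('EOF') disables shell expansion, so \n and \" reach
  # the file unchanged and remain valid JSON escapes.
  cat > "$1" <<'EOF'
{
  "model": "zai-org/GLM-OCR",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "image_url", "image_url": {"url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"}},
        {"type": "text", "text": "请按下列JSON格式输出图中信息:\n{\n    \"id_number\": \"\",\n    \"last_name\": \"\",\n    \"first_name\": \"\"\n}"}
      ]
    }
  ],
  "max_tokens": 2048,
  "temperature": 0.0
}
EOF
}

send_request() {
  # --data-binary @file posts the file contents byte for byte.
  curl -s http://localhost:8000/v1/chat/completions \
       -H "Content-Type: application/json" \
       --data-binary @"$1"
}

# Assumed scratch location for the payload file.
PAYLOAD="${TMPDIR:-/tmp}/glm_ocr_payload.json"
write_payload "$PAYLOAD"
# Uncomment once the server is running on localhost:8000:
# send_request "$PAYLOAD"
```

With the body in a file, the schema can be edited without touching the curl invocation, and the JSON can be checked with any validator before it is sent.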

Questions
Is the provided curl structure the recommended way to pass the schema?

Should the system prompt or the user prompt contain specific trigger words (e.g., "Information Extraction:") similar to how "Text Recognition:" is used for standard OCR?

Does the model require a specific stop sequence to ensure the JSON is valid and terminates correctly?
