Requesting Example for Structured Information Extraction via cURL
Description
The model card for GLM-OCR mentions a specific capability for Information Extraction where prompts must follow a strict JSON schema (e.g., for personal ID information). However, the standard examples primarily show simple text recognition.
I am trying to implement structured extraction against a vLLM (or other OpenAI-compatible) server endpoint, but I need clarification on how to combine the image input with the schema instructions in a single curl request.
Current Implementation Attempt
I am currently using the following curl command, but I want to ensure I am correctly passing the JSON schema prompt to trigger the "Information Extraction" mode rather than standard OCR.
curl -s http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "zai-org/GLM-OCR",
    "messages": [
      {
        "role": "user",
        "content": [
          {
            "type": "image_url",
            "image_url": {
              "url": "https://ofasys-multimodal-wlcb-3-toshanghai.oss-accelerate.aliyuncs.com/wpf272043/keepme/image/receipt.png"
            }
          },
          {
            "type": "text",
            "text": "请按下列JSON格式输出图中信息:\n{\n \"id_number\": \"\",\n \"last_name\": \"\",\n \"first_name\": \"\",\n \"date_of_birth\": \"\",\n \"address\": {\n \"street\": \"\",\n \"city\": \"\",\n \"state\": \"\",\n \"zip_code\": \"\"\n },\n \"dates\": {\n \"issue_date\": \"\",\n \"expiration_date\": \"\"\n },\n \"sex\": \"\"\n}"
          }
        ]
      }
    ],
    "max_tokens": 2048,
    "temperature": 0.0
  }'
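For reference, the same request can also be assembled in Python, so the schema does not have to be hand-escaped inside the shell string. This is only a minimal sketch under assumptions, not an official GLM-OCR example: `build_payload` and `send` are hypothetical helper names, the instruction prefix and schema are copied from the curl command above, the 4-space indentation of the schema inside the prompt is my own choice, and `send` assumes the server is reachable at http://localhost:8000.

```python
import json
import urllib.request

# Schema copied from the curl command above; empty strings mark fields to fill in.
SCHEMA = {
    "id_number": "",
    "last_name": "",
    "first_name": "",
    "date_of_birth": "",
    "address": {"street": "", "city": "", "state": "", "zip_code": ""},
    "dates": {"issue_date": "", "expiration_date": ""},
    "sex": "",
}

# Instruction prefix copied verbatim from the curl command. In English:
# "Please output the information in the image in the following JSON format:"
PROMPT_PREFIX = "请按下列JSON格式输出图中信息:\n"


def build_payload(image_url, schema, model="zai-org/GLM-OCR"):
    """Build an OpenAI-style chat completion body with one image and one text part."""
    # json.dumps handles quoting and newlines, so no manual escaping is needed.
    # indent=4 is an assumption; the exact whitespace the model expects is unknown.
    prompt = PROMPT_PREFIX + json.dumps(schema, ensure_ascii=False, indent=4)
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": prompt},
                ],
            }
        ],
        "max_tokens": 2048,
        "temperature": 0.0,
    }


def send(payload, base_url="http://localhost:8000"):
    """POST the payload to the chat completions route and return the reply text."""
    request = urllib.request.Request(
        base_url + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `send(build_payload(image_url, SCHEMA))` should be equivalent to the curl command, apart from the schema indentation. Whether this prompt layout is what actually triggers the Information Extraction mode is exactly what the questions below are asking.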
Questions
Is the provided curl structure the recommended way to pass the schema?
Should the system prompt or the user prompt contain specific trigger words (e.g., "Information Extraction:") similar to how "Text Recognition:" is used for standard OCR?
Does the model require a specific stop sequence to ensure the JSON is valid and terminates correctly?