Instructions for using zai-org/GLM-OCR with libraries, inference providers, notebooks, and local apps.

Libraries

How to use zai-org/GLM-OCR with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="zai-org/GLM-OCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
print(pipe(text=messages))
```

```python
# Load model directly
# Note: with transformers 5.1.0, AutoProcessor needs torchvision installed (see discussion #33 below)
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("zai-org/GLM-OCR")
model = AutoModelForImageTextToText.from_pretrained("zai-org/GLM-OCR")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
# Use the processor (not the bare tokenizer) so the image is turned into pixel values
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Slice off the prompt tokens so only the newly generated text is decoded
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
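Both snippets above fetch the image from a URL. For a local file, such as a scanned page saved as PNG, the message can reference a path instead. This is a minimal sketch: `build_ocr_messages` is a hypothetical helper, and it assumes the processor's chat template accepts a `path` key for image content, which recent Transformers versions appear to support; verify this against the version you have installed.

```python
from pathlib import Path


def build_ocr_messages(image_path: str, prompt: str) -> list[dict]:
    """Build a single-turn chat message for a local image (hypothetical helper).

    Assumes the processor's chat template accepts a "path" key for image
    content; check this against your installed Transformers version.
    """
    path = Path(image_path)
    if not path.is_file():
        # Fail early with a clear message instead of deep inside the processor
        raise FileNotFoundError(f"Image not found: {path}")
    return [
        {
            "role": "user",
            "content": [
                {"type": "image", "path": str(path)},
                {"type": "text", "text": prompt},
            ],
        }
    ]
```

The returned list can replace `messages` in the direct-loading example, e.g. `build_ocr_messages("page.png", "What text is on this page?")`, where `page.png` is a placeholder file name.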

Local Apps

vLLM

How to use zai-org/GLM-OCR with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "zai-org/GLM-OCR"
# Call the server using curl (OpenAI-compatible API), from a second terminal:
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "zai-org/GLM-OCR",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
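The same request can also be sent from Python with only the standard library. The sketch below is not part of vLLM: the helper names are ours, and it assumes the server accepts an inline image as a base64 `data:` URL in the `image_url` field, as OpenAI-compatible servers such as vLLM generally do. This is useful for local scans, which the server cannot fetch by URL.

```python
import base64
import json
import mimetypes
import urllib.request
from pathlib import Path


def image_to_data_url(image_path: str) -> str:
    """Encode a local image file as a base64 data URL."""
    mime, _ = mimetypes.guess_type(image_path)
    mime = mime or "application/octet-stream"
    data = base64.b64encode(Path(image_path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{data}"


def build_payload(model: str, prompt: str, image_url: str) -> dict:
    """Build the same chat request body as the curl example above."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }


def chat(payload: dict, base_url: str = "http://localhost:8000") -> str:
    """POST the payload to /v1/chat/completions and return the reply text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

With the server running, `chat(build_payload("zai-org/GLM-OCR", "Describe this image in one sentence.", image_to_data_url("page.png")))` returns the reply text; `page.png` is a placeholder. For the SGLang server below, pass `base_url="http://localhost:30000"`.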

Use Docker

```bash
docker model run hf.co/zai-org/GLM-OCR
```

SGLang

How to use zai-org/GLM-OCR with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "zai-org/GLM-OCR" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API), from a second terminal:
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "zai-org/GLM-OCR",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "zai-org/GLM-OCR" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API), from a second terminal:
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "zai-org/GLM-OCR",
        "messages": [
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": "Describe this image in one sentence."
                    },
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
                        }
                    }
                ]
            }
        ]
    }'
```
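Loading the model inside the container can take a while, and a request sent too early is refused. A minimal polling sketch, assuming the server answers `GET /v1/models` once it is ready (this route is part of the OpenAI-compatible API; check that your SGLang version exposes it). `wait_for_server` is a hypothetical helper, and the default timeout and interval are arbitrary choices.

```python
import time
import urllib.request


def wait_for_server(
    base_url: str = "http://localhost:30000",
    timeout: float = 600.0,
    interval: float = 5.0,
) -> bool:
    """Poll GET {base_url}/v1/models until it returns HTTP 200 or `timeout` seconds pass.

    Returns True once the server responds, False if it never does in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/v1/models", timeout=interval) as response:
                if response.status == 200:
                    return True
        except OSError:
            # Connection refused, socket timeouts and HTTP errors (e.g. 503 while
            # loading) are all OSError subclasses; treat them as "not ready yet".
            pass
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return False
        time.sleep(min(interval, remaining))
```

Call `wait_for_server()` before sending the first request; it returns `False` if the server is still unreachable after `timeout` seconds.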

Docker Model Runner
How to use zai-org/GLM-OCR with Docker Model Runner:
```
docker model run hf.co/zai-org/GLM-OCR
```

ERROR occurs when loading AutoProcessor

#33

by FronyAI - opened Feb 12

Discussion

FronyAI

Feb 12

Has anyone else gotten an error when loading AutoProcessor?
[versions]
"torch==2.9.1" (cpu only)
"transformers==5.1.0"

```
---> 20 processor = AutoProcessor.from_pretrained("zai-org/GLM-OCR")
     21 model = AutoModelForImageTextToText.from_pretrained(
     22     pretrained_model_name_or_path=MODEL_PATH,
     23     torch_dtype="auto",
     24     device_map="auto",
     25 )
     26 inputs = processor.apply_chat_template(
     27     messages,
     28     tokenize=True,
   (...)     31     return_tensors="pt"
     32 ).to(model.device)

File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\models\auto\processing_auto.py:398, in AutoProcessor.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
    394     return processor_class.from_pretrained(
    395         pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
    396     )
    397 elif processor_class is not None:
--> 398     return processor_class.from_pretrained(
    399         pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
    400     )
    401 # Last try: we use the PROCESSOR_MAPPING.
    402 elif type(config) in PROCESSOR_MAPPING:

File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\processing_utils.py:1402, in ProcessorMixin.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, **kwargs)
   1400 # Get processor_dict first so we can use it to instantiate non-tokenizer sub-processors
   1401 processor_dict, instantiation_kwargs = cls.get_processor_dict(pretrained_model_name_or_path, **kwargs)
-> 1402 args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, processor_dict, **kwargs)
   1403 return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)

File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\processing_utils.py:1523, in ProcessorMixin._get_arguments_from_pretrained(cls, pretrained_model_name_or_path, processor_dict, **kwargs)
   1520 elif is_primary:
   1521     # Primary non-tokenizer sub-processor: load via Auto class
   1522     auto_processor_class = MODALITY_TO_AUTOPROCESSOR_MAPPING[sub_processor_type]
-> 1523     sub_processor = auto_processor_class.from_pretrained(
   1524         pretrained_model_name_or_path, subfolder=subfolder, **kwargs
   1525     )
   1526     args.append(sub_processor)
   1528 elif sub_processor_type in processor_dict:
   1529     # Additional non-tokenizer sub-processor: instantiate from config in processor_dict

File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\models\auto\video_processing_auto.py:342, in AutoVideoProcessor.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    338     video_processor_class_inferred = image_processor_class.replace("ImageProcessor", "VideoProcessor")
    340     # Some models have different image processors, e.g. InternVL uses GotOCRImageProcessor
    341     # We cannot use GotOCRVideoProcessor when falling back for BC and should try to infer from config later on
--> 342     if video_processor_class_from_name(video_processor_class_inferred) is not None:
    343         video_processor_class = video_processor_class_inferred
    344 if "AutoImageProcessor" in config_dict.get("auto_map", {}):

File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\models\auto\video_processing_auto.py:96, in video_processor_class_from_name(class_name)
     94 def video_processor_class_from_name(class_name: str):
     95     for module_name, extractors in VIDEO_PROCESSOR_MAPPING_NAMES.items():
---> 96         if class_name in extractors:
     97             module_name = model_type_to_module_name(module_name)
     99             module = importlib.import_module(f".{module_name}", "transformers.models")

TypeError: argument of type 'NoneType' is not iterable
```
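The last frame explains the unhelpful message: `video_processor_class_from_name` evaluates `class_name in extractors` while `extractors` is `None`. The traceback also shows that `AutoProcessor` loads a video sub-processor through `AutoVideoProcessor`, even though the input here is an image. Going by the GitHub issue linked in the reply below, the mapping entries end up as `None` when torchvision is missing. The toy sketch below reproduces only that pattern; `TOY_MAPPING`, `lookup_unsafe` and `lookup_safe` are made-up names, not Transformers internals.

```python
# Toy stand-in for a name mapping whose values are None when an optional
# dependency (torchvision, per the linked issue) is unavailable. NOT Transformers code.
TOY_MAPPING = {
    "model_a": None,                    # entry left empty: dependency missing
    "model_b": "ModelBVideoProcessor",  # entry present
}


def lookup_unsafe(class_name: str):
    """Mirrors the failing pattern: `in` applied to a value that may be None."""
    for model_type, names in TOY_MAPPING.items():
        if class_name in names:  # raises TypeError when names is None
            return model_type
    return None


def lookup_safe(class_name: str):
    """Same lookup, but skips entries that are None."""
    for model_type, names in TOY_MAPPING.items():
        if names is not None and class_name in names:
            return model_type
    return None
```

In practice the fix is not to patch Transformers but to install the missing dependency (`pip install torchvision`), as the replies below confirm.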

christian-winkler-th

Feb 13

Yes, I got the same error!

DavidGeorge

Feb 14

edited Feb 14

As per this GitHub issue, https://github.com/huggingface/transformers/issues/43986, you need torchvision installed.

I've found that the minimal dependencies needed to run the example are:

- `pillow`
- `torch`
- `torchvision`
- `transformers>=5.1.0`

`accelerate` was also required for me to run it on a GPU on an EC2 instance.

You also need a PNG image to use as input.
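A short pre-flight check can catch a missing package before the model download even starts. The sketch below is our own and uses only the standard library: it checks that the four packages from this list can be imported (note that `pillow` is imported as `PIL`) and that Transformers is at least 5.1.0. `accelerate` is left out because the reply only needed it for GPU use; `REQUIRED`, `version_tuple` and `missing_requirements` are hypothetical names.

```python
import importlib.metadata
import importlib.util
import re

# (import name, distribution name) for the packages listed in the reply above
REQUIRED = (
    ("PIL", "pillow"),
    ("torch", "torch"),
    ("torchvision", "torchvision"),
    ("transformers", "transformers"),
)
MIN_TRANSFORMERS = (5, 1, 0)


def version_tuple(version: str) -> tuple[int, ...]:
    """Keep the leading numeric components, e.g. '2.9.1+cpu' -> (2, 9, 1)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            break  # stop at suffixes such as 'rc1' or '+cpu'
    return tuple(parts)


def missing_requirements(required: tuple[tuple[str, str], ...] = REQUIRED) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    for import_name, dist_name in required:
        if importlib.util.find_spec(import_name) is None:
            problems.append(f"{dist_name} is not installed")
        elif dist_name == "transformers":
            installed = importlib.metadata.version(dist_name)
            if version_tuple(installed) < MIN_TRANSFORMERS:
                problems.append(f"transformers {installed} is older than 5.1.0")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements():
        print("-", problem)
```

The version parsing is deliberately simple: it ignores suffixes, so a pre-release such as `5.1.0rc1` counts as 5.1.0. For exact comparisons, `packaging.version.Version` is the usual choice.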

FronyAI

Feb 14

Thanks @DavidGeorge!
I fixed it by adding torchvision!

iyuge2 changed discussion status to closed Feb 25
