ERROR occurs when loading AutoProcessor
#33
by FronyAI - opened
Has anyone else gotten an error when loading AutoProcessor?
[versions]
"torch==2.9.1" (cpu only)
"transformers==5.1.0"
---> 20 processor = AutoProcessor.from_pretrained("zai-org/GLM-OCR")
21 model = AutoModelForImageTextToText.from_pretrained(
22 pretrained_model_name_or_path=MODEL_PATH,
23 torch_dtype="auto",
24 device_map="auto",
25 )
26 inputs = processor.apply_chat_template(
27 messages,
28 tokenize=True,
(...) 31 return_tensors="pt"
32 ).to(model.device)
File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\models\auto\processing_auto.py:398, in AutoProcessor.from_pretrained(cls, pretrained_model_name_or_path, **kwargs)
394 return processor_class.from_pretrained(
395 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
396 )
397 elif processor_class is not None:
--> 398 return processor_class.from_pretrained(
399 pretrained_model_name_or_path, trust_remote_code=trust_remote_code, **kwargs
400 )
401 # Last try: we use the PROCESSOR_MAPPING.
402 elif type(config) in PROCESSOR_MAPPING:
File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\processing_utils.py:1402, in ProcessorMixin.from_pretrained(cls, pretrained_model_name_or_path, cache_dir, force_download, local_files_only, token, revision, **kwargs)
1400 # Get processor_dict first so we can use it to instantiate non-tokenizer sub-processors
1401 processor_dict, instantiation_kwargs = cls.get_processor_dict(pretrained_model_name_or_path, **kwargs)
-> 1402 args = cls._get_arguments_from_pretrained(pretrained_model_name_or_path, processor_dict, **kwargs)
1403 return cls.from_args_and_dict(args, processor_dict, **instantiation_kwargs)
File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\processing_utils.py:1523, in ProcessorMixin._get_arguments_from_pretrained(cls, pretrained_model_name_or_path, processor_dict, **kwargs)
1520 elif is_primary:
1521 # Primary non-tokenizer sub-processor: load via Auto class
1522 auto_processor_class = MODALITY_TO_AUTOPROCESSOR_MAPPING[sub_processor_type]
-> 1523 sub_processor = auto_processor_class.from_pretrained(
1524 pretrained_model_name_or_path, subfolder=subfolder, **kwargs
1525 )
1526 args.append(sub_processor)
1528 elif sub_processor_type in processor_dict:
1529 # Additional non-tokenizer sub-processor: instantiate from config in processor_dict
File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\models\auto\video_processing_auto.py:342, in AutoVideoProcessor.from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
338 video_processor_class_inferred = image_processor_class.replace("ImageProcessor", "VideoProcessor")
340 # Some models have different image processors, e.g. InternVL uses GotOCRImageProcessor
341 # We cannot use GotOCRVideoProcessor when falling back for BC and should try to infer from config later on
--> 342 if video_processor_class_from_name(video_processor_class_inferred) is not None:
343 video_processor_class = video_processor_class_inferred
344 if "AutoImageProcessor" in config_dict.get("auto_map", {}):
File c:\Users\flash\projects\cpuinfer-convert\.venv\Lib\site-packages\transformers\models\auto\video_processing_auto.py:96, in video_processor_class_from_name(class_name)
94 def video_processor_class_from_name(class_name: str):
95 for module_name, extractors in VIDEO_PROCESSOR_MAPPING_NAMES.items():
---> 96 if class_name in extractors:
97 module_name = model_type_to_module_name(module_name)
99 module = importlib.import_module(f".{module_name}", "transformers.models")
TypeError: argument of type 'NoneType' is not iterable
Yes, got the same error!
As per this issue on GitHub https://github.com/huggingface/transformers/issues/43986 you need torchvision installed.
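To illustrate why a missing torchvision could surface as this particular TypeError, here is a toy reproduction of the membership test in the last frame of the traceback. The mapping contents and function names are hypothetical stand-ins, and the idea that the mapped values end up as None when torchvision is unavailable is an assumption based on the traceback and the linked issue, not something confirmed in this thread.

```python
# Toy stand-in for VIDEO_PROCESSOR_MAPPING_NAMES (contents are hypothetical).
# Assumption: without torchvision, the mapped values end up as None.
TOY_MAPPING = {"some_model": None}


def class_from_name(class_name, mapping):
    """Mimic the membership test from the traceback's last frame."""
    for module_name, extractors in mapping.items():
        # `x in None` raises TypeError, the error type reported above.
        if class_name in extractors:
            return module_name
    return None


def safe_class_from_name(class_name, mapping):
    """Same lookup, but skipping entries whose value is None."""
    for module_name, extractors in mapping.items():
        if extractors is not None and class_name in extractors:
            return module_name
    return None


try:
    class_from_name("SomeVideoProcessor", TOY_MAPPING)
except TypeError as exc:
    print(f"TypeError: {exc}")

print(safe_class_from_name("SomeVideoProcessor", TOY_MAPPING))
```

The guarded variant only shows where the check breaks; the practical fix in this thread is still to install torchvision.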
I've found that the minimal list of dependencies needed to run the example is:
`pillow`
`torch`
`torchvision`
`transformers>=5.1.0`
accelerate was also required for me to run it on a GPU on an EC2 instance.
You also need a PNG image.
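A quick way to check an environment against the list above is sketched below. It only reports which distributions are installed and at what version; the helper names are made up for illustration, and treating accelerate as optional reflects the GPU-only remark above rather than any official requirement.

```python
from importlib import metadata

# Distribution names taken from the list in this thread.
REQUIRED = ["pillow", "torch", "torchvision", "transformers"]
# Reported above as needed for GPU use; treated as optional here (assumption).
OPTIONAL = ["accelerate"]


def installed_version(dist_name):
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def missing_packages(names):
    """Return the subset of `names` that is not installed."""
    return [name for name in names if installed_version(name) is None]


if __name__ == "__main__":
    for name in REQUIRED + OPTIONAL:
        version = installed_version(name)
        print(f"{name}: {version if version is not None else 'NOT INSTALLED'}")

    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing required packages: " + ", ".join(missing))
```

If torchvision shows up as missing, installing it should be the first thing to try before loading AutoProcessor again. The sketch does not verify the `transformers>=5.1.0` constraint; it only prints the installed version for comparison.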
iyuge2 changed discussion status to closed