GUI-Owl-7B bf16

This is an MLX conversion of mPLUG/GUI-Owl-7B, optimized for Apple Silicon.

GUI-Owl is a GUI automation model family developed as part of the Mobile-Agent-V3 project. Upstream, it is positioned for screen understanding, GUI grounding, and agentic action planning across benchmark suites such as ScreenSpot and OSWorld-style tasks.

This MLX artifact was converted with mlx-vlm and validated locally with both mlx_vlm prompt-packet checks and vllm-mlx OpenAI-compatible serve checks.

Conversion Details

  • Upstream model: mPLUG/GUI-Owl-7B
  • Artifact type: bf16 MLX conversion
  • Source posture: direct upstream conversion
  • Conversion tool: mlx_vlm.convert via mlx-vlm 0.3.12
  • Python: 3.11.14
  • MLX: 0.31.0
  • Transformers: 5.2.0
  • Validation backend: vllm-mlx (phase/p1 @ 8a5d41b)
  • Quantization: bf16
  • Group size: n/a
  • Quantization mode: n/a
  • Artifact size: 15.47 GB
  • Template repair: tokenizer_config.json["chat_template"] was re-injected after conversion

Additional notes:

  • Direct upstream conversion from mPLUG/GUI-Owl-7B succeeded on mlx-vlm 0.3.12; no local source mirror was required.
  • chat_template.json, chat_template.jinja, and tokenizer_config.json["chat_template"] were aligned for downstream compatibility checks.
  • Root-level preprocessor_config.json and processor_config.json are present intentionally for multimodal detection compatibility.
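Since the chat template was re-injected after conversion, it can be worth re-checking that the three template sources listed above still agree after any local edits. The sketch below is one way to do that; it is not part of mlx-vlm, and the directory layout is assumed from the file names mentioned in this card.

```python
import json
from pathlib import Path


def load_template_sources(model_dir):
    """Collect the chat template from each file this card says was aligned.

    Returns a dict of source name -> template string; missing files are skipped.
    """
    model_dir = Path(model_dir)
    sources = {}

    tok_cfg = model_dir / "tokenizer_config.json"
    if tok_cfg.exists():
        template = json.loads(tok_cfg.read_text()).get("chat_template")
        if template:
            sources["tokenizer_config.json"] = template

    tmpl_json = model_dir / "chat_template.json"
    if tmpl_json.exists():
        template = json.loads(tmpl_json.read_text()).get("chat_template")
        if template:
            sources["chat_template.json"] = template

    tmpl_jinja = model_dir / "chat_template.jinja"
    if tmpl_jinja.exists():
        sources["chat_template.jinja"] = tmpl_jinja.read_text()

    return sources


def templates_aligned(sources):
    """True when every discovered template is identical after trimming whitespace."""
    templates = {t.strip() for t in sources.values()}
    return len(templates) <= 1
```

Run `templates_aligned(load_template_sources("path/to/local/snapshot"))` against a local download before serving if you have modified any of the three files.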

Validation

This artifact passed local validation in this workspace:

  • mlx_vlm prompt-packet validation: PASS
  • vllm-mlx OpenAI-compatible serve validation: PASS

Local validation notes:

  • This family stayed on the original Track A packet; no ShowUI-style packet split was required.
  • Grounding returned the right object shape, but coordinates were not normalized to the requested 0-1000 grid.
  • Streamed answers drifted into Chinese on one serve-path check even though the non-stream answer stayed correct in English.
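The grounding note above (coordinates not normalized to the requested 0-1000 grid) can be worked around client-side when the screenshot size is known. This is a minimal sketch under the assumption that the model returned raw pixel positions; the function name is mine, not part of the model's output schema.

```python
def to_grid_1000(x_px, y_px, width_px, height_px):
    """Map pixel coordinates onto the 0-1000 grid a grounding prompt requested.

    Assumes the model emitted raw pixel positions for an image of the given size.
    """
    if width_px <= 0 or height_px <= 0:
        raise ValueError("image dimensions must be positive")
    x = round(x_px / width_px * 1000)
    y = round(y_px / height_px * 1000)
    # Clamp so edge-of-screen pixel coordinates stay inside the grid.
    return min(max(x, 0), 1000), min(max(y, 0), 1000)
```

For example, a click at pixel (512, 384) on a 1024x768 screenshot maps to grid point (500, 500).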

Performance

  • Artifact size on disk: 15.47 GB
  • Local fixed-packet mlx_vlm validation peaked at about 18.12 GB of memory
  • Local vllm-mlx serve validation completed in about 22.14 s (non-stream) and 23.39 s (streamed)

These are local validation measurements, not a full benchmark suite.

Usage

Install

pip install -U mlx-vlm

CLI

python -m mlx_vlm.generate \
  --model mlx-community/GUI-Owl-7B-bf16 \
  --image path/to/image.png \
  --prompt "Describe the visible controls on this screen in five short bullet points." \
  --max-tokens 256 \
  --temperature 0.0

Python

from mlx_vlm import load, generate

model, processor = load("mlx-community/GUI-Owl-7B-bf16")
result = generate(
    model,
    processor,
    prompt="Describe the visible controls on this screen in five short bullet points.",
    image="path/to/image.png",
    max_tokens=256,
    temp=0.0,
)
print(result.text)

vllm-mlx Serve

python -m vllm_mlx.cli serve mlx-community/GUI-Owl-7B-bf16 --mllm --localhost --port 8000
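Once the server is up, it speaks the standard OpenAI chat-completions shape. The sketch below builds a multimodal request body and posts it with only the standard library; the model name and port come from the serve command above, and attaching the image as a base64 data URL follows the common OpenAI convention (verify against your vllm-mlx build).

```python
import base64
import json
from urllib import request


def build_chat_payload(prompt, image_path, model="mlx-community/GUI-Owl-7B-bf16"):
    """Build an OpenAI-style chat.completions body with one image as a data URL."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/png;base64,{b64}"}},
                ],
            }
        ],
        "max_tokens": 256,
        "temperature": 0.0,
    }


def post_chat(payload, base_url="http://127.0.0.1:8000"):
    """POST to the locally served endpoint; returns the decoded JSON response."""
    req = request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```

Given the streamed-drift note under Validation, comparing a streamed and a non-streamed response for the same prompt is a cheap sanity check before relying on streaming.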

Links

Other Quantizations

Planned sibling repos in this wave:

Notes and Limitations

  • This card reports local MLX conversion and validation results only.
  • Upstream benchmark claims belong to the original GUI-Owl family and were not re-run here unless explicitly stated.
  • This family is better aligned to the Track A packet than ShowUI, but local validation still showed weak structured-action targeting and grounding normalization issues.
  • Streamed response quality can diverge from the non-stream path even when the serve path itself stays healthy.

Citation

If you use this MLX conversion, please also cite the original GUI-Owl work:

@misc{ye2025mobileagentv3foundamentalagentsgui,
      title={Mobile-Agent-v3: Foundamental Agents for GUI Automation},
      author={Jiabo Ye and Xi Zhang and Haiyang Xu and Haowei Liu and Junyang Wang and Zhaoqing Zhu and Ziwei Zheng and Feiyu Gao and Junjie Cao and Zhengxi Lu and Jitong Liao and Qi Zheng and Fei Huang and Jingren Zhou and Ming Yan},
      year={2025},
      eprint={2508.15144},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2508.15144},
}

License

This repo follows the upstream model license: MIT. See the upstream model card for the authoritative license details: mPLUG/GUI-Owl-7B.
