Instructions for using Horama/Horama_BTP_v2 with libraries and local serving apps.

Libraries

How to use Horama/Horama_BTP_v2 with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="Horama/Horama_BTP_v2")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoProcessor, AutoModelForImageTextToText

processor = AutoProcessor.from_pretrained("Horama/Horama_BTP_v2")
model = AutoModelForImageTextToText.from_pretrained("Horama/Horama_BTP_v2")
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens (the prompt is echoed at the start of outputs[0])
print(processor.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))

Local Apps

vLLM

How to use Horama/Horama_BTP_v2 with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server (gated repo: export HF_TOKEN first):
vllm serve "Horama/Horama_BTP_v2"
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Horama/Horama_BTP_v2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
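The curl calls above reference a public image URL. To send a local site photo to the same OpenAI-compatible endpoint, one option is to inline it as a base64 `data:` URL, which OpenAI-compatible servers such as vLLM generally accept in `image_url`. The sketch below builds and posts such a request with the Python standard library only; the helper names, the optional system prompt, and `temperature=0` (chosen to mirror the greedy decoding used elsewhere in this card) are assumptions, not part of the official instructions.

```python
import base64
import json
import mimetypes
import urllib.request
from typing import Any, Dict, Optional

MODEL_ID = "Horama/Horama_BTP_v2"


def image_to_data_url(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URL."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


def build_chat_payload(image_url: str, prompt: str,
                       system_prompt: Optional[str] = None,
                       max_tokens: int = 4096) -> Dict[str, Any]:
    """Build an OpenAI-compatible /v1/chat/completions request body (hypothetical helper)."""
    messages = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.append({
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": image_url}},
            {"type": "text", "text": prompt},
        ],
    })
    # temperature=0 approximates greedy decoding (an assumption about the server's sampling)
    return {"model": MODEL_ID, "messages": messages,
            "temperature": 0, "max_tokens": max_tokens}


def payload_for_local_image(path: str, prompt: str, **kwargs: Any) -> Dict[str, Any]:
    """Read a local image file and wrap it in a chat payload."""
    mime_type = mimetypes.guess_type(path)[0] or "image/jpeg"
    with open(path, "rb") as f:
        return build_chat_payload(image_to_data_url(f.read(), mime_type), prompt, **kwargs)


def post_chat(payload: Dict[str, Any], base_url: str = "http://localhost:8000") -> str:
    """POST the payload to a running server and return the first reply's text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

For example, `post_chat(payload_for_local_image("construction_site.jpg", user_prompt, system_prompt=system_prompt))` reuses the prompts from the Quick Start section; pass `base_url="http://localhost:30000"` to target the SGLang server instead.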

Use Docker

docker model run hf.co/Horama/Horama_BTP_v2

SGLang

How to use Horama/Horama_BTP_v2 with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server (gated repo: export HF_TOKEN first):
python3 -m sglang.launch_server \
    --model-path "Horama/Horama_BTP_v2" \
    --host 0.0.0.0 \
    --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Horama/Horama_BTP_v2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "Horama/Horama_BTP_v2" \
        --host 0.0.0.0 \
        --port 30000
# In a second terminal, call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "Horama/Horama_BTP_v2",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'


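Access: this repository is publicly listed, but its files are gated. You must agree to share your contact information and accept the access conditions on the model page before you can download the weights. Programmatic downloads (Transformers, vLLM, SGLang, Docker) then need a Hugging Face token from an account that has accepted them, for example via the HF_TOKEN environment variable.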

HORAMA-BTP V2

Enhanced Vision-Language Model for Construction Site Analysis & Safety Inspection

Image → Structured JSON | Safety-Enhanced | HPO-Optimized | Built on Qwen2.5-VL

Horama-BTP V2 builds on V1's structured analysis capabilities with enhanced safety inspection: better PPE detection, improved hazard recognition, and more reliable risk assessment. It was trained on 10,000+ construction site images with hyperparameter-optimized LoRA.

What's New in V2

Aspect	V1	V2
Safety detection	Baseline PPE and hazard detection	Enhanced PPE compliance, multi-hazard recognition
Training scale	Initial domain adaptation	10,000+ construction site images
LoRA capacity	r=32 (lightweight)	r=128 (HPO-optimized, 4x capacity)
Hyperparameters	Manual tuning	Bayesian optimization (Optuna)
Focus	Structured JSON output learning	Safety inspection depth + structured output
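The "4x capacity" figure follows from how LoRA sizes its adapters. For a linear layer with input width d_in and output width d_out, a rank-r adapter adds two matrices, A (r × d_in) and B (d_out × r), i.e. r(d_in + d_out) trainable parameters, so moving from r=32 to r=128 multiplies the adapter size by exactly 4 for every targeted projection. The 2048 × 2048 layer shape below is a hypothetical example, not a claim about the actual Qwen2.5-VL-3B layer sizes.

```python
def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters a rank-r LoRA adapter adds to one d_out x d_in linear layer.

    A has shape (r, d_in) and B has shape (d_out, r).
    """
    return r * d_in + d_out * r


# Hypothetical 2048 x 2048 projection, used only for illustration
v1 = lora_param_count(2048, 2048, r=32)   # V1 rank
v2 = lora_param_count(2048, 2048, r=128)  # V2 rank
print(v1, v2, v2 / v1)  # 131072 524288 4.0
```

The ratio is independent of the layer shape, which is why the capacity comparison can be stated without knowing the backbone's dimensions.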

Overview

Horama-BTP V2 is the safety-enhanced evolution of Horama-BTP. Starting from V1's structured JSON output capabilities, V2 was fine-tuned on a large-scale construction safety dataset (10,000+ images) with hyperparameter-optimized LoRA to improve:

PPE detection accuracy -- helmets, vests, harnesses, boots, goggles
Hazard recognition -- fall risks, open trenches, unstable loads, electrical hazards
Risk level assessment -- more calibrated overall risk scoring
Safety control measures -- guardrails, barriers, signage, netting identification

The model retains V1's full 15-field output schema (including progress, quality, logistics, and environment) while concentrating on safety compliance, which makes it well suited to automated site safety audits.

Key Capabilities

Dimension	What the model extracts
Safety	PPE compliance by worker role (8 equipment types), hazard identification (9 types), control measures, overall risk level
Progress	Construction stage (earthworks → commissioning), estimated % completion, milestones
Quality	Structural defects (cracks, corrosion, misalignment...), non-conformities
Observations	Objects, materials, equipment, personnel, vehicles with attributes and confidence
Logistics	Materials inventory, equipment status (idle/operating), access constraints
Environment	Dust, waste, spills; waste management assessment
Evidence	Traceable evidence entries with unique IDs linking every finding to visual proof

Architecture

                    ┌──────────────────────────────────────────┐
                    │               HORAMA-BTP V2              │
                    │                                          │
Input Image ───┐    │  Qwen2.5-VL-3B     V1 LoRA      V2 LoRA  │
               ├───►│  (backbone)  ──►   (merged) ──► (merged) │──► Structured JSON
System Prompt ─┘    │                    r=32         r=128    │
                    │                    domain       safety   │
                    │                    adaptation   enhanced │
                    └──────────────────────────────────────────┘

V2 is a two-stage fine-tuned model:

Stage 1 (V1): LoRA fine-tuning (r=32) on domain-specific annotations to learn the Horama-BTP JSON schema and construction vocabulary
Stage 2 (V2): LoRA fine-tuning (r=128, HPO-optimized) on 10,000+ safety-focused construction images to deepen detection capabilities

Both LoRA adapters are merged into the backbone -- V2 is a standalone model with no runtime adapter dependencies.
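Assuming the standard LoRA scaling factor α/r (rather than a variant such as rank-stabilized LoRA; the card does not say which was used), merging the two stages amounts to adding two low-rank updates to each targeted backbone weight W_0. With the alpha and rank values in the component table, both stages use the same scale α/r = 2:

```latex
\begin{aligned}
W_{\text{V1}} &= W_0 + \frac{\alpha_1}{r_1}\, B_1 A_1, &\qquad \frac{\alpha_1}{r_1} &= \frac{64}{32} = 2,\\
W_{\text{V2}} &= W_{\text{V1}} + \frac{\alpha_2}{r_2}\, B_2 A_2 = W_0 + 2\,B_1 A_1 + 2\,B_2 A_2, &\qquad \frac{\alpha_2}{r_2} &= \frac{256}{128} = 2,
\end{aligned}
\qquad B_i \in \mathbb{R}^{d_{\text{out}} \times r_i},\; A_i \in \mathbb{R}^{r_i \times d_{\text{in}}}.
```

Because W_V2 has the same shape as W_0, the merged checkpoint runs exactly like the base architecture, with no adapter weights loaded at inference time.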

Component	Details
Backbone	Qwen2.5-VL-3B-Instruct -- 3B parameter multimodal transformer
Stage 1 adaptation	LoRA r=32, alpha=64, targeting all attention + MLP projections
Stage 2 adaptation	LoRA r=128, alpha=256, HPO-optimized on 10k+ safety images
Target Modules	`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
Precision	BF16 (GPU) / FP32 (CPU/MPS)
Output	Deterministic JSON (temperature=0, greedy decoding)

Quick Start

import json

import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

# Load model and processor
model_id = "Horama/Horama_BTP_v2"

# BF16 on GPU, FP32 on CPU/MPS (see the Precision row above)
dtype = torch.bfloat16 if torch.cuda.is_available() else torch.float32

model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    torch_dtype=dtype,
    device_map="auto",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load image
image = Image.open("construction_site.jpg").convert("RGB")

# System prompt
system_prompt = """You are Horama-BTP v1. Analyze construction site images. Output ONLY valid JSON. No text before/after.
CRITICAL RULES:
1. ONLY describe what you can CLEARLY SEE in the image
2. If you cannot see something -> use empty array [] or "unknown"
3. Output must follow the Horama-BTP v1 JSON schema exactly"""

user_prompt = "Analyze this construction site image and return the Horama-BTP v1 JSON output."

# Prepare messages
messages = [
    {"role": "system", "content": system_prompt},
    {
        "role": "user",
        "content": [
            {"type": "image", "image": image},
            {"type": "text", "text": user_prompt},
        ],
    },
]

# Generate with greedy decoding for deterministic output
text = processor.apply_chat_template(messages, add_generation_prompt=True, tokenize=False)
inputs = processor(text=[text], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=4096, do_sample=False)

# Decode only the newly generated tokens, not the echoed prompt
generated = output[0][inputs["input_ids"].shape[-1]:]
result = processor.decode(generated, skip_special_tokens=True)

# Extract JSON: from the FIRST "{" to the last "}" (rfind for the start would
# only capture the innermost trailing object)
json_start = result.find("{")
json_end = result.rfind("}") + 1
analysis = json.loads(result[json_start:json_end])

print(json.dumps(analysis, indent=2))
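Slicing from the first "{" to the last "}" works when the model follows its "JSON only" rule, but it fails if the answer is wrapped in a Markdown fence followed by a note containing braces, and it gives a confusing error when generation is cut off at `max_new_tokens`. A slightly more defensive sketch uses the standard library's `json.JSONDecoder.raw_decode`, which parses one JSON value starting at a given index and ignores whatever follows it. The helper name `extract_json_object` is hypothetical, and the sketch assumes the model does not emit stray "{" characters before the JSON.

```python
import json
from typing import Any, Dict, Optional


def extract_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Parse the JSON object that starts at the first "{" in `text`.

    raw_decode stops at the end of the first complete value, so trailing
    text (even text containing braces) is ignored. Returns None when no
    complete object starts there, e.g. because the output was truncated.
    """
    start = text.find("{")
    if start == -1:
        return None
    try:
        obj, _end = json.JSONDecoder().raw_decode(text, start)
    except json.JSONDecodeError:
        return None
    return obj if isinstance(obj, dict) else None
```

A `None` result is a useful signal: it usually means the output hit the token limit, so raising `max_new_tokens` is the first thing to try.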

Output Schema

Identical to V1 -- the model outputs a single JSON object with 15 required top-level fields:

{
  "job_type":        "construction" | "renovation" | "infrastructure" | "unknown",
  "asset_type":      "house" | "building" | "road" | "bridge" | "tunnel" | "site" | "unknown",
  "scene_context":   { location_hint, weather_light, viewpoint },
  "summary":         { one_liner, confidence },
  "progress":        { overall_stage, stage_confidence, progress_percent_estimate, milestones_detected },
  "work_activities":  [{ activity, status, confidence, evidence_ids }],
  "observations":    [{ type, label, attributes, confidence, evidence_ids }],
  "safety":          { overall_risk_level, ppe[], hazards[], control_measures[] },
  "quality":         { issues[], non_conformities[] },
  "logistics":       { materials_on_site[], equipment_on_site[], access_constraints[] },
  "environment":     { impacts[], waste_management },
  "evidence":        [{ evidence_id, source, bbox_xyxy, description }],
  "unknown":         [{ question, why_unknown, needed_input }],
  "domain_fields":   { custom_kpis, lot_breakdown, client_specific },
  "metadata":        { model, version, generated_at }
}
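Because the schema fixes 15 required top-level fields, and custom data is meant to go in `domain_fields`, a cheap post-processing step is to confirm that every field is present, that nothing unexpected was added, and that the two enumerated fields hold allowed values. A minimal sketch: the field names and values are copied from the schema above, while the helper name and message format are illustrative.

```python
from typing import Any, Dict, List

# The 15 required top-level fields, in schema order
REQUIRED_FIELDS = (
    "job_type", "asset_type", "scene_context", "summary", "progress",
    "work_activities", "observations", "safety", "quality", "logistics",
    "environment", "evidence", "unknown", "domain_fields", "metadata",
)
LIST_FIELDS = ("work_activities", "observations", "evidence", "unknown")
ENUM_FIELDS = {
    "job_type": {"construction", "renovation", "infrastructure", "unknown"},
    "asset_type": {"house", "building", "road", "bridge", "tunnel", "site", "unknown"},
}


def check_top_level(analysis: Dict[str, Any]) -> List[str]:
    """Return human-readable problems with the top-level structure (empty if none)."""
    problems = []
    missing = [f for f in REQUIRED_FIELDS if f not in analysis]
    if missing:
        problems.append(f"missing fields: {missing}")
    extra = sorted(set(analysis) - set(REQUIRED_FIELDS))
    if extra:
        problems.append(f"unexpected fields: {extra}")
    for field, allowed in ENUM_FIELDS.items():
        if field in analysis and analysis[field] not in allowed:
            problems.append(f"invalid {field}: {analysis[field]!r}")
    for field in LIST_FIELDS:
        if field in analysis and not isinstance(analysis[field], list):
            problems.append(f"{field} should be a list")
    return problems
```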

Safety-Specific Schema Detail

V2 particularly excels at populating the safety section:

{
  "safety": {
    "overall_risk_level": "low | medium | high | unknown",
    "ppe": [
      {
        "role": "worker | visitor | unknown",
        "ppe_item": "helmet | vest | gloves | goggles | harness | boots | mask | other",
        "status": "compliant | non_compliant | unknown",
        "confidence": 0.0,
        "evidence_ids": ["ev_XXX"]
      }
    ],
    "hazards": [
      {
        "hazard_type": "fall_risk | open_trench | moving_vehicle | electrical | fire | unstable_load | poor_housekeeping | restricted_area | other",
        "severity": "low | medium | high | unknown",
        "confidence": 0.0,
        "evidence_ids": ["ev_XXX"]
      }
    ],
    "control_measures": [
      {
        "measure": "guardrail | barrier | signage | netting | cones | spotter | lockout_tagout | other",
        "status": "present | missing | unknown",
        "confidence": 0.0,
        "evidence_ids": ["ev_XXX"]
      }
    ]
  }
}
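The safety block uses closed vocabularies, so downstream audit tooling can reject out-of-vocabulary values instead of silently accepting them. The sketch below assumes the value sets exactly as listed above, and it assumes confidences lie in [0, 1]; that range is implied by the 0.0 placeholders and the example values, not stated explicitly. `check_safety` is a hypothetical name.

```python
from typing import Any, Dict, List

RISK_LEVELS = {"low", "medium", "high", "unknown"}

# Allowed values per enumerated field, for each safety sub-list
SAFETY_ENUMS = {
    "ppe": {
        "role": {"worker", "visitor", "unknown"},
        "ppe_item": {"helmet", "vest", "gloves", "goggles", "harness", "boots", "mask", "other"},
        "status": {"compliant", "non_compliant", "unknown"},
    },
    "hazards": {
        "hazard_type": {"fall_risk", "open_trench", "moving_vehicle", "electrical", "fire",
                        "unstable_load", "poor_housekeeping", "restricted_area", "other"},
        "severity": {"low", "medium", "high", "unknown"},
    },
    "control_measures": {
        "measure": {"guardrail", "barrier", "signage", "netting", "cones", "spotter",
                    "lockout_tagout", "other"},
        "status": {"present", "missing", "unknown"},
    },
}


def check_safety(safety: Dict[str, Any]) -> List[str]:
    """Return problems found in a `safety` object (empty list if none)."""
    problems = []
    level = safety.get("overall_risk_level")
    if level not in RISK_LEVELS:
        problems.append(f"invalid overall_risk_level: {level!r}")
    for list_key, fields in SAFETY_ENUMS.items():
        for i, entry in enumerate(safety.get(list_key, [])):
            for field, allowed in fields.items():
                if entry.get(field) not in allowed:
                    problems.append(f"{list_key}[{i}].{field}: {entry.get(field)!r}")
            conf = entry.get("confidence")
            # Assumed range [0, 1]
            if not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
                problems.append(f"{list_key}[{i}].confidence out of range: {conf!r}")
    return problems
```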

Example Output

Given a photograph of an active construction site with workers:

{
  "job_type": "construction",
  "asset_type": "building",
  "scene_context": {
    "location_hint": "outdoor",
    "weather_light": "day",
    "viewpoint": "ground"
  },
  "summary": {
    "one_liner": "Active multi-story building construction site with scaffolding, multiple workers performing structural work with mixed PPE compliance.",
    "confidence": 0.90
  },
  "progress": {
    "overall_stage": "structure",
    "stage_confidence": 0.88,
    "progress_percent_estimate": 45,
    "progress_confidence": 0.40,
    "milestones_detected": [
      { "name": "Foundation complete", "status": "done", "confidence": 0.85, "evidence_ids": ["ev_001"] },
      { "name": "Structural framing in progress", "status": "in_progress", "confidence": 0.90, "evidence_ids": ["ev_002"] }
    ]
  },
  "safety": {
    "overall_risk_level": "high",
    "ppe": [
      { "role": "worker", "ppe_item": "helmet", "status": "compliant", "confidence": 0.92, "evidence_ids": ["ev_003"] },
      { "role": "worker", "ppe_item": "vest", "status": "compliant", "confidence": 0.90, "evidence_ids": ["ev_003"] },
      { "role": "worker", "ppe_item": "harness", "status": "non_compliant", "confidence": 0.75, "evidence_ids": ["ev_004"] },
      { "role": "worker", "ppe_item": "boots", "status": "compliant", "confidence": 0.80, "evidence_ids": ["ev_003"] }
    ],
    "hazards": [
      { "hazard_type": "fall_risk", "severity": "high", "confidence": 0.88, "evidence_ids": ["ev_005"] },
      { "hazard_type": "unstable_load", "severity": "medium", "confidence": 0.65, "evidence_ids": ["ev_006"] }
    ],
    "control_measures": [
      { "measure": "guardrail", "status": "present", "confidence": 0.85, "evidence_ids": ["ev_007"] },
      { "measure": "netting", "status": "missing", "confidence": 0.70, "evidence_ids": ["ev_005"] }
    ]
  },
  "evidence": [
    { "evidence_id": "ev_001", "source": "image", "description": "Completed concrete foundation visible at ground level" },
    { "evidence_id": "ev_003", "source": "image", "description": "Workers wearing hard hats, high-vis vests, and safety boots on scaffolding" },
    { "evidence_id": "ev_004", "source": "image", "description": "Worker at height without visible safety harness attachment" },
    { "evidence_id": "ev_005", "source": "image", "description": "Open edges on upper floors without safety netting" }
  ]
}

(Truncated for readability -- full output includes all 15 top-level fields)
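Every finding points to `evidence_ids`, so traceability can be checked mechanically: collect the IDs referenced anywhere in the output and compare them with the `evidence` entries. On the example above this reports ev_002, ev_006 and ev_007 as unresolved, presumably because the example is truncated. The sketch also pulls out the items an auditor would probably review first. Both helper names, and the choice to flag non-compliant PPE, high-severity hazards and missing controls, are illustrative assumptions.

```python
from typing import Any, Dict, List, Set


def referenced_evidence_ids(node: Any) -> Set[str]:
    """Recursively collect every ID listed under an "evidence_ids" key."""
    found: Set[str] = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "evidence_ids" and isinstance(value, list):
                found.update(v for v in value if isinstance(v, str))
            else:
                found |= referenced_evidence_ids(value)
    elif isinstance(node, list):
        for item in node:
            found |= referenced_evidence_ids(item)
    return found


def unresolved_evidence_ids(analysis: Dict[str, Any]) -> List[str]:
    """IDs that findings reference but that have no entry in `evidence`."""
    declared = {e.get("evidence_id") for e in analysis.get("evidence", [])}
    return sorted(referenced_evidence_ids(analysis) - declared)


def safety_flags(analysis: Dict[str, Any]) -> List[str]:
    """Short audit lines for the findings most likely to need follow-up."""
    safety = analysis.get("safety", {})
    flags = [f"PPE non-compliant: {p['ppe_item']} ({p['role']})"
             for p in safety.get("ppe", []) if p.get("status") == "non_compliant"]
    flags += [f"High-severity hazard: {h['hazard_type']}"
              for h in safety.get("hazards", []) if h.get("severity") == "high"]
    flags += [f"Missing control: {c['measure']}"
              for c in safety.get("control_measures", []) if c.get("status") == "missing"]
    return flags
```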

Training Details

Stage 2 (V2 Safety Enhancement)

Parameter	Value
Base model	Horama/Horama_BTP (V1)
Method	LoRA (Parameter-Efficient Fine-Tuning)
Training images	10,000+ construction site photographs
Focus	Safety inspection, PPE detection, hazard recognition
LoRA rank	r=128 (HPO-optimized, 4x V1 capacity)
LoRA alpha	256 (2x rank)
LoRA dropout	0.05 (HPO-optimized)
Epochs	3
Effective batch size	8 (batch=2, accumulation=4)
Learning rate	2.52e-4 (HPO-optimized, cosine schedule)
Warmup	3% of training steps
Weight decay	0.0 (HPO-optimized)
Gradient checkpointing	Enabled
Framework	Transformers + PEFT
Hyperparameter search	Bayesian optimization via Optuna
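The learning-rate and warmup rows describe a linear warmup followed by cosine decay. As a worked example, assuming exactly 10,000 images (the card says 10,000+), an effective batch of 8 and 3 epochs give 10,000 / 8 × 3 = 3,750 optimizer steps, of which 3% (about 113, rounding up) are warmup. The sketch reproduces the usual shape of such a schedule: a linear ramp to the peak, then half a cosine down to zero. The exact formula and the rounding of warmup steps used in training are assumptions.

```python
import math


def lr_at_step(step: int, total_steps: int, peak_lr: float, warmup_ratio: float) -> float:
    """Linear warmup to peak_lr, then cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Hypothetical run size: exactly 10,000 images, effective batch 8, 3 epochs
total_steps = math.ceil(10_000 / 8) * 3   # 1250 steps per epoch
peak_lr = 2.52e-4                          # HPO-selected learning rate
warmup_ratio = 0.03                        # 3% warmup

for step in (0, 56, 113, 1000, 2000, total_steps):
    print(step, lr_at_step(step, total_steps, peak_lr, warmup_ratio))
```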

Hyperparameter Optimization

V2 hyperparameters were selected through Bayesian optimization (Optuna) searching over:

Hyperparameter	Search space	Optimal value
Learning rate	[1e-5, 5e-4]	2.52e-4
LoRA rank	{16, 32, 64, 128}	128
LoRA dropout	{0.0, 0.1, 0.2}	0.05
Weight decay	{0.0, 0.01, 0.05, 0.1}	0.0
Gradient accumulation	{4, 8, 16}	4
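This search space maps onto Optuna's `trial.suggest_float` / `trial.suggest_categorical` API. The sketch below encodes it under some stated assumptions. The table lists dropout as the set {0.0, 0.1, 0.2} but reports 0.05 as the optimum, so the sketch treats dropout as a continuous range [0.0, 0.2]. The log-scaled learning rate and tying alpha to 2x rank during the search are also assumptions. `train_and_evaluate` is a hypothetical callable that fine-tunes with the given settings and returns a validation loss.

```python
from typing import Any, Callable, Dict


def suggest_hparams(trial: Any) -> Dict[str, Any]:
    """Sample one configuration; `trial` is expected to be an optuna.trial.Trial."""
    return {
        # Log scale is an assumption
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 5e-4, log=True),
        "lora_r": trial.suggest_categorical("lora_r", [16, 32, 64, 128]),
        # Assumed continuous: the reported optimum 0.05 is not in {0.0, 0.1, 0.2}
        "lora_dropout": trial.suggest_float("lora_dropout", 0.0, 0.2),
        "weight_decay": trial.suggest_categorical("weight_decay", [0.0, 0.01, 0.05, 0.1]),
        "gradient_accumulation_steps": trial.suggest_categorical(
            "gradient_accumulation_steps", [4, 8, 16]),
    }


def make_objective(train_and_evaluate: Callable[[Dict[str, Any]], float]):
    """Wrap a (hypothetical) training function into an Optuna objective."""
    def objective(trial: Any) -> float:
        params = suggest_hparams(trial)
        # Assumption: alpha tied to 2x rank, matching both stages' final configs
        params["lora_alpha"] = 2 * params["lora_r"]
        return train_and_evaluate(params)
    return objective
```

With Optuna installed, `study = optuna.create_study(direction="minimize")` followed by `study.optimize(make_objective(train_and_evaluate), n_trials=20)` (an arbitrary budget) would run the search. Optuna's default single-objective sampler is TPE, a Bayesian-style method, but the card does not state which sampler was used.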

V1 vs V2: When to Use Which

Use case	Recommended
Safety audits & PPE compliance	V2 -- significantly better safety detection
Hazard identification	V2 -- trained on diverse hazard scenarios
General progress tracking	Both work well
Quality defect detection	Both comparable
Resource-constrained deployment	Either (identical architecture and inference cost; V1 only had lighter training)
Safety-critical applications	V2 -- deeper safety understanding

Intended Uses

Primary use cases:

Automated safety compliance auditing from site photographs
PPE verification across construction teams
Hazard detection and risk level assessment
Construction progress reporting
Quality control and defect identification
Environmental impact documentation

Input requirements:

Single construction site image (JPEG, PNG, WebP, BMP)
Supports ground-level, drone, and fixed-camera viewpoints
Works best with daylight, well-lit images

Limitations

Single-image analysis: Analyzes one image at a time; no temporal comparison between images
Visible elements only: Cannot detect hidden structural issues or elements behind walls/coverings
No sensory data: Cannot measure noise levels, dust concentration, or air quality from static images
PPE at distance: Small or distant workers may have lower PPE detection confidence
Schema-bound: Output follows the Horama-BTP v1 schema strictly -- custom fields use the domain_fields extension
Not a replacement for human inspectors: The model assists and augments human safety inspections but should not be the sole decision-maker for safety-critical assessments

Hardware Requirements

Setup	VRAM / RAM	Precision	Notes
NVIDIA GPU	~8 GB VRAM	BF16	Recommended for production
Apple Silicon	~8 GB RAM	FP32	Supported via MPS backend
CPU	~12 GB RAM	FP32	Functional but slower
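A rough sanity check for these figures is weight memory = parameter count × bytes per parameter (2 for BF16, 4 for FP32). Taking the card's "3B parameter" figure at face value gives about 6 GB of weights in BF16 and about 12 GB in FP32, before activations for the image tokens and the KV cache for up to 4,096 generated tokens. This is back-of-the-envelope arithmetic, not a measured profile.

```python
BYTES_PER_PARAM = {"bf16": 2, "fp16": 2, "fp32": 4}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


for precision in ("bf16", "fp32"):
    print(precision, weight_memory_gb(3e9, precision))  # bf16 6.0, then fp32 12.0
```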

License

AGPL-3.0 -- This model can be freely used, modified, and redistributed as long as derivative work remains open-source under the same license.

For commercial or closed-source usage, please contact Horama for a commercial license.

Citation

@misc{horama-btp-v2-2025,
  title   = {Horama-BTP V2: Safety-Enhanced Vision-Language Model for Construction Site Analysis},
  author  = {Horama},
  year    = {2025},
  url     = {https://huggingface.co/Horama/Horama_BTP_v2},
  note    = {Two-stage LoRA fine-tuning from Qwen2.5-VL-3B-Instruct with HPO-optimized safety training}
}

Built by Horama | Construction intelligence, powered by vision AI