Beynele

Beynele is a text-to-image model based on Lumina-Image 2.0 and adapted for Kazakh cultural image generation. It is trained with a data-centric pipeline that combines curated cultural data, synthetic supervision, revive-before-reject curation, and a base-model anchor dataset, and it is evaluated against reference images with Beynele-Bench.

Use With Diffusers

import torch
from diffusers import Lumina2Pipeline

# Load the full pipeline in bfloat16, the recommended dtype.
pipe = Lumina2Pipeline.from_pretrained(
    "issai/Beynele",
    torch_dtype=torch.bfloat16,
)
# Keep submodules on the CPU and move each to the GPU only while it runs.
pipe.enable_model_cpu_offload()

prompt = "A Kazakh dombra resting on a patterned felt carpet."
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=4.0,
    num_inference_steps=40,
    cfg_trunc_ratio=0.25,
    cfg_normalization=True,
    generator=torch.Generator("cpu").manual_seed(42),  # fixed seed for repeatable output
).images[0]
image.save("beynele_dombra.png")

Model Details

Field	Value
Architecture	Lumina-Image 2.0 / flow-based diffusion transformer
Base pipeline	`Alpha-VLLM/Lumina-Image-2.0`
Text encoder	`google/gemma-2-2b`
Diffusers class	`Lumina2Pipeline`
Resolution	1024 x 1024
Recommended dtype	`torch.bfloat16`
Recommended steps	40
Recommended guidance	4.0

Only the diffusion transformer is adapted. The tokenizer, text encoder, scheduler, and VAE are carried over unchanged from the Lumina-Image 2.0 Diffusers release, so the repository loads directly with `Lumina2Pipeline.from_pretrained`.
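Because only the transformer differs from the base release, it should also be possible to load the adapted transformer alone and attach it to the base pipeline. The following is a minimal sketch, not a documented loading path for this model: it assumes the adapted weights live in a `transformer` subfolder of `issai/Beynele` (the standard Diffusers layout) and that the installed Diffusers version exposes `Lumina2Transformer2DModel`. The function name is hypothetical.

```python
BASE_REPO = "Alpha-VLLM/Lumina-Image-2.0"  # base pipeline listed in Model Details
BEYNELE_REPO = "issai/Beynele"
TRANSFORMER_SUBFOLDER = "transformer"      # assumption: standard Diffusers layout


def load_beynele_on_base_components():
    """Build the base pipeline, swapping in only the adapted transformer."""
    # Imported lazily so this module can be imported without the heavy dependencies.
    import torch
    from diffusers import Lumina2Pipeline, Lumina2Transformer2DModel

    # Load just the adapted diffusion transformer from the Beynele repository.
    transformer = Lumina2Transformer2DModel.from_pretrained(
        BEYNELE_REPO,
        subfolder=TRANSFORMER_SUBFOLDER,
        torch_dtype=torch.bfloat16,
    )
    # Tokenizer, text encoder, scheduler, and VAE come from the base release.
    return Lumina2Pipeline.from_pretrained(
        BASE_REPO,
        transformer=transformer,
        torch_dtype=torch.bfloat16,
    )
```

Calling `load_beynele_on_base_components()` downloads both repositories on first use. If the assumptions hold, the returned pipeline should behave like the one loaded directly from `issai/Beynele`.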

Training Data Summary

The final training pool contains three branches:

Branch	Examples
Core cultural dataset	196k image-text pairs, about 73k unique images
Text-image dataset	128k examples
Base-model anchor dataset	109k examples
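
As a rough check on how the pool is balanced, the short sketch below computes each branch's share of the total and the average number of image-text pairs per unique image in the core cultural dataset. It assumes the three branches do not overlap and uses the rounded counts from the table, so the results are approximate.

```python
# Branch sizes in thousands of examples, as listed in the table (rounded).
branch_sizes_k = {
    "Core cultural dataset": 196,
    "Text-image dataset": 128,
    "Base-model anchor dataset": 109,
}
core_unique_images_k = 73  # "about 73k unique images"

# Total pool size, assuming the branches are disjoint.
total_k = sum(branch_sizes_k.values())  # 433

# Fraction of the pool contributed by each branch.
shares = {name: size / total_k for name, size in branch_sizes_k.items()}

# Average image-text pairs per unique image in the core cultural dataset.
pairs_per_unique_image = branch_sizes_k["Core cultural dataset"] / core_unique_images_k

for name, share in shares.items():
    print(f"{name}: {share:.1%} of about {total_k}k examples")
print(f"Core pairs per unique image: {pairs_per_unique_image:.2f}")
```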

The cultural dataset covers Kazakh people, material culture, buildings, landmarks, food, national symbols, natural scenes, activities, and text-bearing images. The full fine-tuning corpus is not released because of privacy, licensing, and cultural-data governance constraints.

Evaluation

Model	Beynele-Bench	GenEval	WISE	UniGenBench++
Lumina-Image 2.0	4.85	0.73	0.54	64.98
Qwen-Image	6.51	0.87	0.62	78.36
Beynele	7.29	0.74	0.51	65.53
Beynele + prompt mediation	7.01	0.78	0.73	68.89

Beynele-Bench uses 750 prompt-reference pairs and reports the arithmetic mean of Qwen3-VL 32B and Gemini 2.5 Pro similarity scores on a 1-10 scale.
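
The aggregation described above can be sketched as follows. This is an illustration under stated assumptions, not the benchmark's released scoring code: it assumes each judge returns one similarity score per prompt, that the two scores are averaged per prompt, and that the benchmark score is the mean over prompts. The function name and inputs are hypothetical.

```python
from statistics import fmean


def combined_judge_score(qwen_scores, gemini_scores):
    """Average two judges' 1-10 similarity scores per prompt, then over prompts."""
    if len(qwen_scores) != len(gemini_scores):
        raise ValueError("both judges must score the same prompts")
    if not qwen_scores:
        raise ValueError("at least one scored prompt is required")
    for score in (*qwen_scores, *gemini_scores):
        if not 1 <= score <= 10:
            raise ValueError(f"score {score} is outside the 1-10 scale")

    # Arithmetic mean of the two judges for each prompt-reference pair.
    per_prompt = [(q + g) / 2 for q, g in zip(qwen_scores, gemini_scores)]
    # Benchmark-level score: mean over all prompts.
    return fmean(per_prompt)


# Toy example with two prompts; the real benchmark uses 750 pairs.
print(combined_judge_score([8, 6], [7, 5]))  # 6.5
```

With equal numbers of scores per judge, this equals the mean of the two judges' overall averages, so the order of averaging does not change the result.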

Intended Use

Beynele is intended for research on cultural text-to-image generation, low-resource visual adaptation, Kazakh cultural representation, benchmarked T2I evaluation, and data-centric model adaptation.

Limitations and Safety

The model may hallucinate cultural details, produce imperfect Kazakh text, blur faces under difficult compositions, or shift prompt details under strong cultural specialization. It should not be used for identity verification, historical authentication, or high-stakes cultural documentation. Human review and local cultural expertise remain important for sensitive uses.

Provenance

The Hub package contains the converted Diffusers `transformer/` safetensors used by `Lumina2Pipeline.from_pretrained`. The source EMA checkpoint is retained in the local release backup and cache for internal traceability.

Licensing

Beynele is released under the Apache License 2.0. The model is adapted from Alpha-VLLM/Lumina-Image-2.0; users should also follow the licenses and terms of any bundled or upstream components used by the Diffusers pipeline.

Citation

@article{aikyn2026beynele,
  title = {A Data-Centric Framework for Adapting Text-to-Image Models to Low-Resource Cultural Domains},
  author = {Aikyn, Nartay and Aryngazin, Anuar and Maxutov, Akylbek and Varol, Huseyin Atakan},
  year = {2026},
  note = {Pre-release manuscript}
}
