ShweYon-Qwen-V3-Base

ShweYon-Qwen-V3-Base is a Myanmar-centric base language model built on Qwen2.5-1.5B. It is a milestone in the "ShweYon" project, which focuses on making Myanmar script processing more efficient through a custom tokenizer.

🌟 Key Highlights

  • Custom Myanmar Tokenizer: We expanded the vocabulary with thousands of Myanmar-specific tokens. This significantly reduces the number of tokens needed per word, which speeds up generation and is intended to improve output quality.
  • Base Pre-training: The model was trained with LoRA for 150 steps on a curated Myanmar text corpus to align the new vocabulary with the base model's knowledge.
  • Efficient Size: At 1.5B parameters, it balances performance against resource use, which makes it a candidate for mobile and edge devices.
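The effect of an expanded vocabulary can be checked by counting the tokens two tokenizers produce for the same text. The sketch below is a minimal, tokenizer-agnostic helper. The names `tokens_per_char` and `compression_gain` are hypothetical, and the toy encoders only stand in for real tokenizers. It measures tokens per character rather than per word, since Myanmar text is usually written without spaces between words.

```python
from typing import Callable, Sequence

# Any function that maps a string to a sequence of tokens (ids or strings).
Encoder = Callable[[str], Sequence]


def tokens_per_char(encode: Encoder, text: str) -> float:
    """Average number of tokens produced per character of `text`."""
    if not text:
        raise ValueError("text must be non-empty")
    return len(encode(text)) / len(text)


def compression_gain(base_encode: Encoder, new_encode: Encoder, text: str) -> float:
    """How many times fewer tokens `new_encode` needs than `base_encode`."""
    return len(base_encode(text)) / len(new_encode(text))


# Toy stand-ins (assumptions for illustration, not real tokenizers):
def char_level(text: str) -> list:
    """One token per character, like a tokenizer with no merged units."""
    return list(text)


def pair_level(text: str) -> list:
    """One token per two characters, like a tokenizer with merged units."""
    return [text[i:i + 2] for i in range(0, len(text), 2)]


sample = "abcdefgh"
print(tokens_per_char(char_level, sample))               # 1.0
print(tokens_per_char(pair_level, sample))               # 0.5
print(compression_gain(char_level, pair_level, sample))  # 2.0
```

With real tokenizers, `encode` could be the `.encode` method of `AutoTokenizer.from_pretrained("Qwen/Qwen2.5-1.5B")` for the baseline and of this model's tokenizer for the comparison. No measured ratios are reported here.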

πŸ› οΈ Training Details

  • Base Model: Qwen/Qwen2.5-1.5B
  • Technique: LoRA (Low-Rank Adaptation) merged into base weights.
  • Training Steps: 150
  • Final Loss: 1.0711
  • Max Length: 512 tokens
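To illustrate what "merged into base weights" means, here is a minimal sketch of the standard LoRA merge, W' = W + (alpha / r) · B·A, written with plain nested lists so it runs without extra dependencies. The function name `merge_lora` is hypothetical, and the rank and alpha used for this model are not stated in this card, so the values below are arbitrary.

```python
def merge_lora(W, A, B, alpha, r):
    """Return W + (alpha / r) * (B @ A).

    W: base weight, shape (out_dim, in_dim)
    B: LoRA up-projection, shape (out_dim, r)
    A: LoRA down-projection, shape (r, in_dim)
    """
    scale = alpha / r
    merged = [row[:] for row in W]  # copy so the base weights stay untouched
    for i in range(len(W)):
        for j in range(len(W[0])):
            # (B @ A)[i][j] is the low-rank update for this weight entry
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged


# Arbitrary toy values (assumptions, not this model's hyperparameters)
W = [[1.0, 0.0],
     [0.0, 1.0]]
B = [[1.0],
     [2.0]]
A = [[3.0, 4.0]]

print(merge_lora(W, A, B, alpha=2, r=1))  # [[7.0, 8.0], [12.0, 17.0]]
```

After merging, the adapter matrices are no longer needed at inference time, which is why the model loads like an ordinary checkpoint.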

⚠️ Important Note

This is a base model: it is trained to predict the next token and complete text, and it has not yet been instruction-tuned. It may therefore not respond correctly to direct questions or chat commands. A chat-style experience requires further SFT (Supervised Fine-Tuning).
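Because a base model continues text rather than following instructions, one common workaround is to phrase a task as a document for the model to complete. The sketch below builds a few-shot completion prompt. The helper `build_completion_prompt` and the "Question:" / "Answer:" labels are hypothetical choices for illustration, not a format this model was trained on.

```python
def build_completion_prompt(examples, query, q_label="Question:", a_label="Answer:"):
    """Format solved examples followed by an open slot for the model to continue."""
    blocks = [f"{q_label} {q}\n{a_label} {a}" for q, a in examples]
    # End on the bare answer label so the most likely continuation is an answer.
    blocks.append(f"{q_label} {query}\n{a_label}")
    return "\n\n".join(blocks)


examples = [("2 + 2 = ?", "4"), ("3 + 5 = ?", "8")]
prompt = build_completion_prompt(examples, "7 + 1 = ?")
print(prompt)
```

The resulting string can be passed to the tokenizer like any other prompt. How well this works for a given task is untested here.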

🚀 How to Use

Python

    from transformers import AutoModelForCausalLM, AutoTokenizer
    import torch

    model_id = "URajinda/ShweYon-Qwen-V3-Base"

    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)

    # Load the model
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True,
    )

    # Generate text
    prompt = "မြန်မာနိုင်ငံသည်"
    # Moving inputs to model.device is more reliable with device_map="auto"
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.9,
        temperature=0.7,
    )

    print(tokenizer.decode(outputs[0], skip_special_tokens=True))
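Training used a maximum length of 512 tokens, so it may be prudent to keep the prompt plus the generated tokens within that window. This is a cautious assumption, not a documented limit of the model. The sketch below uses a hypothetical helper, `clamp_new_tokens`, to compute how many new tokens fit.

```python
TRAIN_MAX_LEN = 512  # "Max Length" from Training Details


def clamp_new_tokens(prompt_len, requested, max_len=TRAIN_MAX_LEN):
    """Largest generation budget that keeps prompt + output within max_len."""
    if prompt_len < 0 or requested < 0:
        raise ValueError("lengths must be non-negative")
    return max(0, min(requested, max_len - prompt_len))


print(clamp_new_tokens(20, 100))   # 100 (plenty of room)
print(clamp_new_tokens(450, 100))  # 62  (only 62 tokens left in the window)
print(clamp_new_tokens(600, 100))  # 0   (prompt already exceeds the window)
```

In the generation code above, `prompt_len` would be `inputs["input_ids"].shape[1]` and the result would be passed as `max_new_tokens`. A result of 0 signals that the prompt should be shortened first.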
