How to use agentlans/en-zhtw with Transformers:

```python
# Use a pipeline as a high-level helper
# Warning: Pipeline type "translation" is no longer supported in transformers v5.
# You must load the model directly (see below) or downgrade to v4.x with:
# pip install "transformers<5.0.0"
from transformers import pipeline

pipe = pipeline("translation", model="agentlans/en-zhtw")
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("agentlans/en-zhtw")
model = AutoModelForSeq2SeqLM.from_pretrained("agentlans/en-zhtw")
```

This model is a fine-tuned version of Helsinki-NLP/opus-mt-en-zh, trained on the agentlans/en-zhtw-google-translate dataset.
It is optimized to produce Traditional Chinese translations by default, enhancing the naturalness and fluency of the output.
本模型為 Helsinki-NLP/opus-mt-en-zh 的微調版本，使用 agentlans/en-zhtw-google-translate 資料集進行訓練。
模型已針對輸出繁體中文進行最佳化,提升了翻譯結果的自然度與流暢性。
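Since the warning above says the `translation` pipeline is unavailable in transformers v5, the following is a minimal sketch of translating with the directly loaded tokenizer and model instead. The helper name `translate_direct` and the `max_new_tokens=256` default are assumptions for illustration, not part of the model card; decoding otherwise falls back to whatever generation settings ship with the checkpoint.

```python
def translate_direct(texts, tokenizer, model, max_new_tokens=256):
    """Translate a list of English strings without the pipeline helper.

    `tokenizer` and `model` are expected to be the objects returned by
    AutoTokenizer.from_pretrained and AutoModelForSeq2SeqLM.from_pretrained.
    """
    # Tokenize the whole batch; padding lets sentences of different lengths
    # share one tensor, and truncation guards against over-long inputs.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # Generate target-language token ids for every sentence in the batch.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Turn token ids back into text, dropping padding and end-of-sequence tokens.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

With `tokenizer` and `model` loaded from `"agentlans/en-zhtw"`, calling `translate_direct(["We now return to the White House."], tokenizer, model)` should return a list with one translated string.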
```python
from transformers import pipeline

# Load the translation model
# 載入翻譯模型
model_checkpoint = "agentlans/en-zhtw"
translator = pipeline("translation", model=model_checkpoint)


# This is for correcting English punctuation marks to Traditional Chinese.
# 這是為了將英語標點符號校正為繁體中文。
def en_to_zh_punct(text):
    # Map ASCII punctuation to its full-width Chinese counterpart
    punct = {
        '!': '！', '?': '？', ',': '，', '.': '。',
        ':': '：', ';': '；', '(': '（', ')': '）',
        '[': '【', ']': '】', '{': '｛', '}': '｝'
    }
    result, in_dq, in_sq = [], False, False
    for ch in text:
        if ch == '"':
            # Alternate between opening 「 and closing 」 for double quotes
            result.append("」" if in_dq else "「")
            in_dq = not in_dq
        elif ch == "'":
            # Alternate between opening 『 and closing 』 for single quotes
            result.append("』" if in_sq else "『")
            in_sq = not in_sq
        else:
            result.append(punct.get(ch, ch))
    return "".join(result)


# The main function for translating English to Traditional Chinese
# 將英語翻譯成繁體中文的主要功能
def translate(en_text):
    return [en_to_zh_punct(x["translation_text"]) for x in translator(en_text)]


# Example
# 範例
translate(
    [
        "Trump announces new tariffs on penguin islands. The penguins plan to tax U.S. imports in retaliation.",
        "We now return to the White House for the latest developments on the trade war.",
    ]
)
# ['川普宣佈對企鵝島徵收新關稅，企鵝打算對美國進口產品徵稅報復。', '我們現在回到白宮尋找貿易戰的最新發展。']
```
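One caveat with `en_to_zh_punct` as written: it converts every `.` and `,`, so a number such as `3.5` or `1,000` in the model output would come back as `3。5` or `1，000`. The sketch below shows one possible refinement that skips periods and commas sitting between two digits. The name `convert_outside_numbers` is hypothetical and not part of the model card, and the rule does not attempt to handle abbreviations such as "U.S.".

```python
import re

# Full-width counterparts for the two marks that also occur inside numbers
FULL_WIDTH = {".": "。", ",": "，"}

# Match '.' or ',' unless it has a digit on BOTH sides:
# either it is not preceded by a digit, or it is not followed by one.
PUNCT_OUTSIDE_NUMBER = re.compile(r"(?<!\d)[.,]|[.,](?!\d)")


def convert_outside_numbers(text):
    """Convert periods and commas to full-width, leaving 3.5 and 1,000 intact."""
    return PUNCT_OUTSIDE_NUMBER.sub(lambda m: FULL_WIDTH[m.group(0)], text)
```

This could be applied before the remaining character-by-character mapping, with `.` and `,` dropped from the `punct` dictionary so that they are not converted a second time.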
Training results, reported at the end of each of the 5 epochs:
| Training Loss | Epoch | Step | Validation Loss | Input Tokens Seen |
|---|---|---|---|---|
| 1.3993 | 1.0 | 99952 | 1.2487 | 54454616 |
| 1.2801 | 2.0 | 199904 | 1.1701 | 108935048 |
| 1.1728 | 3.0 | 299856 | 1.1232 | 163424808 |
| 1.1001 | 4.0 | 399808 | 1.0871 | 217911400 |
| 1.0243 | 5.0 | 499760 | 1.0584 | 272407288 |
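To make the table easier to interpret, the short sketch below derives two quantities from it: validation perplexity and the average number of input tokens per step. It assumes the reported loss is a mean per-token cross-entropy in nats, which the table itself does not state, so the perplexity figures are only indicative.

```python
import math

# Rows copied from the table above, in the same column order:
# (training loss, epoch, step, validation loss, input tokens seen)
rows = [
    (1.3993, 1.0, 99952, 1.2487, 54454616),
    (1.2801, 2.0, 199904, 1.1701, 108935048),
    (1.1728, 3.0, 299856, 1.1232, 163424808),
    (1.1001, 4.0, 399808, 1.0871, 217911400),
    (1.0243, 5.0, 499760, 1.0584, 272407288),
]

# Perplexity = exp(loss), valid only under the mean-cross-entropy-in-nats assumption
perplexities = [math.exp(val_loss) for _, _, _, val_loss, _ in rows]

# Cumulative input tokens divided by cumulative optimizer steps
tokens_per_step = [tokens / step for _, _, step, _, tokens in rows]

for (_, epoch, _, _, _), ppl, tps in zip(rows, perplexities, tokens_per_step):
    print(f"epoch {epoch:.0f}: validation perplexity {ppl:.2f}, {tps:.1f} input tokens/step")
```

Validation loss falls at every epoch while staying above the training loss only from epoch 4 onward, so the table alone gives no sign that validation loss had started to rise by the final epoch.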
Base model: Helsinki-NLP/opus-mt-en-zh