
JoyCN


license: apache-2.0
language:
  - zh
  - en
  - ja
  - ko
  - fr
  - de
  - es
  - ru
  - ar
  - hi
  - th
  - el
library_name: pytorch
tags:
  - ocr
  - text-detection
  - text-recognition
  - paddleocr
  - pp-ocrv5
  - multilingual
  - svtr
  - db
pipeline_tag: image-to-text

PP-OCRv5 PyTorch Model Zoo (Chinese version)

The primary README of this repository is the English README.md. This file, README_zh.md, is its Chinese-language counterpart.

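This repository provides PyTorch versions (in safetensors format) of the full PP-OCRv5 model family. They were converted exactly from Baidu's official PaddlePaddle .pdparams dynamic-graph weights, and their inference results match the original PaddleOCR bit for bit.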

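Text detection: 2 models (mobile / server)
Text recognition (base): 2 models, covering Simplified Chinese / Traditional Chinese / English / Japanese
Text recognition (multilingual): 11 models, covering 100+ languages (Korean / French / German / Russian / Arabic / Devanagari / Thai / Greek / Tamil / Telugu / English-only, etc.)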

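This repository contains only weights, configs, and dictionaries; it contains no inference code. For inference, use PaddleOCR2Pytorch, or integrate the models yourself by following the "Custom Python inference code" section below.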

Repository structure

.
├── README.md / README_zh.md
├── LICENSE                                                     # Apache 2.0
├── config.json                                                 # repo metadata + model index
├── *.safetensors                                               # 15 PP-OCRv5 weights (in the repo root, stable URLs)
├── ptocr_v5_server_{det,rec}.pth                               # .pth copies of the v5 server models (kept for backward compatibility)
├── configs/
│   ├── det/PP-OCRv5/
│   │   ├── PP-OCRv5_mobile_det.yml                             # mobile detection
│   │   └── PP-OCRv5_server_det.yml                             # server detection
│   └── rec/PP-OCRv5/
│       ├── PP-OCRv5_mobile_rec.yml                             # base recognition (Simplified/Traditional Chinese, English, Japanese; mobile)
│       ├── PP-OCRv5_server_rec.yml                             # base recognition (Simplified/Traditional Chinese, English, Japanese; server)
│       └── multi_language/
│           ├── en_PP-OCRv5_mobile_rec.yaml                     # English only
│           ├── korean_PP-OCRv5_mobile_rec.yml                  # Korean + English
│           ├── latin_PP-OCRv5_mobile_rec.yml                   # Latin script, 40+ languages (French/German/Spanish/Italian/Portuguese, etc.)
│           ├── eslav_PP-OCRv5_mobile_rec.yml                   # East Slavic (Russian/Belarusian/Ukrainian)
│           ├── cyrillic_PP-OCRv5_mobile_rec.yaml               # 33 Cyrillic-script languages
│           ├── arabic_PP-OCRv5_mobile_rec.yaml                 # Arabic / Persian / Uyghur / Urdu, etc.
│           ├── devanagari_PP-OCRv5_mobile_rec.yaml             # 14 Devanagari-script languages (Hindi/Marathi/Nepali/Sanskrit, etc.)
│           ├── th_PP-OCRv5_mobile_rec.yaml                     # Thai
│           ├── el_PP-OCRv5_mobile_rec.yaml                     # Greek
│           ├── ta_PP-OCRv5_mobile_rec.yaml                     # Tamil
│           └── te_PP-OCRv5_mobile_rec.yaml                     # Telugu
├── dicts/                                                      # character dictionaries (required for rec inference)
│   ├── ppocrv5_dict.txt                                        # base (Simplified/Traditional Chinese, English, Japanese)
│   ├── ppocrv5_en_dict.txt
│   ├── ppocrv5_korean_dict.txt
│   └── ... (12 in total)
└── legacy/                                                     # older (v2/v3/v4) .pth files, collected in one place
    ├── ch_ptocr_mobile_v2.0_cls_infer.pth
    ├── ch_ptocr_v4_det_infer.pth
    ├── ch_ptocr_v4_rec_infer.pth
    ├── en_ptocr_v3_det_infer.pth
    └── en_ptocr_v4_rec_infer.pth

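The character_dict_path in every rec yaml has been rewritten to a relative path, ./dicts/..., so after git clone or snapshot_download no path edits are needed, provided the path is resolved relative to the repository root.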
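Because `open()` resolves a relative path against the current working directory, code that runs from some other directory should join `./dicts/...` with the repository root first. A minimal sketch; `resolve_dict_path` is a hypothetical helper, not part of this repo or PaddleOCR2Pytorch:

```python
import os


def resolve_dict_path(repo_dir: str, character_dict_path: str) -> str:
    """Resolve a yaml character_dict_path against the repo root, not the CWD."""
    # Absolute paths are returned unchanged.
    if os.path.isabs(character_dict_path):
        return character_dict_path
    # Join with the repo root and normalize away the leading "./".
    # normpath is purely lexical, so Hugging Face cache symlinks are not followed.
    return os.path.normpath(os.path.join(repo_dir, character_dict_path))
```

For example, `resolve_dict_path(repo_dir, cfg["Global"]["character_dict_path"])` with `repo_dir` taken from `snapshot_download`.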

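Model list

Text detection

| Weight file | Config (yaml) | Use case | File size |
|---|---|---|---|
| `ptocr_v5_mobile_det.safetensors` | `configs/det/PP-OCRv5/PP-OCRv5_mobile_det.yml` | Mobile; recommended for CPU | ~14 MB |
| `ptocr_v5_server_det.safetensors` | `configs/det/PP-OCRv5/PP-OCRv5_server_det.yml` | Server; high accuracy | ~101 MB |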

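Text recognition (base)

| Weight file | Config (yaml) | Supported languages | File size |
|---|---|---|---|
| `ptocr_v5_mobile_rec.safetensors` | `configs/rec/PP-OCRv5/PP-OCRv5_mobile_rec.yml` | Simplified Chinese / Traditional Chinese / English / Japanese | ~31 MB |
| `ptocr_v5_server_rec.safetensors` | `configs/rec/PP-OCRv5/PP-OCRv5_server_rec.yml` | Simplified Chinese / Traditional Chinese / English / Japanese | ~128 MB |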

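Text recognition (multilingual)

All multilingual recognition models share the same network (SVTR_LCNet + PPLCNetV3) and differ only in their character sets. File sizes range from 23 to 28 MB.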
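Since only the character set changes, the per-model difference in the network comes down to the head output sizes derived from the dictionary length. The sketch below reproduces the `out_channels_list` arithmetic used in the custom Python inference code later in this README; `head_out_channels` is a hypothetical name, and the "+1 blank, +1 space" rule assumes CTC decoding with `use_space_char: True`:

```python
def head_out_channels(num_dict_lines: int, use_space_char: bool = True) -> dict:
    """Head output sizes for a rec model whose dictionary has num_dict_lines entries."""
    # CTC classes: one per dictionary entry, +1 for the CTC blank,
    # +1 for the space character when use_space_char is enabled.
    n_char = num_dict_lines + 1 + (1 if use_space_char else 0)
    return {
        "CTCLabelDecode": n_char,
        "SARLabelDecode": n_char + 2,
        "NRTRLabelDecode": n_char + 3,
    }
```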

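| Weight file | Supported languages |
|---|---|
| `ptocr_v5_en_mobile_rec.safetensors` | English only (optimized specifically for English text) |
| `ptocr_v5_korean_mobile_rec.safetensors` | Korean, English |
| `ptocr_v5_latin_mobile_rec.safetensors` | French, German, Afrikaans, Italian, Spanish, Portuguese, Czech, Danish, Estonian, Croatian, Dutch, Norwegian, Polish, Swedish, Finnish, Turkish, Vietnamese, and more (40+ languages) |
| `ptocr_v5_eslav_mobile_rec.safetensors` | Russian, Belarusian, Ukrainian, English |
| `ptocr_v5_cyrillic_mobile_rec.safetensors` | Russian, Belarusian, Ukrainian, Serbian (Cyrillic), Bulgarian, Mongolian, etc. (33 Cyrillic-script languages) |
| `ptocr_v5_arabic_mobile_rec.safetensors` | Arabic, Persian, Uyghur, Urdu, Pashto, Sindhi, etc. |
| `ptocr_v5_devanagari_mobile_rec.safetensors` | Hindi, Marathi, Nepali, Sanskrit, etc. (14 Devanagari-script languages) |
| `ptocr_v5_th_mobile_rec.safetensors` | Thai, English |
| `ptocr_v5_el_mobile_rec.safetensors` | Greek, English |
| `ptocr_v5_ta_mobile_rec.safetensors` | Tamil, English |
| `ptocr_v5_te_mobile_rec.safetensors` | Telugu, English |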
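The weight and config names follow a regular pattern, so they can be looked up programmatically. A small sketch built only from the tables and directory tree in this README; the keys (`det_mobile`, `korean`, ...) and the `model_files` function are hypothetical. Note that three multilingual configs end in `.yml` and the rest in `.yaml`:

```python
from typing import Dict, Tuple

REC_DIR = "configs/rec/PP-OCRv5"

# (weight file, yaml config) for detection and base recognition, per the tables.
FIXED: Dict[str, Tuple[str, str]] = {
    "det_mobile": ("ptocr_v5_mobile_det.safetensors",
                   "configs/det/PP-OCRv5/PP-OCRv5_mobile_det.yml"),
    "det_server": ("ptocr_v5_server_det.safetensors",
                   "configs/det/PP-OCRv5/PP-OCRv5_server_det.yml"),
    "rec_mobile": ("ptocr_v5_mobile_rec.safetensors",
                   f"{REC_DIR}/PP-OCRv5_mobile_rec.yml"),
    "rec_server": ("ptocr_v5_server_rec.safetensors",
                   f"{REC_DIR}/PP-OCRv5_server_rec.yml"),
}

# Multilingual mobile rec models, keyed by the filename prefix.
MULTI_LANGS = ("en", "korean", "latin", "eslav", "cyrillic", "arabic",
               "devanagari", "th", "el", "ta", "te")
_YML_LANGS = {"korean", "latin", "eslav"}  # the other configs use .yaml


def model_files(key: str) -> Tuple[str, str]:
    """Return (weight file, yaml path), both relative to the repo root."""
    if key in FIXED:
        return FIXED[key]
    if key not in MULTI_LANGS:
        raise KeyError(f"unknown model key: {key}")
    ext = "yml" if key in _YML_LANGS else "yaml"
    return (f"ptocr_v5_{key}_mobile_rec.safetensors",
            f"{REC_DIR}/multi_language/{key}_PP-OCRv5_mobile_rec.{ext}")
```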

Quick start

Download the weights

from huggingface_hub import snapshot_download, hf_hub_download

# Option 1: download the whole repository (weights + yml + dictionaries + README)
repo_dir = snapshot_download(repo_id="JoyCN/PaddleOCR-Pytorch")
print("Repository downloaded to:", repo_dir)

# Option 2: download a single weight file only
weight_path = hf_hub_download(
    repo_id="JoyCN/PaddleOCR-Pytorch",
    filename="ptocr_v5_korean_mobile_rec.safetensors",
)
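Between these two options, `snapshot_download` also accepts an `allow_patterns` argument, which fetches only the files one model needs while keeping the repository's relative layout (so the yaml's `./dicts/...` path still lines up). A sketch; `download_subset` and `KOREAN_REC_FILES` are illustrative names, and the file list is taken from the tables above:

```python
from typing import List

REPO_ID = "JoyCN/PaddleOCR-Pytorch"

# Everything the Korean rec example needs, as repo-relative paths.
KOREAN_REC_FILES: List[str] = [
    "ptocr_v5_korean_mobile_rec.safetensors",
    "configs/rec/PP-OCRv5/multi_language/korean_PP-OCRv5_mobile_rec.yml",
    "dicts/ppocrv5_korean_dict.txt",
]


def download_subset(patterns: List[str], repo_id: str = REPO_ID) -> str:
    """Download only files matching `patterns`; return the local snapshot dir."""
    # Imported lazily so this module loads even without huggingface_hub.
    from huggingface_hub import snapshot_download
    # allow_patterns filters the snapshot; directory structure is preserved.
    return snapshot_download(repo_id=repo_id, allow_patterns=patterns)
```

Usage: `repo_dir = download_subset(KOREAN_REC_FILES)`.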

Inference with PaddleOCR2Pytorch (recommended)

# 1. Clone the inference code
git clone https://github.com/frotms/PaddleOCR2Pytorch
cd PaddleOCR2Pytorch
pip install torch safetensors pyyaml shapely pyclipper opencv-python pillow scikit-image

# 2. Run with the weights + yml from this repo (assumed to be downloaded to /path/to/hf_repo)
python tools/infer/predict_rec.py \
  --image_dir doc/imgs_words/korean/1.jpg \
  --rec_algorithm SVTR_LCNet \
  --rec_model_path /path/to/hf_repo/ptocr_v5_korean_mobile_rec.safetensors \
  --rec_yaml_path /path/to/hf_repo/configs/rec/PP-OCRv5/multi_language/korean_PP-OCRv5_mobile_rec.yml \
  --rec_image_shape "3,48,320" \
  --rec_char_dict_path /path/to/hf_repo/dicts/ppocrv5_korean_dict.txt \
  --use_gpu False

PaddleOCR2Pytorch's base_ocr_v20.py supports .safetensors natively: the format is detected from the file extension, and .pth files still load as before.
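The extension-based dispatch described above can be sketched as follows. This is not the code in base_ocr_v20.py, only a minimal illustration of the idea; `weight_format` and `load_state_dict` are hypothetical names, and torch / safetensors are imported only when a file is actually loaded:

```python
import os


def weight_format(path: str) -> str:
    """Classify a checkpoint by its file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext == ".safetensors":
        return "safetensors"
    if ext in (".pth", ".pt"):
        return "torch"
    raise ValueError(f"unsupported weight file: {path}")


def load_state_dict(path: str) -> dict:
    """Load a state dict onto the CPU, choosing the loader by extension."""
    if weight_format(path) == "safetensors":
        # safetensors stores raw tensors only: no pickle, no code execution.
        from safetensors.torch import load_file
        return load_file(path, device="cpu")
    import torch
    # torch.load unpickles the file; only use it on .pth files you trust.
    return torch.load(path, map_location="cpu")
```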

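Custom Python inference code

If you would rather not depend on the full PaddleOCR2Pytorch inference stack, below is the skeleton of a minimal rec inference snippet. It shows how to load the weights and run a forward pass, but you still need the network definitions from the PaddleOCR2Pytorch project (pytorchocr/modeling/).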

import os, sys, numpy as np, cv2, torch, yaml
from safetensors.torch import load_file

# The imports below require a clone of https://github.com/frotms/PaddleOCR2Pytorch
# with its root directory on PYTHONPATH
sys.path.insert(0, "/path/to/PaddleOCR2Pytorch")
from pytorchocr.modeling.architectures.base_model import BaseModel
from pytorchocr.postprocess import build_post_process

HF_REPO = "/path/to/hf_repo"   # path returned by snapshot_download
yml_path    = f"{HF_REPO}/configs/rec/PP-OCRv5/multi_language/korean_PP-OCRv5_mobile_rec.yml"
weight_path = f"{HF_REPO}/ptocr_v5_korean_mobile_rec.safetensors"

# 1. Read the config and the character set
with open(yml_path, encoding="utf-8") as f:
    cfg = yaml.safe_load(f)
dict_path = cfg["Global"]["character_dict_path"]      # './dicts/ppocrv5_korean_dict.txt'
# Resolve relative to the repo root (not the CWD); normpath drops the leading './'
dict_abs  = os.path.normpath(os.path.join(HF_REPO, dict_path))
with open(dict_abs, encoding="utf-8") as f:
    chars = [l.strip("\n\r") for l in f]
n_char = len(chars) + 2                               # +1 blank, +1 space (the space depends on use_space_char; True here)

# 2. Build the network and load the weights (safetensors executes no code and loads quickly via mmap)
cfg["Architecture"]["Head"]["out_channels_list"] = {
    "CTCLabelDecode": n_char,
    "SARLabelDecode": n_char + 2,
    "NRTRLabelDecode": n_char + 3,
}
net = BaseModel(cfg["Architecture"], out_channels=n_char)
net.load_state_dict(load_file(weight_path, device="cpu"))
net.eval()

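# 3. Read the image and preprocess: resize to height 48 keeping the aspect ratio
#    (this sketch caps the width at 320), normalize to [-1, 1], then pad to [3, 48, 320]
img = cv2.imread("input_word.jpg")                    # BGR, as PaddleOCR also uses
if img is None:
    raise FileNotFoundError("input_word.jpg")
h, w = img.shape[:2]
tw = min(int(np.ceil(48 * w / h)), 320)
img = cv2.resize(img, (tw, 48)).astype(np.float32)
img = (img.transpose(2, 0, 1) / 255.0 - 0.5) / 0.5
x = np.zeros((3, 48, 320), dtype=np.float32)          # pad with 0 *after* normalization, as PaddleOCR does
x[:, :, :tw] = img
x = torch.from_numpy(x).unsqueeze(0)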

# 4. Forward pass + CTC decoding
with torch.no_grad():
    logits = net(x)
post_op = build_post_process({"name": "CTCLabelDecode",
                              "character_dict_path": dict_abs,
                              "use_space_char": True})
result = post_op(logits.cpu().numpy())               # the decoder works on NumPy arrays
print("Recognition result:", result)                 # e.g. [('바탕으로', 0.9998)]
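`build_post_process` hides the decoding step. As a rough illustration of what greedy CTC decoding does with the per-step probabilities, here is a pure-Python sketch that, as far as we can tell, follows the conventions of PaddleOCR's `CTCLabelDecode`: index 0 is the blank, the dictionary characters follow, a space is appended last when `use_space_char` is true, and the confidence is the mean probability of the kept characters. `ctc_greedy_decode` is a hypothetical name, not part of either project:

```python
from typing import List, Sequence, Tuple


def ctc_greedy_decode(probs: Sequence[Sequence[float]],
                      dict_chars: List[str],
                      use_space_char: bool = True) -> Tuple[str, float]:
    """Decode one sequence of per-time-step class probabilities."""
    # Class 0 is the CTC blank; dictionary chars follow; optional space is last.
    charset = ["blank"] + list(dict_chars) + ([" "] if use_space_char else [])
    text, confs = [], []
    prev = None
    for step in probs:
        idx = max(range(len(step)), key=step.__getitem__)  # argmax at this step
        # Keep a label only if it is not blank and differs from the previous step,
        # so "a a blank a" decodes to "aa" while "a a a" decodes to "a".
        if idx != 0 and idx != prev:
            text.append(charset[idx])
            confs.append(step[idx])
        prev = idx
    conf = sum(confs) / len(confs) if confs else 0.0
    return "".join(text), conf
```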

Inference dependencies

torch >= 1.13
safetensors >= 0.4
numpy, pillow, opencv-python
pyyaml, shapely, pyclipper
scikit-image      # needed for det post-processing
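Several of these packages are imported under a different name than the one you `pip install`, which makes "module not found" errors easy to misread. A minimal sketch that reports which ones are missing without importing anything; `REQUIRED` and `missing_packages` are hypothetical names, and the version constraints above are not checked:

```python
import importlib.util
from typing import Dict, List, Optional

# pip package name -> top-level import name (they differ for several packages).
REQUIRED: Dict[str, str] = {
    "torch": "torch",
    "safetensors": "safetensors",
    "numpy": "numpy",
    "pillow": "PIL",
    "opencv-python": "cv2",
    "pyyaml": "yaml",
    "shapely": "shapely",
    "pyclipper": "pyclipper",
    "scikit-image": "skimage",
}


def missing_packages(required: Optional[Dict[str, str]] = None) -> List[str]:
    """Return the pip names whose modules cannot be found (nothing is imported)."""
    required = REQUIRED if required is None else required
    return [pip for pip, module in required.items()
            if importlib.util.find_spec(module) is None]
```

Usage: `print(missing_packages())` prints an empty list when everything is installed.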

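Conversion & verification

Source weights: official PaddlePaddle .pdparams from paddle-model-ecology.bj.bcebos.com/paddlex/official_pretrained_model/
Conversion tools: converter/ppocr_v5_det_converter.py / ppocr_v5_rec_converter.py in PaddleOCR2Pytorch
Verification: end-to-end inference was run on CPU under macOS on Apple Silicon (M-series); the multilingual recognition results matched the official PaddleOCR .pdparams outputs bit for bit (float32 values identical to 8 decimal places)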
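"Bit for bit" and "identical to 8 decimal places" are two different checks. If you want to compare your own outputs from both backends (for example, confidence scores), a minimal pure-Python sketch of each follows; the function names are hypothetical:

```python
import struct
from typing import Sequence


def float32_bits(x: float) -> int:
    """IEEE-754 single-precision bit pattern of x (after rounding to float32)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bit_exact(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a and b have equal length and identical float32 bit patterns."""
    return len(a) == len(b) and all(
        float32_bits(x) == float32_bits(y) for x, y in zip(a, b))


def equal_to_places(a: Sequence[float], b: Sequence[float], places: int = 8) -> bool:
    """Weaker check: the values agree when printed with `places` decimals."""
    return len(a) == len(b) and all(
        f"{x:.{places}f}" == f"{y:.{places}f}" for x, y in zip(a, b))
```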

Sample inference output (CPU, under 0.7 s per image):

| Sample | Recognized text | Confidence |
|---|---|---|
| Chinese `word_1.jpg` | 韩国小馆 | 0.99797755 |
| Korean `korean/1.jpg` | 바탕으로 | 0.99977183 |
| French `french/1.jpg` | de l'amendement, | 0.99656343 |
| Arabic `arabic/ar_1.jpg` | الكيصياوي | 0.68281130 |

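Legacy files (`legacy/`)

The PP-OCR v2 / v3 / v4 weights that previously lived in the repository root have been moved into the legacy/ directory to tidy up the layout. They still exist and work as before; just add the legacy/ prefix to the URL path: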

legacy/ch_ptocr_mobile_v2.0_cls_infer.pth
legacy/ch_ptocr_v4_det_infer.pth
legacy/ch_ptocr_v4_rec_infer.pth
legacy/en_ptocr_v3_det_infer.pth
legacy/en_ptocr_v4_rec_infer.pth
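Old links and scripts can be updated mechanically. A sketch assuming the repository's default `main` branch and the Hub's standard `/resolve/<revision>/<path>` download scheme; `legacy_path` and `resolve_url` are hypothetical helpers:

```python
REPO_ID = "JoyCN/PaddleOCR-Pytorch"


def legacy_path(name: str) -> str:
    """Map an old root-level legacy filename to its new repo-relative path."""
    return name if name.startswith("legacy/") else f"legacy/{name}"


def resolve_url(path: str, repo_id: str = REPO_ID, revision: str = "main") -> str:
    """Direct-download URL for a repo-relative file on the Hugging Face Hub."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{path}"
```

The same path also works as the `filename` argument of `hf_hub_download`, e.g. `hf_hub_download(repo_id=REPO_ID, filename=legacy_path("ch_ptocr_v4_det_infer.pth"))`.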

The 15 PP-OCRv5 safetensors weights remain in the repository root; their URLs are unchanged.

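License & acknowledgements

License: Apache License 2.0
Weight source: PaddleOCR by the PaddlePaddle team, Apache 2.0
Conversion tool: PaddleOCR2Pytorch, Apache 2.0

If this repository is useful to you, please also star the two original projects above.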

Citation

@misc{pp_ocrv5_pytorch_joycn_2025,
  title        = {PP-OCRv5 PyTorch Model Zoo},
  author       = {JoyCN},
  howpublished = {\url{https://huggingface.co/JoyCN/PaddleOCR-Pytorch}},
  year         = {2025}
}