Ken Tsui

kenhktsui

https://kenhktsui.github.io/

AI & ML interests

ML engineer, researcher. VLM, LLM benchmark. Opinions are my own.

Recent Activity

upvoted a paper about 1 month ago

A Very Big Video Reasoning Suite

liked a model 2 months ago

moonshotai/Kimi-K2.5

liked a dataset 3 months ago

VITRA-VLA/VITRA-1M

kenhktsui's collections (7)

Self Correction Bench

Benchmarking LLM capability of external and internal error correction

kenhktsui/scli5

Viewer • Updated Jul 6, 2025 • 286 • 53
kenhktsui/gsm8k_sc

Viewer • Updated Jul 6, 2025 • 1.31k • 54
kenhktsui/prm800k_sc

Viewer • Updated Jul 6, 2025 • 448 • 80
Self-Correction Bench: Revealing and Addressing the Self-Correction Blind Spot in LLMs

Paper • 2507.02778 • Published Jul 3, 2025 • 9

LongTalk

A Very Long Chain-of-Thought Dataset for Reasoning Model Post-Training

kenhktsui/longtalk-cot-v0.1

Viewer • Updated Dec 30, 2024 • 61.2k • 104 • 13
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged-gguf

8B • Updated Dec 30, 2024 • 17 • 1
kenhktsui/qwen2.5-7b-instruct-thinking-sft-merged

Text Generation • 8B • Updated Dec 30, 2024 • 2
kenhktsui/llama3.1-8b-instruct-thinking-sft-merged-gguf

8B • Updated Dec 30, 2024 • 52 • 1

CoT

kenhktsui/longtalk-cot-v0.1

Viewer • Updated Dec 30, 2024 • 61.2k • 104 • 13
open-thoughts/OpenThoughts-114k

Viewer • Updated Aug 31, 2025 • 228k • 169k • 821
ServiceNow-AI/R1-Distill-SFT

Viewer • Updated Feb 8, 2025 • 1.85M • 2.09k • 315
Tiiny/QWQ-LONGCOT-500K

Viewer • Updated Dec 26, 2024 • 286k • 277 • 124

VLM Data

HuggingFaceM4/the_cauldron

Viewer • Updated May 6, 2024 • 1.88M • 56.5k • 523
lmms-lab/LLaVA-OneVision-Data

Viewer • Updated May 24, 2025 • 3.94M • 27.2k • 233
HuggingFaceM4/Docmatix

Viewer • Updated Aug 26, 2024 • 2.55M • 8.8k • 300
zwq2018/embodied_reasoner

Preview • Updated Apr 21, 2025 • 692 • 21

FastText Model for Pretraining Data Curation

kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2

Text Classification • Updated Jun 26, 2025 • 215 • 28
kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 4 • 4
kenhktsui/code-natural-language-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 407 • 5
kenhktsui/math-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 14 • 2

textbook-quality-classifier

kenhktsui/fineweb-edu-fasttext-classifier

Text Classification • Updated Jul 3, 2025 • 4 • 4
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v2

Text Classification • Updated Jun 26, 2025 • 215 • 28
kenhktsui/llm-data-textbook-quality-classifier-v1

Text Classification • 0.3B • Updated May 25, 2024 • 15 • 10
kenhktsui/llm-data-textbook-quality-fasttext-classifier-v1

Text Classification • Updated May 25, 2024 • 13 • 4

nano-phi

Small Language Model Trained with Textbook Quality Data - How Far Can It Go?

kenhktsui/nano-phi-115M-v0.1

Text Generation • 0.1B • Updated Apr 6, 2024 • 105 • 4
kenhktsui/nano-phi-115M-control-v0.1

Text Generation • 0.1B • Updated Feb 4, 2024 • 7 • 1
kenhktsui/nano-phi-192M-v0.1

Text Generation • 0.2B • Updated May 8, 2024 • 1 • 1
