ViSoLex Resources
📦 ViSoNorm Toolkit — Vietnamese Text Normalization & Processing
ViSoNorm is a toolkit for Vietnamese text normalization and processing, built for NLP workflows and installable from PyPI. Its resources (datasets and models) are hosted on the Hugging Face Hub and GitHub Releases.
🚀 Key Features
1. 🔧 BasicNormalizer — Basic Text Normalization
- Case folding: convert the entire text to lowercase, uppercase, or capitalized form.
- Tone normalization: normalize Vietnamese tone marks.
- Basic preprocessing: remove extra whitespace and special characters, and format sentences.
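The steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not ViSoNorm's API: the function name `basic_normalize` and its `case` parameter are assumptions made for this example. Unicode NFC composition is used here as a stand-in for tone normalization; it only unifies how accented letters are encoded and does not move tone marks between old-style and new-style placement.

```python
import re
import unicodedata


def basic_normalize(text: str, case: str = "lower") -> str:
    """Hypothetical helper illustrating the listed steps (not ViSoNorm's API)."""
    # Compose base letters and combining marks into single code points (NFC),
    # so visually identical Vietnamese syllables compare equal.
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Apply the requested case folding; any other value leaves case untouched.
    if case == "lower":
        return text.lower()
    if case == "upper":
        return text.upper()
    if case == "capitalize":
        return text.capitalize()
    return text


print(basic_normalize("  Xin   CHÀO  "))  # prints "xin chào"
```

Running composition before case folding keeps the result independent of whether the input arrived in composed or decomposed form.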
2. 😀 EmojiHandler — Emoji Processing
- Detect emojis: find the emojis present in a text.
- Split emoji text: separate emojis from the surrounding words.
- Remove emojis: strip all emojis from a text.
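The three operations above can be approximated with a regular expression over a few Unicode blocks. This is a rough sketch, not the EmojiHandler implementation: the function names are hypothetical, and the character ranges cover common emojis only, so a production tool would need a fuller emoji table.

```python
import re

# Approximate emoji coverage (an assumption of this sketch): a few common
# Unicode blocks, plus the variation selector and zero-width joiner that
# appear inside emoji sequences.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U00002600-\U000027BF"  # miscellaneous symbols and dingbats
    "\U0001F1E6-\U0001F1FF"  # regional indicators (flags)
    "\uFE0F\u200D"
    "]+"
)


def detect_emojis(text: str) -> list:
    """Return each run of consecutive emojis found in the text."""
    return EMOJI_PATTERN.findall(text)


def split_emojis(text: str) -> list:
    """Tokenize on whitespace, with every emoji run as its own token."""
    spaced = EMOJI_PATTERN.sub(lambda m: f" {m.group(0)} ", text)
    return spaced.split()


def remove_emojis(text: str) -> str:
    """Drop all emoji runs and tidy the whitespace left behind."""
    return re.sub(r"\s+", " ", EMOJI_PATTERN.sub(" ", text)).strip()


sample = "Ngon quá😋😋 tuyệt vời👍"
print(detect_emojis(sample))
print(split_emojis(sample))
print(remove_emojis(sample))
```

Consecutive emojis are returned as a single run here; matching one code point at a time would instead split multi-part emojis such as flags.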
3. ✏️ Lexical Normalization — Social Media Text Normalization
- ViSoLexNormalizer: Normalize text using deep learning models hosted on Hugging Face.
- NswDetector: Detect non-standard words (NSW).
- detect_nsw(): Utility function to detect NSW.
- normalize_sentence(): Utility function to normalize sentences.
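To make the detect-then-normalize idea concrete, here is a toy dictionary lookup. It is deliberately not the model-based approach named above, and it does not reproduce the signatures of `detect_nsw()` or `normalize_sentence()`, which this page does not document. The lexicon entries and the `toy_` function names are assumptions for illustration only.

```python
# Hypothetical mini-lexicon mapping common Vietnamese social-media
# abbreviations to their standard forms (illustration only).
TOY_LEXICON = {
    "ko": "không",
    "dc": "được",
    "j": "gì",
}


def toy_detect_nsw(sentence: str) -> list:
    """Return the tokens that appear in the toy lexicon, in order."""
    return [tok for tok in sentence.split() if tok.lower() in TOY_LEXICON]


def toy_normalize_sentence(sentence: str) -> str:
    """Replace each known non-standard token with its standard form."""
    return " ".join(TOY_LEXICON.get(tok.lower(), tok) for tok in sentence.split())


text = "mình ko biết làm j"
print(toy_detect_nsw(text))          # prints ['ko', 'j']
print(toy_normalize_sentence(text))  # prints "mình không biết làm gì"
```

A lookup table cannot resolve abbreviations whose meaning depends on context, which is one reason a learned model is the more common choice for this task.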
4. 📊 Resource Management — Dataset Management
- list_datasets(): List available datasets.
- load_dataset(): Load a dataset from GitHub Releases.
- get_dataset_info(): View detailed dataset information.
5. 🧠 Task Models — Task Processing Models
- SpamReviewDetection — Spam detection.
- HateSpeechDetection — Hate speech detection.
- HateSpeechSpanDetection — Hate speech span detection.
- EmotionRecognition — Emotion recognition.
- AspectSentimentAnalysis — Aspect-based sentiment analysis.
📥 Installation
Install from PyPI (Recommended)
pip install visonorm
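After installing, one way to confirm that the package is visible to the current interpreter is to query its metadata with the standard library. The helper name `installed_version` is an assumption for this sketch; only `importlib.metadata` itself is a real API.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if visonorm is installed, otherwise None.
print(installed_version("visonorm"))
```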
Models (10 of 103 shown)
- visolex/emotion-sphobert: Text Classification, 0.1B parameters, 8 downloads
- visolex/textcnn-hsd: Text Classification, 99 downloads
- visolex/bilstm-hsd: Text Classification, 131 downloads
- visolex/sphobert-hsd: Text Classification, 68 downloads
- visolex/mbert-hsd: Text Classification, 66 downloads
- visolex/roberta-gru-hsd: Text Classification, 0.1B parameters, 49 downloads
- visolex/xlm-r-hsd: Text Classification, 0.6B parameters, 62 downloads
- visolex/vihate-t5-hsd: Text Classification, 66 downloads, 1 like
- visolex/visobert-hsd: Text Classification, 97.6M parameters, 84 downloads
- visolex/bartpho-hsd: Text Classification, 92 downloads
Datasets (10 of 12 shown)
- visolex/ViHSD: 33.4k rows, 52 downloads
- visolex/UIT-VSMEC: 6.93k rows, 23 downloads
- visolex/VLSP2018-ABSA-Restaurant: 4.75k rows, 5 downloads
- visolex/VLSP2018-ABSA-Hotel: 5.6k rows, 18 downloads
- visolex/BKEE: 19k rows, 11 downloads
- visolex/ViLexNorm: 10.5k rows, 22 downloads
- visolex/ViSFD: 11.1k rows, 29 downloads
- visolex/VITHSD: 10k rows, 5 downloads
- visolex/ViSpamReviews: 19.9k rows, 85 downloads
- visolex/VN-HSD: 40.5k rows, 5 downloads