📦 ViSoNorm Toolkit — Vietnamese Text Normalization & Processing

ViSoNorm is a toolkit for Vietnamese text normalization and processing, built for NLP workflows and installable from PyPI. Its resources (datasets and models) are hosted on Hugging Face Hub and GitHub Releases.


🚀 Key Features

1. 🔧 BasicNormalizer — Basic Text Normalization

  • Case folding: convert text to lowercase, uppercase, or capitalized form.
  • Tone normalization: normalize Vietnamese tone marks.
  • Basic preprocessing: remove extra whitespace and special characters, and clean up sentence formatting.
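
To make these steps concrete, here is a minimal standard-library sketch of the same kind of cleanup. It is not ViSoNorm's implementation or API: the function name `basic_normalize` and its `case` parameter are hypothetical, and Unicode NFC composition stands in for only one part of tone normalization (it does not change where a tone mark is placed within a syllable).

```python
import re
import unicodedata


def basic_normalize(text: str, case: str = "lower") -> str:
    """Toy normalizer (hypothetical helper, not part of ViSoNorm)."""
    # NFC merges a base letter and its combining marks into one code point,
    # so "a" followed by U+0300 becomes the single character "à".
    text = unicodedata.normalize("NFC", text)
    # Collapse runs of whitespace and trim both ends.
    text = re.sub(r"\s+", " ", text).strip()
    # Apply the requested case folding; any other value leaves case untouched.
    if case == "lower":
        text = text.lower()
    elif case == "upper":
        text = text.upper()
    elif case == "capitalize":
        text = text.capitalize()
    return text


print(basic_normalize("  Xin   CHÀO  "))
```

Whitespace is cleaned before case folding so that `capitalize` acts on the first real character rather than on a leading space.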

2. 😀 EmojiHandler — Emoji Processing

  • Detect emojis: find the emojis that appear in a text.
  • Split emoji text: separate emojis from sentences.
  • Remove emojis: remove all emojis.
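
The three operations above can be illustrated with a small regex-based sketch. This is an assumption-laden illustration, not the EmojiHandler API: the helper names are hypothetical, and the character ranges are deliberately simplified, so flags, skin-tone modifiers, and joined emoji sequences are not handled.

```python
import re

# Simplified emoji ranges (an assumption for illustration): pictographs and
# emoticons in U+1F300..U+1FAFF plus older symbols in U+2600..U+27BF.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")


def _squeeze(text: str) -> str:
    """Collapse repeated whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def find_emojis(text: str) -> list[str]:
    """Return every matched emoji character, in order of appearance."""
    return EMOJI_RE.findall(text)


def space_out_emojis(text: str) -> str:
    """Put spaces around each emoji so it becomes its own token."""
    return _squeeze(EMOJI_RE.sub(lambda m: f" {m.group(0)} ", text))


def strip_emojis(text: str) -> str:
    """Replace each emoji with a space, then tidy the whitespace."""
    return _squeeze(EMOJI_RE.sub(" ", text))


sample = "ngon quá\U0001F60B"
print(find_emojis(sample), space_out_emojis(sample), strip_emojis(sample))
```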

3. ✏️ Lexical Normalization — Social Media Text Normalization

  • ViSoLexNormalizer: Normalize text using deep learning models hosted on Hugging Face.
  • NswDetector: Detect non-standard words (NSW).
  • detect_nsw(): Utility function to detect NSW.
  • normalize_sentence(): Utility function to normalize sentences.
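
To show what detecting and normalizing non-standard words means in practice, here is a toy dictionary lookup. It is a stand-in technique, not the model-based approach ViSoNorm uses: `TOY_LEXICON`, `toy_detect_nsw`, and `toy_normalize_sentence` are hypothetical names, and the three lexicon entries are common social media abbreviations chosen only for illustration.

```python
# Tiny hand-written lexicon of abbreviations (illustrative assumption).
TOY_LEXICON = {
    "ko": "không",
    "dc": "được",
    "j": "gì",
}


def toy_detect_nsw(sentence: str) -> list[tuple[int, str]]:
    """Return (position, token) pairs for tokens found in the toy lexicon."""
    return [
        (i, tok)
        for i, tok in enumerate(sentence.split())
        if tok.lower() in TOY_LEXICON
    ]


def toy_normalize_sentence(sentence: str) -> str:
    """Replace each known non-standard token with its standard form."""
    return " ".join(TOY_LEXICON.get(tok.lower(), tok) for tok in sentence.split())


text = "mình ko hiểu j"
print(toy_detect_nsw(text))
print(toy_normalize_sentence(text))
```

A fixed lookup like this cannot resolve abbreviations whose meaning depends on context, which is one reason a learned model is used for the real task.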

4. 📊 Resource Management — Dataset Management

  • list_datasets() — List available datasets.
  • load_dataset() — Load dataset from GitHub Releases.
  • get_dataset_info() — View detailed dataset information.

5. 🧠 Task Models — Task Processing Models

  • SpamReviewDetection — Spam detection.
  • HateSpeechDetection — Hate speech detection.
  • HateSpeechSpanDetection — Hate speech span detection.
  • EmotionRecognition — Emotion recognition.
  • AspectSentimentAnalysis — Aspect-based sentiment analysis.

📥 Installation

Install from PyPI (Recommended)

pip install visonorm
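
After installing, one way to confirm that the package is visible to the current Python environment is to query its metadata with the standard library. This sketch assumes only the distribution name `visonorm` from the command above; the helper `installed_version` is a hypothetical name and uses no ViSoNorm API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string when the install succeeded, otherwise None.
print(installed_version("visonorm"))
```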