---
language: en
library_name: transformers
license: mit
tags:
- finetuning
- lora
- qlora
- unsloth
- gpu
- distributed
datasets:
- wikitext
pipeline_tag: text-generation
model-index:
- name: Humigence
  results:
  - task:
      type: text-generation
    dataset:
      name: WikiText-2
      type: wikitext
    metrics:
      - type: loss
        value: 1.50
---

# 🧠 Humigence CLI

**Your AI. Your pipeline. Zero code.**

A complete MLOps suite built for makers, teams, and enterprises. Humigence provides zero-config, GPU-aware fine-tuning with surgical precision and complete reproducibility.

## ✨ Key Features

- 🎯 **Interactive Wizard**: Step-by-step configuration with Basic/Advanced modes
- 🖥️ **Smart GPU Detection**: Automatic detection and selection of available GPUs
- 🚀 **Multi-GPU Training**: Distributed training with Unsloth + TorchRun
- 🧪 **Training Recipes**: QLoRA (4-bit), LoRA (FP16/BF16), Full Fine-tuning
- 📊 **Intelligent Batching**: Auto-fit batch size to available VRAM (see the sketch after this list)
- 🔄 **Complete Reproducibility**: Config snapshots and reproduce scripts
- 📈 **Built-in Evaluation**: Curated prompts and quality gates
- 📦 **Artifact Export**: Structured outputs with run summaries
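
Humigence's own auto-fit logic isn't reproduced here; the following is a minimal sketch of how a VRAM-based heuristic can work, assuming a rough per-sample memory cost (`per_sample_bytes` is a hypothetical input you would estimate for your model and sequence length):

```python
# Hypothetical sketch of VRAM-aware batch sizing, not Humigence's actual code.
import torch

def autofit_batch_size(per_sample_bytes: int, device: int = 0,
                       headroom: float = 0.8, max_batch: int = 64) -> int:
    """Pick the largest power-of-two batch that fits in free VRAM."""
    free_bytes, _total = torch.cuda.mem_get_info(device)
    budget = int(free_bytes * headroom)  # reserve headroom for optimizer state
    batch = max(1, budget // per_sample_bytes)
    while batch & (batch - 1):  # round down to a power of two
        batch &= batch - 1
    return min(batch, max_batch)
```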

## 🚀 Quick Start

### Prerequisites

- **GPU**: NVIDIA GPU with CUDA support (RTX 5090, RTX 4080, etc.)
- **RAM**: 8GB+ recommended
- **Storage**: 10GB+ for models and datasets
- **Python**: 3.8+ with PyTorch

### Installation

```bash
# Clone the repository
git clone https://github.com/your-username/humigence.git
cd humigence

# Install dependencies
pip install -r requirements.txt

# Set up Unsloth (required for training)
python3 training/unsloth/setup_humigence_unsloth.py

# Launch the interactive wizard
python3 cli/main.py
```

### Basic Usage

```bash
# Launch the interactive wizard
python3 cli/main.py

# The wizard will guide you through:
# 1. Model selection
# 2. Dataset configuration  
# 3. Training parameters
# 4. GPU selection (single or multi-GPU)
# 5. Launch training
```

## 🎯 Training Workflow

### 1. Interactive Setup

The Humigence wizard guides you through:

- **Setup Mode**: Basic (essential config) or Advanced (full control)
- **Hardware Detection**: Automatic GPU, CPU, and memory detection
- **Model Selection**: Choose from supported models or custom paths
- **Dataset Loading**: Auto-detection from `~/humigence_data/` or custom paths
- **Training Recipe**: QLoRA, LoRA, or Full Fine-tuning
- **GPU Selection**: Single-GPU auto-selection or multi-GPU prompting

### 2. GPU Selection

Humigence intelligently handles GPU selection:

- **Single GPU**: Automatically selects and uses the available GPU
- **Multiple GPUs**: Prompts you to choose:
  ```
  🔧 Training Mode:
  > Multi-GPU Training (all available GPUs)
    Single GPU Training (choose specific GPU)
  ```
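
Under the hood, this kind of enumeration can be done with PyTorch's CUDA APIs. A minimal sketch of the idea (illustrative, not Humigence's exact implementation):

```python
# Minimal GPU enumeration sketch (illustrative, not Humigence's exact code).
import torch

def detect_gpus():
    """Return (index, name, total VRAM in GiB) for each visible CUDA GPU."""
    if not torch.cuda.is_available():
        return []
    gpus = []
    for i in range(torch.cuda.device_count()):
        props = torch.cuda.get_device_properties(i)
        gpus.append((i, props.name, round(props.total_memory / 1024**3, 1)))
    return gpus

for index, name, vram_gb in detect_gpus():
    print(f"GPU {index}: {name} ({vram_gb} GiB)")
```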

### 3. Training Execution

```bash
🚀 Humigence Training Starting...
✅ Configuration Loaded: [all settings]
🖥️ GPU Detection: 2x RTX 5090 detected
🔧 Training Mode: Multi-GPU Training
📦 Loading model: Qwen/Qwen2.5-0.5B
✅ LoRA adapters applied
📚 Loading dataset: wikitext2 (10,000 samples)
🚀 Starting training with TorchRun...
✅ Training complete - adapters saved.
```

## 📊 Supported Models

- **Qwen/Qwen2.5-0.5B**: ~0.5B parameters (recommended for testing)
- **microsoft/phi-2**: 2.7B parameters
- **TinyLlama/TinyLlama-1.1B-Chat-v1.0**: 1.1B parameters
- **Custom Models**: Any HuggingFace model or local path
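
All of these load the same way through `transformers`; a standard snippet (nothing Humigence-specific, and `device_map="auto"` assumes the `accelerate` package is installed):

```python
# Standard transformers loading, independent of Humigence.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-0.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # keep the checkpoint's native dtype
    device_map="auto",    # requires the accelerate package
)
```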

## 🗂️ Dataset Support

- **JSONL Format**: Line-by-line JSON with instruction/output pairs
- **Auto-Detection**: Scans `~/humigence_data/` directory
- **Custom Paths**: Specify any local dataset file
- **Sample Datasets**: Includes demo datasets for testing

### Dataset Format

```jsonl
{"instruction": "What is machine learning?", "output": "Machine learning is a subset of artificial intelligence..."}
{"instruction": "Explain quantum computing", "output": "Quantum computing uses quantum mechanical phenomena..."}
```
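
A quick way to sanity-check a dataset before training is a small validator like the one below (an illustrative helper, not part of Humigence; the file name is an example):

```python
# Illustrative JSONL validator for instruction/output records.
import json
from pathlib import Path

def validate_jsonl(path):
    """Return the number of valid records; raise on the first bad line."""
    count = 0
    for lineno, line in enumerate(
            Path(path).expanduser().read_text().splitlines(), start=1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)  # raises ValueError on malformed JSON
        if not {"instruction", "output"} <= record.keys():
            raise ValueError(f"line {lineno}: missing instruction/output keys")
        count += 1
    return count

print(validate_jsonl("~/humigence_data/demo.jsonl"), "valid records")
```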

## 🖥️ Hardware Requirements

### Minimum Requirements
- **GPU**: NVIDIA GPU with 8GB+ VRAM
- **RAM**: 16GB+ system RAM
- **Storage**: 20GB+ free space

### Recommended Setup
- **GPU**: RTX 4080/4090/5090 or better
- **RAM**: 32GB+ system RAM
- **Storage**: 50GB+ free space

### Multi-GPU Support
- **Dual-GPU**: RTX 5090 + RTX 5090 (tested)
- **Memory**: 16GB+ VRAM per GPU recommended
- **Training**: Automatic TorchRun distribution

## 📁 Project Structure

```
humigence/
├── cli/
│   ├── main.py             # Main CLI entry point
│   ├── config_wizard.py    # Interactive configuration wizard
│   └── lora_wizard.py      # LoRA-specific wizard
├── training/
│   └── unsloth/            # Unsloth integration
│       ├── wizard.py       # Unsloth training wizard
│       └── train_lora_dual.py  # Multi-GPU training script
├── pipelines/
│   └── lora_trainer.py     # Training pipeline
├── utils/
│   ├── device.py           # Hardware detection
│   ├── dataset_loader.py   # Dataset utilities
│   └── validators.py       # Data validation
├── config/
│   └── default_config.json # Default configuration
└── runs/                   # Training outputs
    └── humigence/
        ├── config.snapshot.json
        ├── adapters/       # LoRA weights
        └── artifacts.zip   # Complete export
```

## 🔧 Configuration

### Basic Mode (Recommended)

Essential configuration with sensible defaults (see the sketch after this list):

- **Learning Rate**: 2e-4
- **Epochs**: 1
- **Gradient Accumulation**: 4
- **LoRA Rank**: 16
- **LoRA Alpha**: 32
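
As a rough translation, these defaults correspond to the following `peft`/`transformers` objects (a sketch of equivalent settings, not Humigence's internal code; the dropout value is an assumption, since Basic Mode does not expose it):

```python
# The Basic Mode defaults expressed as peft/transformers configuration.
from peft import LoraConfig
from transformers import TrainingArguments

lora_config = LoraConfig(
    r=16,                # LoRA rank
    lora_alpha=32,       # LoRA alpha
    lora_dropout=0.0,    # assumed; Basic Mode does not expose dropout
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="runs/humigence",
    learning_rate=2e-4,
    num_train_epochs=1,
    gradient_accumulation_steps=4,
)
```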

### Advanced Mode

Full control over all parameters:

- LoRA configuration (rank, alpha, dropout)
- Training hyperparameters
- Data processing options
- Evaluation settings

## 🚀 Training Modes

### Single-GPU Training

```bash
# Automatically selected when 1 GPU detected
🔧 Single GPU detected - using GPU 0: RTX 5090
🚀 Launching single-GPU training...
```

### Multi-GPU Training

```bash
# Prompts when multiple GPUs detected
🔧 2 GPUs detected - choose training mode
> Multi-GPU Training (all available GPUs)
  Single GPU Training (choose specific GPU)
```
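
Conceptually, the multi-GPU path boils down to launching the training script under TorchRun with one process per GPU; something like the sketch below (an assumed invocation, and the exact flags may differ; check the generated `reproduce.sh` for the real command):

```python
# Conceptual multi-GPU launch via TorchRun (assumed invocation).
import subprocess
import torch

nproc = torch.cuda.device_count()
subprocess.run(
    ["torchrun", f"--nproc_per_node={nproc}",
     "training/unsloth/train_lora_dual.py",
     "--config", "config/default_config.json"],
    check=True,  # raise if training exits non-zero
)
```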

## 📈 Evaluation & Monitoring

### Built-in Evaluation

- **Curated Prompts**: 5 diverse evaluation questions
- **Model Inference**: Generation with temperature sampling
- **Quality Gates**: Loss thresholds and evaluation metrics
- **Status Tracking**: ACCEPTED.txt or REJECTED.txt files (see the gate sketch below)
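
A minimal sketch of a loss-threshold gate that writes those status files (illustrative only; the shipped gate and its threshold may differ):

```python
# Minimal loss-threshold quality gate writing ACCEPTED.txt / REJECTED.txt
# markers (illustrative, not Humigence's shipped logic).
from pathlib import Path

def apply_quality_gate(run_dir, final_loss, max_loss=2.0):
    """Write the status marker and return True if the run passes."""
    run = Path(run_dir)
    passed = final_loss <= max_loss
    marker = run / ("ACCEPTED.txt" if passed else "REJECTED.txt")
    marker.write_text(f"final_loss={final_loss:.4f} threshold={max_loss}\n")
    return passed

apply_quality_gate("runs/humigence", final_loss=1.50)
```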

### Run Monitoring

```bash
# View training progress
tail -f runs/humigence/training.log

# Check evaluation results
cat runs/humigence/eval_results.jsonl

# View run summary
cat runs/humigence/run_summary.json
```

## 🔄 Reproducibility

Every training run generates:

- **Config Snapshot**: Complete configuration in JSON
- **Reproduce Script**: One-click rerun capability
- **Artifact Archive**: Complete export of all outputs
- **Run Summary**: Structured metadata for tracking

```bash
# Rerun any training
./runs/humigence/reproduce.sh

# Or use the config directly
python3 training/unsloth/train_lora_dual.py --config runs/humigence/config.snapshot.json
```
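
Emitting the snapshot and reproduce script is conceptually simple; a sketch of the idea (Humigence generates these automatically, and the file names follow the layout shown earlier):

```python
# Sketch of emitting a config snapshot plus a reproduce script.
import json
import stat
from pathlib import Path

def write_repro_artifacts(run_dir, config):
    run = Path(run_dir)
    run.mkdir(parents=True, exist_ok=True)
    snapshot = run / "config.snapshot.json"
    snapshot.write_text(json.dumps(config, indent=2))
    script = run / "reproduce.sh"
    script.write_text(
        "#!/usr/bin/env bash\n"
        f"python3 training/unsloth/train_lora_dual.py --config {snapshot}\n"
    )
    script.chmod(script.stat().st_mode | stat.S_IEXEC)  # make executable
```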

## 🛠️ Development

### Dependencies

Core dependencies are version-constrained for stability:

```txt
transformers>=4.41.0,<5.0.0
torch>=2.1.0
unsloth @ git+https://github.com/unslothai/unsloth.git
rich>=13.0.0
inquirer>=3.1.0
```

### Local Development

```bash
# Install in development mode
pip install -e .

# Run tests
python3 -m pytest tests/

# Run specific test
python3 test_gpu_selection.py
```

## 🤝 Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for details.

### Quick Contribution Guide

1. Fork the repository
2. Create a feature branch: `git checkout -b feature/amazing-feature`
3. Make your changes
4. Add tests if applicable
5. Commit your changes: `git commit -m 'Add amazing feature'`
6. Push to the branch: `git push origin feature/amazing-feature`
7. Open a Pull Request

## 📄 License

This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.

## 🙏 Acknowledgments

- [Unsloth](https://github.com/unslothai/unsloth) for fast LoRA training
- [HuggingFace](https://huggingface.co/) for the transformers and PEFT libraries
- [Microsoft](https://github.com/microsoft) for the original LoRA research and reference implementation
- The open-source ML community

## 🆚 Comparison with Other Tools

| Feature | Humigence CLI | Other Tools |
|---------|---------------|-------------|
| **Setup** | Interactive wizard | Manual config |
| **GPU Detection** | Automatic | Manual |
| **Multi-GPU** | Built-in TorchRun | Complex setup |
| **Reproducibility** | Complete snapshots | Partial |
| **Evaluation** | Built-in prompts | External tools |
| **Artifacts** | Structured export | Manual collection |

## 🐛 Troubleshooting

### Common Issues

**GPU not detected:**
```bash
# Check CUDA installation
python3 -c "import torch; print(torch.cuda.is_available())"

# Check GPU visibility
nvidia-smi
```

**Out of memory:**
```bash
# Check how much VRAM is actually in use
nvidia-smi --query-gpu=memory.used,memory.total --format=csv

# Then reduce the batch size in your config,
# or switch the recipe to QLoRA (4-bit) for lower memory use
```

**Training fails:**
```bash
# Check logs
cat runs/humigence/training.log

# Verify dataset format
head -5 ~/humigence_data/your_dataset.jsonl
```

### Getting Help

- **Issues**: [GitHub Issues](https://github.com/your-username/humigence/issues)
- **Discussions**: [GitHub Discussions](https://github.com/your-username/humigence/discussions)
- **Documentation**: [Wiki](https://github.com/your-username/humigence/wiki)

## 🗺️ Roadmap

### Current Features ✅
- Interactive configuration wizard
- Single and multi-GPU training
- QLoRA and LoRA support
- Built-in evaluation
- Complete reproducibility

### Coming Soon 🚧
- RAG implementation
- EnterpriseGPT integration
- Batch inference
- Context length optimization
- Web UI interface
- Model serving

### Future Features 🔮
- Distributed training across nodes
- Advanced evaluation metrics
- Model compression
- Deployment automation

---

**Built with ❤️ for the AI community**

*Humigence - Your AI. Your pipeline. Zero code.*

## 📊 Stats

![GitHub stars](https://img.shields.io/github/stars/your-username/humigence?style=social)
![GitHub forks](https://img.shields.io/github/forks/your-username/humigence?style=social)
![GitHub issues](https://img.shields.io/github/issues/your-username/humigence)
![GitHub license](https://img.shields.io/github/license/your-username/humigence)
![Python version](https://img.shields.io/badge/python-3.8%2B-blue)
![PyTorch](https://img.shields.io/badge/PyTorch-2.1%2B-red)
![CUDA](https://img.shields.io/badge/CUDA-11.8%2B-green)