---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- tinyllama
- json
- intent-detection
- qlora
- gptq
---

# TinyLlama-JSON-Intent (GPTQ 4-bit)

This model is a fine-tuned version of [`TinyLlama/TinyLlama-1.1B-Chat-v1.0`](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) trained for e-commerce intent detection. Given a product catalog and a user's request, it outputs a JSON object containing the user's intent as `action` (`add` or `remove`), the `product` name, and the `quantity`.

This version of the model is **quantized to 4-bit using GPTQ**, which reduces memory usage and can speed up inference.
The QLoRA adapter was merged into the final GPTQ model, so no separate adapter loading is required.

- **Adapter Version:** [jtlicardo/tinyllama-ecommerce-intent-adapter](https://huggingface.co/jtlicardo/tinyllama-ecommerce-intent-adapter)

## Model Description

The base model, TinyLlama-Chat, was fine-tuned using the QLoRA method on a synthetic dataset of 100 examples. The training objective was to teach the model to ignore conversational pleasantries and strictly output a JSON object that can be directly parsed by a backend system for managing a shopping cart.
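
As a rough illustration of that downstream parsing step, the sketch below validates a model response and applies it to an in-memory cart. The field names `action`, `product`, and `quantity` follow the example output shown on this card; the `parse_intent` and `apply_intent` helpers and the dictionary-based cart are hypothetical and not part of this repository.

```python
import json

VALID_ACTIONS = {"add", "remove"}


def parse_intent(raw: str) -> dict:
    """Parse and validate the model's JSON response (hypothetical helper)."""
    intent = json.loads(raw)  # raises json.JSONDecodeError on malformed output
    if not isinstance(intent, dict):
        raise ValueError("expected a JSON object")
    if intent.get("action") not in VALID_ACTIONS:
        raise ValueError(f"unexpected action: {intent.get('action')!r}")
    if not isinstance(intent.get("product"), str):
        raise ValueError("product must be a string")
    quantity = intent.get("quantity")
    # bool is a subclass of int, so exclude it explicitly
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        raise ValueError("quantity must be a positive integer")
    return intent


def apply_intent(cart: dict, intent: dict) -> dict:
    """Return a new cart (product name -> count) with the intent applied."""
    updated = dict(cart)
    current = updated.get(intent["product"], 0)
    if intent["action"] == "add":
        updated[intent["product"]] = current + intent["quantity"]
    else:
        remaining = current - intent["quantity"]
        if remaining > 0:
            updated[intent["product"]] = remaining
        else:
            # Removing at least as many as are present empties the entry
            updated.pop(intent["product"], None)
    return updated


cart = {"Headphones": 5}
intent = parse_intent('{"action": "remove", "product": "Headphones", "quantity": 4}')
print(apply_intent(cart, intent))
```

Validating before touching the cart is a design choice for this sketch: a small model can occasionally emit malformed or unexpected output, and rejecting it early keeps the cart state consistent.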

## Intended Use & Limitations

This model is designed for a specific task: parsing user requests in an e-commerce context. It should not be used as a general-purpose chatbot.

- **Primary Use:** Backend service for intent detection from user text.
- **Out-of-Scope:** General conversation, answering questions, or any task not related to adding/removing items from a list.

## How to Use

The model expects a prompt in the TinyLlama-Chat format: a `<|user|>` turn that contains the `Catalog` and the `User` request, followed by an `<|assistant|>` tag.

**Important:** Install `optimum` and `auto-gptq` alongside `transformers` to run this 4-bit GPTQ model.

```bash
pip install -q optimum auto-gptq transformers
```

Here's how to run inference in Python:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Model repository on the Hugging Face Hub
model_id = "jtlicardo/tinyllama-ecommerce-intent-gptq"

# Load the tokenizer and the 4-bit quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16,  # Recommended for inference
)

# --- Define the prompt ---
catalog = """Catalog:
Shampoo (400ml bottle)
Hand Soap (250ml dispenser)
Peanut Butter (340g jar)
Headphones
Green Tea (25 tea bags)"""

user_query = "Could you please take off 4 pairs of headphons from my cart?"

# --- Format the prompt using the model's chat template ---
# The model was trained to see this structure.
prompt = f"<|user|>\n{catalog}\n\nUser:\n{user_query}\n<|assistant|>\n"

# --- Generate the output ---
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
outputs = pipe(
    prompt,
    max_new_tokens=50,       # Max length of the JSON output
    do_sample=False,         # Use deterministic output
    temperature=None,        # Not needed for do_sample=False
    top_p=None,              # Not needed for do_sample=False
    return_full_text=False   # Only return the generated part
)

# The output will be a clean JSON string
generated_json = outputs[0]['generated_text'].strip()
print(generated_json)
# Expected output:
# {"action": "remove", "product": "Headphones", "quantity": 4}
```
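
If prompts are built in more than one place, the string layout from the example above can be wrapped in a small function. This is a minimal sketch: `build_prompt` is a hypothetical helper name, and it assumes the catalog is available as a list of product strings.

```python
def build_prompt(catalog_items, user_query):
    """Assemble a prompt in the same layout as the inference example above."""
    # One product per line under the "Catalog:" header
    catalog = "Catalog:\n" + "\n".join(catalog_items)
    return f"<|user|>\n{catalog}\n\nUser:\n{user_query}\n<|assistant|>\n"


prompt = build_prompt(
    ["Shampoo (400ml bottle)", "Headphones"],
    "Please add 2 headphones to my cart.",
)
print(prompt)
```

The returned string can be passed to the `pipe(...)` call in place of the hand-written `prompt`.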

## Training Procedure

This model was fine-tuned using the `trl` library's `SFTTrainer`.

- **Method:** QLoRA (4-bit quantization with LoRA adapters)
- **Dataset:** A custom JSONL file with 100 `prompt`/`completion` pairs.
- **Configuration:** `completion_only_loss=True` was used to ensure the model only learned to generate the assistant's JSON response.
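
The training file itself is not reproduced on this card. As a sketch of what one JSONL line could look like, the example below assumes each record is a JSON object with `prompt` and `completion` keys and that the prompt uses the same layout as at inference time; the sample catalog, request, and completion are invented for illustration.

```python
import json

# Hypothetical record; the layout is assumed, not copied from the real dataset.
record = {
    "prompt": (
        "<|user|>\nCatalog:\nShampoo (400ml bottle)\nHeadphones\n\n"
        "User:\nPlease add 2 headphones to my cart.\n<|assistant|>\n"
    ),
    "completion": json.dumps(
        {"action": "add", "product": "Headphones", "quantity": 2}
    ),
}

# JSONL stores one JSON object per line; json.dumps escapes the
# newlines inside the prompt, so the record stays on a single line.
line = json.dumps(record)
restored = json.loads(line)
print(restored["completion"])
```

With `completion_only_loss=True`, only the tokens of the `completion` field contribute to the loss, so the prompt text serves purely as context.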