---
license: apache-2.0
language:
- en
pipeline_tag: text-generation
library_name: transformers
tags:
- tinyllama
- json
- intent-detection
- qlora
- gptq
---

# TinyLlama-JSON-Intent (GPTQ 4-bit)

This is a fine-tuned version of [`TinyLlama/TinyLlama-1.1B-Chat-v1.0`](https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0) that has been specifically trained to act as an e-commerce intent detection model. Given a catalog of products and a user's request, it outputs a structured JSON object representing the user's intent (`add` or `remove`), the `product` name, and the `quantity`.

This version of the model is **quantized to 4-bit using GPTQ**, making it highly efficient for inference in terms of memory usage and speed. The QLoRA adapter was merged into the final GPTQ model, so no separate adapter loading is required.

- **Adapter Version:** [jtlicardo/tinyllama-ecommerce-intent-adapter](https://huggingface.co/jtlicardo/tinyllama-ecommerce-intent-adapter)

## Model Description

The base model, TinyLlama-Chat, was fine-tuned using the QLoRA method on a synthetic dataset of 100 examples. The training objective was to teach the model to ignore conversational pleasantries and strictly output a JSON object that can be directly parsed by a backend system for managing a shopping cart.

## Intended Use & Limitations

This model is designed for a specific task: parsing user requests in an e-commerce context. It should not be used as a general-purpose chatbot.

- **Primary Use:** Backend service for intent detection from user text.
- **Out-of-Scope:** General conversation, answering questions, or any task not related to adding/removing items from a list.

## How to Use

The model expects a prompt formatted in a specific way, following the TinyLlama-Chat template. You must provide the `Catalog` and the `User` request.

**Important:** You need to install `optimum` and `auto-gptq` to run this 4-bit GPTQ model.

```bash
pip install -q optimum auto-gptq transformers
```

Here's how to run inference in Python:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

# Model repository on the Hugging Face Hub
model_id = "jtlicardo/tinyllama-ecommerce-intent-gptq"

# Load the tokenizer and the 4-bit quantized model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    torch_dtype=torch.float16  # Recommended for inference
)

# --- Define the prompt ---
catalog = """Catalog:
Shampoo (400ml bottle)
Hand Soap (250ml dispenser)
Peanut Butter (340g jar)
Headphones
Green Tea (25 tea bags)"""

user_query = "Could you please take off 4 pairs of headphons from my cart?"

# --- Format the prompt using the model's chat template ---
# The model was trained to see this structure.
prompt = f"<|user|>\n{catalog}\n\nUser:\n{user_query}\n<|assistant|>\n"

# --- Generate the output ---
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer)
outputs = pipe(
    prompt,
    max_new_tokens=50,       # Max length of the JSON output
    do_sample=False,         # Use deterministic output
    temperature=None,        # Not needed for do_sample=False
    top_p=None,              # Not needed for do_sample=False
    return_full_text=False   # Only return the generated part
)

# The output will be a clean JSON string
generated_json = outputs[0]['generated_text'].strip()
print(generated_json)

# Expected output:
# {"action": "remove", "product": "Headphones", "quantity": 4}
```

## Training Procedure

This model was fine-tuned using the `trl` library's `SFTTrainer`.

- **Method:** QLoRA (4-bit quantization with LoRA adapters)
- **Dataset:** A custom JSONL file with 100 `prompt`/`completion` pairs.
- **Configuration:** `completion_only_loss=True` was used to ensure the model only learned to generate the assistant's JSON response.
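
Since the JSON output shown under "How to Use" is meant to be parsed directly by a backend, it may be worth validating it before it reaches a shopping cart. The following is a minimal sketch and not part of this repository: `parse_catalog` and `validate_intent` are hypothetical helpers, and the check assumes the model returns either the full catalog line or the product name without its parenthesised size (as in the `"Headphones"` example above).

```python
import json

ALLOWED_ACTIONS = {"add", "remove"}


def parse_catalog(catalog_text):
    """Collect acceptable product names from a catalog block.

    Assumption: one product per line after a "Catalog:" header. Both the
    full line and the name without a parenthesised size are accepted.
    """
    names = set()
    for line in catalog_text.splitlines():
        line = line.strip()
        if not line or line.rstrip(":").lower() == "catalog":
            continue  # skip blank lines and the header
        names.add(line)
        names.add(line.split("(")[0].strip())
    return names


def validate_intent(raw_output, product_names):
    """Parse the model output and raise ValueError if it is unusable."""
    try:
        intent = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError(f"output is not valid JSON: {exc}") from exc

    if not isinstance(intent, dict):
        raise ValueError("output is not a JSON object")
    if intent.get("action") not in ALLOWED_ACTIONS:
        raise ValueError(f"unexpected action: {intent.get('action')!r}")
    if intent.get("product") not in product_names:
        raise ValueError(f"product not in catalog: {intent.get('product')!r}")

    quantity = intent.get("quantity")
    # bool is a subclass of int in Python, so exclude it explicitly
    if isinstance(quantity, bool) or not isinstance(quantity, int) or quantity < 1:
        raise ValueError(f"quantity must be a positive integer: {quantity!r}")
    return intent


if __name__ == "__main__":
    catalog = "Catalog:\nShampoo (400ml bottle)\nHeadphones"
    raw = '{"action": "remove", "product": "Headphones", "quantity": 4}'
    print(validate_intent(raw, parse_catalog(catalog)))
```

In a real service, `raw` would be the `generated_json` string from the inference example, and a `ValueError` could be treated as a signal to retry or to ask the user to rephrase.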