---
license: gemma
language:
- en
pipeline_tag: text-generation
tags:
- litert
- litert-lm
- gemma
- agent
- tool-calling
- function-calling
- multimodal
- on-device
library_name: litert-lm
---

# Agent Gemma 3n E2B - Tool Calling Edition

A specialized version of **Gemma 3n E2B** optimized for **on-device tool/function calling** with LiteRT-LM. While Google's standard LiteRT-LM models focus on general text generation, this model is designed specifically for agentic workflows with advanced tool calling capabilities.

## Why This Model?

Google's official LiteRT-LM releases provide excellent on-device inference but don't include built-in tool calling support. This model bridges that gap with:

- ✅ **Native tool/function calling** via Jinja templates
- ✅ **Multimodal support** (text, vision, audio)
- ✅ **On-device optimized** - no cloud API required
- ✅ **INT4 quantized** - efficient memory usage
- ✅ **Production ready** - tested and validated

It is well suited to building AI agents that need to interact with external tools, APIs, or functions while running entirely on-device.

## Model Details

- **Base Model**: Gemma 3n E2B
- **Format**: LiteRT-LM v1.4.0
- **Quantization**: INT4
- **Size**: ~3.2 GB
- **Tokenizer**: SentencePiece
- **Capabilities**:
  - Advanced tool/function calling
  - Multi-turn conversations with tool interactions
  - Vision processing (images)
  - Audio processing
  - Streaming responses

## Tool Calling Example

The model uses a Jinja template that supports OpenAI-style function calling:

```python
from litert_lm import Engine, Conversation

# Load the model
engine = Engine.create("gemma-3n-E2B-it-agent-fixed.litertlm", backend="cpu")
conversation = Conversation.create(engine)

# Define tools the model can use
tools = [
    {
        "name": "get_weather",
        "description": "Get current weather for a location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City name"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
            },
            "required": ["location"]
        }
    },
    {
        "name": "search_web",
        "description": "Search the internet for information",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "Search query"}
            },
            "required": ["query"]
        }
    }
]

# Have a conversation with tool calling
message = {
    "role": "user",
    "content": "What's the weather in San Francisco and latest news about AI?"
}

response = conversation.send_message(message, tools=tools)
print(response)
```

### Example Output

The model generates structured tool calls:

```
call:get_weather{location:San Francisco,unit:celsius}
call:search_web{query:latest AI news}
```

You then execute the functions and send the results back:

```python
# Execute tools (your implementation)
weather = get_weather("San Francisco", "celsius")
news = search_web("latest AI news")

# Send tool responses back
tool_response = {
    "role": "tool",
    "content": [
        {
            "name": "get_weather",
            "response": {"temperature": 18, "condition": "partly cloudy"}
        },
        {
            "name": "search_web",
            "response": {"results": ["OpenAI releases GPT-5...", "..."]}
        }
    ]
}

final_response = conversation.send_message(tool_response)
print(final_response)
# "The weather in San Francisco is 18°C and partly cloudy.
#  In AI news, OpenAI has released GPT-5..."
```
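The `call:` lines above arrive as plain text, so your application needs to parse them before dispatching. A minimal parsing sketch, assuming the exact comma/colon serialization shown in the example output (the template may emit variations, so treat the pattern as a starting point rather than a spec):

```python
import re

# Matches lines like: call:get_weather{location:San Francisco,unit:celsius}
# NOTE: assumes the simple comma/colon serialization shown above; adjust the
# pattern if your template emits JSON-style arguments instead.
CALL_PATTERN = re.compile(r"call:(\w+)\{([^}]*)\}")

def parse_tool_calls(text: str) -> list[dict]:
    """Extract structured (name, arguments) pairs from raw model output."""
    calls = []
    for name, raw_args in CALL_PATTERN.findall(text):
        args = {}
        for pair in raw_args.split(","):
            if ":" in pair:
                key, value = pair.split(":", 1)
                args[key.strip()] = value.strip()
        calls.append({"name": name, "arguments": args})
    return calls

# Example:
# parse_tool_calls("call:get_weather{location:San Francisco,unit:celsius}")
# -> [{"name": "get_weather",
#      "arguments": {"location": "San Francisco", "unit": "celsius"}}]
```

The same parser is reused by the chat-loop helpers sketched later in this card.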
## Advanced Features

### Multi-Modal Tool Calling

Combine vision, audio, and tool calling:

```python
message = {
    "role": "user",
    "content": [
        {"type": "image", "data": image_bytes},
        {"type": "text", "text": "What's in this image? Search for more info about it."}
    ]
}

response = conversation.send_message(message, tools=[search_tool])
# Model can see the image AND call search functions
```

### Streaming Tool Calls

Get tool calls as they're generated:

```python
def on_token(token):
    if "call:" in token:  # detect the start of a tool call in the stream
        print("Tool being called...")
    print(token, end="", flush=True)

conversation.send_message_async(message, tools=tools, callback=on_token)
```

### Nested Tool Execution

The model can chain tool calls:

```python
# User: "Book me a flight to Tokyo and reserve a hotel"
# Model: calls check_flights() → calls book_hotel() → confirms both
```

## Performance

Benchmarked on CPU (no GPU acceleration):

- **Prefill Speed**: 21.20 tokens/sec
- **Decode Speed**: 11.44 tokens/sec
- **Time to First Token**: ~1.6 s
- **Cold Start**: ~4.7 s
- **Tool Call Latency**: ~100-200 ms additional

GPU acceleration provides a 3-5x speedup on supported hardware.

## Installation & Usage

### Requirements

1. **LiteRT-LM Runtime** - build from source:

   ```bash
   git clone https://github.com/google-ai-edge/LiteRT.git
   cd LiteRT/LiteRT-LM
   bazel build -c opt //runtime/engine:litert_lm_main
   ```

2. **Supported Platforms**: Linux (clang), macOS, Android

### Quick Start

```bash
# Download model
wget https://huggingface.co/kontextdev/agent-gemma/resolve/main/gemma-3n-E2B-it-agent-fixed.litertlm

# Run with a simple prompt
./bazel-bin/runtime/engine/litert_lm_main \
  --model_path=gemma-3n-E2B-it-agent-fixed.litertlm \
  --backend=cpu \
  --input_prompt="Hello, I need help with some tasks"

# Run with GPU (if available)
./bazel-bin/runtime/engine/litert_lm_main \
  --model_path=gemma-3n-E2B-it-agent-fixed.litertlm \
  --backend=gpu \
  --input_prompt="What can you help me with?"
```

### Python API (Recommended)

```python
from litert_lm import Engine, Conversation, SessionConfig

# Initialize
engine = Engine.create("gemma-3n-E2B-it-agent-fixed.litertlm", backend="gpu")

# Configure session
config = SessionConfig(
    max_tokens=2048,
    temperature=0.7,
    top_p=0.9
)

# Start conversation
conversation = Conversation.create(engine, config)

# Define your tools (schemas as shown above)
tools = [...]

# Chat with tool calling; has_tool_calls, extract_calls, and execute_tools
# are your own helpers - see the sketch below
while True:
    user_input = input("You: ")

    response = conversation.send_message(
        {"role": "user", "content": user_input},
        tools=tools
    )

    # Handle tool calls if present
    if has_tool_calls(response):
        results = execute_tools(extract_calls(response))
        response = conversation.send_message({
            "role": "tool",
            "content": results
        })

    print(f"Agent: {response['content']}")
```
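The loop above leaves `has_tool_calls`, `extract_calls`, and `execute_tools` to your application; none of them ship with the runtime. A minimal sketch of these hypothetical helpers, reusing `parse_tool_calls` from the earlier sketch and a plain dict of callables (the stub implementations are placeholders for your real tools, and the response-unwrapping logic is an assumption about the binding's return type):

```python
# Map tool names to your real implementations; stubs shown for illustration
TOOL_REGISTRY = {
    "get_weather": lambda location, unit="celsius": {"temperature": 18,
                                                     "condition": "partly cloudy"},
    "search_web": lambda query: {"results": ["..."]},
}

def _text_of(response) -> str:
    """Model output as plain text (bindings may return a dict or a string)."""
    return response["content"] if isinstance(response, dict) else response

def has_tool_calls(response) -> bool:
    """True if the output contains at least one call:... line."""
    return bool(parse_tool_calls(_text_of(response)))

def extract_calls(response) -> list[dict]:
    """Structured (name, arguments) pairs from the raw model output."""
    return parse_tool_calls(_text_of(response))

def execute_tools(calls: list[dict]) -> list[dict]:
    """Run each requested tool and package results in the tool-role format."""
    results = []
    for call in calls:
        func = TOOL_REGISTRY.get(call["name"])
        if func is None:
            results.append({"name": call["name"],
                            "response": {"error": "unknown tool"}})
            continue
        try:
            results.append({"name": call["name"],
                            "response": func(**call["arguments"])})
        except Exception as exc:  # surface failures to the model
            results.append({"name": call["name"],
                            "response": {"error": str(exc)}})
    return results
```

Returning structured `{"error": ...}` responses instead of raising keeps the conversation alive and lets the model recover from a bad call.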
## Tool Call Format

The model uses this format for tool interactions:

**Function Declaration** (system/developer role):

```
developer
{
  "name": "function_name",
  "description": "What it does",
  "parameters": {...}
}
```

**Function Call** (assistant):

```
call:function_name{arg1:value1,arg2:value2}
```

**Function Response** (tool role):

```
response:function_name{result:value}
```

## Use Cases

### Personal AI Assistant
- Calendar management
- Email sending
- Web searching
- File operations

### IoT & Smart Home
- Device control
- Sensor monitoring
- Automation workflows
- Voice commands

### Development Tools
- Code generation with API calls
- Database queries
- Deployment automation
- Testing & debugging

### Business Applications
- CRM integration
- Data analysis
- Report generation
- Customer support

## Model Architecture

Built on Gemma 3n E2B with 9 optimized components:

```
Section 0: LlmMetadata (Agent Jinja template)
Section 1: SentencePiece Tokenizer
Section 2: TFLite Embedder
Section 3: TFLite Per-Layer Embedder
Section 4: TFLite Audio Encoder (HW accelerated)
Section 5: TFLite End-of-Audio Detector
Section 6: TFLite Vision Adapter
Section 7: TFLite Vision Encoder
Section 8: TFLite Prefill/Decode (INT4)
```

All components are optimized for on-device inference with hardware acceleration support.

## Comparison

| Feature | Standard Gemma LiteRT-LM | This Model |
|---------|--------------------------|------------|
| Text Generation | ✅ | ✅ |
| Tool Calling | ❌ | ✅ |
| Multimodal | ✅ | ✅ |
| Streaming | ✅ | ✅ |
| On-Device | ✅ | ✅ |
| Jinja Templates | Basic | Advanced agent template |
| INT4 Quantization | ✅ | ✅ |

## Limitations

- **Tool Execution**: The model generates tool calls but doesn't execute them - you implement the actual functions
- **Context Window**: Limited to 4096 tokens (configurable)
- **Streaming Tool Calls**: Partial tool calls may need buffering
- **Hardware Requirements**: Minimum 4 GB RAM recommended
- **GPU Acceleration**: On systems without a supported GPU, inference falls back to CPU

## Tips for Best Results

1. **Clear Tool Descriptions**: Provide detailed function descriptions
2. **Schema Validation**: Validate tool call arguments before execution (see the sketch after this list)
3. **Error Handling**: Handle malformed tool calls gracefully
4. **Context Management**: Keep conversation history concise
5. **Temperature**: Use 0.7-0.9 for creative tasks, 0.3-0.5 for precise tool calls
6. **Batching**: Process multiple tool calls in parallel when possible
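A minimal sketch combining tips 2 and 3: checking each parsed call against the tool's declared JSON-Schema-style `parameters` before execution. It hand-rolls only the checks the schemas above actually use (required keys, declared properties, enum values); for richer schemas, a dedicated validator such as the `jsonschema` package is the safer choice:

```python
def validate_call(call: dict, tools: list[dict]) -> str | None:
    """Return an error message if the call is malformed, else None."""
    schema = next((t for t in tools if t["name"] == call["name"]), None)
    if schema is None:
        return f"unknown tool: {call['name']}"

    params = schema["parameters"]
    args = call["arguments"]

    # Required keys must be present
    for key in params.get("required", []):
        if key not in args:
            return f"missing required argument: {key}"

    # Arguments must be declared, and enum values must match
    for key, value in args.items():
        prop = params["properties"].get(key)
        if prop is None:
            return f"unexpected argument: {key}"
        if "enum" in prop and value not in prop["enum"]:
            return f"invalid value for {key}: {value}"
    return None

# Reject bad calls before execution and report the problem to the model:
# error = validate_call(call, tools)
# if error:
#     results.append({"name": call["name"], "response": {"error": error}})
```

Feeding the error string back as the tool response gives the model a chance to correct the call instead of failing silently.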
## License

This model inherits the [Gemma license](https://ai.google.dev/gemma/terms) from the base model.

## Citation

```bibtex
@misc{agent-gemma-litertlm,
  title={Agent Gemma 3n E2B - Tool Calling Edition},
  author={kontextdev},
  year={2025},
  publisher={HuggingFace},
  howpublished={\url{https://huggingface.co/kontextdev/agent-gemma}}
}
```

## Links

- [LiteRT-LM GitHub](https://github.com/google-ai-edge/LiteRT/tree/main/LiteRT-LM)
- [Gemma Model Family](https://ai.google.dev/gemma)
- [LiteRT Documentation](https://ai.google.dev/edge/litert)
- [Tool Calling Guide](https://ai.google.dev/gemma/docs/function-calling)

## Support

For issues or questions:

- Open an issue on [GitHub](https://github.com/google-ai-edge/LiteRT/issues)
- Check the [LiteRT-LM docs](https://ai.google.dev/edge/litert/inference)
- Community forum: [Google AI Edge](https://discuss.ai.google.dev/)

---

Built with ❤️ for the on-device AI community