YOLOv5s TFLite (Qualcomm Edge AI Deployment Package)

This repository provides multiple YOLOv5s TensorFlow Lite models optimized for edge deployment scenarios, especially for:

  • Qualcomm QCS8550 platforms
  • Android APK inference pipelines
  • Embedded Linux (Wayland / GStreamer)
  • Multi-stream video analytics systems

These models are suitable for real-time object detection with hardware acceleration using Qualcomm QNN (Hexagon NPU), GPU delegate, or CPU fallback.


Available Model Variants


File                      Precision   Input Size   Recommended Usage

yolov5s_fp32_320.tflite   FP32        320×320      Fast testing / compatibility mode
yolov5s_fp32_640.tflite   FP32        640×640      Higher-accuracy baseline
yolov5s_int8_320.tflite   INT8        320×320      Maximum-performance edge inference
yolov5s_int8_640.tflite   INT8        640×640      Balanced accuracy and performance


Choosing FP32 vs INT8

Use FP32 models when:

  • debugging pipelines
  • validating inference correctness
  • running CPU-only environments
  • comparing baseline accuracy

Use INT8 models when:

  • deploying on Qualcomm NPU
  • running real-time multi-stream pipelines
  • optimizing latency and power efficiency
  • building production edge AI systems

INT8 is typically recommended for Qualcomm Hexagon acceleration workflows.
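When feeding an INT8 model, float pixel values must first be mapped into the quantized domain using the input tensor's scale and zero-point, and quantized outputs mapped back. A minimal NumPy sketch of that mapping follows; the scale and zero-point values below are illustrative only, the real values come from the interpreter's `input_details[0]["quantization"]` tuple at runtime:

```python
import numpy as np

def quantize_input(x, scale, zero_point):
    """Map float inputs into the INT8 domain a quantized model expects."""
    q = np.round(x / scale + zero_point)
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_output(q, scale, zero_point):
    """Map INT8 model outputs back to float scores/coordinates."""
    return (q.astype(np.float32) - zero_point) * scale

# Illustrative parameters; query the real ones from the model at runtime.
scale, zero_point = 1.0 / 255.0, -128
pixels = np.array([0.0, 1.0], dtype=np.float32)
q = quantize_input(pixels, scale, zero_point)
print(q)  # [-128  127]
```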


Deployment Pipeline Example

Typical real-time inference architecture:

Camera / RTSP / Video File
→ Resize (320×320 or 640×640)
β†’ Normalize
β†’ TFLite Interpreter
β†’ NMS Post-processing
β†’ Overlay Rendering (TextureView / SurfaceView)
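The resize and normalize stages above can be sketched in NumPy as follows. This is a plain nearest-neighbor resize for illustration; production pipelines typically use a letterbox resize (aspect-ratio preserving) via OpenCV or GStreamer, which is not shown here:

```python
import numpy as np

def preprocess(frame, size=320):
    """Resize (nearest-neighbor sketch) and normalize an HxWx3 uint8 frame
    into the NHWC float32 layout the FP32 models expect."""
    h, w, _ = frame.shape
    ys = np.arange(size) * h // size          # source row index per output row
    xs = np.arange(size) * w // size          # source col index per output col
    resized = frame[ys][:, xs]                # nearest-neighbor resize
    norm = resized.astype(np.float32) / 255.0 # scale pixels into [0, 1]
    return norm[np.newaxis, ...]              # add batch dim -> (1, size, size, 3)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # dummy 720p frame
tensor = preprocess(frame, size=320)
print(tensor.shape)  # (1, 320, 320, 3)
```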

Supported delegate priority:

QNN (Hexagon NPU) β†’ GPU β†’ CPU
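The priority chain above amounts to "try each delegate in order, fall back to CPU if none load". A hypothetical backend-selection sketch (the loader callables are placeholders; with TensorFlow Lite you would pass the surviving delegates to `tf.lite.Interpreter(..., experimental_delegates=[...])`):

```python
def select_backend(delegate_loaders):
    """Try each delegate loader in priority order (QNN -> GPU -> CPU).

    Each loader returns a delegate object or raises if the hardware or
    driver is unavailable; the first success wins, and an empty delegate
    list means plain CPU execution.
    """
    for name, loader in delegate_loaders:
        try:
            return name, [loader()]
        except Exception:
            continue  # delegate unavailable on this device; try the next one
    return "CPU", []

# Illustrative loaders: QNN fails on this host, GPU succeeds.
loaders = [
    ("QNN", lambda: (_ for _ in ()).throw(OSError("no Hexagon NPU"))),
    ("GPU", lambda: object()),
]
backend, delegates = select_backend(loaders)
print(backend)  # GPU
```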


Deployment Environments

These models are designed for integration into:

  • Android edge AI applications
  • Qualcomm QCS8550 platforms
  • Embedded Linux (Wayland display pipelines)
  • Multi-stream video inference schedulers
  • TextureView / SurfaceView overlay rendering architectures


Example Python Inference

import numpy as np
import tensorflow as tf

# Load the model and allocate input/output tensors.
interpreter = tf.lite.Interpreter(model_path="yolov5s_int8_640.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run one inference on a dummy frame shaped to the model input.
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
detections = interpreter.get_tensor(output_details[0]["index"])
print(detections.shape)
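The raw output tensor still contains overlapping candidate boxes, so a non-maximum suppression pass is needed before rendering. A minimal greedy NMS sketch over `[x1, y1, x2, y2]` boxes (model-specific output decoding is assumed to have happened already):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression; returns kept box indices."""
    order = scores.argsort()[::-1]  # process highest-confidence boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between box i and all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping box i too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms(boxes, scores))  # [0, 2]
```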

Example Android Delegate Configuration

Interpreter.Options options = new Interpreter.Options();

// qnnDelegate and gpuDelegate must be constructed beforehand using the
// Qualcomm QNN delegate SDK and the TFLite GPU delegate, respectively.
options.addDelegate(qnnDelegate);   // Preferred: Hexagon NPU
options.addDelegate(gpuDelegate);   // Fallback: GPU

Recommended runtime order:

NPU β†’ GPU β†’ CPU


Repository Contents

yolov5s_fp32_320.tflite
yolov5s_fp32_640.tflite
yolov5s_int8_320.tflite
yolov5s_int8_640.tflite
yolov5s.labels
README.md
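Class IDs emitted by the model can be mapped to names using yolov5s.labels. A small loading sketch, assuming the file holds one class name per line (the exact file format is an assumption, not verified here); the temporary file merely stands in for yolov5s.labels:

```python
import os
import tempfile

def load_labels(path):
    """Read class names from a labels file, one name per line (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Usage sketch with a temporary file standing in for yolov5s.labels.
with tempfile.NamedTemporaryFile("w", suffix=".labels", delete=False) as f:
    f.write("person\nbicycle\ncar\n")
labels = load_labels(f.name)
os.remove(f.name)
print(labels)  # ['person', 'bicycle', 'car']
```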


Intended Use Cases

  • Real-time edge object detection
  • Android APK AI inference pipelines
  • Qualcomm NPU acceleration demonstrations
  • Multi-stream surveillance analytics
  • Embedded AI deployment validation workflows


Author

Prepared by

Andrew Chiao
Edge AI Engineer
SYSGRATION Ltd. -- AMD Team / RD2


Notes

This repository focuses on deployment-ready TensorFlow Lite object detection models designed for embedded inference scenarios rather than notebook-only experimentation workflows.
