YOLOv5s TFLite (Qualcomm Edge AI Deployment Package)

This repository provides multiple YOLOv5s TensorFlow Lite models optimized for edge deployment scenarios, especially for:

  • Qualcomm QCS8550 platforms
  • Android APK inference pipelines
  • Embedded Linux (Wayland / GStreamer)
  • Multi-stream video analytics systems

These models are suitable for real-time object detection with hardware acceleration using Qualcomm QNN (Hexagon NPU), GPU delegate, or CPU fallback.


Available Model Variants


File                      Precision   Input Size   Recommended Usage

yolov5s_fp32_320.tflite   FP32        320×320      Fast testing / compatibility mode
yolov5s_fp32_640.tflite   FP32        640×640      Higher-accuracy baseline
yolov5s_int8_320.tflite   INT8        320×320      Maximum-performance edge inference
yolov5s_int8_640.tflite   INT8        640×640      Balanced accuracy and performance


Choosing FP32 vs INT8

Use FP32 models when:

  • debugging pipelines
  • validating inference correctness
  • running CPU-only environments
  • comparing baseline accuracy

Use INT8 models when:

  • deploying on Qualcomm NPU
  • running real-time multi-stream pipelines
  • optimizing latency and power efficiency
  • building production edge AI systems

INT8 is typically recommended for Qualcomm Hexagon acceleration workflows.
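When feeding an INT8 model, float pixel values must first be mapped into the quantized domain using the input tensor's scale and zero-point, and quantized outputs mapped back. A minimal NumPy sketch of that mapping follows; the scale and zero-point values below are illustrative only, the real values come from the interpreter's `input_details[0]["quantization"]` tuple at runtime:

```python
import numpy as np

def quantize_input(x, scale, zero_point):
    """Map float inputs into the INT8 domain a quantized model expects."""
    q = np.round(x / scale + zero_point)
    return np.clip(q, -128, 127).astype(np.int8)

def dequantize_output(q, scale, zero_point):
    """Map INT8 model outputs back to float scores/coordinates."""
    return (q.astype(np.float32) - zero_point) * scale

# Illustrative parameters; query the real ones from the model at runtime.
scale, zero_point = 1.0 / 255.0, -128
pixels = np.array([0.0, 1.0], dtype=np.float32)
q = quantize_input(pixels, scale, zero_point)
print(q)  # [-128  127]
```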


Deployment Pipeline Example

Typical real-time inference architecture:

Camera / RTSP / Video File
→ Resize (320×320 or 640×640)
β†’ Normalize
β†’ TFLite Interpreter
β†’ NMS Post-processing
β†’ Overlay Rendering (TextureView / SurfaceView)
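The resize and normalize stages above can be sketched in NumPy as follows. This is a plain nearest-neighbor resize for illustration; production pipelines typically use a letterbox resize (aspect-ratio preserving) via OpenCV or GStreamer, which is not shown here:

```python
import numpy as np

def preprocess(frame, size=320):
    """Resize (nearest-neighbor sketch) and normalize an HxWx3 uint8 frame
    into the NHWC float32 layout the FP32 models expect."""
    h, w, _ = frame.shape
    ys = np.arange(size) * h // size          # source row index per output row
    xs = np.arange(size) * w // size          # source col index per output col
    resized = frame[ys][:, xs]                # nearest-neighbor resize
    norm = resized.astype(np.float32) / 255.0 # scale pixels into [0, 1]
    return norm[np.newaxis, ...]              # add batch dim -> (1, size, size, 3)

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # dummy 720p frame
tensor = preprocess(frame, size=320)
print(tensor.shape)  # (1, 320, 320, 3)
```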

Supported delegate priority:

QNN (Hexagon NPU) β†’ GPU β†’ CPU
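The priority chain above amounts to "try each delegate in order, fall back to CPU if none load". A hypothetical backend-selection sketch (the loader callables are placeholders; with TensorFlow Lite you would pass the surviving delegates to `tf.lite.Interpreter(..., experimental_delegates=[...])`):

```python
def select_backend(delegate_loaders):
    """Try each delegate loader in priority order (QNN -> GPU -> CPU).

    Each loader returns a delegate object or raises if the hardware or
    driver is unavailable; the first success wins, and an empty delegate
    list means plain CPU execution.
    """
    for name, loader in delegate_loaders:
        try:
            return name, [loader()]
        except Exception:
            continue  # delegate unavailable on this device; try the next one
    return "CPU", []

# Illustrative loaders: QNN fails on this host, GPU succeeds.
loaders = [
    ("QNN", lambda: (_ for _ in ()).throw(OSError("no Hexagon NPU"))),
    ("GPU", lambda: object()),
]
backend, delegates = select_backend(loaders)
print(backend)  # GPU
```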


Deployment Environments

These models are designed for integration into:

  • Android edge AI applications
  • Qualcomm QCS8550 platforms
  • Embedded Linux (Wayland display pipelines)
  • Multi-stream video inference schedulers
  • TextureView / SurfaceView overlay rendering architectures


Example Python Inference

import numpy as np
import tensorflow as tf

# Load the model and allocate input/output tensors.
interpreter = tf.lite.Interpreter(model_path="yolov5s_int8_640.tflite")
interpreter.allocate_tensors()

input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Run one inference on a dummy frame shaped to the model input.
frame = np.zeros(input_details[0]["shape"], dtype=input_details[0]["dtype"])
interpreter.set_tensor(input_details[0]["index"], frame)
interpreter.invoke()
detections = interpreter.get_tensor(output_details[0]["index"])
print(detections.shape)
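The raw output tensor still contains overlapping candidate boxes, so a non-maximum suppression pass is needed before rendering. A minimal greedy NMS sketch over `[x1, y1, x2, y2]` boxes (model-specific output decoding is assumed to have happened already):

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.45):
    """Greedy non-maximum suppression; returns kept box indices."""
    order = scores.argsort()[::-1]  # process highest-confidence boxes first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # Intersection rectangle between box i and all remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]  # drop boxes overlapping box i too much
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 10, 10], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
print(nms(boxes, scores))  # [0, 2]
```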

Example Android Delegate Configuration

Interpreter.Options options = new Interpreter.Options();

// qnnDelegate and gpuDelegate must be constructed beforehand using the
// Qualcomm QNN delegate SDK and the TFLite GPU delegate, respectively.
options.addDelegate(qnnDelegate);   // Preferred: Hexagon NPU
options.addDelegate(gpuDelegate);   // Fallback: GPU

Recommended runtime order:

NPU β†’ GPU β†’ CPU


Repository Contents

yolov5s_fp32_320.tflite
yolov5s_fp32_640.tflite
yolov5s_int8_320.tflite
yolov5s_int8_640.tflite
yolov5s.labels
README.md
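Class IDs emitted by the model can be mapped to names using yolov5s.labels. A small loading sketch, assuming the file holds one class name per line (the exact file format is an assumption, not verified here); the temporary file merely stands in for yolov5s.labels:

```python
import os
import tempfile

def load_labels(path):
    """Read class names from a labels file, one name per line (assumed format)."""
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]

# Usage sketch with a temporary file standing in for yolov5s.labels.
with tempfile.NamedTemporaryFile("w", suffix=".labels", delete=False) as f:
    f.write("person\nbicycle\ncar\n")
labels = load_labels(f.name)
os.remove(f.name)
print(labels)  # ['person', 'bicycle', 'car']
```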


Intended Use Cases

  • Real-time edge object detection
  • Android APK AI inference pipelines
  • Qualcomm NPU acceleration demonstrations
  • Multi-stream surveillance analytics
  • Embedded AI deployment validation workflows


Author

Prepared by

Andrew Chiao
Edge AI Engineer
SYSGRATION Ltd. -- AMD Team / RD2


Notes

This repository focuses on deployment-ready TensorFlow Lite object detection models designed for embedded inference scenarios rather than notebook-only experimentation workflows.
