---
license: mit
tags:
- vision
- robotics
- drone-navigation
- vit
---

# autonomous_drone_nav_vision

## Overview
A Vision Transformer (ViT) fine-tuned for tactical aerial navigation. This model enables Small Unmanned Aircraft Systems (sUAS) to classify environmental obstacles and identify safe landing zones in real time from downward- and forward-facing RGB cameras.
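
A minimal inference sketch with Hugging Face Transformers is shown below. The repo id and image path are placeholders, since the model's actual Hub path and label map are not given in this card.

```python
# Single-frame inference sketch. MODEL_ID is a placeholder, not the real repo path.
from PIL import Image
import torch
from transformers import AutoImageProcessor, ViTForImageClassification

MODEL_ID = "your-org/autonomous_drone_nav_vision"  # placeholder Hub id

processor = AutoImageProcessor.from_pretrained(MODEL_ID)
model = ViTForImageClassification.from_pretrained(MODEL_ID)
model.eval()

# One downward- or forward-facing RGB frame from the sUAS camera.
image = Image.open("frame.jpg").convert("RGB")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

predicted = logits.argmax(-1).item()
print(model.config.id2label[predicted])
```

The processor resizes frames to the model's $224 \times 224$ input and normalizes them, so raw camera frames can be passed in directly.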

## Model Architecture
The model uses a **Vision Transformer (ViT-Base)** backbone:
- **Patch Extraction**: Images are divided into fixed-size $16 \times 16$ patches (a worked example of the patch arithmetic follows this list).
- **Position Embeddings**: Learnable spatial embeddings are added to the patch sequence to retain structural context.
- **Attention Mechanism**: Global self-attention lets the model correlate distant visual cues, such as horizon lines and ground markers.
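
As a concrete check of the patch arithmetic above: at the fixed $224 \times 224$ input, a $16 \times 16$ patch grid yields $(224/16)^2 = 196$ tokens per frame. The snippet below reproduces that tokenization in plain PyTorch; it is illustrative only, not the model's internal code.

```python
# Illustrative ViT patch extraction in plain torch (not the model's own implementation).
import torch

H = W = 224  # fixed input resolution (see Limitations)
P = 16       # patch size
num_patches = (H // P) * (W // P)  # 14 * 14 = 196 tokens

frame = torch.randn(1, 3, H, W)                  # one RGB frame, NCHW
patches = frame.unfold(2, P, P).unfold(3, P, P)  # -> (1, 3, 14, 14, 16, 16)
patches = patches.permute(0, 2, 3, 1, 4, 5).reshape(1, num_patches, 3 * P * P)
print(patches.shape)  # torch.Size([1, 196, 768]); each token is a flattened patch
```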

## Intended Use
- **Obstacle Avoidance**: Integration into flight control stacks for autonomous "sense and avoid" maneuvers (a hedged decision-gate sketch follows this list).
- **Precision Landing**: Identifying designated markers or flat terrain for autonomous recovery.
- **Search and Rescue**: Preliminary screening of aerial footage to identify human-made structures or anomalies.
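
Below is a hedged sketch of how the classifier's output could gate a "sense and avoid" decision. The `"obstacle"` label and the 0.85 threshold are assumptions for illustration; the real label map and any safety-critical thresholds must come from the deployed flight stack, not this card.

```python
# Hypothetical decision gate between the classifier and a flight controller.
# The "obstacle" label and 0.85 threshold are assumed, not shipped with the model.
import torch

AVOID_CONFIDENCE = 0.85  # assumed minimum confidence before commanding a maneuver

def should_avoid(logits: torch.Tensor, id2label: dict) -> bool:
    """Return True only for a high-confidence obstacle call on a single frame."""
    probs = logits.softmax(dim=-1).squeeze(0)
    confidence, index = probs.max(dim=-1)
    return id2label[int(index)] == "obstacle" and float(confidence) >= AVOID_CONFIDENCE
```

Given the motion-blur caveat under Limitations, single-frame calls should be debounced over several consecutive frames before any maneuver is commanded.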

## Limitations
- **Low Light**: Performance degrades significantly in nighttime or heavy fog conditions without thermal input.
- **Motion Blur**: Rapid yaw movements at high speeds may cause misclassification due to pixel streaking.
- **Scale Invariance**: Small objects at extreme altitudes may be missed due to the fixed $224 \times 224$ input resolution.