Shoriful025 committed · Commit c688e8e · verified · Parent(s): 367ab27

Create README.md

Files changed (1): README.md (+31, −0)
---
license: mit
tags:
- vision
- robotics
- drone-navigation
- vit
---

# autonomous_drone_nav_vision

## Overview
A Vision Transformer (ViT) fine-tuned for tactical aerial navigation. This model enables Small Unmanned Aircraft Systems (sUAS) to classify environmental obstacles and identify safe landing zones in real time using downward- and forward-facing RGB cameras.

## Model Architecture
The model utilizes a **Vision Transformer (ViT-Base)** backbone:
- **Patch Extraction**: Images are divided into fixed-size $16 \times 16$ patches.
- **Position Embeddings**: Learnable spatial embeddings are added to the patch sequence to retain structural context.
- **Attention Mechanism**: Global self-attention allows the model to correlate distant visual cues, such as horizon lines and ground markers.

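The patch-extraction step above can be sketched in plain Python. This is an illustrative sketch, not this model's actual preprocessing code: the image is a nested `H x W x C` list, and the sizes follow the ViT-Base defaults stated in this card ($16 \times 16$ patches, $224 \times 224$ input).

```python
def extract_patches(image, patch_size=16):
    """Split an H x W x C image (nested lists) into non-overlapping
    patch_size x patch_size patches, each flattened to a 1-D vector,
    as done before the linear projection in a ViT."""
    h, w = len(image), len(image[0])
    patches = []
    for top in range(0, h, patch_size):
        for left in range(0, w, patch_size):
            flat = []
            for r in range(top, top + patch_size):
                for c in range(left, left + patch_size):
                    flat.extend(image[r][c])  # append the C channel values
            patches.append(flat)
    return patches

# A 224 x 224 RGB frame yields (224 / 16)^2 = 196 patches, each of
# length 16 * 16 * 3 = 768; the transformer then prepends a [CLS]
# token, giving a sequence length of 197.
frame = [[[0, 0, 0] for _ in range(224)] for _ in range(224)]
patches = extract_patches(frame)
```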
## Intended Use
- **Obstacle Avoidance**: Integrated into flight control stacks for autonomous "sense and avoid" maneuvers.
- **Precision Landing**: Identifying designated markers or flat terrain for autonomous recovery.
- **Search and Rescue**: Preliminary screening of aerial footage to identify human-made structures or anomalies.

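A "sense and avoid" integration might wrap the classifier output in a conservative decision rule. The label names, maneuver mapping, and confidence threshold below are hypothetical (this card does not specify the model's class list or a flight-stack API); the sketch only illustrates the pattern of falling back to a safe action on low confidence.

```python
# Hypothetical label set and maneuver mapping, for illustration only.
MANEUVERS = {
    "clear": "continue",
    "obstacle_ahead": "climb",
    "obstacle_below": "hold",
    "landing_zone": "descend",
}

def decide(label: str, confidence: float, threshold: float = 0.8) -> str:
    """Map a classifier output to a conservative maneuver: any
    low-confidence or unrecognized prediction falls back to 'hold'."""
    if confidence < threshold:
        return "hold"
    return MANEUVERS.get(label, "hold")
```

For example, `decide("obstacle_ahead", 0.93)` returns `"climb"`, while `decide("clear", 0.55)` returns `"hold"` because the confidence is below the threshold.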
## Limitations
- **Low Light**: Performance degrades significantly at night or in heavy fog without thermal input.
- **Motion Blur**: Rapid yaw movements at high speed may cause misclassification due to pixel streaking.
- **Scale Invariance**: Small objects at extreme altitudes may be missed due to the fixed $224 \times 224$ input resolution.
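A common workaround for the fixed-resolution limitation (not part of this model) is to tile a high-resolution frame into overlapping $224 \times 224$ crops and classify each crop separately. A minimal sketch of the crop geometry, with an assumed overlap of 32 px:

```python
def tile_offsets(frame_w, frame_h, crop=224, overlap=32):
    """Top-left (x, y) offsets of overlapping crop x crop tiles that
    cover the frame; the last tile in each row/column is clamped so it
    never runs past the frame edge."""
    step = crop - overlap
    xs = sorted({min(x, frame_w - crop) for x in range(0, frame_w, step)})
    ys = sorted({min(y, frame_h - crop) for y in range(0, frame_h, step)})
    return [(x, y) for y in ys for x in xs]
```

Each offset can then be fed through the classifier independently, trading inference latency for sensitivity to small, high-altitude objects.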