Limitations of Existing Approaches to Object Detection in Field Environments
1. Military Aircraft Detection
Models Evaluated: DINO, Faster R-CNN, YOLOv8
Dataset: Military Aircraft Dataset
- 20,000+ images (Kaggle format)
- 2,907 training, 330 validation, 605 test (COCO format on Hugging Face)
- 20 aircraft classes (A1–A20), with significant class imbalance
- Real-world surveillance and defense applications
Performance:
YOLOv8 (Fine-Tuned):
- Precision: 0.2097
- Recall: 0.1568
- F1: 0.1794
- mAP: 0.2037
Faster R-CNN (Fine-Tuned):
- Precision: 0.0148
- Recall: 0.0531
- F1: 0.0231
- mAP: 0.0034
DINO (Zero-Shot):
- Poor detection confidence, even with prompt filtering
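The F1 values reported above follow from precision and recall as their harmonic mean. The sketch below recomputes them as a consistency check; the function name f1_score and the zero-division convention (F1 = 0 when precision and recall are both 0) are assumptions made here for illustration, not details taken from the evaluation scripts.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) from the Military Aircraft results above.
reported = {
    "YOLOv8 (Fine-Tuned)": (0.2097, 0.1568, 0.1794),
    "Faster R-CNN (Fine-Tuned)": (0.0148, 0.0531, 0.0231),
}

for model, (precision, recall, reported_f1) in reported.items():
    recomputed = f1_score(precision, recall)
    print(f"{model}: recomputed F1 = {recomputed:.4f}, reported F1 = {reported_f1}")
```

Both recomputed values agree with the reported ones to four decimal places. The same function returns 0.0 for a model whose precision and recall are both 0.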
2. Satellite Object Detection with FAIR1M
Models Evaluated: DINO, Faster R-CNN, YOLOv8
Dataset: FAIR1M (COCO format and Kaggle)
- 15,000+ satellite images
- 1,732 image subset used for benchmarking
- Includes cars, ships, aircraft, trucks (strong class imbalance)
Performance:
YOLOv8 (Fine-Tuned):
- Precision: 0
- Recall: 0
- F1: 0
- mAP: 0
Faster R-CNN (Fine-Tuned):
- Precision: 0.0002
- Recall: 0.0001
- F1: 0.0002
- mAP: 0.0025
DINO (Zero-Shot):
- mAP: 0.0072; produces general bounding boxes but lacks fine-grained discrimination
3. Marine Debris Detection with TrashCan
Models Evaluated: DINO, Faster R-CNN, YOLOv8
Dataset: TrashCan
- 7,212 total underwater images
- Classes: plastic, metal, fabric, marine life, ROV
- Severe class imbalance (e.g., ROV: 2,679 images vs. rubber: <10)
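To make the scale of this imbalance concrete, the sketch below computes each class's ratio to the largest class and how many extra images it would need to reach a chosen target. Only the ROV count (2,679) comes from the dataset description; the rubber count of 9 is an assumed stand-in for "<10", and imbalance_report and target_fraction are hypothetical names introduced for illustration.

```python
def imbalance_report(counts, target_fraction=0.25):
    """Per-class ratio to the largest class, and images missing to reach the target.

    target_fraction is an illustrative choice: the target size is that
    fraction of the largest class, rounded down.
    """
    largest = max(counts.values())
    target = int(largest * target_fraction)
    report = {}
    for name, n in counts.items():
        report[name] = {
            "ratio_to_largest": largest / n if n else float("inf"),
            "deficit": max(0, target - n),
        }
    return report


# ROV count is from the dataset description; 9 is an assumed value for "<10".
counts = {"ROV": 2679, "rubber": 9}
report = imbalance_report(counts)

for name, stats in report.items():
    print(f"{name}: {stats['ratio_to_largest']:.1f}x smaller than largest, "
          f"needs {stats['deficit']} more images")
```

Under these assumptions the rubber class is roughly 300 times smaller than ROV, which helps explain why a detector trained on the raw distribution has almost no signal for the rare classes.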
Performance:
YOLOv8 (Fine-Tuned):
- Precision: 0
- Recall: 0
- F1: 0
- mAP: 0
Faster R-CNN (Fine-Tuned):
- Precision: 0.0064
- Recall: 0.0229
- F1: 0.01
- mAP: 0.0032
DINO (Zero-Shot):
- mAP: 0.0031; multiple false positives with broad prompts
Common Limitations Identified
- Zero-shot models perform poorly on specialized datasets without aligned classes.
- DINO suffers from attention dilution with multiple prompts.
- Faster R-CNN struggles with fine-grained object detection in dense scenes.
- YOLOv8, while the best performer on Military Aircraft (mAP 0.2037), scored 0 on every metric on the satellite and marine datasets without further hyperparameter tuning.
- No preprocessing was applied; raw, noisy images were used to test robustness.
Potential Improvements
- Domain-Aligned Fine-Tuning: Tailor class labels and augmentations to the target environment (marine, aerial, satellite).
- Synthetic Data Generation: Simulate rare classes using controlled data to balance class distribution.
- Active Learning: Use uncertainty sampling to prioritize annotations that yield the highest performance gains.
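As an illustration of the uncertainty-sampling idea, the sketch below ranks unlabeled images by a least-confidence score and returns the ones to annotate first. It is a minimal sketch under stated assumptions, not the pipeline used in these experiments: the pool format (image ID mapped to a list of detection confidences), the function names, and the choice to treat images with no detections as maximally uncertain are all hypothetical.

```python
def image_uncertainty(confidences):
    """Least-confidence score: 1 minus the mean detection confidence.

    Images with no detections are treated as maximally uncertain (1.0),
    an assumption suited to detectors that miss most objects.
    """
    if not confidences:
        return 1.0
    return 1.0 - sum(confidences) / len(confidences)


def select_for_annotation(pool, budget):
    """Return the IDs of the `budget` most uncertain images, most uncertain first."""
    ranked = sorted(pool, key=lambda image_id: image_uncertainty(pool[image_id]),
                    reverse=True)
    return ranked[:budget]


# Hypothetical detector outputs: image ID -> confidences of its detections.
pool = {
    "img_a": [0.95, 0.90],  # confident detections, low annotation priority
    "img_b": [0.40, 0.35],  # uncertain detections
    "img_c": [],            # nothing detected
}

selected = select_for_annotation(pool, budget=2)
print(selected)
```

With this pool, the image with no detections and the image with low-confidence detections are selected ahead of the confidently detected one. Other uncertainty measures, such as entropy over class probabilities, could be substituted in image_uncertainty without changing the selection logic.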