Limitations of Existing Approaches for Object Detection in Field Environments

1. Military Aircraft Detection

Models Evaluated: DINO, Faster R-CNN, YOLOv8

Dataset: Military Aircraft Dataset

20,000+ images (Kaggle format)

2,907 training, 330 validation, 605 test (COCO format on Hugging Face)

20 aircraft classes (A1–A20), with significant class imbalance

Real-world surveillance and defense applications

Performance:

YOLOv8 (Fine-Tuned):

Precision: 0.2097
Recall: 0.1568
F1: 0.1794
mAP: 0.2037

Faster R-CNN (Fine-Tuned):

Precision: 0.0148
Recall: 0.0531
F1: 0.0231
mAP: 0.0034

DINO (Zero-Shot):

Poor detection confidence, even with prompt filtering
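
As a consistency check on the figures above, F1 is the harmonic mean of precision and recall. The following is a minimal sketch, not the original evaluation code; the function name f1_score is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above for the Military Aircraft dataset.
yolo_f1 = f1_score(0.2097, 0.1568)
frcnn_f1 = f1_score(0.0148, 0.0531)

print(round(yolo_f1, 4))   # 0.1794
print(round(frcnn_f1, 4))  # 0.0231
```

Both values agree with the reported F1 scores to four decimal places.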

2. Satellite Object Detection with FAIR1M

Models Evaluated: DINO, Faster R-CNN, YOLOv8

Dataset: FAIR1M (COCO format and Kaggle)

15,000+ satellite images

1,732 image subset used for benchmarking

Includes cars, ships, aircraft, trucks (strong class imbalance)

Performance:

YOLOv8 (Fine-Tuned):

Precision: 0
Recall: 0
F1: 0
mAP: 0

Faster R-CNN (Fine-Tuned):

Precision: 0.0002
Recall: 0.0001
F1: 0.0002
mAP: 0.0025

DINO (Zero-Shot):

mAP: 0.0072; produces general bounding boxes but lacks fine-grained discrimination
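
The mAP figures in this section depend on matching predicted boxes to ground-truth boxes by intersection over union (IoU). The sketch below assumes COCO-style [x, y, width, height] boxes and uses toy coordinates; the IoU threshold used in the original evaluation is not stated, and the function name iou_xywh is illustrative.

```python
def iou_xywh(box_a, box_b):
    """IoU of two boxes given in COCO [x, y, width, height] format."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


# Toy boxes for illustration only (not taken from FAIR1M).
pred = [10, 10, 20, 20]
truth = [20, 20, 20, 20]
overlap = iou_xywh(pred, truth)
print(round(overlap, 4))  # 0.1429
```

Under a commonly used threshold of 0.5, this pair would count as a missed detection even though the boxes overlap.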

3. Marine Debris Detection with TrashCan

Models Evaluated: DINO, Faster R-CNN, YOLOv8

Dataset: TrashCan

7,212 total underwater images

Classes include plastic, metal, fabric, rubber, marine life, ROV

Severe class imbalance (e.g., ROV: 2,679 images vs. rubber: <10)
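
Per-class image counts like those quoted above can be derived from COCO-format annotations. The sketch below is one way to do it, assuming the standard COCO keys (categories, annotations, image_id, category_id); the toy dictionary is made up for illustration and is not TrashCan data, and images_per_class is a hypothetical helper.

```python
from collections import Counter


def images_per_class(coco: dict) -> Counter:
    """Count the distinct images that contain each category."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    # A set removes repeated annotations of the same class in one image.
    seen = {(a["image_id"], a["category_id"]) for a in coco["annotations"]}
    return Counter(names[cat_id] for _, cat_id in seen)


# Toy annotations for illustration only.
toy = {
    "categories": [{"id": 1, "name": "rov"}, {"id": 2, "name": "rubber"}],
    "annotations": [
        {"image_id": 1, "category_id": 1},
        {"image_id": 1, "category_id": 1},
        {"image_id": 2, "category_id": 1},
        {"image_id": 3, "category_id": 1},
        {"image_id": 3, "category_id": 2},
    ],
}

counts = images_per_class(toy)
imbalance_ratio = max(counts.values()) / min(counts.values())
print(counts["rov"], counts["rubber"], imbalance_ratio)  # 3 1 3.0
```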

Performance:

YOLOv8 (Fine-Tuned):

Precision: 0
Recall: 0
F1: 0
mAP: 0

Faster R-CNN (Fine-Tuned):

Precision: 0.0064
Recall: 0.0229
F1: 0.0100
mAP: 0.0032

DINO (Zero-Shot):

mAP: 0.0031; multiple false positives with broad prompts

Common Limitations Identified

Zero-shot models perform poorly on specialized datasets without aligned classes.

DINO suffers from attention dilution with multiple prompts.

Faster R-CNN struggles with fine-grained object detection in dense scenes.

YOLOv8 was the best performer on Military Aircraft (mAP 0.2037) but scored zero on every metric on the satellite and marine datasets without further hyperparameter tuning.

No preprocessing was applied; raw, noisy images were used to test robustness.

Potential Improvements

Domain-Aligned Fine-Tuning: Tailor class labels and augmentations to each environment (marine, aerial, satellite).
Synthetic Data Generation: Generate controlled synthetic examples of rare classes to balance the class distribution.
Active Learning: Use uncertainty sampling to prioritize the annotations that yield the highest performance gains.
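
The uncertainty-sampling idea can be sketched as follows. This is an illustrative simplification, not the pipeline used here: it assumes one class-probability vector per image (a real detector's per-box scores would first need to be aggregated), ranks images by predictive entropy, and uses hypothetical names and toy probabilities.

```python
import math


def entropy(probs):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def select_for_annotation(predictions: dict, budget: int) -> list:
    """Return the `budget` image ids whose predictions are most uncertain."""
    ranked = sorted(predictions, key=lambda k: entropy(predictions[k]), reverse=True)
    return ranked[:budget]


# Toy per-image class probabilities for illustration only.
toy_predictions = {
    "img_a": [0.98, 0.01, 0.01],  # confident prediction, low entropy
    "img_b": [0.40, 0.35, 0.25],  # near-uniform, high entropy
    "img_c": [0.70, 0.20, 0.10],
}

picked = select_for_annotation(toy_predictions, budget=2)
print(picked)  # ['img_b', 'img_c']
```

Images the model is already confident about (img_a here) are left out of the annotation queue, so labeling effort goes to the cases most likely to change the model.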