
Limitations of Existing Approaches to Object Detection in Field Environments


1. Military Aircraft Detection

Models Evaluated: DINO, Faster R-CNN, YOLOv8

Dataset: Military Aircraft Dataset

  • 20,000+ images in the Kaggle release
  • 2,907 training, 330 validation, and 605 test images in the COCO-format release on Hugging Face
  • 20 aircraft classes (A1–A20), with significant class imbalance
  • Real-world surveillance and defense applications
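All three datasets in this section are described as strongly imbalanced. With COCO-format annotations, the per-class counts and a simple max-to-min imbalance ratio can be checked with a few lines of standard-library Python. This is a minimal sketch that assumes the standard COCO layout (`categories` and `annotations` lists, each annotation carrying a `category_id`); the toy dictionary stands in for a real annotation file, and the path in the comment is hypothetical. Note that it counts annotated boxes, not images.

```python
import json
from collections import Counter


def load_coco(path: str) -> dict:
    """Read a COCO-format annotation file into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def class_counts(coco: dict) -> dict[str, int]:
    """Count annotated boxes per category, keeping categories that have none."""
    id_to_name = {c["id"]: c["name"] for c in coco["categories"]}
    counts = Counter(id_to_name[a["category_id"]] for a in coco["annotations"])
    return {name: counts.get(name, 0) for name in id_to_name.values()}


def imbalance_ratio(counts: dict[str, int]) -> float:
    """Ratio of the most frequent to the least frequent non-empty class."""
    nonzero = [n for n in counts.values() if n > 0]
    return max(nonzero) / min(nonzero)


if __name__ == "__main__":
    # Toy stand-in for load_coco("annotations/train.json") (hypothetical path).
    coco = {
        "categories": [{"id": 1, "name": "A1"}, {"id": 2, "name": "A2"}, {"id": 3, "name": "A3"}],
        "annotations": [{"id": i, "image_id": i, "category_id": 1} for i in range(12)]
        + [{"id": 12, "image_id": 12, "category_id": 2}],
    }
    counts = class_counts(coco)
    print(counts)                   # {'A1': 12, 'A2': 1, 'A3': 0}
    print(imbalance_ratio(counts))  # 12.0
```

Classes with zero annotations are reported explicitly rather than dropped, since an empty class in a split is itself a symptom of the imbalance.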

Performance:

YOLOv8 (Fine-Tuned):

  • Precision: 0.2097
  • Recall: 0.1568
  • F1: 0.1794
  • mAP: 0.2037

Faster R-CNN (Fine-Tuned):

  • Precision: 0.0148
  • Recall: 0.0531
  • F1: 0.0231
  • mAP: 0.0034

DINO (Zero-Shot):

  • Low detection confidence, even with prompt filtering (no quantitative metrics reported)
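F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported F1 values can be recomputed from the reported precision and recall as a quick consistency check. The sketch below does this for the two fine-tuned models in this section; the function name is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; defined as 0.0 when both are 0."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) for the fine-tuned models in this section.
reported = {
    "YOLOv8": (0.2097, 0.1568, 0.1794),
    "Faster R-CNN": (0.0148, 0.0531, 0.0231),
}

for model, (p, r, f1) in reported.items():
    print(f"{model}: computed F1 = {f1_score(p, r):.4f}, reported F1 = {f1:.4f}")
# YOLOv8: computed F1 = 0.1794, reported F1 = 0.1794
# Faster R-CNN: computed F1 = 0.0231, reported F1 = 0.0231
```

The zero guard matters for runs where a model produces no correct detections, so that both precision and recall are 0.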

2. Satellite Object Detection with FAIR1M

Models Evaluated: DINO, Faster R-CNN, YOLOv8

Dataset: FAIR1M (COCO format and Kaggle)

  • 15,000+ satellite images
  • 1,732 image subset used for benchmarking
  • Includes cars, ships, aircraft, and trucks, with strong class imbalance

Performance:

YOLOv8 (Fine-Tuned):

  • Precision: 0
  • Recall: 0
  • F1: 0
  • mAP: 0

Faster R-CNN (Fine-Tuned):

  • Precision: 0.0002
  • Recall: 0.0001
  • F1: 0.0002
  • mAP: 0.0025

DINO (Zero-Shot):

  • mAP: 0.0072; produces coarse bounding boxes but lacks fine-grained class discrimination
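The report does not state how predictions were matched to ground truth when computing precision and recall. Detection metrics of this kind are usually computed by matching each predicted box to an unmatched ground-truth box of the same class whose IoU (intersection over union) clears a threshold. The sketch below assumes a 0.5 IoU threshold and greedy matching in descending score order, in the style of PASCAL VOC evaluation; it is not the evaluation code behind these results. It also shows why a well-localized box with the wrong class, the kind of failure described in the DINO line above, earns no credit.

```python
Box = tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall(preds, gts, iou_thr: float = 0.5):
    """preds: list of (box, label, score); gts: list of (box, label).

    Highest-scoring predictions are matched first; each ground-truth box
    can be matched at most once, and only by a prediction with the same label.
    """
    matched: set[int] = set()
    tp = 0
    for box, label, _score in sorted(preds, key=lambda p: p[2], reverse=True):
        best_j, best_iou = None, iou_thr
        for j, (gbox, glabel) in enumerate(gts):
            if j in matched or glabel != label:
                continue
            overlap = iou(box, gbox)
            if overlap >= best_iou:
                best_j, best_iou = j, overlap
        if best_j is not None:
            matched.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return precision, recall


if __name__ == "__main__":
    gts = [((0, 0, 10, 10), "ship"), ((20, 20, 30, 30), "car")]
    preds = [
        ((0, 0, 10, 10), "ship", 0.9),    # correct box and class
        ((20, 20, 30, 30), "ship", 0.8),  # right place, wrong class
        ((50, 50, 60, 60), "car", 0.7),   # no overlapping ground truth
    ]
    print(precision_recall(preds, gts))  # (0.3333333333333333, 0.5)
```

mAP extends this by sweeping the score threshold, averaging precision over recall levels per class, and then averaging across classes.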

3. Marine Debris Detection with TrashCan

Models Evaluated: DINO, Faster R-CNN, YOLOv8

Dataset: TrashCan

  • 7,212 underwater images in total
  • Classes include plastic, metal, fabric, rubber, marine life, and ROV
  • Severe class imbalance (e.g., ROV: 2,679 images vs. rubber: fewer than 10)

Performance:

YOLOv8 (Fine-Tuned):

  • Precision: 0
  • Recall: 0
  • F1: 0
  • mAP: 0

Faster R-CNN (Fine-Tuned):

  • Precision: 0.0064
  • Recall: 0.0229
  • F1: 0.01
  • mAP: 0.0032

DINO (Zero-Shot):

  • mAP: 0.0031; multiple false positives with broad prompts
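The zero-shot runs mention prompt filtering. One simple form, sketched here under stated assumptions, keeps only boxes whose grounded phrase exactly names a target class and whose confidence clears a threshold. The `Detection` record, the exact-match rule, and the 0.35 default threshold are hypothetical choices for illustration, not the report's settings or any specific library's output format.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """Hypothetical record for one zero-shot detection."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2)
    phrase: str                             # prompt text the box was grounded to
    score: float                            # model confidence in [0, 1]


def filter_detections(dets: list[Detection], target_classes: list[str],
                      min_score: float = 0.35) -> list[Detection]:
    """Keep detections whose phrase exactly names a target class and whose score clears min_score."""
    allowed = {c.strip().lower() for c in target_classes}
    return [d for d in dets
            if d.phrase.strip().lower() in allowed and d.score >= min_score]


if __name__ == "__main__":
    dets = [
        Detection((10, 10, 50, 40), "plastic", 0.62),
        Detection((60, 20, 90, 70), "plastic bag", 0.55),  # not an exact class name
        Detection((5, 5, 30, 30), "metal", 0.21),          # below threshold
        Detection((0, 0, 200, 150), "ROV", 0.81),
    ]
    kept = filter_detections(dets, ["plastic", "metal", "ROV"])
    print([d.phrase for d in kept])  # ['plastic', 'ROV']
```

Filtering can only remove detections, so it may cut false positives from broad prompts but can also discard true positives (the "plastic bag" box above), and it cannot recover objects the model never detected.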

Common Limitations Identified

  • Zero-shot models perform poorly on specialized datasets whose class labels (e.g., A1–A20) do not align with concepts the model already recognizes.
  • DINO suffers from attention dilution when given multiple prompts.
  • Faster R-CNN struggles with fine-grained object detection in dense scenes.
  • YOLOv8 performed best of the three on Military Aircraft (mAP 0.2037) but scored 0 on every metric for the satellite and marine datasets without further hyperparameter tuning.
  • No preprocessing was applied; raw, noisy images were used to test robustness.

Potential Improvements

  1. Domain-Aligned Fine-Tuning: Tailor class labels and augmentations to each environment (marine, aerial, satellite).
  2. Synthetic Data Generation: Generate synthetic examples of rare classes under controlled conditions to balance the class distribution.
  3. Active Learning: Use uncertainty sampling to prioritize the annotations expected to yield the largest performance gains.
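Uncertainty sampling, the strategy named in the third item, can be sketched as ranking unlabeled images by how unsure the current detector is and sending the top k for annotation. This sketch assumes per-image detection confidences are available from the fine-tuned model; scoring an image by the mean binary entropy of its confidences, and treating images with no detections as maximally uncertain, are illustrative choices rather than the report's method. Least-confidence and margin scores are common alternatives.

```python
import math


def binary_entropy(p: float) -> float:
    """Entropy in bits of a Bernoulli variable with probability p (peaks at p = 0.5)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))


def image_uncertainty(scores: list[float]) -> float:
    """Mean entropy of an image's detection confidences; no detections counts as 1.0 (maximal)."""
    if not scores:
        return 1.0
    return sum(binary_entropy(s) for s in scores) / len(scores)


def select_for_annotation(pool: dict[str, list[float]], k: int) -> list[str]:
    """Return the k image ids with the highest uncertainty."""
    return sorted(pool, key=lambda img: image_uncertainty(pool[img]), reverse=True)[:k]


if __name__ == "__main__":
    # Hypothetical image ids mapped to the detector's confidence scores.
    pool = {
        "img_a": [0.95, 0.90],  # confident detections
        "img_b": [0.50, 0.55],  # scores near 0.5: most ambiguous
        "img_c": [],            # nothing detected
        "img_d": [0.10],
    }
    print(select_for_annotation(pool, 2))  # ['img_c', 'img_b']
```

Treating empty predictions as maximally uncertain biases selection toward images where the detector finds nothing, which suits datasets like FAIR1M and TrashCan, where fine-tuned YOLOv8 scored 0 on every metric.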