Evaluation: Fine-Tuned YOLO vs Zero-Shot YOLO for Undersea Object Detection
This evaluation compares the performance of a fine-tuned YOLO model against a zero-shot YOLO baseline on an undersea object detection task. The objective is to determine whether supervised fine-tuning on a domain-specific training set of underwater imagery improves detection metrics on a held-out test set.
Dataset
- Training Dataset: 638 annotated undersea images with bounding boxes (7 classes: Fish, Jellyfish, Penguin, Puffin, Shark, Starfish, Stingray)
- Test Dataset: 63 unseen images from the same domain, manually annotated
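A minimal sketch of what the dataset config for this 7-class setup might look like, written as a small Python script that emits an Ultralytics-style `data.yaml`. The directory layout, file names, and the presence of a separate validation split are assumptions for illustration, not details from the original setup.

```python
# Hypothetical generator for an Ultralytics dataset config covering the
# seven undersea classes; the directory layout is assumed, not from the report.
from pathlib import Path

yaml_text = """\
path: datasets/undersea   # dataset root (assumed layout)
train: images/train       # 638 annotated training images
val: images/val           # validation split (assumed; not described above)
test: images/test         # 63 held-out, manually annotated images
names:
  0: Fish
  1: Jellyfish
  2: Penguin
  3: Puffin
  4: Shark
  5: Starfish
  6: Stingray
"""
Path("undersea.yaml").write_text(yaml_text)
```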
Models
- Zero-Shot Model: YOLOv11 pretrained on MS COCO, applied without any task-specific training. Note that COCO's 80 classes include none of the seven target categories directly, so zero-shot predictions must be mapped from COCO labels (e.g., "bird" for Penguin or Puffin) onto the undersea classes.
- Fine-Tuned Model: YOLOv11 fine-tuned on the above undersea training dataset for 100 epochs
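A minimal fine-tuning sketch using the Ultralytics Python API, assuming the `undersea.yaml` config shown above; the checkpoint name `yolo11n.pt` is an assumption (any YOLOv11 variant would be trained the same way).

```python
# Fine-tuning sketch with the Ultralytics API; the checkpoint and config
# file names are assumptions, not taken from the original report.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")     # COCO-pretrained weights (the zero-shot baseline)
model.train(
    data="undersea.yaml",      # 7-class undersea dataset config
    epochs=100,                # matches the 100 epochs reported here
    imgsz=640,                 # Ultralytics' default input size (assumed)
)
```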
Evaluation Metrics
Metric tables for the zero-shot and fine-tuned models, plus a side-by-side comparison, appeared here in the original report (not reproduced).

Zero-Shot YOLOv11: example predictions without any fine-tuning (figure not reproduced).
Fine-Tuned YOLOv11: example predictions after training on the undersea dataset (figure not reproduced).
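For reference, metrics of the kind summarized above can be produced with Ultralytics' built-in validator. This is a sketch under assumed paths, not the report's actual evaluation script.

```python
# Hedged evaluation sketch: compute mAP-style metrics on the held-out test
# split. The weights path is an assumption; the zero-shot COCO model would
# additionally need its labels mapped onto the seven undersea classes
# before a like-for-like comparison.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # fine-tuned weights (assumed path)
metrics = model.val(data="undersea.yaml", split="test")
print("mAP50-95:", metrics.box.map)
print("mAP50:   ", metrics.box.map50)
```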
Key Takeaways
- Fine-tuning YOLO on a domain-specific dataset substantially improves mAP, IoU, and detection accuracy, roughly doubling accuracy from ~40% to ~80%.
- The compute overhead of fine-tuned inference is negligible (<0.5 ms/image); a rough latency check is sketched after this list.
- The small cost increase (~$0.002 per 1K images) is justified by the >30% improvement in detection quality.
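A rough way to sanity-check the latency takeaway, sketched under assumed paths; this is not the benchmark used for the figures above.

```python
# Rough per-image latency check behind the "negligible overhead" takeaway;
# paths are assumptions. A warm-up pass excludes model load time.
import time
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")
model.predict("datasets/undersea/images/test", verbose=False)  # warm-up pass

start = time.perf_counter()
results = model.predict("datasets/undersea/images/test", verbose=False)
elapsed_ms = (time.perf_counter() - start) * 1000 / max(len(results), 1)
print(f"avg inference latency: {elapsed_ms:.2f} ms/image")
```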
Conclusion
Supervised fine-tuning of object detection models on domain-specific undersea imagery leads to substantial gains in performance over zero-shot baselines. For mission-critical applications involving underwater inspection, debris identification, or marine robotics, incorporating a fine-tuning step on annotated undersea image data is recommended.