Evaluation: Fine-Tuned YOLO vs Zero-Shot YOLO for Undersea Object Detection
This evaluation compares the performance of a fine-tuned YOLO model against a zero-shot YOLO baseline on an undersea object detection task. The objective is to determine whether supervised fine-tuning on a domain-specific training set of underwater imagery improves detection metrics on a held-out test set.
Dataset
- Training Dataset: 638 annotated undersea images with bounding boxes (7 classes: Fish, Jellyfish, Penguin, Puffin, Shark, Starfish, Stingray)
- Test Dataset: 63 unseen images from the same domain, manually annotated
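A minimal sketch of what the dataset config for this 7-class setup might look like, written as a small Python script that emits an Ultralytics-style `data.yaml`. The directory layout, file names, and the presence of a separate validation split are assumptions for illustration, not details from the original setup.

```python
# Hypothetical generator for an Ultralytics dataset config covering the
# seven undersea classes; the directory layout is assumed, not from the report.
from pathlib import Path

yaml_text = """\
path: datasets/undersea   # dataset root (assumed layout)
train: images/train       # 638 annotated training images
val: images/val           # validation split (assumed; not described above)
test: images/test         # 63 held-out, manually annotated images
names:
  0: Fish
  1: Jellyfish
  2: Penguin
  3: Puffin
  4: Shark
  5: Starfish
  6: Stingray
"""
Path("undersea.yaml").write_text(yaml_text)
```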
Models
- Zero-Shot Model: YOLOv11 pretrained on MS COCO, applied without any task-specific training. Note that COCO's 80 classes include none of the seven target categories directly, so zero-shot predictions must be mapped from COCO labels (e.g., "bird" for Penguin or Puffin) onto the undersea classes.
- Fine-Tuned Model: YOLOv11 fine-tuned on the above undersea training dataset for 100 epochs
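A minimal fine-tuning sketch using the Ultralytics Python API, assuming the `undersea.yaml` config shown above; the checkpoint name `yolo11n.pt` is an assumption (any YOLOv11 variant would be trained the same way).

```python
# Fine-tuning sketch with the Ultralytics API; the checkpoint and config
# file names are assumptions, not taken from the original report.
from ultralytics import YOLO

model = YOLO("yolo11n.pt")     # COCO-pretrained weights (the zero-shot baseline)
model.train(
    data="undersea.yaml",      # 7-class undersea dataset config
    epochs=100,                # matches the 100 epochs reported here
    imgsz=640,                 # Ultralytics' default input size (assumed)
)
```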
Evaluation Metrics
Metric tables for the zero-shot and fine-tuned models, plus a side-by-side comparison, appeared here in the original report (not reproduced).

Zero-Shot YOLOv11: example predictions without any fine-tuning (figure not reproduced).
Fine-Tuned YOLOv11: example predictions after training on the undersea dataset (figure not reproduced).
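For reference, metrics of the kind summarized above can be produced with Ultralytics' built-in validator. This is a sketch under assumed paths, not the report's actual evaluation script.

```python
# Hedged evaluation sketch: compute mAP-style metrics on the held-out test
# split. The weights path is an assumption; the zero-shot COCO model would
# additionally need its labels mapped onto the seven undersea classes
# before a like-for-like comparison.
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")  # fine-tuned weights (assumed path)
metrics = model.val(data="undersea.yaml", split="test")
print("mAP50-95:", metrics.box.map)
print("mAP50:   ", metrics.box.map50)
```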
Key Takeaways
- Fine-tuning YOLO on a domain-specific dataset substantially improves mAP, IoU, and detection accuracy, roughly doubling accuracy from ~40% to ~80%.
- The compute overhead of fine-tuned inference is negligible (<0.5 ms/image); a rough latency check is sketched after this list.
- The small cost increase (~$0.002 per 1K images) is justified by the >30% improvement in detection quality.
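A rough way to sanity-check the latency takeaway, sketched under assumed paths; this is not the benchmark used for the figures above.

```python
# Rough per-image latency check behind the "negligible overhead" takeaway;
# paths are assumptions. A warm-up pass excludes model load time.
import time
from ultralytics import YOLO

model = YOLO("runs/detect/train/weights/best.pt")
model.predict("datasets/undersea/images/test", verbose=False)  # warm-up pass

start = time.perf_counter()
results = model.predict("datasets/undersea/images/test", verbose=False)
elapsed_ms = (time.perf_counter() - start) * 1000 / max(len(results), 1)
print(f"avg inference latency: {elapsed_ms:.2f} ms/image")
```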
Conclusion
Supervised fine-tuning of object detection models on domain-specific undersea imagery leads to substantial gains in performance over zero-shot baselines. For mission-critical applications involving underwater inspection, debris identification, or marine robotics, incorporating a fine-tuning step on annotated undersea image data is recommended.