---
license: mit
library_name: ultralytics
tags:
- object-detection
- computer-vision
- yolov10
- faster-rcnn
- pytorch
- autonomous-driving
- hallucination-mitigation
- out-of-distribution
- ood-detection
- proximal-ood
- benchmark-analysis
- bdd100k
- pascal-voc
pipeline_tag: object-detection
datasets:
- bdd100k
- pascal-voc
- openimages
model-index:
- name: m-hood-yolov10-bdd-finetuned
  results:
  - task:
      type: object-detection
    dataset:
      type: bdd100k
      name: BDD 100K
    metrics:
    - type: mAP@50-95
      value: 0.34
    - type: hallucination_reduction_near_ood
      name: Hallucination Reduction (Near-OoD)
      value: "79.5%"
- name: m-hood-faster-rcnn-bdd-finetuned
  results:
  - task:
      type: object-detection
    dataset:
      type: bdd100k
      name: BDD 100K
    metrics:
    - type: mAP@50
      value: 0.252
    - type: hallucination_reduction_near_ood
      name: Hallucination Reduction (Near-OoD)
      value: "84.8%"
---

# M-Hood: Models for Mitigating Hallucinations in Object Detection

[Paper](https://arxiv.org/pdf/2503.07330)
[Code](https://gricad-gitlab.univ-grenoble-alpes.fr/dnn-safety/m-hood)
[Dataset](https://huggingface.co/datasets/HugoHE/m-hood-dataset)
|
This repository contains the official models from the paper **"Mitigating Hallucinations in Object Detection Models: A Revisit to Out-of-Distribution Detection"**. It provides a collection of YOLOv10 and Faster R-CNN models, including both standard ("vanilla") baselines and our **fine-tuned models**, which are designed to significantly reduce false positive detections on out-of-distribution (OoD) data.

Our work first identifies critical flaws in existing OoD benchmarks and then introduces a novel fine-tuning strategy that teaches detectors to be less overconfident. By training on specially curated **"proximal OoD"** samples, our models learn to suppress objectness scores for unfamiliar objects, leading to an **88% reduction in overall hallucination error** on the BDD-100K benchmark when combined with an OoD filter.
|
## Key Features

- **Novel Fine-Tuning for Hallucination Mitigation**: Models specifically fine-tuned to reduce false positives on novel objects, enhancing safety and reliability.
- **Vanilla vs. Fine-tuned Comparison**: Provides both baseline and improved models to clearly demonstrate the effectiveness of our approach.
- **Dual Architecture Support**: Includes both the real-time **YOLOv10** and the high-accuracy **Faster R-CNN**.
- **Multi-Dataset Scope**: Models trained on **BDD 100K** for autonomous driving and **Pascal VOC** for general object detection.
- **Corrected Benchmark Datasets**: Accompanied by new, carefully curated OoD test sets (**Near-OoD**, **Far-OoD**) that address the flaws in previous benchmarks.
|
## The M-Hood Approach: How It Works

Object detectors often "hallucinate", producing high-confidence false predictions when faced with novel objects not seen during training. Our research tackles this in two ways:

1. **Benchmarking Revisited**: We found that existing OoD test sets were flawed, with ~13% of supposedly "OoD" images containing in-distribution (ID) objects, which leads to inaccurate performance evaluation. We created new, clean OoD benchmarks for reliable assessment.

2. **Proximal OoD Fine-Tuning**: Our core contribution is a fine-tuning strategy that makes the detector itself more robust (see the sketch below).
   - We create a dataset of **"proximal OoD"** samples: objects that are semantically similar to the training classes but are not part of them (e.g., using 'deer' as a proximal OoD sample for a model trained on 'horse' and 'cow').
   - We fine-tune the original models on a combined dataset of the original ID data and these new proximal OoD samples.
   - During this process, the proximal OoD samples are treated as **background**, effectively teaching the model to suppress its predictions for these and other similar novel objects.

The result is a model that is inherently more conservative and less likely to hallucinate, without significantly compromising its performance on the original task.
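
To make the recipe concrete, here is a minimal sketch of how proximal OoD images could be folded into a YOLO-style fine-tuning run by giving them empty label files, so every proximal OoD object is treated as background. The directory layout (`proximal_ood/`), the dataset config name (`bdd_plus_proximal_ood.yaml`), and the hyperparameters are illustrative assumptions, not the exact pipeline used in the paper.

```python
from pathlib import Path
from ultralytics import YOLO

# Proximal OoD images are added to the training set with *empty* YOLO label files,
# so the detector sees them as pure background (illustrative directory layout).
ood_images = Path("proximal_ood/images")
ood_labels = Path("proximal_ood/labels")
ood_labels.mkdir(parents=True, exist_ok=True)
for img in ood_images.glob("*.jpg"):
    (ood_labels / f"{img.stem}.txt").write_text("")  # no boxes -> background only

# Fine-tune the vanilla detector on the combined ID + proximal OoD data.
# 'bdd_plus_proximal_ood.yaml' is a hypothetical dataset config listing both sources.
model = YOLO("yolov10-bdd-vanilla.pt")
model.train(data="bdd_plus_proximal_ood.yaml", epochs=10, imgsz=640, lr0=1e-4)
```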
|
## Performance Highlights

Our fine-tuning method drastically reduces the number of hallucinated detections on OoD datasets. The tables below show the number of false predictions (lower is better) on our **Near-OoD** benchmark.
|
#### YOLOv10 on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration | Hallucination Count | Reduction |
|---------------------|---------------------|-----------|
| Original (Vanilla) | 708 | - |
| **Ours (Fine-tuned)** | **145** | **-79.5%** |
| Original + KNN Filter | 297 | -58.1% |
| **Ours + KNN Filter** | **78** | **-89.0%** |
|
#### Faster R-CNN on BDD-100K (Near-OoD Hallucination Counts)

| Model Configuration | Hallucination Count | Reduction |
|---------------------|---------------------|-----------|
| Original (Vanilla) | 2,595 | - |
| **Ours (Fine-tuned)** | **395** | **-84.8%** |
| Original + KNN Filter | 1,272 | -51.0% |
| **Ours + KNN Filter** | **270** | **-89.6%** |
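
The "KNN Filter" rows refer to a post-hoc OoD filter applied on top of the detector. The snippet below is a generic, self-contained sketch of that idea, not the authors' implementation: per-detection feature vectors (how they are extracted from the detector is omitted here, and the arrays are random placeholders) are scored by their distance to the k-th nearest neighbour in a bank of ID features, and detections beyond a threshold calibrated on held-out ID data are discarded. The values k=10 and the 95th percentile are illustrative.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
id_feats = rng.normal(size=(5000, 256)).astype(np.float32)   # ID feature bank (placeholder)
val_feats = rng.normal(size=(1000, 256)).astype(np.float32)  # held-out ID features (placeholder)
det_feats = rng.normal(size=(12, 256)).astype(np.float32)    # features of new detections (placeholder)

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

# Fit a k-NN index on the ID feature bank.
knn = NearestNeighbors(n_neighbors=10).fit(l2_normalize(id_feats))

# Calibrate a rejection threshold on held-out ID features (e.g., their 95th percentile).
val_dists, _ = knn.kneighbors(l2_normalize(val_feats))
threshold = np.quantile(val_dists[:, -1], 0.95)

# Score new detections by their distance to the k-th nearest ID feature;
# detections farther than the threshold are flagged as OoD and suppressed.
dists, _ = knn.kneighbors(l2_normalize(det_feats))
keep = dists[:, -1] <= threshold
print(f"kept {keep.sum()} / {len(keep)} detections")
```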
|
## Model Collection

### YOLOv10 Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **yolov10-bdd-vanilla.pt** | BDD 100K | Vanilla | 62MB | Baseline real-time model for autonomous driving. | [Download](./yolov10-bdd-vanilla.pt) |
| **yolov10-bdd-finetune.pt** | BDD 100K | **Fine-tuned** | 62MB | **Robust** model with reduced OoD hallucinations. | [Download](./yolov10-bdd-finetune.pt) |
| **yolov10-voc-vanilla.pt** | Pascal VOC | Vanilla | 63MB | Baseline model for general-purpose object detection. | [Download](./yolov10-voc-vanilla.pt) |
| **yolov10-voc-finetune.pt** | Pascal VOC | **Fine-tuned** | 94MB | **Robust** general-purpose model for OoD scenarios. | [Download](./yolov10-voc-finetune.pt) |
|
### Faster R-CNN Models

| Model | Dataset | Training Type | Size | Description | Download |
|-------|---------|---------------|------|-------------|----------|
| **faster-rcnn-bdd-vanilla.pth** | BDD 100K | Vanilla | 315MB | High-accuracy baseline for autonomous driving. | [Download](./faster-rcnn-bdd-vanilla.pth) |
| **faster-rcnn-bdd-finetune.pth** | BDD 100K | **Fine-tuned** | 158MB | **Robust** high-accuracy model for OoD scenarios. | [Download](./faster-rcnn-bdd-finetune.pth) |
| **faster-rcnn-voc-vanilla.pth** | Pascal VOC | Vanilla | 315MB | High-accuracy baseline for general object detection. | [Download](./faster-rcnn-voc-vanilla.pth) |
| **faster-rcnn-voc-finetune.pth** | Pascal VOC | **Fine-tuned** | 158MB | **Robust** high-accuracy general-purpose model. | [Download](./faster-rcnn-voc-finetune.pth) |
|
|
## Quick Start

### YOLOv10 Usage
|
```python
from ultralytics import YOLO

# Load our robust, fine-tuned YOLOv10 model
model = YOLO('yolov10-bdd-finetune.pt')

# Run inference
results = model('path/to/your/image.jpg')

# Process results
for result in results:
    boxes = result.boxes.xyxy   # bounding boxes
    scores = result.boxes.conf  # confidence scores
    classes = result.boxes.cls  # class predictions
```
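
As a quick sanity check of the hallucination numbers reported above, you can compare how many detections the vanilla and fine-tuned weights produce on a folder of OoD images. This is only an illustrative comparison: the `near_ood/images` path is a placeholder, and the confidence threshold and counting protocol used in the paper may differ.

```python
from pathlib import Path
from ultralytics import YOLO

ood_dir = Path("near_ood/images")  # placeholder folder of OoD images

def count_detections(weights, conf=0.25):
    """Count detections produced on OoD images above a confidence threshold."""
    model = YOLO(weights)
    sources = [str(p) for p in sorted(ood_dir.glob("*.jpg"))]
    results = model(sources, conf=conf, verbose=False)
    return sum(len(r.boxes) for r in results)

for weights in ["yolov10-bdd-vanilla.pt", "yolov10-bdd-finetune.pt"]:
    print(weights, count_detections(weights))
```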
|
### Faster R-CNN Usage

```python
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# NOTE: The provided .pth files are state_dicts, so they must be loaded
# into a model instance with the matching number of classes.
# Example for a vanilla VOC model:
num_classes = 21  # 20 classes + background
model = fasterrcnn_resnet50_fpn(weights=None, num_classes=num_classes)
model.load_state_dict(torch.load('faster-rcnn-voc-vanilla.pth', map_location='cpu'))
model.eval()

# Run inference on a pre-processed image tensor (C, H, W, values in [0, 1]).
# Torchvision detection models expect a list of such tensors.
image_tensor = torch.rand(3, 480, 640)  # replace with a real pre-processed image
with torch.no_grad():
    predictions = model([image_tensor])

# Process results
boxes = predictions[0]['boxes']
scores = predictions[0]['scores']
labels = predictions[0]['labels']
```
|
## Citation

If you use our models, datasets, or methodology in your research, please cite our papers.
|
For the IROS 2025 conference version, which primarily focuses on YOLO models, please cite:
```
@inproceedings{he2025mitigating,
  title={Mitigating Hallucinations in YOLO-based Object Detection Models: A Revisit to Out-of-Distribution Detection},
  author={Weicheng He and Changshun Wu and Chih-Hong Cheng and Xiaowei Huang and Saddek Bensalem},
  booktitle={Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)},
  year={2025},
  note={Accepted to IROS 2025},
  eprint={2503.07330v2},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.07330v2}
}
```
For the journal version, which extends the methodology to Faster R-CNN and RT-DETR, adds an automated data curation pipeline, and provides an in-depth analysis of the approach, please cite:
```
@misc{wu2025revisiting,
  title={Revisiting Out-of-Distribution Detection in Real-time Object Detection: From Benchmark Pitfalls to a New Mitigation Paradigm},
  author={Changshun Wu and Weicheng He and Chih-Hong Cheng and Xiaowei Huang and Saddek Bensalem},
  year={2025},
  eprint={2503.07330v3},
  archivePrefix={arXiv},
  primaryClass={cs.CV},
  url={https://arxiv.org/abs/2503.07330v3},
}
```
|
Please also consider citing the original works for the model architectures and datasets used.
|
## License

This work is released under the MIT License.