Search before asking
Bug
Describe the bug
The current implementation of MeanAverageRecall computes mAR@K by selecting the top-K predictions across all images in the dataset, rather than selecting the top-K predictions per image. According to the COCO evaluation protocol, mAR@K should be calculated by considering the top-K highest-confidence detections for each image.
> Average Recall (AR): AR@K (AR^max=K) is the AR given a maximum of K detections per image
This issue occurs because, in the concatenation step below, all detection results are merged together without keeping track of which image each detection came from. As a result, the subsequent selection of top-K predictions is performed globally across the entire dataset, rather than per image.
`supervision/metrics/mean_average_recall.py`, lines 222 to 225 (commit deb1c9c):

```python
concatenated_stats = [np.concatenate(items, 0) for items in zip(*stats)]
recall_scores_per_k, recall_per_class, unique_classes = (
    self._compute_average_recall_for_classes(*concatenated_stats)
)
```
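To make the difference concrete, here is a minimal NumPy sketch (illustrative values and names, not the library's actual code) contrasting the current global top-K selection with the per-image selection that the COCO protocol requires:

```python
import numpy as np

# Confidence scores for two images (illustrative values)
scores_per_image = [
    np.array([0.9, 0.8, 0.1]),  # image A
    np.array([0.7, 0.6, 0.5]),  # image B
]
k = 2

# Global selection (current behavior): concatenate first, then take top-K
# across the whole dataset -- image B contributes nothing here.
all_scores = np.concatenate(scores_per_image)
global_top_k = np.sort(all_scores)[-k:][::-1]  # [0.9, 0.8], both from image A

# Per-image selection (COCO protocol): take top-K within each image, then pool.
per_image_top_k = np.concatenate(
    [np.sort(scores)[-k:][::-1] for scores in scores_per_image]
)  # [0.9, 0.8, 0.7, 0.6]
```

With global selection, high-confidence images crowd out detections from the rest of the dataset, which skews the recall statistics that mAR@K is built on.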
Proposed Solution
To address this issue, I have modified the `_compute` and `_compute_average_recall_for_classes` functions so that only the top-K detections per image are considered when calculating mAR@K, in accordance with the COCO evaluation protocol.
In both functions, instead of simply concatenating all detections, the statistics are first filtered by confidence score per image, and only then concatenated and used to compute the confusion matrix. I will submit a pull request with these changes shortly.
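The per-image filtering step can be sketched as follows. This is a simplified illustration, not the actual patch: `keep_top_k_per_image` is a hypothetical helper, and the layout of `stats` (one tuple of arrays per image, with confidence scores at index 1) is an assumption about the metric's internals.

```python
import numpy as np

def keep_top_k_per_image(stats, k):
    """Keep only each image's top-k detections by confidence.

    `stats` is assumed to be a list with one tuple of parallel arrays per
    image, where index 1 holds the confidence scores (an illustrative
    layout, not necessarily the library's exact one).
    """
    filtered = []
    for image_stats in stats:
        confidences = image_stats[1]
        order = np.argsort(-confidences)[:k]  # indices of the top-k scores
        filtered.append(tuple(arr[order] for arr in image_stats))
    return filtered

# The filtering happens per image *before* the concatenation step, e.g.:
# stats = keep_top_k_per_image(stats, k=max_detections)
# concatenated_stats = [np.concatenate(items, 0) for items in zip(*stats)]
```

Because each image is truncated to K detections before the arrays are merged, the subsequent global concatenation can no longer let one image's detections displace another's.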
Environment
- Supervision: 0.26.1
- OS: Ubuntu 24.04
- Python: 3.12.3
Minimal Reproducible Example
No response
Additional
No response
Are you willing to submit a PR?
Yes, I will submit a PR with the proposed changes.