Metrics reloaded: recommendations for image analysis validation

Authors	MAIER-HEIN Lena; REINKE Annika; GODAU Patrick; TIZABI Minu D; BUETTNER Florian; CHRISTODOULOU Evangelia; GLOCKER Ben; ISENSEE Fabian; KLEESIEK Jens; KOZUBEK Michal; REYES Mauricio; RIEGLER Michael A; WIESENFARTH Manuel; KAVUR A Emre; SUDRE Carole H; BAUMGARTNER Michael; EISENMANN Matthias; HECKMANN-NOETZEL Doreen; RAEDSCH Tim; ACION Laura; ANTONELLI Michela; ARBEL Tal; BAKAS Spyridon; BENIS Arriel; BLASCHKO Matthew B; CARDOSO M Jorge; CHEPLYGINA Veronika; CIMINI Beth A; COLLINS Gary S; FARAHANI Keyvan; FERRER Luciana; GALDRAN Adrian; VAN GINNEKEN Bram; HAASE Robert; HASHIMOTO Daniel A; HOFFMAN Michael M; HUISMAN Merel; JANNIN Pierre; KAHN Charles E; KAINMUELLER Dagmar; KAINZ Bernhard; KARARGYRIS Alexandros; KARTHIKESALINGAM Alan; KOFLER Florian; KOPP-SCHNEIDER Annette; KRESHUK Anna; KURC Tahsin; LANDMAN Bennett A; LITJENS Geert; MADANI Amin; MAIER-HEIN Klaus; MARTEL Anne L; MATTSON Peter; MEIJERING Erik; MENZE Bjoern; MOONS Karel G M; MUELLER Henning; NICHYPORUK Brennan; NICKEL Felix; PETERSEN Jens; RAJPOOT Nasir; RIEKE Nicola; SAEZ-RODRIGUEZ Julio; SANCHEZ Clara I; SHETTY Shravya; VAN SMEDEN Maarten; SUMMERS Ronald M; TAHA Abdel A; TIULPIN Aleksei; TSAFTARIS Sotirios A; VAN CALSTER Ben; VAROQUAUX Gael; JAEGER Paul F
Year of publication	2024
Type	Article in Periodical
Magazine / Source	NATURE METHODS
MU Faculty or unit	Faculty of Informatics
Citation
Web	https://www.nature.com/articles/s41592-023-02151-z
DOI	https://doi.org/10.1038/s41592-023-02151-z
Keywords	HEALTH; SEGMENTATION; CRITERIA
Attached files	MaierHein_NatMeth_2024.pdf
Description	Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint — a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
Related projects:	National research infrastructure for biological and medical imaging