
Authors

Camila Gonzalez, Karol Gotkowski, Andreas Bucher, Ricarda Fischbach, Isabel Kaltenborn, Anirban Mukhopadhyay

Abstract

Automatic segmentation of lung lesions in computer tomography has the potential to ease the burden of clinicians during the Covid-19 pandemic. Yet predictive deep learning models are not trusted in the clinical routine due to failing silently in out-of-distribution (OOD) data. We propose a lightweight OOD detection method that exploits the Mahalanobis distance in the feature space. The proposed approach can be seamlessly integrated into state-of-the-art segmentation pipelines without requiring changes in model architecture or training procedure, and can therefore be used to assess the suitability of pre-trained models to new data. We validate our method with a patch-based nnU-Net architecture trained with a multi-institutional dataset and find that it effectively detects samples that the model segments incorrectly.
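
The core idea described in the abstract can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the helper names (`fit_gaussian`, `mahalanobis`), the feature dimensionality, and the synthetic data are all assumptions for demonstration; in the paper, the feature vectors would come from pooled activations of the trained nnU-Net encoder.

```python
import numpy as np

def fit_gaussian(train_feats):
    """Estimate mean and inverse covariance of in-distribution features."""
    mu = train_feats.mean(axis=0)
    cov = np.cov(train_feats, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # small ridge so the covariance is invertible
    return mu, np.linalg.inv(cov)

def mahalanobis(z, mu, cov_inv):
    """Mahalanobis distance of one feature vector z to the training distribution."""
    d = z - mu
    return float(np.sqrt(d @ cov_inv @ d))

# Synthetic stand-in for encoder features (8-dimensional, made up for illustration)
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))   # in-distribution training features
mu, cov_inv = fit_gaussian(train)

z_id = rng.normal(0.0, 1.0, size=8)           # sample resembling the training data
z_ood = rng.normal(5.0, 1.0, size=8)          # shifted sample: out-of-distribution
assert mahalanobis(z_ood, mu, cov_inv) > mahalanobis(z_id, mu, cov_inv)
```

A sample whose distance exceeds a threshold calibrated on held-out in-distribution data would then be flagged as unsuitable for the pre-trained model, without any change to the model's architecture or training.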

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_29

SharedIt: https://rdcu.be/cyl8p

Link to the code repository

https://github.com/MECLabTUDA/Lifelong-nnUNet

Link to the dataset(s)

https://covid-segmentation.grand-challenge.org/

https://zenodo.org/record/3757476

https://mosmed.ai/datasets/covid19_1110/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a new approach for detection of neural network failures. Specifically, it focuses on COVID pneumonia segmentation, using the Mahalanobis distance between compressed representations of the data to detect OOD instances.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Deep neural network failure detection is very important for increased clinical translation and has received a lot of attention recently. Most work has focused on looking at the variance or entropy associated with ensembled predictions, but this paper takes a completely different approach and looks at downsampled latent space distances. The authors show that their approach does a great job of avoiding false negative predictions while also minimizing false positives. Importantly, there is a comparison with prior methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness is the figures. There should have been a figure showing the uncertainty image and how that is converted to an uncertainty score. Fig. 1 seems totally wrong. z1 is further from z2 in a Euclidean sense? Maybe that is true in the high-dimensional space, but it clearly isn’t true in the 2D space of the figure. What are the axes of that figure?

    I also would have liked to see a comparison with other distances such as Euclidean, which would have proven one of the hypotheses.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method seems simple enough, and they claim that they will release the nnU-Net code on acceptance. However, it is unclear if the sklearn code will also be released. Open-source datasets are used, but release of the annotations is not mentioned, so it is unclear whether those will be available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    -Figure 2 is not very compelling or easy to understand.

    It seems like this paper performs voxel-wise classification by using an encoder + linear layer in a sliding-window approach, which is not how a fully convolutional network such as a U-Net typically works. Why is it called nnU-Net if it isn’t fully convolutional?

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A new and simple method for failure detection will be a welcome addition to the current literature.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The authors present a method that detects out-of-distribution (OOD) data for a trained nnU-Net to avoid failing silently. The nnU-Net training does not have to be adapted, thus the approach can be integrated seamlessly. They evaluate the method on publicly available data and compare it to other state-of-the-art techniques.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • evaluation on publicly available data
    • comparison to state-of-the-art methods
    • the authors plan to make the code publicly available
    • highly relevant topic: trained models can fail on OOD data, which is not a big deal if the model also presents its uncertainty, but is dangerous in clinical practice if the method fails silently
    • very good structure of the paper
    • nnU-Net is a very good approach itself; extensions make the system even better
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I do not see a major weakness.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • nnU-Net code is open source and the authors plan to make the proposed method of the paper available after acceptance
    • they use mainly publicly available data in this paper
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors write “… we divide into 160 training cases to train the nnU-Net architecture, 4 validation cases and 35 cases for testing”. I wonder if the subsets were randomly selected. Since there are only 4 validation datasets out of a heterogeneous dataset, the selection of these 4 validation datasets might have a strong influence. It would be nice to learn something about the experience of the authors in this matter.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Failing silently is a major issue that has to be addressed. There are several approaches in the literature, but this one seems very promising. It is integrated into a state-of-the-art deep learning architecture. The paper has a great structure and is very well written.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    2

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper describes a method to detect out-of-distribution (OOD) data to understand and enhance the generalizability of segmentation methods when using data from different sites.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is very well written, the method is very relevant and general.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The generalizability could be better demonstrated by combining the proposed method with further methods than nnU-Net.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The model, datasets, and evaluation methods are clearly described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Table 1 summarizes results on the “ID validation data”, but according to Sec. 3, there are just 4 images in this set (“we divide into 160 training cases to train the nnU-Net architecture, 4 validation cases and 35 cases for testing”). Is it a typo? It would make more sense if the data for Table 1 contains images from the other three datasets (Mosmed, Radiopedia, and in-house dataset).

    • Fig. 1(b) demonstrates that using the Euclidean distance, D(z_1) > D(z_2), while using the Mahalanobis distance, Dm(z_1) < Dm(z_2). The Mahalanobis-distance case is only displayed inside the image; there is no description in either the figure caption or the text.
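
The ordering reversal the reviewer describes can be reproduced with a minimal numeric example (hypothetical 2D values chosen for illustration, not taken from the paper): when the in-distribution features have much larger variance along one axis, a point along that axis can be farther in the Euclidean sense yet closer in the Mahalanobis sense.

```python
import numpy as np

# Hypothetical 2D feature distribution: high variance along axis 0, low along axis 1.
mu = np.zeros(2)
cov = np.diag([4.0, 0.25])
cov_inv = np.linalg.inv(cov)

z1 = np.array([2.0, 0.0])  # lies along the high-variance direction
z2 = np.array([0.0, 1.0])  # lies along the low-variance direction

def euclid(z):
    return float(np.linalg.norm(z - mu))

def mahal(z):
    return float(np.sqrt((z - mu) @ cov_inv @ (z - mu)))

# Euclidean: D(z1) = 2 > D(z2) = 1, but Mahalanobis: Dm(z1) = 1 < Dm(z2) = 2.
assert euclid(z1) > euclid(z2)
assert mahal(z1) < mahal(z2)
```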

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Dealing with datasets from different sites is becoming more and more necessary, and it is essential to understand the relationships between the different data distributions. The proposed method is a very relevant contribution.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper covers a very important topic for the community. All reviewers highlight the clarity and solidity of the work. Comments made by reviewers regarding figures and explanation details should be taken into account.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

We thank the reviewers and area chair for their thoughtful comments and for appreciating how our flexible approach for out-of-distribution detection can increase the usability of segmentation models in multiple sites. As several reviewers have highlighted, this is a timely topic for our community. We hope that our contribution to the popular nnU-Net framework results in both researchers and clinicians using our method. We will incorporate your valuable feedback when preparing the camera-ready version of the manuscript by improving the figures and captions and clarifying ambiguous statements.


