Authors

Jialin Shi, Ji Wu

Abstract

Despite the success of deep learning methods in medical image segmentation tasks, the human-level performance relies on massive training data with high-quality annotations, which are expensive and time-consuming to collect. The fact is that there exist low-quality annotations with label noise, which leads to suboptimal performance of learned models. Two prominent directions for segmentation learning with noisy labels include pixel-wise noise robust training and image-level noise robust training. In this work, we propose a novel framework to address segmenting with noisy labels by distilling effective supervision information from both pixel and image levels. In particular, we explicitly estimate the uncertainty of every pixel as pixel-wise noise estimation, and propose pixel-wise robust learning by using both the original labels and pseudo labels. Furthermore, we present an image-level robust learning method to accommodate more information as the complements to pixel-level learning. We conduct extensive experiments on both simulated and real-world noisy datasets. The results demonstrate the advantageous performance of our method compared to state-of-the-art baselines for medical image segmentation with noisy labels.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_63

SharedIt: https://rdcu.be/cyhMG

Link to the code repository

https://github.com/shijial/PINT_code

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes a new method for robust medical image segmentation under label noise. The main contributions are: (1) A novel pixel-wise noise estimation method and robust learning strategy is developed. (2) A novel image-level noise tolerant learning strategy is developed. (3) Unlike most existing works, the proposed method explores both pixel-wise and image-level noise estimation, and combines these two-level information for more robust segmentation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Overall the proposed method is novel. This method consists of two phases. In the first phase, a novel pixel-wise noise estimation method is introduced and a robust learning strategy is employed. Given that only relying on this first phase training is not enough and tends to ignore the hard examples, a second phase training is introduced. This phase is similar to the first phase, but works on the image level. Through utilizing both phases, the proposed method achieves better performance than several state-of-the-art methods.
2. The experimental setups are reasonable, and the results well support the claims made in the paper. For example, (1) For each experiment, each model is repeatedly run multiple times and the mean and standard deviations are reported. (2) Both synthetic and real-world datasets are utilized. (3) Illustrative figures, especially the quantitative results, are helpful in understanding the properties of the proposed method.
3. This paper is well written, with clear motivation and organization.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Overall the proposed method is novel. But there are a few operations/ideas that are not very novel. For example, in the pixel-wise loss, both the original noisy labels and pseudo pixel-wise labels are utilized. This idea is also adopted in some previous works, e.g., Yi et al. (2019). Also, the idea of using mean-teacher strategy to deal with label noise was also adopted in existing works, such as Nguyen et al. (2020).
Reference: [1] Kun Yi and Jianxin Wu. Probabilistic End-to-end Noise Correction for Learning with Noisy Labels. CVPR, 2019. [2] Duc Tam Nguyen, et al. SELF: Learning to Filter Noisy Labels with Self-Ensembling. ICLR, 2020.
2. In this paper, it is claimed that “Image-level robust learning can be regarded as the complement to pixel-level robust learning”. One interesting question is: can the pixel-level robust learning be regarded as the complement to image-level learning? In other words, can we change the order of the two phases? That is, in the first phase, we use image-level robust learning; then in the second phase, we use pixel-level robust learning. It would be better to provide extra ablation study on the order of these two phases.
3. In the experiment, the proposed method is compared to several baselines. However, all these methods are from/before year 2019. It is suggested to include some other methods which were published in 2020 or later.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

This paper provides details regarding the experimental setup, involving datasets, training strategy, hardware and software. The synthetic dataset is publicly available, but the real-world dataset is not. No code is found in the paper submission. Also, no validation set is used (only training and testing sets are used). In the experiment, the model is run multiple times to produce the mean and standard deviation values. No statistical significance analysis is provided.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. It is suggested to provide extra experiments on the order of the two phases.
2. For the baseline methods, it is suggested to include a few more recent methods which were published in 2020 or later.
3. Some typos need to be fixed. For example, “is equals to” -> “is equal to”.
4. Some descriptions are unclear and need to be clarified. For example, (1) On page 5, “We randomly crop 112×112×80 sub-volumes as the inputs.” Does this mean we also randomly crop the testing data? (2) The high learning rate and early-stopping strategies are utilized for the proposed method. Are these strategies also applied to the baseline methods?
5. It is mentioned that “all hyper-parameters are empirically determined based on the validation performance of LA dataset.” I think this means the hyperparameters are tuned on the testing data since no validation data are offered in the experiments. This is not a standard operation in machine learning because we cannot tune the parameters on testing data, but instead should choose the parameters on validation data.
6. In table 1, the rightmost vertical line does not reach the first row. It would be better to make it also split the first row, between 50% and 75%.
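For readers unfamiliar with the cropping step quoted in comment 4, the following is a minimal NumPy sketch of random sub-volume cropping. The function name `random_crop_3d`, the `rng` argument, and the example volume size are illustrative assumptions and are not taken from the paper or its code; whether cropping is also applied at test time is exactly what the comment asks the authors to clarify.

```python
import numpy as np

def random_crop_3d(volume, crop_shape=(112, 112, 80), rng=None):
    """Return a randomly positioned sub-volume of size crop_shape."""
    rng = np.random.default_rng() if rng is None else rng
    if any(v < c for v, c in zip(volume.shape, crop_shape)):
        raise ValueError("volume is smaller than the requested crop")
    # Pick a random start index along each axis so the crop fits inside the volume.
    starts = [int(rng.integers(0, v - c + 1)) for v, c in zip(volume.shape, crop_shape)]
    slices = tuple(slice(s, s + c) for s, c in zip(starts, crop_shape))
    return volume[slices]

# Hypothetical volume size, chosen only so that the crop fits.
volume = np.zeros((128, 128, 96), dtype=np.float32)
patch = random_crop_3d(volume, rng=np.random.default_rng(0))
```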
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall the proposed method in this paper is novel (although some of the ideas have been employed in previous works). In addition, the experimental settings are technically sound, and the results are convincing. Finally, this paper is well written with clear motivation and organization.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

6
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

-Integrates the proposed method into a generic framework and validates it on a public dataset.
-Proposes a method to improve segmentation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

-The paper tries to address an issue that is crucial in biomedical segmentation.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

-Soundness of methodology. The authors use an uncertainty map to guide the robust learning, with a rule that changes each weight according to its uncertainty.
i) The proposed method suffers from a problem akin to “detecting the noise of the signal (the label, in this problem formulation) without seeing the signal (label)”. The proposed robust learning method is guided simply by the uncertainty map. However, as shown in Fig. 1, the generation of the uncertainty map is independent of the observed noisy label. Although the method aims to distill information from noisy labels, the guidance from the uncertainty map is not related to the real noisy label. Notice that the distribution of label noise is undetermined (e.g. an unknown width of dilation, or a completely random pattern from a non-expert annotator). Simply predicting the usefulness of a pixel (and its label) from the original image without seeing the actual noisy label is not theoretically plausible. The third row in the right column of Fig. 2 also indicates that the noise variance can be completely uncorrelated with the uncertainty map.
ii) It is not clear how image-level robust learning could help to highlight “clean” pixels with high uncertainty. The weight of an image is based on the average of all pixel-wise uncertainties within that image. Without seeing the image-level label quality, it is still possible that an accurate boundary label is not counted in the image-level robust learning module. For example, in the right column of Fig. 2, on image-level, row 3 has a smaller value of averaged uncertainty than row 1. However, the overall labeling quality (regarding noise variance) of row 3 is better than row 1.

-Lack of novelty: the design of feeding images with Gaussian noise to evaluate uncertainty shares a similar idea with Laine et al., ICLR 2017.

-Convergence issue. With different loss functions in two serial phases, this framework could be hard to converge.

-Incomplete comparison. The results in Table 2 only compare the proposed method with image-level robust learning. It is also important to compare with other pixel-level robust learning methods.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

-The experimental setup is based on a publicly available dataset.
-The code is not yet open to the public.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

-Lack of clarity:
i) Confusing use of terms. The technical part is hard to follow because the authors use many terms, e.g. hard label, soft label, clean pixel, clean label, noisy label, noisy pixel(s), and these related terms are not clearly defined.
ii) Noise rate. It is important to define whether the noise rate in the experimental setup is at the image level or the pixel level.
iii) Synthetic input. What are the mean and std of the Gaussian noise in the synthetic input?
-Algorithm framework. The reviewer suggests that the authors provide an overall algorithm framework to enhance the readability of this paper.
-Parameter tuning. The authors should provide more details on how the parameters are tuned.
-3D visualization. As the framework is implemented on a 3D dataset for 3D segmentation, it would be great if the authors could provide a 3D visualization of the improvement in segmentation performance.
Please state your overall opinion of the paper

reject (3)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The major factor that led to the overall score of this paper is the soundness of the methodology. It is not convincing to propose a robust learning scheme based solely on the evaluation of input pixels without checking the actual noisy labels and their characteristics.
What is the ranking of this paper in your review stack?

8
Number of papers in your stack

8
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

In this paper, the authors proposed a novel deep learning based segmentation method. The proposed method integrated both pixel-level and image-level information to solve the noisy label problem. The proposed method has been evaluated on a publicly available dataset and outperformed several state-of-the-art segmentation methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The strengths of the paper are as follows.
1. The paper is well-written. The presentation is clear.
2. The authors selected a classic medical image analysis problem.
3. The techniques used in this paper are reasonable.
4. The performance of the proposed method is promising.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The main weaknesses of the paper are as follows.
1. The segmentation task is a classic one and is not very attractive to the reviewer.
2. The methods used for comparison are limited.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The dataset used in this study is publicly available. The detail of the implementation is presented. The authors promised to make their code publicly available. So, the reproducibility of the paper is high.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
The comments/suggestions from the reviewer are as follows.
1. The authors should explain the exponential moving average (EMA) in detail or provide a citation.
2. On page 7, “Compared to the baselines” should be “Compared with the baselines”.
3. To show the performance of the proposed framework, the authors are encouraged to compare against more state-of-the-art methods, since only two methods are used for comparison. In particular, V-Net was published five years ago.
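For reference on comment 1, the EMA update used in mean-teacher training is commonly written as teacher = alpha * teacher + (1 - alpha) * student, applied to every weight after each student step. The sketch below illustrates that general rule only; the dictionary-of-arrays parameter layout and the value of `alpha` are illustrative assumptions, not the settings used in the paper.

```python
import numpy as np

def ema_update(teacher_params, student_params, alpha=0.99):
    """Return the teacher weights after one EMA step.

    Each teacher weight moves a fraction (1 - alpha) toward the student weight.
    """
    return {
        name: alpha * teacher_params[name] + (1.0 - alpha) * student_params[name]
        for name in teacher_params
    }

# Toy parameters (hypothetical values) to show one update step.
teacher = {"w": np.array([1.0, 1.0])}
student = {"w": np.array([0.0, 2.0])}
teacher = ema_update(teacher, student, alpha=0.9)
```

A value of `alpha` close to 1 makes the teacher change slowly, so it behaves like a temporal average of recent student weights.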
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This submission could be a high-quality paper if the authors further refine the manuscript. The reviewer’s only concern is the clinical application: segmentation is a classic problem and there are many segmentation papers, so the reviewer is a little picky.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes a method to address pixel-level label noise in image segmentation. The proposed method is a two-stage method that uses both pixel-level and image-level information, leading to strong performance.

This paper received diverging scores. On the positive side, R1 and R3 think the paper is well-written. The novelty of the method is on how label noise is addressed in pixel level, as well as how image level information is used to further re-calibrate pixel-level prediction. Experimental results are strong. But newer baseline methods after 2019 are recommended.

On the negative side, R2’s major concern is the validity of the approach: without knowing which pixels are noisy, and also considering the potential heterogeneity of the pixel label noise, the rationale of the method is not quite convincing. The relationship between noise-labeled pixels and uncertainty map needs to be elaborated. Comparing with other pixel-level robust learning is also requested (but no actual citation is provided).

The authors are invited to provide a rebuttal and address the reviewers’ concerns carefully.

Note the authors may not submit additional experimental results in the rebuttal. The rebuttal is only meant for clarification purpose.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Author Feedback

1 Newer baseline [R1 and R3] When choosing the baselines, we gave priority to more typical works including the ‘Reweighting’ and ‘Pick-and-learn’. They happen to be articles of 2019. We agree that newer baselines in 2020 are better. We will adjust our manuscript to include the newer baseline.

2 The validity of method [R2] 2.1 Relationship of noise-labeled pixels and uncertainty maps. The previous studies [Liu et al. NIPS2020, Steven et al. ICLR2020] have verified that deep networks tend to learn clean samples faster than noisy samples when trained with noisy labels. Our pixel-wise noise estimation is based on the agreement maximization principle. The motivation is that predictions under different perturbations of the same input will agree on relatively clean pixel-wise labels, and it is unlikely for these predictions to agree on incorrect pixel-wise labels. Concretely, if a pixel-wise label tends to be clean, the pixel is easier to predict and we usually obtain the same predictions. The mean prediction probability distribution is then likely to be peaky, which means a small entropy and a small uncertainty. In contrast, if a pixel-wise label tends to be noisy, we usually obtain different predictions. The mean prediction probability distribution is then likely to be flat, leading to a large entropy and a high uncertainty. Therefore, the generation of the uncertainty map is guided by prediction agreement on clean pixels and prediction disagreement on noisy pixels. Moreover, the multiple predictions come from the auxiliary networks. Based on the EMA of mean-teacher [14], the auxiliary networks are also updated with the supervision of the original noisy labels and pseudo labels. Thus, our pixel-wise robust learning guided by uncertainty maps is theoretically feasible. We can adjust our manuscript to clarify this. The relationship between pixel-wise label noise and uncertainty is experimentally verified in section 3.2. We observe that the noise usually exists in the areas with high uncertainty. The third row of Fig. 2 is an example to demonstrate ‘there are also some clean pixels showing high uncertainty when they lie in the boundaries.’ Thus, it is necessary to propose image-level robust learning as a complement.
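The entropy argument above can be illustrated with a minimal NumPy sketch: average the class probabilities over several perturbed forward passes and take the entropy of that mean at each pixel. The function name `pixelwise_uncertainty`, the array layout (passes, classes, pixels), and the toy probabilities are assumptions made for illustration; this is not the authors’ implementation.

```python
import numpy as np

def pixelwise_uncertainty(prob_stack, eps=1e-8):
    """Entropy of the mean prediction over K perturbed passes.

    prob_stack has shape (K, C, ...): K passes, C classes, then spatial axes.
    Returns an array with the spatial shape only.
    """
    mean_prob = prob_stack.mean(axis=0)  # (C, ...)
    return -np.sum(mean_prob * np.log(mean_prob + eps), axis=0)

# Two passes, two classes, two pixels (toy values).
# Pixel 0: both passes agree on class 0. Pixel 1: the passes disagree.
probs = np.array([
    [[0.95, 0.90], [0.05, 0.10]],  # pass 1: class-0 row, class-1 row
    [[0.97, 0.10], [0.03, 0.90]],  # pass 2
])
uncertainty = pixelwise_uncertainty(probs)
```

In this toy case the mean prediction at pixel 1 is flat (0.5, 0.5), so its entropy is close to ln 2, while the peaky mean at pixel 0 gives a much smaller value.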

2.2 The validity of image-level robust learning. The image-level robust learning is not used to estimate absolutely accurate boundary labels or to highlight all clean pixels with high uncertainty. Instead, it aims to further distill effective supervision information from image-level data even if some of the pixels involved have noisy labels. The third row of Fig. 2 is a randomly cropped example that demonstrates the necessity of adding image-level learning. Because of shape variances among slices in 3D segmentation, we do not adopt the strategy of adjusting weights among slice-level data based on absolute uncertainty values. Thus, there is no need to compare the uncertainty values between the first row and third row of Fig. 2. Instead, we adjust the contributions of the pseudo label and the original noisy label, guided by the image-level uncertainty of each image. The image-level loss is constructed as Eq. 5. If the image-level uncertainty is large, the original noisy label contributes less. Conversely, if it is small, the original noisy label receives a large weight and can provide more meaningful information even if some pixels are noisy. With this strategy, we distill more information to improve the performance. The experimental results of PINT (pixel-wise and image-level learning) and PNT (pixel-wise learning only) have verified the effectiveness of adding image-level robust learning.
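The weighting idea described above can be sketched as follows. Eq. 5 itself is not reproduced on this page, so the mapping from image-level uncertainty to a weight (`exp(-u)`), the binary cross-entropy loss, and all names below are hypothetical choices that only mimic the stated behaviour: a larger image-level uncertainty gives the original noisy label a smaller weight.

```python
import numpy as np

def binary_cross_entropy(pred, target, eps=1e-8):
    """Mean binary cross-entropy between foreground probabilities and a 0/1 target."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(np.mean(-(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))))

def image_level_loss(pred, noisy_label, pseudo_label, pixel_uncertainty):
    """Blend the losses against the noisy and pseudo labels for one image.

    The weight on the noisy label is exp(-mean uncertainty); this mapping is
    an assumption for illustration, not the paper's Eq. 5.
    """
    image_uncertainty = float(np.mean(pixel_uncertainty))
    w_noisy = float(np.exp(-image_uncertainty))  # in (0, 1] for non-negative uncertainty
    loss = (w_noisy * binary_cross_entropy(pred, noisy_label)
            + (1.0 - w_noisy) * binary_cross_entropy(pred, pseudo_label))
    return loss, w_noisy

# Toy image with four pixels (hypothetical values).
pred = np.array([0.9, 0.2, 0.8, 0.1])
noisy = np.array([1.0, 0.0, 0.0, 0.0])
pseudo = np.array([1.0, 0.0, 1.0, 0.0])
loss_confident, w_confident = image_level_loss(pred, noisy, pseudo, np.zeros(4))
loss_uncertain, w_uncertain = image_level_loss(pred, noisy, pseudo, np.full(4, 0.69))
```

Because the weight is computed once per image, every pixel of that image shares it, which matches the rebuttal’s point that weights are not compared across images by absolute pixel values.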

2.3 Incomplete comparison. We adopt ‘Reweighting’ [11] as the method for pixel-level robust learning. It achieves an average Dice of 69.31% and an average ASD of 2.11 voxels on the real-world CTV dataset. Due to limited space, we did not present this result in Table 2. We will update Table 2 to include it in the final version. Our code will be made publicly available upon acceptance, where more details can be found.
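For readers less familiar with the Dice metric quoted above, it is twice the overlap between the predicted and reference masks divided by the sum of their sizes. The sketch below uses the standard definition on binary masks; the function name, the `eps` smoothing term, and the toy masks are illustrative assumptions and say nothing about how the reported numbers were computed. ASD (average surface distance) is not covered here.

```python
import numpy as np

def dice_score(pred_mask, true_mask, eps=1e-8):
    """Dice coefficient between two binary masks of the same shape."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    intersection = np.logical_and(pred, true).sum()
    # eps keeps the ratio defined (and equal to 1) when both masks are empty.
    return float((2.0 * intersection + eps) / (pred.sum() + true.sum() + eps))

# Toy masks: one overlapping voxel, three foreground voxels in total.
pred = np.array([1, 1, 0, 0])
true = np.array([1, 0, 0, 0])
score = dice_score(pred, true)  # 2 * 1 / (2 + 1)
```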

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Overall I think the paper has a novel and convincing approach that goes beyond the existing machine learning literature on label noise. The concerns regarding validity (raised by R2) were carefully addressed in the rebuttal. I agree that image-level labels can be very good additional information to re-calibrate pixel-level label noise. The paper should be published in MICCAI.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper tackles an important problem in medical image segmentation: the provided pixel-level and image-level labels contain noise. The proposed approach is based on robust pixel-level and image-level losses driven by uncertainty, which is estimated by the entropy of predictions. The approach is evaluated on synthetic and real datasets and shows improvements over baseline methods. The reviewers raised concerns about comparisons with newer methods, the soundness of the methodology, etc. The responses clarified the validity of the uncertainty and the robust training loss. Overall, the paper proposes a novel method for segmentation that is robust to image-level and pixel-level noisy labels, which is important in real applications. The authors are encouraged to revise the paper in light of the reviewers’ comments.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Image-level and pixel-level uncertainty are employed to deal with noisy annotations in image segmentation; the idea is new for medical image computing. The rebuttal clarified the rationale of the proposed method. For comparison with more recent works, the authors have promised to include this in the final version. Overall this paper is of good quality, but more comparisons would make it more convincing.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

Distilling effective supervision for robust medical image segmentation with noisy labels