
Authors

Shuailin Li, Zhitong Gao, Xuming He

Abstract

Learning segmentation from noisy labels is an important task for medical image analysis due to the difficulty in acquiring high-quality annotations. Most existing methods neglect the pixel correlation and structural prior in segmentation, often producing noisy predictions around object boundaries. To address this, we adopt a superpixel representation and develop a robust iterative learning strategy that combines noise-aware training of segmentation network and noisy label refinement, both guided by the superpixels. This design enables us to exploit the structural constraints in segmentation labels and effectively mitigate the impact of label noise in learning. Experiments on two benchmarks show that our method outperforms recent state-of-the-art approaches, and achieves superior robustness in a wide range of label noises. Code is available at https://github.com/gaozhitong/SP_guided_Noisy_Label_Seg.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_50

SharedIt: https://rdcu.be/cyhMt

Link to the code repository

https://github.com/gaozhitong/SP_guided_Noisy_Label_Seg

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors proposed superpixel-guided iterative learning from noisy labels for medical image segmentation, which is composed of two stages: a network-update stage that performs noise-aware training of the segmentation network, and a label-refinement stage that refines the noisy labels. The proposed method was validated on two public datasets: ISIC and JSRT.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is well written and well designed for experiments.
    • Using iterative learning, two segmentation networks (same architecture, different initializations) were trained, and noisy labels were selected and refined based on the two networks’ segmentation results. The authors proposed the novel idea of incorporating a superpixel representation into both the segmentation networks and the label refinement, based on the assumption that “the pixels share similar ground truth labels in each superpixel, which enable us to enforce the structural constraints on the label masks and better preserve object boundaries”.
    • The authors evaluated their method on two public datasets, ISIC and JSRT. The results showed Dice score improvements compared with the Co-teaching, Tri-network, and JoCoR methods.
    • The authors also showed, via an ablation study, how the superpixels, the selection of unreliable pixels, and the label refinement each contributed to the improvement in segmentation from noisy labels.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    None.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    None.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • There was no description of the pre-trained U-Net, which is important for superpixel generation.
    • If the pre-trained U-Net is trained on a specific dataset (e.g., chest X-ray), the superpixels may not generalize to another domain (e.g., bone X-ray).
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Accept. The authors proposed a novel method for segmentation from noisy labels using superpixels. Their results showed improvements compared with other methods, and in the ablation study the authors confirmed the usefulness of the superpixel idea.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    To train a segmentation network on noisy label data, the authors propose superpixel-guided iterative learning. With the proposed method, segmentation performance is maintained even when the network is trained on noisy data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Strong evaluation: The experimental results show that the proposed method performs well with noisy data. In particular, it shows superior results compared to recently proposed methods. The paper also conducts experiments on two medical segmentation benchmark datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The technical novelty of the paper is somewhat limited.
    2. The network design of the proposed method is less persuasive. Please refer to Section 7 for details.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors did not provide the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. This paper proposes a training strategy that adopts a superpixel representation and an iterative learning scheme. However, using a superpixel representation for image segmentation has already been explored in R1, where the superpixel representation is used for the same purpose (training with noisy label data). It is recommended to clarify the novelty of the paper and its difference from the prior work R1.

    2. This paper adopts a superpixel representation to handle noisy labels. However, the proposed method could be strongly affected by the performance of the superpixel algorithm, and superpixel extraction can be vulnerable to images with uncertain boundaries. It is therefore unclear whether using superpixels is appropriate in the medical image domain.

    3. How to get the final predictions at inference time? Is network 1 or network 2 or both networks used to get the final segmentation result?

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think there are some unclear parts concerning the technical novelty of the paper. This is why my preliminary rating is “borderline reject”.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper proposes a segmentation training framework that produces reasonable segmentations even when there is noise in the annotations, while iteratively correcting the noisy annotations along the training process. Both the noise-aware segmentation and the noisy-label correction are boosted by a superpixel representation that incorporates structural prior information from the image.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • novel idea to present the co-training framework that enables noise-aware segmentation and noisy label correction, also novel idea to incorporate superpixel representation to boost the training efficacy
    • good design of the stopping criteria to make the whole framework self-contained
    • good performances are achieved on comprehensive experiments including comparison with state-of-the-art approaches, and ablation studies
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • lack of experiments on real datasets: all experiments are based on simulated noise, which may not be representative of labeling errors in reality
    • lack of visual results on the JSRT dataset: X-ray is a more representative medical imaging modality; unlike the ISIC dataset, there is no clear distinction between foreground and background, and the X-ray image itself may contain intensity noise, which could cause issues for the SLIC superpixel parcellation
    • lack of clarity on whether the superpixel representation is static or dynamic along the training process; it is confusing when the authors mention a “superpixel pooling layer”
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method description is clear; it should be possible to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • It would be nice to have 1–2 more sentences in the manuscript explaining the terms “co-teaching”, “small loss”, and “multi-view learning”, even though they can be learned from the references. In particular, without an explanation of the co-teaching concept, the two-parallel-network setting is a bit confusing.
    • The authors should state the limitations of their method. Although the method looks promising in the simulated experiments of this study, it is unknown whether it remains effective on realistic medical image segmentation tasks. Many factors should be considered: (1) whether SLIC still works given image noise and poor image quality (quite common in CT, MR, X-ray, and US), and (2) whether the simulated noise patterns are representative of realistic annotation errors.
    • The authors should also demonstrate the efficacy of the label refinement by comparing the final corrected labels with the original labels before noise was added.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a well-designed training framework for a very common but important issue concerning annotation quality. The results are convincing, with comprehensive experiments. The major open question is whether it can be adapted to more general medical imaging modalities and more common use cases.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Reviewers acknowledge the importance and the overall quality of the work. Although the overall feedback is quite positive, reviewer #3 did point out a novelty issue with respect to a previous work. Please address this issue in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We thank the reviewers for their overall positive comments. Below we first address their shared concerns, followed by the individual ones. We will release our code, add the suggested references, and discuss limitations in the revised version.

  • R3, Meta: Our novelty vs. the prior work

We thank the reviewer for pointing out the prior work. The major difference between our method and Li et al. lies in the network training strategy. Li et al. adopt a standard training scheme and use the superpixel representation only to correct noisy labels. In contrast, we employ the superpixel representation in both the noise-aware training and label refinement stages; the former allows us to mitigate label noise more effectively. Our ablation study shows that this training design improves performance on ISIC by a large margin (3.8% in Dice) in the high-noise setting (see Table 2, row #3).

  • R3, R4: The impact of superpixel quality

A superpixel representation has been shown to be effective for different medical imaging modalities in the literature, e.g., [1] for CT, [2] for MR, and [3] for US images. To tackle X-ray image segmentation, we also incorporate learned deep features of X-ray images to improve superpixel quality. Empirically, our superpixels achieve undersegmentation errors below 0.32 at 800 superpixels per image, which is comparable to the natural-image setting (see [1] in the main paper). Moreover, our method performs superpixel selection during noise-aware learning and can potentially discard inaccurate superpixels to achieve better robustness. In particular, we use 100 superpixels per image in the skin lesion task, which yields a higher undersegmentation error (1.0), and our method remains effective.

[1] Qin, et al. “Superpixel-based and boundary-sensitive convolutional neural network for automated liver segmentation.” Physics in Medicine & Biology 63.9 (2018)
[2] Tian, et al. “Superpixel-based segmentation for 3D prostate MR images.” IEEE Transactions on Medical Imaging 35.3 (2015)
[3] Daoud, et al. “Automatic superpixel-based segmentation method for breast ultrasound images.” Expert Systems with Applications 121 (2019)
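For reference, the undersegmentation error quoted in the rebuttal can be computed as in the sketch below. This follows the common Neubert–Protzel formulation of the metric; it is an illustration, not the authors' exact implementation, and the function name and array conventions are assumptions.

```python
import numpy as np

def undersegmentation_error(sp_labels, gt_labels):
    """Neubert-Protzel undersegmentation error.

    For every ground-truth segment S and every superpixel P that
    overlaps S, we charge the cheaper of the overlapping part
    (P inside S) and the leaking part (P outside S), normalized
    by the total pixel count. A value of 0 means no superpixel
    straddles a ground-truth boundary.
    """
    sp = np.asarray(sp_labels).ravel()
    gt = np.asarray(gt_labels).ravel()
    error = 0
    for s in np.unique(gt):
        in_s = gt == s                      # pixels of this GT segment
        for p in np.unique(sp[in_s]):       # superpixels overlapping it
            inside = np.sum(in_s & (sp == p))
            outside = np.sum(~in_s & (sp == p))
            error += min(inside, outside)
    return error / sp.size
```

Superpixels that align perfectly with the ground-truth segments give an error of 0; a superpixel straddling a boundary contributes its smaller side, so coarser parcellations (e.g., 100 instead of 800 superpixels) tend to raise the score.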

  • R2: Pre-trained U-Net in superpixelization

We pre-train a separate U-Net for each segmentation task and noise setting, and adopt the JoCoR strategy ([18] in the main paper) in the pre-training.

  • R3: Predictions at inference time

The performance of the two networks is very similar, with a mean difference of 0.19% in Dice; hence we report the performance of network 1 as the final result.

  • R4: superpixel is static?

Our superpixel representation is static. We will rename “superpixel pooling layer” to avoid confusion.
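For context, a superpixel pooling layer is typically understood as averaging dense features within each (here, static) superpixel and broadcasting the mean back to every member pixel. The numpy sketch below illustrates that common reading under the assumption of contiguous integer superpixel ids; the function name and shapes are illustrative, not the authors' exact layer.

```python
import numpy as np

def superpixel_pool(features, sp_labels):
    """Average features within each superpixel and scatter the
    mean back, so all pixels of a superpixel share one feature.

    features:  (H, W, C) dense feature map
    sp_labels: (H, W) integer superpixel ids in [0, K), contiguous
    returns:   (H, W, C) pooled feature map
    """
    h, w, c = features.shape
    ids = sp_labels.ravel()
    flat = features.reshape(-1, c)
    k = int(ids.max()) + 1
    counts = np.bincount(ids, minlength=k).astype(float)
    pooled = np.zeros((k, c))
    for ch in range(c):
        # per-superpixel sum via bincount, then divide by pixel count
        pooled[:, ch] = np.bincount(ids, weights=flat[:, ch],
                                    minlength=k) / counts
    return pooled[ids].reshape(h, w, c)
```

Because the ids never change during training when the superpixels are static, the pooling is a fixed scatter-mean over precomputed index sets.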

  • R4: Simulated label noise

Simulated label noise has been used as an effective surrogate in the literature, as no public dataset with real annotation noise exists. We follow this convention to overcome the challenge of collecting noisy medical annotations. To achieve a better simulation, we also significantly enlarge the noise patterns by adding affine transformations of the GT masks. We agree that exploring realistic label errors is an important topic, but it requires non-trivial effort to build such a dataset, and we leave it to future work.
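As an illustration of the kind of boundary-level corruption such simulations typically apply to GT masks, a minimal pure-numpy sketch of dilation-based noise is given below. The 4-neighborhood dilation, the `simulate_noisy_mask` name, and the `n_steps` parameter are assumptions for illustration, not the paper's exact noise protocol.

```python
import numpy as np

def dilate_once(mask):
    """One step of 4-neighborhood binary dilation (zero-padded)."""
    m = mask.astype(bool)
    up = np.pad(m, ((0, 1), (0, 0)))[1:, :]
    down = np.pad(m, ((1, 0), (0, 0)))[:-1, :]
    left = np.pad(m, ((0, 0), (0, 1)))[:, 1:]
    right = np.pad(m, ((0, 0), (1, 0)))[:, :-1]
    return m | up | down | left | right

def simulate_noisy_mask(gt_mask, n_steps=2):
    """Corrupt a clean binary mask by repeated dilation,
    mimicking over-segmentation-style annotation noise."""
    noisy = gt_mask.astype(bool)
    for _ in range(n_steps):
        noisy = dilate_once(noisy)
    return noisy.astype(gt_mask.dtype)
```

Erosion and the affine transformations mentioned above would be applied analogously; the point is only that the corruption is structured around object boundaries rather than being i.i.d. per pixel.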

  • R4: The efficacy of label refinement

We report the Dice (%) of the final corrected labels on the training set below. Each cell displays the Dice of the corrected labels, with the Dice of the original noisy labels in parentheses, which shows the improvement achieved by our label refinement.

Noise settings   | ISIC-lesion   | JSRT-lung     | JSRT-heart    | JSRT-clavicle
α=0.3, β=0.5     | 92.99 (92.10) | 96.10 (91.73) | 94.21 (92.15) | 92.30 (92.02)
α=0.5, β=0.5     | 90.25 (86.87) | 95.10 (86.23) | 93.38 (86.83) | 89.66 (86.58)
α=0.7, β=0.7     | 84.07 (73.17) | 92.62 (73.05) | 89.30 (73.20) | 82.21 (73.29)
  • R4: Other suggestions

We will add visual results on the JSRT dataset in the supplementary material, and explain the suggested terms (e.g., “co-teaching”) in the revised version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal gave satisfactory explanation on the relation to a related work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper uses a superpixel representation to combat label noise in the image segmentation task. The superpixel representation effectively leverages spatial priors in the learning framework. The novelty issue raised by R3 was well addressed in the rebuttal. Overall, the idea is novel and the paper is well written.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Basically, the reviews are positive and consistent. As summarized by the primary AC, the importance and the overall quality of the work are recognized. The authors’ response clarifies the novelty issue and convinces me. The extra experimental results on the efficacy of label refinement are not considered, since this is NOT allowed under the MICCAI rebuttal rules. In summary, I agree to accept this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4


