
Authors

Xiangde Luo, Wenjun Liao, Jieneng Chen, Tao Song, Yinan Chen, Shichuan Zhang, Nianyong Chen, Guotai Wang, Shaoting Zhang

Abstract

Gross Target Volume (GTV) segmentation plays an irreplaceable role in radiotherapy planning for Nasopharyngeal Carcinoma (NPC). Although Convolutional Neural Networks (CNN) have achieved good performance for this task, they rely on a large set of labeled images for training, which is expensive and time-consuming to acquire. In this paper, we propose a novel framework with Uncertainty Rectified Pyramid Consistency (URPC) regularization for semi-supervised NPC GTV segmentation. Concretely, we extend a backbone segmentation network to produce pyramid predictions at different scales. The pyramid prediction network (PPNet) is supervised by the ground truth of labeled images and a multi-scale consistency loss for unlabeled images, motivated by the fact that predictions at different scales for the same input should be similar and consistent. However, due to the different resolutions of these predictions, directly encouraging them to be consistent at each pixel has low robustness and may lose some fine details. To address this problem, we further design a novel uncertainty rectifying module that enables the framework to gradually learn from meaningful and reliable consensual regions at different scales. Experimental results on a dataset with 258 NPC MR images showed that with only 10% or 20% of images labeled, our method largely improved the segmentation performance by leveraging the unlabeled images, and it also outperformed five state-of-the-art semi-supervised segmentation methods. Moreover, when only 50% of images were labeled, URPC achieved an average Dice score of 82.74%, close to that of fully supervised learning. Code is available at: https://github.com/HiLab-git/SSL4MIS
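The abstract's uncertainty-rectified consistency can be sketched in a few lines. This is a minimal, illustrative NumPy version, not the authors' released implementation: it assumes pyramid predictions have already been upsampled to a common resolution and passed through a softmax, estimates per-voxel uncertainty as the KL divergence of each scale's prediction from the mean prediction, and down-weights the consistency term at uncertain voxels. The function name `urpc_loss` is hypothetical.

```python
import numpy as np

def urpc_loss(pyramid_preds, eps=1e-8):
    """Uncertainty-rectified pyramid consistency (illustrative sketch).

    pyramid_preds: list of (C, H, W) softmax probability maps, one per
    scale, already upsampled to the same resolution.
    """
    p_c = np.mean(pyramid_preds, axis=0)          # mean prediction across scales
    total, n = 0.0, len(pyramid_preds)
    for p_s in pyramid_preds:
        # per-voxel discrepancy: KL(p_s || p_c), summed over the class axis
        kl = np.sum(p_s * np.log((p_s + eps) / (p_c + eps)), axis=0)
        w = np.exp(-kl)                           # rectification: trust certain voxels
        mse = np.sum((p_s - p_c) ** 2, axis=0)    # pixel-wise consistency term
        # rectified consistency + an uncertainty-minimization term
        total += np.mean(w * mse) + np.mean(kl)
    return total / n
```

When all scales agree, both the KL term and the weighted MSE vanish, so the loss is zero; any disagreement between scales increases it, with uncertain (high-KL) voxels contributing less to the consistency part.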

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_30

SharedIt: https://rdcu.be/cyl2A

Link to the code repository

https://github.com/HiLab-git/SSL4MIS

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

This paper proposes a new unlabeled-data regularization technique named Uncertainty Rectified Pyramid Consistency (URPC) for the image segmentation problem. The method is evaluated on nasopharyngeal carcinoma segmentation. The results show performance competitive with other semi-supervised learning methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The pyramid consistency idea is new, and seems to provide efficient regularization effect from unlabeled data compared to the baseline network.

    • The paper is well written. The details of the method and results are clearly presented.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Although the pyramid consistency idea is new, I do not find the motivation convincing. Basically, the consistency is a bias introduced by the authors, which forces multi-scale predictions to be consistent. This bias may provide a regularization effect, but it is arbitrary. For example, can I force the multi-scale predictions to be inconsistent rather than consistent? This also introduces a bias and may provide similar regularization. We do not know which bias is better until we test it, since this bias is not based on causal observation or deduction. For example, transformation consistency (as in [2,11]) is based on the observation that images under different transformations should preserve similar anatomical information, whereas I do not see a similar observation in this work (perhaps it is similar to the motivation of deep supervision). In my opinion, the arbitrary motivation of the pyramid consistency limits its novelty.

    • In the ablation study, the hyperparameter tuning also looks quite arbitrary. I do not think S=4 works for other cases (such as larger or smaller networks, or different applications). There appears to be no guide for tuning other than trial and error. Since this paper focuses on a methodological contribution, the vague tuning strategy hinders the generalization ability of the proposed method.

    • The motivations of uncertainty rectification and uncertainty minimization are vague. I do not see the difference between them, since both encourage the multi-scale predictions p_s to be closer to the average prediction p_c, albeit via different loss functions. In the results (Table 1), uncertainty minimization does not seem to provide additional value over uncertainty rectification.

    • Compared to the previous method (DAN), the improvements are very limited (Table 2). This also raises the question of whether the introduced bias makes sense for this network and application.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The details about the algorithm and dataset are well presented in the paper. The authors also promise to provide source codes, which ensures good reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This paper proposed a new idea in semi-supervised learning. However, in my opinion, it lacks insightful discussion of the method, e.g., what observation led the authors to choose this bias over others? Do similar biases (consistency or inconsistency) provide similar regularization effects? How would the proposed method and tuning strategy generalize to other networks/datasets? As a reader, I would be eager to see such discussions.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed semi-supervised method is new but is quite arbitrary in its motivation and tuning strategy. The paper needs to elaborate on these parts. In my view, there is still much room for improvement, and this could become a nice piece of work. Therefore I give a borderline reject decision.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper proposes a multi-scale consistency loss for semi-supervised learning. By estimating uncertainty, the consistency loss is further re-calibrated to reduce noise; i.e., voxels on the edge are more likely to produce noise and should therefore have less influence on the semi-supervised learning process.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Despite the straightforward idea of the consistency term, this paper proposes reasonable follow-ups (UR, UM) to address its potential flaws.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The follow-ups (UR, UM) had inconsistent impact on the DSC/ASD scores, so it is not clear whether these terms are important. For example, adding UR to GTVnx gave the best results, while adding UR+UM to GTVnd gave much better results than UR only. Therefore, the claim “From the last section of Tab. 1, we can see that both uncertainty rectifying (UR) term and uncertainty minimization (UM) term boost the model performance” is not always valid, and there should be some discussion here.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provided code as well as a document to help split the data into train, val, and test sets. However, I did not find the download link for the dataset claimed to be available, and it is not made clear whether the dataset is to be released upon acceptance. The readme.md does not explain how to download the dataset. (If other reviewers find the link, this shouldn’t be a problem.)

    In addition, the proposed method is, to some extent, a general solution for semi-supervised learning. It is not explained why the authors did not choose to evaluate it on previously used open datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors should have made it clear that L_{pyc} is an initial idea to be further improved and is therefore not used in their final solution. They should also highlight in the ablation study that L_{pyc} is applied to the models named (S=2,…,5).

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The basic idea of multi-scale consistency and being aware of fixing potential flaws of it should be a good addition to the literature.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel semi-supervised learning method. The network leverages multi-scale predictions for consistency analysis. The experimental results show that the proposed method outperforms state-of-the-art methods and is close to a fully supervised approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Consistency analysis is applied to multi-scale predictions, which exploits multi-scale information for better learning.
    • Well organized and easy to read.
    • Extensive experiments to validate the proposed method.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The multi-scale information analysis is similar to deep supervision, which is an old concept. The authors exploit the prediction uncertainty by employing the KL-divergence against the network's averaged predictions rather than the commonly used MC method, and claim the proposed one is more time-efficient. However, these two methods cannot be compared directly, as they use inference differently: one considers the uncertainty of multi-scale predictions, while the other exploits prediction stability by removing some nodes inside the CNN. The conventional MC method tries to generalize the network to ensure the unlabeled data can be exploited properly, but the proposed method gives me the feeling that the unstable prediction is manipulated to overfit the prediction. Maybe an in-depth discussion is required.

    • Why does Eqn. (5) use the same unit weights for its two components? This does not seem reasonable.

    • I don’t think the reduced training time is a key advantage of the network: because the method is designed without MC propagation, the only fair comparison would be against a variant that also includes the MC procedure yet is still faster.

    • Regarding the results in Tables 1 and 2, it is strange that the standard deviation is larger than the mean for the ASD metric. Do you have negative values? Or maybe ASD is not a proper metric here.
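The MC-dropout style of uncertainty that this reviewer contrasts with the paper's single-pass KL approach can be sketched as follows. This is a toy NumPy illustration under stated assumptions (a hypothetical stochastic `toy_forward` standing in for a CNN with dropout; none of these names come from the paper or its code): the stochastic forward pass is repeated several times, and the per-voxel variance across passes serves as the uncertainty estimate.

```python
import numpy as np

rng = np.random.default_rng(0)

def mc_dropout_uncertainty(forward, x, passes=8, drop=0.5):
    """Monte Carlo dropout uncertainty (toy sketch): repeat the stochastic
    forward pass and take the mean and per-element predictive variance."""
    preds = np.stack([forward(x, drop) for _ in range(passes)])
    return preds.mean(axis=0), preds.var(axis=0)

# toy stochastic "network": a linear map with random input-feature dropout
W = rng.normal(size=(4, 4))

def toy_forward(x, drop):
    mask = rng.random(x.shape) >= drop        # randomly zero input features
    logits = (x * mask / (1.0 - drop)) @ W    # inverted-dropout scaling
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)  # softmax probabilities
```

The multiple forward passes are the cost the reviewer refers to: the paper's multi-scale KL uncertainty needs only one pass, which is why the two approaches are hard to compare on equal footing.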

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    As discussed in the above weakness points, there are some limitations of the paper, which can be improved by more discussions and comparisons.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Nice idea to exploit multi-scale information; maybe combining MC with a knowledge distillation method could improve the performance.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    7

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Overall, the paper is well-written. The idea of using multi-scale consistency and uncertainty as part of the supervision for semi-supervised learning is very interesting. The experiments are also well organized. The paper's main weakness is that it is tested on only one dataset. The reviewers raise very good points on the weaknesses, but the meta-reviewer does not think they are strong enough to reject this paper. It would be a good improvement to address some of them in a future version.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

Sincere thanks to all reviewers and the meta-reviewer for their positive and constructive comments. We believe that this constructive feedback will help us improve the quality of the paper and promote further studies on this semi-supervised segmentation topic. First, we apologize for the unclear motivation and ablation study settings; they will be clarified for better understanding. Second, we will evaluate the proposed method on more datasets. In addition, we will provide the code for the proposed method and all comparison methods, along with more than one example on public datasets. Finally, we will provide more details, descriptions, and discussion in the next version.


