
Authors

Yicheng Wu, Minfeng Xu, Zongyuan Ge, Jianfei Cai, Lei Zhang

Abstract

Semi-supervised learning has attracted great attention in the field of machine learning, especially for medical image segmentation tasks, since it alleviates the heavy burden of collecting abundant densely annotated data for training. However, most of existing methods underestimate the importance of challenging regions (e.g. small branches or blurred edges) during training. We believe that these unlabeled regions may contain more crucial information to minimize the uncertainty prediction for the model and should be emphasized in the training process. Therefore, in this paper, we propose a novel Mutual Consistency Network (MC-Net) for semi-supervised left atrium segmentation from 3D MR images. Particularly, our MC-Net consists of one encoder and two slightly different decoders, and the prediction discrepancies of two decoders are transformed as an unsupervised loss by our designed cycled pseudo label scheme to encourage mutual consistency. Such mutual consistency encourages the two decoders to have consistent and low-entropy predictions and enables the model to gradually capture generalized features from these unlabeled challenging regions. We evaluate our MC-Net on the public Left Atrium (LA) database and it obtains impressive performance gains by exploiting the unlabeled data effectively. Our MC-Net outperforms six recent semi-supervised methods for left atrium segmentation, and sets the new state-of-the-art performance on the LA database.
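To make the abstract's cycled pseudo label idea concrete, here is a minimal NumPy sketch of what such a mutual-consistency objective could look like. This is an illustration only, not the authors' code: the function names, the temperature value, and the binary sharpening form are assumptions.

```python
import numpy as np

def sharpen(p, T=0.1):
    """Turn soft foreground probabilities into sharpened (low-entropy)
    pseudo labels via temperature scaling. T=0.1 is an assumed value."""
    p_t = p ** (1.0 / T)
    return p_t / (p_t + (1.0 - p) ** (1.0 / T))

def mutual_consistency_loss(p_a, p_b):
    """Cycled pseudo-label consistency: each decoder's probability output
    is pulled toward the sharpened pseudo label of the *other* decoder,
    measured with MSE. Disagreement between decoders (i.e. uncertain
    regions) contributes most to the loss."""
    pl_a = sharpen(p_a)  # pseudo label derived from decoder A
    pl_b = sharpen(p_b)  # pseudo label derived from decoder B
    return np.mean((p_a - pl_b) ** 2) + np.mean((p_b - pl_a) ** 2)
```

With two confident, agreeing outputs the loss is small; when the decoders disagree, the cycled term grows, which is how uncertain regions are emphasized during training.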

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_28

SharedIt: https://rdcu.be/cyl2y

Link to the code repository

N/A

Link to the dataset(s)

http://atriaseg2018.cardiacatlas.org

https://wiki.cancerimagingarchive.net/display/Public/Pancreas-CT


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a mutual consistency network for semi-supervised segmentation. Experiments on a publicly available left atrium segmentation dataset show the superiority of the proposed method compared to existing state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A good visual explanation of the decrease in epistemic uncertainty with the increase in dataset size. This is a good motivation for using epistemic uncertainty in a semi-supervised segmentation task.

    • Comparison against multiple state-of-the-art methods is commendable.

    • Use of cyclic loss function for segmentation is novel.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • A big part of the method section is spent on estimating model uncertainty using entropy, but it is not clear how these uncertainties are used during training, especially since the first line of Section 2.2 mentions “Based on the estimated uncertainty information, we then …”. As the main motivation of the paper is to utilize highly uncertain areas to guide network training, this makes the contribution questionable.

    • In the paper, the publicly available dataset was divided into training and validation sets manually. When comparing against the state-of-the-art methods, results reported in the respective papers were taken directly. How was it ensured that all methods use the same data split?

    • In the ablation study, consistency between sPL_A and sPL_B is evaluated. It is not clear how this is useful, as the consistency loss between these two is not part of the main loss function Eq. (4).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The paper should be reproducible, as all hyperparameters and the training procedure are explained in sufficient detail.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Why was an MSE loss used for consistency training? Why not a soft-BCE loss?
  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes a good application of consistency training, but some of the details are missing as mentioned in the weakness section. Addressing those would lead to changing the overall opinion of the paper towards accept.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposed a CNN-based method that utilizes uncertainty information and a cycled pseudo label scheme for semi-supervised learning. The method was evaluated on a left atrium segmentation task and achieved superior performance in comparison with state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This paper adopted an effective method for semi-supervised learning, building on previous studies such as uncertainty measurement, entropy regularization, probability recalibration by temperature, and mutual consistency. Although the novelty of the proposed method is somewhat incremental, the reviewer believes there is value in the design of such an effective training framework, which introduces uncertainty information and mutual consistency learning.

    2) The evaluation experiments are comprehensive, with a performance comparison against 6 state-of-the-art methods and an ablation study on each constituent component. The strong evaluation makes the proposed method convincing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Many studies have been conducted on semi-supervised learning. As the authors themselves mention, Refs. [13, 14, 17, 20] are some of the studies that more or less informed the method development. Therefore, the proposed method may seem incremental, making the best of previous methods.

    2) Apart from the state-of-the-art methods in left atrium segmentation, it would be better if more state-of-the-art semi-supervised methods were introduced for discussion/comparison.

    [13] Xia, Y., et al.: 3D semi-supervised learning with uncertainty-aware multi-view co-training. In: WACV 2020, pp. 3646–3655 (2020)

    [14] Xie, Q., Dai, Z., Hovy, E., Luong, M.T., Le, Q.V.: Unsupervised data augmentation for consistency training. arXiv preprint arXiv:1904.12848 (2019)

    [17] Yu, L., Wang, S., Li, X., Fu, C.W., Heng, P.A.: Uncertainty-aware self-ensembling model for semi-supervised 3D left atrium segmentation. In: Shen, D., et al. (eds.) MICCAI 2019, pp. 605–613. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-32245-8_67

    [20] Zheng, Z., Yang, Y.: Rectifying pseudo label learning via uncertainty estimation for domain adaptive semantic segmentation. International Journal of Computer Vision pp. 1–15 (2021)

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Open public dataset.

    Code is yet to be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    1) In Table 1, were these state-of-the-art methods implemented by the authors themselves? If yes, were the hyper-parameters kept the same as in the previous studies? If no (i.e., the authors quote results from previous studies), do these methods use the same training/evaluation set (such as the split of labeled/unlabeled samples)?

    2) If the diversity of segmentation models improves feature diversity, what is the performance when more sub-models are introduced for feature learning under the mutual consistency constraint? How about directly designing a teacher-student model with mutual prediction consistency?

    3) From the qualitative results in Fig. 3 and Fig. S1, the proposed method seems to have lower sensitivity than other methods, with small branches/regions missing from its results. It would be better if TPR/FPR were reported.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Based on the overall assessment of motivation, method novelty, evaluation experiments, analysis, and paper writing.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper proposed a mutual consistency net (MC-Net) for semi-supervised left atrium segmentation. In particular, by enforcing the consistency of two slightly different segmentation results obtained by two decoder networks with two different up-sampling Conv kernels, the method achieved promising results on the benchmark LA dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed loss function is an interesting study for the left atrium segmentation, the experimental results are competitive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of explanation for the rationale of the network architecture. The improvement over the SOTA is not really impressive; see point (6) below for details.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is not available to check the reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Questions:

    (1) What if bicubic and nearest-neighbour interpolation were used for up-sampling within the two decoders? What are the principles for determining the up-sampling scheme? The authors should clarify this.

    (2) Is the performance sensitive to the neural network architecture? For example, is the performance really dominated by the proposed loss function, or is it mainly due to the inductive bias of the neural architecture? An ablation study on the architecture would help illustrate the importance of the proposed loss.

    (3) The distribution consistency (L_c) is evaluated with MSE; why not directly apply the KL divergence to measure the distribution consistency?

    (4) What is the impact of lambda?

    (5) In Table 1, the number of network parameters and the inference time should be included.

    (6) The improvement over the SOTA methods is not really impressive.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The advantages/potentials of the proposed method were not presented well.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Recommend for rebuttal. The main weaknesses of this paper are limited novelty, evaluation on only one dataset, and questions about the fairness of the comparison with SOTA methods. Cycle-consistency is a well-verified mechanism for semi-supervised learning. Even though the paper focuses on left atrium segmentation, the proposed algorithm has no special design for LA and should be verifiable on other tasks; evaluation on a single task cannot convince the reader of its usefulness. Finally, two reviewers raised concerns about whether a fair comparison was achieved, and the AC shares this concern.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10




Author Feedback

We thank the AC and all reviewers for their constructive comments. This paper received 3 reviews with scores of 5, 8 and 5 from R2, R3 (both very confident) and R5 (not absolutely certain), respectively. We are encouraged by the positive comments: 1) results against SOTA are commendable (R2), strong (R3) and competitive (R5); 2) using uncertainty or the cyclic loss is well motivated/novel (R2), effective (R3) and interesting (R5). R2 asks us to clarify details before acceptance, and R5 has some doubts about our method. Below we first address the major issues summarized by the AC.

Q1: Technical Novelty (AC) Although we agree with the AC that cycle-consistency is common, applying such cycled consistency to exploit highly uncertain regions in SSL is new. This is the key insight of this paper, i.e., forcing the two decoders to generate consistent and low-entropy outputs in highly uncertain regions (e.g., small branches or adhesive edges around the target; see Fig. 1). It is fundamentally different from existing uncertainty-based SSL works, which discard highly uncertain regions. Moreover, a common consistency measurement between the two decoder outputs will not reach SOTA performance, while our carefully designed cycled consistency between probability outputs and pseudo labels does. Thus, it is not an incremental contribution, and all reviewers recognize that our method is novel/interesting.

Q2: Only tested on one dataset (AC) The LA dataset is a popular public dataset for SSL, and most existing models [2, 6, 17] were evaluated only on LA. As requested by the AC, we further evaluated our model on the public NIH Pancreas CT dataset (3D data). All experiments were conducted on the public benchmark (see https://github.com/HiLab-git/DTC). Our proposed MC-Net again achieves the highest Dice:

Model          Labeled/Unlabeled   Dice (%)
UA-MT [17]     10%/90%             68.70
SASSnet [6]    10%/90%             66.52
DTC [7]        10%/90%             66.27
MC-Net (ours)  10%/90%             68.94
UA-MT [17]     20%/80%             76.75
SASSnet [6]    20%/80%             77.11
DTC [7]        20%/80%             78.27
MC-Net (ours)  20%/80%             79.05

Q3: Fair Comparisons (R2&R3&R5) Thanks to [17], all models in this paper are implemented on the public benchmark (see https://github.com/yulequan/UA-MT). (1) As stated in Section 3.2 (page 6), all experiments were conducted in the same environments with the same fixed random seeds as [2,6,7,12,17]. (2) The results of other methods are taken directly from the published papers. The data split on LA is fixed, and hyperparameters such as lambda in Eq. (4) are identical across all experiments. Therefore, we assure that the comparisons are fair.

Q4: How are uncertainties used? (R2) We simplify MC-Dropout and use the prediction discrepancy of the two decoders to represent the uncertainty (see Section 2.1, page 4). We then transform this uncertainty into the loss L_c for training.
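As a rough sketch of that simplification (an illustration only; the threshold value and function names are assumptions, not taken from the paper):

```python
import numpy as np

def discrepancy_uncertainty(p_a, p_b, thresh=0.1):
    """Approximate voxel-wise uncertainty as the absolute discrepancy
    between the two decoders' foreground probabilities, instead of
    running multiple MC-Dropout forward passes. Returns the discrepancy
    map and a boolean mask marking 'challenging' voxels above an assumed
    threshold."""
    disc = np.abs(p_a - p_b)
    return disc, disc > thresh
```

Voxels where the two decoders agree (e.g. both near 0.9) get a low score, while voxels where they disagree are flagged as challenging and drive the consistency loss.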

Q5: Improvement over SOTA is not impressive (R5) SSL is a challenging topic. Checking the SOTA improvements from MICCAI'19 to MICCAI'20 to AAAI'21 in Table 1, our gains are comparable. Moreover, our idea and loss are general and can benefit many existing methods.

Q6: Is the performance due to the architecture? (R5) We provide a detailed ablation study in Table 2. It clearly shows that the proposed two different decoders plus the proposed CPL loss do improve the performance, with the same network architecture.

Q7: Using other types of decoders (R5) and more decoders (R3) It would be interesting to try different types of decoders and more decoders, which we leave for future work.

Q8: Ablation about consistency (R2) The ablation shows that a simple consistency between sPL_A and sPL_B is inferior to the proposed CPL.

Q9: Other consistency losses (R2&R5) We could try other consistency losses such as soft-BCE or KL, but this is not the main point; a simple MSE loss already demonstrates the effectiveness of our framework.
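A minimal sketch contrasting the MSE used for the consistency term with the KL alternative raised by the reviewers (illustration only; the epsilon clipping for numerical stability is an assumption, not from the paper):

```python
import numpy as np

def mse_consistency(p, q):
    """MSE between two probability maps (the choice used in the paper)."""
    return np.mean((p - q) ** 2)

def kl_consistency(p, q, eps=1e-8):
    """Voxel-wise Bernoulli KL divergence KL(p || q), the alternative
    suggested by the reviewers. Probabilities are clipped away from
    {0, 1} to keep the logarithms finite."""
    p = np.clip(p, eps, 1 - eps)
    q = np.clip(q, eps, 1 - eps)
    return np.mean(p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q)))
```

Both losses vanish when the two maps agree and grow with disagreement; KL penalizes confident disagreement more sharply, while MSE is bounded and gradient-stable, which is one plausible reason to prefer it.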

Q10: Other issues (R3&R5) We will address other minor issues such as additional references and more metrics in the final version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors proposed a semi-supervised algorithm based on cycle-consistency. The paper is generally well written, with comprehensive experiments and good performance on the LA dataset in different settings. The main weakness is that the original version of the paper included an evaluation on only one dataset; the authors added numbers on another dataset in the rebuttal. The novelty of the paper is somewhat incremental but still has value to other researchers, and the method is relatively easy to implement given its performance gain. Recommend accept, and ask the authors to include the results on the Pancreas data in the supplementary material if finally accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposed a segmentation approach that uses uncertainty information during the training process. The reviews raised some concerns with regard to the fairness of the evaluation (based on training/test splits for competing approaches) and performance on tasks other than left atrium segmentation, which were addressed in the rebuttal. Specifically, the rebuttal confirmed that the training/test splits are comparable (hence the comparisons should be fair) and that the method can also achieve good performance on other tasks.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    My concerns lie in the novelty (methodology- or application-wise) and the solidity of the validation. I could not identify an important methodological contribution in this work. Though the work aimed to focus on the challenging task of LA segmentation, the method was not specifically designed with the characteristics of LA segmentation or its clinical applications (e.g., scar quantification) in mind. The proposed method is a general SSL method, which should be validated and compared with SOTA methods on various tasks.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    15


