
Authors

Lei Zhu, Kaiyuan Yang, Meihui Zhang, Ling Ling Chan, Teck Khim Ng, Beng Chin Ooi

Abstract

Multi-modal learning, which uses unpaired labeled data from multiple modalities to boost the performance of deep learning models on each individual modality, has recently attracted much interest in medical image segmentation. However, existing unpaired multi-modal learning methods require a considerable amount of labeled data from both modalities to obtain satisfying segmentation results, and such data is not easy to obtain in reality. In this paper, we investigate the use of unlabeled data for label-efficient unpaired multi-modal learning, focusing on the scenario where labeled data is scarce and unlabeled data is abundant. We term this new problem Semi-Supervised Unpaired Multi-Modal Learning and propose a novel deep co-training framework for it. Specifically, our framework consists of two segmentation networks, one for each modality. Unlabeled data is used to learn two image translation networks for translating images across modalities; labeled data from one modality can thus be employed, after translation, to train the segmentation network of the other modality. To prevent overfitting in the label-scarce scenario, we introduce a new semantic consistency loss that regularizes the predictions of an image and its translation from the two segmentation networks to be semantically consistent. We further design a novel class-balanced deep co-training scheme to effectively leverage the valuable complementary information from both modalities to boost segmentation performance. We verify the effectiveness of our framework on two medical image segmentation tasks, where it outperforms existing methods significantly.
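
A minimal sketch of the semantic consistency idea described in the abstract, assuming a symmetric KL formulation (the exact form, and all names such as seg_mr, seg_ct, and g_mr2ct, are illustrative assumptions, not the paper's definitive implementation):

```python
import torch.nn.functional as F

def semantic_consistency_loss(logits_a, logits_b):
    # logits_*: (N, C, H, W) raw outputs of the two segmentation
    # networks for an image and its cross-modality translation.
    log_p_a = F.log_softmax(logits_a, dim=1)
    log_p_b = F.log_softmax(logits_b, dim=1)
    # Symmetric KL divergence between the two predicted distributions.
    kl_ab = F.kl_div(log_p_a, log_p_b.exp(), reduction="batchmean")
    kl_ba = F.kl_div(log_p_b, log_p_a.exp(), reduction="batchmean")
    return 0.5 * (kl_ab + kl_ba)

# Hypothetical usage: seg_mr / seg_ct are the modality-specific
# segmentation networks, g_mr2ct a trained MRI-to-CT translator.
# loss_sc = semantic_consistency_loss(seg_mr(x_mr), seg_ct(g_mr2ct(x_mr)))
```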

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_37

SharedIt: https://rdcu.be/cyl2H

Link to the code repository

https://github.com/nusdbsystem/SSUMML

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a semi-supervised learning setting for unpaired multi-modality medical image segmentation to reduce the annotation effort. The proposed method utilizes a GAN to translate images between the two modalities and adopts a deep co-training strategy to utilize the unlabeled data.

    The proposed method was evaluated on public cardiac image segmentation and abdominal multi-organ segmentation tasks and outperforms the other baseline methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The whole pipeline is reasonable and the whole paper is well-written.
    2. The authors conduct an extensive evaluation on two public datasets and the proposed method achieves better performances.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In general, the whole pipeline is clearly presented, but some important technical details are unclear. For example, it is not explained how the image translation module is trained, or whether it is pre-trained or trained with the segmentation networks in an end-to-end manner.

    2. It seems that the dataset split (train vs. test and labeled vs. unlabeled) is conducted at the image slice level. This setting may be unfair and may leak testing data information. As for the annotation issue, it is very common to annotate only some scans/volumes, while it is uncommon to annotate some slices in each volume.

    3. The proposed co-training scheme seems redundant given the semantic consistency scheme.

    4. The proposed problem setting and method are very similar to semi-supervised domain adaptation (or semi-supervised multi-modality learning). The authors should compare their method with methods from that line of work, such as:

    Li, Kang, et al. “Dual-Teacher: Integrating Intra-domain and Inter-domain Teachers for Annotation-efficient Cardiac Segmentation.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2020.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There is no public code for this project, and the whole pipeline includes many hyperparameters as well as GAN training. It may be difficult to reproduce the results from this paper alone.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. From Fig. 3, we can see that the performance difference across different values of λ_sc is about 2-3%. Therefore, it is not very convincing to argue that “our method is generally robust to the change of λ_sc”.
  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Some issues regarding the method design, experimental setting, and comparisons should be resolved.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper proposes an approach for semi-supervised unpaired multi-modal learning. The method first applies CycleGAN for image translation between CT and MRI data, and regards an image and its cross-modality translation as two different views of the same object. Two segmentation models, one for each modality, are trained separately, and a consistency regularization and a co-training algorithm are proposed to optimize the segmentation models with the unlabeled data. Validation is performed on public datasets with two segmentation tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Although co-training is already a well-established semi-supervised method, this paper leverages good practices from prior works and applies them to unpaired multi-modal learning.
    • Effectiveness is shown with two different segmentation tasks.
    • This paper is easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of this work is that the method is not compared with sufficiently strong baselines, so the claimed effectiveness is not very convincing.

    • The authors compare their method mainly with different multi-modal approaches, which are not specifically designed for the semi-supervised setting, so it is expected that the proposed method outperforms them. Only one simple semi-supervised method, proposed in 2013, is considered in the comparison, which is insufficient to demonstrate the superiority of the proposed method. The authors are encouraged to compare with other state-of-the-art semi-supervised approaches. They should also consider the multi-modal learning method presented in [1], which uses a similar image translation design for obtaining different views and may serve as a stronger and more relevant baseline than [3][9].

    • From the results, it is difficult to judge how helpful semi-supervised unpaired multi-modal learning is. Models trained with 100% of the labels should be included to show the performance gap when only 0.5% and 2.5% of the labels are used. A curve of performance against the percentage of labeled data should be shown. Another interesting baseline would be training “ST-single” with double the percentage of labels, e.g., training “ST-single” with 1% of the labels of a single modality and comparing it to the proposed method trained with 0.5% of the labels of two modalities, which would demonstrate the benefits of unpaired multi-modal learning.

    • The authors propose a class-balanced deep co-training scheme, but whether selecting the top α% per class, rather than over all pixels, contributes to the performance is not experimentally shown. Also, the “class-balanced” claim is misleading: if the classes are imbalanced in an image, for example, the spleen being much smaller than the liver, the classes would still remain imbalanced under the presented pseudo-label selection strategy (a minimal sketch of such per-class selection is given after the reference below).

    [1] Li, Kang, Lequan Yu, Shujun Wang, and Pheng-Ann Heng. “Towards cross-modality medical image segmentation with online mutual knowledge distillation.” In Proceedings of the AAAI Conference on Artificial Intelligence, vol. 34, no. 01, pp. 775-783. 2020.
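
    Regarding the pseudo-label selection point above, the following is a purely illustrative PyTorch-style sketch of per-class top-α% selection (the paper's exact procedure is not reproduced here; function and parameter names are hypothetical). Note how the kept pixels inherit the predicted class proportions, which is exactly the residual-imbalance concern raised above:

    ```python
    import torch

    def select_pseudo_labels(probs, alpha=0.2, ignore_index=255):
        # probs: (C, H, W) softmax output for one image. For every
        # predicted class, keep only the top alpha-fraction most
        # confident pixels; the rest are marked ignore_index so a
        # loss configured to skip that index will ignore them.
        conf, pred = probs.max(dim=0)
        pseudo = torch.full_like(pred, ignore_index)
        for c in pred.unique():
            mask = pred == c
            k = max(1, int(alpha * mask.sum().item()))
            thresh = conf[mask].topk(k).values.min()
            pseudo[mask & (conf >= thresh)] = c
        return pseudo
    ```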

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is satisfactory.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    One problem with the experimental setting is that the labeled/unlabeled data are split at the image level instead of the volume level. Although the 3D MRI/CT volumes are converted into slices to train 2D networks, the more practical split of labeled/unlabeled data should still be at the volume level, across different subjects. Is the evaluation also performed slice by slice? The evaluation metrics should be calculated at the volume level.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method seems to work well for semi-supervised unpaired multi-modal learning, but the comparison baselines are not strong or plentiful enough. The technical contribution of this paper is tangible.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper presents a semi-supervised learning method for unpaired multi-modal medical segmentation tasks. Building upon image translation models, the authors propose a class-balanced deep co-training method, which achieves superior results to previous state-of-the-art fully-supervised multi-modal learning methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The contributions are well illustrated and quite clear. The paper is well organized and easy to follow.

    • The superiority of the proposed approach is demonstrated on different unpaired multi-modal segmentation tasks.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Missing references: beyond deep co-training [a], there have been multiple applications [b,c] that use multiple learners to learn representations from different views and make the final prediction by mining the consensus information among these learners. These works should also be included in the introduction or experiments section.

    • Regarding the comparison, image translation does not seem to be included in any of the compared methods (e.g., X/Y-shape, ST-joint), which makes the comparison not completely fair. For a fairer comparison, please include translated images as augmentation for all the compared methods, to guarantee that the input data are the same.

    • For the SC loss, I am curious whether other loss terms (e.g., a contrastive loss) could outperform the KL divergence loss. It would be interesting to see some discussion on this point (a speculative sketch of one such alternative follows the references below).

    [a] Qiao, Siyuan, et al. “Deep co-training for semi-supervised image recognition.” Proceedings of the European Conference on Computer Vision (ECCV). 2018.
    [b] Zhou, Yuyin, et al. “Semi-supervised 3D abdominal multi-organ segmentation via deep multi-planar co-training.” 2019 IEEE Winter Conference on Applications of Computer Vision (WACV). IEEE, 2019.
    [c] Peng, Jizong, et al. “Deep co-training for semi-supervised image segmentation.” Pattern Recognition 107 (2020): 107269.
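
    As a speculative sketch of the contrastive alternative mentioned above (not part of the paper; z_a and z_b stand for hypothetical pooled embeddings of an image and of its cross-modality translation from the two networks):

    ```python
    import torch
    import torch.nn.functional as F

    def info_nce(z_a, z_b, tau=0.1):
        # z_a, z_b: (N, D) embeddings; row i of z_a and row i of z_b
        # form a positive pair, and all other rows in the batch act
        # as negatives.
        z_a = F.normalize(z_a, dim=1)
        z_b = F.normalize(z_b, dim=1)
        logits = z_a @ z_b.t() / tau
        targets = torch.arange(z_a.size(0), device=z_a.device)
        return F.cross_entropy(logits, targets)
    ```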

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code is not included in the submission. I strongly recommend that the authors release the code in the next version.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please see above.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper presents an important problem and designs a viable solution using generative models and deep co-training. Some minor issues need to be addressed, such as adding references and experimental comparisons. Other than that, I feel this could be an important study in the semi-supervised learning direction. Therefore I recommend acceptance.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose a semi-supervised co-training network for unpaired multi-modality medical image segmentation that utilizes unlabeled data. They evaluated their method on two public datasets, i.e., cardiac images and abdominal multi-organ images, and claim better performance than other baseline methods. However, three main points raised by the reviewers need to be considered: 1) the image format (2D slices instead of 3D volumes) used in the training/test stage; 2) the comparison experiments are not convincing enough, as almost no semi-supervised methods were employed for comparison; 3) the parameter experiments of the deep co-training scheme seem confusing. Therefore, my recommendation for this paper is “Invite for Rebuttal”. Note that the purpose of the rebuttal is to provide clarification or to point out misunderstandings, not to promise additional experiments.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

We thank all the reviewers for their valuable comments. The main concerns are addressed as follows:

*1. Image format in training/test stage (Weakness 2 by Reviewer#1 and Question from Reviewer#2):

We would like to clarify that our training and test data have been split at the volume level, not at the image slice level, so there is no leakage of test data information into the training data.

Similarly, for the labeled and unlabeled data in the training stage, the split has also been at the volume level.

We also want to highlight that there is no information leakage from the labeled set to the unlabeled set: when we select certain slices in a volume to form the labeled set, the remaining slices from the same volume are not used as unlabeled data.

Lastly, the evaluation metrics were calculated at the volume level.

Thanks for pointing out the confusion; we will clarify the above in the revision.

*2. Volume vs slice level annotation (Weakness 2 by Reviewer#1):

We want to clarify that our method does not impose any annotation restriction: doctors are free to annotate either by volume or by ad hoc slices, both of which are practised in clinical settings. In fact, one of our contributions is that when doctors do label ad hoc slices, our method remains performant with only a small number of labeled slices, effectively reducing the annotation burden.

*3. Insufficient comparison experiments with semi-supervised methods (Weakness 4 by Reviewer#1 and Weakness 1 by Reviewer#2):

First, we clarify the selection of ST-single and ST-joint as our semi-supervised baseline methods.

Although the semi-supervised learning method proposed in the ST-single/ST-joint paper is generic, we should have mentioned that we intentionally implemented ST-single/ST-joint with several improvements. For example, ST-joint employs modality-specific batch normalization layers to reduce the modality difference, so that it can effectively leverage shared cross-modality information for semi-supervised learning. We therefore consider the implemented ST-single and ST-joint reasonable semi-supervised baselines, representing semi-supervised learning with different numbers of modalities.
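
For readers unfamiliar with modality-specific batch normalization, a minimal sketch of such a layer could look as follows (the class name and the explicit modality index are our illustrative assumptions, not the ST-joint implementation):

```python
import torch.nn as nn

class ModalitySpecificBN2d(nn.Module):
    # Separate batch-norm statistics and affine parameters per
    # modality (e.g., 0 = MRI, 1 = CT), while the surrounding
    # convolutional weights can remain shared across modalities.
    def __init__(self, num_features, num_modalities=2):
        super().__init__()
        self.bns = nn.ModuleList(
            nn.BatchNorm2d(num_features) for _ in range(num_modalities)
        )

    def forward(self, x, modality):
        return self.bns[modality](x)
```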

Second, as requested, we have performed the following new experiments on the two works from Li Kang et al. for a more complete and stronger comparison. Here are the test Dice scores on cardiac segmentation:

Li Kang (MICCAI 2020): 0.5% data: MRI 70.8, CT 75.7; 2.5% data: MRI 81.9, CT 84.9;

Li Kang (AAAI 2020): 0.5% data: MRI 70.9, CT 79.6; 2.5% data: MRI 82.3, CT 85.6;

Our method outperforms Li Kang’s methods by 5.3% and 4.0% on average, respectively. We reason that this is because they do not fully utilize all available data: they do not use unlabeled source data to train the segmentation networks. In addition, while our method optimizes performance for both modalities, Li Kang’s two methods optimize only for the target modality and thus may not fully leverage the multi-modality information.

Interestingly, ST-single and ST-joint also give results comparable to or stronger than Li Kang’s two methods.

Thanks for the suggestions; we will add both of Li Kang’s methods in the revision.

*4. Redundant co-training (Weakness 3 by Reviewer#1):

Semantic Consistency (SC) and co-training are complementary rather than redundant, as justified by our ablation study in Table 2: co-training learns discriminative features from unlabeled data, while SC provides regularization under scarce labeled data.
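
As a hedged illustration of this complementarity, the overall training objective can be read as three additive terms; the decomposition below is a reconstruction from the paper's description (λ_sc is the weight studied in Fig. 3, while writing the co-training weight as λ_cot is our notation, not necessarily the paper's):

$\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{sup}} + \lambda_{sc}\,\mathcal{L}_{\mathrm{SC}} + \lambda_{cot}\,\mathcal{L}_{\mathrm{cot}}$

Here the supervised term fits the scarce labels, the SC term regularizes cross-modality predictions, and the co-training term supplies pseudo-label supervision on unlabeled data, so removing either unlabeled-data term loses a distinct source of signal.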

*5. Robustness of λ_sc (Question from Reviewer#1):

Our sensitivity analysis in Fig. 3 is meant to convey that even when we scale λ_sc by 100 times, the performance fluctuates by only 2-3%; we therefore consider our method relatively robust to this hyper-parameter.

*6. Training details and Code release (Reviewer#1 and Reviewer#3):

Our entire framework, including the image translation module, is trained in an end-to-end manner. We will release our code on GitHub with the revision.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes a semi-supervised co-training network for unpaired multi-modality medical image segmentation. Although the idea of co-training is not novel, all reviewers rated the writing and experiments favorably. The authors also answered the concerns well.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    From the first meta-review: The authors propose a semi-supervised co-training network for unpaired multi-modality medical image segmentation that utilizes unlabeled data. They evaluated their method on two public datasets, i.e., cardiac images and abdominal multi-organ images, and claim better performance than other baseline methods. However, three main points raised by the reviewers need to be considered: 1) the image format (2D slices instead of 3D volumes) used in the training/test stage; 2) the comparison experiments are not convincing enough, as almost no semi-supervised methods were employed for comparison; 3) the parameter experiments of the deep co-training scheme seem confusing. Therefore, my recommendation for this paper is “Invite for Rebuttal”. Note that the purpose of the rebuttal is to provide clarification or to point out misunderstandings, not to promise additional experiments.

    In the rebuttal, the authors concurred that additional experiments were required to address the reviews. These additional experiments cannot be fully peer reviewed, so the paper cannot be accepted.

    Here is the relevant section of the rebuttal instructions: “An effective rebuttal addresses reviewers’ criticisms by explaining where in the paper you had provided the requisite information, perhaps further clarifying it. Do not promise to expand your paper to address all the questions raised by the reviewers, as you will not be able to change your article substantially, and in all likelihood you don’t have sufficient room to add to the paper.”

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    17



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper studies unpaired multi-modality medical image segmentation using a co-training neural network under the semi-supervised setting. In the network, a GAN is used to translate images across modalities, and the deep co-training strategy is adopted to utilize the unlabeled data. The idea appears reasonable and is supported by the experimental results reported in the paper. The authors’ rebuttal has largely addressed the questions and concerns raised by the reviewers. Thus I recommend acceptance of this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4


