
# Authors

Ziyuan Zhao, Kaixin Xu, Shumeng Li, Zeng Zeng, Cuntai Guan

# Abstract

The success of deep convolutional neural networks (DCNNs) benefits from high volumes of annotated data. However, annotating medical images is laborious, expensive, and requires human expertise, which induces the label scarcity problem. Especially when encountering domain shift, the problem becomes more serious. Although deep unsupervised domain adaptation (UDA) can leverage well-established source domain annotations and abundant target domain data to facilitate cross-modality image segmentation and also mitigate the label paucity problem on the target domain, conventional UDA methods suffer from severe performance degradation when source domain annotations are scarce. In this paper, we explore a challenging UDA setting: limited source domain annotations. We aim to investigate how to efficiently leverage unlabeled data from the source and target domains with limited source annotations for cross-modality image segmentation. To achieve this, we propose a new label-efficient UDA framework, termed MT-UDA, in which a student model trained with limited source labels learns from unlabeled data of both domains via two teacher models in a semi-supervised manner. More specifically, the student model not only distills intra-domain semantic knowledge by encouraging prediction consistency but also exploits inter-domain anatomical information by enforcing structural consistency. Consequently, the student model can effectively integrate the underlying knowledge beneath the available data resources to mitigate the impact of source label scarcity and yield improved cross-modality segmentation performance. We evaluate our method on the MM-WHS 2017 dataset and demonstrate that our approach outperforms state-of-the-art methods by a large margin under the source-label scarcity scenario.

# Link to paper

SharedIt: https://rdcu.be/cyhL6

# Reviews

### Review #1

• Please describe the contribution of the paper

The authors propose a UDA method for the very challenging scenario wherein labeled data is scarce on the source domain. The method is based on image-to-image translation networks – nowadays a common methodology for UDA in cross-modal medical imaging – and teacher/student networks for knowledge distillation.

Authors focus on the problem of heart chamber segmentation, comparing the proposed method with other SOTA techniques and conducting an ablation study. The presented results are quite compelling in favor of the proposed method.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The fusion of knowledge distillation and UDA using image-translation is, as far as the reviewer is concerned, a novelty in the field that seems to yield very promising results in the few-shot scenario in the source domain.

Experimental results, even if limited to a very narrow scope of heart MRs, are well-written and compelling.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The methodology in this manuscript is unnecessarily complex. The reviewer finds that Section 2 could be considerably simplified without sacrificing important details of the proposed method.

Authors should clarify why no results are reported for MT-UDA starting from 16 samples, while SIFA and PnP-AdaNet do have these results shown in Table 1.

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

As long as authors indeed place a link to the code in the final manuscript, as they claimed in the replicability form, there are no reproducibility concerns.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Sections 2.1, 2.2, 2.3, Fig. 2 and some parts of Fig. 1, while apparently technically correct, are unnecessarily confusing. The reviewer strongly recommends that the authors rewrite these sections and redesign the figures to improve readability.

The experimental setup could be improved by also including results from CT to MR volumes. The validation of the proposed method would also be much more compelling if other organs in few-shot scenarios were included, such as the CHAOS challenge dataset or other public multiorgan CT/MR datasets.

Implementation details such as architectural details, optimizer choice or hyperparameters could be replaced by a simple link to the code and/or a manuscript with supplementary material, as the authors marked that it would be made available in the reproducibility form. This could give space to a more organized description of the proposed method.

At last, authors must spell check the text for a potential camera ready version. There are multiple instances of errors in conjugation and article use and a few more serious problems with confusing sentences that seem to originate from revisions from an earlier draft of the manuscript (e.g. sentences stitched together, leftover syntax after rewriting, etc).

• Please state your overall opinion of the paper

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While the results are compelling and the methodology seems technically correct, the presentation of the method could surely be improved. Mathematical notation and figures are unnecessarily complex and have readability issues.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Somewhat confident

### Review #2

• Please describe the contribution of the paper

This paper solves the problem of unsupervised domain adaptation in a source-domain label scarcity setting. Conventional UDA does not perform very well when source domain annotations are scarce. The authors propose MT-UDA to enforce semantic and structural consistency. There are two teacher models and a student model. The models are trained with data from source and target domains in a semi-supervised manner.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Clear motivation of the problem, i.e., UDA in the limited-source-labels setting.
2. For the experiments, the authors compared with many baseline models, which is a plus.
3. The proposed method achieved a large performance improvement on the MM-WHS 2017 dataset.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The authors should discuss other semi-supervised methods, for example disentanglement methods, e.g., [1, 2], that also leverage cross-domain information for semi-supervised learning.
2. Using DCAM to address appearance shifts is a standard approach, at least in [19, 25], isn't it? The authors should clarify this. The authors should also clarify the novelty of the two consistency losses: the 'perturbation-consistency' loss and 'structural-consistency' loss seem incremental within the teacher-student framework. This concern stems from the missing self-contained descriptions of the baseline models.
3. The proposed method requires two-step training: 1) DCAM module training for augmentation; 2) segmentation and enforcing consistency by training two teacher models and one student model. Compared to end-to-end models, e.g., [1], this is a drawback. Also, the proposed method needs to train many networks that are not used in the inference stage.
4. For the experiments, I would expect the authors to also swap the source and target domains for more comprehensive experiments, i.e., what about CT scans as the source domain and MRI scans as the target domain?

[1] Chartsias, Agisilaos, et al. "Disentangled representation learning in cardiac image analysis." Medical Image Analysis 58 (2019): 101535.
[2] Yang, Junlin, et al. "Unsupervised domain adaptation via disentangled representations: Application to cross-modality liver segmentation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors clicked almost all ‘Yes’ for questions in the reproducibility response, which is a plus.

The authors use U-Net as the backbone, and there are clear descriptions of the implementation. I believe this work is reproducible.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The authors should also discuss/compare with other semi-supervised methods including disentanglement methods, self-supervised methods, etc.

This is not major, but I also expect the authors to include experiments with CT as the source domain. Moreover, what about using 1, 2, 5, etc. labeled source-domain scans as training data? This would be useful to demonstrate the performance improvement.

• Please state your overall opinion of the paper

borderline reject (5)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall, the ideas behind the losses make sense and are well-motivated. The results are also impressive. I am more familiar with semi-supervised learning methods. This paper is based on the teacher-student framework, and the authors extend the setting to semi-supervised learning. I would appreciate it if the authors included more discussion of semi-supervised approaches. Also, I highly suggest the authors give more self-contained descriptions of the baseline models and clarify the differences between this paper and other methods, and hence clarify the novelty. I also have concerns about the mentioned drawback of the method, i.e., multi-stage training.

• What is the ranking of this paper in your review stack?

4

• Number of papers in your stack

5

• Reviewer confidence

Somewhat confident

### Review #3

• Please describe the contribution of the paper

The paper presents a method for semi-supervised learning in unsupervised domain adaptation. In particular, the authors extend mean teacher training, a well-established semi-supervised learning method, to unsupervised domain adaptation in two ways. First, they use CycleGAN to transfer the target domain to the source domain and apply knowledge distillation (aka student-teacher training) in the source domain, which they call semantic knowledge transfer. Second, they use the fact that an image from the source domain should yield the same output as its transformation into the target domain, and vice versa. Based on this assumption, they use another student-teacher training to distill this knowledge, which they call structural knowledge transfer.
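The two consistency objectives summarized above can be sketched minimally as follows. This is a hypothetical illustration by the reviewer, not the authors' implementation: the function names, array shapes, and the choice of mean-squared error are all assumptions.

```python
import numpy as np

def perturbation_consistency_loss(student_pred, teacher_pred):
    """Intra-domain 'semantic' consistency: the student's soft prediction
    on a (perturbed) input is pushed toward the teacher's prediction on
    the same input, as in standard mean-teacher training."""
    return float(np.mean((student_pred - teacher_pred) ** 2))

def structural_consistency_loss(pred_original, pred_translated):
    """Inter-domain 'structural' consistency: an image and its
    cross-domain translation depict the same anatomy, so their
    segmentation outputs should agree."""
    return float(np.mean((pred_original - pred_translated) ** 2))
```

Here `student_pred` / `teacher_pred` would be softmax outputs of shape (batch, classes, H, W); identical predictions give a loss of zero.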

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The idea of knowledge distillation between domains to preserve anatomical information is interesting but not very novel. However, generally, I like the approach.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• The paper is not structured well and not easy to read. It is not easy to identify the contributions of the paper, and hard to contrast this method with previous work to identify the novelty.
• It is hard to comment on reproducibility since the authors do not provide the mean and std across multiple runs with different seeds.
• CycleGAN could hallucinate or remove discriminative features if there is a concept drift between the two domains. How do you think your method is able to cope with that?
• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The implementation details are mentioned, and an ablation study is done. However, the paper fails to report the mean and std of the obtained scores across multiple runs with different seeds.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• Why, in the structural knowledge transfer objective, is the student model on the source domain and the teacher on the target? Why not iterate between the two?
• Since the objective of UDA is to have good performance on the target domain, why not apply the semantic knowledge transfer in the target domain as well?
• To me, DCAM is the same as CycleGAN. CycleGAN also learns to transfer in both directions. Can you clarify how they are different?
• I suggest the authors elaborate more on the related work and the baselines they use, and mention how their method differs from the baselines. For example, it is not clear to me what UA-DM does or what it stands for.
• Can you please explain this sentence? “. We also directly test the U-Net trained on 4 labeled CT scans from the target domain as our lower bound, referred as W/o Adaptation-4.” Do you have labeled data for the target domain?
• Please state your overall opinion of the paper

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

There is a lot going on with the paper. The paper has limited novelty but uses a mixture of well known methods in a nice way. The ablation studies support the effectiveness of the contributions.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

5

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper investigates how to efficiently leverage unlabelled data from both source and target domains with limited source annotations for cross-modality image segmentation. The reviewers recognize the merit of the proposed method while raising concerns that need to be addressed. The concerns include reporting experimental results under certain setups (such as the same number of samples, and swapping domains for CT-to-MR results), identifying novel contributions clearly relative to previous work (such as CycleGAN), differentiation from baselines, etc.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

# Author Feedback

Thanks for the positive feedback! We are encouraged that the reviewers found the source-label scarcity scenario very challenging (R1), and the motivation clear and the idea well-motivated (R3) and interesting (R6). We are pleased that MT-UDA is recognized as a novelty in the field (R1) and well-established (R6), achieving very promising (R1) and impressive (R3) results, while a large performance improvement (R3) supports the effectiveness of the contributions (R6). We address the reviewers' concerns below.

1. Experimental results under certain setups. (R1,R3) Different training data sizes: We could compare the different methods using 16 samples (full); however, SSL settings are more appropriate since our paper focuses on integrating SSL into UDA under source label scarcity. Architectures like SIFA are tailored specifically for UDA, while the commonly used U-Net is employed in MT-UDA. Using 1 training sample, we achieve 57.6±1.8 (Dice %), which further demonstrates the superiority of MT-UDA. In the future, we will use a few scans, e.g., 1 or 2, for comparison.

(R1,R3) Swap domains: The experiments on MR-to-CT have shown the effectiveness of MT-UDA. Due to the page limit, we will explore CT-to-MR in future work.

(R1) Other datasets: Like SIFA v1 and v2, we will test MT-UDA on CHAOS in future work.

(R6) Not reporting mean (std): This can be provided, but we report the mean performance across subjects to be consistent with the MM-WHS challenge and previous work such as SIFA.

2. Significance and novelty. (R3,R6) Differentiation from baselines: The SOTA UDA methods do not consider source label scarcity and suffer from performance degradation, while we introduce SSL to ease this challenging issue. Different from other SSL methods, we integrate knowledge transfer from both domains in MT to further facilitate UDA and SSL. With extensive experiments, MT-UDA shows promising results with very few source labels. We will enrich the descriptions of the baselines in the final version.

(R3,R6) Difference between DCAM (ours) and CycleGAN: CycleGAN could remove discriminative features under domain shift. As stated in Sec. 2.1, we use stricter discriminators to differentiate images, which better helps generate intermediate domains for knowledge transfer in MT. As shown in Fig. 3, structural information is well preserved with DCAM. We will improve the clarity of the contributions.

(R3,R6) Clarity on knowledge transfer: We propose semantic knowledge transfer for SSL, leveraging unlabelled source data to improve source domain segmentation, while structural knowledge transfer is proposed for UDA, since both domains share the same structural information. With no target labels available, the student model is trained on the source domain. If we iterated between the two domains, the training process might be negatively influenced, since the teacher model is updated with the EMA weights of the student model.
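The EMA coupling invoked in this answer, where the teacher's weights track an exponential moving average of the student's, can be sketched as follows. This is a minimal illustration of the standard mean-teacher update, not the paper's actual code; the function name and the smoothing factor `alpha` are assumptions.

```python
def ema_update(teacher_weights, student_weights, alpha=0.99):
    """Mean-teacher EMA update: teacher <- alpha * teacher + (1 - alpha) * student.
    The teacher is never trained directly; it is a smoothed copy of the
    student, which is why iterating the roles across domains could feed
    the student's noise back into its own training targets."""
    return [alpha * t + (1 - alpha) * s
            for t, s in zip(teacher_weights, student_weights)]
```

With `alpha` close to 1, the teacher changes slowly and, over many steps, converges toward the student's weights, providing a more stable target for the consistency losses.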

(R3) More discussion of SSL methods: Thanks for the valuable suggestions. We have studied the two papers, which are related to UDA and SSL but do not cover source label scarcity in UDA. These methods inspire us regarding disentangled representation learning, and we will include discussion of more SSL methods, e.g., disentanglement, in the final version.

(R3) Influence of two-stage training: Thanks for raising this important point, but we believe the trade-off between one-stage and two-stage methods is outside the scope of this work and does not compromise the performance of MT-UDA. A one-stage MT-UDA could be interesting for further research. Besides, as explained in Sec. 2.2, EMA weights are used without the mentioned redundant networks.

(R1,R6) Improve the presentation of the method: We apologize for the typos in the description of the lower bound in Sec. 3. No target CT labels are used; the U-Net was trained on MR scans. Thanks for the catch! We will perform further proofreading to fix confusing sentences, unnecessary complexity, etc., and will simplify the implementation details to leave space for a more organized description of the methodology.

# Post-rebuttal Meta-Reviews

## Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have addressed most of the concerns raised by the reviewers and promise to provide more details on significance and novelty in the final version. Overall, this paper is a good benchmark for exploring UDA with SSL.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

From the reviews I get the impression that this is valuable work, obscured by unnecessary complexity and a suboptimal presentation. The reviewers pointed out some specific issues needing clarification, which the authors address in the rebuttal. Some large criticisms such as swapping source and target domains, using different training dataset sizes, and using different datasets are deferred to future work in the rebuttal.

Based on the initially relatively high scores, and the consensus that this is interesting and well-motivated work, I suggest acceptance despite the fact that not all the points in the rebuttal have been addressed. I hope the authors will take the time to improve clarity of the manuscript where pointed out by the reviewers.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This manuscript presents a deep semi-supervised learning framework for image segmentation, and it is able to deal with limited data annotation in the source domain. The proposed method seems to be effective in cross-modality cardiac image segmentation. The rebuttal has addressed most of the main concerns from reviewers, such as novelty of the method and experimental setup, and the authors say that they will improve the clarity of the paper.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7