
# Authors

Dawood Al Chanti, Diana Mateus

# Abstract

This paper addresses the domain shift problem for segmentation. As a solution, we propose OLVA, a novel and lightweight unsupervised domain adaptation method based on a Variational Auto-Encoder (VAE) and optimal transport (OT) theory. Thanks to the VAE, our model learns a shared cross-domain latent space that follows a normal distribution, which reduces the domain shift. To guarantee valid segmentations, our shared latent space is designed to model the shape rather than the intensity variations. We further rely on an OT loss to match and align the remaining discrepancy between the two domains in the latent space. We demonstrate OLVA’s effectiveness for the segmentation of multiple cardiac structures on the public Multi-Modality Whole Heart Segmentation (MM-WHS) dataset, where the source domain consists of annotated 3D MR images and the unlabelled target domain of 3D CTs. Our results show remarkable improvements with an additional margin of 12.5% dice score over concurrent generative training approaches.

SharedIt: https://rdcu.be/cyl38

# Reviews

### Review #1

• Please describe the contribution of the paper

In this paper, the authors present a very novel approach to unsupervised domain adaptation for image segmentation. The approach uses a VAE to generate cross-domain latent space representations and a loss term based on optimal transport theory to make sure the representations are aligned. Experiments performed using the Multi-Modality Whole Heart Segmentation dataset (with cardiac MR and CT images) show that the proposed technique considerably outperforms the competition at the task at hand despite being rather simple and lightweight.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is generally clear and very well-written.
2. The proposed approach is very interesting and novel: the introduction of a loss term based on optimal transport theory is quite inspiring.
3. The approach is rather simple and lightweight: the overall structure of the neural network is surprisingly simple and the number of parameters small.
4. Literature review on the subject is ample and well-reported.
5. The methodology is explained in quite some detail, especially considering the space limitations of a MICCAI paper.
6. The experimental set-up is sound and extensive, with many (6+) state-of-the-art methods implemented as competitors.
7. The results are impressive and suggest that a substantial improvement (+12% Dice) over the state-of-the-art was achieved.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The authors spend quite some time discussing Ouyang et al. [17], but no direct comparison against their technique is performed.
2. The “Experimental Settings and Results” section is often unclear, with some of the competing methods/implementations barely described. In particular:
   ⁃ It is unclear how “oracle VAE”, “oracle U-Net” and “VAE-0” work and how they were implemented;
   ⁃ It is unclear how CycleGAN and AdaOutput were deployed for this task;
   ⁃ DECM-1 is mentioned only as an acronym and with no reference.
3. Minor issues:
   ⁃ Page 7, “while OLVA-8 achieves better DSC score 79%” is unclear as a sentence;
   ⁃ I would appreciate it if “SOA” were spelled out, at least the first time it is used.
• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors used a public dataset. No statements about releases of code are included, but details regarding the architecture of the implemented neural networks and training process are somewhat included. As a consequence, the degree of reproducibility is medium.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. This is a very interesting piece of work and quite an accomplishment in the field of unsupervised domain adaptation.
2. I believe that a better description of the competitors implemented in the experiments is needed to aid the reader. Also, Ouyang et al. [17] seems like a relevant contestant to include (at least in a future potential extension of this work).
3. A thorough analysis of the statistical significance of the achieved results is needed, at least for future extensions.

strong accept (9)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents a very novel and interesting approach for a difficult task like unsupervised domain adaptation. The method has the great advantage of being rather simple and lightweight. An extensive experimental set-up seems to indicate that the proposed approach considerably outperforms many state-of-the-art competitors.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #2

• Please describe the contribution of the paper

The paper proposes to align the latent variables of variational autoencoders using the Wasserstein distance. Empirical results on the MM-WHS dataset show superior performance when adapting from MR to CT with 16 training subjects.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The VAE-style end-to-end training with aligned Wasserstein metric is novel. (2) The proposed model achieves superior adaptation performance with 16 training subjects and competitive results with 1 training subject.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

(1) The writing of section 2.2 “Optimal Transport for Latent Vector Alignment” is rushed and not very clear. The Wasserstein distance has many variations and it is not clear how the authors reach Eq. 4. Moreover, the second half of Eq. 4 is not related to optimal transport at all and somehow the $\beta$ in Eq. 4 is dropped in Eq. 5. (2) The paper argues the proposed VAE approach overcomes the limitations of adversarial methods, but does not present any ablation study to examine alternatives of the optimal transport component of the proposed algorithm. For example, replacing the Wasserstein loss with a MMD or Wasserstein-GAN.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors described the hyperparameters in the paper, but it is not clear exactly how the EM-style alternating optimization was carried out.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

What do SOA and DECM stand for? Why are different baselines used for different subject numbers (16 and 4)?

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The methodological contribution of aligning the latent variables with optimal transport is novel, and the paper demonstrates superior performance on the MM-WHS dataset, although it lacks thorough ablation studies.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

5

• Reviewer confidence

Somewhat confident

### Review #3

• Please describe the contribution of the paper

This manuscript presented an unsupervised domain adaptation method to solve the domain shift problem for segmentation. Specifically, the authors utilized a VAE model and an optimal transport loss to remove the domain shift between the source domain and target domain - Cardiac CT and MRI images.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors evaluated their method on a public dataset and they achieved the best performance compared with other existing methods.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• The novelty is limited, because the VAE part follows their citation [7], while the OT loss follows [1]. Actually, I don’t think you are correctly using OT theory; please see the following comment.
• OT is a very common optimization problem that includes one transport matrix and one cost matrix. In your case, you haven’t clearly defined the transport matrix. In my view, it’s just a very common distance-based optimization solution, not an OT problem. In addition, the mathematical formulation is questionable. For example, in eq. (3), D denotes the distribution, but I don’t understand what “D(;;***)” is; how can you define a distribution function like that?
• The authors did not give enough detailed information about their network to help others reproduce their work. Besides, the authors should also report the number of parameters of the other methods when claiming that their network is lightweight.
• Please rate the clarity and organization of this paper

Poor

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors didn’t give enough information about their network setting. And the authors didn’t report the valuable comparisons to prove their method is lightweight.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• Can you emphasize the clinical motivation of unsupervised domain adaptation?
• Can you clearly point out which method is 95 million params, and also list others.
• Fig. 1 should be improved. Currently, it does not clearly convey what you want to say.
• In Related Works section, the authors listed four points to present the limitations about the existing methods, while the point (1), (3) and (4) were repeated. Please modify.
• I don’t think you correctly used OT to solve your problem, because you didn’t define the transport matrix while only defined a distance function. - It’s a very common distance-based optimization problem.
• Can you give the reason why you input three image slices but only utilize the middle one?
• Add another table to show all the methods’ parameter numbers.

reject (3)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
• The technical description and mathematical formulation are questionable, such as OT problem.
• The evaluation is not strong enough since they did not give enough experiment setup information.
• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

3

• Reviewer confidence

Confident but not absolutely certain

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The authors proposed an unsupervised domain adaptation approach, which employs a VAE to learn a domain-invariant representation. In addition, they use an optimal transport (OT)-based measure to align the latent vectors from the target and source domains. They evaluate the proposed strategy on a public cardiac image dataset, i.e., the MM-WHS dataset, and show promising improvements compared with other existing methods. Generally, the proposed method is novel, though a clearer explanation in the method section would be better. Besides, it is unclear to me why the authors did not present the same label structures for MR and CT in Fig. 1. Doing so would make it more intuitive to understand that “both modalities share the same label space”.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

# Author Feedback

We thank the reviewers for their time and encouraging comments. We take advantage of the rebuttal to clarify the Optimal Transport (OT) formulation and the experimental set-up.

Regarding the OT, we recall that we model the domain adaptation problem for multi-modal segmentation as the joint optimization of a Variational Autoencoder (VAE) and an Optimal Transport problem (see Sec. 2 for more details). This formulation, as we detail below, cannot be reduced to a “distance based optimization” as suggested by R3. While the VAE seeks a meaningful distribution representing each domain, the alignment of the latent vectors from the two domains is achieved by solving a relaxed OT problem (Kantorovich, 1942). The objective of this OT is to transfer, with minimum transportation cost, the probability measures from the source domain $\zeta^{s}$ to the probability measures in the target domain $\zeta^{t}$. To optimize this transfer we define the transportation matrix $\gamma$ (end of Sec. 2.2 and Eqs. 4 and 5), as well as the transportation cost $\mathcal{D}$ (cf. Eq. 6). We will mention these definitions earlier in the paper to ease the understanding of the OT problem.
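To illustrate the difference between a plain distance and a discrete Kantorovich problem with an explicit transport matrix, here is a minimal sketch. It is not the implementation from the paper: the function names, the squared Euclidean ground cost, and the restriction to equal-size point sets with uniform mass are all assumptions made for illustration. Under those assumptions an optimal plan exists among permutation matrices scaled by 1/n, so a brute-force search over permutations is exact (and only practical for very small n).

```python
from itertools import permutations


def squared_euclidean(u, v):
    """Ground cost between two latent vectors (an assumed choice, not the paper's Eq. 6)."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def exact_ot_uniform(source, target):
    """Discrete Kantorovich OT for two equal-size point sets with uniform mass 1/n.

    Hypothetical helper for illustration. Returns (gamma, total_cost), where
    gamma[i][j] is the mass transported from source[i] to target[j].
    """
    n = len(source)
    if n == 0 or n != len(target):
        raise ValueError("source and target must be non-empty and of equal size")

    # Cost matrix: pairwise ground cost between source and target vectors.
    cost = [[squared_euclidean(s, t) for t in target] for s in source]

    # Brute-force search over all one-to-one assignments.
    best_perm, best_cost = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n)) / n
        if total < best_cost:
            best_perm, best_cost = perm, total

    # Transport matrix: each source point sends its full mass 1/n to one target point.
    gamma = [[1.0 / n if best_perm[i] == j else 0.0 for j in range(n)]
             for i in range(n)]
    return gamma, best_cost


if __name__ == "__main__":
    src = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
    tgt = [(0.1, 1.0), (0.0, 0.1), (1.1, 0.0)]
    plan, value = exact_ot_uniform(src, tgt)
    print(plan, value)
```

The returned `gamma` has rows and columns that each sum to 1/n, which are the marginal constraints that distinguish an OT problem from simply evaluating a fixed pairwise distance. For realistic batch sizes a linear-programming or entropic solver would be used instead of enumeration.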

Regarding the transition from Eq. 3 to Eq. 4, given that the Wasserstein distance has many variations (R2): we mentioned in Section 2.2 that the transition results from adapting “the Kantorovich OT formulation to the discrete case as in Eq. (4)”. We will add references to the following works in our camera-ready version:

• OT problem: Kantorovich, Leonid V. “On the translocation of masses.” Journal of Mathematical Sciences 133.4 (2006): 1381-1382.
• Courty, Nicolas, et al. “Joint distribution optimal transportation for domain adaptation.” (2017).

With respect to the role of the second term in Eq. 4 (R2), we confirm that this term is not related to the OT problem but to the Kullback-Leibler divergence minimized for the VAE. In practice, this term is added to the cost function $\mathcal{D}$, which becomes a weighted combination, solved by alternating the optimization of the two terms as described in Section 2.3.
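For readers unfamiliar with the Kullback-Leibler term of a VAE, the sketch below shows the standard closed form for the divergence between a diagonal Gaussian posterior and a standard normal prior, together with one possible weighted combination with an OT cost. The function names, the `beta` parameter placement, and the additive form of the combination are assumptions for illustration; the exact weighting used in Eq. 4 of the paper is not reproduced here.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) in closed form.

    mu and log_var are sequences of equal length (one entry per latent dimension).
    """
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, log_var))


def combined_objective(ot_cost, kl_value, beta=1.0):
    """Hypothetical weighted sum of an OT alignment cost and a KL regularizer."""
    return ot_cost + beta * kl_value


if __name__ == "__main__":
    kl = kl_to_standard_normal([0.5, -0.2], [0.0, -1.0])
    print(combined_objective(ot_cost=0.01, kl_value=kl, beta=0.5))
```

The KL term vanishes exactly when the posterior equals the standard normal prior, so it pulls both domains toward the same latent distribution, while the OT term handles the remaining discrepancy between them.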

The drop of the $\beta$ term in Eq. (6) mentioned by R2 was a typo; thanks for pointing it out.

One important clarification on the experimental validation concerns the lack of comparison to Ouyang et al. (R1). In fact, we do compare against this work in Table 1, where it is called DECM. We will make this clear in the paper. We will also clarify the implementation details of the other methods used for comparison, as suggested by R1 and R2.

Finally, regarding the Meta-reviewer’s comment on why we did not present the same label structures for MR and CT in Fig. 1, our intention was to show that even if the label space is shared, no registration between the two modalities is required.