
Authors

Spyridon Thermos, Xiao Liu, Alison O’Neil, Sotirios A. Tsaftaris

Abstract

Acquiring annotated data at scale with rare diseases or conditions remains a challenge. It would be extremely useful to have a method that controllably synthesizes images that can correct such underrepresentation. Assuming a proper latent representation, the idea of a “latent vector arithmetic” could offer the means of achieving such synthesis. A proper representation must encode the fidelity of the input data, preserve invariance and equivariance, and permit arithmetic operations. Motivated by the ability to disentangle images into spatial anatomy (tensor) factors and accompanying imaging (vector) representations, we propose a framework termed “disentangled anatomy arithmetic”, in which a generative model learns to combine anatomical factors of different input images such that when they are re-entangled with the desired imaging modality (e.g. MRI), plausible new cardiac images are created with the target characteristics. To encourage a realistic combination of anatomy factors after the arithmetic step, we propose a localized noise injection network that precedes the generator. Our model is used to generate realistic images, pathology labels, and segmentation masks that are used to augment the existing datasets and subsequently improve post-hoc classification and segmentation tasks. Code is publicly available at https://github.com/vios-s/DAA-GAN.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_15

SharedIt: https://rdcu.be/cyl3R

Link to the code repository

https://github.com/vios-s/DAA-GAN

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The paper proposes a novel method for cardiac image synthesis based on the notion of disentangled arithmetic. As a preprocessing step, images are encoded into disentangled features that represent the anatomy of the heart and the image characteristics. Feature arithmetic is performed on the anatomical factors, Gaussian noise is then introduced to them, and a generator composes the final image. A discriminator and a pathology classifier are used to aid training.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Interesting and to the best of my knowledge novel framework
- Very well analyzed in terms of experiments, ablations and comparisons to other existing methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Not so much weaknesses as questions that arose.
- The anatomical feature arithmetic: how is it conducted? Is it simple arithmetic on the feature vectors of the anatomy? If so, I take it that a Euclidean underlying manifold is assumed. My concern is that in practice this assumption does not always hold; see Latent space oddity, Arvanitidis et al 2017. It would be a good idea to analyze the implicit curvature of the empirical manifold that the preprocessing method creates. This way the arithmetic would be far more accurate and theoretically sound.
- In general, the concept of latent space arithmetic could be better explained in the paper.
- Technically speaking, the Gaussian noise is not injected via residual connections, since residuals require the same input to be added to the output of the layer(s).
- AdaIN layer: please write the name in full the first time it is introduced in the text.
- At the top of page 5, it is mentioned that C is of the same dimension as image I, which means they are 224x224. How does this affect the computational performance of the algorithm? How does it scale as images scale?
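For readers unfamiliar with the abbreviation raised in the review above: AdaIN stands for adaptive instance normalization. Each feature channel is normalized by its own spatial mean and standard deviation and then rescaled and shifted by externally supplied parameters. The following is only a minimal NumPy sketch of that standard operation; the function name, argument names, and shapes are illustrative assumptions and are not taken from the DAA-GAN code.

```python
import numpy as np

def adain(content, style_scale, style_bias, eps=1e-5):
    """Adaptive instance normalization for one sample.

    content:     feature map of shape (C, H, W)
    style_scale: per-channel scale of shape (C,)  (assumed to come from a style/imaging vector)
    style_bias:  per-channel shift of shape (C,)  (same assumption)
    """
    # Per-channel statistics over the spatial dimensions.
    mean = content.mean(axis=(1, 2), keepdims=True)
    std = content.std(axis=(1, 2), keepdims=True)
    normalized = (content - mean) / (std + eps)
    # Re-style the normalized features channel by channel.
    return style_scale[:, None, None] * normalized + style_bias[:, None, None]

rng = np.random.default_rng(0)
content = rng.standard_normal((4, 16, 16))   # hypothetical feature map
scale = np.array([1.0, 2.0, 0.5, 3.0])       # hypothetical style parameters
bias = np.array([0.0, 1.0, -1.0, 0.5])
stylized = adain(content, scale, bias)
```

After the call, each output channel has approximately the requested mean and standard deviation, which is how a low-dimensional vector can control the appearance of a spatial feature map.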
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The main paper together with the supplement appears to provide the information required for reproducing this paper. If the authors improve the description of the latent arithmetic, I can see no issue with reproducing it.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Mainly covered by my comments above.
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a solid paper. It proposes a novel method that is evaluated very well and compared with the related literature both in experiments and in theory. I’ve noted a series of minor points that could be addressed to improve it even further, but I’m more than happy with the current quality of the paper, hence my acceptance rating.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

The authors propose a novel framework that can generate realistic cardiac images with a certain level of controllability over anatomical factors for data augmentation. First, disentangled anatomical representations are learnt with SDNet. Next, based on these controllable factors, a series of operations is designed to generate realistic images for augmentation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

+ The proposed method addresses an important problem, data augmentation for medical images with controllable anatomical factors, which may be very interesting to the MICCAI community.
+ The text is well written and easy to follow.
+ Experiments are good, with multiple runs and statistical tests. Several important aspects of data augmentation are discussed in the results.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

- If I understand correctly, the disentanglement and the semantic meaning of the anatomical factors from SDNet are not guaranteed. For example, you cannot force the model to learn the factor of the “left ventricle” in one specific channel. Did the authors encounter situations where no good factors were found for swapping? How did the authors choose good anatomical factors from SDNet? By eyeballing or by using some metrics?
- I have some concerns about applying the loss of the Pathology Classifier. When the proposed networks are optimized with this loss, does it encourage the network to generate images that are easier to classify? This somewhat goes against the goal of data augmentation and may introduce bias when evaluating.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Method: clear. Dataset: open dataset and clear. Evaluation: clear.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

SDNet also infers a modality representation along with the anatomical representations. I might have missed it, but it seems that the proposed network doesn’t use it. Why?
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Very good paper aiming at an important problem. Novel and interesting methods. No obvious flaws.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

This paper tackles an important problem in medical imaging: generating scarce samples to facilitate downstream tasks (e.g., classification). The proposed DAA-GAN model generates new images in a target domain (usually with pathologies) in a controllable way by first disentangling anatomical and imaging factors, then manipulating the anatomical factors by plugging in pathological data, and finally re-entangling these factors using an image generator. Evaluations include both classification and segmentation tasks using a variety of cardiac datasets. Results are superior to comparison methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Comprehensive evaluation. Two cardiac datasets (ACDC and MM) with a fairly large number of subjects were used in evaluation. Quantitative comparisons including AC-GAN and SPADE show promising results after using the proposed data augmentation.
- Well established methodology. The paper combines several existing methods in the literature (mainly SDNet and StyleGAN). The use of injected noise to handle topology issues after label arithmetic is novel.
- Ablation study to show the contribution of each network component.
- Good reproducibility. The authors have provided essential experimental and implementation details. Datasets used in evaluation are publicly available.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The proposed method depends on the disentangling outputs from another model, SDNet, to obtain anatomical and imaging representations. However, SDNet is a “synthesis-segmentation” hybrid model, and it requires training labels. It is unclear how many training samples are needed to train an SDNet to produce reliable anatomy labels for DAA-GAN.
- The fact that multiple labels from multiple populations can be combined freely (after registration) is intriguing. However, this could also be a limitation of the work, considering that some combinations may never be observed in real data (wrong topology), despite the noise injection. I guess this also explains why the anatomy in Fig. 4 can only be changed by a limited amount. Therefore, there is a risk of generating out-of-real-distribution data that could bias the downstream tasks. Moreover, the biggest assumption in this paper is that replacing a healthy label with a pathology label could generate an image with the corresponding disease. This assumption is questionable.
- One missing element in the evaluation is to quantitatively evaluate (pix-to-pix) the other regions, i.e., 1-M regions. It is necessary to show that other anatomies are preserved after manipulating anatomical factors of the ROI.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Good. The authors have provided essential experimental and implementation details. Datasets used in evaluation are publicly available. Code is available upon acceptance. A minor concern is that the proposed DAA-GAN depends on the outputs of SDNet. The fact that two deep learning models need to be trained for data augmentation could potentially limit the re-implementation of the proposed method.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
Some comments for clarity:
- Figure 1 top row was hard to understand when first referenced in the Introduction. A more detailed caption might help.
- Figure 2 notation \mathcal{L}_{att} was not defined or was mislabeled.
Several suggestions for future work:
- Quantitative evaluations on the other regions—1-M regions—are necessary. It is really important to show that other anatomies are preserved after manipulating anatomical factors within the ROI. A suggested reference on hallucination: Cohen et al. Distribution matching losses can hallucinate features in medical image translation. MICCAI 2018.
- The effects of the noise injection network can be further studied, as this is one of the core innovations of the paper. For example, after the model is trained, what would the image look like if the std of the Gaussian noise is increased? A suggested reference: Zuo et al. Synthesizing Realistic Brain MR Images with Noise Control. MICCAI-SASHIMI 2020.
- How to extend this work to a large number of labels and control topology in the augmented images?
- Extension to other anatomies, e.g., brain MRIs.
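The 1-M evaluation suggested in the list above could be sketched as follows. This is only an illustration under stated assumptions: M is taken to be a binary ROI mask with the same shape as the image, the error measure (mean absolute error) is one possible choice among many, and the function name `outside_roi_mae` is hypothetical rather than something defined in the paper.

```python
import numpy as np

def outside_roi_mae(original, generated, roi_mask):
    """Mean absolute pixel error restricted to the (1 - M) region."""
    outside = 1.0 - roi_mask.astype(np.float64)   # 1 where the anatomy should be untouched
    n_outside = outside.sum()
    if n_outside == 0:
        raise ValueError("ROI mask covers the whole image")
    diff = np.abs(original.astype(np.float64) - generated.astype(np.float64))
    return float((diff * outside).sum() / n_outside)

# Toy 4x4 example with a 2x2 ROI in the centre.
original = np.zeros((4, 4))
roi = np.zeros((4, 4))
roi[1:3, 1:3] = 1.0

generated = original.copy()
generated[1:3, 1:3] = 1.0            # change only inside the ROI
error_inside_only = outside_roi_mae(original, generated, roi)

leaked = generated.copy()
leaked[0, 0] = 0.6                   # one pixel altered outside the ROI
error_with_leak = outside_roi_mae(original, leaked, roi)
```

A value near zero would indicate that the anatomy outside the manipulated region was preserved, while larger values would flag changes leaking beyond the ROI.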
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Despite the limitations pointed out above, this research overall is very interesting. The paper is well-written. The novelty of including injected Gaussian noise to tackle label overlapping issues and the completeness of the evaluations are the key factors that make this work stand out from other submissions.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All the reviewers indicate “Accept” for the paper, although they also raise concerns. From my reading of the paper, I agree that there are some concerns regarding the description of the latent arithmetic and the loss from the pathology classifier. While the paper has merits, I believe the authors should clarify these points in the final version.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Author Feedback

We thank the reviewers and the meta-reviewer for their valuable comments. Brief responses and clarifications to the most critical ones follow. Straightforward comments will nevertheless be addressed in the camera-ready version.

R1: Rather than vectors, our arithmetic is based on spatial (tensor) factors and is realized as a swapping operation between factors of different patients. However, building on the reviewer’s comment, in future work we plan to learn an explorable manifold based on spatial factors, thus enabling arithmetic operations between factor values. Regarding the resolution of the input anatomical factors, the current version of DAA-GAN requires the preservation of the original image (i.e. patient’s MRI slice) resolution, thus the computational complexity of the model scales linearly with the image size. However, our DAA can be combined with any anatomy-conditioned generator architecture that may receive down-sampled spatial latent variables and use up-sampling layers to generate high-fidelity images.
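The swapping operation described in the response to R1 can be illustrated with a short NumPy sketch. It is not the released implementation: the function name, the number of channels (8), and the random binary factors are assumptions made purely for illustration, and the two factor tensors are assumed to be spatially aligned.

```python
import numpy as np

def swap_anatomy_factor(factors_a, factors_b, channel):
    """Return a copy of factors_a whose `channel` is taken from factors_b.

    Both inputs are assumed to have shape (C, H, W), one spatial map per
    anatomical factor, at the resolution of the original image.
    """
    if factors_a.shape != factors_b.shape:
        raise ValueError("factor tensors must share the same shape")
    mixed = factors_a.copy()
    mixed[channel] = factors_b[channel]   # the "arithmetic" step is a channel swap
    return mixed

rng = np.random.default_rng(0)
# Two hypothetical subjects with 8 binary anatomy channels at 224x224.
subject_a = (rng.random((8, 224, 224)) > 0.5).astype(np.float32)
subject_b = (rng.random((8, 224, 224)) > 0.5).astype(np.float32)
mixed = swap_anatomy_factor(subject_a, subject_b, channel=2)
```

After such a swap, some pixels may be claimed by more than one channel or by none, which appears to be the kind of inconsistency the noise injection network is meant to smooth over before generation.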

R2: We agree that DAA-GAN is limited by the “correctness” of the input, hence the ability of SDNet to effectively disentangle the anatomical factors. As a matter of fact, SDNet learns to encode specific factors to specific channels, and since the factors are spatial and semantic (due to the supervision with semantic masks), they are interpretable. For DAA-GAN training/testing we used factors generated by samples with high Dice score (i.e. high segmentation performance). We pretrained SDNet in a fully supervised setting, without the discriminator. Regarding the pathology classification loss, we agree with the comment that the classifier encourages “easy-to-classify” pathology generation. However, we believe that this effect is mitigated by the discriminator, which encourages in-distribution sample generation. In fact, we choose to trade intra-pathology variance (e.g. severity, as a very severe pathology is easier to classify) for introducing inter-pathology variance through the factor mixing process. Nevertheless, this observation is very interesting, and we plan to further explore the impact of the classifier in future work. Regarding the imaging factors, since SDNet encodes them in a vector latent variable using a variational setup, DAA-GAN can either sample imaging information from the learned distribution or use the extracted imaging vector of the corresponding input. In this work we follow the latter approach.
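The two options for the imaging vector mentioned at the end of the response to R2 can be contrasted in a small sketch. It assumes, as is common in variational setups but not confirmed here, that the encoder outputs the mean and log-variance of a diagonal Gaussian; the function name, the 8-dimensional vector, and its values are hypothetical.

```python
import numpy as np

def imaging_vector(mu, logvar, sample=False, rng=None):
    """Pick the imaging (modality) vector that conditions the generator.

    sample=False: use the extracted vector of the input (here taken to be the mean).
    sample=True:  draw from the Gaussian via the reparameterization z = mu + sigma * eps.
    """
    if not sample:
        return mu.copy()
    if rng is None:
        rng = np.random.default_rng()
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

# Hypothetical 8-dimensional encoder output for one image.
mu = np.array([0.1, -0.3, 0.7, 0.0, 1.2, -0.8, 0.4, 0.05])
logvar = np.full(8, -2.0)

z_extracted = imaging_vector(mu, logvar)   # the option the authors say they follow
z_sampled = imaging_vector(mu, logvar, sample=True, rng=np.random.default_rng(0))
```

Sampling from a prior instead of the per-image distribution would amount to replacing `mu` and `logvar` with zeros in this sketch.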

R3: Regarding the limitations introduced by freely combining labels from multiple populations, we believe that realistic medical image generation is an open challenge that is far from being addressed solely by quantitative evaluation, i.e. a qualitative assessment from experts/clinicians is always required. However, our pathology classifier and our discriminator play an adversarial game with the generator, as the former encourages easy-to-classify pathologies and the latter penalizes out-of-distribution samples. We believe that this game, coupled with the mixing of existing (and not sampled from a Gaussian prior) anatomical factors, is an important step towards realistic medical image synthesis.


Controllable cardiac synthesis via disentangled anatomy arithmetic