Authors

Euijin Jung, Miguel Luna, Sang Hyun Park

Abstract

Conditional Generative Adversarial Networks (cGANs) are a set of methods able to synthesize images that match a given condition. However, existing models designed for natural images are impractical to generate high-quality 3D medical images due to large memory requirements. To address this issue, most cGAN models used in the medical field process either 2D slices or small 3D crops and join them together in subsequent steps to reconstruct the full size 3D image. However, these approaches often cause spatial inconsistencies in adjacent slices or crops and the changes specified by the target condition may not consider the 3D image as a whole. To address these problems, we propose a novel cGAN that can synthesize high-quality 3D MR images at different stages of the Alzheimer’s disease (AD). First, our method generates a sequence of 2D slices using an attention based 2D generator with a disease condition to keep the computational requirements low, and then 3D space consistency is enforced by the use of a set of 2D and 3D discriminators. The key motivation for the use of a 3D discriminator is to generate continuous sequences of 2D slices in the same mini-batch. Moreover, we propose an adaptive identity loss to properly transform features which are relevant to the target condition. Our experiments show that the proposed method can generate smooth and realistic 3D images at different stages of AD and the image change with respect to the condition is better than the images generated by existing GAN based methods.
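The slice-then-stack idea in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_generator` is a hypothetical placeholder rather than the paper's attention-based generator, and `z_inconsistency` is a hand-written score standing in for the through-plane signal a learned 3D discriminator might pick up.

```python
from typing import Callable, List

Slice = List[List[float]]   # one 2D slice: rows x cols
Volume = List[Slice]        # stacked slices: depth x rows x cols


def generate_volume(slices: Volume, generator_2d: Callable[[Slice], Slice]) -> Volume:
    """Apply a 2D generator to each slice independently, then stack along z."""
    return [generator_2d(s) for s in slices]


def z_inconsistency(volume: Volume) -> float:
    """Mean absolute intensity difference between adjacent slices.

    Hypothetical stand-in for a learned 3D discriminator, not the paper's model.
    """
    total, count = 0.0, 0
    for lower, upper in zip(volume, volume[1:]):
        for row_l, row_u in zip(lower, upper):
            for a, b in zip(row_l, row_u):
                total += abs(a - b)
                count += 1
    return total / count if count else 0.0


def toy_generator(s: Slice) -> Slice:
    """Hypothetical deterministic 'generator': scales intensities by 0.9."""
    return [[0.9 * v for v in row] for row in s]


# Four consecutive 2x2 slices whose intensity rises by 1.0 per slice,
# processed together as one mini-batch.
batch = [[[float(z)] * 2 for _ in range(2)] for z in range(4)]
fake_volume = generate_volume(batch, toy_generator)
score = z_inconsistency(fake_volume)
```

Because the slices in a mini-batch are consecutive, the stacked output is a valid sub-volume, so a volume-level critic can be applied even though the generator only ever sees one slice at a time.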

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87231-1_31

SharedIt: https://rdcu.be/cyhVe

Link to the code repository

https://github.com/EuijinMisp/ADESyn

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The paper presents a method to generate realistic synthetic MRI images for a given stage of Alzheimer’s disease. It is based on a 2D conditional GAN with a 3D consistency discriminator.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper proposes to enforce 3D consistency by adding an additional discriminator that is in charge of classifying consecutive 2D slices as real or fake. This is an interesting idea, since most 2D GANs suffer from inconsistency along the z-axis. The authors also propose to use an attention mechanism, which seems relevant for the task. Finally, the conditional input (AD stage) seems to interpolate smoothly, which is a very desirable feature.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The reproducibility of the paper is a bit weak. Although most of the architecture details and parameters are given, some are missing, e.g., kernel sizes, padding, and strides. Some ablation studies are also missing; these could determine the impact of individual modules, such as attention vs. no attention or 3D discriminator vs. no 3D discriminator. No statistical significance tests were performed, even though this item is checked in the reproducibility checklist.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility of the paper is a bit weak. Although most of the architecture details and parameters are given, some are missing, e.g., kernel sizes, padding, and strides. The authors do not commit to sharing code or models, which makes an exact reproduction of the results impossible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The paper presents very good ideas and is very clear. One improvement would be to add the variable names used in the paper to Fig. 1, e.g., x2d at the input, s2d at the output of the generator, and so on. An analysis of the attention maps could also be interesting, e.g., what is the model paying attention to? Finally, some ablation studies could improve the paper, e.g., what happens if the attention module or the 3D discriminator is removed?
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper presents very interesting ideas and the results seem very promising. The idea of using a 3D coherency discriminator seems very relevant and could potentially be used in a wide variety of tasks.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

The paper proposes a conditional GAN that generates brain images at different stages of Alzheimer’s disease neurodegeneration. It uses a 2D generator and enforces 3D spatial consistency via 2D and 3D discriminators.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well-written and the approach is easy to follow
- The paper is well-structured and the method is simple and intuitive.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The paper is missing a crucial reference from MICCAI: Degenerative Adversarial NeuroImage Nets: Generating Images that Mimic Disease Progression (Ravi et al. MICCAI 2019), which performs almost the same task on the same ADNI dataset. This paper should be discussed in the introduction and used as a baseline in Table 1.
- “From Set A we generated 1203 synthetic subjects to form Set A’ across all three target conditions (NC, MCI, and AD)”: how is this done?
- The paper claims: “but previous methods often generated partially shrunk gray matter or hippocampus.” with no citations. Which methods?
- Results Table 1: i) It would be useful to have a comparison of results fixed at a single stage of neurodegeneration. ii) The method outperforms “Real images” in FID and GANtest. What does this mean? iii) No statistical analysis. iv) As the paper claims to perform “3D Medical Image Generation” (title), why is the evaluation not performed in 3D (e.g., by calculating FID on 3D patches)?
- The paper claims “existing models designed for natural images are impractical to generate high-quality 3D medical images due to large memory requirements.” (abstract and intro). No citations are provided and this is not discussed in the methods. Concatenating 2D slices within a minibatch is the same (memory-wise) as a single 3D patch with batch size 1, so it is unclear to me where the memory is being saved.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- Missing some training details, such as batch size and how weights are initialized.
- The per-layer channel counts shown in Figure 1 are too small to read. I suggest the authors put the network details in the supplementary materials.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Please see suggestions in section 4,6.
- Mention that lower FID scores are better.
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well-written and the approach is simple and intuitive. However, the paper ignores a crucial reference from MICCAI 2019, which performs an almost identical task on the same ADNI dataset. Furthermore, some of the claims the paper makes are not fully justified.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

The paper proposed a framework on top of cGAN to enable Alzheimer’s Disease stage synthesis. The major novelty of this paper is adding a 3D discriminator for better spatial consistency.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The experiments are thorough, and the qualitative and quantitative results are better than those of the compared methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The novelty is relatively limited. From my understanding, only the 3D discriminator is novel.
2. The use of the attention mask needs to be justified. What does this attention mask mean? Why is it better to disentangle an attention mask from the generated image and then combine the two? Why can this attention mask also be used as weights in the loss? The current description of the attention mask does not make sense to me.
3. The writing of this paper needs improvement, and the motivation of the method needs to be justified. The introduction mainly discusses the spatial inconsistency caused by 2D discriminators and the computational cost of 3D discriminators, but the experiments mainly demonstrate the ability to synthesize images of a given disease stage.
4. Errors in Equations 12 and 13. The adversarial loss should not have the same sign as in Equation 11.
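For readers without the paper at hand, the sign issue can be illustrated with the textbook conditional GAN objective. This is a generic sketch, not a reproduction of the paper's Equations 11 to 13 (which are not shown on this page and may use a different adversarial formulation); the symbols G, D, x, and c are assumed here to denote the generator, discriminator, input image, and target condition.

```latex
% Generic minimax adversarial term (assumed form, not the paper's equation)
\mathcal{L}_{adv} = \mathbb{E}_{x}\left[\log D(x)\right]
                  + \mathbb{E}_{x,c}\left[\log\left(1 - D(G(x, c))\right)\right]

% When both networks are written as minimization problems,
% the adversarial term enters with opposite signs:
\mathcal{L}_{D} = -\mathcal{L}_{adv} + (\text{other discriminator terms}),
\qquad
\mathcal{L}_{G} = +\mathcal{L}_{adv} + (\text{other generator terms})
```

In this generic form the discriminator maximizes the adversarial term while the generator minimizes it, which is why the term cannot carry the same sign in both total objectives.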
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

public dataset, no code.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. Further justify the use of the attention mask.
2. Revise the writing so that the method and experiments are consistent with the motivation. If the motivation is improving spatial consistency, an ablation study that removes the 3D discriminator should be added.
3. Revise the final objective functions.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The reasoning behind the method should be further justified.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

I recommend this paper for rebuttal due to the high variance in the reviewer recommendations and to give the authors the opportunity to address some of the points.

The reviewers comment positively on the writing and on the idea of using a discriminator to ensure 3D consistency. Common points raised by the reviewers are missing details for reproducibility, the lack of statistical analysis of results (despite the significance tests box having been checked), and the lack of some references.

In particular R2 points out a crucial reference by Ravi et al., that in their opinion should have been mentioned and used as baseline. Furthermore, R2 raises an interesting question about the motivation of memory savings given that stacking 2D images on GPU is the same as using a 3D image in the first place.

Please make sure to address the above points in particular in the rebuttal.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

Author Feedback

We thank the reviewers.

(R3) “(Ravi et al.) should be discussed and used as a baseline” We agree with this concern, but the method was hard to implement because it (i) includes several steps with missing details, (ii) does not provide a way to generate a whole 3D MRI (entirely using 2D slices), and (iii) requires large MRI data acquired across different age groups to compute the proposed losses. Instead, we compared our method with an autoencoder-based method (CAAE) and obtained significantly better performance. Ravi et al. mainly focused on 2D MRI generation with novel losses, while we propose a novel architecture for generating realistic 3D MRI. It would therefore be interesting to combine both methods in future work.

(R3) “Concatenating 2D slices within a minibatch is the same as a single 3D patch with batch size 1” This is correct if the 3D CNN kernel has a size of 1 along the z-axis, but that case is equivalent to a 2D CNN. If full 3D convolutions are used, the number of parameters and the size of the intermediate feature maps increase substantially. Our method has the advantage of reducing both.
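The two sides of this exchange can be made concrete with a small count. The sketch below uses hypothetical layer sizes (64 channels, 3x3 kernels, 32 slices of 128x128) that are not taken from the paper; it only shows how per-layer weight counts and output tensor sizes compare under those assumptions.

```python
from math import prod


def conv_params(kernel, c_in, c_out, bias=True):
    """Number of learnable weights in one convolution layer."""
    return prod(kernel) * c_in * c_out + (c_out if bias else 0)


def activations(batch, channels, spatial):
    """Number of values in one output feature-map tensor."""
    return batch * channels * prod(spatial)


C_IN, C_OUT = 64, 64  # hypothetical channel widths, not from the paper

params_2d = conv_params((3, 3), C_IN, C_OUT)            # 2D convolution
params_3d_flat = conv_params((1, 3, 3), C_IN, C_OUT)    # 3D kernel, z-size 1
params_3d_full = conv_params((3, 3, 3), C_IN, C_OUT)    # full 3D kernel

# 32 slices of 128x128 in one mini-batch vs. one 32x128x128 volume,
# assuming the same channel width and no downsampling in either case.
acts_2d_batch = activations(32, C_OUT, (128, 128))
acts_3d_single = activations(1, C_OUT, (32, 128, 128))
```

Under these assumptions a 3D kernel with z-size 1 has exactly as many weights as the 2D kernel, a full 3x3x3 kernel has roughly three times as many, and the output tensors hold the same number of values when channel widths match, which is the point the reviewer raised.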

(R1, 3, 4) “Lack of statistical analysis” For FID and KID scores, our method had statistically significant differences (p-value < 0.05) compared to CAAE, AttGAN, and StarGAN. Interestingly, there were no significant differences between our method and GANimation. However, GANimation’s performance on GANtrain and GANtest was lower, implying poorer conditional transformations.

(R1, 3, 4) “Missing references and details for reproducibility” We will add the missing references and release our code.

(R3) “How are 1203 synthetic subjects generated?” Set A’ of 1203 subjects was generated by applying 3 target conditions (NC, MCI, AD) to set A of 401 real subjects.
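The count follows directly from pairing every real subject with every target condition (401 x 3 = 1203). A minimal sketch, with placeholder subject IDs that are purely hypothetical:

```python
from itertools import product

CONDITIONS = ("NC", "MCI", "AD")

# Placeholder IDs for the 401 real subjects in Set A (hypothetical naming).
real_subject_ids = [f"subj_{i:03d}" for i in range(401)]

# One synthesis job per (real subject, target condition) pair.
synthetic_jobs = list(product(real_subject_ids, CONDITIONS))
```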

(R3) “Comparison at a single stage of neurodegeneration” The FID and KID results on a single class were similar to those in Table 1.

(R3) “Outperforming “Real images” in FID and GANtest” FID and GAN-test are neural network-based metrics that measure distribution difference and cannot quantify the difference levels exactly. Thus, the measurements between a set of real images and a set of high-quality generated images can be better than those between real images.
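One way to see why a "real vs. real" baseline is not zero is to compute a Fréchet distance between two finite samples drawn from the same distribution. The sketch below is a deliberately simplified one-dimensional version (Gaussian fit on raw scalars); the FID used in the paper operates on deep network features with full covariance matrices, so this only illustrates the finite-sample effect, not the actual metric.

```python
import random
from statistics import fmean, pstdev


def frechet_1d(a, b):
    """Squared Fréchet distance between 1D Gaussians fitted to two samples."""
    return (fmean(a) - fmean(b)) ** 2 + (pstdev(a) - pstdev(b)) ** 2


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Two disjoint "real" sample sets drawn from the same distribution.
real_a = [rng.gauss(0.0, 1.0) for _ in range(200)]
real_b = [rng.gauss(0.0, 1.0) for _ in range(200)]

same_set = frechet_1d(real_a, real_a)        # identical sets
two_real_sets = frechet_1d(real_a, real_b)   # same distribution, finite samples
```

Since the baseline between two real sets is small but positive, a generated set can land below it without that implying the generated images are "more real" than real ones.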

(R3) “Evaluation was not performed in 3D” FID and KID exhibit some bias with fewer samples ([2] Bińkowski et al. ICLR 2018). Thus, the evaluation along each of the 3 axes was more reasonable.

(R4) “Only 3D discriminator is novel” The idea might look simple, but we validated lots of settings and provided the optimal model design and losses. We discovered that good smoothness between slices could be achieved with the combination of a 2D generator and a 3D discriminator, and the attention-based generator was effective to constrain brain changes according to a given condition. Also, the proposed adaptive identity loss was a core factor to improve the image quality and conditional generation. We believe the proposed method can be easily adapted to existing 2D GAN-based methods.

(R4) “Attention mask need to be justified.” As AD progresses, the hippocampus and amygdala volumes shrink significantly, while some regions (e.g., the pallidum) do not change much. The attention mask allows the generator to modify the areas of the brain that need to change while keeping other areas intact. Thus, the generator does not need to learn how to generate the whole image, which reduces its complexity and allows it to focus on changes in the areas of interest.
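A common way to realize this kind of masking in attention-based generators is a per-pixel blend of the input and the generated content. The sketch below assumes that formulation and the convention that a mask value of 1 means "use the generated pixel"; the paper's exact composition rule is not given on this page and may differ.

```python
def blend(inp, generated, mask):
    """Per-pixel blend: mask=1 takes the generated value, mask=0 keeps the input.

    Assumed convention for illustration only.
    """
    return [
        [m * g + (1.0 - m) * x for x, g, m in zip(row_x, row_g, row_m)]
        for row_x, row_g, row_m in zip(inp, generated, mask)
    ]


input_slice = [[1.0, 1.0], [1.0, 1.0]]   # toy input intensities
generated = [[0.0, 0.0], [0.0, 0.0]]     # toy generator output
attention = [[1.0, 0.0], [0.5, 0.0]]     # toy mask: change top-left, half-change bottom-left

output = blend(input_slice, generated, attention)
```

Pixels where the mask is zero pass through unchanged, so the generator only has to produce plausible content in the regions the mask selects.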

(R4) “Inconsistency of introduction and demonstration in experiments” In the introduction, we presented that the goal was to synthesize high-quality 3D MRI according to a given condition while keeping computational requirements low. We developed a model based on a 2D generator to meet the lower computational costs, validated the image quality by FID and KID, and evaluated the conditional generation by GANtrain and GANtest in experiments. We believe it is consistent.

(R4) “Errors in Eq. 12, 13” We will correct them.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal answered most points raised by the reviewers. The discussion converges on two questions: whether the 3D discriminator can be considered novel, and the statement, contested by R3 (under Q4), that it is impractical to generate high-dimensional 3D data using GANs. After my own reading of the paper, I agree with R3, since 3D ADNI images have in fact already been generated using a 3D GAN in prior work. For example, [A] (not cited in the paper) investigated generating AD images from MCI images for full brain MRs with sizes of 128x160x112 voxels. The question boils down to the resolution/image sizes used in this work, an important detail unfortunately not reported anywhere in the paper. Furthermore, it is a bit unintuitive how and why the 3D discriminator works. Naively one would think that the independent 2D slice generations would have no way of integrating the “teaching” signal from the 3D discriminator. There is no mechanism for the 2D generator to “remember” or look at what it did for the other slices. I have come to the conclusion that since the slice generation is deterministic, it may be possible for the 2D generator to learn to always do a particular thing such that it is consistent without knowing why. But I would have liked to see this point better discussed in the paper.

Despite a slightly misleading motivation, the missing related work [A] (which should be included in the final manuscript), and an insufficient discussion of the intuition behind the approach, I like the idea of a 3D discriminator and believe it could open the door to better high-resolution 3D volume generation using GANs.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

11

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper presented promising results in generating 3D MR images with the proposed 3D discriminator, but the novelty is limited, and improving 3D consistency by stacking 2D images and then applying a 3D discriminator is not very elegant. There could be more efficient methods for reducing the computational cost. Since the authors addressed most of the concerns in the rebuttal, I tend to recommend accepting the paper. If accepted, the authors should incorporate all discussion from the rebuttal into the final version.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

11

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposed a GAN model with a 2D generator and a 3D discriminator in order to reduce memory usage (2D slice generation) while keeping spatial consistency across slices (3D discriminator). The main technical contribution is this 2D-3D arrangement, which is incremental and, more importantly, not well supported by the experimental validation. The authors claimed advantages over other 2D and patch-based 3D GANs; however, the method was not compared with any 3D GANs for medical image synthesis, in either performance or computational cost. The four methods ([27, 8, 5, 18]) used in the comparison were originally designed for general-purpose image translation and do not represent the up-to-date achievements in this field. The paper lacks comparisons with many medical image synthesis methods, not just the one mentioned by Reviewer#3.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

19

Conditional GAN with an Attention-based Generator and a 3D Discriminator for 3D Medical Image Generation