
# Authors

Uddeshya Upadhyay, Yanbei Chen, Tobias Hepp, Sergios Gatidis, Zeynep Akata

# Abstract

Image-to-image translation plays a vital role in tackling various medical imaging tasks such as attenuation correction, motion correction, undersampled reconstruction, and denoising. Generative adversarial networks have been shown to achieve the state-of-the-art in generating high fidelity images for these tasks. However, the state-of-the-art GAN-based frameworks do not estimate the uncertainty in the predictions made by the network that is essential for making informed medical decisions and subsequent revision by medical experts and has recently been shown to improve the performance and interpretability of the model. In this work, we propose an uncertainty-guided progressive learning scheme for image-to-image translation. By incorporating aleatoric uncertainty as attention maps for GANs trained in a progressive manner, we generate images of increasing fidelity progressively. We demonstrate the efficacy of our model on three challenging medical image translation tasks, including PET to CT translation, undersampled MRI reconstruction, and MRI motion artefact correction. Our model generalizes well in three different tasks and improves performance over state of the art under full-supervision and weak-supervision with limited data. Code is released here: https://github.com/ExplainableML/UncerGuidedI2I

SharedIt: https://rdcu.be/cyl4S

# Reviews

### Review #1

• Please describe the contribution of the paper

This paper proposes a progressive GAN for medical image translation that is guided by the uncertainty map. Each GAN takes in the output of previous GAN along with the domain image to predict a much more refined target image.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The general idea of progressive GAN refining the output of the previous GAN is sound.
2. The experiments showcase significant improvement over previous models for medical image translations.
3. The uncertainty map is interpretable and as the progression happens, the uncertainty decreases (which is expected).
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Instead of a GGD based uncertainty map, how would a residual based map perform? Residual would be the absolute difference in the predicted map and the target at a given stage of GAN. I agree that this would not encapsulate the uncertainty, but it would be interesting to see how residual based maps help.

2. Visual quality doesn’t transfer to performance on a downstream task in medical images. Hence, only providing generation metrics might not be sufficient to say that UP-GAN is actually helpful from a clinical aspect. It would be great if the author(s) can include at least one downstream task of some kind of classification or detection or segmentation that proves the clinical importance. I expect UP-GAN to perform well on such downstream tasks as well.

3. Was M=4 giving similar results as M=3 and hence the author(s) stuck to use M=3 for their experiments?

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The details of the GAN module, learning rate, optimizer etc are mentioned clearly. I hope the author(s) explicitly release the code, if accepted.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Given the clarity of the paper and novelty of the proposed method, I would like to vote for an accept of this paper. In addition to the method, the results also seem impressive and the idea of having an uncertainty map for progressive GANs is novel.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

In clinical domain, different imaging modalities provide complementary information for diagnosis while acquiring medical images is susceptible to various kinds of noise and modality-specific artefacts. Generative adversarial networks have been shown to achieve state-of-the-art performance in generating high fidelity images. However, existing I2I translation methods based on GAN have failed to estimate the uncertainty of predicted results. To address this issue, this paper proposed a generic end-to-end model that incorporates progressive learning scheme and aleatoric uncertainty guide for effective and reliable medical image translation. Results are promising.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

a) Authors proposed to incorporate the estimated aleatoric uncertainty maps as attention maps to enforce the model to focus on improving image quality in regions that are likely to be poorly synthesized. b) Different from the previous works that assume the pixelwise error to be independent and identically distributed, authors model underlying per-pixel residual distribution as non-identically distributed zero-mean generalized Gaussian distribution instead. c) Guided by the aleatoric uncertainty map, authors adopted the progressive learning scheme for the further enhancement of synthetic images. d) This paper applied an adaptive fidelity loss function along with the original adversarial loss, to supervise the network training. e) Evaluated on three challenging medical image translation tasks, the experimental results indicate that the proposed method can outperform other state-of-the-art approaches under both full supervision and weak-supervision with limited data cases.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The reason why the authors adopt a non-identically distributed zero-mean generalized Gaussian distribution, instead of an independent and identically distributed (i.i.d.) model, for the pixelwise residual is not explained clearly.
2. In the "Related works" section, I wonder why the authors choose to quantify aleatoric uncertainty instead of epistemic uncertainty.
3. Figure 1. In my opinion, the boundaries in the visualized scale (α) and shape (β) parameters of the GGD tend to first become clearer and then become blurred as the number of GANs increases. Please re-verify this visualization result and add some description to make the model easier to understand.
4. How did the predicted $(\hat{\alpha}, \hat{\beta}, \hat{b})$ learn the optimal values? Please explain.
5. In Equation (3), how is the uncertainty map calculated and utilized as the attention map? The operator ⊙ and the formula $\hat{\sigma}_{[m-1]i} / \sum_j \hat{\sigma}_{[m-1]ij}$ are not explained clearly. What do they actually mean?
6. As indicated by the authors, the model without uncertainty guidance uses a sequence of 3 GANs as its framework. However, both the quantitative and qualitative results for the w/o guidance group in Figure 3 and Table 1 seem to be even worse than those of the pix2pix method. Please comment on this.
7. The comparison study covers too few methods. More state-of-the-art methods should be included to demonstrate the superiority of the proposed method.

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

From reproducibility checklist filled out by the authors, we can see that the authors would like to release all code related to this work, and the relevant dataset description and experimental settings are included in the submitted manuscript. Based on the above, this paper has a good reproducibility.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. In the "Related Works" section, the authors introduce the independent and identically distributed (i.i.d.) residual, but it seems to be just an introduction with no direct relevance to the innovation or motivation of this paper. In addition, compared to the i.i.d. residual, the advantage of the generalized Gaussian distribution (GGD) is not described sufficiently.
2. The authors should give the color bar with the corresponding range of values in Figure 1. Especially for the uncertainty map, it is of great importance to indicate how the color of the map relates to the uncertainty level of the predicted results.
3. The proposed model generates output images along with the GGD parameters in each phase. I wonder how the network learns the optimal scale (α) and shape (β)?
4. The authors mention that the adaptive fidelity loss function and aleatoric uncertainty adopted in this paper follow [21]. Is there any modification or novelty in this work?
5. Is the formula $\hat{\sigma}_{[m-1]i} / \sum_j \hat{\sigma}_{[m-1]ij}$ equal to 1?
6. Does the operator ⊙ mean the dot product operation? What is the format of $f_{[m]i}$? I suggest the authors visualize the output of each block to facilitate easier comprehension.
7. I wonder how the aleatoric uncertainty is associated with the GGD. In addition, the reason why the authors apply aleatoric uncertainty instead of epistemic uncertainty is not stated clearly. Please explain.
8. The authors should verify the performance of the proposed method by comparing it with more state-of-the-art medical I2I translation methods. In addition, for all the medical image translation tasks, I would suggest adding statistical tests to evaluate whether the differences between the proposed and evaluated methods are statistically significant.


borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This work has a reasonable structure and a clear logic. The main contribution of this work is that the authors proposed an uncertainty-guided progressive learning scheme for image-to-image translation. In the “Introduction” and “Related Works” sections, the medical I2I translation background and the motivation of this work are described. Section 3 presents the architecture and workflow of the network. However, there are still some details that are not clearly described in the text. As mentioned above, the training of the GGD parameters and the acquisition of the uncertainty map should be further discussed. How can the authors ensure that the uncertainty map given by the GGD guides the network to focus on poorly translated regions? What’s more, the effectiveness of the progressive learning scheme is still not convincing when compared with the pix2pix method. Another serious problem is that the authors did not compare the proposed method with a sufficient number of state-of-the-art medical I2I translation methods.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

In this paper, the authors proposed an uncertainty-guided progressive GAN for image-to-image translation. They used aleatoric uncertainty estimates as the guide to focus on improving image quality in regions where the network is highly uncertain about the prediction. Promising results were obtained in tasks including PET to CT translation, undersampled MRI reconstruction, and motion correction in MRI.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Progressive enhancement of synthetic images guided by prediction uncertainty is innovative. The authors demonstrated that the image quality progressively improved after each phase. Improved results were obtained compared with state-of-the-art GANs at both full supervision and weak supervision.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Some references are not relevant. Some statements about previous works are not accurate.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper provided sufficient details.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. Please check the reference carefully and make sure they are all relevant. For example, in Related Works, “More recently, convolutional neural networks have been proposed for various image translation tasks [13,19,4,5,8,3]”, [4, 19] are about disease detection, [5, 8] are about tumor segmentation, they are not related to image translation tasks. Please use relevant references about image translation tasks, for example, the paper below about PET to CT translation: Shi, Luyao, et al. “A novel loss function incorporating imaging acquisition physics for PET attenuation map generation using deep learning.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2019.
2. Also in the Related Works, “quantifying it for general image-to-image translation problem largely remains unexplored.” However, Reference [21] provided such evaluation on MRI images (T1w to T2w).
3. In this paper the authors only show results up to phase 3 (M=3). How is this parameter determined? Will the quality still improve beyond phase 3? How long does it take to run each phase? It would be helpful to provide some more results and discussion on the trade-off between performance and running time.

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Progressive enhancement of synthetic images guided by prediction uncertainty is innovative. Results are better compared with state-of-the-arts GAN-based methods.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes progressive GANs for several tasks in image translation. The idea is to use uncertainty to guide network attention in forward inference. The proposed method has its novelty, and the validation is relatively comprehensive. All reviewers are relatively positive about this paper. However, a few points, especially those raised by R2, are confusing and need to be addressed in the rebuttal.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

# Author Feedback

We thank the reviewers for their constructive feedback indicating that our paper makes a novel and sound technical contribution (R1, R2, R3) and our results are promising (R1, R2, R3). Below we address the individual concerns.

R1: Using residual map (instead of uncertainty map) - Although the residual could indicate the area that is poorly generated, computing residual at test-time is not feasible because we do not have access to ground truth. Therefore, it cannot replace the uncertainty map. Moreover, [20,21,31] shows the correspondence between residual and uncertainty maps and motivates the use of uncertainty maps as a proxy to residual maps.

R2: How does the network learn the optimal scale (α) and shape (β)? - α and β are learned by optimizing the neg-log-likelihood objective using the GGD likelihood model for per-pixel heteroscedasticity (see Eq.1 and Eq.2 on page 4).
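
To make the objective in this answer concrete, here is a minimal sketch of a per-pixel negative log-likelihood under a zero-mean GGD. It assumes the standard density p(r) = β / (2αΓ(1/β)) · exp(−(|r|/α)^β); the exact form of Eq.1 and Eq.2 in the paper may differ, and the function names `ggd_nll` and `mean_ggd_nll` are hypothetical.

```python
import math


def ggd_nll(residual, alpha, beta):
    """Negative log-likelihood of one residual under a zero-mean GGD.

    Assumed density (standard parameterization, not taken from the paper):
        p(r) = beta / (2 * alpha * Gamma(1 / beta)) * exp(-(|r| / alpha) ** beta)
    """
    if alpha <= 0 or beta <= 0:
        raise ValueError("alpha and beta must be positive")
    return (
        (abs(residual) / alpha) ** beta   # data-fit term
        - math.log(beta)                  # the remaining terms come from
        + math.log(2.0 * alpha)           # the normalizing constant
        + math.lgamma(1.0 / beta)
    )


def mean_ggd_nll(pred, target, alphas, betas):
    """Average per-pixel NLL over flat, equally long sequences."""
    terms = [
        ggd_nll(p - t, a, b)
        for p, t, a, b in zip(pred, target, alphas, betas)
    ]
    return sum(terms) / len(terms)
```

Because every pixel carries its own α and β, minimizing this quantity lets the network trade a larger residual against a larger predicted scale, which is what makes the error model heteroscedastic. With β = 2 the expression reduces to a Gaussian NLL, and with β = 1 to a Laplace NLL.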

R2: i.i.d & GGD – Assuming i.i.d for pixel-wise error imposes identical mean and variances for all pixels. This does not allow estimating local uncertainty maps. GGD [21] is an extension of Gaussian distribution that can model more complex distributions and still have a closed-form equation for aleatoric uncertainty (Eq.3). Our key innovation is using uncertainty as the attention map to progressively guide our model generation.
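
For reference, the closed form mentioned here can be written out under the standard zero-mean GGD parameterization. This is an assumption about notation; the paper's Eq.3 may use different symbols or report the standard deviation rather than the variance.

$$
p(\epsilon;\alpha,\beta) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\!\left(-\left(\frac{|\epsilon|}{\alpha}\right)^{\beta}\right),
\qquad
\sigma^{2} = \frac{\alpha^{2}\,\Gamma(3/\beta)}{\Gamma(1/\beta)}
$$

Two special cases serve as a sanity check: β = 2 gives σ² = α²/2 (a Gaussian), and β = 1 gives σ² = 2α² (a Laplace distribution).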

R2, R3: Novelty upon [21], innovation of the paper – Our work differs from [21] in that (a) We introduce the GGD-based uncertainty estimator as the intermediate attention map in a new cascaded image generation framework, to progressively refine the uncertain regions and generate images with higher SSIM/PSNR and lower MAE scores. (b) We introduce a novel progressive learning scheme with uncertainty-based multi-phase training. (c) We perform a thorough evaluation on PET-to-CT, undersampled MRI reconstruction, and MRI motion correction. We will revise our citations to include the new references and correct old ones.

R2: In Eq.3, is $\hat{\sigma}_{[m-1]i} / \sum_j \hat{\sigma}_{[m-1]ij}$ equal to 1? – There has been a misunderstanding. In fact, this is an image, not a scalar quantity. On page 4 and in Fig 1 we show that $\hat{\sigma}_{[m-1]i}$ is an image and $\hat{\sigma}_{[m-1]ij}$ is the j-th pixel of that image. $\sum_j \hat{\sigma}_{[m-1]ij}$ is a scalar that sums all the pixel values of the image $\hat{\sigma}_{[m-1]i}$. The result is a new image, obtained by dividing every pixel of $\hat{\sigma}_{[m-1]i}$ by this sum.

R2: In Eq.3 what is ⊙? – ⊙ is element-wise multiplication; we’ll clarify this in the manuscript.
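
The two operations described in these answers can be sketched on a toy 2×2 example. The helper names and the numbers are hypothetical, and the choice of which image is multiplied with the attention map is an assumption here; only the normalization by the pixel sum and the element-wise product come from the rebuttal.

```python
def normalize_uncertainty(sigma):
    """Divide every pixel of a 2-D uncertainty map by the sum over all pixels."""
    total = sum(sum(row) for row in sigma)
    if total <= 0:
        raise ValueError("uncertainty map must have a positive sum")
    return [[value / total for value in row] for row in sigma]


def elementwise_multiply(a, b):
    """Element-wise (Hadamard) product of two equally sized 2-D lists."""
    return [[x * y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


# Toy values, chosen only for illustration.
sigma = [[1.0, 3.0], [2.0, 2.0]]      # uncertainty map from the previous phase
image = [[10.0, 20.0], [30.0, 40.0]]  # assumed: the image being re-weighted

attention = normalize_uncertainty(sigma)           # still a 2x2 image
weighted = elementwise_multiply(image, attention)  # the ⊙ operation
```

The normalized map sums to 1 over all pixels, but each individual pixel is a fraction of that total, so the map stays an image that highlights the most uncertain regions.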

R2: Compare with more methods & statistical tests – As the SOTA method MedGAN outperforms existing methods PAN [22], Fila-sGAN [32], ID-cGAN [30], we showed that our method improves over MedGAN. We will add statistical tests (p-values) on SSIM between MedGAN and our UP-GAN for the PET-to-CT (0.016) / undersampled MRI recon (0.021) / MRI motion correction (0.036). As all the p-values are <0.05, our results are statistically significant.
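
The rebuttal does not say which statistical test produced these p-values. Purely as an illustration of how a paired comparison of per-case scores could be run, the sketch below implements an exact two-sided sign-flip permutation test; the function name is hypothetical and this is not claimed to be the authors' procedure.

```python
from itertools import product


def paired_sign_flip_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired differences.

    Under the null hypothesis each paired difference is equally likely to be
    positive or negative, so all 2**n sign assignments are enumerated.
    Only practical for small n because of the exhaustive enumeration.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    extreme = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        if stat >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return extreme / total
```

With n paired cases the smallest attainable p-value is 2 / 2**n, so at least six pairs are needed before a result below 0.05 is possible.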

R2: Use of aleatoric instead of epistemic uncertainty – Aleatoric uncertainty, i.e., data distribution uncertainty, corresponds to residuals (which are not available at test time to guide the generation process) [20,21,30]. Residuals can help identify the poorly reconstructed regions, but since we don’t have access to them, we use aleatoric uncertainty. Epistemic uncertainty quantifies uncertainty in the model parameters and is orthogonal to the problem that we are addressing.

R2: Fig.2 interpretation – The qualitative results are in fact not blurry and show finer details as we move from phase 1 to 3. The maps of α and β, and more importantly the uncertainty map σ, show the refined structure in the nasal cavities and around the inner ear, indicating the medical relevance of our results.

R1, R3: Number of GANs (M>3) – We use M=3 because M=4 is computationally more expensive while the generated images are already well-refined with M=3. For the rebuttal, we tried M=4 and found that it performs similarly to M=3 (an SSIM of 0.96 vs 0.95 for PET-to-CT task; similar trends in other metrics and tasks). More tasks are infeasible

# Post-rebuttal Meta-Reviews

## Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes progressive GANs for several tasks in image translation. The idea is to use uncertainty to guide network attention in forward inference. The proposed method has its novelty, and the validation is relatively comprehensive. All reviewers are relatively positive to this paper. There are several issues especially raised by R2, to which the authors made clarification in rebuttal.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

All reviewers agreed that estimating and incorporating aleatoric uncertainty maps into progressive GANs for image-to-image translation is quite interesting and novel. It’s also good to see that the proposed method was validated on three different image translation tasks and showed promising performance. On the other hand, as Reviewer #2 pointed out, the comparison with SOTA methods is too narrow, with only two other methods. Although the authors responded that MedGAN outperformed some other GANs in its original paper, those results may not carry over to the tasks, datasets and experiment settings used in this paper. To prepare the final paper, please try to add more comparison results and include the statistical test for the significance of the improvement.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper is a nice contribution to the field since leveraging uncertainty for image synthesis has not been explored before. The authors’ rebuttal is also convincing. I would like to see results on a downstream task (such as classification) in the final version.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1