
Authors

Junxiao Chen, Jia Wei, Rui Li

Abstract

Paired multi-modality medical images can provide complementary information that helps physicians make more reasonable decisions than single-modality medical images. However, they are difficult to acquire in practice due to multiple factors (e.g., time, cost, radiation dose). To address these problems, multi-modality medical image translation has attracted increasing research interest recently. However, existing works mainly focus on the translation quality of the whole image rather than of a critical target area or region of interest (ROI), e.g., an organ. This leads to poor-quality translation of the localized target area, which becomes blurry, deformed, or even acquires unreasonable extra textures. In this paper, we propose a novel target-aware generative adversarial network called TarGAN, a generic multi-modality medical image translation model capable of (1) learning multi-modality medical image translation without relying on paired data and (2) enhancing the quality of target area generation with the help of target area labels. The generator of TarGAN jointly learns mappings at two levels simultaneously: whole-image translation and target-area translation. These two mappings are interrelated through a proposed crossing loss. Experiments with both quantitative measures and qualitative evaluations demonstrate that TarGAN outperforms state-of-the-art methods in all cases. A subsequent segmentation task demonstrates the effectiveness of the synthetic images generated by TarGAN in a real-world application. Our code is available at https://github.com/cs-xiao/TarGAN.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87231-1_3

SharedIt: https://rdcu.be/cyhUu

Link to the code repository

https://github.com/cs-xiao/TarGAN

Link to the dataset(s)

https://chaos.grand-challenge.org/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a modality translation (CT to MRI) framework trained without any paired data, using the now commonly used cycle-consistency loss. The paper generates a translation not only at the whole-image level but also for a sub-region, e.g., a specific anatomical region (in this case the liver). It is argued that focusing on specific regions yields higher-quality images in those areas.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The combination of different losses, such as cycle consistency, modality classification, and shape consistency, seems appropriate for the task at hand. It is also interesting to make use of a dataset whose original purpose was segmentation for a rather different task such as synthetic image generation. I appreciated the ablation studies showing the impact of the different types of loss functions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Some details are not very clear; for example, what exactly are the inputs to the G network? It is written in Fig. 2 that it is a depthwise concatenation of the source image and the target modality. From the figure it seems to be a one-hot encoding, but it is not very clear whether it is just a full binary image encoding or something else.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I really appreciated that the authors went ahead and shared the code in an anonymous fashion (via anonymous.4open.science). However, the weights of the model are not available, so it is not possible to reproduce the experiments exactly.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In Fig. 2 it is not clear how ‘t’ is encoded. Is this a one-hot encoding, e.g., a 3xHxW tensor? I would appreciate it if the tensor shapes were added to the figures and to the network descriptions in the supplementary material.

    It is very good to evaluate the generated images on a downstream task such as segmentation. However, it could be good to train an MRI liver segmentation model only with real data, then use the CT as input to generate a synthetic MRI. Since the segmentation of this MRI is the same as that of the CT, we could then segment the synthetic MRI with the MRI segmentation network (which has never seen synthetic data) and evaluate the performance this way. This would show whether a network trained only with real data performs well when segmenting synthetic data.
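
    A minimal sketch of this evaluation protocol (hypothetical names; assuming a PyTorch-style segmenter and a trained CT-to-MRI translator, not the paper's actual code):

    ```python
    import torch

    def dice_on_synthetic(seg_model, translator, ct_loader, device="cuda"):
        """Evaluate an MRI segmenter trained only on real data by applying it
        to synthetic MRIs translated from CT, reusing the CT labels as ground
        truth. Sketch only: seg_model, translator, ct_loader are hypothetical."""
        seg_model.eval()
        translator.eval()
        scores = []
        with torch.no_grad():
            for ct_image, ct_mask in ct_loader:              # ct_mask: (B, H, W), binary
                fake_mri = translator(ct_image.to(device))   # CT -> synthetic MRI
                pred = seg_model(fake_mri).argmax(1).float().cpu()
                mask = ct_mask.float()
                inter = (pred * mask).sum(dim=(1, 2))
                dice = 2 * inter / (pred.sum(dim=(1, 2)) + mask.sum(dim=(1, 2)) + 1e-8)
                scores.append(dice)
        return torch.cat(scores).mean().item()               # mean Dice over the CT set
    ```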

    I appreciated a lot that the source code was shared; however, it seems that only the training part is there, and without the network weights it is not possible to reproduce the results.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an interesting combination of loss functions, and focusing on specific anatomical regions seems like a good idea to me. Using the generated images on a downstream task such as segmentation is also very appropriate for evaluating the usefulness of the generated data. I would have given a higher score if the shared code had also included the model weights, to allow better reproduction of the results.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The manuscript proposes a method for multi-modal medical image translation based on Generative Adversarial Networks. While a plethora of architectures very similar to the proposed one has existed since around 2018 for medical imaging, the authors modify the networks to explicitly enforce shape consistency via binary segmentation masks from the source domain and to improve translation quality via the untraceable constraint. Experiments are performed on 2D slices of 3D data gathered from a relatively small public dataset composed of CTs and MRIs. An ablation study and a comparison with baselines are reported and discussed in the experiments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is overall well written, with only very minor grammar issues. The methodology is technically sound, and the multiple loss components and modules are both properly motivated theoretically and validated in the ablation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    TarGAN requires a prior segmentation of the organs of interest in the source domain to enforce shape consistency, which severely limits the method's applicability in real-world scenarios.

    The experimental setup focuses only on liver segmentation, while there are multiple other labeled CT datasets in the literature (e.g., https://zenodo.org/record/1169361#.X7IOsZ1KiV4, https://www.kaggle.com/user123454321/liversegtestimages/version/1).

    The manuscript reads as if it addressed volumetric data translation, and only in the experimental setup (Sec. 3.1) is the reader informed that the proposed method works only on 2D slices of 3D images. While there are indeed applications for this methodology, the authors should clarify this much earlier in the text. Additionally, while 2D image translation is a very well explored area in the visual pattern recognition literature, including multi-modality image translation in medical imaging, 3D data presents a whole new set of difficulties, including the sharp increase in computational requirements due to 3D convolutions and the lack of pretrained models commonly used to enforce perceptual losses in modern 2D image-translation architectures.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors propose an image-translation GAN composed of multiple modules, loss components, two discriminators, and combinations of existing architectures (namely StarGAN and CycleGAN). The manuscript (including the supplementary material) does not provide enough detail on architectural choices, hyperparameters, and implementation to allow replication of this study, as all of these aspects would require extensive tuning. However, the authors indicated that pretrained models and training and evaluation code would be made available, with an anonymized link to the code in the supplementary material. If this code is included in the camera-ready version of the main text, there should be no major reproducibility concerns regarding this manuscript.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    A more thorough analysis on multiple organ segmentations using additional public datasets would be my main suggestion to the authors. While the method indeed presents promising results compared to other baselines in the literature, a proper validation of the method's effectiveness is not possible if only liver translation is conducted in the experiments. Including multiple datasets instead of focusing only on CHAOS (a rather small-data scenario) would help to further validate the method against other baselines.

  • Please state your overall opinion of the paper

    Borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The manuscript does seem to provide an incremental gain in the literature on medical image translation using GANs. However, the experiments are far too narrow due to the choice of liver segmentation on a single small dataset.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper proposes a target-aware GAN for unpaired multi-modality medical image translation. The generator translates the global image and the ROI jointly, with the help of ROI labels. The method is based on StarGAN, with additional untraceable-constraint, shape-constraint, and crossing losses to help the generator learn. The generated results were found to be both of high quality in terms of FID score and beneficial to the downstream segmentation task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Experiments were thorough. The paper not only provides image-quality metrics but also shows the usefulness of the generated images in downstream segmentation tasks.

    • The two-stream generator that handles both global and local transfer, and the use of the crossing loss for regularization, are interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The setup of the baselines in the liver segmentation experiment is a little unfair: CSGAN and ReMIC should also be compared in the “image enrichment” setting using nnU-Net.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Using open-access challenge data in the experiment
    • Code is publicly available
    • Training hyper-parameters were well documented

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    • What is the training data for the shape controller? Please clarify what the foreground and background are for both the global image and the region ROI image.

    • What makes the S-score on the T2w modality better than on the other two modalities, while the segmentation performance is significantly worse? Please discuss.

  • Please state your overall opinion of the paper

    Accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    An interesting idea with thorough experiments.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors design a sophisticated loss combination to enforce a shape-consistency constraint in CT-to-MRI generation. All review comments are positive, citing the technical soundness and the validation of each claimed contribution. The authors are expected to share their code/model, which is in line with the comments from all reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

We sincerely thank all the reviewers and ACs for the insightful comments and useful suggestions. We will share our code and model soon.

Q1. Some details are not very clear; for example, what exactly are the inputs to the G network? In Fig. 2 it is not clear how ‘t’ is encoded. Is this a one-hot encoding, e.g., a 3xHxW tensor? (R1)
A1: Because TarGAN is based on StarGAN, the inputs to G take the same form as in StarGAN. Specifically, in Fig. 2, the target modality t is a one-hot code; it is spatially expanded to HxW and concatenated with the image along the channel dimension, so the input tensor to G has shape 3xHxW. Thank you for the advice; we will consider adding the tensor shapes to the figures and to the network descriptions in the supplementary material.

Q2. The manuscript reads as if it addressed volumetric data translation, and only in the experimental setup (Sec. 3.1) is the reader informed that the proposed method works only on 2D slices of 3D images. While there are indeed applications for this methodology, the authors should clarify this much earlier in the text. (R2)
A2: We thank the reviewer for pointing out this issue. We will clarify this at the beginning of the paper.

Q3. What is the training data for the shape controller? Please clarify what the foreground and background are for both the global image and the region ROI image. (R3)
A3: Thanks. The foreground of the global image or the region ROI image is the content area of the image, while the remaining area is background. That is, given an input image, we want the shape controller to segment its content area.

Q4. What makes the S-score on the T2w modality better than on the other two modalities, while the segmentation performance is significantly worse? Please discuss. (R3)
A4: Thank you for the detailed comments. You have raised an interesting point; however, we would like to clarify that there is no direct relationship between the S-score in Table 1 and the segmentation performance in Table 2, because they are computed differently. For example, for the T2w S-score, we first train a segmentation network using real T2w images and then test it on synthetic T2w images translated from CT or T1w.
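
For illustration, a minimal sketch of the input encoding described in A1 (assuming PyTorch; the function and variable names are ours, not taken from the released code, and the final channel count is the image channels plus the number of modalities):

```python
import torch

def build_generator_input(image, target_modality, num_modalities=3):
    """Depthwise-concatenate an image with a spatially expanded one-hot
    modality code, as described in A1 (illustrative sketch only)."""
    b, _, h, w = image.shape                                # image: (B, C_img, H, W)
    onehot = torch.zeros(b, num_modalities, device=image.device)
    onehot[:, target_modality] = 1.0                        # one-hot code for modality t
    planes = onehot[:, :, None, None].expand(-1, -1, h, w)  # each entry -> an HxW plane
    return torch.cat([image, planes], dim=1)                # (B, C_img + num_modalities, H, W)

x = torch.randn(4, 1, 256, 256)                             # a batch of grayscale slices
print(build_generator_input(x, target_modality=2).shape)    # torch.Size([4, 4, 256, 256])
```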


