
Authors

Devavrat Tomar, Lin Zhang, Tiziano Portenier, Orcun Goksel

Abstract

Interactive simulation of ultrasound imaging greatly facilitates sonography training. Although ray-tracing based methods have shown promising results, obtaining realistic images requires substantial modeling effort and manual parameter tuning. In addition, current techniques still result in a significant appearance gap between simulated images and real clinical scans. Herein we introduce a novel content-preserving image translation framework (ConPres) to bridge this appearance gap, while maintaining the simulated anatomical layout. We achieve this goal by leveraging both simulated images with semantic segmentations and unpaired in-vivo ultrasound scans. Our framework is based on recent contrastive unpaired translation techniques and we propose a regularization approach by learning an auxiliary segmentation-to-real image translation task, which encourages the disentanglement of content and style. In addition, we extend the generator to be class-conditional, which enables the incorporation of additional losses, in particular a cyclic consistency loss, to further improve the translation quality. Qualitative and quantitative comparisons against state-of-the-art unpaired translation methods demonstrate the superiority of our proposed framework.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_63

SharedIt: https://rdcu.be/cymbq

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper describes an image translation framework that synthesizes realistic ultrasound images from simulated ones. The proposed network consists of a single generator and a single discriminator. The generator takes a simulated image, its segmentation, and a real image as input, computing a contrastive loss and a noise contrastive estimation loss proposed in previous work, a semantic-consistency loss with the mask, and finally a cycle-consistency loss that uses the same G twice, in contrast to CycleGAN-like models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper leverages recent advances from domain adaptation work, such as a contrastive loss for training and the kernel inception distance for evaluation, which I find interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Unfortunately, there are several crucial weaknesses in this paper. First, a lack of novelty. The only differences between this paper and StarGAN are the contrastive loss and the semantic-consistency loss. The contrastive loss comes from Park, T. [16]. The semantic-consistency loss merely replaces X by S in the contrastive loss, since S is just another source image for generating images in Y’s domain. The overall novelty is very limited. Second, the experiment design is poor and even wrong. The data splitting is improper, which creates a high risk of training and testing on the same image. Please see the comments part for details.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducing this work can be hard, mainly because the details of the simulation algorithm are not given (in either the main paper or the supplementary material). Without the details of the simulated images, reproducing the work can be very challenging.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Overall, this paper is well written and organized, and the advantages of many recently proposed works have been incorporated. Besides the lack-of-novelty issue, my biggest concern is the data splitting in Section 3, Experiments.

    1. For the real in-vivo images, the authors collected 22 ultrasound sequences from 8 patients, each only a few seconds long. However, the authors pool all frames together and make an 80%-10%-10% split. First of all, such a split cannot be made over 8 patients, so the split must have been done over all frames, meaning there is patient-level overlap between the training/validation and test sets. In addition, since each sequence is only a few seconds long, the variation in the data is very limited. Overlapping data and limited variation significantly weaken the claims of the experiments, especially for GAN-related work.

    2. I don’t think the segmentation-consistency regularization term is well validated by the experiments. In Fig. 3, comparing CUT and CUTS in the 4th column, the area corresponding to the segmentation is lost in CUTS. In Table 1, CUTS is consistently worse than CUT on all metrics. Then what is the advantage of adding this term? Please comment.

  • Please state your overall opinion of the paper

    reject (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Reject. Lack of novelty. Proposed segmentation regularization is not well justified. Problematic data splitting in numerical experiments.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The authors proposed a novel image translation framework to bridge the significant appearance gap between simulated images and real clinical scans. It has two contributions: constraining the generator with the semantic labels accompanying the simulated images by learning an auxiliary segmentation-to-real image translation task, and applying a class-conditional generator, which in turn enables the incorporation of a cyclic loss.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    - The proposed method is novel. Building on the contrastive unpaired translation framework, the authors use the semantic labels of simulated images, learning an auxiliary segmentation-to-real image task to constrain the generator, and extend it with a class-conditional generator to enable a cyclic loss.
    - The qualitative and quantitative metrics show that this framework closes the appearance gap between simulated and real images, which has a significant influence on US image generation. A user study was also performed in which US experts evaluated the realism of the translated images.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    - A clearer outline of the next steps in this research would be appropriate.
    - The datasets are few, and all experiments are on fetal images.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The details described in the paper support good reproducibility, although no code or dataset is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    - A clearer outline of the next steps in this research would be appropriate; there is some discussion, but it is vague. There is already some new work based on CUT.
    - The authors should experiment on a larger dataset with different types of images, which would be more convincing; currently only fetal images are used.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    - The experimental results show that the proposed framework can indeed bridge, to a certain extent, the appearance gap between simulated images and real clinical scans.
    - The algorithm has a certain degree of innovation.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper combines a bag of losses to boost image-to-image translation. Extensive experiments against SOTA methods on realistic datasets evaluate the effectiveness of the proposed method in improving the realism of computationally simulated US images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper tries to use various losses in the image-to-image translation field to improve the realism of the generated images. According to the experimental results, the simulated images are much more realistic than other methods. The authors use typical evaluation metrics and invite experts to rate the qualities. The semantic-consistent regularization is well designed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1 As claimed in the Abstract, the authors aim to improve the quality of simulated US images, which can facilitate sonography training. The authors should evaluate how the images generated by the proposed method help sonography training compared to traditional methods.

    2 More ablation studies and visualizations should be explored to show the effectiveness of the individual losses (the GAN, CLS, CYC, and NCE losses).

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    As the dataset is private and the code is not going to be released, I am not sure this work can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please see the “Weakness” part.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I choose “borderline accept” because of the great performance shown in the experimental results. However, due to the lack of reproducibility, I am not sure how much this paper contributes to the community. On the other hand, it needs more experiments to evaluate the clinical effectiveness of those realistic simulated ultrasound images.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    2

  • Reviewer confidence

    Somewhat confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper receives mixed reviews from three experts. R1 has a legitimate concern regarding data splitting. Please provide a response to this as well as to the other issues raised by all reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

We thank the reviewers for their valuable feedback. Major comments are addressed below, and will be integrated shortly in the paper:

** Data splitting (R1) Other image analysis tasks such as classification and segmentation require evaluation on unseen patients to avoid bias. However, using train and/or test sets, even together, for computing GAN distribution metrics (FID/KID) is common practice [2,8, StyleGAN-Ada]. This is indeed more justified in an unpaired setting as in our appearance transfer task, where the goal is not to reproduce the real data, but to abstract and apply the style of the real training data.

For computing FID/KID, we followed the approach from [8, StyleGAN-Ada] that introduced / used these metrics; i.e., we computed on the train set, since set distribution metrics require large samples for robust estimates. Indeed, adding the test set does not result in any difference (CUTS-C* KID: 0.24/0.24, FID 1.51/1.52 for train/all). The test set alone would be too small to compute these metrics robustly.
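
For reference, the KID discussed here is an unbiased estimate of the squared maximum mean discrepancy (MMD) between Inception feature distributions, using a polynomial kernel. The following is a minimal numpy sketch, assuming the Inception features have already been extracted as row-wise arrays; it is an illustration of the metric, not the implementation used in the paper:

```python
import numpy as np

def polynomial_kernel(X, Y, degree=3):
    # Default KID kernel: k(x, y) = (x . y / d + 1)^degree, with d the feature dim
    d = X.shape[1]
    return (X @ Y.T / d + 1.0) ** degree

def kid(feats_real, feats_fake):
    """Unbiased MMD^2 estimate between two feature sets (the KID score)."""
    m, n = len(feats_real), len(feats_fake)
    k_rr = polynomial_kernel(feats_real, feats_real)
    k_ff = polynomial_kernel(feats_fake, feats_fake)
    k_rf = polynomial_kernel(feats_real, feats_fake)
    # Diagonal (self-similarity) terms are excluded for unbiasedness
    term_rr = (k_rr.sum() - np.trace(k_rr)) / (m * (m - 1))
    term_ff = (k_ff.sum() - np.trace(k_ff)) / (n * (n - 1))
    return term_rr + term_ff - 2.0 * k_rf.mean()
```

Because the estimator is unbiased, it fluctuates around zero for matching distributions, which is why robust estimates require large sample sizes, as noted above.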

Moreover, if there were bias from data splitting, this would also affect our baselines, against which our method is still seen to be superior. Indeed, larger models in our baselines might even profit more, if there were any such bias.

That said, we have conducted an additional, patient-wise splitting experiment (6 patients for train). This yields a KID of 0.24, the same as earlier random data splitting.

** Limited real data variability (R1), More datasets (R3) For transferring appearance, large variability in real data is not necessary. Indeed, even a single image in the target domain can be sufficient under certain circumstances (cf. [16]). Here we used thousands of real images to capture the appearance of a specific US machine.

Our goal here is task-specific, i.e., enhancing US simulation of the common 20-week fetal exams with real-machine like image appearance. Conceptually, there is no limitation in translating other US scenes (although a practical requirement would be the availability of detailed 3D models).

** Semantic-consistent regularization not well justified (R1) We assume a misunderstanding regarding the ablated models: CUTS is CUT with an auxiliary segmentation-to-real translation, but without the proposed semantic-consistent regularization. As seen in Fig. 3 and pointed out by R1, learning this auxiliary task alone does not improve CUT. However, by adding the regularization term (CUTS-C), the translated images become semantically closer to the simulated ones, cf. Fig. 3 (CUT vs. CUTS-C). This is also corroborated by the better SSIM of CUTS-C, clearly demonstrating the benefit of our proposed regularization approach. As the namings may be confusing w.r.t. the ablations, we will rename our method.

** Limited Novelty (R1) For content preservation during appearance transfer, our novelty is to enforce semantic consistency with the proposed regularization during a multi-domain translation. Note that semantic regularization is not “replacing X by S in the contrastive loss”, but enforcing G to generate the same output for X and S.
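
The semantic-consistency idea stated here, i.e. that G should produce the same output whether given the simulated image X or its segmentation S, can be sketched with a hypothetical L1 form (the paper's actual regularizer is defined within its contrastive framework; `gen`, `x_sim`, and `s_seg` are illustrative names):

```python
import numpy as np

def semantic_consistency_loss(gen, x_sim, s_seg):
    """Hypothetical sketch: penalize disagreement between the generator's
    outputs for a simulated image and for its segmentation map, so that
    both inputs translate to the same real-style image."""
    y_from_x = gen(x_sim)   # translation of the simulated image X
    y_from_s = gen(s_seg)   # translation of its semantic segmentation S
    return np.abs(y_from_x - y_from_s).mean()  # L1 discrepancy
```

A generator that already maps both inputs to the same output incurs zero loss; any disagreement between the two translations is penalized.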

** Future works (R3&R4), Clinical experiments (R3) Future works include: clinical evaluation of the effects of translated images on US training; improving seg-to-real image translation to bypass any expensive rendering entirely; evaluations with other US/non-US scenes.

** More ablations for other losses (R4) The experiments in the submission demonstrate the benefits of all proposed components. For completeness, we report additional comparisons here (FID/KID/SSIM): proposed model (1.51/0.24/72.13), without NCE (2.22/0.43/64.77), without CYC (2.33/0.45/84.77). The results show that these components are indeed essential. Without CYC, the translated images look similar to the simulated ones. GAN loss is what promotes realism; and without CLS, the discriminator would have no info on domain-dependent classification [5], so these are essential.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper receives mixed reviews from three experts. R1 has a legitimate concern regarding data splitting. Please provide a response to this as well as to the other issues raised by all reviewers.

    The rebuttal explains why the employed data splitting is justified: “the goal is not to reproduce the real data, but to abstract and apply the style of the real training data.” This somewhat alleviates the issue; however, it remains questionable in my view. For example, a common use of image synthesis is through a subsequent application such as classification, segmentation, etc., in which case a strict data split is necessary. The rebuttal provides results based on patient-wise splitting, and similar performance improvements are reported. The rebuttal also provides more ablation studies.

    Overall, the paper meets the MICCAI acceptance level if the authors can make necessary changes per rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    11



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a method to simulate realistic ultrasound images by taking the structure from input simulated images and the appearance from real images. The method clearly works in translating the appearance to the generated images. The authors have addressed the methodological criticisms, although I am not sure they have correctly addressed the criticisms regarding novelty.

    Moreover, I am not sure they have addressed at all the issues related to clinical utility (for example raised by R3).

    I have concerns about the user survey too. The authors said that experts were asked to rank images by “their likelihood for being an image from this machine”. Even if the answer to this is highly positive, it fails to demonstrate that the images look realistic, because they don’t. As pointed out by the authors themselves, “the simulated and the real images have substantially different anatomical contents”, i.e. the simulated fetal geometries are highly unrealistic. This has two implications. First, because of the lack of realism, the clinical utility is at best arguable (the reason for which the authors have not been able to rebut this point). Second, the challenge of obtaining realistic fetal geometries to generate realistic-looking images might be greater than that of simulating realistic images. In their rebuttal, the authors address one of the main criticisms (lack of novelty) by saying that “our novelty is to enforce semantic consistency”; however, this is precisely (as I pointed out) the main flaw. As a result, I cannot recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    15



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The solution provided by the authors is interesting, and the strongest arguments raised by the reviewers (novelty and data splitting) were properly explained and justified in the rebuttal phase. The rebuttal by the authors is clear, well argued, and complete, and provides clarifications for the major points raised by the reviewers. The weakest point in the rebuttal is the justification and highlighting of the work’s novelty, yet the novelty is detailed in the manuscript.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10


