
Authors

Kun Chen, Yuanfan Guo, Canqian Yang, Yi Xu, Rui Zhang, Chunxiao Li, Rong Wu

Abstract

Ultrasound (US) imaging is a fundamental modality for detecting and diagnosing breast lesions, while shear-wave elastography (SWE) serves as a crucial complementary counterpart. Although an automated breast lesion classification system is desirable, training such a system is constrained by data scarcity and modality imbalance due to the lack of SWE devices in rural hospitals. To enhance diagnosis when only US is available, in this work we propose a knowledge-guided data augmentation framework, consisting of a modal translater and a semantic inverter, which achieves cross-modal and semantic data augmentation simultaneously. Extensive experiments show a significant improvement in AUC from 84.36% to 86.71% with the same ResNet18 classifier after applying our augmentation framework, outperforming both conventional data augmentation methods and GAN-based data augmentation methods without knowledge guidance.
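As a concrete illustration of the framework described in the abstract, below is a minimal PyTorch-style sketch of how such a two-generator augmentation pipeline could be wired. The module names (ModalTranslater, SemanticInverter) and their internals are hypothetical placeholders inferred from the abstract, not the authors' code (no repository is linked).

```python
# Illustrative sketch only: one possible wiring of the two-generator
# augmentation pipeline described in the abstract. ModalTranslater and
# SemanticInverter are hypothetical placeholders, with trivial conv
# layers standing in for the real generator backbones.
import torch
import torch.nn as nn

class ModalTranslater(nn.Module):
    """Hypothetical US -> virtual SWE generator (e.g., U-Net-like)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(1, 3, kernel_size=3, padding=1)  # placeholder

    def forward(self, us):
        return self.net(us)

class SemanticInverter(nn.Module):
    """Hypothetical generator flipping benign <-> malignant appearance."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # placeholder

    def forward(self, us):
        return self.net(us)

def augment_batch(us, labels, g_m, g_s):
    """Build cross-modal and semantic augmentations from real US images."""
    virtual_swe = g_m(us)         # cross-modal augmentation (virtual SWE)
    inverted_us = g_s(us)         # semantic augmentation (virtual US)
    inverted_labels = 1 - labels  # inverted images take the opposite class
    return virtual_swe, inverted_us, inverted_labels

us = torch.randn(4, 1, 224, 224)    # batch of grayscale US images
labels = torch.randint(0, 2, (4,))  # 0 = benign, 1 = malignant
swe, inv_us, inv_labels = augment_batch(us, labels, ModalTranslater(), SemanticInverter())
```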

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_6

SharedIt: https://rdcu.be/cyl5y

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a pipeline to generate synthetic plausible ultrasound (US) and shear-wave elastography (SWE) images from US images to enhance the classification of breast lesions. SWE images are generated using a modal translater, while the enrichment of US images consists of a semantic inverter that generates images from the opposite class. Thorough assessment is performed to demonstrate the value of the different steps in the pipeline.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A well-written paper, with a nicely situated contribution, and sound methods.
    • Performing data augmentation by synthesizing images is very topical for improving posterior tasks, particularly for the application targeted here.
    • Thorough evaluation, including against state-of-the-art methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The potential complexity of the pipeline, or at least the time it may take for a reader to properly understand what is actually done (e.g. from Fig.1).
    • No examination or discussion of failure cases.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The implementation details are well described. The data appear to be local and will not be released. In contrast, the authors mention that “code will be released soon”, which I hope will happen once the decision on the paper is known.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major comments:

    • It is interesting to see that data augmentation can lead to increased performance in the posterior task, provided the synthetic images are relevant. Have you tested whether improvements remain if the synthetic images are less relevant?

    • This type of augmentation might be thought of as what is actually performed in our minds when observing images, leading to much better generalization. Could the authors briefly comment on how much labeled data are needed to achieve this?

    • Table 3: One may wonder whether I’’_g is actually needed; Table 3 seems to demonstrate its relevance. What would the authors recommend, given the added complexity of the method? Also, it seems that adding I’_g to I_g-only decreases the performance: do you have an interpretation for this?

    Minor comments:

    • The Abstract is clear and situates the authors’ purpose well. However, it would deserve a quantitative summary of performance and database size.
    • Fig.1 may be revised to better render what the images I_g, I’_g, etc. mean. Also, the subtitle I’’_g is placed below the subimage, but also near the arrow going to C_U, which may be confusing.
    • The semantic guidance means that knowledge from C_D goes to G_M and C_S. Is there a way to render this in Fig.1?
    • May the observations from Fig.3 be supported by quantitative measures of subgroup differences? (e.g. a statistical test)

    Writing issues:

    • p.3: “for training semantic inverter” > “… the semantic…”
    • p.5: “\lambda_S” > “\lambda^S”
    • p.7: “and joint utilization” > “and the joint…”
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A timely paper, with a contribution nicely situated, sound methods and a thorough evaluation, including comparison to state-of-the-art methods and an ablation study.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The authors propose a translater model (ultrasound (US) images to shear-wave elastography (SWE) images) and an inverter model (benign <-> malignant). Using these models for data augmentation, the authors also construct a classification model for breast cancer (benign or malignant). The quality of the synthesized images (virtual SWE images) and the classification performance are evaluated using several standard metrics.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Good validation: The authors report all results as averages over five folds.
    2. Good evaluation: The authors evaluate the model using five metrics and compare models with/without the translater and the inverter.
    3. Good mathematical formulation for the model explanation
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Limited technical novelty: All the model architectures (such as U-Net and PatchGAN) were proposed in previous works. In addition, the idea of using an image translater to generate images from a different domain and then using the generated images for augmentation is not novel.
    2. Limited comparison to state-of-the-art: The authors compare only against ResNet18 for the classification task.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. The paper provides details about the algorithms, training parameters, and dataset.
    2. The authors are planning to release their code.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The authors only use the generated images to enlarge the dataset. This is somewhat straightforward; more advanced approaches should be considered, for example combining the original image with the generated image to form a multi-channel input to the detection model.
    2. Is the quality of the synthesized images (virtual SWE) sufficient for clinical specialists? In Table 1, the proposed model indeed performs better than the baseline model, but the oracle model achieves the best results on some metrics. I wonder whether the virtual SWE images lack some clinically important features, which would explain why the proposed model using virtual SWE images falls short of the oracle model.
  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Limited technical novelty

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This submission proposes a knowledge-guided data augmentation framework for ultrasound images. The experiments presented show that the proposed framework is able to synthesize virtual US and SWE images from US images only.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed framework could be applied to other modalities. Appropriate evaluation measures are used. The concept of using single- and multi-modal datasets is interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Not many datasets are used. Most of the DL architectures employed are standard, such as ResNet and U-Net.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    N/A

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Use more datasets. It is also not clear how the data were split between training/testing/validation.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think that the strengths outweigh the weaknesses of this submission.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers have divergent opinions on this paper. R2 thinks the paper lacks novelty and comparison with SOTA. R3 thinks the authors should validate their methods on more datasets. Please clearly summarize the main contributions of this paper as well as provide comparison results with SOTA on more datasets.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We thank the reviewers for their insightful comments. The major concerns lie in the main contributions of our method and in requests for additional comparisons.

  1. Novelty and Contribution: We would like to emphasize that the key contribution of our research is a novel augmentation framework leveraging both cross-domain and semantic knowledge, rather than new model architectures (To R3#B4#(1), R4#B4#(2)). This contribution is highlighted in three points. 1) Semantic inverter: We propose a novel viewpoint of synthesizing semantically inverted images and provide a general data augmentation tool for benign-malignant images (To R3#B4#(1)). 2) Attribute knowledge guidance: We use pre-trained classifiers to provide attribute knowledge to the generators, which improves cross-modal augmentation in terms of both image quality and classification performance, and serves as strong guidance for semantic augmentation. 3) Clinical novelty: Our framework learns to synthesize virtual SWE and US images from real US images to augment data diversity. The experiments validate the effectiveness of the augmented dataset in improving breast lesion classification.
  2. Performance: 1) More Models (R3#B4#(2)): We argue that Res18 has sufficient capacity for our dataset of 3,474 images and is typically applied to datasets of comparable size [1]. In response to the reviewer’s request, we also validate the effectiveness of our augmentation framework on Res50 and report the AUC for the methods in Table 1: Baseline: 83.85±4.41; Oracle: 89.17±2.45; Ours w/ G_M: 84.96±3.19; Ours w/ G_S: 84.26±4.54; Ours: 85.30±2.57. Consistent performance improvements are observed on the other metrics. 2) More Datasets (R4#B4#(1) & R4#B7): We note that few public datasets are available with pixel-aligned multi-modal images and attribute annotations. We have collected 298 images of benign lesions and 373 images of malignant lesions from another medical center and divided this dataset into five subsets as done in our paper. We report the AUC of Res18 for the methods in Table 1: Baseline: 74.22±2.53; Oracle: 84.37±2.28; Ours w/ G_M: 76.80±2.06; Ours w/ G_S: 75.74±2.79; Ours: 77.16±1.18. We did not report the experimental results on this private dataset in the paper due to its limited scale.

To R2:

  1. (B4#(2)) There are some failure cases, mainly hard cases of fibrosis, which are benign but somewhat stiff (and thus falsely depicted in red/yellow on SWE). For these, our framework tends to generate virtual SWE in blue, as for most benign cases. These examples are not shown due to limited space.
  2. (B7#(2)) Empirically, about one thousand labeled images are needed to achieve robust GAN training.
  3. (B7#(3)) Since I’’_g consistently enhances performance, we recommend using I’’_g unless inference speed is a major concern. It can be noted from lines 1, 2, and 5 in Table 3 that the semantic supervision provided by L_cls^S is necessary for semantic augmentation, which is consistent with the observation in the ablation study of the semantic inverter.
  4. (B7#(4)) We will polish the writing and improve the figures in the final revision.

To R3:

  1. (B7#(1)) For classification tasks, we actually combine the US image with the corresponding SWE image to form a multi-channel input to the classification models (see the sketch after this list).
  2. (B7#(2)) Though some features might not be perfectly recovered in the virtual images, we argue that our method has potential clinical value, as it supports diagnosis for radiologists without enough data. Given only US images, a reader study with 6 experienced radiologists yielded an average accuracy of 77.53%. In comparison, our model with the synthesized images achieved an accuracy of 78.65%.
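The following is a minimal PyTorch sketch of this kind of multi-channel input construction. It is an illustration under assumed tensor shapes (single-channel US, three-channel SWE), not the authors' implementation; the paper's exact channel layout may differ.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

# Assumed shapes: single-channel US and three-channel SWE tensors of equal
# spatial size; the paper's actual channel layout may differ.
us = torch.randn(8, 1, 224, 224)   # grayscale US images
swe = torch.randn(8, 3, 224, 224)  # corresponding (virtual) SWE images

x = torch.cat([us, swe], dim=1)    # 4-channel multi-modal input

model = resnet18(num_classes=2)
# Widen the first conv layer so the network accepts 4 input channels.
model.conv1 = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
logits = model(x)                  # shape: (8, 2)
```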

To R4:

  1. (B7) For each fold of the 5-fold cross-validation, four subsets are used for training and one subset for testing; no separate validation set is used.
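For concreteness, here is a minimal illustration of this split protocol using scikit-learn's KFold; the shuffling and seed are placeholder choices not specified in the rebuttal.

```python
import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(3474)  # the paper's dataset contains 3,474 images

# Five folds: each round trains on four subsets and tests on the fifth.
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(kf.split(indices)):
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test")
```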

[1] Xue, Yuan, et al. “Synthetic augmentation and feature-based filtering for improved cervical histopathology image classification.” MICCAI. 2019.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have clarified the novelty and contributions of this paper in the rebuttal letter. Rather than proposing new model architectures, this work presents a novel augmentation framework leveraging cross-domain and semantic knowledge. Regarding the performance issue, the authors provided additional results on ResNet50 and explained the reason for not providing results on more datasets. The AC suggests that the authors include the results on the private dataset in the final version if there is sufficient space.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The idea of an augmentation framework leveraging both cross-domain and semantic knowledge is interesting, yet there is only one comparison, to pix2pix, for the image translation/synthesis part. The authors addressed most of the reviewers’ concerns in the rebuttal, although [1], cited in the rebuttal, seems to be missing from the paper. I recommend acceptance of the paper after rebuttal. If accepted, the authors should incorporate the rebuttal responses into the final version and do their best to present more comparison results and showcase some failure cases.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The technical novelty with respect to the deep learning literature is probably somewhat limited; nevertheless, the novelty in terms of the clinical application and the evaluation effort is likely sufficient and of interest as a MICCAI contribution.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7


