
Authors

Junlin Xian, Zhiwei Wang, Kwang-Ting Cheng, Xin Yang

Abstract

A holistic understanding of dual-view transformation (DVT) is an enabling technique for computer-aided diagnosis (CAD) of breast lesions in mammograms, e.g., micro-calcification (μC) or mass matching, dual-view feature extraction, etc. Learning a complete DVT usually relies on dense supervision that indicates, for each tissue in one view, the corresponding tissue in the other view. Since such dense supervision is infeasible to obtain in practice, sparse supervision from a few traceable lesion tissues across the two views is an alternative, but it leads to a defective DVT and dramatically limits the performance of existing CAD systems. To address this problem, our solution is simple but very effective: densify the existing sparse supervision by synthesizing lesions across the two views. Specifically, a Gaussian model is first employed to capture the spatial relationship of real lesions across the two views, guiding the proposed LT-GAN on where to synthesize fake lesions. The proposed novel LT-GAN not only synthesizes visually realistic lesions but also guarantees appearance consistency across views. Finally, denser supervision is composed from both real and synthetic lesions, enabling robust DVT learning. Experimental results show that a DVT can be learned via our densified supervision, resulting in cross-view μC matching performance on the INbreast and CBIS-DDSM datasets that is superior to state-of-the-art methods.
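The Gaussian location model described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not the paper's actual formulation: the variable names, the bivariate parameterization over lesion-to-nipple distances, and all numeric values are assumptions. The idea is to fit a Gaussian to the paired distances of real lesions across the CC and MLO views, then sample a spatially consistent location for a synthetic lesion in the other view.

```python
import numpy as np

def fit_gaussian(d_cc, d_mlo):
    """Fit a 2-D Gaussian to paired lesion distances (e.g., lesion-to-nipple
    distance of the same lesion in the CC and MLO views)."""
    data = np.stack([d_cc, d_mlo])  # shape (2, n_pairs)
    mu = data.mean(axis=1)
    cov = np.cov(data)
    return mu, cov

def sample_matched_distance(d_cc, mu, cov, rng):
    """Given a lesion distance d_cc in the CC view, sample a consistent
    distance in the MLO view from the conditional Gaussian p(d_mlo | d_cc)."""
    # Conditional of a bivariate Gaussian, standard closed form.
    cond_mean = mu[1] + cov[1, 0] / cov[0, 0] * (d_cc - mu[0])
    cond_var = cov[1, 1] - cov[1, 0] ** 2 / cov[0, 0]
    return rng.normal(cond_mean, np.sqrt(cond_var))

rng = np.random.default_rng(0)
# Toy paired distances in pixels; real values would come from annotated lesions.
d_cc = rng.normal(500, 80, size=200)
d_mlo = 0.9 * d_cc + rng.normal(0, 30, size=200)
mu, cov = fit_gaussian(d_cc, d_mlo)
d_new = sample_matched_distance(520.0, mu, cov, rng)  # MLO location for a fake lesion
```

In this sketch, the sampled `d_new` would tell the lesion synthesizer where to place the fake lesion in the MLO view so that the pair remains spatially plausible.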

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_34

SharedIt: https://rdcu.be/cyl58

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

This paper proposes a new way of learning the Dual-View Transformation (DVT) for mammography lesion matching. The method incorporates: 1) a Gaussian model estimating the corresponding location of a lesion in the other view from its location in the original view, and 2) a follow-up adapted CycleGAN model synthesizing location- and appearance-consistent lesions in the matching view based on the lesion in the original view. Ablation studies were conducted, and comparisons with models from the literature were performed on the public INbreast dataset, measuring lesion matching accuracy and pairing AUC and showing relative improvement for the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is relatively well-organized and fairly well-written, with a good amount of detail on the loss functions and clear figures showing the overall architecture.
    • Learning a Dual-View Transformation for lesion matching is a clinically relevant topic in mammography. This paper exploits a Gaussian location model and CycleGAN, making it a valuable contribution to this line of work.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The state-of-the-art analysis misses some work on mammography lesion matching (see detailed comments).
    • The evaluation dataset is relatively small (INbreast) and there is no statistical test.
    • This work focuses mainly on individual large calcifications rather than microcalcification clusters (Figure 3), which somewhat diminishes the clinical relevance of the study.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Dataset is publicly available (INBREAST)
    • Code will be made available
    • Training hardware, software & training time unclear
    • Lacks details on how hyper-parameters are fine-tuned
    • Lacks details on the statistical significance of the performance comparison
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Chapter Introduction

    • “However, learning a complete DVT must rely on a dense supervision which indicates every paired region in CC and MLO images projected from a same breast tissue.” A reference is needed here.
    • The analysis of state-of-the-art papers is mainly based on the computer-vision literature. Some papers (among others) on mammography lesion matching should be included:
    • Yang et al., 2020, MommiNet: Mammographic multi-view mass identification networks
    • Ma et al., 2019, Cross-view relation networks for mammogram mass detection

    Section 2.1 Spatial Location Determination in Fake Lesions

    • The Gaussian distribution would depend on positioning and might vary across sites and technicians; is this taken into account?

    Section 3.1 Dataset and Implementation Details

    • Microcalcification patches were down-sampled from 200 x 200 to 64 x 64. Was this due to limited computational resources? Down-sampling microcalcifications, which are relatively small objects (a couple of millimeters or less), might have a negative effect on their visibility. This should be clarified.
    • Why was 200 x 200 chosen as the patch size? The empirically measured Gaussian models show that the standard deviation of the matched-view to original-view distance can be up to 325. A 200 x 200 patch seems too small to cover this standard deviation and therefore may not include enough context for the models to learn. This should be clarified.

    Figure 3 shows mainly individual calcifications of relatively large diameter. Such calcifications are typically benign and have less clinical relevance than calcification clusters. It is not clear whether the study focused only on individual large calcifications or also included calcification clusters. This should be clarified and perhaps mentioned in future directions.

    Section 3.2 Ablation Study

    • There is no error bar for b-ACC and AUC results in all tables, making the statistical significance of the improvement unclear.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The details in this paper (explaining how the Gaussian model was estimated and how the CycleGAN losses were adapted) make this work a pertinent scientific contribution. The reviewer does feel that the paper falls short in its clinical translation, but this type of work should be encouraged for future improvements.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Somewhat confident



Review #2

  • Please describe the contribution of the paper

    The authors propose a new method to learn the dual-view transformation between the CC and MLO views. Their main idea is to synthesize lesions across the two views in order to improve sparsely supervised approaches. The method is tested on the INbreast mammography dataset using 5-fold cross-validation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Simple but effective idea
    • Results are (slightly) improved
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Only one dataset is used
    • no statistical analysis of the results
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset is available but not the code

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The paper reads well. The authors should validate their results with a statistical test and better highlight the benefit of their approach, since the results are not far from those of the lesion-pasting approach.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea is interesting, but the results are only slightly better than those of the comparative approaches, so the benefits are not very clear.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    Matching corresponding structures across the CC and MLO mammography views is a long-standing problem, nowadays tackled mostly through the use of Siamese networks. Given the paucity of anatomical landmarks in mammography, the only form of supervision is provided by lesions, which yield a sparse supervision signal. Moving from this observation, the authors propose a GAN-based lesion synthesizer which injects pairs of lesions into the CC and MLO mammogram views. The architecture is a variant of CycleGAN and uses a probabilistic model to ensure consistency of the lesion position between the CC and MLO views. While the concept of lesion synthesis in mammography is not new, to the best of my knowledge the problem of matched lesion synthesis has not been tackled before. The authors demonstrate that the proposed technique increases the performance of different Dual-View Transformation learners.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors propose an original solution to an intrinsic problem in matching CC and MLO views. The proposed technique is general and can be applied for data augmentation of any CC-MLO matching technique. While other papers tackle the synthesis of artificial lesions in mammography, also using GANs, to the best of my knowledge this is the first paper to tackle the paired setting.
    • Experimental validation was performed with respect to three existing patch-based matching algorithms: MatchNet, 2-channel, and SCFM.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I was not able to fully understand the experimental setting, and many aspects of the methodology seem rather obscure. I suppose the GAN is trained first and the resulting images are then used to train the matching network (e.g., MatchNet), but the authors do not explicitly state how the two components interconnect. The architecture and parameters of the matching networks are not reported.
    • The authors claim that previous methods can only learn an inadequate DVT around mass regions rather than over the whole breast. However, in the proposed technique supervision remains incomplete, since only fake lesions are generated. I am under the impression that the proposed method acts more like a form of data augmentation rather than actually enabling the network to learn how to match generic breast structures. At the same time, correctly matching lesions is the most important aspect of this task from a clinical point of view.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors do not provide an implementation. The paper is based on a public dataset, therefore it is in principle reproducible. For the sake of reproducibility, more details should be given on how MatchNet, 2-channel and SCFM methods are implemented.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major remarks:

    • The term Dual-View Transformation is uncommon in the mammography domain, which usually refers to this task as CC-MLO matching. I would suggest rephrasing the paper to match the usual terminology in the field.
    • The methodology should be clarified. In particular, it is not clear to me how the training of LT-GAN and the training of, e.g., MatchNet are handled. The authors state that “Evaluations are based on 5-fold cross-validation”: does this mean that the LT-GAN is trained on 4 folds, and then MatchNet is trained on the same dataset, and evaluated on the 5th fold? Or is the LT-GAN trained once, and MatchNet evaluated in 5-fold cross-validation? And how many synthetic samples are generated?
    • What are the architectural details of the three methods evaluated in Table 2? Since these methods were never applied to mammography “as is”, more details are needed on their architectures and training, either in the paper or in a supplementary material.
    • The authors apply their GAN-based data augmentation scheme to three state-of-the-art methods for stereo matching. However, the same method could be applied to state-of-the-art CC-MLO matching techniques, such as [9] and [14]. A direct comparison with the relevant mammography literature would have been more interesting to the MICCAI readership.
    • More examples of synthetic lesions should be provided. Fig. 3 only shows a macro-calcification which is not clinically very interesting, as it is usually a benign finding.
    • The authors experiment with INbreast which, to the best of my knowledge, is mostly oriented towards mass detection, yet only microcalcifications are considered in the experiments. INbreast contains many annotations of individual calcifications but relatively few calcification clusters. The authors selected 76 cases, but it is not clear how and which types of findings were included. It is also unclear whether a lesion patch counts as a single microcalcification or a whole microcalcification cluster. As a result, it is not easy to interpret the significance of their findings.
    • Mean and standard deviation (or confidence intervals) should be reported for all performance metrics

    Minor remarks:

    • In the introduction, please clarify the sentence “SCFM [10] further exploited for combination in spatial-dimension”
    • In the introduction, the remark “Only a defective DVT can be learned around mass regions” may be a little aggressive, considering that the proposed method mostly synthesizes matched lesions
    • Section 2.1 could be revised for clarity. For instance, what are d and d’ with the largest p-values?
    • At page 5, the authors claim that their method uses “image-level labels”. This wording may be misleading, since the GAN is trained on patches, not on whole mammograms.
    • Typo at the beginning of Section 3.1: To validate effectiveness -> to validate the effectiveness
    • At page 7, the authors state their method “guarantees a correct spatial correlation” between the two synthetic views. However, the wording may be misleading as there is no ground truth for synthetic lesions.
  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Synthetic lesion insertion is an important topic for mammography where the number of lesions is limited by the low prevalence. The proposed methodology is novel and has the potential to improve the performance of CC-MLO matching algorithms. Experimental results are sufficient to prove the validity of the concept. However, the experimental methodology is not written in sufficient detail and clarity to allow the reader to interpret and reproduce the experimental results.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a learning method to match structures across the CC and MLO mammography views, based on a GAN-based lesion synthesizer that injects pairs of lesions into the CC and MLO mammogram views. The paper tackles a clinically relevant problem with a new method that matches synthesized lesions. I’d like to invite the authors to reply to the following issues: 1) Why doesn’t the paper cite relevant mammography lesion matching approaches (see reviews for specific papers)? 2) Did the authors try matching microcalcifications? 3) Did the authors try other, larger datasets? 4) Please clarify how the GAN and matching networks work together. 5) Please report the architecture and parameters of the matching networks.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

#1. Regarding evaluation on a larger dataset: In our original manuscript, we provided an evaluation on INbreast, the most widely used dataset for mammogram-related tasks. We have also evaluated on another, larger dataset, CBIS-DDSM. Experimental results demonstrate performance consistent with that on INbreast, e.g., our densified supervision improves the AUC of MatchNet from 0.796 to 0.835 compared to sparse labels. We will report the CBIS-DDSM results in our final manuscript.

#2. Should consider matching microcalcifications or calcification clusters: Most existing works, if not all, focus on masses, which contain richer information to match. In comparison, our work goes a step further to match calcifications, which are equally important clinically but more challenging (71 micro- and 24 macro-calcifications were evaluated in our original submission). Matching calcification clusters would incur additional unique challenges that are beyond the scope of this work and will be considered in future work.

#3. Our method retains incomplete supervision and fails to learn how to match generic breast structures: We respectfully disagree. First, as stated in the Introduction, we only claimed that our method aims at densifying sparse labels, not at obtaining complete labels, which are unavailable. Second, our densified labels still guarantee both spatial and appearance correspondence, enabling matching networks to learn from previously neglected non-cancerous regions, which reflect generic breast structures far more than a few lesion regions alone.

#4. Should provide a statistical test and benefits compared to lesion pasting: We have calculated statistical significance and will include it in our final manuscript. Compared to sparse supervision and lesion pasting, the improvements from our method are mostly significant on MatchNet and SCFM (p<0.05 except for AUC vs. lesion pasting). Our method is also more convenient in clinical usage than lesion pasting, as the latter requires ground-truth lesion masks while ours needs only weak annotations indicating the existence of lesions.

#5. Unknown effectiveness on [9] and [14]: In the original manuscript, we demonstrated the effectiveness of our method on [9] via rows 3 and 5 of Table 2, as [9] is essentially MatchNet trained under sparse supervision. [14] aims to further improve the matching accuracy of [9] by additionally classifying whether patches contain lesions. For single-lesion matching, our work should be as effective on [14] as on [9]. For multi-lesion matching, however, the effect on [14] could be unnoticeable, since the auxiliary classification may lead the network to find a shortcut through lesion classification while ignoring the surrounding tissues needed for matching, which violates the goal of DVT learning as stated in the Introduction.

#6. Why relevant approaches mentioned by the reviewers are not cited: The two raised references aim at cancer detection rather than lesion matching, which is the focus of our work. That said, we acknowledge that their methods consider dual-view feature matching to enhance detection, and we will cite them in our final manuscript.

#7. How the LT-GAN and matching networks work together: LT-GAN is a precondition for training the matching networks. Specifically, it is first trained to synthesize paired fake lesions; the different matching networks are then trained on both real and synthesized lesions.

#8. Architecture and parameters of the matching networks: As stated in Table 2, we use ResNet-101 as a unified backbone, i.e., a weight-sharing Siamese ResNet-101 for [5] and a single ResNet-101 branch for [16] and [10]. Three fully connected layers serve as a unified metric network to predict matching probabilities. The input settings and training hyper-parameters of the matching networks are stated in the Introduction and Sec. 3.1, respectively.

#9. We thank all reviewers for their efforts. We will clarify the remaining minor concerns and the details of the method and experiments in the final manuscript.
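The metric network described in point #8 (three fully connected layers on top of backbone features, predicting a matching probability) can be illustrated with a minimal numpy sketch. This is not the authors' implementation: the feature dimension, layer widths, and random weights are purely illustrative assumptions, and the ResNet-101 backbone is replaced by random stand-in features.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(x, w, b):
    return x @ w + b

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Stand-in for concatenated backbone features of CC/MLO patch pairs.
# (Real features would come from the weight-sharing Siamese ResNet-101.)
feat_dim = 4096
pair_features = rng.normal(size=(8, feat_dim))  # batch of 8 candidate pairs

# Three fully connected layers ending in a matching probability,
# mirroring the "unified metric network" described in the rebuttal.
# The layer widths (512, 128) are illustrative assumptions.
w1, b1 = rng.normal(scale=0.01, size=(feat_dim, 512)), np.zeros(512)
w2, b2 = rng.normal(scale=0.01, size=(512, 128)), np.zeros(128)
w3, b3 = rng.normal(scale=0.01, size=(128, 1)), np.zeros(1)

h = relu(fc(pair_features, w1, b1))
h = relu(fc(h, w2, b2))
match_prob = sigmoid(fc(h, w3, b3)).ravel()  # one probability per candidate pair
```

At training time, such a head would be optimized with a binary cross-entropy loss on real and synthesized lesion pairs; at test time, the pair with the highest probability gives the predicted match.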




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the rebuttal, the authors addressed most questions. However, they only promise to report the results on DDSM in the final version, so it is unclear whether that issue will be fully addressed. Given the positive points of the paper, I believe it has enough support, so I am recommending acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper received mixed reviews: ‘prob acc’, ‘borderline acc’ and ‘borderline rej’. The meta-reviewer requested a rebuttal to address the concerns the reviewers raised, esp. those from R3. The rebuttal in my view addressed these concerns quite well. For example, it offers new results on another dataset, from which a similar performance improvement is observed. It also presents the necessary specifics (LT-GAN, missing references, incomplete supervision, etc.). Overall, I am happy with the paper quality and the rebuttal. The authors should make the corresponding changes in their final version per the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a fairly novel idea for densifying lesion supervision across views to enable matching in breast lesion images. With the inclusion of the second dataset's results and the updates to the references, I think this will make for an interesting MICCAI paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6


