
Authors

Yanbo Zhang, Yuxing Tang, Zhenjie Cao, Mei Han, Jing Xiao, Jie Ma, Peng Chang

Abstract

Calcification is one of the most common and important lesions in mammograms, and a higher BI-RADS category indicates a higher cancer risk. In this paper, we present the first deep learning-based six-class BI-RADS classification for each individual calcification in mammograms. We propose an attention ROI generation strategy to highlight calcification features. Moreover, by incorporating malignancy information, the designed new loss function effectively boosts the performance of the model. We also design a novel evaluation metric for BI-RADS classification, which considers the severity of malignancy. Experimental results have demonstrated the superior classification performance of the proposed approach to the competing methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_12

SharedIt: https://rdcu.be/cyl77

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    1. BI-RADS classification of mammographic calcifications

    2. A modified kappa measure
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors have done immense work, with manifold contributions to this problem.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors should also provide training performance.
    2. How did the authors confirm the BI-RADS classification ground truth? Did both radiologists see all mammograms? If so, how did the authors resolve any disagreement in BI-RADS classification between the two radiologists (if any)?
    3. Among MAQWK-A and MAQWK-B, which one is better?
    4. I believe more experimentation is required to establish the superiority of MAQWK over QWK.
    5. Details of how the performance was achieved (network connections / deep network) are missing. The authors should provide details about the network structure.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. The details about the network are missing, without which reproducibility cannot be achieved.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    See item 4

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is novel, but much detailed information is missing.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    A deep learning-based six-class BI-RADS classification model was presented for each individual calcification in mammograms. An attention ROI generation strategy was proposed to highlight calcification features. The authors also designed a new loss function to improve the performance of the model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors proposed a new loss function (equation 4) to consider the biopsy based malignancy aspect. The authors implemented the new model on the real hospital data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The application is not novel, and the background of this work is weak, because a BI-RADS scoring system has already been installed in mammogram CAD systems. Only single-institute data was evaluated. The evaluation method is new, but its feasibility was not proven in this work.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is necessary to provide code and de-identified data for the reproducibility of the paper, so that readers can use other measurements to check the feasibility of this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors need to clarify why the proposed BI-RADS classification model is needed, because a BI-RADS scoring system has already been installed in mammogram CAD systems. Otherwise, the clinical background of this work is really weak. Is there any clinical value in this work? I am also confused about whether this work focuses on classifying the images based on BI-RADS scoring or on biopsy-malignancy status. The authors only used data from a single institute and did not show the distributions of patients (age, BMI, etc.). It is hard to judge whether the model would work on data from other institutes. The authors did not explain why the new evaluation method is necessary. Why were the traditional measurements (sensitivity, specificity, AUC, etc.) not applied? Does that mean the performance on traditional measurements was not satisfactory? The multi-class model is very challenging, so I suggest the authors work only on one score vs. the other scores.

  • Please state your overall opinion of the paper

    reject (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application is not new and comparison standards are very weak.

  • What is the ranking of this paper in your review stack?

    5

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper proposes the diagnosis of breast calcifications according to the BI-RADS categories. The classification is done as a regression task using ResNet-18 as the backbone of the net. The proposed regression loss penalises BI-RADS inconsistency. The evaluation is done in terms of the quadratic weighted kappa, which has been modified to account for malignancy estimation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The diagnosis of microcalcifications into BIRADS categories may help to reduce biopsies.

    • The approach is simple and well evaluated.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There is no comparison with other architectures.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think the reproducibility of the method is difficult. There are some ad-hoc parameters that are not fully explained, probably due to the lack of space. Besides, they use an in-house database.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This paper presents an approach for breast micro-calcification diagnosis. I’m not sure about the benefits of diagnosing individual micro-calcifications, however. All mammograms may have distributed, isolated micro-calcifications, which are benign; malignant micro-calcifications appear in clusters.

    Regarding the method, it is based on a regression ResNet-18 with a specific loss. The authors compared different losses; however, the architecture is not compared with others. This is a drawback of the paper.

    I understand the necessity of the novel evaluation metric; however, a confusion matrix of the results is necessary for comparison.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach of the paper is simple and can help in research in breast cancer diagnosis. The novel loss and the malignancy-adjusted quadratic weighted kappa can be useful in other applications.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work aims at improving CAD decisions for classifying calcifications in breast MG images based on BI-RADS. The authors propose an interesting twist in solving the problem, namely customizing the loss function for this specific application. This is an interesting idea and, in my opinion, worth a discussion at MICCAI 2021. Furthermore, the authors propose a new evaluation metric based on the quadratic kappa, which is also interesting. Nevertheless, the work lacks a thorough evaluation against the state of the art, as pointed out by the reviewers. I encourage the authors to carefully address the major concerns of the reviewers during the rebuttal phase: 1) How did the authors confirm the BI-RADS classification ground truth? 2) Elaborate more on the clinical value of the proposed work. 3) Elaborate on why the new evaluation method is necessary and superior to traditional ones.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

We sincerely thank the reviewers for their efforts. Our responses to the major concerns are as follows.

Reviewer 1

  1. How was the BI-RADS classification ground truth confirmed from the two radiologists’ annotations? A junior radiologist initially labeled the data, which was then double-checked by an experienced radiologist to reach a consensus. The checked BI-RADS annotation was used as the ground truth.
  2. Among MAQWK-M and MAQWK-B, which one is better? They are metrics for evaluating malignant and benign data, respectively. It is hard to say which one is better; like specificity and sensitivity, the preference depends on the specific application scenario.

Reviewer 2

  1. The application is not novel, and the background of this work is weak, because the BI-RADS scoring system was already installed in the mammogram CAD system. Is there any clinical value in this work? We respectfully disagree with the reviewer on this comment. 1) BI-RADS is a standard for assessing cancer risk, and radiologists need to estimate a BI-RADS category based on their experience. Although a few commercial AI-based CAD software products claim the function of estimating the BI-RADS category, none of them discloses their methods, and there is no publication evaluating BI-RADS category estimation for calcifications. Moreover, there are no papers on a six-class BI-RADS classification method for calcification, and our work is the FIRST study on this topic. 2) Besides, this work has very important clinical value: the proposed method has the potential to assist radiologists in estimating a more accurate BI-RADS category, thereby reducing the cancer-missing rate and unnecessary biopsies. In addition, the proposed MAQWK is a new metric for assessing BI-RADS classification, and is also a general evaluation metric that can be applied to other medical imaging rating tasks.
  2. Is this work focusing on classifying the images based on BI-RADS scoring or on biopsy-malignancy status? This work focuses on BI-RADS classification, and we also incorporate biopsy-malignancy status as additional information to achieve better performance. We presented this in the original manuscript.
  3. The authors only used single-institute data and did not show the distributions of patients (age, BMI, etc.). Our dataset was collected with two vendors’ digital mammography machines, the SIEMENS Mammomat Inspiration (Germany) and the GIOTTO Image MD (Italy), which makes the experimental results more reliable. Furthermore, the core contribution of this work is the novel algorithm rather than a clinical evaluation, so we believe our dataset is sufficient to justify the efficacy of the method. For this mammographic research topic, unlike clinical papers, technical papers at MICCAI usually do not provide age or BMI information.
  4. The feasibility and necessity of the new evaluation method? Why were the traditional measurements (sensitivity, specificity, AUC, etc.) not applied? We use the QWK and the designed MAQWK as evaluation metrics. Metrics such as sensitivity, specificity, and AUC do not consider the ordinal correlations among classes. In comparison, the QWK is a more appropriate and widely used metric for assessing rating problems, and the MAQWK is designed with biopsy malignancy in mind, making it closer to clinical needs.
  5. The multi-class model is very challenging, so I suggest the authors work only on one score vs. the other scores. We agree this is a challenging task; since the nature of BI-RADS classification is a rating task, the classes are correlated. Therefore, it is not proper to pose it as a one-score-vs.-other-scores problem.
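[Editor's note] For readers unfamiliar with the metric discussed above, the following is a minimal pure-Python sketch of the standard quadratic weighted kappa (QWK). The function name and the integer-label encoding are illustrative assumptions; the malignancy-adjusted MAQWK variant defined in the paper is not implemented here.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Quadratic weighted kappa for ordinal labels in {0, ..., n_classes-1}.

    Agreement is penalised by the squared distance between categories,
    which is why QWK suits ordinal rating tasks such as BI-RADS.
    """
    n = len(y_true)
    # Observed confusion matrix O[i][j]: true class i rated as class j.
    O = [[0.0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        O[t][p] += 1.0
    # Marginal histograms of true and predicted labels.
    hist_t = [sum(row) for row in O]
    hist_p = [sum(O[i][j] for i in range(n_classes)) for j in range(n_classes)]
    # Expected matrix under chance agreement, scaled to n samples.
    E = [[hist_t[i] * hist_p[j] / n for j in range(n_classes)]
         for i in range(n_classes)]
    # Quadratic disagreement weights w[i][j] = (i-j)^2 / (n_classes-1)^2.
    denom = (n_classes - 1) ** 2
    num = sum((i - j) ** 2 / denom * O[i][j]
              for i in range(n_classes) for j in range(n_classes))
    den = sum((i - j) ** 2 / denom * E[i][j]
              for i in range(n_classes) for j in range(n_classes))
    return 1.0 - num / den
```

Perfect agreement yields 1.0, chance-level agreement 0.0, and systematic disagreement a negative value; MAQWK, per the paper, further reweights these disagreements by biopsy-confirmed malignancy.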

Reviewer 3

  1. There is no comparison with other architectures. The core of the proposed method is the newly designed loss, which is independent of the CNN architecture.

Common Concerns

  1. The reproducibility of the paper. We will add more implementation details in the revised paper for easy reproduction of the method and results.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    As I pointed out before, this work introduces a few novel ideas that are worth sharing and discussing at MICCAI 2021. The authors adequately addressed most of the reviewers’ major concerns in their rebuttal response.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addresses the first two points well. For point 3, even though the new evaluation may be more adequate, it is important that the paper also shows more standard results, so the reader is able to compare with previous approaches. Also, the use of different architectures would help in understanding the robustness of the loss with respect to other architectures. I believe the authors should address these issues before the paper can be accepted to MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    11



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ response about the use of the QWK and MAQWK as new evaluation criteria is satisfactory. The consensus was that the methodology is novel. However, after going through the paper, it lacks implementation details. The authors mention that they will address this in the revised paper. The experimental results are decent, and the authors compare their results against other methods.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7


