Authors

Yinhao Ren, Jiafeng Lu, Zisheng Liang, Lars J. Grimm, Connie Kim, Michael Taylor-Cho, Sora Yoon, Jeffrey R. Marks, Joseph Y. Lo

Abstract

In mammography and tomosynthesis, radiologists use the geometric relationship of the four standard screening views to detect breast abnormalities. To date, computer aided detection methods focus on formulations based only on a single view. Recent multi-view methods are either black box approaches using methods such as relation blocks, or perform extensive, case-level feature aggregation requiring large data redundancy. In this study, we propose Retina-Match, an end-to-end trainable pipeline for detection, matching, and refinement that can effectively perform ipsilateral lesion matching in paired screening mammography images. We demonstrate effectiveness on a private, digital mammography data set with 1,016 biopsied lesions and 2,000 negative cases.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_33

SharedIt: https://rdcu.be/cyl57

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes breast cancer detection in mammography that incorporates ipsilateral information. Detection is done in three steps. In the first, RetinaTrack is applied to each view to detect suspicious regions. Subsequently, candidates from both views are correlated by a distance metric that has been learnt from the ground truth using a greedy approach. The third step fuses the information of the matched candidates.
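As a rough illustration of the matching step described above, the sketch below pairs CC and MLO candidates greedily from a distance matrix. It is only a minimal sketch under stated assumptions: the function name `greedy_match`, the threshold `max_dist`, and the example distances are hypothetical, and the paper's actual matching procedure is not specified in this review.

```python
from typing import List, Tuple


def greedy_match(dist: List[List[float]], max_dist: float) -> List[Tuple[int, int]]:
    """Greedily pair CC candidates (rows) with MLO candidates (columns).

    dist[i][j] is an assumed learned distance between CC candidate i and
    MLO candidate j; a smaller value means the two are more likely to be
    the same lesion. max_dist is a hypothetical cut-off for valid pairs.
    """
    # Sort every (distance, cc_index, mlo_index) triple by increasing distance.
    pairs = sorted(
        (d, i, j) for i, row in enumerate(dist) for j, d in enumerate(row)
    )
    used_cc, used_mlo = set(), set()
    matches: List[Tuple[int, int]] = []
    for d, i, j in pairs:
        if d > max_dist:
            break  # all remaining pairs are even farther apart
        if i in used_cc or j in used_mlo:
            continue  # keep the matching one-to-one
        used_cc.add(i)
        used_mlo.add(j)
        matches.append((i, j))
    return matches


# Two CC candidates and three MLO candidates (made-up distances).
distances = [
    [0.2, 0.9, 0.7],
    [0.8, 0.3, 0.6],
]
print(greedy_match(distances, max_dist=0.5))
```

Candidates left unmatched (here the third MLO candidate) would simply keep their single-view scores in such a scheme.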
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The proposal for abnormality detection includes ipsilateral information (CC and MLO views). This is an important aspect which actually mimics the radiologist's procedure. Using only one view, lesions can be occluded or false positives may arise due to overlapping tissues.
- The proposal is evaluated using a large dataset of images (almost 3000 images in the training set).
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The use of the distance metric is controversial. This step replaces the typical CC-MLO registration with a learning-from-examples strategy instead of using the physics of the mammographic acquisition itself.
- A second weakness of the method is that it does not use bilateral information, which is the first comparison commonly made by radiologists, even before the CC/MLO comparison.
- Although the dataset used to evaluate the approach is large, the ratio of malignant cases over total cases is 1/10, which is far from the one in screening (around 1/1000).
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I think the method is difficult to reproduce. There are many ad-hoc parameters that are not fully explained, probably due to the lack of space. Besides, the authors use an in-house database.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

This paper presents another CNN-based approach for breast cancer detection. It is based on the use of RetinaTrack to detect suspicious regions in MLO and CC views and a subsequent step joining the information. This step is divided into the candidate matching and the information fusion.

I’m not sure sections 4.2 & 4.3 should be in Experiments; perhaps they would be better placed at the end of Section 3 (i.e. section 3.2).

Regarding the method itself, authors used the Distance metric as a way to avoid the physical correspondence in CC/MLO registration. This step limits interpretability of the approach and depends on having enough samples in the training set to interpret all the possible cases of mass locations.

As already commented, my main concern is about the evaluation performed. The authors should add many more normal mammograms to make sure the number of FPs does not increase substantially.

Section 4.4 is discussed without numbers, which seems strange.

I don’t understand why the authors show two FROCs, one for benign & malignant and the second one for malignant only. It would be fairer to show benign only and malignant only, even if the first one doesn’t favour the proposed approach.

It would be interesting to show the values of alpha and beta optimised by the net. Are they similar to the ones expected by the authors?

The paper misses statistical analysis of the results (including the lack of the confidence interval of the FROCs).

There is no comparison with the state of the art. How significant is the proposal of the authors with respect to other proposals?
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I tend to accept the paper due to the single end-to-end training that avoids intermediate steps. However, the experimental section should be improved to gauge the significance of the approach.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

This paper introduced an ipsilateral lesion matching step in a single-stage detector pipeline to improve lesion detection performance in mammography images. The method was inspired by and adapted from Retina-Match, allowing for end-to-end training. Experiments using a large, private, curated dataset were conducted to demonstrate the effectiveness of the proposed approach.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well organized & fairly well written. Clinical context (the use of all views in Mammography lesion detection by clinicians) was well introduced.
- The analysis of state-of-the-art was extensive. Ablation study & comparison with state-of-the-art methods was thorough.
- The dataset used to demonstrate the performance is relatively large & of good quality compared to public datasets (DDSM, INBREAST …)
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- This paper lacks details in description of the architecture, in particular how the distance metric network connects with other components.
- The mathematical notation needs further clarification, especially the meaning of the super/sub-script letters.
- The performance graphs lack error bars, making the statistical significance of the comparison unclear.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- Dataset is private, with minimal description. No descriptions on patient demographics & distribution per types of lesions.
- Training & evaluation code will be made public. Training details were mentioned in the paper (hardware, software framework, key hyper-parameters). Training time & methods to fine-tune the hyper-parameters were not detailed.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
Chapter Introduction, 3rd paragraph
- “Nevertheless, this design assumes the latent features of each lesion candidate can affect all other candidates, breaking the clinical assumption of one-to-one matching.” Could the author add some references (clinical papers) to back this up?
Section Lesion Re-identification
- Please clarify some notation: y_match^k, f_CC^i, f_MLO^j. What do k, i and j refer to?
Section Lesion Matching Logic
- FP-FP is also assigned positive label 1? Please clarify.
- What does p^i_match refer to?
- “The proposed lesion refinement loss is than computed…”: typo, than -> then
- Equation (3): clarify what N and M refer to.
Section 4.1 Dataset
- In biopsied soft-tissue lesion cases, only biopsied lesions were annotated. Also, BIRADS 2 patients may have non-biopsied benign lesions that are not annotated & not treated in this work. This was clarified later in section 4.6 Comparison with the Relation Block approach; it would be clearer for readers if the authors clarified it here.
Section 4.2 Network Architecture
- The description of architecture lacks details: specifically, how does the distance metric network connect to the 4 regressors and the greedy matching? This should be more clearly stated and illustrated in Figure 1.
Section 4.5 Ablation Study, Figure 2
- There is no error bar in this graph, making it difficult to compare the relative performance of different training set-ups. Also the sensitivity values in these two graphs do not seem to plateau yet, and seem to converge for different set-ups when the false positive rate becomes greater. It would be more convincing to show more sensitivity values with larger false positive rates, to compare the max sensitivity of different training set-ups.
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Ipsilateral matching in mammography is a clinically relevant topic. The proposed method was well described in general. The experimental results, minus some minor details, remain convincing. Seeing this type of research for 2D and 3D mammography lesion detection is important for the research & clinical community.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

This paper proposes an end-to-end network for lesion detection from paired CC-MLO mammography views. The network is capable of simultaneously performing three tasks: lesion detection on each individual view, CC-MLO matching (via a Siamese network), and classification refinement by integrating information from the two views. Specifically, the scores of the matched bounding boxes are linearly combined, with weights that are not fixed but predicted by the network and specific to each anchor. Compared to existing architectures, such as MommiNet, where all feature embeddings from all detections are input to a Relation Network, the architecture assumes one-to-one matching between regions on the CC and MLO views, which is more similar to the typical radiologists’ workflow. Extensive experimental evaluation shows a performance gain with respect to the traditional single-stream RetinaNet as well as other recently proposed dual-stream architectures based on Relation Networks.
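The linear score combination mentioned above could look roughly like the following sketch. The exact formula is not given in this review, so the form `alpha * own + beta * matched`, the clamping to [0, 1], and the names `refine_score`, `alpha` and `beta` as function arguments are all assumptions made for illustration; in the paper the weights are said to be predicted per anchor by the network rather than passed in by hand.

```python
def refine_score(p_own: float, p_matched: float, alpha: float, beta: float) -> float:
    """Combine a candidate's score with the score of its matched candidate.

    p_own     : single-view score of the candidate being refined
    p_matched : score of its matched candidate on the other view
    alpha     : assumed weight on the candidate's own score
    beta      : assumed weight on the matched candidate's score
    """
    combined = alpha * p_own + beta * p_matched
    # Clamp so the result can still be read as a probability (an assumption).
    return min(1.0, max(0.0, combined))


# A CC candidate with moderate confidence, supported by a strong MLO match.
cc_refined = refine_score(p_own=0.6, p_matched=0.8, alpha=0.7, beta=0.3)

# The same candidate paired with a weak MLO match is pulled down instead.
cc_unsupported = refine_score(p_own=0.6, p_matched=0.1, alpha=0.7, beta=0.3)
print(cc_refined, cc_unsupported)
```

With anchor-specific weights, the network can in principle decide per candidate how much the opposite view should count, which is the point raised later in this review about constant versus anchor-specific parameters.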
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper proposes an architecture which is novel and original, and which allows lesion detection on dual-view mammography to be performed effectively.
- The output of the network is easily interpretable by radiologists, as it closely resembles the workflow employed by a human reader.
- The method does not require the two views to be registered
- Strong experimental validation is provided both against single-view detection, as well as against other competing methods, showing clear advantages in terms of performance.
- The paper is clearly written and easy to understand
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- RetinaMatch tackles two tasks at once: CC-MLO matching, and lesion detection. However, only the latter is experimentally evaluated. The effectiveness of matching, i.e., how often a lesion is actually matched to its corresponding ipsilateral view, is not evaluated.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The paper is not technically reproducible, as the authors provide neither the code nor the dataset. The latter, however, is very common in mammography, since public datasets are rather small and/or based on screen-film mammography. The description of the method is quite clear, and it should be relatively easy to reproduce. The following parameters are missing:
- Number and type of anchors
- Input image size
- Formulation of the regression loss
- Weights of the individual components of the loss
- Transformations used for data augmentation
The authors should also clarify what they mean by lesion-level sensitivity.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
The paper proposes a novel architecture for ipsilateral matching and lesion detection in mammography. Compared to previous architectures, such as MommiNet, the proposed architecture enables explicit one-to-one matching between lesion views, which directly mimics the radiologist’s workflow and provides an interpretable output to the radiologist. The other advantage of the proposed architecture is that it focuses the attention on local features, whereas other architectures (e.g., relational block) are prone to focus on global features. I agree with the authors that local features are essential in discriminating masses from false positives due to tissue superposition. I do not fully agree that global features do not capture useful information; global aspects, e.g., related to breast density or to the presence of diffuse calcifications, may impact the final classification.

The paper is generally well written, and the experimental section well curated. Some points could be expanded or clarified, perhaps relying also on supplementary material given the space constraints:
- In the related work, the sentence ” GCNet [2] showed that general non-local blocks often degrade into a global feature extractor.” Should be further clarified and put in the specific context of mammography
- In Section 3, does the distance function D take into account the position, as well as the embedding?
- Why are the embeddings explicitly regressed, instead of being extracted from the classification network or the backbone? What is the advantage of this additional head?
- In Section 4.1, the authors mention the application of a pectoral muscle segmentation model for the MLO view, and that a 2D breast depth encoding model is constructed. However, it is not clear how this information is used in the architecture. I suggest adding it to Figure 1.
- I understand the advantage of regressing parameter alpha and beta, instead of fixing them manually. However, I wonder what is the specific advantage of having anchor-specific parameters, with respect to constant values optimized over the entire dataset.
- It is not clear in Table 1, what is the difference between case, view and annotation - especially view and annotation. What are considered as lesions in calculating the sensitivity? Please clarify and use a consistent notation across the paper
- All models should be trained for an equal number of epochs. From the description of the methodology, it appears that the single-view RetinaNet is trained for 25 epochs, and then RetinaMatch is further fine-tuned for 50 epochs. The single-view model should correspondingly be further fine-tuned for 25 epochs. In future work, it would be interesting to compare the convergence properties of the different models, i.e., whether RetinaMatch converges faster or slower than RetinaNet.
- A few examples that illustrate and compare the detections by different models would be nice.
- The probability of each candidate lesion on each view is modified based on the probability of the corresponding matched lesion candidate on the opposite view. How is the final lesion probability calculated to calculate the per-lesion probability?
- The differences between the models are subtle; an analysis of their statistical significance should be included.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is an interesting paper with well-conducted experiments, with proper baseline and comparisons with other methods. The proposed method is not a trivial extension of existing models and has practical advantages (in terms of simplicity and interpretability) compared with existing methods. The paper is well written and easy to understand.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

6
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes a lesion detector and classifier by analysing CC and MLO views. All reviewers acknowledged the innovation of the method and the solid experimental results, so I am recommending the provisional acceptance of the paper. There are a few points that need to be addressed: 1) can the authors assess the CC-MLO matching, by checking how often a lesion is actually matched to its corresponding ipsilateral view?; 2) the reviewers requested a few methodological clarifications; and 3) can the paper add statistical significance tests to the results?
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Author Feedback

We really appreciate the reviewers' constructive feedback. We have corrected the indicated language errors and cleaned up the mathematical notation and the model architecture description for the camera-ready version.

Here we would like to respond to a few other points that have been raised:

Regarding the lesion matching performance, we derived our final formulation from a patch-based matching model, where this evaluation is available. The patch level exhaustive pair vs not-pair AUC is around 0.96 when we conduct the matching operation at 3 FP per image. We will include some more detailed matching evaluations in the final camera-ready version.
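For readers unfamiliar with a pair vs not-pair AUC, the sketch below shows one common way such a number can be computed from pairwise distances: the fraction of (true pair, non-pair) combinations in which the true pair has the smaller distance. This is only an illustration; the function name `pair_auc` and the toy distances are hypothetical, and it is not known from this page how the 0.96 figure was actually computed.

```python
from typing import Sequence


def pair_auc(pair_dist: Sequence[float], nonpair_dist: Sequence[float]) -> float:
    """Rank-based AUC for separating true pairs from non-pairs.

    pair_dist    : distances of patch pairs showing the same lesion
    nonpair_dist : distances of patch pairs showing different findings
    Returns the probability that a random true pair has a smaller distance
    than a random non-pair, counting ties as one half.
    """
    if not pair_dist or not nonpair_dist:
        raise ValueError("both groups need at least one distance")
    wins = 0.0
    for p in pair_dist:
        for n in nonpair_dist:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pair_dist) * len(nonpair_dist))


# Toy distances: one true pair (0.4) is farther apart than one non-pair (0.3).
auc = pair_auc([0.1, 0.4], [0.3, 0.8, 0.9])
print(round(auc, 3))
```

An AUC of 1.0 would mean every true pair is closer than every non-pair, and 0.5 would mean the distance carries no matching information.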

Regarding Lesion Matching Logic, we treat TP-TP pairs as positive and TP-FP pairs as negative. We believe lesion-level matching performs better than full mammogram registration (it can be supervised and has less overhead), and we will conduct more comparisons in the future to prove this point. For position encoding, we used the pectoral muscle segmentation to construct the nipple distance of each lesion.

Regarding the statistical significance for this study, we assumed that our dataset is relatively large and thus decided not to include a cross-validation study that could yield a confidence interval for the FROC. We will pay more attention to adding statistical significance tests in our future studies.

Thanks again for all the above thorough reviews. We will definitely incorporate those comments into the design of our future studies.
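One inexpensive alternative to cross-validation for obtaining such an interval is a case-level percentile bootstrap on a fixed test set. The sketch below estimates a confidence interval for lesion-level sensitivity at one FROC operating point. It is not something the authors report doing; the function name, the input layout (`detected[c]` holding one flag per annotated lesion in case `c`), and the defaults are assumptions for illustration.

```python
import random
from typing import List, Tuple


def bootstrap_sensitivity_ci(
    detected: List[List[bool]],
    n_boot: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap CI for sensitivity at a fixed FP-per-image point.

    detected[c] lists, for case c, whether each annotated lesion was
    detected at the chosen operating point. Cases are resampled with
    replacement so that lesions from the same case stay together.
    """
    if not any(detected):
        raise ValueError("need at least one case with an annotated lesion")
    rng = random.Random(seed)
    n_cases = len(detected)
    stats = []
    for _ in range(n_boot):
        sample = [detected[rng.randrange(n_cases)] for _ in range(n_cases)]
        total = sum(len(case) for case in sample)
        if total == 0:
            continue  # resample happened to contain no lesions
        hits = sum(sum(case) for case in sample)
        stats.append(hits / total)
    if not stats:
        raise ValueError("no bootstrap resample contained a lesion")
    stats.sort()
    lo_idx = int((1.0 - level) / 2.0 * (len(stats) - 1))
    hi_idx = int((1.0 + level) / 2.0 * (len(stats) - 1))
    return stats[lo_idx], stats[hi_idx]


# Toy data: six cases, eight lesions, six of them detected.
toy = [[True], [True, False], [True], [False], [True, True], [True]]
low, high = bootstrap_sensitivity_ci(toy)
print(low, high)
```

Repeating this at each FP-per-image threshold would give a confidence band around the whole FROC curve.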

Retina-Match: Ipsilateral Mammography Lesion Matching in a Single Shot Detection Pipeline