
# Authors

Mickael Tardy, Diana Mateus

# Abstract

Digital Breast Tomosynthesis (DBT) is an emerging imaging technique for breast cancer screening that aims to overcome certain limitations of traditional mammography, such as the superimposition of tissues. On the downside, DBT increases the radiologists’ workload, as it generates stacks of high-resolution images that are time-consuming to review and annotate. In this work, we propose a deep multiple-instance-based method for DBT volume classification that relies on the local summarization of DBT slices (referred to as slabbing) and requires only volume-wise labels for training. Slabbing offers several advantages: i) it reduces the classifier’s computational complexity across the depth, letting it focus on the higher transversal resolution; thanks to this strategy, we are the first to train a method at almost full resolution (as high as 120x2500x2000); ii) it produces slabs that are closer to standard mammography, favoring efficient transfer from classifiers trained on larger mammography databases; and iii) the slabs combined with a Multiple-Instance Learning (MIL) classifier yield localized information that favors interpretability. The proposed slabbing MIL approach is also novel for the automatic classification of DBTs. Moreover, we propose a trainable alternative to the handcrafted slabbing algorithms, based on slice-wise attention, that improves performance. We perform an experimental validation on a subset of the public BCS-DBT dataset and achieve an AUC of 0.73 with five-fold cross-validation. On a private multi-vendor dataset, we obtain a similar AUC of 0.73, demonstrating consistent performance across datasets.
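The abstract does not spell out the handcrafted slabbing operator. As a rough illustration only, the sketch below assumes that each slab is the maximum-intensity projection (MIP) of `thickness` contiguous slices, with windows starting every `stride` slices; the function name `mip_slabs` and both parameters are hypothetical, not the authors’ code.

```python
import numpy as np


def mip_slabs(volume: np.ndarray, thickness: int, stride: int) -> np.ndarray:
    """Summarize a (depth, height, width) DBT volume into a stack of MIP slabs.

    Assumed scheme: slab k is the per-pixel maximum over slices
    [k * stride, k * stride + thickness). Trailing slices that do not fill a
    complete window are not covered when (depth - thickness) % stride != 0.
    """
    depth = volume.shape[0]
    if not 1 <= thickness <= depth or stride < 1:
        raise ValueError("need 1 <= thickness <= depth and stride >= 1")
    starts = range(0, depth - thickness + 1, stride)
    # Per-pixel maximum over each window of contiguous slices
    return np.stack([volume[s:s + thickness].max(axis=0) for s in starts])


rng = np.random.default_rng(0)
toy_volume = rng.random((60, 16, 12))  # toy stand-in for a DBT volume
slabs = mip_slabs(toy_volume, thickness=10, stride=5)  # shape (11, 16, 12)
```

With a 60-slice toy volume, a thickness of 10 and a stride of 5, this yields 11 overlapping slabs, each with the shape of a single 2D slice, so each could in principle be fed to a mammography-pretrained 2D classifier.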

SharedIt: https://rdcu.be/cyl79



# Reviews

### Review #1

• Please describe the contribution of the paper
• Proposes a MIL approach using DBT slices for breast cancer screening
• Employs slice-wise attention in the learning model for the local summarization of DBT
• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• This study is clinically motivated and well organized.
• The MIL approach is reasonable in DBT-based cancer screening with high-resolution DBT.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• In the experiments, the slabbing results are not clearly described. How do they differ from conventional approaches such as MIP or SoftMIP? A quantitative comparison may help in understanding the benefits of the proposed method.
• The proposed method assumes that slabbing is performed on subsets (groups of slices) of the DBT volume. The parameter T may therefore depend on the slice thickness and spacing.
• How is the bounding box obtained? Is a margin added? The bounding box acquisition should be described more clearly.
• The test set seems too small to evaluate the generalizability of the proposed method for DBT.
• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Implementation details are described in the manuscript. However, the PMV-DBT dataset is not public.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• Apart from the improvement in classification performance, the benefits of trainable slabbing are not clear. How do its results differ from those of the conventional approaches?
• The resolution is very high, and given the limited size of the dataset, the model may overfit very early. Was any regularization method applied?

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Performance improvement and clinical motivation

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

This paper proposes an interpretable intermediate representation that condenses high-resolution information to ease the processing of Digital Breast Tomosynthesis (DBT) volumes, and devises a method that summarizes the volume into a small number of views, each generated from a group of contiguous slices. This work provides a trainable attention-based model for generating and using slabs for DBT classification, as well as an end-to-end method capable of processing full-resolution DBT volumes.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) The proposed method (slabbing MIP) is novel and shows better performance than the other methods compared.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) This paper does not compare with other state-of-the-art work. 2) The effect of different combinations of slab thickness and other parameters on performance is not discussed. 3) The full resolution of the DBT volumes is not clearly explained.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

No data/code provided

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The authors could compare the performance with other state-of-the-art algorithms.

probably reject (4)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Probably reject, because no performance comparison with other state-of-the-art algorithms is shown.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

3

• Reviewer confidence

Very confident

### Review #3

• Please describe the contribution of the paper

The paper describes the development of a slabbing-based methodology to improve the classification of mammographic abnormalities in DBT data.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

With the use of DBT increasing, this is a very timely concept.

Full DBT volumes can be processed.

Evaluation is covering several datasets.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Novelty is incremental.

Comparison with other approaches is limited.

The improvements are not statistically significant, although the trend is there.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The description of the methodology is specific enough to be reproducible. It is less clear if there is access to all the data.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Could this be directly compared with 3D deep learning techniques applied to the full-resolution DBT data?

Could this be translated into location-specific annotations?

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Interesting application area, but novelty is incremental. Evaluation is good, but comparison could be more detailed.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

2

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes a new method for breast cancer screening using Digital Breast Tomosynthesis (DBT) slices.

The key strengths include: 1) the study is of great clinical relevance; 2) the evaluation is conducted on several independent datasets.

The key weaknesses include: 1) some details about the experiments are missing; 2) there is no comparison with other state-of-the-art techniques.

Therefore, feedback from the authors is needed for further evaluation.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

11

# Author Feedback

We thank the reviewers and AC for their comments. We agree that two major strengths are the clinical relevance associated with the new Digital Breast Tomosynthesis (DBT) modality and the evaluation on two independent (a public and a private) datasets. A third strength is the computational advantage of our method, which allows handling high-resolution (60x2500x2000) DBT volumes.

The major weakness identified by the three reviewers is the presumed lack of comparison to state-of-the-art (SOA) methods. We believe this remark stems from a lack of clarity in the description of our experiments. A comparison to a SOA method is already part of the experiments presented in Table 1, where the line called “MIL” corresponds to the method in reference [23] with a small adaptation. We chose [23] as it is one of the most recent (2019) DBT classification methods relying on volume-wise labels. Our adaptation favors transfer learning from mammography (i.e., instead of training a classifier from scratch) and allows the processing of full-size DBT slices (instead of 1024x1024 in [23]). We will clarify that the MIL approach refers to this adaptation of [23]. Furthermore, we could add the AUC metrics of the original implementation of [23] without the adaptation: “Method & Transfer learning & Fully trainable & Partially trainable & Not trained” “MIL ([23] adapted) & 63.80±9.74 & 67.03±6.72 & 64.30±8.62” “[23] & NA & NA & 62.27±10.62 & NA”.

Here, NA indicates that the training strategy is not applicable. Another, more recent (2020) SOA classification method, Doganay et al. [7], rescales images to 256x256. As discussed on page 2 (see “However, recent…”), such a scale prevents capturing smaller findings. We have studied the effect of downsampling in Section 3 and Fig. 3, showing that lower resolutions result in worse performance. Moreover, the method of [7] requires annotations at the slice level. These reasons justified our initial choice not to compare directly against [7]. If considered relevant, we can add this comparison to the experiments in Table 1 as follows: “Doganay et al. & NA & 60.47±8.15 & NA & NA”. On our dataset, both SOA methods, [23] without adaptation and [7], are outperformed by [23] with adaptation and by the proposed method.
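The claim that a 256x256 input loses small findings can be checked with simple arithmetic. The sketch below assumes a hypothetical finding spanning 10x10 pixels in a 2500x2000 slice (the size is illustrative, not taken from the paper) and computes its extent after resizing the slice to 256x256.

```python
def resized_extent(extent_px, original_shape, target_shape):
    """Extent of a structure (in pixels, per axis) after resizing an image."""
    # Each axis is scaled by target / original independently
    return tuple(e * t / o for e, o, t in zip(extent_px, original_shape, target_shape))


# Hypothetical 10 x 10 pixel finding in a 2500 x 2000 slice, resized to 256 x 256
rows_px, cols_px = resized_extent((10, 10), (2500, 2000), (256, 256))
```

Under this assumption, the finding shrinks to roughly 1.0x1.3 pixels, and because the two axes are scaled by different factors, its aspect ratio is distorted as well.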

The quality of the slabs produced by our method vs. MIP or SoftMIP (R1, R4) was evaluated indirectly through the classification task (see Table 1), since no direct quantitative metrics exist [6]. Moreover, Fig. 3 shows that performance decreases with thicker slabs, which supports using multiple thinner slabs rather than a single volume-wise MIP or SoftMIP projection.
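To make the contrast with a fixed projection concrete, the following is a minimal sketch of attention-weighted slabbing. It assumes one scalar score per slice (the `scores` array stands in for the output of a hypothetical learned scoring network) and a softmax with a hypothetical `temperature` parameter; the attention module in the paper may differ.

```python
import numpy as np


def attention_slab(slices: np.ndarray, scores, temperature: float = 1.0) -> np.ndarray:
    """Collapse a group of T slices of shape (T, H, W) into one (H, W) slab.

    Hypothetical sketch: `scores` holds one scalar per slice; a softmax turns
    them into non-negative weights that sum to 1.
    """
    logits = np.asarray(scores, dtype=float) / temperature
    logits -= logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()  # softmax over the slice axis
    # Weighted sum over the first (slice) axis: (T,) x (T, H, W) -> (H, W)
    return np.tensordot(weights, slices, axes=1)


# Toy group of T = 3 slices, each 2 x 4 pixels
group = np.arange(24, dtype=float).reshape(3, 2, 4)
mean_like = attention_slab(group, np.zeros(3))  # equal weights
peaked = attention_slab(group, np.array([0.0, 1.0, 0.0]), temperature=1e-3)
```

Equal scores reduce the slab to the average projection, and a very low temperature concentrates the weight on the top-scoring slice. Because the weights in this sketch are per slice, it does not reproduce MIP, which takes the maximum independently at every pixel.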

Regarding the slab thickness (R1 and R4), we draw attention to Fig. 4 and to the “Slab thickness study” paragraph in Section 3, where the performance for different thicknesses is reported. R1 suggests that T should also depend on the slice spacing; this is what we do, as stated in the “Hyper-parameters” paragraph.
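As a hedged illustration of choosing T from the slice spacing, the helper below assumes a target physical slab thickness in millimetres (a hypothetical parameter) and converts it into a number of contiguous slices; the rounding rule is an assumption, not the rule stated in the paper.

```python
def slices_per_slab(target_thickness_mm: float, slice_spacing_mm: float) -> int:
    """Number of contiguous slices T covering roughly `target_thickness_mm`."""
    if slice_spacing_mm <= 0:
        raise ValueError("slice spacing must be positive")
    # At least one slice per slab, otherwise the nearest whole number of slices
    return max(1, round(target_thickness_mm / slice_spacing_mm))
```

For a 10 mm target, a 1 mm spacing gives T = 10 and a 0.5 mm spacing gives T = 20, so volumes acquired with different spacings would be summarized over a similar physical depth.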

Regarding R5’s question, 3D deep learning techniques cannot be directly applied to full-resolution DBT images due to hardware limitations: a full DBT volume has a size of approx. 60x2500x2000, which cannot fit in the memory of modern GPUs (e.g., NVIDIA Tesla V100 or RTX 2080 Ti) during training.
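The memory argument can be quantified. The sketch below computes the size of a single float32 tensor for a 60x2500x2000 volume, and for a feature map with a hypothetical 32 channels at the same resolution, as a first 3D convolution might produce; it ignores the further activations, gradients and optimizer state that training also keeps.

```python
def tensor_gb(shape, channels=1, bytes_per_value=4):
    """Size in decimal gigabytes of a dense tensor with `channels` maps of `shape`."""
    n_values = channels
    for dim in shape:
        n_values *= dim
    return n_values * bytes_per_value / 1e9


DBT_SHAPE = (60, 2500, 2000)  # slices x height x width, as given in the rebuttal
input_gb = tensor_gb(DBT_SHAPE)  # one float32 volume
feature_gb = tensor_gb(DBT_SHAPE, channels=32)  # hypothetical 32-channel 3D feature map
```

This gives 1.2 GB for the raw input and 38.4 GB for a single 32-channel full-resolution feature map, which already exceeds the 16 GB or 32 GB of a Tesla V100 and the 11 GB of an RTX 2080 Ti.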

Concerning data access (R4 and R5), we note that the BCS dataset [3] is public. The PMV datasets can be shared upon justified request and approval by the rights holders. We will mention this in the dataset paragraph of Section 3.

Concerning the bounding box cropping (R1), it is a common preprocessing technique that removes the empty background rows and columns of an image. For clarity, we will add a statement in the “Data Preparation” paragraph (Section 3) and revise Fig. 1.
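A minimal NumPy sketch of this bounding-box cropping, assuming that background pixels are exactly zero and that one box is shared by all slices of a volume; the optional `margin` parameter is hypothetical and only illustrates R1’s question about margins.

```python
import numpy as np


def crop_to_foreground(volume: np.ndarray, background: float = 0.0, margin: int = 0) -> np.ndarray:
    """Crop a (depth, height, width) volume to the bounding box of non-background pixels."""
    # A pixel position is foreground if it exceeds `background` in any slice
    mask = (volume > background).any(axis=0)  # shape (H, W)
    if not mask.any():
        return volume  # nothing to crop around
    rows = np.flatnonzero(mask.any(axis=1))  # indices of non-empty rows
    cols = np.flatnonzero(mask.any(axis=0))  # indices of non-empty columns
    height, width = mask.shape
    # Optional margin, clipped to the image borders
    r0, r1 = max(rows[0] - margin, 0), min(rows[-1] + 1 + margin, height)
    c0, c1 = max(cols[0] - margin, 0), min(cols[-1] + 1 + margin, width)
    return volume[:, r0:r1, c0:c1]


toy = np.zeros((2, 10, 8))
toy[1, 3:6, 2:5] = 1.0  # foreground only in the second slice
tight = crop_to_foreground(toy)  # shape (2, 3, 3)
padded = crop_to_foreground(toy, margin=1)  # shape (2, 5, 5)
```

Sharing one box across slices keeps the slices aligned, which any later per-pixel slab projection relies on.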

Finally, we observed no overfitting (R1) in our experiments. To prevent it, we used the data augmentation described in “Data preparation” and relied on transfer learning from a larger mammography dataset (i.e., 2000 images).

# Post-rebuttal Meta-Reviews

## Meta-review #1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes a new method for breast cancer screening using Digital Breast Tomosynthesis (DBT) slices. I agree that this is a borderline paper, but I think the proposed method may benefit the field. Therefore, I recommend “accept”.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

9

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The main objection from the reviewers is the lack of comparison with SOTA methods. The authors’ rebuttal partly answers this concern. Given that the study has high clinical significance and the method is interesting, we recommend accepting this paper.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

9

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

DBT diagnosis is a very timely topic, and a challenging one due to the large volume sizes. Instead of max-pooling over individual slices (MIL), the paper proposes to group them into “slabs” (so-called summarization) before combining them for a decision. Echoing the reviewers’ assessments, the value of this contribution is relatively incremental.
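For reference, the max-pooling MIL aggregation mentioned here can be sketched as follows, assuming each instance (a slice or a slab) has already been scored with a logit by some 2D classifier; the function name and the sigmoid output are assumptions for illustration. The index of the maximal instance is what makes the decision traceable to a position in depth.

```python
import numpy as np


def mil_max_pool(instance_logits):
    """Aggregate instance logits into a volume-level probability by max-pooling.

    Returns the volume probability and the index of the highest-scoring
    instance (slice or slab), which localizes the evidence in depth.
    """
    logits = np.asarray(instance_logits, dtype=float)
    probs = 1.0 / (1.0 + np.exp(-logits))  # per-instance sigmoid
    top = int(np.argmax(probs))
    return float(probs[top]), top


volume_prob, top_instance = mil_max_pool([-2.0, 0.0, 3.0, 1.0])
```

In this sketch, the same pooling applies whether the instances are individual slices or slabs; slabbing changes what the classifier sees, not how the instance scores are combined.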

The original comparison with the MIL baseline from the conference paper [23] (and with [7], added in the rebuttal) is also not very convincing. These baselines are motivated as the only classification-based ones in DBT, although methodologically the same restriction of learning from image-level labels occurs in a wide range of medical applications, from histopathology to MR- and CT-based radiology reports. Whether the reported 60-70% AUC is clinically acceptable is another question.

At the level of problem motivation, a diagnostic decision alone (volume classification) is probably of limited practical use in DBT, as any method would need to justify its decision to the radiologist, e.g., by showing the mass or microcalcifications behind a positive result. The clinical applicability of the given method as-is is therefore doubtful.

So, I would not champion the acceptance of this work myself, but given the timeliness of DBT, I would not be against its acceptance either.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

11