
Authors

Shivam Kalra, Mohammed Adnan, Sobhan Hemati, Taher Dehkharghanian, Shahryar Rahnamayan, Hamid R. Tizhoosh

Abstract

Deep learning methods such as convolutional neural networks (CNNs) are difficult to apply directly to whole slide images (WSIs) due to their large dimensions. We overcome this limitation by proposing a novel two-stage approach. First, we extract a set of representative patches (called a mosaic) from a WSI. Each patch of a mosaic is encoded to a feature vector using a deep network. The feature extractor model is fine-tuned using hierarchical target labels of WSIs, i.e., anatomic site and primary diagnosis. In the second stage, the set of encoded patch-level features from a WSI is used to compute the primary diagnosis probability through the proposed “Pay Attention with Focus” scheme, an attention-weighted averaging of predicted probabilities for all patches of a mosaic, modulated by a trainable focal factor. Experimental results show that the proposed model can be robust and effective for the classification of WSIs.
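The pooling scheme described in the abstract can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function and argument names (`focatt_pool`, `focal_factor`) are hypothetical, and the focal modulation is assumed here to take the form of an exponent applied to the pooled class probabilities before renormalization.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def focatt_pool(patch_probs, attn_scores, focal_factor):
    """Hedged sketch of attention-weighted pooling with a focal factor:
    average per-patch class probabilities using softmax attention weights,
    then sharpen the pooled distribution with an exponent (assumed form)."""
    weights = softmax(attn_scores)
    n_classes = len(patch_probs[0])
    # attention-weighted average of per-patch predicted probabilities
    pooled = [sum(w * p[c] for w, p in zip(weights, patch_probs))
              for c in range(n_classes)]
    # focal modulation and renormalization to a valid distribution
    focused = [p ** focal_factor for p in pooled]
    z = sum(focused)
    return [f / z for f in focused]

# toy example: 3 patches of a mosaic, 2 diagnosis classes
probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
slide_prob = focatt_pool(probs, [2.0, 0.5, -1.0], focal_factor=1.5)
```

With uniform attention scores and a focal factor of 1, this sketch reduces to simple mean pooling of the patch predictions, which makes the role of the two learned components easy to see.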

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_34

SharedIt: https://rdcu.be/cymap

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This study proposes a “Pay Attention with Focus” scheme for whole slide image classification. The method has been tested on large test sets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths are twofold: (1) the technical novelty of the FocAtt-MIL module; (2) the large dataset used for testing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are a few weaknesses: (1) In Fig. 2(b) FocAtt-MIL, the internal structures of the attention network, focal network, and prediction MLP are not provided. (2) It lacks comparisons with other WSI classification methods that use attention mechanisms.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    For reproducibility, more detailed explanations of Fig. 2(b) and sample implementation code should be provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    It is suggested that the authors polish the content and correct the grammatical errors.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Technical contribution and evaluations.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper proposed a two-stage attention MIL-based model to address the challenge of analyzing large-scale histopathological slides with slide-level labels. The model was developed on publicly available TCGA dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed model can be trained with slide-level labels and doesn’t require expensive manual annotations.
    • The model was validated on different disease domains.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Discussion of how hyperparameters were selected may be needed, given that the dataset was split into training and testing sets (i.e., no validation set).
    • Comparison with closely related MIL methods may be needed (e.g., the attention MIL method, which aggregates instance-level features for slide classification [1]).
    • Besides accuracy, average precision and AUROC may need to be reported.

    [1] Ilse M, Tomczak J, Welling M. Attention-based deep multiple instance learning. In: International Conference on Machine Learning, July 2018 (pp. 2127-2136). PMLR.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper performed experiments on the publicly available TCGA dataset and the authors agreed to provide training/validation code upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • What is the scanning magnification for input slides? Will the downsampling operation (1000 * 1000 -> 256 * 256) cause the loss of finer details, which may be important for cancer grade classification?
    • In the second experiment, did the model classify cancer grades or just types of cancer? For example, in the prostate / testis, was the model trained to predict different Gleason grades or just to differentiate prostate cancer versus testis cancer?
    • In the second experiment, the authors combined slides from multiple types of cancers. Yet, informative histopathological features could be different among some cancer types. It may be interesting to incorporate domain knowledge here and only to combine slides from cancer types that are more likely to share similar informative features (e.g., colon and prostate cancer.)
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors performed extensive experiments on different types of datasets provided by TCGA to evaluate model performance. Additional evaluation metrics (e.g., average precision) could be used besides accuracy. Also, it would be nice to include comparison results from the related attention MIL model mentioned above.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper describes a two-stage algorithm to classify WSIs. The first stage helps in selecting regions (or tiles) that are used in the second stage, where MIL is used for classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem with using traditional CNNs on WSIs is accurately identified. The proposed two-stage idea is a neat solution for identifying regions that are meaningful for classification. Training feature extractors to identify important regions, freezing them, and then using them for classification could become part of a clinical setup.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It’s unclear how the ground truth for the mosaic was obtained.
    2. How did the authors map attention regions to malignant regions? Were the malignant regions marked before the attention was obtained, and then the overlap measured? How do we know whether the attention map is sensitive (covers all malignant regions) or specific (all mapped regions are malignant)? There should be a metric to measure this aspect.
    3. The method described in this paper looks similar to Ianni et al., “Tailored for Real-World: A Whole Slide Image Classification System Validated on Uncurated Multi-Site Data Emulating the Prospective Pathology Workload.” Sci Rep 10, 3217 (2020). https://doi.org/10.1038/s41598-020-59985-2
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    With more details on the training phase hyperparameters and ground truth details, this method can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Kindly answer the questions from weaknesses section.

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The attention maps need more validation.
    2. The paper cited above has a very similar methodology. The authors should draw parallels and state the improvements of their proposal over it.
  • What is the ranking of this paper in your review stack?

    5

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposed a two-stage attention MIL-based model to address the challenge of analyzing large-scale histopathological slides with slide-level labels. The first stage helps in selecting regions (or tiles) that are used in the second stage, where MIL is used for classification. The idea is somewhat novel and mimics the clinical reading procedure. The model was validated on a large dataset. The rebuttal should address the following issues: 1) comparison with other MIL methods that use attention mechanisms; 2) clarification of technical details and the experimental setup.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9




Author Feedback

We thank the reviewers for their time and valuable comments. We are glad that R1 and R2 found our approach to be technically novel and comprehensively validated. As mentioned by R2, our approach does not rely on expensive and laborious manual delineation by experts, which allows us to take advantage of learning from large publicly available datasets.

*R1, R2, R3: Implementation Details. Aligning with the MICCAI’s vision of reproducible research, we will release the source codes and implementation details of our models publicly.

*R1, R2: Comparison with other Attention-based Methods. We compared our approach with Kalra & Adnan et al. [11], which uses an attention-based mechanism, in Table 1; we achieved 3% higher accuracy. The main novelties of our method are the focal factor and global context learning, along with the attention-weighted pooling, which result in a 6% improvement in accuracy (Section 4, LUAD vs. LUSC). Global context learning alone improves the accuracy by 4%. We shall extend Table 1 with these details.

*R2: “What is the scanning magnification for input slides? Will the down-sampling operation … cause the loss of finer details”. For the LUAD/LUSC dataset, we have mixed 20x and 40x slides, and for the second experiment, we have used 40x slides only. We did not down-sample patches for the first experiment. However, for the second experiment, we down-sampled patches to 256 x 256, which corresponds to 5x magnification. Based on [1], computer-aided diagnosis prediction can be reliably conducted at 5x and performs on par with (or better than) 20x magnification. We shall add this discussion to the final camera-ready paper.

[1] Coudray, Nicolas, et al. “Classification and mutation prediction from non–small cell lung cancer histopathology images using deep learning.” Nature medicine 24.10 (2018): 1559-1567.

*R3: It’s unclear how the ground truth for the mosaic was obtained. The ground truth for a mosaic is the same as the label of the associated WSI. We will highlight this in the final camera-ready version.

*R3: How did the authors map attention regions to malignant regions? We do not have the information about the malignant regions of a WSI. The proposed method learns to pay attention to patches that enable more accurate overall prediction. These automatically learned “attention” values are then mapped onto the WSIs for visualization purposes (Figure 3). We showed these visualizations to a medical expert for interpretation (Section 4, LUAD vs LUSC classification). We will add another visual inspection of a pathologist to the camera-ready version.

*R3: How do we know if the attention map is sensitive (covers all malignant regions) or specific (all mapped regions are malignant). Our work is premised upon the unavailability of regional annotations for WSIs. To the best of our knowledge, there are no large public datasets with delineated malignancy regions. The unavailability of such datasets makes attention validation difficult; we validated the attention maps by showing them to an expert. We shall clarify this in the final camera-ready version of the paper.

*R3: Similarities and Differences to “Ianni et al. Tailored for Real-World: A Whole Slide Image Classification System Validated on Uncurated Multi-Site Data Emulating the Prospective Pathology Workload.” Thank you for bringing another WSI classification research work to our attention. However, there are many differences between their approach and ours: i) We use a Deep Set architecture to compute a global context of a WSI, which is used for computing the attention and the focal factor. ii) We utilize the hierarchical labels of WSIs to extract more robust features. iii) We pool the attention-weighted predictions, modulated by the focal factor, as the final prediction. The main technical contributions of our approach are different from, or absent in, the mentioned paper. We shall discuss their method in our literature review section in the final camera-ready paper.
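The Deep Set idea mentioned in point (i) above is to embed each patch feature with a shared map, sum the embeddings (a symmetric, order-independent operation), and transform the sum into a slide-level context vector. The sketch below only illustrates that permutation-invariance property: the function name `deep_set_context`, the weight shapes, and the hand-written linear algebra are all illustrative assumptions, not the paper's actual architecture.

```python
def relu(v):
    return [max(0.0, x) for x in v]

def linear(x, W, b):
    # y_i = sum_j W[i][j] * x[j] + b[i]
    return [sum(wij * xj for wij, xj in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def deep_set_context(patch_feats, W_phi, b_phi, W_rho, b_rho):
    """Embed each patch feature with a shared map phi, sum the embeddings
    (order-independent), then apply rho to get a slide-level context."""
    embedded = [relu(linear(f, W_phi, b_phi)) for f in patch_feats]
    pooled = [sum(e[i] for e in embedded) for i in range(len(b_phi))]
    return linear(pooled, W_rho, b_rho)
```

Because the sum over patch embeddings is symmetric, the resulting context is invariant to the order of the patches, which is the property that makes Deep Sets suitable for set-valued inputs such as a WSI mosaic.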




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper describes a two-stage algorithm to classify WSIs. The first stage helps in selecting regions (or tiles) that are used in the second stage where a MIL is used for classification. The method has certain novelty and achieves good performance on a large pan-cancer data set. The rebuttal sufficiently addresses reviewers’ concerns.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a two-stage architecture for WSI classification, in which the first stage selects the regions and MIL is applied to those regions for classification in the second stage. The proposed model works with slide-level labels, so it does not require additional annotation, and the motivation is reasonable. More information about the experiments and related work may be helpful. The authors have addressed most of the reviewers’ comments in the rebuttal appropriately.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think the method itself is moderately novel (although other methods can also obtain attention), and I really like the extensive evaluation of the method on pan-cancer TCGA data. The authors’ rebuttal clarifies a few methodological details. In addition, the authors promise to release the code, which could potentially be used for many analyses based on TCGA data; therefore, I support accepting the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7


