
Authors

Xiyue Wang, Sen Yang, Jun Zhang, Minghui Wang, Jing Zhang, Junzhou Huang, Wei Yang, Xiao Han

Abstract

A large-scale labeled dataset is a key factor for the success of supervised deep learning in histopathological image analysis. However, exhaustive annotation requires a careful visual inspection by pathologists, which is extremely time-consuming and labor-intensive. Self-supervised learning (SSL) can alleviate this issue by pre-training models under the supervision of data itself, which generalizes well to various downstream tasks with limited annotations. In this work, we propose a hybrid model (TransPath) which is pre-trained in an SSL manner on massively unlabeled histopathological images to discover the inherent image property and capture domain-specific feature embedding. The TransPath can serve as a collaborative local-global feature extractor, which is designed by combining a convolutional neural network (CNN) and a modified transformer architecture. We propose a token-aggregating and excitation (TAE) module which is placed behind the self-attention of the transformer encoder for capturing more global information. We evaluate the performance of pre-trained TransPath by fine-tuning it on three downstream histopathological image classification tasks. Our experimental results indicate that TransPath outperforms state-of-the-art vision transformer networks, and the visual representations generated by SSL on domain-relevant histopathological images are more transferable than the supervised baseline on ImageNet. Our code and pre-trained models will be available at https://github.com/Xiyue-Wang/TransPath.
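For readers who want a concrete picture of the architecture described above, the following is a minimal, hypothetical PyTorch sketch of a CNN + transformer hybrid with a TAE-style (token-aggregating and excitation) block placed after self-attention. The module names, dimensions, and TAE internals are illustrative assumptions, not the authors' exact design.

# Hypothetical sketch of a TransPath-like hybrid model; all details are assumptions.
import torch.nn as nn
import torchvision


class TAE(nn.Module):
    """Aggregate all tokens into a global descriptor and re-weight each token
    (a squeeze-and-excitation-style gate along the embedding dimension)."""
    def __init__(self, dim, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(dim, dim // reduction), nn.ReLU(inplace=True),
            nn.Linear(dim // reduction, dim), nn.Sigmoid())

    def forward(self, x):            # x: (batch, tokens, dim)
        g = x.mean(dim=1)            # token aggregation -> (batch, dim)
        w = self.fc(g).unsqueeze(1)  # excitation weights -> (batch, 1, dim)
        return x * w                 # re-scale every token with global context


class HybridBlock(nn.Module):
    """Transformer encoder block with the TAE module behind self-attention."""
    def __init__(self, dim=384, heads=6):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.tae = TAE(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x):
        h = self.norm1(x)
        h, _ = self.attn(h, h, h)
        x = x + self.tae(h)          # TAE placed after self-attention
        return x + self.mlp(self.norm2(x))


class TransPathLike(nn.Module):
    """CNN backbone extracts local features; transformer blocks add global context."""
    def __init__(self, dim=384, depth=2, num_classes=2):
        super().__init__()
        cnn = torchvision.models.resnet50(weights=None)
        self.backbone = nn.Sequential(*list(cnn.children())[:-2])  # (B, 2048, H', W')
        self.proj = nn.Conv2d(2048, dim, kernel_size=1)
        self.blocks = nn.Sequential(*[HybridBlock(dim) for _ in range(depth)])
        self.head = nn.Linear(dim, num_classes)

    def forward(self, x):
        f = self.proj(self.backbone(x))        # local CNN features
        tokens = f.flatten(2).transpose(1, 2)  # (B, H'*W', dim) token sequence
        tokens = self.blocks(tokens)           # global mixing with TAE
        return self.head(tokens.mean(dim=1))   # mean-pool tokens for classification

In the SSL setting described in the paper, such a network would serve as the backbone of a BYOL-style pre-training pipeline, with the classification head replaced by projection/prediction heads during pre-training and reattached for fine-tuning on downstream tasks.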

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_18

SharedIt: https://rdcu.be/cyl9X

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a self-supervised learning approach for pretraining on histopathology images. To this end, a CNN-transformer architecture is designed to exploit both local and global receptive fields and extract discriminative features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Leveraging large-scale histopathology image for pretraining is well-motivated
    • Using less annotated data for machine learning is an important topic in medical image analysis
    • The results demonstrate the effectiveness of the proposed method
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The detailed evaluation process is not described.

    Lack of novelty.

    Lacks comparison with existing pretraining methods.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    As the data and code will be open sourced, this work is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • The detailed evaluation process is not described. The aim of pretraining is to learn better representations so that less annotated data is required. A more detailed experiment is required to show how much annotated data is needed for downstream tasks after pretraining.

    • Lack of novelty. Much of this paper is just a combination of existing approaches from computer vision.

    • Lacks comparison with existing pretraining methods.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses the important topic of learning better representations with less annotated data. However, it lacks novelty, and more detailed experiments are required.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper presents a hybrid framework by combining self-supervised pretrained CNN with a modified transformer architecture for histology image classification. The proposed method was evaluated on multiple datasets and demonstrated with improvements over CNN and combined CNN with multi-head self-attention based transformer methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Application of transformers is relatively new to the histopathology domain.
    • The method was tested on multiple datasets.
    • It shows a good improvement over CNN and CNN+Trans methods by a large margin.
    • Pre-trained on a very large dataset, which is interesting and also challenging.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • The novelty of this submission is limited; it is an application of the existing works BYOL and transformers.
    • The paper does not compare with state-of-the-art work in self-supervised learning, e.g., SimCLR and MoCo.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I carefully assessed the sensibility of the experiments, and the results seem convincing.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    • I am curious to know the performance of the method when fine-tuned on 1%, 10%, and 50% of the labeled data on the downstream tasks. Since the method has been pretrained on a very large database, it would be really interesting to know how it performs under limited-label settings.
    • In Table 1, it would be interesting to know the performance of CNN+Trans+SSL without the TAE module; in this way, we could see whether the TAE module or the pre-training on a larger dataset has led to the performance gains.
    • Most contrastive learning methods (such as SimCLR and BYOL, refs 2 and 5) are highly sensitive to data augmentations. I am curious to know why the authors used data augmentation strategies similar to SimCLR's, which are tailored for natural images. There are well-established augmentation methods [1] that have been shown to reduce variation under domain shift and improve out-of-distribution (OOD) generalization, which could be future work in this direction.

    [1] Tellez, David, et al. “Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology.” Medical image analysis 58 (2019): 101544.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present an exciting application of transformers to the histopathology domain. Furthermore, through extensive validation on large datasets, they showed substantial improvements in image classification benchmarks.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The paper adopts a self-supervised deep learning strategy with transformers, adding a module called token-aggregating and excitation (TAE) after self-attention to capture both local and global patterns of pathology slides. The authors trained the network on TCGA data, applied it to three datasets, and the results were promising. The huge computational resources (3.2k V100 hours) are the backbone of this work.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Training a model on a well-known and large public dataset and showing the strength of the work by applying it to 3 public datasets.
    • Adding TAE to the transformer for better global feature extraction. This simple mechanism improved the network’s accuracy considerably.
    • Conducting an ablation study to show the power of the architecture design. This study helps the reader to understand the role of each component.
    • Comparing with state-of-the-art models, and showing better results on three public datasets.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • TCGA consists of 22k frozen tissue slides and 11k FFPE (diagnostic) tissue slides. Although both frozen and FFPE slides can come from the same case (e.g., LUAC), the patterns are totally different. Removing the frozen slides might be a better idea.
    • I would like to see the state-of-the-art numbers for each dataset (even if specialized methods achieve better results). It is acceptable for a method with generalization power to provide slightly lower accuracy.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    They stated that the code will be shared. This means that even if the work misses reporting some parameters, which is the case, readers could find them in the shared code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • “The cropped histopathological image patches are usually large to capture both the cell-level structure and tissue-level context”: “large” is not the proper word here. Many studies work with 64x64 patches, which are small.
    • “total of 32,529 WSIs from the cancer genome atlas (TCGA)”: separate frozen tissue from FFPE.
    • BN in Figure 1 is not introduced in the text.
    • The Conclusion could be expanded a little more to provide more details on the methods.
  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is strong. The code and results will be helpful for other researchers in the community.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All reviewers acknowledge that the paper addresses an interesting topic in medical image analysis and recognize the benefits of the proposed method. While the technical novelty of this paper is still limited, the application of Transformers coupled with self-supervised learning for histopathological image classification is new, and the results clearly show the effectiveness of the proposed method. Therefore, the paper can be interesting for the MICCAI community. Please address all the comments raised by the reviewers to further improve the paper.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

We would like to thank the reviewers for their time spent on reviewing our manuscript and the overall very positive ratings. We also appreciate the insightful comments from all the reviewers, which are very helpful to further improve our manuscript and to plan our future work. Our responses to the major comments are provided below. Citations are numbered in the same order as in the final manuscript.

General Responses (R2/R3) The technical novelty of this paper is still limited. We would like to point out that our model is not just a simple application of the existing works. Although our main framework is a combination of CNN, transformer, and BYOL, we have added a customized TAE module to extract global information. The designed TAE mechanism, even though simple, has improved the network performance by a large margin (cf. ablation study). Moreover, as summarized in our contributions, our model has been pre-trained on very large public datasets (approximately 2.7 million images with the size of 2048×2048 pixels), evaluated on other three public datasets, and has achieved superior performance compared with existing vision transformer networks (cf. Fig.2). Our pre-trained and to-be-released TransPath model learns histopathology-specific feature representations from a large set of histopathological images, which has the potential to be transferred to any histopathological image analysis tasks, which we also deem to be an impactful contribution.

(R2/R3) Lacks comparison with existing pretraining methods. We have tried using MoCo as the pre-training method, which produced results similar to those of BYOL. We did not discuss it in the manuscript due to space limitations. We will include a more thorough comparison in a future extended version of the paper.

(R2/R3) Performance of the method when fine-tuned on different amounts of labeled data. Self-supervised pre-training with a large amount of data makes it possible to achieve good performance in downstream tasks with fewer annotations [2]. In our experiments, the MHIST dataset is about 1/30 the size of the other two datasets, and using the pre-trained model provided a larger performance improvement for MHIST than for the other two (cf. Table 1 and Table 3). These results indirectly demonstrate the effect the reviewers asked about. We will add more experimental results in the extended version of our manuscript.

Reviewer#3  See whether the TAE module or pre-training leads to performance gains? There might be a misunderstanding. The performance gains offered by TAE can be seen by comparing the results of CNN+Trans with those of CNN+Trans+TAE. The extra benefits of pre-training can be verified by comparing the results of CNN+Trans+TAE and CNN+Trans+TAE+SSL (cf. Table 1).

Data augmentation strategies. Although the applied data augmentation strategies (random crop, Gaussian blur, and color distortion) were originally designed for natural images, they are also applicable to pathological images. In future work, we will include comparisons with other data augmentation methods tailored for histopathological images, such as those proposed in the literature [17].
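To make the strategies named above concrete, a small torchvision sketch of such a SimCLR-style pipeline is given below; the crop size, jitter strengths, and blur kernel are illustrative assumptions rather than the exact settings used for TransPath.

# Illustrative SimCLR-style augmentation pipeline; all parameter values are assumptions.
from torchvision import transforms

ssl_augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.2, 1.0)),      # random crop
    transforms.RandomHorizontalFlip(),
    transforms.RandomApply(
        [transforms.ColorJitter(brightness=0.4, contrast=0.4,
                                saturation=0.4, hue=0.1)], p=0.8),  # color distortion
    transforms.RandomGrayscale(p=0.2),
    transforms.GaussianBlur(kernel_size=23, sigma=(0.1, 2.0)),      # Gaussian blur
    transforms.ToTensor(),
])

# Each image is augmented twice to form the two views used by BYOL/SimCLR-style SSL:
# view1, view2 = ssl_augment(img), ssl_augment(img)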

Reviewer#4 Separate frozen tissue from FFPE. We agree that this strategy might improve the performance. On the other hand, the deep learning model has a sufficient number of parameters. In addition, both frozen and FFPE slides are histopathology images, which may help train a more robust model. We will compare the performance of separating frozen tissue slides from FFPE ones in our future work.

The Conclusion could be expanded. We have expanded the Conclusion section by including a little more details on the methods and our future work.

Minor errors. We appreciate the careful review and the suggestions. We have changed “large” to “enough” and introduced BN in the caption of Fig. 1.


