
Authors

Pengshuai Yang, Zhiwei Hong, Xiaoxu Yin, Chengzhan Zhu, Rui Jiang

Abstract

Self-supervised learning provides a possible solution for extracting effective visual representations from unlabeled histopathological images. However, existing methods either fail to make good use of domain-specific knowledge, or rely on side information like spatial proximity and magnification. In this paper, we propose CS-CO, a hybrid self-supervised visual representation learning method tailored for histopathological images, which integrates advantages of both generative and discriminative models. The proposed method consists of two self-supervised learning stages: cross-stain prediction (CS) and contrastive learning (CO), both of which are designed based on domain-specific knowledge and do not require side information. A novel data augmentation approach, stain vector perturbation, is specifically proposed to serve contrastive learning. Experimental results on the public dataset NCT-CRC-HE-100K demonstrate the superiority of the proposed method for histopathological image visual representation. Under the common linear evaluation protocol, our method achieves 0.915 eight-class classification accuracy with only 1,000 labeled samples, which is about 1.3% higher than the fully supervised ResNet18 classifier trained on all 89,434 labeled training samples. Our code is available at https://github.com/easonyang1996/CS-CO.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_5

SharedIt: https://rdcu.be/cyl1v

Link to the code repository

https://github.com/easonyang1996/CS-CO

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

Firstly, the authors design a new pretext task, i.e. cross-stain prediction, for self-supervised learning, aiming to make good use of the domain-specific knowledge of histopathological images. Secondly, they propose a new data augmentation approach, i.e. stain vector perturbation, to serve histopathological image contrastive learning. Finally, they integrate the advantages of generative and discriminative approaches and build a hybrid self-supervised visual representation learning framework for histopathological images.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper proposes the CS-CO method for histopathological images, which is divided into two self-supervised learning stages. In the first stage, a pretext task tailored to histopathological images, named cross-stain prediction, is proposed; the second stage uses stain vector perturbation for data augmentation and performs contrastive learning.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

(1) The figure and the text description are inconsistent, and the description of parts (c) and (d) of Fig. 1 is insufficient. (2) The effect of stain vector perturbation visualized in Fig. 2 is not obvious, so the figure adds little to the paper.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

No source code is provided, but the dataset and implementation details are described clearly, so the results should be reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

(1) The implementation details of the second stage, i.e., contrastive learning, are not clear. (2) The motivation for stain vector perturbation is insufficient. Why is this new data augmentation approach more suitable for histopathological images? Its novelty should be highlighted more clearly. (3) The H and E channel images have different content. Why can cross-prediction be carried out? What is the basis for this?
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Novel self-supervised learning method and good experimental results
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

This paper presents a novel framework for self-supervised visual representation learning for histopathological images. The novelty lies in an architecture based on two self-supervised learning stages: the first learns visual representations through the cross-stain prediction task, and the second is based on contrastive learning and further trains the encoders for visual representation learning. Furthermore, a new data augmentation approach named stain vector perturbation is introduced to serve contrastive learning.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The main strength of this paper is its fully self-supervised computational pipeline for visual representation learning on histological images. In addition to the contrastive learning stage, an important contribution is the cross-stain prediction task, which learns hematoxylin-to-eosin and eosin-to-hematoxylin encoders from stain-separated images. The learned encoders are used to initialize the corresponding ones in the contrastive learning stage. A further strength is the novel data augmentation method, coined stain vector perturbation, which is based on the stain separation error from the first stage. An ablation study shows that the proposed data augmentation plays a crucial role in achieving state-of-the-art performance on a public colorectal carcinoma dataset. That brings us to the third strength of this paper: the achieved eight-class prediction accuracy is higher than that of a fully supervised ResNet18 model.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

I did not see any critical weakness in this paper. However, the description of the contrastive learning process in Section 2.4 is inconsistent with the corresponding stage in Figure 1. Specifically, there is no predictor q-prime in Figure 1. Also, in part (d) of Figure 1, both the learned hematoxylin-to-eosin encoder and the learned eosin-to-hematoxylin encoder are used for the final visual representation. However, is the eosin-to-hematoxylin encoder learned only in the cross-stain prediction (pretext) stage?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Based on the reported information, all reproducibility requirements are met.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The description of the contrastive learning process in Section 2.4 is inconsistent with the corresponding stage in Figure 1. Specifically, there is no predictor q-prime in Figure 1. Also, in part (d) of Figure 1, both the learned hematoxylin-to-eosin encoder and the learned eosin-to-hematoxylin encoder are used for the final visual representation. However, is the eosin-to-hematoxylin encoder learned only in the cross-stain prediction (pretext) stage?

In future work, the authors should perform a statistical significance analysis of the results obtained by their method. It appears to me that in Eq. (7), one regularization constant is enough to account for the relative importance of the two terms in the total loss.
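The reviewer's point about Eq. (7) can be made concrete under one assumption: the total loss is a positively weighted sum of two terms. Eq. (7) itself is not reproduced on this page, so the placeholder names L_1 and L_2 below are hypothetical. Factoring out the first weight shows that only the ratio of the two constants changes what is optimized:

```latex
\begin{aligned}
\mathcal{L} &= \lambda_1 \mathcal{L}_1 + \lambda_2 \mathcal{L}_2
             = \lambda_1\left(\mathcal{L}_1 + \lambda\,\mathcal{L}_2\right),
             \qquad \lambda = \lambda_2/\lambda_1,\ \lambda_1 > 0, \\
\arg\min_\theta \mathcal{L} &= \arg\min_\theta \left(\mathcal{L}_1 + \lambda\,\mathcal{L}_2\right), \\
\theta_{t+1} &= \theta_t - \eta\,\nabla_\theta \mathcal{L}
              = \theta_t - (\eta\lambda_1)\,\nabla_\theta\left(\mathcal{L}_1 + \lambda\,\mathcal{L}_2\right).
\end{aligned}
```

Under plain SGD the leftover factor λ1 is absorbed into the learning rate, so a single constant λ suffices.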
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The main strength of this paper is its fully self-supervised computational pipeline for visual representation learning on histological images. In addition to the contrastive learning stage, an important contribution is the cross-stain prediction task, which learns hematoxylin-to-eosin and eosin-to-hematoxylin encoders from stain-separated images. The learned encoders are used to initialize the corresponding ones in the contrastive learning stage. A further strength is the novel data augmentation method, coined stain vector perturbation, which is based on the stain separation error from the first stage. An ablation study shows that the proposed data augmentation plays a crucial role in achieving state-of-the-art performance on a public colorectal carcinoma dataset. That brings us to the third strength of this paper: the achieved eight-class prediction accuracy is higher than that of a fully supervised ResNet18 model.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

Color deconvolution is used to generate H and E images for an auto-encoder. A stain vector perturbation is proposed as data augmentation for contrastive learning. A hybrid system is built for contrastive learning in histopathology. A linear classifier trained on only 1,000 labeled cases outperforms the fully supervised method trained on the full labeled training set. An ablation study shows the impact of the different components.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The contrastive learning framework in Fig. 1(c) is the most innovative part of this paper. A frozen HE decoder is used to avoid model collapse. Two encoders extract representations from the H and E images respectively, exploiting the domain knowledge of pathological images. The ablation study is a strong part of the evaluation: it shows the impact of each component of the method, which matters because the design contains many different elements.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

It is somewhat unusual that ResNet18 is used as the backbone in Table 1 without a clear explanation. In the BYOL and SimSiam papers, ResNet50 is typically used for real-image datasets (whereas ResNet18 is used for CIFAR-10). The rationale for using the normalized L2 distance as the contrastive loss is not clear. It is also not clear what data augmentations were used to train the fully supervised ResNet18 baseline.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Good
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

ResNet50 may be a more reasonable backbone. More training details, such as the number of epochs and the GPU resources used, would help readers reproduce the work.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Good performance, inspiring idea
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper presents a novel framework for self-supervised learning on histopathological images, based on two stages: i) a pretext cross-stain prediction task, and ii) contrastive learning using a new data augmentation approach named stain vector perturbation. All reviewers acknowledge the novelty of the paper and recommend acceptance. Still, the authors could further improve their manuscript by incorporating the reviewers' comments. In addition to these comments, I am concerned that the relatively slow speed of sparse NMF stain decomposition may affect contrastive training, and the authors could discuss this briefly. It would also be worth conducting an ablation study to show which component (cross-stain prediction or contrastive learning) contributes more to the performance gain. Finally, the authors could consider releasing their code to promote reusability.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

Response to Reviewer #1

We thank the reviewer for the valuable feedback. We will revise Fig. 1 and the corresponding description according to the comments. We included Fig. 2 in the manuscript to help readers understand the effect of stain vector perturbation (SVP). The proposed SVP perturbs the estimated stain vector matrix, which changes the results of stain separation. How much the perturbed separation differs from the original one is controlled by sigma (the standard deviation of the normal perturbation). As for the basis of cross-stain prediction, we assume that circular blank areas in the E channel image are likely to be nuclei, and that blank areas between nuclei in the H channel image are likely to be extracellular matrix or cytoplasm. Therefore, although the H and E channel images have different content, cross-stain prediction can still be carried out. SVP is designed specifically for histopathological images based on domain-specific knowledge. It introduces variance into the stain-separated single-channel images, which suits contrastive learning. In addition, SVP makes the encoder robust to errors in stain vector estimation when extracting visual representations. Because of the anonymity requirement, we did not provide the source code with the manuscript. We will release the source code soon.
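Stain vector perturbation, as described in the response (disturb the estimated stain vector matrix with normal noise of standard deviation sigma, then redo stain separation), can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: a fixed reference H&E matrix (the Ruifrok and Johnston values) stands in for the per-image sparse NMF estimate, the noise is assumed additive per entry followed by row renormalization, and concentrations come from plain least squares. All function names are hypothetical.

```python
import numpy as np

# Reference H&E stain vectors (Ruifrok & Johnston), one row per stain in OD space.
# Assumption: stands in for the per-image sparse NMF estimate used in the paper.
REF_STAINS = np.array([[0.65, 0.70, 0.29],   # hematoxylin
                       [0.07, 0.99, 0.11]])  # eosin


def rgb_to_od(img):
    """Convert an RGB image (H, W, 3) in [0, 255] to optical density, shape (H*W, 3)."""
    pixels = img.reshape(-1, 3).astype(np.float64)
    return -np.log((pixels + 1.0) / 256.0)


def normalize_rows(m):
    """Scale every stain vector (row) to unit L2 norm."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def separate_stains(img, stains):
    """Least-squares stain concentrations (H*W, 2) for a (2, 3) stain matrix."""
    od = rgb_to_od(img)
    # Solve od ≈ conc @ stains for conc, one column of od.T per pixel.
    conc, *_ = np.linalg.lstsq(stains.T, od.T, rcond=None)
    return np.clip(conc.T, 0.0, None)  # concentrations cannot be negative


def perturb_stains(stains, sigma, rng):
    """Hypothetical SVP: add N(0, sigma^2) noise to each entry, keep it positive, renormalize."""
    noisy = stains + rng.normal(0.0, sigma, size=stains.shape)
    return normalize_rows(np.clip(noisy, 1e-6, None))


def svp_views(img, sigma=0.05, seed=0):
    """Return H and E concentration maps computed with a perturbed stain matrix."""
    rng = np.random.default_rng(seed)
    stains = perturb_stains(normalize_rows(REF_STAINS), sigma, rng)
    conc = separate_stains(img, stains)
    h, w = img.shape[:2]
    return conc[:, 0].reshape(h, w), conc[:, 1].reshape(h, w)
```

Calling `svp_views` on the same tile with different seeds yields different H/E channel pairs, and a larger `sigma` makes them diverge further from the unperturbed separation, matching the role of sigma described above.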

Response to Reviewer #3

We thank the reviewer for the valuable feedback. We will revise Fig. 1 and the corresponding description according to the comments. For Eq. (7), we will use only one lambda in the camera-ready version. In our manuscript, q-prime is not a predictor: we denote the projector as ‘f’ and the predictor as ‘g’; z and z-prime are the outputs of ‘f’, and q and q-prime are the outputs of ‘g’. We train both the H2E encoder and the E2H encoder in the cross-stain prediction stage. For simplicity, we denote the combination of the H2E and E2H encoders as the HE encoder ϕ in our manuscript. ϕ is also trained in the contrastive learning stage. After the two training stages, the visual representations are extracted using ϕ.
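The naming in this response (encoder ϕ, projector f producing z and z-prime, predictor g producing q and q-prime) can be illustrated with a small forward-pass sketch. It assumes a SimSiam-style symmetric loss, in which each view's predictor output is compared with the other view's projector output, and uses the normalized L2 distance the authors describe. The toy linear maps standing in for ϕ, f and g are hypothetical, and plain NumPy computes only the loss value, not gradients.

```python
import numpy as np


def normalized_l2(q, z, eps=1e-12):
    """Squared L2 distance between row-wise L2-normalized vectors (BYOL-style loss)."""
    qn = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    zn = z / (np.linalg.norm(z, axis=-1, keepdims=True) + eps)
    return np.sum((qn - zn) ** 2, axis=-1)


def symmetric_loss(phi, f, g, x1, x2):
    """Two views -> encoder phi -> projector f (z, z') -> predictor g (q, q')."""
    z, z_prime = f(phi(x1)), f(phi(x2))
    q, q_prime = g(z), g(z_prime)
    # Each prediction is matched to the *other* view's projection. In an autograd
    # framework, z and z_prime would be detached here (stop-gradient, as in SimSiam).
    return 0.5 * float(np.mean(normalized_l2(q, z_prime) + normalized_l2(q_prime, z)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-ins for the HE encoder phi, projector f and predictor g.
    w_phi, w_f, w_g = (rng.normal(size=s) for s in [(16, 8), (8, 4), (4, 4)])
    phi = lambda x: np.tanh(x @ w_phi)
    f = lambda h: h @ w_f
    g = lambda z: z @ w_g
    x1 = rng.normal(size=(5, 16))
    x2 = x1 + 0.01 * rng.normal(size=(5, 16))  # a slightly different second view
    print(symmetric_loss(phi, f, g, x1, x2))
```

Because z and z-prime are projector outputs while q and q-prime come from the predictor, q-prime is a prediction rather than a separate network, which is the distinction the response draws.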

Response to Reviewer #4

We thank the reviewer for the valuable feedback.
We chose ResNet18 as the backbone because of its low computational cost. ResNet50 also works under the proposed self-supervised learning scheme. The contrastive loss is the same as in BYOL, and it effectively evaluates the cosine distance. The ICML 2020 paper “Understanding contrastive representation learning through alignment and uniformity on the hypersphere” may provide further explanation. For the fully supervised ResNet18, we did not apply data augmentation during training. The total number of epochs is set to 100, with early stopping used to avoid overfitting. The model was trained on an RTX 3090 GPU. We will release the source code to facilitate reproduction.
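The claim that the BYOL loss evaluates the cosine distance follows from expanding the squared norm. For nonzero vectors q and z, with hats denoting L2 normalization:

```latex
\left\| \hat{q} - \hat{z} \right\|_2^2
  = \|\hat{q}\|_2^2 + \|\hat{z}\|_2^2 - 2\,\langle \hat{q}, \hat{z} \rangle
  = 2 - 2\,\frac{\langle q, z \rangle}{\|q\|_2\,\|z\|_2},
\qquad \hat{q} = \frac{q}{\|q\|_2},\quad \hat{z} = \frac{z}{\|z\|_2}.
```

Minimizing the normalized L2 distance is therefore the same as maximizing the cosine similarity between q and z; the loss ranges from 0 (same direction) to 4 (opposite directions).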

Response to Meta-Reviews

We thank the reviewer for the valuable feedback, and we will revise the manuscript according to the reviewers’ comments. Sparse NMF is indeed somewhat slow, but acceptable. In practice, users could perform stain vector perturbation once and save the results locally to accelerate contrastive learning. We have carried out an ablation study in the manuscript. If only contrastive learning is performed, the proposed method is equivalent to SimSiam, so we did not report that setting in the ablation study section. Because of the anonymity requirement, we did not provide the source code with the manuscript. We will release the source code soon.
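The caching suggestion can be sketched as follows: run the slow step offline once per tile, store several perturbed (H, E) pairs, and sample two of them at training time. `separate_fn` is a hypothetical placeholder for a sparse-NMF-based SVP routine, and the number of cached views is an assumption; caching a fixed set of views trades some augmentation diversity for speed.

```python
import numpy as np


def cache_svp_views(img, out_path, n_views, sigma, separate_fn, seed=0):
    """Precompute n_views perturbed (H, E) pairs for one tile and store them in a .npz file.

    separate_fn(img, sigma, rng) -> (h, e) is a hypothetical SVP routine; the expensive
    stain estimation runs only here, offline, never inside the training loop.
    """
    rng = np.random.default_rng(seed)
    pairs = [separate_fn(img, sigma, rng) for _ in range(n_views)]
    h = np.stack([p[0] for p in pairs])
    e = np.stack([p[1] for p in pairs])
    np.savez_compressed(out_path, h=h, e=e)


def load_two_views(path, rng):
    """At training time, draw two distinct cached views instead of re-running SVP."""
    with np.load(path) as data:
        h, e = data["h"], data["e"]
    i, j = rng.choice(len(h), size=2, replace=False)
    return (h[i], e[i]), (h[j], e[j])
```

With enough cached views per tile, the two sampled views still differ from epoch to epoch, while the training loop only reads arrays from disk.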


Self-supervised visual representation learning for histopathological images