
Authors

Ziwang Huang, Hua Chai, Ruoqi Wang, Haitao Wang, Yuedong Yang, Hejun Wu

Abstract

Survival prediction using whole slide images (WSIs) can provide guidance for better treatment of diseases and patient care. Previous methods usually extract and process only image features from patches of WSIs. However, they ignore the significant role of spatial information of patches and the correlation between the patches of WSIs. Furthermore, those methods extract the patch features through the model pre-trained on ImageNet, overlooking the huge gap between WSIs and natural images. Therefore, we propose a new method, called SeTranSurv, for survival prediction. SeTranSurv extracts patch features from WSIs through self-supervised learning and adaptively aggregates these features according to their spatial information and correlation between patches using the Transformer. Experiments on three large cancer datasets indicate the effectiveness of our model. More importantly, SeTranSurv has better interpretability in locating important patterns and features that contribute to accurate cancer survival prediction.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_54

SharedIt: https://rdcu.be/cymbg

Link to the code repository

N/A

Link to the dataset(s)

https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors presented a WSI level survival prediction method (SeTranSurv) which is different from existing methods in two ways. First, they used an existing self-supervised learning based feature extractor (SimCLR) to extract patch feature instead of using pre-trained network features. Second, they employ a Transformer to incorporate the spatial relationship of patches in a WSI for survival prediction. Their proposed method achieves a better C-Index score as compared to three existing survival prediction methods on three TCGA cohorts.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Authors show superiority of the recently published self-supervision network (SimCLR) for histology image analysis over ImageNet based pre-trained features.

    They also address the issue of spatial relationship loss in existing survival analysis methods by fusing spatial coordinates in Transformer features.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors randomly selected many patches to train the SimCLR model. However, it is not clear whether they extracted the same number of patches from all WSIs or followed some criterion to select a specific number of patches per WSI.

    Again, the authors used N=600 patches for training the Transformer network but did not explain how those patches were selected.

    The motivation behind using three specific cohorts from the 14+ TCGA cohorts is missing. Why these three? Why not the top three TCGA cohorts with the highest number of patients, or the highest number of uncensored patients?

    The single-fold experiment does not reflect the impact of the random sampling of patches on the final results. The authors should report multi-fold (e.g., 3- or 5-fold) cross-validation results to show that random patch sampling does not affect model performance.

    It is not clear how the authors aggregate slide-level results into patient-level results.

    The authors should report the p-value of each experiment in Table 2 to indicate whether the results are statistically significant.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used two existing methods as the backbone of their proposed method. Therefore, one can reproduce the proposed method using the existing methods with default parameters. However, the authors have not given enough information about patient selection in the three cohorts, as the numbers of patients differ from (are smaller than) those publicly available on the TCGA portal.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors should add the number of patients with censored survival to Table 1.

    The visualization of highly attended patches can be improved by presenting a few high-resolution examples of such patches.

    Page 4, para 2: “We use two different data augmentation methods …” (state the names of the augmentation methods)

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors have utilized state-of-the-art methods to develop a survival prediction method and show its superior performance on three different datasets. My main concern is about the dataset selection and experimental setup. I am not sure whether the authors only tried the given datasets, or whether they chose these datasets because they yielded good results out of the other TCGA datasets.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The authors proposed SeTranSurv, a novel method for patient survival prediction from WSIs. Specifically, instead of directly using pretrained networks on ImageNet, SimCLR was utilized to effectively extract features from WSI patches. The Transformer encoder was also utilized to effectively fuse features from different patches while taking advantage of the correlation and spatial distribution of patches. SeTranSurv outperformed existing WSI survival analysis methods on three different cancers from TCGA. The authors also demonstrated that both the SimCLR component and the Transformer component were necessary for performance improvement through ablation study.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    SeTranSurv addressed two critical aspects of effectively utilizing WSIs for survival prediction. Firstly, detailed labels or annotations for WSIs are hard to acquire, and using models pre-trained on existing natural-image datasets to extract features for WSI patches might not be suitable, as the two kinds of images are quite different. Secondly, the correlation and spatial information among different sampled patches from WSIs are often ignored when learning with WSIs. The authors addressed these two issues with SimCLR and the Transformer, respectively, in the proposed pipeline and achieved superior results compared to existing methods. The authors also performed extensive ablation studies to demonstrate that both the SimCLR part and the Transformer part contribute to the improvement of survival analysis.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Overall this paper is quite complete. However, it would be better if the authors could demonstrate the generalization ability of SeTranSurv by training and testing on different datasets of the same cancer type, or on samples from different institutions. Moreover, the authors could further strengthen the interpretation of the results by providing a more detailed analysis of the attention weights.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Overall the reproducibility information is good. The authors provided a clear illustration of the proposed framework and a detailed introduction of different components. The description of the source of the datasets is also clear.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The authors mentioned that some patients might have multiple WSIs. For the position-encoding part, how are positions defined for patches from different WSIs of the same patient? Positions from different WSIs might carry different meanings, as the WSIs of the same patient are often not aligned, or cannot be aligned.
    2. It would be better if the authors could provide a statistical analysis of the results in Fig. 3. For example, are there significantly more patches with high weights falling into the RoIs, considering that the patch sampling from the WSI might not be perfectly uniform? The authors could also provide a heatmap of attention weights to better interpret the results.
    3. The authors should pay more attention to using technical terms consistently throughout the manuscript. For example, both “ResNet18” and “Resnet18” are used in this paper.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall this is an interesting paper addressing two important issues in utilizing WSIs for prediction tasks: lack of patch-level labels and effectively utilizing correlation and the spatial information among different sampled patches. The author addressed these two issues with SimCLR and Transformer, and the proposed framework achieved better results compared with existing methods. The analysis of the results is also quite complete where the authors performed necessary ablation studies.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    6

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper presents a hybrid framework combining a self-supervised pretrained CNN with a Transformer architecture for survival prediction from whole slide images. The proposed method was evaluated on multiple datasets and demonstrated marginal improvements over other survival prediction approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    – Application of transformers is relatively new to the histopathology domain.
    – The method was tested on multiple datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    – The novelty of this submission is very limited: it is an application of the existing works SimCLR and Transformers.
    – The experimental design is poorly conducted and many key details are missing.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I carefully assessed the sensibility of the experiments, but the results seem uncertain because of the lack of cross-fold validation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    – Most of the contrastive learning methods (such as SimCLR (ref 1)) are highly sensitive to data augmentations. I am curious to know what kinds of augmentations have been applied for self-supervised pretraining. If the authors adopted augmentations similar to SimCLR’s (which are tailored to natural images), how does one overcome the domain shift that is predominantly visible in a multicentre dataset like TCGA?

    — The authors argue that the “features extracted via SimCLR are better than the ImageNet pre-trained features”. To substantiate this claim, it would be interesting to show results with pre-trained ImageNet features as input to the Transformer and compare them against their method.

    — Is this study conducted to predict 5- or 10-year overall survival? This crucial detail is missing from the paper. Further, different cancer types have different median overall survival times, and it is worth taking this into account when designing the methodology [1]. For instance, BRCA and OV have higher overall survival times (> 5 years) compared to LUSC (< 5 years) [2-3]. Not taking this into account will lead to clinically irrelevant comparisons [4].

    1. Liu, Jianfang, et al. “An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics.” Cell 173.2 (2018): 400-416.
    2. Lu, Cheng, et al. “A prognostic model for overall survival of patients with early-stage non-small-cell lung cancer: a multicentre, retrospective study.” The Lancet Digital Health 2.11 (2020): e594-e606.
    3. Katzman, Jared L., et al. “DeepSurv: personalized treatment recommender system using a Cox proportional hazards deep neural network.” BMC medical research methodology 18.1 (2018): 1-12.
    4. Wulczyn, Ellery, et al. “Deep learning-based survival prediction for multiple cancer types using histopathology images.” PLoS One 15.6 (2020): e0233678.

    — The training, validation, and test splits are randomly selected, which is an unreliable evaluation protocol given the large variation in median follow-up time between different cancer types [1]. It would be good if the authors performed 5-fold cross-validation on all these datasets, which is a common practice in survival prediction [4, 5, 6, 7].

    — Fig. 3 is not very clear to me. There are many regions outside the ROI that have larger weights (denoted by blue). How does one know which part of an image contributes to the overall risk score? Further, I see that the authors extracted around 600 image patches per WSI, and from Fig. 3 it looks like every extracted patch has been given a high weight. It would be nice to show a heat-map in this case.

    — Comparisons with many key papers in the field are missing:

    1. Yao, Jiawen, et al. “Whole slide images based cancer survival prediction using attention guided deep multiple instance learning networks.” Medical Image Analysis 65 (2020): 101789.
    2. Wulczyn, Ellery, et al. “Deep learning-based survival prediction for multiple cancer types using histopathology images.” PLoS One 15.6 (2020): e0233678.
    3. Di, Donglin, et al. “Ranking-Based Survival Prediction on Histopathological Whole-Slide Images.” International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2020.

    — The implementation details of SimCLR pretraining are missing from the paper.

    — Transformer modules are known to be computationally expensive compared to their CNN counterparts. What is the total training time of the proposed approach?
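    The attention heat-map requested in the comment on Fig. 3 could be assembled from patch coordinates and their attention weights. Below is a minimal illustrative sketch, not code from the paper; the function name, grid resolution, and the convention of normalised [0, 1) patch coordinates are all assumptions.

    ```python
    def attention_heatmap(coords, weights, grid=(8, 8)):
        """Bin per-patch attention weights into a coarse WSI heat-map.

        coords  : (x, y) patch positions, normalised to [0, 1)
        weights : one attention weight per patch
        Returns a rows x cols grid of mean attention per cell
        (None for cells that received no patches).
        """
        rows, cols = grid
        sums = [[0.0] * cols for _ in range(rows)]
        counts = [[0] * cols for _ in range(rows)]
        for (x, y), w in zip(coords, weights):
            # Clamp to the last cell so coordinates at exactly 1.0 stay in range.
            r = min(int(y * rows), rows - 1)
            c = min(int(x * cols), cols - 1)
            sums[r][c] += w
            counts[r][c] += 1
        return [[sums[r][c] / counts[r][c] if counts[r][c] else None
                 for c in range(cols)] for r in range(rows)]
    ```

    Overlaying such a grid (upsampled to the slide thumbnail) would make it visible whether high attention concentrates inside the annotated ROIs or is spread uniformly over all sampled patches.
    
    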

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    – Very limited novelty: a combination of two existing approaches, SimCLR and Transformers.
    – Lack of justification for why SimCLR-pretrained features are better than ImageNet-pretrained features, and no ablation results to substantiate this claim.
    – Experiments were conducted without taking crucial factors into account, such as the overall survival period of different cancer types (high-survival-rate vs. low-survival-rate cancers), and no cross-fold validation experiments were performed.
    – Comparisons with many key papers in the field are missing.

  • What is the ranking of this paper in your review stack?

    5

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a method for survival prediction from whole slide images by combining a self-supervised pretrained CNN with a Transformer architecture. The reviewers have brought up well-constructed arguments regarding the limitations of the paper. The novelty of the proposed method is not well justified or discussed in the paper. Another key issue is the experimental design: many important points are missing or questionable, such as data splitting, pretraining parameters, computational complexity, and discussion of the key differences between the proposed method and state-of-the-art methods. Please clarify these problems in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8




Author Feedback

We sincerely thank all reviewers for their time and comments. Please find the responses (R) to specific comments (C).

    Common questions by three reviewers: C1: About data selection and splitting. R1: We selected the 3 TCGA datasets because we wanted to verify the effectiveness of the method at different sample sizes: they include the largest dataset (BRCA), a medium-sized dataset (LUSC), and the smallest dataset (OV). Since these datasets were originally prepared for a multi-omics study, we kept samples with complete multi-omics data. We have re-run the experiments on the full data, and as expected, the additional WSI samples did not significantly change the performance (average C-index of 0.694 vs 0.699). We employed the single-fold experiment for a fair comparison with the 3 methods (WSISA, DeepGraphSurv, and CapSurv), as these studies all used this strategy. We did perform 5-fold CV, and the average C-index is essentially the same (0.694 vs 0.699).
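    For reference, the concordance index (C-index) quoted throughout these comparisons measures how often the model ranks pairs of patients in the correct order of survival. A minimal pure-Python sketch of Harrell's C-index follows; the function name is illustrative and this is not the authors' implementation (ties in survival time are ignored for brevity).

    ```python
    def c_index(times, events, risks):
        """Harrell's concordance index.

        times  : observed survival/censoring times
        events : 1 if the event (death) was observed, 0 if censored
        risks  : predicted risk scores (higher = expected shorter survival)
        """
        concordant, comparable = 0.0, 0
        n = len(times)
        for i in range(n):
            for j in range(n):
                # A pair is comparable when patient i had the event
                # strictly before patient j's observed time.
                if events[i] == 1 and times[i] < times[j]:
                    comparable += 1
                    if risks[i] > risks[j]:
                        concordant += 1      # correctly ordered pair
                    elif risks[i] == risks[j]:
                        concordant += 0.5    # tie in predicted risk
        return concordant / comparable

    # A perfect ranking yields 1.0; random guessing gives ~0.5.
    print(c_index([2, 4, 6, 8], [1, 1, 0, 1], [0.9, 0.7, 0.4, 0.1]))  # -> 1.0
    ```

    Differences such as 0.694 vs 0.699 between single-fold and 5-fold evaluation are therefore differences in average pairwise ranking accuracy.
    
    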

    C2: About the visualization of attention. R2: We measured the learned attention of the patches in Fig. 3; 20 repeated experiments showed that 85% of the selected patches in the ROI regions had attention values greater than the average, significantly higher than the 26% in non-ROI regions. We will add the visualization plot in the revision.

    C3: About patch selection and aggregation of multiple WSIs (raised by Reviewers 1 and 2). R3: We selected patches from the non-background area of each WSI and used all of them to train the self-supervised model with SimCLR. The model was then tuned through the Transformer by randomly selecting 600 patches from the non-background area of each WSI. For a patient with multiple WSIs, we randomly selected 600 patches from each WSI and then averaged the risk scores of all WSIs for the patient (as DeepGraphSurv did). Here, we only used the positional information of each patch within its own WSI, and did not need to align the positions of patches across different WSIs.
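    The slide-to-patient aggregation described in R3 (random patch sampling per WSI, then averaging per-WSI risk scores) can be sketched as follows. This is an illustrative reconstruction, not the authors' code; `score_wsi` is a hypothetical stand-in for the trained SimCLR+Transformer risk model.

    ```python
    import random

    def patient_risk(wsi_patch_sets, score_wsi, n_patches=600, seed=0):
        """Aggregate per-WSI risk scores into one patient-level risk.

        wsi_patch_sets : list of patch lists, one list per WSI of the patient
        score_wsi      : callable mapping a list of patches to a risk score
        """
        rng = random.Random(seed)
        risks = []
        for patches in wsi_patch_sets:
            # Randomly sample up to n_patches non-background patches per WSI.
            sample = rng.sample(patches, min(n_patches, len(patches)))
            risks.append(score_wsi(sample))
        # Average the per-WSI risk scores (the strategy attributed to DeepGraphSurv).
        return sum(risks) / len(risks)
    ```

    Because each WSI is scored independently, patch positions only ever need to be meaningful within their own slide, which is why no cross-WSI alignment is required.
    
    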

    C4: About details of SimCLR (Reviewers 1 and 3). R4: The data augmentation followed the same strategies used in SimCLR for natural images, which were shown to work well for WSIs. We only changed the batch size to 512 for a balance of performance and running time. All hyperparameters were determined for optimal performance on the validation set.

    C5: About the algorithmic novelty, comparison with other SOTA methods, and feature extraction with a SimCLR-trained model vs. an ImageNet pre-trained model (Reviewer 3). R5: To the best of our knowledge, this is the first study to combine self-supervised learning and the Transformer for WSI feature fusion. Given the enormous size of WSIs compared with natural images, we split the WSIs into patches and fed them into the Transformer according to their spatial positions. The model achieved an average C-index of 0.699, which is 4% greater than DeepAttnMISL (MIA 2020) and 2.8% greater than RankSurv (MICCAI 2020). Reviewer 3 may have overlooked the ablation studies in Table 2: the model without SimCLR (OursV2, which uses an ImageNet pre-trained model) and the further exclusion of positional information from the Transformer (OursV1) caused drops of 1.9% and 4.1%, respectively.

    C6: High-risk group vs. low-risk group study (Reviewer 3). R6: We added an experiment on the high-risk group vs. the low-risk group as used in RankSurv. Our AUC is 0.706, which is 4.2% greater than that of RankSurv (MICCAI 2020).

    C7: Comparison with many key papers in the field (Reviewer 3). R7: We have supplemented the results of DeepAttnMISL (MIA 2020) and RankSurv (MICCAI 2020), whose average results are 0.668 and 0.676, respectively, while our result is 0.699.

    C8: Computational complexity (Reviewer 3). R8: The training time of our model increases linearly with the sample size. It takes ~38 hours for SimCLR to train a ResNet18 on WSI patches, and ~6 hours to train the final Transformer model for the LUSC dataset, on a single GTX 1080 GPU with 11 GB of memory.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a method for survival prediction from whole slide images by combining a self-supervised pretrained CNN with a Transformer architecture. The reviewers have brought up well-constructed arguments regarding the limitations of the paper. The novelty of the proposed method is not well justified or discussed in the paper. Another key issue is the experimental design: many important points are missing or questionable, such as data splitting, pretraining parameters, computational complexity, and discussion of the key differences between the proposed method and state-of-the-art methods. In the rebuttal, the authors have clarified most of the concerns regarding the novelty, experimental design, and performance discussion.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposed a deep learning framework integrating SimCLR and the Transformer for patient survival prediction from WSIs. It is a novel idea to combine self-supervised learning and the Transformer for WSI feature fusion and survival prediction. The authors also conducted comprehensive experiments to validate the approach. The rebuttal sufficiently addresses the concerns about dataset selection and experimental design.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The proposed combination of SimCLR and the Transformer achieved better performance than SOTA methods, but the technical novelty is limited, as indicated by R3. Considering that the authors addressed most of the reviewer concerns in the rebuttal and provided additional comparison results, I tend to recommend accepting the paper after rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10


