
Authors

Lei Fan, Arcot Sowmya, Erik Meijering, Yang Song

Abstract

Recent deep learning techniques have shown promising performance on survival prediction from Whole Slide Images (WSIs). These methods are often based on multiple-step frameworks including patch sampling, feature extraction and feature aggregation. However, feature extraction typically relies on handcrafted features or Convolutional Neural Networks (CNNs) pretrained on ImageNet without fine-tuning, thus leading to suboptimal performance. Besides, to aggregate features, previous studies focus on WSI-level survival prediction but ignore the heterogeneous information that is present in multiple WSIs acquired for the same patient. To address the above challenges, we propose a survival prediction model that exploits heterogeneous features at the patient-level. Specifically, we introduce colorization as the pretext task to train the CNNs which are tailored for extracting features from patches of WSIs. In addition, we develop a patient-level framework integrating multiple WSIs for survival prediction with consistency and ranking losses. Extensive experiments show that our model achieves state-of-the-art performance on two large-scale public datasets.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_57

SharedIt: https://rdcu.be/cymbk

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper
    1. The authors propose a survival prediction model based on features from a colorization model.
    2. Additional consistency loss is introduced to regularize consistency among embedding features from multiple WSIs of the same patient.
    3. Extensive experiments on the GBM and LUSC datasets compare the performance of the proposed model with other deep survival models.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. A colorization model is introduced as a pretext task in the framework, which could help the model capture semantic information and localize various objects in the images.
    2. A two-phase approach is used, which provides better results than state-of-the-art deep survival models. The first phase trains the model using the widely used negative log-likelihood loss and the consistency loss. The second phase uses the ranking loss to fine-tune the model.
    3. Extensive experiments are conducted to compare recent deep survival models. Ablation study is also presented to validate each component of the proposed framework.
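For readers unfamiliar with the three loss terms named in the strengths above, a minimal NumPy sketch follows. It is not the authors' implementation: the reviews do not give the paper's formulas, so the sketch assumes the standard Cox negative partial log-likelihood, a consistency term defined as the squared distance of each WSI embedding from the patient's mean embedding, and a pairwise hinge ranking term. All function names and the `margin` parameter are hypothetical.

```python
import numpy as np


def cox_nll(risk, time, event):
    """Negative Cox partial log-likelihood (assumed form, no tie correction)."""
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    # at_risk[i, j] is True when patient j is still at risk at time[i].
    at_risk = time[None, :] >= time[:, None]
    log_denominator = np.log((np.exp(risk)[None, :] * at_risk).sum(axis=1))
    # Only patients with an observed event contribute a term.
    return float(-(risk - log_denominator)[event].sum() / max(event.sum(), 1))


def consistency_loss(embeddings):
    """Assumed consistency term: mean squared distance of each WSI
    embedding (one row per WSI) from the patient's mean embedding."""
    embeddings = np.asarray(embeddings, dtype=float)
    center = embeddings.mean(axis=0, keepdims=True)
    return float(((embeddings - center) ** 2).sum(axis=1).mean())


def ranking_loss(risk, time, event, margin=1.0):
    """Assumed ranking term: hinge penalty on comparable pairs in which the
    patient with the earlier event is not ranked riskier by `margin`."""
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    total, pairs = 0.0, 0
    for i in range(len(risk)):
        if not event[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(len(risk)):
            if time[i] < time[j]:
                total += max(0.0, margin - (risk[i] - risk[j]))
                pairs += 1
    return total / max(pairs, 1)
```

Under these assumptions, a two-phase schedule would minimise `cox_nll` plus a weighted `consistency_loss` first and then fine-tune on `ranking_loss`; the weighting and the stopping rule are among the details the reviewers ask the authors to specify.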
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The benefit of colorization-based feature extraction is not validated comprehensively. Table 3 is good to see, but its experimental setting is missing. Also, the visualization in Fig. 3 is not clear enough to show meaningful results from the colorization model.
    2. The concern about enforcing consistency among different WSIs belonging to one patient remains, and the authors need to validate this. Some WSIs may not contain tumor and may show only normal tissue. It does not seem correct to require those normal WSIs to be consistent with the same patient’s tumor-containing WSIs at the patient level.
    3. The proposed work is quite similar to the work in [22]. Both use the same C-MIL in their architectures. The only differences are the way multiple WSIs are handled and the colorization-based feature extraction. It is necessary to compare with [22] in the experiments; the authors could also use the design in [22] and replace its CNN with the proposed colorization model. This would validate the effectiveness of handling multiple WSIs with a consistency loss rather than the phenotype clusters in [22].
    4. It is not clear why fine-tuning is needed or what motivates the proposed two-phase training. The authors should report results from the first phase, as most state-of-the-art survival models mainly use the negative log-likelihood loss for survival prediction. The need for the ranking loss in the fine-tuning step should be clearly justified.
    5. Experimental settings are not very clear and more results could be reported. Results from each fold are not reported. It is also recommended to test whether the difference between the proposed model and the baseline is significant.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    1. The network could easily be written in PyTorch, as the framework uses many open-source packages and code, e.g. [2], [9], [22].
    2. Some implementation details needed for reproduction are missing. The color palette construction is not described in enough detail. Also, how the model is trained in two phases is not clear enough to reproduce.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. It is not clear how the authors handle the varying number of sampled patches across patients. Do they fix the number at 1000?
    2. The details of the two-phase training are not very clear, although they seem very important. How do the authors decide that the first phase is trained well enough to stop? Do they use a fixed number of epochs or an additional tuning set for early stopping? There are no details about fine-tuning the model in the second phase.
    3. Fig. 3 does little to show that colorization brings benefits, as a single patch with such a small field of view cannot provide a meaningful visualization. It is necessary to show a visualization from a WSI or from a much larger field of view.
    4. The authors should test whether the improvement in C-index in Table 4 is significant. They could use cindex.comp(cindex1, cindex2) in the R survcomp package.
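As an illustration of the significance check requested in the last comment, a minimal Python sketch follows. It is not a reimplementation of survcomp's cindex.comp: it assumes Harrell's C-index and swaps in a paired percentile bootstrap over patients to obtain a 95% interval for the difference between two models. The function names, `n_boot`, and `seed` are hypothetical.

```python
import numpy as np


def c_index(risk, time, event):
    """Harrell's C-index; a higher risk score should mean a shorter survival."""
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    concordant, comparable = 0.0, 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # patients who outlived patient i
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())  # tied scores
    return concordant / comparable if comparable else float("nan")


def bootstrap_c_index_difference(risk_a, risk_b, time, event, n_boot=1000, seed=0):
    """95% percentile interval for C(model A) - C(model B), resampling
    patients with replacement (assumed procedure, not cindex.comp)."""
    rng = np.random.default_rng(seed)
    risk_a = np.asarray(risk_a, dtype=float)
    risk_b = np.asarray(risk_b, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    n = len(time)
    differences = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # same resample for both models
        d = c_index(risk_a[idx], time[idx], event[idx]) - c_index(
            risk_b[idx], time[idx], event[idx]
        )
        if not np.isnan(d):  # skip resamples with no comparable pair
            differences.append(d)
    low, high = np.percentile(differences, [2.5, 97.5])
    return float(low), float(high)
```

An interval that excludes zero would suggest that the C-index gap between the two models is unlikely to be a resampling artifact; with cross-validation, the same comparison could be repeated per fold.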
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is an interesting paper and provides new findings about survival prediction using WSIs. However, the validation is not sufficient to prove the effectiveness of each introduced component.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper
    • Uses colorization to train a deep learning network to extract features for survival analysis.
    • Combines multiple slide images for a patient via novel loss functions.
    • The experimental evaluation shows better performance compared to previous methods.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper trains a model by a colorization task to extract more relevant/meaningful features from whole slide images. It then combines these features and multiple images from a patient in a deep learning network to predict patient survival. The deep learning network uses an adaptation of ranking loss and consistency loss to train a survival prediction model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Misses a relevant work: Wulczyn, E., Steiner, D.F., Xu, Z., Sadhwani, A., Wang, H., Flament-Auvigne, I., Mermel, C.H., Chen, P.H.C., Liu, Y. and Stumpe, M.C., 2020. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS One, 15(6), p.e0233678.
    • Fails to experimentally compare with [22] (Yao, J et al. Whole slide images based cancer survival prediction…) which is a more recent work compared to WSISA, DeepCorrSurv, DeepGraphSurv, MILSurv.
    • While the proposed approach improves performance, the improvement is rather small. The new model is 0.4% and 2.5% better than RankSurv on the whole dataset.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The datasets used in the experiments are public. The methodology and experiments sections appear to have sufficient detail to reproduce the experiments with some help from the authors.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors use a novel approach that combines colorization, whole slide feature aggregation, and aggregation of multiple whole slide images for a patient in a deep learning framework to predict survival. The experimental results on two different cancer types show performance improvements.

    The authors should reference a recent relevant work: Wulczyn, E., Steiner, D.F., Xu, Z., Sadhwani, A., Wang, H., Flament-Auvigne, I., Mermel, C.H., Chen, P.H.C., Liu, Y. and Stumpe, M.C., 2020. Deep learning-based survival prediction for multiple cancer types using histopathology images. PLoS One, 15(6), p.e0233678.

    The paper could be improved by including the work by Wulczyn et al, and by Yao et al [22]. Given the proposed method has relatively small improvements over RankSurv, which is a more recent method than the other methods used in the experimental evaluation, it would be interesting to see how the proposed approach will do against the methods proposed by Wulczyn et al, and Yao et al.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents an integrated deep learning framework that combines colorization and makes novel use of multiple loss functions for feature aggregation and aggregation of multiple whole slide images for a patient. The experimental evaluation is carried out using two different cancer types, rather than focusing on a single cancer type.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Somewhat confident



Review #3

  • Please describe the contribution of the paper

    This paper develops a self-supervised learning-based WSI embedding method for survival prediction. Experiments on two datasets show the effectiveness of the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Applying self-supervised learning techniques to WSI representation learning substantially improves the quality of the representation.

    Experimental results are comprehensive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Other multiple-instance based methods can also generate a patient-level representation since there’s no difference between aggregating patches from single WSI and multiple WSIs.

    To make Formula (1) more general, it would be better to replace the numbers with variables.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Looks true.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Address the comments listed under weaknesses above.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    novelty of the method; writing and experimental setting; clinical application.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper develops a self-supervised learning-based WSI embedding method for survival prediction, where colorization is used to train the network to extract features. Experiments on the GBM and LUSC datasets show the effectiveness of the proposed method. All reviewers agree to accept this paper but raised many questions. The revised manuscript should carefully address the questions raised by Reviewers 1 and 2.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

We sincerely thank all reviewers and Area Chair for your time and comments. In the final version, we will (1) describe the experimental settings more clearly, improve Fig 3, include the statistical test results (log-rank p-values are smaller than 0.005 on both datasets), and better clarify the motivation of two-phase training (for R1); (2) include the additional references (for R2); (3) revise Eq 1 as suggested (for R3). We will also address other comments in our journal paper, including more ablation studies for the motivation of aggregating multiple WSIs, ways of patch sampling and different loss functions, better visualization of colorization output, and result comparison with [22] (for which we are trying to optimize the model proposed in [22] for the specific datasets to achieve optimal performance).


