
Authors

Richard J. Chen, Ming Y. Lu, Muhammad Shaban, Chengkuan Chen, Tiffany Y. Chen, Drew F. K. Williamson, Faisal Mahmood

Abstract

Cancer prognostication is a challenging task in computational pathology that requires context-aware representations of histology features to adequately infer patient survival. Despite the advancements made in weakly-supervised deep learning, many approaches are not context-aware and are unable to model important morphological feature interactions between cell identities and tissue types that are prognostic for patient survival. In this work, we present Patch-GCN, a context-aware, spatially-resolved patch-based graph convolutional network that hierarchically aggregates instance-level histology features to model local- and global-level topological structures in the tumor microenvironment. We validate Patch-GCN with 4,370 gigapixel WSIs across five different cancer types from the Cancer Genome Atlas (TCGA), and demonstrate that Patch-GCN outperforms all prior weakly-supervised approaches by 3.58-9.46%. Our code and corresponding models are publicly available at https://github.com/mahmoodlab/Patch-GCN.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_33

SharedIt: https://rdcu.be/cymao

Link to the code repository

https://github.com/mahmoodlab/Patch-GCN

Link to the dataset(s)

https://portal.gdc.cancer.gov


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes Patch-GCN for WSI survival prediction. Patch-GCN is a context-aware, spatially-aware graph convolutional network. It aggregates instance-level features using attention-based MIL for WSI-level prediction. Five different cancer types from the TCGA dataset are used for validation. Comparisons with prior weakly-supervised survival prediction methods show its superior performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The consideration of context and spatial information of patches is interesting. Context-aware features are modeled between adjacent image patches, and this idea is the main contribution of the paper.
    2. Nice visualization via attention heatmaps, which helps readers understand the interpretability of the proposed Patch-GCN.
    3. Extensive experiments on five cancer types, which show the generalizability of the proposed model across cancer types.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors claim that the proposed model could learn context-aware features better than other GCN-based models. However, the constructed graph of Patch-GCN has a receptive field of about 2302*2302, and it is not clear how strong the context-aware features are within this quite small region. Besides, it seems DeepGraphConv performs better on the UCEC data, and the average is about 0.636 vs 0.620, which shows the proposed model doesn’t improve much.
    2. From Fig 1, it seems each patient has a very large number of patches, and the authors mention that some patients have graph sizes as large as 100K instances. Compared with DeepGraphConv, which has around 1K sampled patches in its original report, the slight improvement (0.636 vs 0.620) comes at up to 100 times more complexity, which does not seem worthwhile in practice.
    3. It would be interesting to see the performance of the proposed model if fewer patches are sampled, which might not be adjacent patches.
    4. Several important implementation and experimental details are missing, such as the risk attention heatmap, which makes it difficult for readers to understand and reproduce the proposed model. See my full comments in the detailed comment section.
    5. Some recently proposed GCN-based WSI models use graph clustering, which could also be used for learning context-aware features. It would be better to discuss or compare with them: “Graph Attention Multi-instance Learning for Accurate Colorectal Cancer Staging”, MICCAI 2020; “CGC-net: cell graph convolutional network for grading of colorectal cancer histology images”, ICCVW 2019.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Not easy to reproduce as important details are missing.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The size of H(L) is not clear; it should be L × M × d_out, and this needs to be clarified. Also, the dense connections after the output of each GCN layer are quite confusing. What is the output size after such dense layers?
    2. Is there only one attention-based pooling layer performed on H(L)? Since H(L) has size L × M × d_out, it does not seem possible to use only one pooling layer on the stack of 2D tensors. You may use L attention-based pooling layers, one on each set of graph features of size M × d_out.
    3. How, then, are WSI-level features pooled from graph-level features?
    4. It is not clear how the attention heatmap is produced. If the weights come from attention-based pooling, it is important to clearly define the FattnMIL operation in the paper with notation.
    5. In the experiments, Kaplan-Meier curves are used to stratify patients into high- and low-risk groups. It is not clear how this is done. What is the cutoff value for this stratification?
    6. Baseline models like DeepAttnMISL and DeepGraphConv did not sample so many patches in their original reports. Did the authors use the same experimental settings in the comparisons?
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The consideration of context-awareness is interesting, but important details are missing, which prevents readers from understanding the whole framework clearly. Improvements over GCN-based methods are very slight in most cases, yet the authors have to sample 10x more patches than state-of-the-art methods. This raises concerns about efficiency in practice.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The paper developed a patch-level GCN-based model for cancer survival prediction, which was able to utilize spatial information of patches extracted from whole slide images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The model was developed and validated on datasets of 5 different cancer types from the TCGA.
    • The proposed model didn’t rely on ROI annotations and can be trained with weak labels.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • It would be nice if the authors could discuss the main innovations of this work. The strategy of constructing a graph based on Euclidean space has been used in prior works. For example, Ding et al. [1] developed a feature-enhanced graph network (FENet) for genetic mutation prediction. They considered patches as graph nodes, and the Euclidean distance between two patches determines whether there is an edge between the two nodes.

    • A discussion of how hyper-parameters/models were selected with cross-validation may be needed. The paper reports results from 5-fold cross-validation, so if the model was selected based on the left-out validation set, the model could be overfitted to that validation set.

    • More details on the DeepGraphConv experiment may be needed. For example, as discussed in 5.1, patches were randomly selected in the DeepGraphConv experiment, while all patches were included in the proposed model. Also, as shown in Table 1, the performance difference between DeepGraphConv and Patch-GCN was relatively small. I wonder whether the performance difference was caused by the number of selected patches rather than by the different graph construction methods.

    [1] Ding K, Liu Q, Lee E, Zhou M, Lu A, Zhang S. Feature-Enhanced Graph Networks for Genetic Mutational Prediction Using Histopathological Images in Colon Cancer. In: International Conference on Medical Image Computing and Computer-Assisted Intervention, 2020 Oct 4 (pp. 294-304). Springer, Cham.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper performed experiments on the publicly available TCGA dataset and the authors have detailed plans to make training/validation code publicly available upon acceptance. Thus, I think the reproducibility of this work is good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Are node features fine-tuned on whole slide images? Given the large domain gap between natural images and whole slide images, it might be interesting to consider fine-tuning node features or pre-training node feature extractors on a whole slide image dataset using unsupervised learning methods.
    • Did the authors use FFPE slides or frozen sections from TCGA for the experiments?
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The main strength of this paper is that the authors performed many experiments and validated the model on multiple datasets. However, as mentioned above, discussion on how the innovation in this paper is different from prior works may be needed.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper ‘Whole Slide Images are 2D Point Clouds: Context-Aware Survival Prediction using Patch-based Graph Convolutional Networks’ describes Patch-GCN, a graph convolution network-based method to infer patient survival in WSIs. The GCN aggregates the contextual features in the WSI in a hierarchical manner by representing it as a graph where nodes are image patches and edges exist between neighboring image patches. The method is validated on WSIs across five different cancer types in the TCGA dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation to use a graph-based hierarchical approach to combine local and global features in WSI is intuitive, as cell-based morphological features as well as tissue-based neighborhood features are important for histopathological image analysis. The paper effectively utilizes the feature hierarchy in a GCN-based framework which is well-formulated to analyze the extremely large-sized WSIs to predict patient survival.
    2. Comparison with existing weakly-supervised methods is performed on five different cancer types from TCGA dataset. The proposed method outperforms state-of-the-art methods in 4/5 cancer types.
    3. The paper is well-written and organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The authors may want to explain why the concordance indices are lower (0.5-0.6 range) for 3/5 cancer types on which evaluation was performed. Only the GBMLGG type shows a c-Index of > 0.8. Hence, the authors could comment on the practical usability of the proposed GCN method.
    2. The authors mention that 4 GPUs with a batch size of 1 were used for training the model. It would be interesting to know the total model footprint and the computational cost of predicting survival from a full WSI. This is an important consideration in determining whether the models can be used in clinical settings.
    3. It is stated that ‘To evaluate Patch-GCN, we trained our proposed model using 5-fold cross-validation for each cancer type, in which each dataset was split into 5 80/20 partitions for training and validation’. Were these partitions formed on the WSI level or patch level? How was the model regularized to prevent overfitting?
    4. The authors may want to comment in the discussion why the proposed method is not as good as DeepGraphConv only for the UCEC dataset.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Satisfactory response as dataset and code can be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The validation could be further explained, as it is not clear how the training, validation and test data were divided. Information leaks could take place if patches from the same WSI were used in both training and validation.
    2. Typos should be corrected, e.g., ‘Path-GCN’.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper describes a novel patch-based GCN model for predicting patient survival in histopathological WSI. The method is validated on the TCGA datasets for 5 cancer types and suggests improvement over most existing methods. It can perform a WSI-level analysis using a hierarchical approach. In my opinion it can be a valuable contribution in the field.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a Patch-GCN model for survival prediction from whole slide images, which aggregates instance-level features using attention-based MIL. The reviewers have raised well-constructed arguments about the limitations of the paper. Compared to the GCN-based survival prediction methods, the proposed method achieves slightly better results in most cases (not all) at the price of sampling 10x more patches. Some important points are missing, such as the generation of the attention heatmap, experimental setting details, computational complexity, when and why the proposed method can achieve better results than DeepGraphConv, and a discussion of the key differences between the proposed method and recent GCN-based methods. Please clarify these points in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

We thank the reviewers for their thoughtful feedback. We are encouraged that Patch-GCN was received positively for its contributions in learning context-aware features in WSIs without using ROIs (R2, R3, R5), as well as attention heatmaps that visualize prognostic image regions (R2). We are glad that all reviewers found our work to be well organized, and appreciate the validation in using 5 cancer types from the TCGA with comparisons to multiple SOTA methods (R2, R3, R5). We address the main reviewer comments below and will incorporate all feedback.

DeepGraphConv (DGC) performing better on UCEC: Though DGC has a higher c-Index on UCEC, we note that in comparison to other cancer types, cancer prognosis in UCEC correlates with global-level morphological determinants such as tumor size and depth of tumor invasion in the myometrium, rather than cell-to-cell mediated interactions between tumor cells and other cell types. As a result, features such as tumor-lymphocyte co-localization learned via Patch-GCN may not be the most prognostic for risk stratification in UCEC, which may be better captured via approaches like DGC and Attention MIL. Despite the performance differences in this one cancer type, in our overall results, Patch-GCN consistently improves over all prior methods in 4 out of 5 cancer types in the TCGA, which demonstrates the generalizability of Patch-GCN in learning context-aware features to improve prognosis. We reported results for all cancer types to initiate an open discussion on how differing modeling strategies can be created for learning cancer-specific prognostic biomarkers, which we will include in the discussion section of our final submission.
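For readers unfamiliar with the c-Index compared above, the following is a minimal sketch of a concordance index, assuming Harrell's definition (the exact variant and tie handling used in the paper are not stated here). The function name `concordance_index` and the toy data are hypothetical and for illustration only.

```python
def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs whose predicted risks are ordered correctly.

    times  : observed follow-up times
    events : 1 if death was observed, 0 if the patient was censored
    risks  : predicted risk scores (higher = shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is comparable only if the earlier time is an observed event
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0  # earlier death received the higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5  # tied risks count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy cohort (illustrative values, not from the paper).
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risks = [0.9, 0.7, 0.8, 0.1]
c_index = concordance_index(times, events, risks)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the 0.636 vs 0.620 averages quoted by the reviewers should be read.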

GCN Complexity: The “100x” increase in number of patches was not a computational barrier for practical implementation of GCNs for WSIs. Current mini-batching procedures for graphs in PyTorch can efficiently perform inference / back-prop in < 1 sec on large 100K graphs (see GitHub). Using a single GPU, training Patch-GCN via 5-fold CV can be done in < 5 hours. Though including all available tissue patches does increase training time, our method follows the current standard-of-care for pathologists which exhaustively examines all tissue on each slide for accurate staging and prognosis. GCNs that randomly sample a limited number of patches would not only be unable to learn important context-aware features for cancer prognosis such as tumor-lymphocyte co-localization, but also fail to visualize / discover these features as prognostic biomarkers for future medical support decision systems.
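One reason a 100K-node graph can remain tractable is that a nearest-neighbor graph is sparse. The back-of-the-envelope sketch below compares a sparse edge list with a dense adjacency matrix. It assumes k = 8 neighbors per node, 8-byte integer indices, and 1 byte per dense entry; all three are illustrative assumptions rather than figures from the paper, and the sketch says nothing about run time.

```python
def graph_memory_bytes(num_nodes, k, index_bytes=8):
    """Return (sparse_bytes, dense_bytes) for a k-nearest-neighbor graph.

    sparse: edge list storing (source, target) indices for num_nodes * k directed edges
    dense : full num_nodes x num_nodes adjacency at one byte per entry
    """
    sparse = 2 * num_nodes * k * index_bytes
    dense = num_nodes * num_nodes
    return sparse, dense


# Hypothetical setting: 100K patches, 8 neighbors each.
sparse_bytes, dense_bytes = graph_memory_bytes(100_000, 8)
print(f"sparse edge list: {sparse_bytes / 1e6:.1f} MB")
print(f"dense adjacency:  {dense_bytes / 1e9:.1f} GB")
```

Under these assumptions the edge list grows linearly with the number of patches, while the dense matrix grows quadratically.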

Related Work: Raju et al. similarly follow DGC in randomly sampling “k” patches (our implementation of DGC is more similar to Raju et al. in that we similarly apply an attention head). Ding et al. is similar to our method in connecting nodes via spatial distance, but also randomly samples “k” patches. We discuss Zhou et al. in Section 2.1 (Ref 21). Our work is distinct in that we use all patches and interpret WSIs as point clouds, and demonstrate via attention maps (Section 5.2) that Patch-GCN is able to learn context-aware features that are specific for cancer prognostication with robust validation.
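The idea of treating a WSI as a 2D point cloud and connecting nodes by spatial distance can be sketched as a k-nearest-neighbor graph over patch coordinates. The example below is a brute-force illustration, not the authors' implementation: the function name `knn_edges`, the 256-pixel patch spacing, and k = 4 are assumptions. At 100K patches a spatial index would be needed instead of the quadratic loop.

```python
import math


def knn_edges(coords, k):
    """Return directed edges (i, j) linking each patch i to its k nearest patches j."""
    edges = []
    for i, (xi, yi) in enumerate(coords):
        # Distance from patch i to every other patch; ties are broken by index.
        dists = sorted(
            (math.hypot(xi - xj, yi - yj), j)
            for j, (xj, yj) in enumerate(coords)
            if j != i
        )
        edges.extend((i, j) for _, j in dists[:k])
    return edges


# Hypothetical 3x3 grid of patch centres, 256 px apart (row-major order).
coords = [(256 * col, 256 * row) for row in range(3) for col in range(3)]
edges = knn_edges(coords, k=4)
centre_neighbours = sorted(j for i, j in edges if i == 4)
```

Because edges depend only on coordinates, adjacent patches stay connected, which is the property lost when "k" patches are sampled at random.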

Experimental Details: 5-fold CV was done on the patient-level. Models were not selected via left-out validation, and were trained with the same (default) hyperparameters. Only FFPE slides were used. Node features were not fine-tuned, but we agree it is worth considering in future studies. Only one global pooling layer is used, performed on top of all nodes in the graph at the last layer. For KM curves, the median (50th percentile) was used as the cutoff between low and high risk. We will reflect all additional details in our final submission.
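Two of the details above, a single global attention pooling over all nodes and a median cutoff for Kaplan-Meier groups, can be sketched as follows. This is a simplified illustration under stated assumptions: the attention score here is `w · tanh(h)` with no learned projection or gating, and the names `attention_pool`, `median_split`, and all values are hypothetical rather than taken from the released code.

```python
import math
import statistics


def attention_pool(node_feats, w):
    """Pool M node vectors into one slide-level vector.

    Returns (pooled, attn), where attn holds one softmax weight per node;
    weights of this kind are what an attention heatmap would display.
    """
    scores = [sum(wi * math.tanh(hi) for wi, hi in zip(w, h)) for h in node_feats]
    top = max(scores)  # subtract the max for a numerically stable softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(node_feats[0])
    pooled = [sum(a * h[d] for a, h in zip(attn, node_feats)) for d in range(dim)]
    return pooled, attn


def median_split(risks):
    """Label each patient 'high' or 'low' risk using the cohort median as cutoff."""
    cutoff = statistics.median(risks)
    return ["high" if r > cutoff else "low" for r in risks]


# Toy inputs (illustrative only).
pooled, attn = attention_pool([[0.0, 0.0], [1.0, 1.0]], w=[1.0, 1.0])
groups = median_split([0.1, 0.4, 0.6, 0.9])
```

The two resulting groups would then be compared with Kaplan-Meier curves; patients whose risk equals the median fall into the low-risk group in this sketch, which is an arbitrary choice.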

Reproducibility: We make our code available for reproducibility at https://github.com/miccai2021anon/2410, which should address the comments on complexity, attention heatmaps, implementation details of Patch-GCN and other baselines, validation, and hyperparameters.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a Patch-GCN model for survival prediction from whole slide images, which aggregates instance-level features using attention-based MIL. The reviewers have raised well-constructed arguments about the limitations of the paper. Compared to the GCN-based survival prediction methods, the proposed method achieves slightly better results in most cases (not all) at the price of sampling 10x more patches. Some important points are missing, such as the generation of the attention heatmap, experimental setting details, computational complexity, when and why the proposed method can achieve better results than DeepGraphConv, and a discussion of the key differences between the proposed method and recent GCN-based methods. The authors have addressed most of the concerns in the rebuttal, such as performance discussion, experimental setting details, and computational complexity.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    All reviewers thought positively about the paper. The idea is novel and well motivated. Various questions were raised and were addressed well in the rebuttal. Thus I recommend the paper to be accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers all think that this is a well-written paper, and the experimental results are extensive (validation on 5 cancer datasets). R3's concerns about the paper's innovative aspects were strong. In the rebuttal, the authors briefly clarified how their work differs from existing state-of-the-art methods. However, this clarification remains limited and needs to be further addressed in the final version. So, yes, incremental, but incremental with a compelling narrative and extensive experiments.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2


