
Authors

Nathaniel Braman, Jacob W. H. Gordon, Emery T. Goossens, Caleb Willis, Martin C. Stumpe, Jagadish Venkataraman

Abstract

Clinical decision-making in oncology involves multimodal data such as radiology scans, molecular profiling, histopathology slides, and clinical factors. Despite the importance of these modalities individually, no deep learning framework to date has combined them all to predict patient prognosis. Here, we predict the overall survival (OS) of glioma patients from diverse multimodal data with a Deep Orthogonal Fusion (DOF) model. The model learns to combine information from multiparametric MRI exams, biopsy-based modalities (such as H&E slide images and/or DNA sequencing), and clinical variables into a comprehensive multimodal risk score. Prognostic embeddings from each modality are learned and combined via attention-gated tensor fusion. To maximize the information gleaned from each modality, we introduce a multimodal orthogonalization (MMO) loss term that increases model performance by incentivizing constituent embeddings to be more complementary. DOF predicts OS in glioma patients with a median C-index of 0.788 ± 0.067, significantly outperforming (p=0.023) the best performing unimodal model with a median C-index of 0.718 ± 0.064. The prognostic model significantly stratifies glioma patients by OS within clinical subsets, adding further granularity to prognostic clinical grading and molecular subtyping.
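To make the intuition behind an orthogonalization term concrete, the sketch below penalizes overlap between unimodal embeddings using the mean squared cosine similarity over all embedding pairs. This is a generic stand-in chosen for illustration, not the MMO formulation used in the paper; the helper name `orthogonality_penalty`, the equal embedding sizes, and the toy values are all assumptions.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def orthogonality_penalty(embeddings):
    """Mean squared cosine similarity over all pairs of modality embeddings.

    Hypothetical helper: returns 0.0 when every pair is orthogonal and 1.0
    when every pair is collinear. Illustrative only, not the paper's MMO loss.
    """
    pairs = list(combinations(embeddings, 2))
    return sum(cosine(u, v) ** 2 for u, v in pairs) / len(pairs)


# Toy per-patient embeddings for three modalities (made-up values).
rad = [1.0, 0.0, 0.0]
path = [0.0, 1.0, 0.0]
dna = [0.0, 0.0, 1.0]

complementary = orthogonality_penalty([rad, path, dna])  # no shared direction
redundant = orthogonality_penalty([rad, rad, rad])       # fully overlapping
```

Adding a term of this kind to a survival loss would push the embeddings toward carrying non-overlapping information, which is the behaviour the abstract attributes to the MMO loss.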

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_64

SharedIt: https://rdcu.be/cyl6H

Link to the code repository

N/A

Link to the dataset(s)

https://wiki.cancerimagingarchive.net/display/Public/TCGA-GBM

https://wiki.cancerimagingarchive.net/display/Public/TCGA-LGG

https://github.com/PathologyDataScience/SCNN

Reviews

Review #1

Please describe the contribution of the paper

This paper presents a Deep Orthogonal Fusion (DOF) model that combines information from different modalities (radiology, histology, sequencing, and clinical data) for glioma prognostic biomarker discovery. It introduces a new MMO loss and attention-gated tensor fusion to better fuse this information.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) Introduces a new multimodal orthogonalization (MMO) loss term that encourages each unimodal representation to provide independent prognostic information. (2) Designs an attention-gated fusion method to fuse different modalities. (3) Combines different aspects of clinical information to provide comprehensive glioma prognostic biomarker discovery.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

(1) The dataset description is unclear. What kinds of data does each patient have, and what are the features for each data modality? (2) The experiments are incomplete. What is the effect of attention-gated tensor fusion? Why not compare it with other fusion methods?
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

(1) The authors should clearly describe the data used for each patient and how features were extracted (the model used for each part).
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

(1) The fusion method used in this paper is a kind of late fusion. The feature extraction process for each modality is independent, which may already lose some relational information among modalities. Further, since the central question in multimodal fusion is how to fuse, the authors should also compare with other fusion methods. (2) There are some description errors in the paper. For example, the text refers to Tables S1-3 and to patient selection in Fig. S2, but there are no such tables in the paper.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

(1) Unclear data description, (2) incomplete experimental comparison, (3) some description errors that make the paper harder to understand.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

The authors present a Deep Orthogonal Fusion (DOF) model, which learns to combine information from multiparametric brain MRI exams, biopsy-based modalities (such as H&E slide images and DNA sequencing), and clinical variables into a comprehensive multimodal risk score, to predict overall survival (OS) of glioma patients. The authors introduce a multimodal orthogonalization (MMO) loss term that increases model performance by incentivizing constituent embeddings to be more complementary.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors propose a data-efficient scheme for fusing radiology, histology, genomic, and clinical data to derive novel multimodal prognostic biomarkers. They add a novel MMO loss component, which forces unimodal embeddings to provide independent and complementary information to the fused prediction.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

According to Table 1, the full fusion model is not the best-performing model when trained with the MMO loss. Further analysis of the reason should be added, which would help in better understanding the fusion model.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The method should not be difficult to reproduce.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

According to Table 1, the full fusion model is not the best-performing model when trained with the MMO loss. Further analysis of the reason should be added, which would help in better understanding the fusion model.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall the paper is well written and organized. The paper proposed a deep learning based fusion model.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

2
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

The authors proposed a deep learning network fusing the complementary data from radiological scans (mpMRI), histopathology, genetic and clinical information, to improve the prognostic model performance. The main contribution is the fusing module of the network with disentanglement learning and application to the brain tumor prognostic biomarker discovery.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This study includes multi-omics data (radiology, histology, sequencing and clinical information), which is of clinical interest and importance.
2. The DOF model achieved significantly better performance (C-index 0.788) than the unimodal model (C-index ~0.71).
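For readers less familiar with the metric quoted above, the following is a minimal sketch of Harrell's concordance index for right-censored survival data. The function name and the toy cohort are assumptions made for illustration; the paper's own evaluation code is not available, so this is not necessarily how the reported values were computed.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i has an observed event and a
    shorter survival time than patient j. The pair is concordant when
    patient i also has the higher predicted risk; tied risks count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    return concordant / comparable


# Toy cohort (made-up values): survival in months, event flag, risk score.
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
risks = [0.9, 0.4, 0.6, 0.1]

c_index = concordance_index(times, events, risks)  # 4 of 5 comparable pairs agree
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking of patients by risk, which gives a scale for reading the 0.788 versus roughly 0.71 comparison.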
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Limited size of the dataset (97 GBM, 79 LGG)
2. Model complexity
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The dataset is public and the algorithms are described clearly. Although the code is not available, the work is expected to be reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
The manuscript is well-organised and easy to follow. I have several comments:

Major:
1. High-grade gliomas (HGG) are usually defined as WHO grades III and IV [1]. However, Figure 4 of this manuscript classifies Grade III as low-grade glioma (LGG).
2. One particular challenge of GBM prognostic modelling is the limited size of the dataset, which causes overfitting, especially when the number of model parameters or features is large. Performance on the holdout test set is the objective evaluation. The authors keep 20% of the data (~35 patients) as a holdout test set, but the performance appears to be reported from the 15-fold cross-validation on the training data (Table 1, Figure 3, Figure 4). This should be clarified.
3. The overall model is complex. It would be better to compare the number of network parameters, in addition to the model performance.
Reference: [1] David N. Louis et al. “The 2016 World Health Organization Classification of Tumors of the Central Nervous System: a summary”.
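The distinction raised in major comment 2 can be sketched as follows: a test set is held out once, and cross-validation folds are built only from the remaining patients. This assumes a plain k-fold partition with a random shuffle; the paper may use a different resampling or stratification scheme, and the function name `holdout_and_folds` is hypothetical.

```python
import random


def holdout_and_folds(n_patients, holdout_frac=0.2, n_folds=15, seed=0):
    """Split patient indices into a held-out test set and k CV folds."""
    indices = list(range(n_patients))
    random.Random(seed).shuffle(indices)
    n_test = round(n_patients * holdout_frac)
    test, development = indices[:n_test], indices[n_test:]
    # Deal the development indices round-robin so that fold sizes differ
    # by at most one patient.
    folds = [development[k::n_folds] for k in range(n_folds)]
    return test, folds


# 97 GBM + 79 LGG patients, as counted in the review above.
test, folds = holdout_and_folds(97 + 79)
```

Reporting the metric on `test` separately from the fold-level estimates keeps the two kinds of evaluation distinguishable, which is what the comment asks the authors to clarify.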

Minor:
1. Treatment is a key factor in overall survival. Does the dataset include a standard treatment, or does treatment differ between hospitals?
2. The superscripts R^(l1xN) in Methodology and R^(l1xM*N) before Equation (3) are confusing.
3. Why is the 75th percentile of the predicted score used as the final result?
4. More information about the genomic features should be provided.
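Minor comment 3 concerns how predicted scores are reduced to one final result. The sketch below assumes that each patient has several predicted risk scores (for example, one per sampled image region) and that these are reduced with a 75th percentile; that setup, the helper `percentile`, and the scores are assumptions for illustration rather than details confirmed by the reviews.

```python
def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between order statistics."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


# Hypothetical per-sample risk scores for two patients (made-up values).
scores = {
    "A": [0.2, 0.1, 0.9, 0.3, 0.4],
    "B": [0.5, 0.7],
}

patient_risk = {pid: percentile(vals, 75) for pid, vals in scores.items()}
```

For patient A the result is 0.4, which sits above the mean (0.38) but well below the single outlying score of 0.9. An upper percentile weights the higher-risk samples more than the mean does while being less sensitive to one extreme value than the maximum, which is one plausible motivation for such a choice.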
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is novel and the application is of clinical importance. But some concerns need to be clarified.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper uses a simple deep learning model to fuse multimodal data derived from multiparametric brain MRI, histology slides and DNA sequencing. The AC finds the proposed multimodal orthogonalization loss, which enforces modality disentanglement in the fusion process, incremental in terms of novelty. Besides, the authors have not benchmarked their proposed fusion model and its variants against existing state-of-the-art methods in multi-source, multi-view or multimodal data fusion. The authors might want to add results for other benchmarks, as the presented problem is not conceptually novel and several attempts have been made to solve multi-type data fusion. The mixed reviews on this paper warrant a rebuttal. Please check the detailed comments by reviewers and properly address the major highlighted issues.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

Author Feedback

We thank the reviewers for their constructive feedback. We address comments from reviewers (R) 1, 2, and 3 and the AC below. For references, see original bibliography.

[R1] DATA AND FEATURE DESCRIPTION IS UNCLEAR/MISSING Most of R1’s critiques refer to missing details. However, R1’s comment in Section 7.2 makes clear that they missed the supplement, which includes the details requested (dataset info, features, supplementary figs/tables). Review of these materials may have alleviated some of these concerns. To make things clearer, we will add a supplementary table of DNA features (also requested by R3) and revisit the experimental details section.

[R1, R3] EFFECT OF ATTENTION-GATED TENSOR FUSION NOT EXPLORED/MODEL COMPLEXITY We ran the ablation study suggested by R1 for the DNA+rad+path model. The results (median C-index) are:

Original: 0.79±0.07

No tensor fusion: 0.78±0.07

No attention gating: 0.77±0.08

No attention gating or tensor fusion: 0.76±0.07

We observe that our original model performed best, but a simplified fusion module can still achieve strong performance (addressing R3’s concerns on the impact of model complexity).
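The four configurations in this ablation can be pictured with a small, simplified fusion function in which gating and tensor fusion are independent switches. This is a sketch under stated assumptions, not the authors' implementation: the gate here is a single scalar per modality, `gate_weights` stands in for learned parameters, and all names and values are hypothetical.

```python
import math
from functools import reduce


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(embedding, all_embeddings, weights):
    """Scale one embedding by a scalar attention score in (0, 1).

    The score is a sigmoid of a weighted sum over the concatenation of all
    modality embeddings; `weights` stands in for learned parameters.
    """
    context = [x for emb in all_embeddings for x in emb]
    score = sigmoid(sum(w * x for w, x in zip(weights, context)))
    return [score * x for x in embedding]


def outer_flat(u, v):
    """Flattened outer product of two vectors."""
    return [a * b for a in u for b in v]


def fuse(embeddings, gate_weights, use_gating=True, use_tensor=True):
    """Fuse modality embeddings, optionally with gating and tensor fusion."""
    if use_gating:
        embeddings = [gate(e, embeddings, w)
                      for e, w in zip(embeddings, gate_weights)]
    if use_tensor:
        # Prepending 1 keeps the unimodal and lower-order interaction terms
        # alongside the full multiplicative interactions.
        return reduce(outer_flat, [[1.0] + e for e in embeddings])
    return [x for e in embeddings for x in e]  # plain concatenation


# Toy embeddings for DNA, radiology and pathology (made-up values).
embs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
w = [[0.0] * 6 for _ in embs]  # zero weights give a gate score of exactly 0.5

variants = {
    "original": fuse(embs, w, use_gating=True, use_tensor=True),
    "no tensor fusion": fuse(embs, w, use_gating=True, use_tensor=False),
    "no attention gating": fuse(embs, w, use_gating=False, use_tensor=True),
    "neither": fuse(embs, w, use_gating=False, use_tensor=False),
}
```

With three two-dimensional embeddings, the tensor variants produce 27 fused features against 6 for concatenation, which illustrates why removing tensor fusion reduces the size of the downstream layers and hence model complexity.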

[AC, R1] NO BENCHMARKING AGAINST OTHER FUSION APPROACHES Given short rebuttal time and no specific recommendations on preferred benchmarks, we assessed if partial fusion prognostic models [15-18] could be adapted to include missing modalities. Only [16] provided editable code. We added a branch for radiology images/features. This fusion model underperformed both with (CI=0.73±0.05) and without (CI=0.72±0.05) radiology included.

We emphasize that this study’s goal was not framework comparison. Rather, we sought to perform a first-of-its-kind deep learning-based predictive fusion of radiology with histology, DNA and clinical data. We performed 34 experiments comparing 15 different modality combinations. The most crucial benchmark is this comparison with partial combinations previously used for prognostication [15-18]. While no SOTA exists for this previously unexplored confluence of modalities, our approach performs comparably to partial fusion models despite using 77% [14,15] to 98% [16] fewer patient samples.

[AC] PRESENTED PROBLEM AND MMO LOSS HAVE LIMITED NOVELTY Our primary contribution (noted for its novelty and clinical impact by R2 and R3) is the fusion of radiology (using both CNNs and radiomic features) with histology, genomics and clinical data. To our knowledge, these modalities have not been combined into a deep predictive model. Most deep multimodal prediction strategies have fused biopsy-based modalities [14,15,16], while multimodal work using radiology has been largely correlative [12-13]. Adding radiology improved fusion models most, showing its independence in informing prognosis. We will further emphasize this in the paper’s introduction.

The MMO loss further boosts the prognostic power of such a multimodal fusion by orthogonalizing their contributions. This benefit is illustrated by the inferior performance of [16], which instead maximizes correlation between modalities. While not the paper’s primary focus, we believe the performance benefit of MMO loss is an additional contribution of interest. It reinforces the central finding of this work that diverse clinical modalities harbor unique predictive signals that can be effectively combined in a neural network to improve estimation of patient prognosis.

[R2] CLINICAL DATA WORSENED FULL FUSION MODEL Since the TCGA-LGG and TCGA-GBM studies, clinical consensus on subtyping, diagnosis, and glioma grading (e.g. the difference from 2016 WHO guidelines noted by R3) has shifted considerably to better reflect patient outcomes. We believe the reliance on outdated grading and diagnostic information in our clinical feature set likely negatively impacted the ability to model survival in this public dataset, not just in the full fusion model but in most combinations with clinical data. We will add this explanation to the text.

Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal clarified that the novelty of the paper lies in being the first to integrate radiology with histology, DNA and clinical data. Hence the novelty lies more in the types of data being integrated than in the methodological aspect of the integration algorithm used. The authors provided new results with more baselines for benchmarking. The authors responded to the concerns in detail and have agreed to address these comments. Thus an accept is recommended, on the understanding that the authors will fulfil their commitments.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors clarified most important aspects raised by the reviewers and meta-reviewer. In particular, important concerns of reviewer 1 (data and implementation details), the only one suggesting borderline reject, are actually addressed in the original supplementary material. The additional ablation studies and SOTA comparison can be integrated in the camera ready version. While novelty remains slightly limited, I still recommend acceptance of the paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4


Deep Orthogonal Fusion: Multimodal Prognostic Biomarker Discovery Integrating Radiology, Pathology, Genomic, and Clinical Data