
Authors

Jianan Chen, Helen M. C. Cheung, Laurent Milot, Anne L. Martel

Abstract

Colorectal cancer is one of the most common and lethal cancers and colorectal cancer liver metastases (CRLM) is the major cause of death in patients with colorectal cancer. Multifocality occurs frequently in CRLM, but is relatively unexplored in CRLM outcome prediction. Most existing clinical and imaging biomarkers do not take the imaging features of all multifocal lesions into account. In this paper, we present an end-to-end autoencoder-based multiple instance neural network (AMINN) for the prediction of survival outcomes in multifocal CRLM patients using radiomic features extracted from contrast-enhanced MRIs. Specifically, we jointly train an autoencoder to reconstruct input features and a multiple instance network to make predictions by aggregating information from all tumour lesions of a patient. Also, we incorporate a two-step normalization technique to improve the training of deep neural networks, built on the observation that the distributions of radiomic features are almost always severely skewed. Experimental results empirically validated our hypothesis that incorporating imaging features of all lesions improves outcome prediction for multifocal cancer. The proposed AMINN framework achieved an area under the ROC curve (AUC) of 0.70, which is 11.4% higher than the best baseline method. A risk score based on the outputs of AMINN achieved superior prediction in our multifocal CRLM cohort. The effectiveness of incorporating all lesions and applying two-step normalization is demonstrated by a series of ablation studies. A Keras implementation of AMINN is released.
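
The two-step normalization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not spell out the two steps, so this assumes a log transform to reduce skewness (Review #3 refers to a logarithmic transformation) followed by z-score standardization. The function name `two_step_normalize` and the signed-log form are hypothetical and are not taken from the released AMINN code.

```python
import math
import statistics


def two_step_normalize(column):
    """Normalize one radiomic feature measured across lesions.

    Step 1 (assumed): signed log transform, sign(x) * log(1 + |x|),
    which compresses long tails and tolerates zeros and negatives.
    Step 2 (assumed): z-score standardization of the transformed values.
    """
    logged = [math.copysign(math.log1p(abs(x)), x) for x in column]
    mean = statistics.fmean(logged)
    std = statistics.pstdev(logged)
    if std == 0.0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in logged]
    return [(v - mean) / std for v in logged]


# Illustrative, made-up values with one extreme outlier.
skewed = [1.0, 2.0, 3.0, 5.0, 1000.0]
normalized = two_step_normalize(skewed)
```

The output has zero mean and unit variance, and the ordering of the input values is preserved because both steps are monotonic.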

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_72

SharedIt: https://rdcu.be/cyl6P

Link to the code repository

https://github.com/martellab-sri/AMINN

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

In this paper, the authors present AMINN: a strategy for handling multi-focality of tumors when training radiomics-based predictive models. The authors leverage an auto-encoder of radiomics features to produce reduced features for a multiple-instance learning predictive model. The authors demonstrate its ability to effectively build radiomics models to predict overall survival in patients with 2 or more colorectal liver metastatic lesions.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This is a very strong paper, which presents a straightforward and effective solution to a common problem in radiomics studies: training predictive models across multifocal lesions. The authors see impressive performance boosts with their approach, despite limited training data.
- The authors are thorough in providing performance baselines and a comprehensive ablation study
- The proposed two-step normalization approach to combat skewed feature distributions is very interesting; I would be curious to try it in my own work.
- The paper is clearly written and organized, and made for an easy and insightful read
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The technical novelty of the approach (autoencoder+MIL) is limited. However, it presents an elegant solution to a considerable challenge faced in radiomics studies
- The list of radiomic features is described in very sparse detail. A supplemental table including the specific features used should be provided
- One of the key advantages of radiomics approaches is their (relative) interpretability, which is lost here as there’s no way to tell the specific features/trends that are contributing to a prediction
- The training and validation dataset size is limited
- The authors use the subset of patients with unifocal lesions for hyperparameter tuning, leaving the set of patients with multifocal lesions (n=50) unspoiled for validation (a positive of this study). However, this comes with some tradeoffs such as further reduced sample size and the inability to evaluate the model on a combination of uni- and multi-focal patients. This approach would be most useful if it can function just as well for both groups. I am curious to see how it fares when all patients are combined.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Seems to be mostly strong. The authors plan to release their code, but cannot release their data due to privacy concerns. The biggest concern is the lack of details on the radiomic feature set.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Would be interesting to see how the approach does when trained on combined uni- and multi-focal data. Seamlessly being able to handle any number of lesions in a single model would be a big win for radiomics. It is unlikely that future studies will stratify cohorts to train separate unifocal and multifocal models, so whether it can handle any arbitrary number of lesions (even 1) is the most important unanswered question regarding its utility
- Additionally, it would be nice if its performance in the optimization set were also provided to allow us to assess its performance in a unifocal-only setting.
- Did the authors evaluate multiple weightings of the BCE and MSE loss? It seems finetuning the balance between the two could make a difference in model performance
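
The question about weighting the BCE and MSE terms can be made concrete with a minimal sketch. It assumes the joint objective is a weighted sum of a binary cross-entropy term on the patient-level prediction and a mean squared error term on the autoencoder reconstruction. The names `joint_loss` and `recon_weight` are hypothetical, and the sketch scores a single feature vector rather than a whole bag of lesions.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """BCE for one binary label; the probability is clipped to avoid log(0)."""
    p = min(max(y_prob, eps), 1.0 - eps)
    return -(y_true * math.log(p) + (1 - y_true) * math.log(1.0 - p))


def mean_squared_error(x, x_hat):
    """MSE between an input feature vector and its reconstruction."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)


def joint_loss(y_true, y_prob, x, x_hat, recon_weight=1.0):
    """Weighted sum of the prediction loss and the reconstruction loss.

    recon_weight is the hypothetical balance parameter: 0 disables the
    reconstruction term, larger values emphasize reconstruction.
    """
    bce = binary_cross_entropy(y_true, y_prob)
    mse = mean_squared_error(x, x_hat)
    return bce + recon_weight * mse
```

Sweeping `recon_weight` over a small grid on the tuning cohort would be one way to study the balance the reviewer asks about.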
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although not technically groundbreaking, the paper lays out several interesting ideas to address important needs in the field of radiomics. The straightforward approach is an asset here, as this is a strategy that could feasibly be used by clinical radiomics researchers, who often don’t have the machine learning background to implement the latest state-of-the-art CNNs. The most significant weakness is the lack of certainty as to whether the approach is robust to the mixing of uni- and multi-focal lesions, which if insufficient could limit the approach’s utility. Adding the optimization cohort results in the supplement would alleviate this concern partially, although it would be best to see its performance validated on a mix of uni- and multi-focal patients. Nonetheless, I believe this is a strong candidate for MICCAI 2021 and am curious to see how the approach performs in larger datasets and new therapeutic contexts.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

A method for outcome prediction in CRLM patients. It uses radiomics to extract tumor features, an autoencoder branch for feature reconstruction, a multi-instance network to combine multiple tumor features for prediction, and a two-step feature normalization method. Experiments are conducted on 50 CRLM patients with multiple tumors and an additional 58 patients with one tumor (to my understanding). Results and statistical analysis show that the proposed method is better than the compared baseline methods and clinical features/biomarkers.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

An overall interesting work handling outcome prediction in metastatic cancer. A reasonable methodology design and a clinically relevant evaluation.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Small data size. Only 50 patients with multifocal CRLM – (1) The utilized deep neural network-based approach tends to have overfitting risk; (2) May not have enough statistical power to perform analysis.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

fair
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Dataset is too small even for a radiomics approach. Only 50 patients with multifocal CRLM. Although the authors state that deep learning is currently impractical due to requiring huge datasets, the proposed method is still deep neural network-based, e.g., multiple hidden layers. Experimental results on such a small dataset can be unstable and less convincing, especially given that the task (outcome prediction in metastases) is highly complicated. It would also be interesting to implement some CNN-based deep learning approaches to see what the best achievable performance is. It is a bit unclear why the unifocal patient subset is first used to tune parameters and the selected parameters are then used to train the multifocal model. Statistical analysis should be adjusted for other clinical factors, such as age, sex, etc., and importantly, different treatment methods.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The dataset is too small to support a deep neural network-based study.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

4
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

The paper presents an algorithm for the prediction of 3-year survival of metastatic colorectal cancer. For this purpose, a multiple instance (=lesion here) neural network is applied to selected radiomics features of each lesion. The dimensionality of the input radiomics features is reduced with an auto-encoder shared for each tumor of the same patient. Several pooling functions are tested, with the average giving the best results. The approach is then cross-validated on about 50 patients with multi-focal liver metastases after hyper-parameter tuning on 50 other patients with single metastases. As a preliminary step, a logarithmic transformation on radiomics input features is presented to reduce their skewness (which is an issue often overlooked) to improve the training.

The proposed method outperforms classical ML algorithms considering the largest lesion for the prediction. It clearly shows the advantage of taking all the lesions into account.
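
The pooling step summarized above can be sketched as follows. This is a minimal illustration that assumes each lesion has already been mapped to a fixed-length embedding by the shared encoder. The function name `pool_lesions` and the `mode` argument are hypothetical; only mean and max pooling are shown.

```python
def pool_lesions(embeddings, mode="mean"):
    """Aggregate per-lesion embeddings into one patient-level vector.

    embeddings: non-empty list of equal-length lists, one per lesion.
    The result has the same length regardless of how many lesions a
    patient has, which is what lets a single model handle any bag size.
    """
    if not embeddings:
        raise ValueError("a patient needs at least one lesion")
    columns = list(zip(*embeddings))  # one tuple per embedding dimension
    if mode == "mean":
        return [sum(col) / len(col) for col in columns]
    if mode == "max":
        return [max(col) for col in columns]
    raise ValueError(f"unknown pooling mode: {mode}")


# Illustrative, made-up embeddings for a patient with two lesions.
patient = [[1.0, 4.0], [3.0, 2.0]]
mean_vec = pool_lesions(patient, mode="mean")
max_vec = pool_lesions(patient, mode="max")
```

With a single lesion, both modes return that lesion's embedding unchanged, so the same aggregation also covers the unifocal case.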
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novelty: considering all metastases for obtaining a prediction is not common and often a fixed number of metastases is considered. The presented approach is clear and promising and could easily be extended to other similar multi-focal cancer pathologies.
- Results show a clear improvement over state of the art for multifocal cases.
- Strong analysis, comparison with recent clinical methods and honest discussion about the limitations (dataset size, delineation, …) of the work.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Comparison between methods (Delong’s test..)
- Missing information of the retrospective cohort (distribution, imbalance).
- Selection of the input radiomics features is not described. Pyradiomics can extract hundreds of features; what was the rationale for the 100 kept for the analysis? What is the influence of this selection?
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Reproducibility is fine for this paper: data cannot be shared (as is often the case) but everything is clearly detailed and the code should be published after peer review. The radiomics input features should be given.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The paper is well-written, the approach clearly described. The analysis is strong and the limitations of the work are honestly listed. I would add a precise description of the rationale behind the selection of the radiomics features and an analysis of the influence of this choice.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Although not ground-breaking, I have found this paper clearly written, the approach interesting (and it could be extended to other similar pathologies) and carefully validated and discussed. It is sufficiently simple that it can be reproduced by anyone interested.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper addresses the important problem of incorporating data from multiple lesions in radiomics-based outcome prediction using a neural network model. The authors develop and test AMINN, an autoencoder-based multiple instance neural network, for this purpose, and use the model to predict overall survival in patients with two or more colorectal cancer metastatic lesions to the liver.

The paper has several strengths. First, it presents a solution to training radiomics-based predictive models in which multiple lesions are present. Incorporating multiple lesions is currently not common in such studies despite its potential to boost predictive power. Second, the AMINN approach outperformed the current state of the art in outcome prediction for patients with such metastatic lesions. The authors were thorough in comparing methods and also included an ablation study. The results overall show the benefit of incorporating each lesion into the prediction model.

The main weakness of the paper is that the training and validation dataset size is limited. With only 50 patients with multifocal metastases, there is a concern of overfitting with the neural network model. Second, information about patient characteristics, such as age, sex, and medical treatment history, is not presented and is not considered in the analyses. Third, the use of only unifocal patients, rather than both unifocal and multifocal, to tune parameters is not well justified. Fourth, the selection of input radiomic features is not explained.

Overall, while the AMINN model was validated using a dataset with small sample size, it addresses an important need in radiomics. Moreover, the model outperformed current prediction models of outcomes of patients with metastatic colorectal cancer. In addition, the model could be very useful in other imaging-based prediction tasks involving multiple lesions. The decision is therefore to provisionally accept this article.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

We thank the reviewers and the meta-reviewer for their positive assessment of our work and helpful suggestions for improvement. We really appreciate the encouraging comments from R1 and R3 even when they are aware of the limitations of our work.

We agree that additional information on the choice of 100 radiomic features (R1, R2, R3) and patient demographics is important. We will add this information in the revised manuscript and supplementary documents.

We agree with R2 that our dataset is small and consequently the experimental results could be relatively unstable. We kept that in mind during experimental design and tried our best to prevent overfitting and make our results more convincing (“thorough”, “comprehensive ablation study”, R1; “Strong analysis”, R3). For example, all the experiments were performed using 10 repeated runs of cross-validation. We also used the unifocal cohort for hyperparameter tuning (as R1 pointed out, as a compromise since we do not want to leak information from the multifocal cohort).
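
The repeated cross-validation protocol described above can be sketched as follows. This is a minimal illustration, not the authors' evaluation code. The number of folds is not stated in the feedback, so `n_splits=5` is an assumption, as are stratification by outcome label, the function name `repeated_stratified_kfold`, and the made-up label counts.

```python
import random


def repeated_stratified_kfold(labels, n_splits=5, n_repeats=10, seed=0):
    """Yield (repeat, train_indices, test_indices) for repeated stratified CV.

    Each repeat reshuffles patients within each outcome class and deals
    them round-robin into folds, so every fold keeps roughly the same
    class ratio as the full cohort.
    """
    rng = random.Random(seed)
    for repeat in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        for cls in sorted(set(labels)):
            idx = [i for i, y in enumerate(labels) if y == cls]
            rng.shuffle(idx)
            for pos, i in enumerate(idx):
                folds[pos % n_splits].append(i)
        for k in range(n_splits):
            test = sorted(folds[k])
            train = sorted(
                i for j, fold in enumerate(folds) if j != k for i in fold
            )
            yield repeat, train, test


# Illustrative cohort of 50 patients with made-up outcome labels.
labels = [0] * 30 + [1] * 20
splits = list(repeated_stratified_kfold(labels))
```

Averaging a metric such as AUC over all repeats and folds gives a more stable estimate on a small cohort than a single split would.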

Compared to the two current paradigms of radiomics analysis (“hand-crafted radiomics” and “deep radiomics”), in this work, we used a slightly different approach (R2), which takes advantage of the robustness of hand-crafted features in relatively small datasets and the strong learning capability of neural networks to select features and make classifications. Although it seems redundant because radiomic features and neural networks can both serve as feature extractors, our results empirically show that with appropriate transformation, hand-crafted features can be good inputs for neural networks when there is not enough data for training a deep learning model on images directly.

Again, we thank all reviewers and our meta-reviewer for their efforts and time.


AMINN: Autoencoder-based Multiple Instance Neural Network Improves Outcome Prediction of Multifocal Liver Metastases