
Authors

Ziteng Zhao, Guanyu Yang

Abstract

Tumor classification is important for decision support in precision medicine. Computer-aided diagnosis with convolutional neural networks relies on large annotated datasets, which can be costly to obtain. To address the poor predictive ability caused by tumor heterogeneity and inadequate labeled image data, a self-supervised learning method combined with radiomics is proposed to learn rich visual representations of tumors without human supervision. A self-supervised pretext task, namely “Radiomics-Deep Feature Correspondence”, is formulated to maximize agreement between the radiomics view and the deep learning view of the same sample in the latent space. The presented self-supervised model is evaluated on two public medical image datasets of thyroid nodules and kidney tumors and achieves high scores in linear evaluation. Furthermore, fine-tuning the pre-trained network leads to better scores than training from scratch on the tumor classification task and shows label-efficient performance with small training datasets. This shows that injecting radiomics prior knowledge about tumors into the representation space can yield a more powerful self-supervised method.
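
The pretext task described above, maximizing agreement between the radiomics view and the deep learning view of the same sample in the latent space, is commonly implemented as a cross-view contrastive objective. Below is a minimal, hypothetical PyTorch sketch of one such objective (an InfoNCE-style loss over a batch). It is not claimed to be the paper's exact formulation; `cnn_proj` and `mlp_proj` in the usage comment are assumed projection heads, not names from the paper.

```python
import torch
import torch.nn.functional as F

def radiomics_deep_agreement_loss(z_img, z_rad, temperature=0.1):
    """InfoNCE-style agreement between two views of the same samples.

    For each tumor in the batch, its deep-feature embedding should be most
    similar to the radiomics embedding of the same tumor and dissimilar to the
    radiomics embeddings of the other tumors in the batch (and vice versa).
    """
    z_img = F.normalize(z_img, dim=1)
    z_rad = F.normalize(z_rad, dim=1)
    logits = z_img @ z_rad.t() / temperature                   # (B, B) similarity matrix
    targets = torch.arange(z_img.size(0), device=z_img.device)
    # Symmetric cross-entropy: image -> radiomics and radiomics -> image.
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Hypothetical usage, assuming projection heads mapping both views to the same dimension:
# loss = radiomics_deep_agreement_loss(cnn_proj(image_batch), mlp_proj(radiomics_batch))
```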

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_24

SharedIt: https://rdcu.be/cyl2u

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a novel method for unsupervised contrastive learning on medical imaging data, which serves as a backbone for subsequent training of a fine-tuned model, and compare their method to a variety of other approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method the authors propose is novel and generally well described. The abstract is precise, the figures facilitate understanding, and the method is evaluated on multiple datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The statistical evaluation of the results seemed rather thin to me at some points and could have benefitted from additional data, additional approaches and especially additional evaluations of the achieved results. More precisely: confidence intervals and standard deviations, as well as statistical tests, could have been added, to make it easier to assess the value of the method at hand.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that they will make their code as well as their results publicly available. Yet, they have not added their implementations to the supplementary material. The question “A description of results with central tendency (e.g. mean) & variation (e.g. error bars).” was answered with “yes”. However, error bars and measures of variation are missing.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • I would highly recommend adding some additional statistical evaluations, such as statistical tests, central tendencies and confidence intervals, which can for example be computed using bootstrapping [1] (see the illustrative sketch after the references below).
    • Additionally, I would have liked to see a comparison to other methods in the field of contrastive learning, as well as a discussion on the commonalities and differences. There is a lot of recent work (such as [3,4,5]), so this would have made it easier to assess the value of the work at hand.

    • Some of the claims in the paper seemed rather bold to me, as pointed out in the following. I would like to ask the authors to weaken the claims or add some additional evidence.
    • In the Abstract the authors write: “which is costly and impossible sometimes.”
    • “Among many pretext tasks, unsupervised contrastive learning [23, 8, 1] performs best in the field of natural images.”, p.2 -> What is the measure for “best” here? Also, unsupervised contrastive learning is not necessarily successful on natural images either, as pointed out in [2].
    • “The table shows that the representation obtained by our self-supervised method is better than other methods.” -> I would suggest using the terminology “performed superior on the given dataset”, “achieved higher results”, or similar.
    • “This proves that the embeddings obtained by our pretext task are discriminative.” -> I would argue it is evidence or it implies that it can be (more) discriminative. In my opinion, the word “prove” should be reserved to pure mathematical properties.

    Minor:

    • On p.8 the authors state “And we found that the best results were obtained without fine-tuning the last group of convolutions layers in ResNet-50 and the last two layers in fully connected network for processing radiomics handcrafted features.” -> What is the intuition behind this? How did other layer choices behave?
    • The paper contains a variety of grammar and spelling issues, and a variety of typos, e.g.: “self-supervise model”, Abstract, “inadequate[ly] labeled”, p.1, “The remaining data […] is using for”, p.6, “because of [a] wide range”, p.6, “If there are less than 10 slices, select slices repeatedly from the middle to the ends.”, p.6, missing space at “descent(SGD)”, p.6, “to analysis radiomics handcrafted features”, p.7, “so some of [the] radiomics”, p.7, “are trained as the above settings”, p.8

    References: [1] Efron, Bradley. “Bootstrap methods: another look at the jackknife.” Breakthroughs in statistics. Springer, New York, NY, 1992. 569-593. [2] Xiao, Tete, et al. “What should not be contrastive in contrastive learning.” arXiv preprint arXiv:2008.05659 (2020). [3] Dai, Bo, and Dahua Lin. “Contrastive learning for image captioning.” arXiv preprint arXiv:1710.02534 (2017). [4] Tian, Yonglong, et al. “What makes for good views for contrastive learning.” arXiv preprint arXiv:2005.10243 (2020). [5] Chen, Ting, et al. “A simple framework for contrastive learning of visual representations.” International conference on machine learning. PMLR, 2020.
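
    To make the bootstrapping suggestion in the first bullet concrete, the following is a minimal, hypothetical sketch of a percentile bootstrap confidence interval computed over held-out predictions, in the spirit of Efron [1]; the arrays and the metric passed in are placeholders, not values from the paper.

```python
import numpy as np

def bootstrap_ci(y_true, y_pred, metric, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary metric."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    n = len(y_true)
    scores = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # resample test cases with replacement
        scores.append(metric(y_true[idx], y_pred[idx]))
    low, high = np.percentile(scores, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return low, high

# Hypothetical usage with held-out labels and predictions:
# from sklearn.metrics import f1_score
# low, high = bootstrap_ci(y_test, y_hat, f1_score)   # 95% CI for the F1 score
```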

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the paper idea is interesting and has a clear fit to MICCAI, some aspects of the work are rather difficult to assess regarding their value. The work generally has a good structure and a clearly stated method, but provides too little data to get an impression of how significant the results actually are or whether the method performs better than alternative approaches.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    7

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper aims to integrate conventional handcrafted radiomics features into deep features through an unsupervised contrastive learning approach for label-efficient tumor classification. Specifically, radiomics features were extracted separately and then used as a Radiomics View input to a fully dense network, alongside augmented tumor images as the Deep Learning View input to a CNN model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The paper is well written, and the proposed method is well explained. 2) The paper focuses on improving the performance of classification networks with limited training data, which is of great interest to the MICCAI community. 3) Efficiently integrating radiomics into a CNN network using a contrastive learning approach is original and sound.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) My major concern is that the authors extracted radiomics features from volumetric tumors while they extracted the deep features from 2D slices of kidney CT images. Consecutive 2D slices are highly correlated, which would lead to redundant and correlated deep features. How do the authors justify this conflict? 2) Related to the previous comment: on page 6 it was mentioned that analyses were done with a 5-fold cross-validation approach. To avoid confusion, it should be explicitly stated whether subject-wise cross-validation was performed. If not, there is a high risk of overfitting. 3) Section 3.4: the authors performed a conventional radiomics analysis to highlight the advantage of their proposed model. However, the performance of the radiomics features was investigated only by training a shallow, fully dense network. In a conventional radiomics pipeline, however, other types of learning algorithms such as Random Forest and Adaptive Boosting perform quite well and would outperform naïve shallow dense networks. The authors are strongly encouraged to conduct such experiments.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I believe with the level of provided explanations, it would be feasible to reproduce the proposed model.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The quality of the current manuscript could be strengthened if 3D deep feature extraction were performed on the kidney dataset. This would make the integration of 3D radiomics with 3D deep features a fairer process.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors proposed a novel method, and the reported results over comprehensive datasets are in line with the assumption behind the method.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper introduces a self-supervised learning framework that bridges the information gap between deep network features and radiomics features. This model leads to good performance on tumor classification and also improves data efficiency.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This is a novel approach to combining radiomics and deep networks. Radiomics and deep networks are two different directions, and both have advantages and drawbacks in tumor classification. This paper offers a simple yet effective way to combine them.
    2. The contrastive design enables self-supervised learning for pretraining. This relieves the annotation burden, which is important in the field of medical image analysis.
    3. Experiments are extensive and illustrative.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Here are some minor points to improve:

    1. This paper used 2D ResNet-50 as the CNN backbone. I wonder about the performance of 3D networks on the second dataset compared to the 2D counterpart, since 3D networks are natural for 3D tumor classification.
    2. Contrastive loss is a powerful tool. However, the consistency of radiomics and deep features is already sufficient for pretraining. It would be interesting to show an ablation of the contrastive loss and use a simple consistency loss in the fully / semi / self-supervised settings (a minimal sketch of such a consistency loss is given after this list).
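
    As a purely hypothetical illustration of the simple consistency loss mentioned in point 2 (not code from the paper), the two views of the same tumor could be pulled together without treating other samples as negatives, for example via cosine similarity. Note that, without negatives or mechanisms such as stop-gradient, purely consistency-based objectives are known to risk representation collapse, which is one motivation for the ablation suggested above.

```python
import torch.nn.functional as F

def consistency_loss(z_img, z_rad):
    """Non-contrastive agreement: maximize the cosine similarity between the
    deep-feature and radiomics embeddings of the same tumor (no negatives)."""
    return 1.0 - F.cosine_similarity(z_img, z_rad, dim=1).mean()
```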
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code not provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please refer to the weakness part.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This will be an impactful work, and I recommend acceptance. It is novel and investigates very important problems.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    8

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All reviewers recognized the clarity and novelty of the paper, the adequacy of the methods for the considered tasks, and the compelling experimental evaluation. Please clarify in the rebuttal:

    • why using a 2D ResNet and 3D radiomics features (see comments from Rev. 2)
    • patient splits in the cross validation
    • add error bars or confidence intervals on reported measures in Tables 1 and 2
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We thank the reviewers for their insightful comments and constructive feedback, and for recognizing the clarity and novelty of the paper, the adequacy of the methods for the considered tasks, and the compelling experimental evaluation. We answer the major points below and will make the suggested changes to the main text.

Reviewer #2, Reviewer #4: Why use a 2D deep learning network on the kidney tumor classification dataset?

  1. We use a 2D network mainly because of the wide range of image slice thicknesses (1 mm-5 mm), which is mentioned in the second paragraph of Section 3.1. The slice thickness of nearly half of the dataset is 5 mm, which makes it difficult for 3D networks to capture the 3D nature of the data [a]. We tried a 3D network, and its feature representation and classification performance were not as good as those of the 2D network, due to artifacts caused by scaling and the large slice thickness.
  2. Both the radiomics view and the deep learning view output feature vectors representing the entire 3D tumor, so they can learn from each other through contrastive learning. The deep learning network used in the paper processes a 3D combination of slices covering the entire tumor, and the output of the network is a feature vector representing the entire tumor. This corresponds to the 3D radiomics features extracted by the radiomics view.

Reviewer #2: Explicitly state the five-fold cross-validation approach on the kidney tumor classification dataset. Patient-wise cross-validation was performed. Specifically, the dataset was randomly split into a train set and a test set at a ratio of 80:20 five times, while preserving the percentage of samples for each class.
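
As an illustration only (not the authors' code), the patient-wise, class-stratified splitting described above could be realized with scikit-learn's StratifiedGroupKFold (available in scikit-learn 1.0 and later); the arrays below are toy stand-ins for the real per-sample labels and patient identifiers.

```python
import numpy as np
from sklearn.model_selection import StratifiedGroupKFold

# Toy stand-ins: one row per sample (e.g. per tumor), with a class label and a patient id.
rng = np.random.default_rng(0)
n_samples = 100
labels = rng.integers(0, 2, size=n_samples)      # tumor class per sample
patients = rng.integers(0, 25, size=n_samples)   # patient id per sample
X = np.zeros((n_samples, 1))                     # features are irrelevant for splitting

# Five roughly 80:20 splits that keep all samples of a patient on the same side
# and approximately preserve the class proportions in each split.
cv = StratifiedGroupKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, test_idx) in enumerate(cv.split(X, labels, groups=patients)):
    assert set(patients[train_idx]).isdisjoint(patients[test_idx])
    print(f"fold {fold}: {len(train_idx)} train / {len(test_idx)} test samples")
```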

Reviewer #1: Add variability measures for the results of the kidney tumor classification in Tables 1 and 2. We have added standard deviations to the mean values of the five-fold cross-validation results for kidney tumor classification in Tables 1 and 2. In Table 1, the F1 scores of the four self-supervised learning methods are 46.4±5.3 (Autoencoder), 45.0±3.5 (Jigsaw Puzzles), 50.0±4.5 (SimCLR) and 52.0±2.8 (Ours), respectively. In Table 2, the classification accuracy and F1 score of the four different models are 56.2±6.4, 56.8±7.0 (radiomics method), 50.0±5.3, 49.3±7.6 (ResNet-50), 52.8±5.7, 52.0±5.6 (scratch hybrid network), and 64.3±5.9, 63.7±6.9 (fine-tuned hybrid network). The experimental results are relatively stable, and the proposed method shows a certain degree of robustness. We will add this evaluation to our paper.

Reviewer #1: List some related work that the authors are encouraged to compare and discuss. We thank the reviewer for pointing out the related work. However, [3] focuses on image captioning and does not match the application context of this paper. The remaining works [4, 5] have been cited and discussed in the third paragraph of Section 1, and [5] has been compared with our method in Section 3.3, which implies the superiority of our method in tumor classification.

Reviewer #2: Use other types of learning algorithms besides a shallow, fully dense network in the conventional radiomics analysis. In the conventional radiomics analysis experiments, we used several types of algorithms, such as logistic regression, linear SVM and random forest, for classification, and finally selected the best-performing model, as mentioned in the second paragraph of Section 3.4.

References: [a] Heller, Nicholas et al. “The state of the art in kidney and kidney tumor segmentation in contrast-enhanced CT imaging: Results of the KiTS19 challenge.” Medical image analysis vol. 67 (2021): 101821.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors clarified all major aspects of their work. The performance comparisons between algorithms detailed in Tables 1 and 2, now with standard deviations, suggest probably no statistically significant differences between the top two algorithms. However, based on the novelty and soundness of the approach, I can recommend acceptance of the paper. I strongly advise including all important clarifications in the final version of the manuscript.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper received mostly positive reviews, and the rebuttal convincingly addressed the points raised by the AC and reviewers. I recommend that this paper be accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers had a positive reception of this work, considering it clear, novel and sound. They raised relatively minor points on the method and some more important remarks on the experiments. These have been well addressed in the rebuttal. Therefore, I recommend acceptance.

    The authors should include the results reported in the rebuttal in the final version of the paper and should account for the comments made by Reviewer 1 to temper some of their claims.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9


