
Authors

Sebastian Pölsterl, Tom Nuno Wolf, Christian Wachinger

Abstract

Prior work on diagnosing Alzheimer’s disease from magnetic resonance images of the brain established that convolutional neural networks (CNNs) can leverage the high-dimensional image information for classifying patients. However, little research focused on how these models can utilize the usually low-dimensional tabular information, such as patient demographics or laboratory measurements. We introduce the Dynamic Affine Feature Map Transform (DAFT), a general-purpose module for CNNs that dynamically rescales and shifts the feature maps of a convolutional layer, conditional on a patient’s tabular clinical information. We show that DAFT is highly effective in combining 3D image and tabular information for diagnosis and time-to-dementia prediction, where it outperforms competing CNNs with a mean balanced accuracy of 0.622 and mean c-index of 0.748, respectively. Our extensive ablation study provides valuable insights into the architectural properties of DAFT. Our implementation is available at https://github.com/ai-med/DAFT.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_66

SharedIt: https://rdcu.be/cyl6J

Link to the code repository

https://github.com/ai-med/DAFT

Link to the dataset(s)

http://adni.loni.usc.edu/

Reviews

Review #1

Please describe the contribution of the paper

This paper proposed a simple and effective network architecture DAFT to fuse image features and tabular data. This network was utilized to diagnose Alzheimer’s disease from magnetic resonance images and related tabular data.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The idea of using tabular data to adjust feature maps is novel and effective, different from the traditional way to mix up these two kinds of data by concatenating.
2. The ablation study is sufficient.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Although the manuscript describes how the output values of DAFT are used in formula 1, it is a little confusing how the architecture in Fig. 2 fits into Fig. 1.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The architecture is easy to reproduce.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Please modify Fig1 or Fig2 to make it clear how the DAFT structure is inserted into the general vision networks.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The result is convincing and the method proposed in this paper will be useful in many other clinical scenarios.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

4
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

This paper introduces the Dynamic Affine Feature Map Transform (DAFT), a general-purpose module for CNNs that dynamically rescales and shifts the feature maps of a convolutional layer, conditional on a patient’s tabular clinical information. The extensive study provides valuable insights into the architectural properties of DAFT.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed DAFT is a generic module that can be integrated into any CNN architecture that establishes a two-way exchange of information between high-level concepts learned from images and tabular biomarkers.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1. This work provides a useful way of helping to exchange information between high-level images and tabular biomarkers in medical fields. However, compared with [1], the work is of limited novelty and lacks technical contribution. The current version needs a more in-depth analysis of the methodology itself to demonstrate its technical contributions.

2. This paper lacks running time comparisons, which are very critical and necessary.

[1] FiLM: Visual Reasoning with a General Conditioning Layer.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Lacking sufficient comparison results between different methods, such as the running time comparisons, which are very critical and necessary.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

This paper is well-written and easy to follow, with nearly no grammatical errors.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Please see my detailed comments regarding the strength and weaknesses of the paper for the scoring justification.

Overall, the proposed DAFT in this paper may be a versatile approach to integrating image and tabular data that is likely applicable to many medical data analysis tasks outside of dementia too, which requires further validations. It is a good try, but compared with [1], the work lacks technical contribution.

[1] FiLM: Visual Reasoning with a General Conditioning Layer.
What is the ranking of this paper in your review stack?

5
Number of papers in your stack

2
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

The authors proposed a method called dynamic affine feature map transform that dynamically rescales and shifts the feature maps of a CNN layer, conditional on patients' clinical information. The work was proposed in such a way that the clinical information is utilized for fine-grained interaction with abstract MRI features in and around the hippocampal region.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Main strength is combining MRI data with clinical tabular data that dynamically scales and shifts the feature maps of a convolutional layer conditioned on both image and clinical data.
2. Validation of the proposed method on a larger dataset.
3. Validated for both diagnosis and prognosis tasks, along with a comparison with previous works.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The CNN model is not deep considering the fact that the authors have used residual blocks.
2. It would have been nice if they had provided a summary of the proposed model.
3. Computational resources were not disclosed.
4. It is not clear which part of the MRI is actually contributing to those differences, i.e. no 3D saliency maps.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The validation metrics are probably reproducible, considering that the datasets for diagnosis and prognosis are sufficiently large.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

ABSTRACT AND INTRODUCTION: The abstract and introduction are well written and easy to follow. The related work is sufficient to motivate why this work should be interesting to readers.

METHODS: What is the use of skip connections when you have only 10 convolutional layers? What is the effect of removing those skip connections? Is the fully connected layer a softmax layer for the diagnosis task? What software was used for developing, training, and validating the proposed models? In Figure 2, the value of ‘r’ in the first FC layer is not clear.

EXPERIMENTS: Sensitivity and specificity values should be reported instead of balanced accuracy. Hyper-parameter tuning with respect to the number of epochs was not performed. For a 3D model, 30 and 80 epochs for diagnosis and prognosis are very low. Plots of loss and performance metrics vs. epochs should be included to explain the choice of those epoch numbers. Consider more epochs (i.e. > 100), since the reported validation and test performance metrics are very low (e.g. a balanced accuracy of 0.622). Coming back to model complexity, why not increase the number of layers to above 20? The results are compared with only a few methods that combine image data with tabular data. Other studies should be included for comparison (i.e. refs 6, 10, 20, 22, 25), especially with respect to performance. Was any registration (rigid or affine) performed before feeding the images to FreeSurfer?

RESULTS: Considering that you obtain a higher c-index with the tanh activation function, why not propose different models for diagnosis and for predicting time-to-dementia? It is not clear which regions in the 64x64x64 image are actually contributing to those performance metrics. How do you show that they are actually the regions responsible for patient symptoms? It would be interesting to see a few slices of the input 3D image. Confusion matrices should be provided for both tasks, at least for the validation/test results of DAFT. Considering that you already have hippocampal and amygdala sub-volumes from FreeSurfer, what is the effect on performance when you use them in the tabular information? For a moment, assume that the 3D CNN model is learning something different. 3D saliency maps could be provided.

CONCLUSIONS: The data in ADNI come from different scanners. How do you account for scanner differences in this work? Please comment.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The DAFT method sounds interesting and was shown to perform better than some previous related works. The validation/test metrics are low; however, it is interesting that they are reported for a larger study population. Hyper-parameter optimization, especially of the number of epochs, could have been considered more carefully. Visual saliency maps could be provided.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

2
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposed a simple and effective network architecture DAFT to fuse 3D MRI and tabular data.

The key strengths include: 1) The idea of using tabular data to adjust feature maps is novel and different from the traditional way (e.g., concatenating). 2) The ablation study is sufficient and convincing. 3) The proposed method is a generic module that can be integrated into any CNN architecture that establishes a two-way exchange of information between high-level concepts learned from images and tabular biomarkers.

The key weaknesses include: 1) This study lacks running time comparisons between different methods.

Based on the above, I highly recommend acceptance once the authors have revised the paper according to the reviewers' suggestions.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

Author Feedback

We thank all reviewers for their valuable feedback and for recognizing that we propose a “versatile approach to integrating image and tabular data that is likely applicable to many medical data analysis tasks outside of dementia too.”

R2 argues that our paper lacks novelty. We respectfully disagree with this view and want to clarify our contributions. DAFT provides an efficient two-way exchange of information between image and tabular biomarkers. In contrast, FiLM studies visual question answering and differs from our approach in two major ways. First, only the textual data modulates the image features, thus only a one-way exchange of information is established. Second, the textual and image features are tightly linked (the question relates to the image content), whereas in our setting, the tabular biomarkers are complementary to the image. Finally, our extensive experiments clearly demonstrate that a two-way exchange of information via DAFT is superior to FiLM with respect to test-time metrics for both tasks.
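To make the one-way vs. two-way distinction concrete, the following is a minimal sketch in plain Python, not the implementation from the repository. It assumes that the scale and shift are produced by a small two-layer bottleneck network with a ReLU in between, that feature maps are stored as one flat list of voxels per channel, and that the image enters the DAFT conditioning through global average pooling; the function names (`film_like`, `daft_like`, `affine_params`) and all sizes are hypothetical.

```python
import random

def relu(v):
    return [max(0.0, a) for a in v]

def linear(weights, bias, v):
    """Dense layer; `weights` holds one row of input weights per output unit."""
    return [sum(w * a for w, a in zip(row, v)) + b for row, b in zip(weights, bias)]

def init_linear(n_in, n_out, rng):
    """Random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def affine_params(condition, params, n_channels):
    """Map a conditioning vector to one (scale, shift) pair per channel."""
    (w1, b1), (w2, b2) = params
    out = linear(w2, b2, relu(linear(w1, b1, condition)))
    return out[:n_channels], out[n_channels:]

def modulate(feature_maps, scale, shift):
    """Apply scale[c] * F_c + shift[c] to every voxel of channel c."""
    return [[s * v + t for v in channel]
            for channel, s, t in zip(feature_maps, scale, shift)]

def film_like(feature_maps, tabular, params):
    """One-way: scale and shift depend on the tabular vector only."""
    scale, shift = affine_params(list(tabular), params, len(feature_maps))
    return modulate(feature_maps, scale, shift)

def daft_like(feature_maps, tabular, params):
    """Two-way: scale and shift depend on pooled image features AND tabular data."""
    pooled = [sum(ch) / len(ch) for ch in feature_maps]  # global average pooling
    scale, shift = affine_params(pooled + list(tabular), params, len(feature_maps))
    return modulate(feature_maps, scale, shift)

# Hypothetical sizes: 4 channels, 8 voxels per channel, 3 tabular features.
rng = random.Random(0)
n_channels, n_voxels, n_tabular, bottleneck = 4, 8, 3, 2
maps = [[rng.gauss(0, 1) for _ in range(n_voxels)] for _ in range(n_channels)]
tab = [0.5, -1.0, 2.0]
daft_params = (init_linear(n_channels + n_tabular, bottleneck, rng),
               init_linear(bottleneck, 2 * n_channels, rng))
out = daft_like(maps, tab, daft_params)
print(len(out), len(out[0]))  # same layout as the input feature maps
```

In this sketch the only difference between the two functions is the conditioning vector: `film_like` never looks at the feature maps when computing the affine parameters, whereas `daft_like` does.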

R2 suggests performing a run time comparison. Fitting the linear models takes on average 235ms. For the deep models, using a GeForce GTX 1080 Ti, the total wall time required to train one epoch is 8.9s for ResNet, 8.9s for Concat-1FC, 8.9s for Concat-2FC, 8.9s for 1FC-Concat-1FC, 9.0s for Duanmu et al., 8.7s for FiLM, and 9.0s for DAFT. For inference, times scale with the number of weights (see suppl. material), and processing a batch of 256 samples requires 1.8-2.2ms (excluding I/O). This shows that the runtime increase due to DAFT is negligible.
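The relative cost implied by the per-epoch times above can be checked with a few lines of Python. This is only an arithmetic illustration: the choice of the image-only ResNet as the baseline is an assumption made here, and because the times are reported to 0.1s, the resulting percentages are coarse.

```python
# Seconds per training epoch as reported in the rebuttal (GeForce GTX 1080 Ti).
epoch_seconds = {
    "ResNet": 8.9,
    "Concat-1FC": 8.9,
    "Concat-2FC": 8.9,
    "1FC-Concat-1FC": 8.9,
    "Duanmu et al.": 9.0,
    "FiLM": 8.7,
    "DAFT": 9.0,
}

def relative_overhead(times, baseline="ResNet"):
    """Fractional change in epoch time relative to the chosen baseline."""
    base = times[baseline]
    return {name: (t - base) / base for name, t in times.items()}

for name, overhead in relative_overhead(epoch_seconds).items():
    print(f"{name:>15}: {overhead:+.1%}")
```

Under these assumptions, DAFT adds roughly one percent to the training time per epoch compared with the image-only baseline.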

R3 argues that our experiments should include more complex network architectures by using an elaborate hyper-parameter search over network architectures and number of epochs. First, we want to emphasize that all reviewers, including R3, list our extensive ablation study as a main strength of our paper. Second, we note that SOTA networks used for dementia prediction very rarely have more than 10 layers and perform comparably to our unimodal baselines [4,28]. Moreover, we confirmed that training converged for all models. While it is possible that an exhaustive hyper-parameter search over all architectural properties could increase metrics, we believe it is more valuable to evaluate the architectural properties of DAFT, which R1 agrees with.

R3 claims that we did not include refs 6,10,20,22,25 in our experiments. As described in the introduction, these methods are represented by Concat-1FC, Concat-2FC, and 1FC-Concat-1FC. We apologize for the misunderstanding.

R3 suggests including saliency maps. We do agree that this could reveal interesting findings. However, we are not aware of attribution methods that can account for multi-modal inputs, i.e. image and tabular data. If we ignore tabular data when computing attributions, we would not obtain a faithful representation of what the network learned. More research on multi-modal attribution methods is needed.

R3 suggests including a confusion matrix for both tasks. The concordance index is not derived from a confusion matrix, thus we can only provide the confusion matrix for diagnosis. The overall confusion matrix across all test sets for DAFT is (rows=actual, cols=predicted):

Actual \ Predicted: CN; MCI; Dementia
CN: 414; 107; 19
MCI: 164; 242; 132
Dementia: 22; 70; 171

and for FiLM:

Actual \ Predicted: CN; MCI; Dementia
CN: 396; 117; 27
MCI: 155; 220; 163
Dementia: 16; 73; 174

Due to space constraints we were unable to include the suggested 6 sensitivity/specificity values per method, but they confirm our existing results.
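For readers who want to derive per-class values from the pooled confusion matrices given above, the following Python sketch computes one-vs-rest sensitivity and specificity for each class as well as the balanced accuracy (mean per-class recall). It is an illustration only: the matrices are pooled over all test sets, so the numbers it produces need not match per-fold averages, and they should not be read as the values omitted from the paper. The function names are hypothetical.

```python
LABELS = ["CN", "MCI", "Dementia"]

# Pooled confusion matrices from the rebuttal (rows = actual, cols = predicted).
DAFT = [[414, 107, 19], [164, 242, 132], [22, 70, 171]]
FILM = [[396, 117, 27], [155, 220, 163], [16, 73, 174]]

def one_vs_rest(cm):
    """Per-class sensitivity and specificity, treating each class as 'positive' in turn."""
    total = sum(map(sum, cm))
    stats = {}
    for k, label in enumerate(LABELS):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                  # actual k, predicted as another class
        fp = sum(row[k] for row in cm) - tp   # another class, predicted as k
        tn = total - tp - fn - fp
        stats[label] = {
            "sensitivity": tp / (tp + fn),
            "specificity": tn / (tn + fp),
        }
    return stats

def balanced_accuracy(cm):
    """Mean of the per-class recalls (diagonal divided by row sums)."""
    return sum(cm[k][k] / sum(cm[k]) for k in range(len(cm))) / len(cm)

for name, cm in [("DAFT", DAFT), ("FiLM", FILM)]:
    print(name, f"balanced accuracy (pooled): {balanced_accuracy(cm):.3f}")
    for label, s in one_vs_rest(cm).items():
        print(f"  {label:>8}: sensitivity={s['sensitivity']:.3f} "
              f"specificity={s['specificity']:.3f}")
```

Each method yields three sensitivity and three specificity values, which corresponds to the six values per method mentioned above.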

R3 commented that reported metrics are low. Please note that CN/MCI/Dementia classification and progression analysis are very challenging tasks, and that many results reported in the literature suffer from inflated metrics due to data leakage [28]. Our validation scheme follows the pipeline of [28] to avoid data leakage and confounding bias, and accounts for inter-scanner variability via bias field correction and min-max rescaling. Therefore, our evaluation is unbiased.

Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposed a simple and effective network architecture, DAFT, to fuse 3D MRI and tabular data. In the rebuttal, the authors addressed the main concerns; therefore, I recommend acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Most of the major concerns (such as running time comparison between different methods) have been well addressed.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper introduces an interesting idea of fusing imaging and tabular data in a novel way with the proposed dynamic affine feature map transform (DAFT). The exhaustive ablation study supports the validity of the proposed method. However, as pointed out by a reviewer, it is necessary to exploit the explainability methods available in the literature to better justify the work. There are a few methods applicable even to multi-modal inputs, e.g., layer-wise relevance propagation.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2


Combining 3D Image and Tabular Data via the Dynamic Affine Feature Map Transform