
Authors

Jacob C. Reinhold, Aaron Carass, Jerry L. Prince

Abstract

Precision medicine involves answering counterfactual questions such as “Would this patient respond better to treatment A or treatment B?” These types of questions are causal in nature and require the tools of causal inference to be answered, e.g., with a structural causal model (SCM). In this work, we develop a SCM that models the interaction between demographic information, disease covariates, and magnetic resonance (MR) images of the brain for people with multiple sclerosis. Inference in the SCM generates counterfactual images that show what an MR image of the brain would look like when demographic or disease covariates are changed. These images can be used for modeling disease progression or used for downstream image processing tasks where controlling for confounders is necessary.
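
The counterfactual queries described above are usually computed in three steps: abduction (recover the noise terms from the observation), action (fix a variable by intervention), and prediction (propagate through the model). The following is a minimal sketch of those steps on a toy linear SCM. The variables, coefficients, and function names are all hypothetical and chosen only for illustration; this is not the model from the paper, which uses a VAE and normalizing flows rather than linear equations.

```python
# Toy linear SCM (hypothetical, for illustration only):
#   age       := u_age
#   ventricle := 0.5 * age + u_v
#   intensity := 2.0 * ventricle - 0.25 * age + u_i


def forward(u_age, u_v, u_i, do_ventricle=None):
    """Evaluate the SCM; optionally replace the ventricle equation (do-operator)."""
    age = u_age
    if do_ventricle is None:
        ventricle = 0.5 * age + u_v
    else:
        ventricle = do_ventricle  # intervention cuts the edge age -> ventricle
    intensity = 2.0 * ventricle - 0.25 * age + u_i
    return {"age": age, "ventricle": ventricle, "intensity": intensity}


def abduct(obs):
    """Abduction: invert each structural equation to recover the noise terms."""
    u_age = obs["age"]
    u_v = obs["ventricle"] - 0.5 * obs["age"]
    u_i = obs["intensity"] - 2.0 * obs["ventricle"] + 0.25 * obs["age"]
    return u_age, u_v, u_i


def counterfactual(obs, do_ventricle):
    """Action + prediction: rerun the model with the recovered noise and the intervention."""
    u_age, u_v, u_i = abduct(obs)
    return forward(u_age, u_v, u_i, do_ventricle=do_ventricle)


# One observed "patient" and the counterfactual with ventricle volume set to 60.
observed = forward(u_age=40.0, u_v=5.0, u_i=1.0)
cf = counterfactual(observed, do_ventricle=60.0)
print(observed)
print(cf)
```

Because the noise terms are held fixed, the counterfactual keeps everything specific to this observation and changes only what lies downstream of the intervened variable.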

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_75

SharedIt: https://rdcu.be/cyl6S

Link to the code repository

https://github.com/jcreinhold/counterfactualms

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The paper adapts a recently proposed conditional generative model (per authors, “structural causal model/SCM”) of images to MRI scans of Multiple Sclerosis patients. The SCM connects a pre-defined directed acyclic graph (DAG) of patient variables (some of which can be modeled as being intervened upon clinically) with a variational auto-encoder (VAE) of MR images. The image/VAE are effectively the terminal node of the DAG. The SCM is essentially the DAG plus the latent variables and transition probabilities between the graph nodes. SCM parameters appear to be fit separately from the VAE training. Essentially, the goal of this construct is to create a general tool to generate hypothetical images of a patient under counterfactual assumptions, e.g. a reduction in total ventricular or lesion volume. Practical applications listed are MS lesion “inpainting” or simulating clinical intervention.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper presents a solid extension of Pawlowski et al.’s work in NeurIPS 2020 on SCMs. There appears to be innovation in developing a VAE for high-resolution MR images, by using a two-level hierarchical binary latent encoder with Bernoulli sampling. There is, though, already a sizeable canon of work in the area at this time (an area in which this reviewer is not expert). The most exciting thing about the paper, which is both its strength and its weakness, is the generality of the method. It is applicable to several practical problems, and can accept any series of cause-and-effect models.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Weaknesses:
1. No model selection – causal model fixed. This is a significant weakness, as proper causal inference requires an inference of the causal mechanism & structure itself. Optimizing the structure of the DAG as well as the SCM parameters would make this work far more interesting scientifically, at the cost of greater computational burden.
2. Validation: very little data, and no quantitative comparison. As a proof-of-concept, this is reasonable, but one really doesn’t get any sense of how well the model works beyond the few illustrative examples. Although, to the best of my knowledge, no tool as general as the proposed pipeline exists for VAEs of whole images (or even slice series), a number of open-source tools exist for (a) modeling intervention with coarser imaging data (e.g. http://proceedings.mlr.press/v97/antelmi19a/antelmi19a.pdf), and (b) inpainting lesions in MS scans for subsequent image processing with standard pipelines. A comparison of the proposed tool in at least one of these contexts with existing work would make the paper far stronger.
3. Images are still 2D slices, not full 3D. This is a notable weakness, which the authors acknowledge.
Minor: Page 4, first line of second paragraph: c = (s; v; b; l)^T should be c = (n; v; b; l)^T.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors promise to include a link to the code upon acceptance.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Please see “weaknesses”
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is interesting enough to justify acceptance in spite of the weaknesses.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Somewhat confident

Review #2

Please describe the contribution of the paper

This manuscript presents a new method for structural causal modeling on demographical and clinical variables and MR images. The proposed framework can be used for answering counterfactual questions to help personalized decision-making in clinics.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The authors attack a relevant and important problem in the field, finding causal relationships between clinical variables and biological measures such as MR images.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The clinical relevance is obscure: the authors motivate the clinical relevance of this method very well in the introduction; they would like to address the ‘clinico-radiological paradox’ by answering counterfactual questions. They then develop the whole mechanics (VAE+SCM) to do that, but at the end, in the discussion, they say “counterfactual images should not be used for diagnosis and prognosis”. The question is: if the outcome of the model cannot be used in clinical settings, then what is the clinical relevance and motivation? My first impression is that the method does not serve the purpose.
- Strong assumptions: the authors make a few independence assumptions that might not hold in reality, for example the independence between the distribution of a parent node and the function mapping it to the target node (I expect considerable dependency here), or the assumed spatial independence between two neighboring voxels in an MR image.
- Weak evaluation: only the results on the training set are presented. This is because the results on the test set are not satisfactory. I was personally not satisfied with the authors’ explanation in the discussion. The most important factor in evaluating an ML model is its generalization performance on new data.
- It is not clear how the authors came up with the specific values for the model hyperparameters (such as K, L, M, N). This calls the reproducibility and reliability of the results into question (given that the experimental dataset is very small).
- The paper lacks a quantitative comparison with similar approaches.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- The range of hyper-parameters, the best hyper-parameter configuration, and the specification of all hyper-parameters are not described in the text.
- Information on sensitivity to parameter changes is not provided.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- The clinical relevance of the presented method and of the results must be clarified. How would the method help the diagnosis and prognosis of MS patients? How do the results confirm the applicability of the presented method in clinical settings?
- The model must be evaluated quantitatively on the test data. It is very important to repeat the whole set of experiments on several randomizations of the training/validation/test sets to ensure the stability of results across different subjects.
- It must be clarified in the paper how the value of each hyperparameter was chosen. Grid search + cross-validation? Random?
- Figure 4 is missing the brain image for a ventricle size of 60 ml.
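
The repeated-randomization protocol requested in the comments above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the split sizes, and the `train_and_score` callback are hypothetical, and the splitting is done at the subject level (an assumption made here so that slices from one subject cannot land in both training and test sets). It is not taken from the paper or its code repository.

```python
import random
import statistics


def random_split(subject_ids, n_val, n_test, seed):
    """Shuffle subjects with a fixed seed and cut them into train/val/test lists."""
    ids = list(subject_ids)
    if n_val + n_test >= len(ids):
        raise ValueError("validation and test sets leave no training subjects")
    random.Random(seed).shuffle(ids)
    n_train = len(ids) - n_val - n_test
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def repeated_evaluation(subject_ids, train_and_score, n_val, n_test, n_repeats=5):
    """Repeat the whole experiment on several random splits; return mean and std of the test score."""
    scores = []
    for seed in range(n_repeats):
        train, val, test = random_split(subject_ids, n_val, n_test, seed)
        scores.append(train_and_score(train, val, test))
    return statistics.mean(scores), statistics.stdev(scores)


# Usage with a placeholder scorer; a real scorer would train the model and
# return a quantitative test-set metric.
subjects = [f"sub-{i:02d}" for i in range(20)]


def dummy_score(train, val, test):
    return len(test) / (len(train) + len(val) + len(test))


mean_score, spread = repeated_evaluation(subjects, dummy_score, n_val=3, n_test=3)
print(mean_score, spread)
```

Reporting the spread across splits alongside the mean is what addresses the stability concern; a single split gives no indication of it.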
Please state your overall opinion of the paper

reject (3)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- Weak evaluation
- Weak results
- Lack of clinical relevance
- Weak experimental design
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

3
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

This study extended the structural causal model (SCM) on brain MRIs to include multiple sclerosis related covariates. It proposed synthesizing counterfactual images with a variational autoencoder embedded in the SCM. The authors validated the proposed method with four causal factors: WM lesion load, atrophy, disease duration and the Expanded Disability Status Scale (EDSS) score.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is well written. It reads well.
- The proposed methodology is technically sound and would be of interest to the MICCAI audience.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The demonstration of practical clinical impact via experimental results is limited.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- If the code can be shared, it would be useful for reproducing the work.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Please discuss and relate the work to other generative models for MRI data in MS.
- Introduction: although WM lesions are the hallmark of MS, there are also other important pathologies in MS.
- An ablation study for BLS would be interesting.
- Is the MS duration used the period after diagnosis? Which criteria were used?
- Which modality was used for ventricle segmentation?
- Fig 4: there are only 4 counterfactual images; it seems that 1) is missing?
- Discussion: what could be the main reason for poor counterfactual images outside the training set?
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- The potential of the SCM framework for precision medicine is nicely presented
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The reviewers found this work on causal generative modelling generally interesting and technically sound. The main remarks concern the comparison with respect to other generative approaches from the state of the art, and the lack of clarity on the choice of hyperparameters and structure of the causal graphs. Another common remark is about the limited experimental validation.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Author Feedback

[Model] (R1) Although not extensively discussed in the paper due to space limitations, we did methodically select the structure and parameters of the graph. We started from Pawlowski et al.’s model, which they demonstrated to be effective, and then used common sense to reduce the search space (e.g., age does not cause slice number). We then iteratively added reasonable edges, trained the model, and examined the counterfactual images qualitatively. This is how we selected the SCM, which we briefly mention in Sec. 4.2. We agree that model selection is vitally important for causal inference, and intend to explore this—including structure learning—in further depth in future work.
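
The selection procedure described above (exclude implausible edges, then add candidate edges one at a time and keep those that pass inspection) can be sketched as a simple greedy loop. Everything in this sketch is hypothetical: the node names, the candidate edges, and the `accept` callback, which stands in for the step of training the model and examining the counterfactual images. The acyclicity check uses Kahn's algorithm, a choice made here for illustration.

```python
from collections import deque


def is_acyclic(nodes, edges):
    """Kahn's algorithm: a directed graph is acyclic iff every node can be topologically ordered."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)


def grow_graph(nodes, base_edges, candidate_edges, forbidden, accept):
    """Greedily add candidate edges that are allowed, keep the graph a DAG, and pass `accept`."""
    edges = list(base_edges)
    for edge in candidate_edges:
        if edge in forbidden or edge in edges:
            continue
        trial = edges + [edge]
        # `accept` is a placeholder for "train the model and inspect the counterfactuals".
        if is_acyclic(nodes, trial) and accept(trial):
            edges = trial
    return edges


nodes = ["age", "slice", "ventricle", "image"]
base = [("age", "ventricle"), ("ventricle", "image"), ("slice", "image")]
forbidden = {("age", "slice")}  # e.g., age does not cause slice number
candidates = [("age", "slice"), ("image", "age"), ("age", "image")]

selected = grow_graph(nodes, base, candidates, forbidden, accept=lambda trial: True)
print(selected)
```

A greedy search of this kind depends on the order of the candidate edges and on a subjective acceptance step, which is the limitation the reviewers raise about model selection.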

(R3) To the best of our knowledge, there are no generative models explicitly for generating MR images of MS. An ablation study was conducted but not included in the paper due to length restrictions. Without the BLS, reasonable counterfactual images could not be generated at higher resolutions.

[Evaluation] (R1, R2) Evaluating counterfactuals is an open research question. Our evaluation of counterfactuals with MS lesion segmentation was chosen to demonstrate the extension of Pawlowski et al. to MS. Further evaluation by comparison to existing lesion filling methods is important but outside the scope of this paper, since this paper’s main contributions are an extension of the SCM proposed in Pawlowski to MS and a methodology to generate higher-resolution counterfactual images.

[Clinical Impact] (R2) The goal of our counterfactual model is to improve clinical outcomes; however, that can be achieved by various means. As mentioned above, evaluating counterfactuals is an open problem. Consequently, care should be used when deploying an SCM for clinical applications. Our SCM could improve the performance of common image processing tasks, like lesion filling, because it provides a principled way to control for confounding, which is especially prevalent in medical imaging due to small sample sizes, sample bias, etc. See “Causality matters in medical imaging” for an extensive discussion of this point.

[Assumptions] (R2) The independence of the distribution of the parents and the function mapping them to the child node is a strong but common assumption in causal inference, known as “independence of cause and mechanism”. A discussion of why this is reasonable is in Sec. 2.3.5 of “Elements of Causal Inference”.
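
For reference, the assumption named above can be written in standard SCM notation. This is the general textbook formulation, stated here as background and not copied from the paper; the symbols $f_i$, $\mathrm{pa}_i$, and $u_i$ are the usual generic ones.

```latex
% Structural assignments with jointly independent noise terms u_1, ..., u_n
x_i := f_i(\mathrm{pa}_i, u_i), \qquad i = 1, \dots, n

% Resulting (Markov) factorization over the DAG
p(x_1, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i)

% Cause-effect pair C -> E: the joint splits into cause distribution and mechanism
p(c, e) = p(c)\, p(e \mid c)
```

Independence of cause and mechanism states that the factors $p(c)$ and $p(e \mid c)$ carry no information about each other: changing the distribution of the cause leaves the mechanism unchanged, and vice versa.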

[Generalization] (R2) Generalization is important in ML; however, we argue that our model is first and foremost a causal model, not an ML model. Causal models are used to understand the effect of one variable on another to answer scientific hypotheses. The VAE and NFs we use are nuisance models that we do not directly care about. We only need them to study causal effects.

(R2, R3) More training data would improve the generalizability of our model.

[Miscellaneous] All code and a full listing of hyperparameters will be linked to upon acceptance of the manuscript.

(R2) Hyperparameters were chosen based on qualitative evaluation of the counterfactuals.

(R3) MS duration is the period after diagnosis (McDonald criteria). T1-w images were used for ventricle segmentation.

[Typos] The equation on page 4 (R1) and the Fig. 4 caption (R2, R3) will be corrected.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal focuses on the clarification of the proposed methodology, in particular concerning statistical assumptions, the choice for the causal graph and the problem of model selection. The issue of evaluation and assessment is also discussed.

Overall, the proposed approach is judged innovative and original. While the experimental results seem preliminary and rather qualitative, the contribution seems solid enough to justify acceptance to the conference.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors present a structural causal model (extension of a recent work by Pawlowski et al) to generate high resolution MR images of multiple sclerosis. Reviews were mixed but clearly leaning towards acceptance, and authors rebutted the main criticisms. I therefore recommend acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This manuscript presents a new method for structural causal modeling on brain MRI to include relevant multiple sclerosis variables such as lesion load, atrophy, etc. The proposed framework could be used to support personalized decision-making in MS assessment. A small thought on my side: I think the use of the EDSS score is on the one hand justified, as it is the current overall disability scale, but I’ve always found its direct link with MR controversial, as EDSS is also rater dependent and basically reduces a lot of different disabilities to one single number.

Results are explored on a small dataset and only for the training set (the authors justify this in their rebuttal). I think the authors tackle a very challenging and interesting problem that could have a huge scientific impact on the assessment of patients with MS, but also in other domains involving MRI biomarkers and clinical variables.

However, my impression is that this work is still at a very preliminary stage. It is unfortunate that the authors did not give more room to the important description/justification of the model selection, structure, and hyperparameter choice/influence (only qualitative evaluation was done, but then how would this affect the outcome?). The latter is not successfully addressed in the rebuttal. And though there are some further explanations in the rebuttal, I think the model justification and hyperparameter influence study should have been included in the main paper. Moreover, supplementary material space could have been used to this end. I might have misunderstood, but it seems no cross-validation is performed, so it is unclear to me whether the same results/trends would prevail if the training folds were changed.

In its current form, in my opinion, this new SC model does not in fact prove its practical value or its added value relative to other methods. Based on this, I would not recommend the paper to be accepted.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

14


A Structural Causal Model for MR Images of Multiple Sclerosis