
Authors

Sumedha Singla, Stephen Wallace, Sofia Triantafillou, Kayhan Batmanghelich

Abstract

Model explainability is essential for the creation of trustworthy Machine Learning models in healthcare. An ideal explanation resembles the decision-making process of a domain expert and is expressed using concepts or terminology that is meaningful to clinicians. To provide such an explanation, we first have to associate the hidden units of a black box to clinically relevant concepts. We take advantage of radiology reports accompanying the chest X-ray images to define concepts. We discover sparse associations between concepts and hidden units using sparse logistic regression. To ensure that the identified units truly influence the classifier’s outcome, we adopt tools from the causal inference literature and, more specifically, mediation analysis through counterfactual interventions. Finally, we construct a low-depth decision tree to translate all the discovered concepts into a straightforward decision rule, expressed to the radiologist. We evaluated our approach on a large chest X-ray dataset, where our model produces a global explanation consistent with clinical knowledge.
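
As a concrete illustration of the first step described in the abstract, the sketch below shows how a sparse (L1-penalized) logistic regression could associate hidden units with a report-derived concept, assuming the classifier's hidden-unit activations have already been pooled into one feature vector per image. All variable names, shapes, and the regularization strength are illustrative assumptions, not the authors' code.

```python
# Concept association via sparse logistic regression: predict the presence of
# one report-derived concept (e.g., "pleural fluid") from hidden-unit
# activations; nonzero coefficients mark the units associated with the
# concept. Data here is synthetic.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_images, n_units = 1000, 512                              # assumed sizes
unit_activations = rng.normal(size=(n_images, n_units))    # pooled activations Phi(x)
concept_labels = rng.integers(0, 2, size=n_images)         # 1 if the report mentions the concept

sparse_lr = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
sparse_lr.fit(unit_activations, concept_labels)

associated_units = np.flatnonzero(sparse_lr.coef_[0])      # hidden units linked to the concept
print(f"{associated_units.size} hidden units associated with this concept")
```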


Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_49

SharedIt: https://rdcu.be/cyl4J

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an approach to derive concept-based explanations for black-box models. The approach uses three main components: concept association, where concepts (extracted using NLP) are correlated with hidden units of a model; causal concept ranking, where concepts are ranked by their causal effect on the classifier's decision; and a surrogate explanation function, where a simpler model, such as a random forest, is fitted to derive a rule-based explanation. The approach was tested on the publicly available MIMIC-CXR dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Novel approach on a very important topic
    • Well-written paper; nice illustrations
    • Approach combines simple yet effective components

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Ablation experiment to see the contributions of the components
    • Possibility to compare to another concept-based approach
    • Difficulty in assessing the produced rule-based explanation (i.e., would an MD agree with it? Or does it resemble existing rule-based ones?).
    • After checking results with an experienced radiologist, the approach doesn’t seem to yield good explanations.
    • Not much information about what the do(x) ended up doing, nor how critical this part of the process is.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Excellent reproducibility

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I enjoyed the study and its presentation. The topic is very important and the methodology is sound. I particularly appreciated the simplicity of the individual components, making the approach more easily reproducible. I would have appreciated some more ablation studies to see the actual benefit of each component, such as the ranking.

    Also, a comparison to TCAV could have been added (not a comparison at the level of the rule-based explanation, because TCAV doesn’t do that, but at the level of concept activation).

    It would have been nice to assess the outcomes of the model from a medical point of view. Otherwise, it is rather hard for me to tell, as a non-trained radiologist, whether the results in Fig. 4 and 6 are actually state of the art or resemble a good explanation. I asked a colleague of mine, an experienced radiologist with expertise in lung diseases, to check the results presented in Figure 4. He commented that the visual explanations were not really pointing to the areas they should. He did see better results for the “blunt costophrenic angle,” but for the rest he commented that the explanations did not make sense. He also commented that the “vascular prominence” was pointing to the aorta, which obviously is prominent but has nothing to do with the condition. He finally commented that the example “pleural effusion” + “pleural fluid” was somewhat redundant. I suggest the authors check these results with their local clinical collaborators.

    Typically, several conditions co-appear for a single subject. Would the approach require major modifications to deal with multi-class data? (I understand that binary models were trained here.)

    Minor: I recommend adapting the title, since “black box explanations” are understood as explanations of models whose internal gradients are unavailable. The approach, on the other hand, requires access to \Phi.
    What about “Using Causal Analysis for Conceptual Deep Learning Model Explanation”?

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    • Relevant topic
    • Interesting approach
    • Results/explanations seem not to provide proper explanations, upon showing them to a clinical expert
    • Potential to improve by adding ablations and comparisons to saliency maps
    • Potential to improve by adding radiologists’ feedback on the explanations

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper
    • The authors propose an approach to associate hidden representations of neural networks to high-level clinically relevant concepts. To determine the most important concepts for the decisions, the authors use causal inference techniques. Finally, rule-based explanations are produced using decision trees.

    • The approach is validated on the MIMIC-CXR dataset, one of the standard openly available chest X-ray datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Motivation for this work is clear and well presented. The problem this work addresses is particularly relevant to increasing the clinical usage of existing algorithms. The application of machine learning algorithms in the clinical context is hindered by a lack of interpretability. Several interpretability algorithms have been proposed in the last few years, but most of them provide saliency-map-type explanations, which are sometimes difficult to understand. This work, by associating hidden representations with clinically relevant concepts, is able to provide simpler and clearer rule-based explanations (more faithfully imitating the language of radiologists).

    • The causality-based ranking of the concept importance leads to more accurate explanations.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Concepts extracted from the radiology reports are very close to the classification labels (e.g., concept = pleural fluid, classification label = pleural effusion), which may lead to poor explanations (e.g., there is pleural effusion because there is accumulation of pleural fluid). Explanations produced by radiologists usually mention the location of the problem, and that is missing in the explanations produced in this work.

    • As illustrated in Fig. 7, explanations can contain redundant rules, and the authors do not address this, leading to explanations that include concepts irrelevant to the decision.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • If the authors provide all the material they said they will provide, their work is reproducible.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • It would be interesting to also have saliency map explanations produced by state-of-the-art interpretability methods (e.g., Deep Taylor, LRP), and compare them with the visual explanations produced.

    • It also would be of great value to add location information in the explanations (maybe based on the visual saliency maps produced).

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • It is true that there are some similar works in the literature (e.g., Been Kim’s work), but this one has several novel aspects, such as the way it associates hidden layers with clinically relevant concepts and the causality-based ranking approach that allows selecting the most relevant concepts for the clinical decision.

    • Moreover, the problem it tackles is of major importance to the MICCAI community.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The authors develop a novel framework to generate causal explanations from a black box deep learning model. They use ideas from causal inference and mediation analysis to construct a grounded interpretation of DL outputs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Novel framework
    • Use of causal inference to bridge the gap between black box results and interpretability.
    • Good initial results and visualizations
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Experiments and results are not very clear
    • Counterfactual generation is not explained; only a reference is provided.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Not reproducible.
    • No code is provided
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • An explanation of how this differs from the status quo is missing
    • How is it different from a baseline that uses a regression-based method as opposed to do-calculus?
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • Good direction of research
    • Acceptable results but not enough comparison
    • Hard to follow in parts
  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    2

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers found this paper to be a novel approach to an important subject. The paper was also thought to be well written. On the other hand, there were concerns relating to the lack of ablation experiments, the absence of an analysis of whether the explanations are radiologically meaningful, and the lack of comparisons to saliency-map-based approaches. The concerns about missing ablation studies and comparisons to other methods are beyond the scope of the rebuttal, since new results should not be added to the paper at this point. The authors should address the remaining concerns, especially those relating to whether the explanations generated are meaningful from a clinical point of view.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

We thank the reviewers for their thorough review of the manuscript and their constructive remarks. Please find our response below:

The clinical meaningfulness of the explanations

We thank R1 for consulting with a radiologist and sharing their feedback. In the concept-association step, lasso regression identified multiple hidden units as relevant for a concept, while in Fig. 4 we visualize only one hidden unit per concept. Consistent with R1's suggestion, our local clinical collaborator also agreed that the saliency map from a single concept unit is insufficient to explain a concept thoroughly. For example, clinically, the entire heart region is important for evaluating an enlarged cardiac silhouette, while the visualized unit in Fig. 4 focuses only on the left heart border. Similarly, the hidden unit for vascular prominence highlights the aorta, which is clinically appropriate but insufficient to explain vascular prominence.

We propose to modify Fig. 4 to illustrate multiple units for each concept. Our updated figure highlights multiple important regions for cardiac silhouette, as identified by different hidden units, which together cover the entire heart region. Further, for a localized concept such as “blunt costophrenic angle,” our updated visualization shows multiple relevant units, all focusing on the lower-lobe region.
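
As a concrete illustration of this proposed Fig. 4 revision, one way to visualize a concept with several associated units is to aggregate their spatial activation maps. The sketch below assumes per-unit activation maps are available for a single image; the unit indices, shapes, and averaging choice are illustrative assumptions rather than the paper's exact procedure.

```python
# Aggregate the activation maps of all units selected for one concept,
# then upsample the mean map to image resolution for overlay as a heatmap.
import numpy as np
from scipy.ndimage import zoom

feature_maps = np.random.rand(512, 7, 7)   # per-unit activation maps for one chest X-ray
associated_units = [12, 87, 301]           # units selected by the sparse regression (illustrative)
image_size = 224

concept_map = feature_maps[associated_units].mean(axis=0)                 # aggregate over units
concept_map = zoom(concept_map, image_size / concept_map.shape[0], order=1)
concept_map = (concept_map - concept_map.min()) / (concept_map.max() - concept_map.min() + 1e-8)
# concept_map can now be overlaid on the X-ray to highlight the concept region.
```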

Also, we agree with the reviewers that the current decision trees in Fig. 6 and Fig. 7 show redundant rules. To improve readability, we merged the three trees to create a single decision tree. The resulting tree provides a global explanation, which partially explains the diagnosis in terms of concepts. We will also update the text to refer to the existing literature confirming that the learned decision rules are clinically meaningful.
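
For readers unfamiliar with the surrogate-explanation step, a minimal sketch is given below: a low-depth decision tree is fitted on per-image concept scores to mimic the black-box diagnosis, and the learned rules are printed as a global explanation. The concept names, tree depth, and synthetic data are illustrative assumptions, not the merged tree from the revised figure.

```python
# Surrogate explanation: fit a shallow decision tree on concept scores to
# approximate the black-box diagnosis, then print the resulting rules.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

concepts = ["cardiac_silhouette", "pleural_fluid", "blunt_costophrenic_angle"]
concept_scores = np.random.rand(500, len(concepts))            # per-image concept evidence
blackbox_diagnosis = (concept_scores[:, 1] > 0.5).astype(int)   # stand-in for the classifier output

surrogate = DecisionTreeClassifier(max_depth=2).fit(concept_scores, blackbox_diagnosis)
print(export_text(surrogate, feature_names=concepts))           # rule-based global explanation
```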

Details of the counterfactual generation process

We used existing work to generate counterfactual images. A counterfactual is a small modification of the input image that flips the classification decision. We used a generative adversarial network to conditionally create a realistic counterfactual image that achieves a desired prediction from the classifier. We will briefly update the manuscript to add these details.
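
Since the counterfactual generator itself comes from prior work, the sketch below only illustrates the interface described in this paragraph: `generate_counterfactual` is a hypothetical placeholder for the conditional generator, and the check simply verifies that the small modification flips the classification decision. This is a sketch under those assumptions, not the authors' implementation.

```python
# Verify that a generated counterfactual actually flips the classifier's
# decision. `classifier` is assumed to return a single logit;
# `generate_counterfactual` is a hypothetical stand-in for the conditional
# generator from prior work.
import torch

def flips_decision(classifier, image, generate_counterfactual, threshold=0.5):
    with torch.no_grad():
        original_prob = torch.sigmoid(classifier(image)).item()
        desired_label = 0.0 if original_prob >= threshold else 1.0
        counterfactual = generate_counterfactual(image, desired_label)  # assumed interface
        cf_prob = torch.sigmoid(classifier(counterfactual)).item()
    return (original_prob >= threshold) != (cf_prob >= threshold)
```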

Comparison against concept-based method such as TCAV

We agree with R1 that a comparison with TCAV would strengthen our manuscript. We used the concept sensitivity score from TCAV to rank concepts for each diagnosis. The top-10 concepts identified by our indirect effect and by TCAV are the same, although their order differs. The top-3 concepts are also the same, with minor differences in ranking. We will update Fig. 5 to present the results from TCAV.
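
For context, the TCAV concept-sensitivity score referenced here can be computed roughly as follows: a linear classifier separates activations of concept images from random images, its weight vector serves as the concept activation vector (CAV), and the score is the fraction of diagnosis-positive images whose class logit has a positive directional derivative along the CAV. The sketch assumes precomputed activations and gradients and follows the published TCAV recipe rather than reproducing the paper's exact setup.

```python
# TCAV-style concept sensitivity from precomputed activations and gradients.
import numpy as np
from sklearn.linear_model import LogisticRegression

def tcav_score(concept_acts, random_acts, class_gradients):
    X = np.vstack([concept_acts, random_acts])
    y = np.r_[np.ones(len(concept_acts)), np.zeros(len(random_acts))]
    cav = LogisticRegression(max_iter=1000).fit(X, y).coef_[0]   # concept activation vector
    cav /= np.linalg.norm(cav)
    sensitivities = class_gradients @ cav        # directional derivative of the class logit
    return float((sensitivities > 0).mean())     # fraction of images with positive sensitivity
```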

Comparison against saliency-based methods

Most saliency-based methods highlight salient regions for the final class label or diagnosis (e.g., cardiomegaly) and not for fine-grained concepts (e.g., cardiac silhouette). Hence, the two types of methods are not directly comparable.

Ablation Study

We support R1's suggestion to include an ablation study, but due to space limitations we cannot add such a study. Nevertheless, the existing experiments present a partial ablation study. In Fig. 5 (bar graph), we demonstrate that the indirect effect of a random concept is very small. A random concept represents an ablation of the concept-association step: rather than performing lasso regression to identify relevant units, we randomly select units.
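
One plausible way to realize this ablation is sketched below: the indirect effect of a unit set is estimated by swapping only those hidden units from their factual to their counterfactual values and measuring the change in the classifier head's output, with a randomly selected unit set serving as the baseline. This is a mediation-style illustration under stated assumptions, not necessarily the paper's exact estimator; all names and data are synthetic.

```python
# Indirect effect of a unit set: swap those units to their counterfactual
# activations, keep the rest factual, and measure the change in the head's
# output. Compare lasso-selected concept units against randomly chosen units.
import numpy as np

def indirect_effect(head, phi_factual, phi_counterfactual, unit_indices):
    mediated = phi_factual.copy()
    mediated[unit_indices] = phi_counterfactual[unit_indices]
    return head(mediated) - head(phi_factual)

rng = np.random.default_rng(0)
n_units = 512
phi_f, phi_cf = rng.normal(size=n_units), rng.normal(size=n_units)
head = lambda z: 1.0 / (1.0 + np.exp(-z.sum() / 50.0))     # stand-in classifier head

concept_units = np.array([12, 87, 301])                    # lasso-selected units (illustrative)
random_units = rng.choice(n_units, size=3, replace=False)  # ablation baseline
print(indirect_effect(head, phi_f, phi_cf, concept_units),
      indirect_effect(head, phi_f, phi_cf, random_units))
```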

Further, in Fig. 5 (bottom trend plot), we demonstrate the effectiveness of our ranking. The results show that we achieve the best recall for diagnosis classification when we consider the top-ranked concepts.

Minor comments

We thank R1 for the suggestion to update the manuscript’s title. Our new title is “Using Causal Analysis for Conceptual Deep Learning Explanation.”

R2 suggested including location information in the explanations. However, adding such information requires additional annotation, which is beyond the scope of the current study.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper makes an interesting and novel contribution to the literature on explainability in medical image analysis. The rebuttal clarified the concerns raised by the reviewers, in particular the clinical relevance of the explanations, the comparison to TCAV, and the reason why a comparison to saliency maps is not meaningful.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the reviewers’ and AC’s concerns regarding the explanations generated from a clinical viewpoint. The rebuttal regarding ablation studies and comparisons with other methods is also convincing. Therefore, I recommend acceptance of the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This is a novel and well-presented paper for deriving causal concepts from black-box models. The authors have sufficiently addressed the major concerns from the reviewers, including radiologically meaningful explanations, and have partially addressed the ablation study and comparison with other methods. I believe the topic and the novel idea of this paper would be a valuable addition to MICCAI this year.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3


