
Authors

Ashkan Khakzar, Yang Zhang, Wejdene Mansour, Yuezhi Cai, Yawei Li, Yucheng Zhang, Seong Tae Kim, Nassir Navab

Abstract

Neural networks have demonstrated remarkable performance in classification and regression tasks on chest X-rays. In order to establish trust in the clinical routine, the networks’ prediction mechanism needs to be interpretable. One principal approach to interpretation is feature attribution. Feature attribution methods identify the importance of input features for the output prediction. Building on the Information Bottleneck Attribution (IBA) method, for each prediction we identify the chest X-ray regions that have high mutual information with the network’s output. The original IBA identifies input regions that have sufficient predictive information. We propose Inverse IBA to identify all informative regions. Thus all predictive cues for pathologies are highlighted on the X-rays, a desirable property for chest X-ray diagnosis. Moreover, we propose Regression IBA for explaining regression models. Using Regression IBA we observe that a model trained on cumulative severity score labels implicitly learns the severity of different X-ray regions. Finally, we propose Multi-layer IBA to generate higher resolution and more detailed attribution/saliency maps. We evaluate our methods using both human-centric (ground-truth-based) interpretability metrics, and human-independent feature importance metrics on NIH Chest X-ray8 and BrixIA datasets. The code is publicly available.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_37

SharedIt: https://rdcu.be/cyl4k

Link to the code repository

https://github.com/CAMP-eXplain-AI/CheXplain-IBA

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose an extension of the Information bottleneck attribution mechanism to better understand what parts of the input image are used for disease prediction on Chest X-ray Images. Their experiments on two publicly available datasets (NIH and the BrixIA datasets) show that the proposed methodologies better help explain the different parts of the image that contribute to a certain prediction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The problem statement and the motivation is clear.
    • The proposed extensions follow logically from the original IBA, and the results support this.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Please see below in Section 7 for details on my comments.

    • Some of the experiments and their intuitions are a little hard to understand.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major:

    • For the different sets of experiments, the authors mention that \Lambda is determined such that the noise is minimized using mutual information between Z and the removal of predictive regions. It is not clear how \Lambda is optimized. Is it a trainable hyperparameter governed by L, or is it found by a heuristic search within a range?

    – Also is the \Lambda pixel wise specific for every different layer and every different application? More details on this would help understand the method better.

    – For the BrixIA evaluation, what are the two horizontal lines drawn across the CXRs? Are they determined by the algorithm or by some scoring mechanism in the dataset?

    – The authors mention that a lung segmentation network was used for the localization evaluation of the BrixIA score regions, but no further details are provided. Was it pre-trained? What architecture was used? And wouldn’t the evaluation of the score depend on how well that network performed, making it a confounding factor? A few lines on this would help the readers.

    Minor:

    – Typos: “MES loss”, “crips”.

    – Figure 1, comparison of IBA and Inverse IBA: is the difference to be read from the slightly more red region in the right image (Inverse IBA)? That is hard to see.

    – In general, the bits/pixel maps are hard to see on the small CXR images. A change in the color maps or a few more descriptive words in the figure would help the readers.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The extensions to the IBA and the concept of quantitative evaluation that is human agnostic plus the overall set of experiments and results informed my decision.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Somewhat confident



Review #2

  • Please describe the contribution of the paper

    This paper extends the method of Information Bottleneck Attribution (IBA) and proposes Inverse IBA, Regression IBA and Multi-layer IBA to interpret the prediction results of deep learning models on chest X-rays.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is overall easy to follow.
    2. The paper proposes three novel methodological extensions based on IBA.
    3. The results of the proposed methods are comprehensively compared to previous methods and demonstrate significant improvement.
    4. The visualization heatmaps show representative regions of chest X-ray captured by the IBA extended methods that correspond to underlying disease conditions.
    5. The paper provides the accompanying code for reproducibility, though it is not well documented.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are a few serious mathematical issues related to the proposed Inverse IBA method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides the accompanying code/configuration for reproducibility, but it is not well documented, e.g. the README file is almost empty.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I have major comments related to the mathematical correctness of the proposed Inverse IBA method.

    1. Please clarify the exact correspondence between the cross-entropy loss LCE(Y, Z) and the mutual information I(Y, Z), as the authors state “The I(Y, Z) term in Eq. 2 in classification setting corresponds to cross entropy loss LCE.” Cross entropy CE(Y, Z) is not mathematically equivalent to mutual information I(Y, Z), yet the authors seem to assume the equivalence. Therefore, there is a gap between the first term in Eq. (2) and the first term in Eq. (4).

    2. In Eq. (5) related to Inverse IBA, I don’t see any algorithmic difference compared to Eq. (1) if we re-parameterize lambda as (1 - lambda), given that lambda is constrained to the range [0, 1]. Eq. (1) and Eq. (5) differ only in the interpretation of lambda, which leaves the negative sign in front of LCE in Eq. (6) as the only difference between IBA and Inverse IBA.

    3. I don’t understand the motivation for putting a negative sign in front of LCE in Eq. (6). The authors state “The search for lambda that adds the least possible noise while minimizing the mutual information between Z and the objective”. The mutual information I(Y, Z) goes to 0 when Y and Z are completely independent. Therefore, if the purpose is to minimize I(Y, Z), then lambda = 1, since that makes Z pure random noise according to Eq. (5), which is completely independent of Y. However, if the purpose is to minimize -LCE(Y, Z), i.e. maximize LCE(Y, Z), then the classification prediction based on Z should be the exact opposite of the ground truth Y, since Y is binary in the classification model. Either way, the authors again need to clarify the exact correspondence between the cross-entropy loss LCE(Y, Z) and the mutual information I(Y, Z), as pointed out above.

    Minor comments:

    1. Can the authors please clarify whether lambda is trained jointly, end-to-end, with the other model weights during training, or whether it is only added and optimized at inference time with the other model parameters fixed? I assume the latter.

    2. Please add more details related to Eq. (3), e.g. what are P(Z|R) and Q(Z) = N(mu_R, sigma_R)?
  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are major mathematical issues related to the proposed Inverse IBA method.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper addresses an important aspect of applying neural networks in the medical domain: interpretability. More specifically, the authors propose variations based on the Information Bottleneck Attribution (IBA) approach.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and technically sound. Provides simple improvements on original formulation to address the issue of interpretability.

    The proposed methods are compared on two different datasets on both classification and regression tasks. Reasonable improvement in both qualitative and quantitative results is observed.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors suggest that identifying all informative regions is more desirable than identifying regions with sufficient predictive power, and hence that Inverse IBA is preferred and provides better results than the original IBA.

    However, is that same notion carried over to Multi-layer IBA? Considering that the multi-layer version takes the intersection of regions identified by different layers, wouldn’t the intersection of all the informative regions across layers be more useful than that of the sufficiently predictive regions?

    The authors use different samples to qualitatively compare different methods. However, comparing all the methods on the same samples would be fairer and would allow the reader to compare and contrast the methods easily.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Availability of the data and code makes the work quite reproducible

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Addressing the weaknesses mentioned above should help strengthen the paper.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Reasonably novel contribution that addresses an important issue in the medical domain.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper has its merits, especially with regard to its improvements to IBA and its demonstration of effectiveness on two tasks across multiple datasets. However, R3 has concerns around the mathematical notations and descriptions for IBA. The authors are advised to address these comments and the other issues raised by R4 and R1 in their rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3




Author Feedback

We thank all reviewers for their valuable feedback. The reviewers state that our methodological extensions are novel (R2, R4) and logical (R1), and technically sound (R4). Our novel contributions address an important issue in the medical domain (R4). All reviewers state that the results demonstrate improvements. R2 mostly requires clarification regarding one of our three extensions, Inverse IBA.

*

R2: The correspondence between cross-entropy L_CE and Mutual Information I(Y,Z) is not clear.

Eq. 2 and Eq. 4 are from the original IBA [16] method. The correspondence is proven in [Ref1], and IBA [16] references [Ref1]. Eq. 12 in [Ref1] shows -L_CE is a lower bound for I(Y,Z). Minimizing L_CE corresponds to maximizing -L_CE and thus maximizing the lower bound of I(Y,Z).

[Ref1] Deep Variational Information Bottleneck, ICLR2017
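
For readers following the argument, the lower bound mentioned above can be sketched with the standard variational derivation. The notation here is an assumption made for illustration: q(y|z) denotes the classifier's predictive distribution, L_CE = -E[log q(y|z)], and Y is a discrete label so that H(Y) >= 0. This is a sketch of the general argument, not a reproduction of the equations in the paper or in [Ref1].

```latex
\begin{align}
I(Y;Z) &= H(Y) - H(Y \mid Z) \\
       &= H(Y) + \mathbb{E}_{p(y,z)}\big[\log p(y \mid z)\big] \\
       &\ge H(Y) + \mathbb{E}_{p(y,z)}\big[\log q(y \mid z)\big]
          && \text{since } \mathrm{KL}\big(p(y \mid z)\,\|\,q(y \mid z)\big) \ge 0 \\
       &= H(Y) - L_{CE} \\
       &\ge -L_{CE}
          && \text{since } H(Y) \ge 0 \text{ for discrete } Y
\end{align}
```

The chain shows a bound, not an equality: lowering L_CE raises a lower bound on I(Y;Z), which is the sense in which the two quantities "correspond".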

*

R2: Inverse IBA is equivalent to IBA if we re-parameterize Z. Also, the motivation behind the negative sign in front of LCE in Eq. (6) is not clear.

IBA optimization (Eq. 2) has two terms, I(X,Z) and L_CE, and uses the same Z in both terms. If we reparametrize Z in both terms, then yes, it would be an equivalent algorithm.

However, in inverse IBA (Eq. 6) the two terms I(X,Z) and -L_CE do not use the same Z. The Z is inverted (using Eq. 5) only for the -L_CE term (not both terms).

The motivation for minimizing -L_CE (maximizing L_CE) is adding noise to feature maps to block the predictive regions.

Let us illustrate the differences further, and show the role that the inverse mask (in -L_CE) and the minus sign in -L_CE play together:

IBA) Minimizing L_I corresponds to replacing the feature map with noise. For IBA, using Z in Eq. 1, this translates to the mask \lambda becoming 0. Minimizing L_CE keeps predictive regions and pushes \lambda towards 1. If keeping a portion of the predictive regions is enough to minimize L_CE (i.e. a “sufficiently predictive” region), that portion receives \lambda = 1, and the \lambda for the rest of the regions is pushed to 0 by minimizing L_I.

Inverse IBA) Again, minimizing L_I corresponds to the feature map becoming noise. Same as IBA, L_I uses Z in Eq. 1. L_I minimization pushes \lambda to 0. Minimizing -L_CE (maximizing L_CE) in Eq. 6 corresponds to removing “all” predictive information. In the L_CE term in Inverse IBA, Z is inverted and is defined by Eq. 5, thus removing features corresponds to pushing their \lambda to 1 (if we instead use Z in Eq. 1 for -L_CE term, \lambda moves to 0, and as L_I also pushes \lambda to 0, we get 0 everywhere).
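
The two mask conventions contrasted above can be illustrated with a small sketch. It assumes the linear-interpolation form of the bottleneck from the original IBA formulation (feature mixed with noise according to \lambda), which is not restated on this page; the function names and the toy values are hypothetical and chosen only for illustration.

```python
import random


def iba_bottleneck(r, lam, eps):
    """Eq. 1 convention (assumed form): lam = 1 keeps the feature, lam = 0 yields noise."""
    return [l * ri + (1.0 - l) * ei for ri, l, ei in zip(r, lam, eps)]


def inverse_bottleneck(r, lam, eps):
    """Eq. 5 convention (assumed form): lam = 1 yields noise, lam = 0 keeps the feature."""
    return [l * ei + (1.0 - l) * ri for ri, l, ei in zip(r, lam, eps)]


rng = random.Random(0)                       # fixed seed so the sketch is repeatable
features = [0.7, -1.2, 3.4, 0.1]             # toy feature values (hypothetical)
noise = [rng.gauss(0.0, 1.0) for _ in features]
mask = [1.0, 0.0, 1.0, 0.0]                  # toy mask (hypothetical)

z_iba = iba_bottleneck(features, mask, noise)      # positions 0 and 2 keep the feature
z_inv = inverse_bottleneck(features, mask, noise)  # positions 0 and 2 become noise
```

Taken in isolation, the inverse form is the original form with \lambda replaced by 1 - \lambda, which is the reviewer's observation. According to the rebuttal, the difference arises because Inverse IBA feeds the inverted Z only into the -L_CE term while L_I keeps using the Eq. 1 form.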

*

R2: Clarify whether you minimize -L_CE (maximize L_CE) or minimize I(Y,Z) for Inverse IBA.

According to Eq. 6, we minimize -L_CE. Minimizing -L_CE (maximizing L_CE) differs from minimizing I(Y,Z) as R2 also mentions. We do not minimize I(Y,Z) in inverse IBA.

*

R1 and R2: How is lambda obtained?

The lambda is the mask that is applied to feature maps of a chosen layer (has the same dimension as the feature maps). The lambda is optimized in the inference step by solving the optimization in Eq.2 (IBA) and Eq.6 (Inverse IBA) for each image when model weights are fixed.
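
The per-image optimization described above can be sketched on a toy problem. Everything in this sketch is an assumption made for illustration: the "model" is a frozen linear classifier, the noise is replaced by a zero mean so the result is deterministic, and `beta * sum(lam)` is a hypothetical stand-in for the information term L_I. It is not the authors' implementation; it only shows that the mask is the sole quantity updated while the model weights stay fixed.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def optimize_mask(r, w, b, y, beta=0.1, lr=0.5, steps=200):
    """Gradient descent on the mask for ONE sample; w and b are never modified."""
    alpha = [0.0] * len(r)                   # mask logits, lam = sigmoid(alpha)
    for _ in range(steps):
        lam = [sigmoid(a) for a in alpha]
        # Masked features feed the frozen classifier (noise mean assumed to be 0).
        logit = sum(wi * li * ri for wi, li, ri in zip(w, lam, r)) + b
        p = sigmoid(logit)
        for i in range(len(alpha)):
            dlam = lam[i] * (1.0 - lam[i])   # d lam / d alpha
            # d L_CE / d lam = (p - y) * w_i * r_i ; d (beta * sum lam) / d lam = beta
            grad = ((p - y) * w[i] * r[i] + beta) * dlam
            alpha[i] -= lr * grad
    return [sigmoid(a) for a in alpha]


weights = [1.5, 1.0, 0.0]                    # frozen toy weights (hypothetical)
frozen_copy = list(weights)
mask = optimize_mask([2.0, 0.0, 1.0], weights, 0.0, 1.0)
```

In this toy setting only the first feature contributes to the prediction, so its mask value rises above 0.5 while the penalty term drives the other two below 0.5.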

*

R4: Using notion of inverse-IBA in Multi-Layer IBA

The Multi-layer inverse-IBA can be investigated in follow-up work. In this work, we conduct ablation studies to investigate the improvement from each of our proposed extensions.

*

R4: Different samples visualized for different methods.

We provide samples to highlight each method’s merit. E.g. Multi-layer IBA is proposed for high-resolution visualization, and we provide an x-ray sample with a small ground truth annotation.

*

R1: Horizontal lines on BrixIA images.

In the BrixIA dataset, lungs are divided into 6 sub-regions. The lines denote the 6 lung regions.

*

R1: Details of the lung segmentation network for BrixIA.

We use the UNet architecture in BrixIA’s associated paper [BS-Net, MedIA 2021].

*

We will clarify Inverse IBA’s description and address all suggestions in the final version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the original reviews, R1 and R4 agreed on the paper’s merits, but R2 did not rate the paper highly, citing the explanations of mathematical concepts as the primary factor. In the rebuttal, the authors have addressed the concerns raised by all reviewers, especially those raised by R2 about the mathematics. In my view this addresses the core concerns. Additionally, the authors have clarified the concerns raised by other reviewers regarding the results. The paper addresses an important aspect of explainability and should be of interest to the MICCAI community.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes novel extensions to the Information bottleneck attribution (IBA) algorithm. The results demonstrate potential improvements over the baseline algorithm in the context of COVID prediction. The most significant criticism from the reviewers was related to some of the mathematical definitions and differences from the baseline IBA algorithm. In my opinion, the rebuttal clarifies these concerns and the paper would make a good contribution to MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors present an interesting approach to feature attribution that is built off of information bottleneck attribution (IBA). The main thrust of their modification is that the authors change the formulation from being about finding a sufficiently predictive region to about finding all predictive regions. While I am highly encouraged to see works tackling this problem, I tend to agree with R2 that the mathematical derivations and reasoning were quite problematic and that fundamental details were not explained.

    – I appreciate that minimizing L_CE maximizes the lower bound of I(Y,Z), but the authors’ language in the paper is very loose, stating that there is an equivalence.

    – The paper does not make it clear at all that the “inverted” Z is only inputted into the L_CE term. The authors clear this up in the rebuttal, but this is a fundamental change that cannot be verified post-rebuttal.

    – The original IBA has some strong theoretical bases for its derivation. The change to equation (5) may break these, but this is not commented on or discussed.

    – The regression IBA is simply a replacement of L_CE with a regression loss. But again, this breaks many of the theoretical bases behind the IBA, which should be commented upon.

    – Two reviewers were confused about whether \lambda was learned or fixed, and about how it was optimized. I shared this confusion, as it was not explained in the paper. The authors clear this up in the rebuttal, but again this is a fundamental aspect of the method that cannot be verified post-rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    17


