Paper Info Reviews Meta-review Author Feedback Post-Rebuttal Meta-reviews

Authors

Anna Zapaishchykova, David Dreizin, Zhaoshuo Li, Jie Ying Wu, Shahrooz Faghihroohi, Mathias Unberath

Abstract

Pelvic ring disruptions result from blunt injury mechanisms and are often found in patients with multi-system trauma. The Tile AO/OTA classification is frequently used to grade pelvic fracture severity in trauma victims based on whole-body CT. Due to the high volume of whole-body trauma CTs generated in busy trauma centers, an automated approach to Tile classification would provide substantial value, e.g., to prioritize the reading queue of the attending trauma radiologist. In such a scenario, the automated method should grade scans through a transparent process using interpretable features, so that it can interact with human readers and lower their workload by offering insights from a first automated read of the scan. This paper introduces an automated yet interpretable pelvic trauma decision support system to assist radiologists in fracture detection and Tile grade classification. The method operates similarly to human interpretation of CT scans: it first detects distinct pelvic fractures on CT with high specificity using a Faster-RCNN model, and these findings are then interpreted by a structural causal model, built on clinical best practices, to infer an initial Tile grade. The Bayesian causal model, and finally the object detector, are then queried for likely co-occurring fractures that may have been rejected initially due to the detector's highly specific operating point, yielding an updated list of detected fractures and a corresponding final Tile grade. Our method is transparent in that it provides finding location and type via the object detector, as well as information on important counterfactuals that would invalidate the system's recommendation, and it achieves an AUC of 83.3%/85.1% for translational/rotational instability. Despite being designed for human-machine teaming, our approach does not compromise on performance compared to previous black-box approaches.
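The detect, infer, and re-query loop described in the abstract can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the fracture types, thresholds, co-occurrence probabilities, and grading rules are all invented for the example, and the structural causal model is reduced to a simple rule lookup.

```python
# Hypothetical sketch of the high-specificity detect -> infer -> re-query loop.
# All names, thresholds, and probabilities below are illustrative only.

HIGH_SPEC_THRESH = 0.9   # high-specificity operating point of the detector
RESCUE_THRESH = 0.5      # relaxed threshold for re-queried fracture types

# Toy co-occurrence prior: P(other fracture type | a type already confirmed)
CO_OCCUR = {
    "pubic_ramus": {"sacral": 0.7, "iliac_wing": 0.3},
    "sacral": {"pubic_ramus": 0.6},
}

def infer_grade(findings):
    """Toy stand-in for the Bayesian causal model: map findings to a Tile grade."""
    if "sacral" in findings and "pubic_ramus" in findings:
        return "C"  # translationally unstable ring disruption
    if findings:
        return "B"  # rotationally unstable
    return "A"      # stable

def grade_scan(detections):
    # Step 1: keep only detections above the high-specificity threshold.
    findings = {d["type"] for d in detections if d["score"] >= HIGH_SPEC_THRESH}
    # Step 2: infer an initial grade from the confirmed findings.
    initial_grade = infer_grade(findings)
    # Step 3: query for fracture types likely to co-occur with the findings,
    # then rescue detections of those types at a relaxed threshold.
    likely = {t for f in findings
              for t, p in CO_OCCUR.get(f, {}).items() if p > 0.5}
    for d in detections:
        if d["type"] in likely and d["score"] >= RESCUE_THRESH:
            findings.add(d["type"])
    # Step 4: final grade from the updated finding list.
    return findings, initial_grade, infer_grade(findings)

dets = [{"type": "pubic_ramus", "score": 0.95},
        {"type": "sacral", "score": 0.6}]  # initially rejected, then rescued
findings, initial_grade, final_grade = grade_scan(dets)
```

In this example the sacral fracture falls below the high-specificity threshold at first, but is rescued once the confirmed pubic ramus fracture makes it a likely co-occurrence, upgrading the final Tile grade.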

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_40

SharedIt: https://rdcu.be/cyl4o

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces an automated and interpretable pelvic trauma decision support system to assist radiologists in fracture detection and tile grade classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed approach improves over a previously described black-box method [11] both in terms of Tile grade accuracy, and its added interpretability.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Since the main contribution of the paper lies in its interpretability, Section II lacks a more in-depth discussion of this topic, which leaves the contribution unclear in relation to the relevant literature.

    2. The interpretability added by the proposed method should be discussed further in the experiments, in comparison with related interpretable deep neural network approaches.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper performed 5-fold cross-validation; the results may be less biased if the 5-fold cross-validation is repeated over multiple runs with different random seeds.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This paper introduces an automated and interpretable pelvic trauma decision support system to assist radiologists in fracture detection and Tile grade classification. Whilst the main contribution appears to lie in its enhancement over a previously described black-box method [11], the contribution is neither made clear with respect to the wider literature nor well discussed in the experimental section.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The contribution of the paper is unclear from the interpretability perspective.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    1

  • Reviewer confidence

    Somewhat confident



Review #2

  • Please describe the contribution of the paper

    The authors present a novel automated yet interpretable algorithm for first-order Tile AO/OTA grading from trauma CT. The system is designed to decrease the workload of the attending radiologist by facilitating validation and refinement.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A novel framework is presented that establishes causal relationships between tile grade and fracture presence. The framework provides a transparent inference pipeline that supplies fracture location and type, as well as information on counterfactuals, e. g. missed or misclassified fractures, that would invalidate the system’s recommendation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    n/a

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Use of open source data

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Nice work! Bleeding-edge techniques applied to medical imaging decision making: the framework provides a transparent inference pipeline that supplies fracture location and type, as well as information on counterfactuals, e.g. missed or misclassified fractures, that would invalidate the system's recommendation.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Excellent technical innovation

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper aims to provide an interpretable approach that automatically detects and grades pelvic fractures (pelvic ring disruption) in an ER setting using CT images. The pipeline is based on an object detection algorithm (Faster R-CNN) that acts as a fracture candidate detector and a Bayesian model that grades the fracture (A, B, or C, as a combination of translational and rotational instability) while taking the fracture findings into account. Both the detection and grading systems are improved by considering high- and low-confidence fracture detections. The boxes from the object detector act as the interpretable mechanism. Training is done on a diverse set of data / acquisition protocols.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength of the paper is the combination of the Bayesian model for grading with the object detection. It creates a sequential pipeline whose two different approaches serve it well. The process to refine the Bayesian model also allows refinement of the object detection model and concurrently helps improve the final grading model. It is an iterative approach that could probably be used in partnership with physicians as an active learning mechanism.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is a lack of clarity in the paper on the features used for the Bayesian model. This is partly because the fracture features are collapsed throughout the paper, and the AUC for each fracture type is not very readable.

    Considering the AUC of the object detector for each fracture type, one can wonder whether a supervised approach using a larger selection of fracture candidates, with the provided annotations as targets, could have worked better than the Bayesian model. As mentioned in the conclusion, the false-positive issue will have to be addressed in the future. More importantly, although the refinement step works, one can wonder whether it is in fact simply recovering from a not-so-strong object detector. For the two-step training process, it would have been good to explain the gain from the second step. Regarding the collapse to 8 mm MIP thickness: how much resolution is lost, and how does it affect the object detection?

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper seems to be reproducible, the data set is available upon request, the training uses public packages. If one can get similar annotation, the paper is clear enough to follow for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This is promising work. Collapsing the fracture features took away from readability. Since there are three main steps (object detection, Bayesian model, and refinement), it would have helped to show metrics for each step.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Some shortcuts in the paper make it less than clear. The approach is interesting, but the principal flaw seems to come from a so-so object detector, with the rest of the pipeline playing catch-up. It is interesting but not an optimal solution. The interpretability comes from Faster R-CNN, so nothing was implemented specifically for it, which is a little underwhelming.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper aims to provide an interpretable approach that automatically detects and grades pelvic fractures (pelvic ring disruption) in an ER setting using CT images. The strengths of the paper include: 1) a novel framework combining a Bayesian model with an object detection algorithm; 2) added interpretability in the model; 3) a large performance improvement over existing methods. The following points should be addressed in the revision: 1) an in-depth discussion of interpretability; 2) clarification of the features used for the Bayesian model.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3




Author Feedback

We are thankful for all suggestions, and based on reviewers’ comments, we have further clarified the paper.

Reviewer #2 appeals for a broader categorization within the interpretable ML literature to better assess the innovation. Concisely, we understand the primary benefit of the proposed approach to be an innovative technique for the explicit separation of imaging findings from the diagnostic recommendation. This separation allows for a straightforward and interpretable fusion of imaging and non-imaging, contextual findings via the Bayesian model. We have also added a sentence and references to better contextualize our work.

Reviewer #4 criticizes the method as being designed solely to deal with poor object detection. We politely disagree, and provide two arguments in support of our method. First, even if our method were only a fail-safe (which, as per our second argument, it is not), methods are not guaranteed to perform perfectly; consequently, in high-stakes decision tasks such as this one, having such a fail-safe is useful in and of itself. Second, the method is not solely a fail-safe: it separates the identification of imaging findings from diagnosis, which provides an immediate and clear interface for the integration of non-imaging context that may be needed for diagnosis and, more importantly, for clinical decision making. Regarding the justification for using 8 mm MIP thickness when training the object detector: we do not provide an ablation study, but this a priori design choice was grounded in a combination of expert radiologist domain knowledge and work in the clinical radiology literature showing that MIP images improve detection of subtle fractures while preserving attenuation information.


