
Authors

Riddhish Bhalodia, Ali Hatamizadeh, Leo Tam, Ziyue Xu, Xiaosong Wang, Evrim Turkbey, Daguang Xu

Abstract

Localization and characterization of diseases like pneumonia are primary steps in a clinical pipeline, facilitating detailed clinical diagnosis and subsequent treatment planning. Additionally, such location-annotated datasets can provide a pathway for deep learning models to be used for downstream tasks. However, acquiring quality annotations is expensive in terms of human resources and usually requires domain expertise. On the other hand, medical reports contain a plethora of information about both pneumonia characteristics and its location. In this paper, we propose a novel weakly-supervised attention-driven deep learning model that leverages encoded information in medical reports during training to facilitate better localization. Our model also performs classification of attributes that are associated with pneumonia and extracted from medical reports for supervision. Both the classification and localization are trained in conjunction and, once trained, the model can be utilized for both the localization and characterization of pneumonia using only the input image. In this paper, we explore and analyze the model using chest X-ray datasets and demonstrate qualitatively and quantitatively that the introduction of textual information improves pneumonia localization. We showcase quantitative results on two datasets, MIMIC-CXR and Chest X-ray-8, and we also showcase severity characterization on a COVID-19 dataset.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_53

SharedIt: https://rdcu.be/cyl24

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose a weakly-supervised approach for pneumonia localization, or grounding, by leveraging textual medical reports. The proposed method 1) extracts a set of textual attributes and uses a pretrained RetinaNet to extract bounding box proposals; 2) assigns weights to the boxes based on an attribute prediction task; and 3) learns box-textual query similarity with a cross attention module.

    The proposed method is evaluated on the MIMIC-CXR and Chest X-ray-8 datasets and achieves results that are slightly better than or competitive with three baseline methods. The authors also conduct a correlation analysis between pneumonia severity and classification probability and validate the correlation results, which is attractive. Qualitative results and some ablative results are given.
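
    As an illustration of steps 2) and 3) in the summary above, the following is a minimal sketch of how weighted box proposals could be scored against a text query. It is not the authors' implementation: the function name `ground_query`, the scaled dot-product similarity, and the toy feature vectors are all assumptions made for this example.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def ground_query(roi_features, roi_weights, text_feature):
    """Score each box proposal against one text query (hypothetical sketch).

    roi_features: one feature vector per box proposal.
    roi_weights:  one scalar weight per box (assumed to come from the
                  attribute prediction branch).
    text_feature: embedding of the textual query.
    Returns (index of the best box, attention over all boxes).
    """
    scale = math.sqrt(len(text_feature))
    scores = [
        w * dot(r, text_feature) / scale
        for r, w in zip(roi_features, roi_weights)
    ]
    attention = softmax(scores)
    best = max(range(len(attention)), key=attention.__getitem__)
    return best, attention


# Toy data: three box proposals with 2-dimensional features.
roi_features = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]
roi_weights = [0.2, 0.5, 0.3]
text_feature = [0.0, 1.0]

best, attention = ground_query(roi_features, roi_weights, text_feature)
print(best, attention)
```

    In this toy case the second box is selected because its feature vector aligns with the query and it carries the largest weight. The paper's actual similarity function and weighting scheme may differ.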

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) The paper proposes a weakly supervised method for X-ray image grounding based on textual report queries with no box ground truth in training, which could potentially reduce labeling efforts.

    2) The authors also conduct a correlation analysis on pneumonia severity and the predicted probability values of ‘severe’, ‘mild’ or similar attributes. The results show a strong correlation between large positive values and ‘severe’, and between large negative values and ‘mild’, indicating the effectiveness of the prediction branch.

    3) Quantitative results indicate that the proposed method achieves better results than two weakly supervised methods, CAM and GradCAM, and is competitive with a supervised method, RetinaNet.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) For the Attribute Classification module, I wonder how the model is able to learn the attributes effectively, e.g. positional/shape attributes such as ‘middle’, ‘lower’, ‘small’, when no spatial information, like coordinates, is aggregated in the feature. From the quantitative results in Table 1, the overall IoU is 0.5 even at the 0.25 threshold, whereas Figure 2 shows some really good results. Can the authors show some failure cases?

    2) For the Attribute Classification module, how are the ROI Weights alpha learned?

    3) For comparison, the authors adopt CAM and Grad-CAM as weakly supervised baselines, which may be too simple. Can the authors compare their method with that of Wang et al. in the ChestX-Ray8 paper, at least on the ChestX-Ray8 dataset? Can the authors also compare against CAM and Grad-CAM on the ChestX-Ray8 dataset?
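
    For readers unfamiliar with the IoU thresholds mentioned in point 1), the following is a minimal sketch of one common way to score box localization: a prediction counts as a hit when its intersection over union with the ground-truth box reaches the threshold. The function names and toy boxes are assumptions for illustration, and the paper's exact evaluation protocol may differ.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def hit_rate(predictions, ground_truths, threshold):
    """Fraction of predicted boxes whose IoU with the paired
    ground-truth box is at least `threshold`."""
    hits = sum(
        1 for p, g in zip(predictions, ground_truths) if iou(p, g) >= threshold
    )
    return hits / len(predictions)


# Toy data: the first prediction is exact, the second overlaps only partly.
predictions = [(0, 0, 10, 10), (0, 0, 10, 10)]
ground_truths = [(0, 0, 10, 10), (5, 5, 15, 15)]

rate_025 = hit_rate(predictions, ground_truths, 0.25)
rate_010 = hit_rate(predictions, ground_truths, 0.10)
print(rate_025, rate_010)
```

    The second toy pair has an IoU of 25/175, so it counts as a hit at a threshold of 0.10 but not at 0.25.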

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No code is uploaded. The authors provide training parameters.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please refer to the above comments. It may be better to have another figure closer to section 3.1 where the authors first mention figure 2. Currently it is too far from section 3.1.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe this is an interesting paper that leverages language attributes for weakly supervised pneumonia localization. However, my main concern is that the design of the model lacks awareness of certain attributes for the attribute prediction tasks. Another concern is that the experiments conducted so far may not be convincing enough to validate the effectiveness of the proposed method.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    6

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    Annotated, pneumonia-localized CT data is hard to obtain and requires domain expertise. Medical reports contain information about both pneumonia characteristics and its location. This paper proposes a weakly-supervised attention-driven deep learning model that leverages encoded information in medical reports during training to facilitate better localization. The proposed work also performs classification of attributes that are associated with pneumonia.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    In the proposed work, classification and localization are trained in conjunction, so that the model can be utilized for both the localization and characterization of pneumonia using only the input image.

    The MIMIC-CXR and Chest X-ray-8 datasets are used to demonstrate the superiority of the proposed work, which introduces textual information for pneumonia localization.

    A COVID-19 dataset is also used to showcase results of the proposed work.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Overall, the paper is well written and easy to follow. The proposed pipeline is interesting. Addressing some comments would make the paper suitable for publication at MICCAI 2021.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The dataset used in the paper is available online. Ideally, one should be able to reproduce the results if the code is made publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The manuscript is easy to read and follow. Addressing the comments below would make it more understandable:

    • Please state the novelty of the proposed work clearly in the introduction, in comparison with related work.
    • Please clarify “A Word2Vec [12] model is trained on the entire set of medical reports, and the ones corresponding to the limited attribute set are used as text-features.”
    • Details regarding the generation of the N ROIs (anchor boxes) should be given in Section 3. Generally, anchor boxes are predefined based on the dataset.
    • The formulation of phi is not clear from line 5 of Section 3.3. Please elaborate. Why can r not be combined directly with the text features?
    • Training is stopped at 185 epochs. Why is it stopped at that point?
    • The comparison of results with CAM and GradCAM using the activation heatmap is not clear from the text.
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed idea of utilizing text from medical reports along with labelled data is useful for pneumonia localization.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    In this paper, a novel weakly-supervised deep learning based architecture is proposed for the classification of attributes associated with pneumonia, localization of pneumonia and severity characterization of COVID-19 pneumonia in chest radiographs. For this purpose, the textual information from the medical reports is leveraged to train a cross-attention model by using the extracted attributes and the corresponding image ROIs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper has a clear aim and focus. The application discussed is clinically relevant.

    • The paper provides an expansive overview of the literature and makes a distinct technological contribution.

    • The authors carry out an extensive evaluation. The quantitative and qualitative results presented provide a good insight. The proposed method is compared with existing methods and is evaluated on multiple datasets (MIMIC-CXR and Chest X-ray-8) along with an application of the trained model on COVID-19 CXR dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper is hard to follow with regard to the explanation of the dataset (number of images, what information is available as ground truth and in which form, etc.). The explanation of the dataset could be improved (see comments in section 7).

    • Limitations of the method are not discussed. For a reader, such a discussion helps in understanding where the method could fail.

    • From the reproducibility point of view, an open-source implementation of the proposed framework is not provided, although this generally benefits the community.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • From the reproducibility point of view, an open-source implementation of the proposed framework is not provided, although this generally benefits the community.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • It is not clear from section 3.1 / Figure 2, how “large”, “multi-focal” or “” attributes result in localization input or green boxes.

    • It would be clearer if the outputs of the attention classification and cross attention modules were exemplified in Figure 1, similar to the input image and report text.

    • It is mentioned in Section 3.1 that the attributes were extracted for each disease class. Since the current work focuses on pneumonia, was the information for other diseases simply ignored? If it is not relevant, that part of the sentence could be removed.

    • Section 3.2 mentions 2560 pneumonia annotations in the ChestX-Ray-8 dataset. It is not clear in which form the annotations are available: pixel-level or bounding-box?

    • From Section 5.1, the source of the bounding-box annotations for the MIMIC-CXR dataset remains unclear.

    • Does Table 1 report numbers for the MIMIC-CXR dataset on the 5% test set or on the 169 held-out images? The information provided is confusing.

    • Similarly, on first reading it was confusing where the 94 images used for the analysis in Table 2 come from. A brief explanation of this dataset would be helpful.

    • The description of the datasets could be structured better. For example, the numbers of images for the MIMIC-CXR dataset are described in Section 4, whereas the same information for the ChestX-Ray-8 dataset is spread over multiple sections (Sections 3.2 and 4). It would be easier for the reader if it were stated, for each dataset, in which form the bounding-box information is available.

    • Page 4: Attribute classification: The information on how the weight vectors are calculated from the ROI features is available in the supplementary file, which made the discussion clearer. Please consider including some of these details in the final version of the paper.

    • Figure 2: Are the cases shown selected randomly? Images with good, average and bad scores could be visualized to give more insight into the best, average and worst results obtained using the proposed method.

    • I would have liked to see some numbers on how the proposed method compares with the compared approaches in terms of inference time.

    Minor suggestions:

    • Page1: grammatically incorrect: underlying malady (e.g., for COVID-19)
    • Page2: Maintain uniformity for a term: box-detector, bounding-box, weakly-supervised, Retina-net (avoid using “box detector”, “bounding box” etc.)
    • Page2: grammatically incorrect: Class activation mapping (CAM) [22] and its variants [16] is an essential body of work that relies on a surrogate classification task to localize the regions of interest (ROI) and have been utilized
    • Page3: grammatically incorrect: Figure 2(green boxes and attributes) show
    • Page3: grammatically incorrect: This Retinanet box detector also act as -> … acts as
    • Page6: grammatically incorrect: Due different acquisition centers, scanners and protocols across the data the intensity profiles and chest positioning has a huge variation among them.
    • Page7: grammatically incorrect: CAM [22] and GradCAM [16] that uses -> …that use
    • Page7: Reformat: Shows the images, predicted attributes and localization, text snippet form associated report.
    • Page7: Rephrase in two sentences: (a) shows two scans taken at day 0 and day 4 of the same subject and notice that our model
    • Reformat Figure 4 (left image is clipped at right) and Tables 1 and 2 (increase the spacing between columns) for better readability
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel methodology and a detailed evaluation on multiple large datasets.

    The description of the dataset and experiments is sometimes hard to follow.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    8

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    In this paper, a novel weakly-supervised deep learning based architecture is proposed for the classification of attributes associated with pneumonia, localization of pneumonia and severity characterization of COVID-19 pneumonia in chest radiographs. The work is novel with a clear focus, and the results are superior.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

N/A


