
Authors

Hieu T. Nguyen, Hieu H. Pham, Nghia T. Nguyen, Ha Q. Nguyen, Thang Q. Huynh, Minh Dao, Van Vu

Abstract

Radiographs are the most important imaging tool for identifying spine anomalies in clinical practice. The evaluation of spinal bone lesions, however, is a challenging task for radiologists. This work aims at developing and evaluating a deep learning-based framework, named VinDr-SpineXR, for the classification and localization of abnormalities from spine X-rays. First, we build a large dataset, comprising 10,468 spine X-ray images from 5,000 studies, each of which is manually annotated by an experienced radiologist with bounding boxes around abnormal findings in 13 categories. Using this dataset, we then train a deep learning classifier to determine whether a spine scan is abnormal and a detector to localize 7 crucial findings amongst the total 13. VinDr-SpineXR is evaluated on a test set of 2,078 images from 1,000 studies, which is kept separate from the training set. It demonstrates an area under the receiver operating characteristic curve (AUROC) of 88.61% (95% CI 87.19%, 90.02%) for the image-level classification task and a mean average precision (mAP@0.5) of 33.56% for the lesion-level localization task. These results serve as a proof of concept and set a baseline for future research in this direction. To encourage advances, the dataset, codes, and trained deep learning models are made publicly available.
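
A note on the reported metrics: the image-level result is an AUROC with a 95% confidence interval, and the lesion-level result is mAP at an IoU threshold of 0.5. The abstract does not spell out the CI procedure; the short Python sketch below shows one common way such an interval is obtained (a bootstrap over test images), using only standard NumPy/scikit-learn calls. It is an illustrative sketch, not the authors' released evaluation code.

    # Minimal sketch: image-level AUROC with a 95% bootstrap confidence interval,
    # in the spirit of the "88.61% (95% CI 87.19%, 90.02%)" figure above.
    # Illustrative only; the paper does not state its exact CI procedure here.
    import numpy as np
    from sklearn.metrics import roc_auc_score

    def auroc_with_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
        rng = np.random.default_rng(seed)
        y_true, y_score = np.asarray(y_true), np.asarray(y_score)
        point = roc_auc_score(y_true, y_score)               # point estimate on the full test set
        stats = []
        for _ in range(n_boot):
            idx = rng.integers(0, len(y_true), len(y_true))  # resample images with replacement
            if len(np.unique(y_true[idx])) < 2:              # skip resamples containing a single class
                continue
            stats.append(roc_auc_score(y_true[idx], y_score[idx]))
        lo, hi = np.percentile(stats, [100 * alpha / 2, 100 * (1 - alpha / 2)])
        return point, (lo, hi)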

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_28

SharedIt: https://rdcu.be/cyl52

Link to the code repository

https://github.com/vinbigdata-medical/vindr-spinexr

Link to the dataset(s)

https://vindr.ai/datasets/spinexr


Reviews

Review #1

  • Please describe the contribution of the paper

    A new large spine X-ray dataset. A deep learning framework for detecting spine lesions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Experiments on a large dataset. Comparison with popular detection methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The framework design seems straightforward. The proposed method is also not compared with other methods on existing public datasets (a slight concern for result reproducibility).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The overall framework seems reproducible, but the dataset is not publicly available yet.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. Some lesion types have a very limited number of samples, so I suspect that performance on those types is affected by the sample sizes. Hopefully the authors can provide some insight into performance per lesion type.

    2. It is a bit unclear whether each scan is annotated by the “3 participating radiologists” or by just one expert radiologist. If annotated by three people, what is the agreement rate? If annotated by just one expert, how can the correctness of his/her annotation be confirmed?

    3. In Table 4, the denotations L1-L7 are misleading (although a footnote explanation is provided), especially for a spine paper.

    4. In Table 5, it seems that, except for L2, the performance of the whole framework is close to that of the detector alone. A justification for using the whole framework is thus needed.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    For the effort of conducting experiments on a large spine dataset and the comparison across SOTA methods.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    2

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper presents a new spine X-ray dataset consisting of 10,468 images which can be used for algorithm development. In the images, spine anomalies are annotated. Together with the dataset, baseline results for the image classification task and the lesion-level localization task are provided.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A new large-scale dataset of spine X-ray images is presented. A baseline for the detection and classification tasks is provided.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is unclear to the reader whether the data were acquired with one type of imaging system or with systems from various vendors. Were the systems flat-panel systems, or also image intensifier (II) systems? Typically, augmentation (contrast enhancement, smoothing, shift, rotation, etc.) is performed for X-ray images, but this was not done here. Unfortunately, the different lesion types are very unevenly distributed, so unbalanced training will be the result. The lesion type numbers in the experiments do not match the IDs in Table 2.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Since the data will be provided and the evaluated algorithms are described, the reproducibility of the paper should be given.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    More details on the data would be helpful. The lesion type identifiers L1-L7 should be renamed to, e.g., LT1-LT7, since L1-L5 also denote the lumbar vertebrae, which was a little confusing on a first look at the paper. The numbering of the lesion types should be consistent throughout the paper. The training could be improved by hyper-parameter optimization. The discussion of the results gives no additional insight.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The dataset might be interesting to the community. Most of the weaknesses can be remedied. The training setup is quite simple and only borderline state of the art (e.g., no augmentation).

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The authors present a new large-scale dataset of 10,468 spine X-ray images from 5,000 studies that are manually annotated with 13 types of abnormalities by radiologists.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors provide a baseline performance of state-of-the-art deep learning approaches on the released dataset.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no significant technical novelty in the deep learning framework presented.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Large open-source dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    While the collation of such a large annotated dataset and the establishment of a benchmark on which new frameworks can be tested is a major contribution to the field, MICCAI is not the most suitable venue for publishing work that lacks an innovative technical contribution.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a major contribution to the field in terms of collating a large dataset and establishing a benchmark on which new frameworks can be tested. The justification for the borderline decision is that MICCAI is not a suitable venue for publishing work that lacks an innovative technical contribution.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The main contribution of this paper is a new, useful, physician-labeled dataset, which will be a merit to the field if publicly released. The experimental results are largely solid, and the authors did well in citing the most recent literature.

    The negative side is the rather limited technical novelty, which should be addressed in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

Summary of main Reviewers’ concerns

Thank you for reviewing our manuscript. We appreciate the Reviewers’ comments and suggestions. In the following, we summarize the three major critiques raised by the Reviewers and provide our point-by-point responses for the Area Chairs and Program Chairs:

1 - While the collection of such a large annotated dataset and the establishment of a benchmark on which new frameworks can be tested is a major contribution to the field, the framework design seems straightforward. The negative side is the rather limited technical novelty, which is to be addressed in the rebuttal.

Answer: We thank the Reviewers for recognizing the importance of our dataset and the benchmark results obtained when training and evaluating deep learning models on it. In addition to those contributions, the technical novelty of our work lies in the proposed fusion of an image-level classifier and a lesion-level detector (see Section 2.3, Decision fusion rule). This rule uses the result of the classifier to influence the detector. As shown in Table 5, the proposed fusion rule, although very simple, improves the mAP of the detector by about 0.5%. We have also experimented with a counterpart fusion rule that uses the result of the detector to influence the classifier, which boosts the AUROC of the classifier by about 1.5%. Even more important than improving the performance of each individual model is the fact that these fusion rules maintain consistency between the outputs of the two models at their two different scales: an image cannot be called abnormal if no lesions are localized, and vice versa. We plan to additionally present this result in the revised paper to highlight the effectiveness of the proposed ensemble mechanism.
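
A minimal sketch of the kind of bidirectional fusion described above is given below; the gating threshold and the rescaling are illustrative placeholders, and the actual rule is the one defined in Section 2.3 of the paper.

    # Illustrative sketch of the decision-fusion idea: the image-level classifier
    # gates/reweights the detector output, and confident detections in turn raise
    # the image-level score, keeping the two predictions consistent.
    # The threshold and rescaling are placeholders, not the paper's exact rule.
    import numpy as np

    def fuse(p_abnormal, boxes, scores, cls_threshold=0.5):
        scores = np.asarray(scores, dtype=float)

        # Classifier -> detector: down-weight detections on images judged normal.
        fused_scores = scores * p_abnormal if p_abnormal < cls_threshold else scores

        # Detector -> classifier: an image with a confident detection should not
        # be scored as normal at the image level.
        p_fused = max(p_abnormal, fused_scores.max()) if fused_scores.size else p_abnormal
        return p_fused, boxes, fused_scores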

2 - The proposed method is also not compared on existing public datasets with other methods.

Answer: To the best of our knowledge, no existing public dataset or method has been devoted to the localization of multiple spine lesions in X-ray scans (see Table 1). The lack of such a dataset prevents us from comparing the proposed method with any others.

3 - Overall framework seems reproducible, but the dataset is not publicly available yet.

Answer: To encourage new advances in this research direction, our dataset, codes, and trained deep learning models will be made publicly available. The MICCAI Conference review process is double-blind, i.e., the names of the authors are hidden from the Area Chairs and Reviewers. Therefore, the dataset has not been released yet, because our project web page contains information identifying the authors. For the time being, we have prepared a data descriptor for the released dataset, and a web page for visualization purposes has also been designed. To protect patient privacy, all personally identifiable information associated with the images has been removed. Textual information appearing in the image data (i.e., pixel annotations that could include patient-identifiable information) has been fully removed and then manually verified. Moreover, the dataset has been submitted to PhysioNet (https://physionet.org/) for public download and will be made freely accessible after the final decision on the paper.

Minor concerns

We have carefully considered all other Reviewers’ comments and have addressed all minor concerns raised by Reviewers to further improve the quality of our manuscript.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper is not particularly strong in novelty, but it is very solid work (rated as borderline accept by all three reviewers). The performance benchmark is well executed, and the public release of this dataset will be helpful to the MICCAI community on this clinically important problem.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper collected a large radiograph dataset for spinal lesions and will make it public, which will benefit the research community. The authors developed an ensemble model and validated it on an independent dataset, and the work seems to be reproducible. The rebuttal sufficiently addresses the novelty issue raised by the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main contribution of the manuscript is the annotated dataset, to be published after acceptance. While I find that collecting and annotating such a large set of images, as well as making it available to the community, is an effort that should be rewarded, I don’t think a MICCAI manuscript is the most appropriate venue to present a dataset. I would advise the authors to send their manuscript to an appropriate medical journal, where they might also receive feedback from domain experts on both the representativeness and the annotation quality of the dataset.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9


