
Authors

Zhiming Cui, Changjian Li, Lei Yang, Chunfeng Lian, Feng Shi, Wenping Wang, Dijia Wu, Dinggang Shen

Abstract

Accurate localization and identification of vertebrae from CT images is a fundamental step in clinical spine diagnosis and treatment. Previous methods have made various attempts at this task; however, they fail to robustly localize vertebrae with challenging appearance or to identify vertebra labels from CT images with a limited field of view. In this paper, we propose a novel two-stage framework, VertNet, for accurate and robust vertebra localization and identification from CT images. Our method first detects all vertebra centers with a weighted voting-based localization network. Then, an identification network is designed to identify the label of each detected vertebra by leveraging the synergy of global and local information. Specifically, a bidirectional relation module is designed to learn the global correlation among vertebrae along the upward and downward directions, and a continuous label map with dense annotation is employed to enhance the feature learning in local vertebra patches. Extensive experiments on a large dataset collected from real-world clinics show that our framework can accurately localize and identify vertebrae in various challenging cases and outperforms the state-of-the-art methods.
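
The weighted voting idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (no code repository is linked on this page), and the paper's exact formulation is not reproduced here. The sketch assumes a heatmap that gives each voxel a non-negative weight and an offset map that gives each voxel a displacement, in voxels, toward its predicted vertebra center. The function name `weighted_vote_map`, the array layouts, and the choice to discard votes that land outside the volume are all hypothetical.

```python
import numpy as np

def weighted_vote_map(heatmap, offsets):
    """Accumulate heatmap-weighted votes at the locations the offsets point to.

    heatmap: array of shape (D, H, W), non-negative vote weights (assumed layout).
    offsets: array of shape (3, D, H, W), per-voxel displacement in voxels
             toward the predicted center (assumed layout).
    Votes that land outside the volume are dropped (an assumption, not
    something the source specifies).
    """
    shape = heatmap.shape
    # Integer coordinates of every voxel, stacked as (3, D, H, W).
    grid = np.stack(np.meshgrid(*[np.arange(s) for s in shape], indexing="ij"))
    # Location each voxel votes for, rounded to the nearest voxel.
    target = np.rint(grid + offsets).astype(int)
    upper = np.array(shape).reshape(3, 1, 1, 1)
    inside = np.all((target >= 0) & (target < upper), axis=0)
    votes = np.zeros(shape, dtype=float)
    idx = tuple(target[k][inside] for k in range(3))
    # np.add.at accumulates correctly when several voxels vote for the same target.
    np.add.at(votes, idx, heatmap[inside])
    return votes
```

Peaks in the returned map would then be candidate vertebra centers; how the paper extracts those peaks is one of the points Review #2 asks the authors to clarify.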

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_27

SharedIt: https://rdcu.be/cyl51

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors develop a two-stage algorithm, which they call VertNet, for localization and identification (i.e., labeling) of vertebrae. Numerous works have previously addressed this problem; however, there are multiple known failure modes, such as pathological cases, imaging artifacts (from metal implants), and limited field of view. The system proposed in this paper is designed to leverage both local and global information, and incorporates voting in multiple stages, to address these remaining challenges. The authors evaluate their method against the SOTA algorithms on a large proprietary real-world dataset, and show excellent performance of their method on both normal and challenging cases.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Method

    • The methodological approach is very carefully reasoned, and clearly justified. The role of each step/model in the overall two-stage pipeline is very clear.
    • Furthermore, the authors conduct a thorough ablation analysis, evaluating the performance gain from each of the steps in their algorithmic pipeline.

    Comparison

    • The authors implement three SOTA methods and compare the performance of their model to each. In each case, their model demonstrates improved performance.
    • The authors show specific example cases where the SOTA methods fail, and their model produces the correct vertebra localization and identification.

    Maturity:

    • This method and approach is at a level of maturity that would have potential applicability in real-world clinical scenarios.

    In addition, the presentation of the paper is excellent.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Comparison to SOTA:

    • In addition to the comparison that the authors provide on their in-house dataset, they should also include a comparison on one of the open-source spine datasets, such as CSI 2014. This would help to substantiate the authors’ claim of establishing a new SOTA method for the important task of vertebra localization and identification.

    Algorithm runtime requirements:

    • The authors mention potential clinical applicability of their algorithm, based on the excellent performance. However, for clinical applicability, the runtime requirements will also be an important consideration. The algorithm involves multiple stages: how long does it take to run on CPU? What is the memory consumption? Etc…
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    There are a number of “affirmative” answers in the reproducibility checklist that don’t seem to jibe with the paper: (1) The run time requirements were not presented in the paper (2) Memory footprint of the algorithm was not discussed in the paper (3) Statistical significance of the results was not presented in the paper (4) Failure modes of the proposed algorithm were not presented/discussed.

    In addition, the authors state that the training and evaluation code, as well as the new dataset will all be made publicly available. This will be extremely beneficial for others to reproduce the results presented in this paper. However, I don’t see links within the paper to indicate that the code and data will be shared.

    These issues should be addressed in the final manuscript.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In addition to the comments in Field 4 (above), please see the following:

    Evaluation:

    • The results presented in Tables 1, 2, 3 should be explicitly evaluated for statistical significance.

    Parameter sensitivity

    • Equation 1 introduces two parameters: delta (=2) and lambda (=5). How sensitive is the overall performance of the pipeline to the choice of these parameters?

    Minor edits:

    • Last sentence in the Intro: “…, giving the high usability in real-world clinical practices.” Seems to be an incomplete statement.
    • Figure 4: “mental artifacts” should be “metal artifacts”
    • Last sentence in Results: “mental artifacts” should be “metal artifacts”.
  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well reasoned, and the presentation is excellent. The proposed method is evaluated, both against SOTA methods, and through a careful ablation analysis. The final results of the method are excellent.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper provides a new two-stage framework for vertebra localization and identification in CT images. The framework first uses a weighted voting localization network to detect the centers of the vertebrae and then designs an identification network to identify the category of each vertebra by leveraging the synergy of global and local information. Specifically, a bidirectional relation module is designed to learn the global correlation among vertebrae along the upward and downward directions, and a continuous label map with dense annotation is employed to enhance the feature learning in local vertebra patches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work makes good use of the top-down sequence relationship of the spine when identifying vertebrae categories. This provides an interesting method for classifying objects based on their label relations. This work provides extensive experiments to prove the effectiveness of the detection framework by a large data set containing 1000 chest CT images. The inter and intra comparison experiments are well done. The English writing of this paper is good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Firstly, the authors do not clearly explain why the existing literature is not capable of detecting vertebrae, which may not convince the readers about the novelty of this paper. Secondly, some important implementation details are missing in the methodology part, which harms the logical flow of the paper. Thirdly, the necessity and principles of some methods are not clearly explained. These issues make the paper very hard to follow. Please see question 7 for details on the above-mentioned issues. Lastly, the reference indices are wrong. The first reference should be [1], not [5].

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this article is ordinary. The paper provides the setting value of each hyperparameter, and the form of the loss function is also given in the paper. As shown in questions 4 and 7, some implementations are not explained in detail; however, this may be compensated for if the authors make their code available as stated in their Reproducibility Response.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Following question 4, I would like to give some suggestions on how to improve the significance, readability, and reproducibility of the paper, which I hope could benefit the authors for future research:

    (1) Why is the existing literature not capable of detecting vertebrae from CT images? For example, the authors mentioned that “such a model may not fully capture the global dependency among all vertebrae and usually limit to local regions” (P2); however, it seems that reference [8] can capture global dependency using RNNs. If the authors explain this more clearly, the significance of this paper would be better established.

    (2) Some important implementation details are missing. For example, in Section II, how are heatmap H and offset map O converted to the weighted vote map M? It seems that O provides a location to which the heatmap value should be mapped; however, readers are still confused about how this is conducted because there is no in-depth discussion. What is the shape of O? What does the value in each element of the O tensor mean? Also, what if the O value points to a location that is out of range? In this case, how is the reliability of the predictions guaranteed? (For example, if the image height and width are normalized to 1, what if the network gives an O value of 1.2? Do you just clip it to 1? Although I have not tried this myself, it is intuitively prone to yield wrong localization results.) Similar issues happen in the other modules, e.g., what are the inputs and outputs to the fast peak search clustering method? What are the shapes of p_i and how are corresponding f_i extracted? How does the Post-label Voting algorithm function? … All these are not clearly mentioned, which is not conducive to the reproducibility of the work.

    (3) The necessity and principles of some methods. For example, in the Vertebra Proposal Generation part:

    a) Why not simply use methods such as RPN for generating vertebra proposals?

    b) How can a Gaussian filter be used to generate image patches? To the best of my knowledge, Gaussian filters are used for low-pass filtering or high-pass filtering in the image processing community. So, how can they be used to generate patches?

    Similar issues happen in the other modules, e.g., why can the Bidirectional Relation Module capture global dependency which can’t be captured by RNN modules (as mentioned in the paper)? How is the floating label obtained? How does it interact with the predicted label generated by the FC layers? What if the predicted floating labels are wrong? If the authors explain this more clearly, the readability of this paper would improve.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    In general, the authors have made some efforts in collecting data, programming and writing this paper. Also, I know that the acceptance of this paper may be important to the authors, thus, I would like to rate “borderline accept (6)” despite the novelty, readability, and reproducibility issues listed in questions 4 and 7. However, to be frank, this paper still needs a lot of modification before it can be published. Also, hope that the authors could upload the codes and data as they promised in their Reproducibility Response if the paper is accepted.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper presents a method for localization and identification of vertebrae from CT using a two-stage framework corresponding to these two tasks. For each of the tasks, the paper presents several dedicated adaptations such as a bi-directional search module and a continuous label map to support labelling. Results are presented on a comparably large (1000 cases) but not publicly available cohort, which limits comparison of the method to others. Still, encouraging results are presented on quite challenging cases.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper presents an interesting approach to a still relevant but intensively addressed application. The method is sound and contains many interesting extensions. At the same time, given the comprehensive framework with its several extensions carefully designed to fit the purpose, some components are hard to get (probably due to the limited space).

    The ablation study is nice to really see the benefit of the different modules.

    Nice to have evaluation per spine region, but surprising to see performance is best in the cervical region which is usually most challenging.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The clinical problem is still highly relevant, although several works have been presented very recently. For that reason, I find it extremely important to know the performance of the method compared to others. Unfortunately, the authors have decided not to use a public cohort. Personally, I am wondering why, e.g., the VERSE 20 data set has not been considered.

    The approach has been compared to other state-of-the-art methods. However, it remains unclear how the methods have been obtained/implemented. I suspect the authors are contributors to the benchmark approaches, as otherwise a fair comparison is probably not possible. Have the methods been used out of the box, or has any re-training, parameter adaptation, etc. been applied? It would be great to get a feeling for why the performance of the current framework is better than previous ones: is it better design or just more training data/better parameter adjustment?

    It would have been great to have even a more detailed evaluation or more description about the data. This is important to judge how challenging the data set is.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although the method has been applied to a quite sizeable number of cases (1000), reproducibility is limited as a non-public cohort has been used. Comparison to other methods is done, although it is unclear how implementations were obtained. In general, the level of detail provided is not sufficient for re-implementation of the method. Code will not be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I missed information about the data and the achieved results in the abstract. At the same time, the abstract already contained much information about the framework that was hard to grasp from the abstract alone.

    In the intro you write “be pathological or with metal artifacts” - just minor, but I found these very closely related, as for me metal artifacts are also caused by pathologies.

    It would be really good to have more information about the data. What pathologies with what incidences, FOV etc…

    Would be good to see associated error in Fig. 4

    Please clarify the post-label voting. Maybe implementation details could help here. I found it pretty high level.

    What is the runtime during inference?

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, I found the paper interesting to read with interesting components. The paper is evaluated nicely (although still there are also some limitations). One of the major limitations for me is the use of a separate not public cohort. As a result, it really remains challenging to judge performance of the method, especially for a problem that has been addressed intensively in the recent years. Still, I find the presented results encouraging on at least selected very challenging cases.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    6

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Dear Authors, I am happy to inform you that the reviewers agreed that your work meets the quality needed to be accepted at the MICCAI conference.

    Localization and identification of vertebrae is a challenging task mainly due to the similar appearance of the vertebrae in the image. On a large in-house dataset of 1000 CT images, the authors have shown that by leveraging both local and global information the number of failure cases can be reduced.

    Despite the positive evaluation, the reviewers also expressed several concerns that the authors should address in the camera-ready version of the manuscript. Although I agree with the reviewers that evaluation of the method on the VerSe20 challenge dataset would better justify the authors’ claim of outperforming SOTA methods, we cannot expect the missing experiment to be performed in the rebuttal phase. Nevertheless, the authors could discuss how their method goes beyond the top-ranked methods at the VerSe19 and VerSe20 challenges. They should pay special attention to those methods of the challenge that also explore local and global information for vertebral localization and identification. Moreover, this comparison will also strengthen the motivation section and clarify the limitations of existing methods. As far as page limitations allow, the authors should provide more information on the representativeness of the dataset and clarify the method section as requested by the reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

N/A


