
# Authors

Jiancheng Yang, Shixuan Gu, Donglai Wei, Hanspeter Pfister, Bingbing Ni

# Abstract

Analyzing ribs in computed tomography (CT) scans is clinically critical but labor-intensive, as 24 ribs are typically elongated and oblique in 3D volumes. Rib segmentation plays an important role in rib visualization tools to assist reading. However, prior arts (with or without deep learning) are computationally inefficient, as they generally work on dense 3D volumes; besides, in-house datasets are used in these studies. To address these issues, we first develop a large-scale dataset for rib segmentation, named \emph{RibSeg}, with a computationally intensive approach to generate rib annotations using morphology-based image processing algorithms and several hand-crafted parameter groups. To our knowledge, it is the first open dataset on this research topic, including 490 CT scans with their annotations after manual checking and refinement to ensure the annotation quality. Considering the sparsity of ribs in 3D volumes, we design a point cloud-based baseline, to segment ribs on binarized sparse voxels. It achieves high segmentation performance (Dice~$\approx95\%$) with significant efficiency ($10\sim40\times$ faster than prior arts). The RibSeg dataset, code, and model in PyTorch are available at \url{https://github.com/M3DV/RibSeg}.

SharedIt: https://rdcu.be/cyhMB

# Reviews

### Review #1

• Please describe the contribution of the paper

Rib understanding in CT is essential for rib fracture and bone lesions, and rib segmentation and rib centerline extraction are helpful for rib visualization to assist rib reading.

The paper proposes a geometric deep learning model to segment ribs, i.e., RibPoint. The method performs robustly even in extreme cases, e.g., incomplete rib cages and pathological cases. The backbone of RibPoint is PointNet++. Its input is a sparse 3D point cloud of binarized ribs, and its output is a per-point prediction which can be converted back to segment the volume. Rib centerlines can then be easily obtained from the rib segmentation using geometric algorithms. The paper also provides a large-scale public CT dataset for rib segmentation and centerline extraction.
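The preprocessing the review describes (binarize at a bone HU threshold, then randomly downsample the foreground voxels into a point set for PointNet++) can be sketched as below. This is an illustrative reconstruction, not the paper's code; the function name, toy volume, and point-set size (30K is one of the sizes reported) are assumptions.

```python
import numpy as np

HU_THRESHOLD = 200   # bone threshold cited in the paper
N_POINTS = 30_000    # one of the reported point-set sizes (RP30K)

def volume_to_point_cloud(ct_hu, n_points=N_POINTS, seed=0):
    """Binarize a CT volume at the bone threshold and randomly
    downsample the foreground voxels into a point set."""
    coords = np.argwhere(ct_hu >= HU_THRESHOLD)  # (M, 3) voxel indices
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(coords), size=n_points,
                     replace=len(coords) < n_points)
    return coords[idx].astype(np.float32)        # (n_points, 3)

# Toy volume: a bright "bone" block embedded in soft tissue.
ct = np.full((64, 64, 64), 40, dtype=np.int16)   # soft tissue ~40 HU
ct[20:40, 20:40, 20:40] = 700                    # cortical-bone range
pts = volume_to_point_cloud(ct, n_points=1024)
print(pts.shape)  # (1024, 3)
```

The point prediction would then be mapped back to the original voxel grid via the stored coordinates.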

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This article integrates a variety of methods, and the method is reliable and reasonable.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

However, it is not very innovative, and the article is not written fluently. Modeling RibPoint is the main contribution; however, the method is described too briefly. For example, after thresholding out non-bone tissue in the original image at 200 HU, the remaining voxels are mainly from the bone volume; these are randomly downsampled and converted to point sets. These points are input to PointNet++ to train a 3D point cloud segmentation model, i.e., RibPoint. Why is RibPoint robust to incomplete rib cages? How is the centerline reflected in the qualitative analysis?

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

This article provides a detailed method implementation, so it is easy to reproduce this method.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The authors should compare their method with the ones in [6] [9] and [16].

borderline reject (5)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This article integrates a variety of methods, but it is not very innovative. The article is not written fluently.

• What is the ranking of this paper in your review stack?

4

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #2

• Please describe the contribution of the paper

This work deals with fast rib and centerline segmentation in 3D CT images. 490 images were used; mean Dice is 95%. A dataset of ca. 500 images underlying the methodological aspect of this paper shall be made publicly available. An order-of-10 speedup compared to other methods is claimed from sample studies.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- There is a lack of statistical evaluation: no standard deviations given.
- There seems to be no cross-validation scheme in use (hence no standard deviations).
- It is unclear whether test and training sets were compiled randomly or hand-picked.
- The calculation of the metric Dice^(P) is unclear.
- In the absence of statistical testing, wording such as "significantly better" is unfounded and should be removed from the text.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

- Statistics: there are no standard deviations with the means for statistical testing.
- The hardcoded HU threshold of 200 is reasonable for bones, but needs better motivation. Why this threshold? Does it work for every patient?
- Do you use cross-validation in the evaluation? If not, why not?
- The methods seem borrowed from other publications (UNet and PointNet).

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

As the method description is not self-contained here, reproducibility is limited.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

- What is the meaning of "understanding ribs"?
- What is the term "prior arts"? Please choose other words.
- Use italic font for e.g. "RibPoint" consistently in the paper.
- With 9 pages, the paper is over the page limit.

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The database initiative is the positive aspect. However, there are shortcomings in terms of methodological novelty.

• What is the ranking of this paper in your review stack?

4

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #3

• Please describe the contribution of the paper

The paper presents both a newly labeled dataset for rib segmentation as well as a novel approach to segment the ribs using a sparse geometric deep learning model followed by geometric post-processing to extract rib centerlines.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The approach seems novel and fast. Additionally, the creation of a labeled dataset is a great addition for the community.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper has two main weaknesses that could be resolved with further clarifications and by stating results more directly. First, some of the claims made are not well backed and overly positive; reducing the claims or providing further studies would easily resolve this concern. Second, several key details are missing that make it difficult to evaluate the manuscript and its results.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Sharing of the dataset and code goes a long way in terms of reproducibility. Key details are missing in the paper but may be found in the source material.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The following lists elements from the manuscript that can help strengthen the manuscript.

Abstract

• The ribs are oblique relative to slice planes, not relative to volumes. This can be worded more precisely.

Dataset

• Adding further descriptions of the datasets as well as the labeling process can greatly strengthen the manuscript.
• What are the characteristics of the volumes? Slice spacing, reconstruction kernel, contrast injected, manufacturer? Without these details it is difficult to gauge the robustness of the model or the potential impact of the data.
• How was the data labeled - was adjudication used? How many radiologists were used per dataset?
• What are the characteristics of the patients - some have missing ribs (not all people have a complete set of ribs)- what is the break-down of abnormalities of the data?

Robustness tests do not cover cases where patients have missing ribs. How were the test cases selected, and why are they limited to 3 kinds?

Many sentences are quite positive without firm evidence.

• “perfectly segmented”
• “great potential..for downstream tasks, such as diagnosis of rib fractures”
• “totally acceptable in consideration of its performance boost”

The paper avoids comparisons with other approaches, stating that it would be "unfair". These comparisons should be listed. It is not at all clear why the speeds of the approaches aren't approximately comparable. Also regarding speed, the paper makes very limited comparisons to arrive at the conclusion that it is superior.

The image preparation for model training contains ad-hoc parameters that are poorly explained. How are the point-set sizes arrived at, and are the point sets considered anisotropic? These details would be valuable in the manuscript. Equation (2) contains no details about the meaning of the variables shown.

Table 2 shows no confidence intervals for any of the results. There is no clearly statistically significant approach.

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper lacks key analysis of the results while making rather strong claims based on weak evidence. Sparse descriptions of the data and a lack of confidence intervals on the results make it difficult to review. Overall, the approach seems somewhat novel, and the shared dataset can be useful as a basis for further research. The paper's claims should be reworded unless additional evidence is provided.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

4

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This work aims at rib segmentation and extraction of rib centerlines. A strong point is the introduction of a novel dataset. Methodological contributions seem to be limited, and there were issues identified regarding the experimental evaluation, especially regarding statistical significance claims. Overall, reviewers indicate a borderline assessment; therefore, the authors are invited to submit a rebuttal to clarify these issues. Authors are encouraged to identify and address the issues that they find important in their rebuttal, while also dealing with the aspects of methodological contribution and experimental evaluation mentioned above.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

# Author Feedback

Dear meta-reviewers,

We appreciate all (meta-)reviewers (R2, R3, R4, MR2) for the high-quality reviews. All discussions in this rebuttal will be addressed in the revision.

The primary concerns from the reviewers could be summarized as:

1. Methodology novelty (R2&R3&MR2). As the title implies, the focus of this paper is the dataset, which is the first open benchmark for rib segmentation / centerline extraction and a resource for developing downstream applications. We provide strong baselines not to claim novelty, but to examine current 3D deep learning methods to better understand the computational challenges of the task. The merits of our proposed baselines lie in the engineering effort to achieve reasonable performance with standard deep learning architectures, to better foster future methodology and application development. With our strong baselines, we found that the point cloud representation makes the model focus better on the geometry of rib structures compared to voxel-based CNNs, which may explain the robustness of the proposed RibPoint.

2. Evaluation

• Comparison with other methods (R2&R4). The code and data in [6,9,16] are not publicly available, which makes it hard to reproduce these methods for a fair comparison. Instead, we implement a voxel-based 3D CNN on our dataset to represent the most related work [6] based on 3D CNNs. This difficulty motivates us to develop an open dataset as a benchmark and open methods to advance research on this task.
• Statistical evaluation (R3&R4&MR2). We provide the mean and standard deviation (std) of results by repeating the experiments 3 times, which reaches the same conclusion as in the main text. It will be added in the revised paper. (RP: RibPoint.)

| Methods | Dice^P (std) % | Dice^V (std) % |
| --- | --- | --- |
| 3D UNet | - | 86.2 (0.11) |
| RP30K | 92.3 (0.04) | 91.0 (0.03) |
| RP250K | 91.6 (0.08) | 92.4 (0.09) |
| RP30K+aug | 94.8 (0.05) | 93.2 (0.03) |
| RP250K+aug | 94.6 (0.04) | 95.2 (0.02) |

• CT scans and dataset splits (R3&R4). We use the CT scans and the official dataset split (train, dev, and test) from the RibFrac challenge for rib fracture detection [4], and we develop rib segmentation and centerline annotations on top of it. The CT scans were enrolled with high standards for clinical applications; please refer to the RibFrac challenge [4] for patient characteristics. We did not perform cross-validation since the official split is used. Notably, our dataset and models could be directly integrated with the RibFrac challenge to develop downstream applications (e.g., rib fracture detection).
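The Dice metric and the mean/std reporting in the table above can be sketched as follows. This is an illustrative reconstruction under the standard Dice definition, not the authors' evaluation code; the three run scores are hypothetical.

```python
import numpy as np

def dice(pred, gt):
    """Dice coefficient between two binary masks (or point-label arrays):
    2 * |pred AND gt| / (|pred| + |gt|)."""
    pred, gt = np.asarray(pred, bool), np.asarray(gt, bool)
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum())

# Reporting "mean (std)" over repeated training runs, as in the table.
runs = [0.948, 0.947, 0.949]  # hypothetical Dice scores from 3 runs
print(f"{np.mean(runs):.3f} ({np.std(runs):.3f})")  # prints 0.948 (0.001)
```

Dice^P would apply this to per-point labels, Dice^V to the masks mapped back onto the voxel grid.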

Other issues: a. Method details (R2&R3&R4). We leave some method details to the supplement due to the page limit. R3 and R4 also mentioned that details could be found in the supplement. However, we will revise the main text to include key details. As for hyperparameters (HP), we tune all HP on the development split.

b. Annotation verification (R4). At present, the annotations are generated with hand-crafted morphological algorithms and then manually checked by a junior radiologist with 3D Slicer. If an annotation is identified as unqualified, it is removed from the dataset without refinement. This pipeline greatly reduces the annotation cost, but it is also the reason why only 490 cases are included.

c. Thresholding of bone (R3). The HU threshold of 200 is reasonable for humans, as HU values are calibrated against water and air. The HU ranges of cancellous and cortical bone are [300, 400] and [500, 1900], respectively; thus a threshold of 200 is sufficient to isolate the sparse bone voxels.
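The argument above can be checked against representative HU values. The numbers below are approximate textbook values for illustration, not from the paper:

```python
# Approximate representative HU values for common tissues.
tissues = {
    "air": -1000,
    "water": 0,
    "soft tissue": 40,
    "cancellous bone": 350,   # within the [300, 400] range cited
    "cortical bone": 1200,    # within the [500, 1900] range cited
}

THRESHOLD = 200  # HU threshold used by the authors
kept = [name for name, hu in tissues.items() if hu >= THRESHOLD]
print(kept)  # ['cancellous bone', 'cortical bone']
```

At 200 HU, both bone types survive while air, water, and soft tissue are excluded, which is the separation the rebuttal relies on.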

# Post-rebuttal Meta-Reviews

## Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

After assessing the reviews and the rebuttal, which adequately addresses most of the detailed comments from reviewers, to me the paper is still in a borderline range. Most importantly, the main contribution boils down to making available the rib dataset including its annotation, which are simultaneously points (1) and (3) in the contribution section of the paper, while the baseline (contribution (2)) is a straightforward re-implementation of existing work; moreover, comparison to other state-of-the-art methods is neglected. While releasing data plus annotations is to be cherished in principle, I think that this contribution on its own cannot be the sole reason for acceptance at MICCAI.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

20

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

While everyone (including the authors) agrees that there is little novelty, a solid MICCAI contribution can also present a novel dataset. To my understanding, this is the case for this paper, and the data (rib point cloud data) is of great interest to the MICCAI community. The employed baselines are appropriate. One point that should be addressed in a final version (if accepted) is the so-far limited critical discussion, as mentioned by reviewer #3: "The paper lacks key analysis of the results while making rather strong claims based on weaker evidence".

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

10

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The main contribution of this paper, the dataset release and the annotations provided, will move the MICCAI field forward. The value of this open dataset is clear. The remaining issues in the method section can be resolved if the authors carefully follow the reviewers' detailed suggestions. The authors also need to tone down their claims.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

10