
Authors

Zimeng Tan, Jianjiang Feng, Jie Zhou

Abstract

Airway semantic segmentation, which refers to segmenting airway from background and dividing it into anatomical segments, provides clinically valuable information for lung lobe analysis, pulmonary lesion localization, and comparison between different patients. It is technically challenging due to the complicated tree-like structure, individual variations, and severe class imbalance. We propose a structure-aware graph-based network (SGNet) for airway semantic segmentation directly from chest CT scans. The proposed framework consists of a feature extractor combining a multi-task U-Net with a structure-aware GCN, and an inference module comprised of two convolutional layers. The multi-task U-Net is trained to regress bifurcation landmark heatmaps, binary and semantic segmentation maps simultaneously, providing initial predictions for graph construction. By introducing irregular edges connecting voxels with the sampled points around corresponding bifurcation landmarks, the two-layer GCN incorporates the structural prior explicitly. Experiments on both public and private datasets demonstrate that the SGNet achieves superior and robust performance, even on subjects affected by severe pulmonary diseases.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_15

SharedIt: https://rdcu.be/cyhLH

Link to the code repository

N/A

Link to the dataset(s)

https://cloud.tsinghua.edu.cn/d/43bbc05fb9714f71a56f/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents an end-to-end deep learning framework, called Structure-aware Graph-based Network (SGNet), for airway semantic segmentation directly from chest CT scans. Its contribution is a modified U-Net followed by a structure-aware graph convolutional network that simultaneously detects landmarks, segments binary airways, and performs semantic segmentation. The approach uses graphs to introduce structural prior knowledge. The authors promise to release 60 of the annotations they made on public CT scans to promote further study of airway semantic segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of this paper is the combination of a multi-task U-Net model with a structure-aware graph convolutional network. The U-Net gives a rough estimate of branch endpoint landmarks and binary branch segmentation locations. The GCN then refines these rough estimates to produce the final output. The GCN uses graphs to give structure/ordering to the detected branch point landmarks. The graphs consist of two sets of randomly sampled points. The first set is randomly sampled around the branch endpoints and the second set is randomly sampled from the branch segmentation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The construction of the graph is not fully described (see 7 below). The paper does not motivate why the branch point graph structure improves the annotation. It would be good to see results for varying numbers of edges in the graph. It is unclear how many sampled edge landmarks are needed per branch endpoint, and how sensitive the method is to the number of sampled endpoint landmarks, branch landmarks, and graph edges. It is not clear what influence adding the distance to the carina had on performance. The authors claim that certain contributions of their approach boosted performance, but do not demonstrate these gains. For example, they claim that using 32 anatomical segments boosted performance but do not show by how much. They also do not show how much their novel edges boosted performance.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors will provide 60 of 100 of the annotations that they used for training.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. Section 2.2 refers to V1 and V2 in Fig. 2, but these are not labeled in the figure.
    2. It is not entirely clear how the graphs are constructed. The first set of links is clear, i.e., points are connected to their k-nearest neighbors. However, it is not clear how the second set of edges is defined, i.e., “Specifically, if vertex vi is classified on the l-th segment (i.e. S1(vi) = l), it should be connected with the vertices sampled around the two endpoints xA and xB (see Fig. 2).” How many connections are there? Is each internal node connected to a single landmark associated with xA and/or xB?
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method seems sound. The results appear good and are slightly better than those of existing methods with respect to DSC, TPR and FPR.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper presents a method that combines a CNN and a GCN for airway segmentation from CT images. The verification test addresses a challenging problem: recognizing 32 anatomical segments in the airway. The results show that the method achieves higher Dice scores than the baseline algorithm, by margins of 2-5%.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Segmenting airway branches into fine categories of 32 segments is a challenging task. This will be an early work on semantic segmentation dedicated to the airway structure. The verification tests were done with a relatively large number of cases for this purpose. The authors plan to publicly release their annotations of 60 public CT scans.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The ablation study is not convincing because the difference is too small. Since segmenting 32 branches is a very difficult task, mis-recognition of a single branch will reduce the metrics significantly. I wonder whether the network has overfitted to the data.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The details are described sufficiently. The reproduction of this algorithm will be possible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    It would be better to show the remaining failure cases in airway segmentation, together with their causes. It would also be interesting to compare the proposed method with conventional graph-based (not DL-based) branch labeling methods.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This will be an early work on semantic segmentation dedicated to the airway structure, which consists of 32 fine segments.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper proposes an airway semantic segmentation method for simultaneous airway segmentation and semantic segment classification. It combines a 3D U-Net backbone with a graph neural network to refine the initial segmentation results. A multi-task learning strategy is proposed, together with irregular neighbour edges in the graph construction. The authors state that they will release annotations for future study.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Major strengths:
    1) This paper tries to solve a relatively new task, airway semantic segmentation. The task has great clinical value, but related methods have not been fully investigated in the deep learning era.
    2) This paper introduces a multi-task learning strategy into the backbone network, including landmark (bifurcation point) detection, dilated semantic segmentation, and binary segmentation.
    3) This paper adopts a new graph construction approach for the GNN that connects each airway segment voxel to the two end nodes of its segment.
    4) This paper is well-written and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) Although the proposed method solves airway semantic segmentation directly, the paper does not fully describe how it differs from, or compare its performance with, previous “indirect” methods such as Ref. [12], which first segment the airways and then classify each segment. Ref. [12] Zhao, T., Yin, Z., Wang, J., et al.: Bronchus segmentation and classification by neural networks and linear programming. In: MICCAI (2019)
    2) This paper does not consider the anatomical variation of airways. It seems that the proposed method can properly segment only normal structures without pathological changes. For example, the segments B8 and B9 are sometimes complicated, containing only subsegments instead of segments. The authors refer to such variation in the failure case study. However, anatomical variation is one of the major challenges in this topic and needs to be solved.
    3) The reviewer is skeptical about the role of the GCN in improving performance, especially for the gains in the trachea, primary, and lobar airways. These airway segments are thick and large, and are easy to segment well directly via U-Net. Considering that the final segmentation inference module uses features from both the U-Net and the GCN, it is not clear how much effect the GCN has on semantic segmentation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    1) The authors did not release all the details of the CNN and GCN architectures, such as the number of feature map channels.
    2) The authors did not explain the sampling procedure for graph construction well. No details are given about the number of voxels sampled from the whole segmentation region (N_1), the number of voxels sampled around the bifurcation landmarks (N_2), or k in the k-NN computation. The hyper-parameters are unclear.
    3) The authors did not explain well the projection process that transforms the output features of the voxel nodes in the GCN into the corresponding full-resolution feature maps of all voxels.
    4) Code and data are yet to be released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please refer to the comments above for major questions. Some detailed comments are below:
    1) The reviewer does not understand the whole training procedure of the proposed method. Is it end-to-end? It seems to me that the U-Net and GCN are two independent components, and that the GCN relies on the U-Net prediction results for graph construction.
    2) How are the voxel sampling and graph construction implemented? Too many details (such as hyper-parameter settings and sampling methods) are unclear.
    3) Is the graph construction process, which samples voxels from the output landmark heatmaps and the airway semantic segmentation, differentiable? If not, how does joint training of the whole framework work?
    4) As in 3), the projection mechanism that maps the feature vectors of the voxel nodes in the GCN to full-resolution feature maps of all CNN voxels is unclear. Is it differentiable? If not, how does training of the entire framework work?
    5) For the baseline U-Net, it is not clear why its performance on the trachea and primary bronchi is less than 90% and 80%, since both of these segments are very thick.
    6) The reviewer does not understand why the GCN could improve segmentation of the trachea and primary airways by over 2% on average in DSC and TPR. Besides, why did the multi-task (m)UNet perform worse than the vanilla U-Net on these two large airway segments?
    7) It would be more convincing if state-of-the-art methods were compared (including Ref. [12] and other network architectures such as V-Net and Attention U-Net) and a detailed ablation study were conducted on the GCN.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Considering the clinical value, motivation, methodology development and paper writing, the reviewer believes the paper is worth the attention of the MICCAI community, although some weaknesses and unclear or confusing details remain.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    2

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work proposes an airway segmentation method that, in addition to a binary segmentation, also gives a semantic labeling of the airway tree. This problem is challenging, and attempting a solution using modern deep learning based methods is a relevant contribution to the MICCAI community. All reviewers comment favorably on the approach. However, a number of detailed issues are raised regarding the method and the claims drawn from the experiments. In case of acceptance, the authors are encouraged to clarify as many of these issues as possible. In the opinion of the meta reviewer, two crucial issues to clarify are (1) the motivation behind the specific choice of graph structure (given that there is anatomical variation in practice), as well as the construction details of the graph, and (2) the lack of comparison to conventional techniques, which has to be at least stated as a limitation of the work if such a comparison has not been performed.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

We thank all the reviewers for their time and valuable feedback.

General Response:

  • Meta Reviewer/R1/R3 - Graph Construction: We would like to emphasize that the motivation for combining the GCN with irregular edges is to incorporate the structural prior explicitly, which helps to locate each branch. Specifically, we randomly sampled 1024 points for each airway tree to form the first vertex set and sampled 8 points around each bifurcation landmark to form the second vertex set. Each vertex in the first set is connected with its k-nearest neighbors and with the vertices in the second set located around the two endpoints of its segment. We set k=16 empirically.
  • Meta Reviewer/R2/R3 - Anatomical Variation: We thank the reviewers for pointing out that anatomical variation is indeed one of the major challenges in the anatomical semantic segmentation task. We agree with reviewer 3 that our work has not given special consideration to this point, because the topology of the airway is relatively fixed at high levels (i.e. semantic classes are rarely missing). For the same reason, we did not use branch mis-recognition as an evaluation metric (raised by reviewer 2). In future work, we will further explore semantic segmentation of other tubular structures with more anatomical variation, such as cerebrovascular and coronary arteries. The accuracy of branch recognition will also be an important metric.
  • Meta Reviewer/R2 - Comparison to Conventional Techniques: We agree that the lack of quantitative comparison to conventional techniques is one of the limitations of the work and we will state it in the final version.

Specific Response:

  • R1 - Figure 2: We thank the reviewer for pointing this out. We will fix this in the final version.
  • R3 - Difference from Related Work: We would like to emphasize that our motivation is to perform semantic segmentation directly from chest CT scans. Previous “indirect” methods (e.g. Ref. [12,13]) focus on centerline extraction and labeling, and the semantic segmentation is based on labeled centerlines. These methods are multi-stage and the final performance depends on previous stages. In contrast, our method works in an end-to-end manner. Moreover, we evaluated the results of semantic segmentation, rather than the centerline. Ref. [12] Zhao, T., Yin, Z., Wang, J., et al.: Bronchus segmentation and classification by neural networks and linear programming. In: MICCAI (2019) Ref. [13] Automated lobe-based airway labeling. International Journal of Biomedical Imaging, 2012 (2012)

  • Training Strategy: We note that the multi-task U-Net was first pretrained to obtain robust predictions, and then the whole framework was trained jointly. The gradient propagates along the feature vector at each position rather than the voxel coordinates, and therefore the end-to-end differentiable framework can be realized.
  • Performance on Trachea and Primary Bronchi: We agree that the performance on the trachea and primary bronchi is unsatisfactory, which may be due to discontinuous predictions at the interfaces of adjacent semantic classes (see Fig. 3). Introducing the structural prior by combining the GCN indicates the interfaces explicitly and contributes to the improvement. However, this is still a problem to be solved and will be further explored in future work.
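
As an illustration only, the graph construction summarized in the feedback above (1024 points sampled per airway tree, 8 points sampled around each bifurcation landmark, k = 16) might be sketched as follows. This is not the authors' code, which was not released. The function name `build_graph`, the `segment_endpoints` mapping, the uniform jitter controlled by `radius`, and the directed edge list are all assumptions made for the example.

```python
import numpy as np


def build_graph(voxels, labels, landmarks, segment_endpoints,
                n_voxel_samples=1024, n_landmark_samples=8, k=16,
                radius=2.0, seed=0):
    """Hypothetical sketch of a structure-aware airway graph.

    voxels:            (N, 3) coordinates of predicted airway voxels
    labels:            (N,) initial semantic label of each voxel
    landmarks:         (M, 3) predicted bifurcation landmark coordinates
    segment_endpoints: dict mapping a segment label to the indices (a, b)
                       of its two endpoint landmarks (assumed input)
    Returns (vertices, edges); edges are directed (source, target) pairs.
    """
    rng = np.random.default_rng(seed)

    # First vertex set: points randomly sampled from the airway tree.
    n1 = min(n_voxel_samples, len(voxels))
    picked = rng.choice(len(voxels), size=n1, replace=False)
    v1, v1_labels = voxels[picked], labels[picked]

    # Second vertex set: points sampled around every bifurcation landmark.
    # Uniform jitter within `radius` is an assumption of this sketch.
    offsets = rng.uniform(-radius, radius,
                          size=(len(landmarks), n_landmark_samples, 3))
    v2 = (landmarks[:, None, :] + offsets).reshape(-1, 3)

    edges = set()

    # Regular edges: each first-set vertex to its k nearest neighbours
    # within the first set (brute-force pairwise distances).
    dists = np.linalg.norm(v1[:, None, :] - v1[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)  # exclude self-matches
    k_eff = min(k, n1 - 1)
    nearest = np.argsort(dists, axis=1)[:, :k_eff]
    for i in range(n1):
        for j in nearest[i]:
            edges.add((i, int(j)))

    # Irregular edges: each first-set vertex to all points sampled around
    # the two endpoint landmarks of the segment it was assigned to.
    for i, label in enumerate(v1_labels):
        if int(label) not in segment_endpoints:
            continue
        for lm in segment_endpoints[int(label)]:
            start = n1 + lm * n_landmark_samples
            for j in range(start, start + n_landmark_samples):
                edges.add((i, j))

    vertices = np.concatenate([v1, v2], axis=0)
    return vertices, sorted(edges)
```

Under these assumptions, each first-set vertex receives k nearest-neighbour edges plus 2 × 8 irregular edges to the points around the endpoints of its segment, so the edge count grows linearly with the number of sampled voxels.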


