
Authors

Heqin Zhu, Qingsong Yao, Li Xiao, S. Kevin Zhou

Abstract

Detecting anatomical landmarks in medical images plays an essential role in understanding the anatomy and planning automated processing. In recent years, a variety of deep neural network methods have been developed to detect landmarks automatically. However, all of those methods are unary in the sense that a highly specialized network is trained for a single task, say, associated with a particular anatomical region. In this work, for the first time, we investigate the idea of “You Only Learn Once (YOLO)” and develop a universal anatomical landmark detection model that realizes multiple landmark detection tasks with end-to-end training on mixed datasets. The model consists of a local network and a global network: the local network is built upon the idea of the universal U-Net to learn multi-domain local features, and the global network is a parallelly-duplicated sequence of dilated convolutions that extract global features to further disambiguate the landmark locations. It is worth mentioning that the new model design requires far fewer parameters to train than models with standard convolutions. We evaluate our YOLO model on three X-ray datasets totaling 1,588 images of the head, hand, and chest, collectively contributing 62 landmarks. The experimental results show that our proposed universal model performs considerably better than previous models trained on multiple datasets; it even beats the performance of models trained separately on every single dataset. Our code is available at https://github.com/ICT-MIRACLE-lab/YOLO_Universal_Anatomical_Landmark_Detection.
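
The parameter savings stem from the local network's domain-separable convolutions: spatial (depthwise) filters are shared across all datasets, while only the cheap pointwise (1×1) filters are duplicated per domain. Below is a minimal PyTorch sketch of this idea, assuming the shared-depthwise/domain-specific-pointwise split of the universal U-Net the abstract references; class and argument names are illustrative, not the released implementation.

```python
import torch
import torch.nn as nn

class DomainSeparableConv2d(nn.Module):
    """Depthwise spatial filters shared across domains; pointwise
    (1x1) filters duplicated per domain. Only the pointwise part
    scales with the number of domains, keeping the parameter count
    far below one full network per dataset."""

    def __init__(self, in_ch: int, out_ch: int, num_domains: int):
        super().__init__()
        # Shared depthwise conv: one 3x3 spatial filter per input channel.
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3,
                                   padding=1, groups=in_ch)
        # Domain-specific pointwise convs: one 1x1 conv per dataset.
        self.pointwise = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=1)
            for _ in range(num_domains)
        )

    def forward(self, x: torch.Tensor, domain: int) -> torch.Tensor:
        return self.pointwise[domain](self.depthwise(x))

# Example: head/hand/chest share spatial filters but keep their
# own channel-mixing weights.
conv = DomainSeparableConv2d(in_ch=32, out_ch=64, num_domains=3)
y = conv(torch.randn(1, 32, 128, 128), domain=0)  # a head-domain batch
```

Only the `pointwise` list grows with the number of domains, which is why adding a dataset costs far less than training a separate network.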

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_9

SharedIt: https://rdcu.be/cyl5C

Link to the code repository

https://github.com/ICT-MIRACLE-lab/YOLO_Universal_Anatomical_Landmark_Detection

Link to the dataset(s)

https://github.com/ICT-MIRACLE-lab/YOLO_Universal_Anatomical_Landmark_Detection


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a deep learning system that learns to detect anatomical landmarks from multiple datasets at once. The proposed architecture, GU2Net, contains a local universal component and a global application-specific component. GU2Net has been evaluated using three datasets (head, hand, chest) and yields state-of-the-art performance.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • GU2Net presents a novel idea to train the landmark detection network on multiple datasets at once.
    • GU2Net achieves state-of-the-art performance for the head and hand datasets (difficult to judge for the chest dataset).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The alternative methods in Table 1 are incomplete: [6] does not mention the head results listed for Ibragimov et al. (potentially a wrong reference?); [13] does not mention the head or hand results listed for Lindner et al. (potentially a wrong reference?); [20] and [16] used slightly fewer images and a 3-fold cross-validation set-up to obtain their hand results.
    • The public head dataset contains annotations from two doctors; in [21] the average annotations were used, but it is not clear whether this is also the case in this study.
    • Judging performance for the chest test results is difficult given the limited number of alternative results provided; a public landmark-annotated chest dataset is available and would have allowed comparison to alternative methods (JSRT Database, http://db.jsrt.or.jp/eng.php, https://www.isi.uu.nl/Research/Databases/SCR/).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The image datasets are publicly available.
    • No reference to the landmark annotations of the chest dataset is provided.
    • Parameter details are specified but no link to code is provided.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • The authors may want to correct the references in Table 1 and highlight differences in the experimental set-up.
    • The authors may want to check/clarify why in Table 2 the MRE is provided in mm but the SDR in px.
    • It is not clear how the chest image results feed into the results from Table 2 as they were reported as pixel distance (instead of mm) in Table 1.
    • The authors may want to add how the examples for Figure 2 were chosen (at random?, best performing?).
    • The authors may want to experiment with applying some smoothing to the output heatmaps (a minimal sketch follows below).
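
A quick sketch of the suggested heatmap smoothing, assuming the predicted heatmap is available as a NumPy array; the kernel width `sigma` is a hypothetical choice that would need tuning per dataset.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_heatmap(heatmap: np.ndarray, sigma: float = 1.5) -> np.ndarray:
    """Gaussian-smooth a predicted heatmap before taking the argmax.

    Smoothing suppresses isolated spurious peaks, so the maximum is
    more likely to sit on the true landmark mode.
    """
    return gaussian_filter(heatmap, sigma=sigma)

# Landmark = location of the (smoothed) maximum response.
hm = np.random.rand(512, 512).astype(np.float32)  # stand-in prediction
y, x = np.unravel_index(np.argmax(smooth_heatmap(hm)), hm.shape)
```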
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel idea of utilizing multiple datasets at once to train a landmark detection network. I believe the results are of interest to the MICCAI community. However, the comparison to alternative methods in Table 1 lacks detail. Overall the paper is largely well written.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper claims two contributions: first, it learns to detect landmarks on multiple anatomies at once using a global-local architecture; second, it shows state-of-the-art performance on these anatomies, thanks to this architecture.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper, in my opinion, is the state-of-the-art performance on the three datasets used, at a parametrically cheap price. Such a strong performance offers interesting insights into the commonalities of these vastly different anatomies. Also, the paper is well written and structured, and sensible baselines are chosen.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. My main concern with the paper is the premise that ‘it is desirable to develop a model which is learned once and works for all the tasks’. In medical image analysis, where failures can have severe consequences, incorporating as much domain-specific knowledge as possible is always an advantage. However, I agree that an effort towards generalisability across anatomies could open new research directions. Thus, I’ve ignored this reservation for the rest of my review.

    2. Architecture: The local network’s separable convolutions seem to be the key component of this work. However, they were introduced in [4] and are adapted here to predict Gaussian heatmaps (a sketch of such heatmap targets follows after the references below). The philosophy of local-global processing is inspired by [16], as mentioned in the paper. The addition of a global network with domain-specific parallel branches is thus of incremental novelty.

    3. Universality: Calling this architecture ‘universal’ is misleading. The set of anatomies/domains has to be known before training, and the test sample should fall within this set. I do not find any component in the method that enables universality. Moreover, the work is missing a study of the contribution of universality. I mean: did the fact that the network learns on multiple anatomies help it in any way?

    [4] Huang, C., et al.: 3D U2-Net: A 3D universal U-Net for multi-domain medical image segmentation. MICCAI 2019.
    [16] Payer, C., et al.: Integrating spatial configuration into heatmap regression based CNNs for landmark localization. Medical Image Analysis 2019.
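
For context, the Gaussian heatmap targets mentioned above are typically generated as follows — a minimal sketch assuming one heatmap channel per landmark and an isotropic sigma; this is the standard recipe, not necessarily the paper's exact formulation.

```python
import numpy as np

def gaussian_heatmap(shape, center, sigma=3.0):
    """Target heatmap for one landmark: an isotropic Gaussian
    centered on the annotated (x, y) coordinate."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

# One channel per landmark; the network regresses this stack.
target = np.stack([gaussian_heatmap((256, 256), (cx, cy))
                   for cx, cy in [(40, 50), (120, 200)]])
```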

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No public implementation is provided. Sections 3.1 and 3.2 are informative enough to facilitate reproduction, though I request two details:

    1. It is not clear what LR being [1e-4, 1e-2] means. I imagine this is a cyclic scheduler component? If so, please mention it. Also mention other scheduling parameters and initialisation-related minutiae.

    2. How was the final inference model chosen?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. As mentioned above, the work has merit thanks to the SOTA performance. However, a study of WHY this performance is achieved would be interesting. In Table 1, for example, the performance of \phi_LN is on par with GU2Net; at least the MRE is within a standard deviation. So, how is \phi_GN contributing?

    2. An interesting baseline would be a domain-wise comparison with U-Net: ‘GU2Net vs U-Net with one anatomy -> GU2Net vs. U-Net with two anatomies -> and so on.’.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the approach is limited, not warranting a clear acceptance at MICCAI in its current form. The community could, however, certainly benefit from learning about the work due to its SOTA performance. So, I strongly recommend a workshop submission and a possible extension.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The authors developed a network that jointly extracts local and global features from 2D X-ray images to detect anatomical landmarks; the network is trained directly on mixed datasets from multiple anatomical regions. By evaluating the proposed network on three datasets, i.e., 2D X-ray images of the head, hand, and chest, the authors demonstrate that their method performs better than related models that are specially designed for each dataset separately.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed network is a universal model for landmark detection that can be trained on mixed datasets containing different anatomical structures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The evaluation of the proposed method is less than adequate; it is unclear whether the model can be applied to 3D volumetric images or trained on mixed datasets spanning multiple imaging modalities.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this paper is high with its open code and data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The authors assume that all landmarks to be detected appear in each image, but does the model still work if some landmarks are missing from a test image? That is, can the model automatically recognize missing landmarks in the test image?

    2. The statement “we believe that there are common knowledge among the seemingly different anatomical regions, observing that the local features of landmarks from different datasets share some characters (such as likely locating at corners, endpoints, extrema of curves or surfaces, etc.)” seems incorrect: the landmarks defined in clinical practice have their own unique clinical meanings and do not necessarily share the same local features in images.

    3. In the experimental results, the false positive and false negative rates of the detections should be reported.

    4. The authors resized the images during training; is this operation also needed during testing?

    5. In the description “exclude cases that are labeled as abnormal lungs to form our experimental dataset” in Section 3.2, what does “abnormal lungs” refer to?

    6. For the statement “since the physical spacing is not known, we use pixel distance to measure model’s performance” in Section 3.2, why can the image spacing not be obtained? I would think the spacing, as meta-information, can be read directly from the images.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The idea of the proposed method seems straightforward: 1) the strategy of jointly extracting local and global features from images is not hard to find in the related literature; 2) the separable convolution used in the local network is very similar to that reported in Ref. [4].

    2. The authors claim that their method is trained on multiple domains, while the evaluation is performed only on images of a single imaging modality, i.e., X-ray. Besides, the method was verified to perform well on 2D images, but it is unclear whether it can yield consistently good performance on 3D volumetric images, which are more common in clinical practice.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work attempts to train a “universal” landmark detector that is applicable in different scenarios (in the paper, head, hand, and chest X-ray data are evaluated) without requiring retraining for each specific scenario. Reviewers agree that this is a meaningful contribution; however, some concerns are raised regarding the actual methodological contribution of the network architecture, since local-global architectures are abundant, as well as aspects of the experimental evaluation which need clarification. Authors are encouraged to identify and address the main points of criticism from the reviews, and to incorporate the two main issues mentioned above in their rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

We sincerely thank all reviewers and the AC for their time and effort. Per the AC’s request, we address the main points of criticism from the reviews. Also, we will release the source code and the labeled chest landmarks if this paper is accepted.

Common concern: Q1: The novelty of the architecture. Local-global architectures are abundant. How is \phi_GN contributing? A1: When dealing with landmark detection for three different anatomical regions (head, hand, and chest in this paper), the conventional wisdom [13][16][20] is to design one network architecture but train three separate networks. Our GU2Net is the first single network in the literature to record SOTA detection performance on all three datasets, hence the universality. Further, it uses much fewer parameters. We are proud that our network, inspired by SCN [16] and U2Net [4] but with seemingly simple yet important modifications, achieves the effect of ‘1+1=3’!

Local-global processing effectively combines both local and global features and further improves performance. Table 2 shows that GU2Net achieves a gain of 0.06 (5%) and 1.17 (50.6%) in MRE compared to \phi_LN and \phi_GN, respectively. Specifically, \phi_GN is a lightweight network (0.45M parameters) designed to compute a global heatmap and locate landmarks roughly, while \phi_LN uses parallel separable convolutions to learn multi-domain knowledge and compute an accurate local heatmap. Together, the two empower GU2Net to integrate the common knowledge of multiple datasets while allowing domain specificity, reaching SOTA performance on all three tasks (see Table 1), whereas U2Net [4] does not reach SOTA performance for segmentation.
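
To make this division of labor concrete, the following is a hedged sketch of the global branch and the fusion step: a lightweight stack of dilated convolutions yields a coarse global heatmap, which is combined with the local heatmap element-wise, in the spirit of SCN [16]. Channel widths, dilation rates, and the exact fusion rule are assumptions for illustration; the released code is authoritative.

```python
import torch
import torch.nn as nn

class GlobalNet(nn.Module):
    """Lightweight stack of dilated convolutions: a large receptive
    field for a rough, globally consistent landmark estimate."""

    def __init__(self, in_ch: int, n_landmarks: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=4, dilation=4), nn.ReLU(),
            nn.Conv2d(32, n_landmarks, 3, padding=8, dilation=8),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def fuse(local_hm: torch.Tensor, global_hm: torch.Tensor) -> torch.Tensor:
    # Element-wise product: the global branch disambiguates among
    # locally plausible candidates; the local branch keeps precision.
    return local_hm * torch.sigmoid(global_hm)
```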

In the following, we address reviewers’ specific concerns.

Reviewer 1: Q1: In [21] the average labels of the head dataset were used. A1: We also used the average labels and we will specify this in the final version.

Q2: Limited amount for chest test results. A2: Thanks for providing the datasets. We will extend our chest dataset in the final version.

Other details:

  • In Table 1, ‘-‘ means that no experimental results can be found in the original paper.
  • In Table 2, the MRE should be in px (mm is a typo); a sketch of how MRE and SDR are computed follows below.
  • The examples in Figure 2 were randomly selected from well-performing results.
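
For reference, the two reported metrics can be computed as follows — a minimal sketch assuming predicted and ground-truth landmark coordinates in a common unit (px for the chest dataset, mm for the others).

```python
import numpy as np

def mre(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean radial error: average Euclidean distance between
    predicted and ground-truth landmarks; both arrays are (N, 2)."""
    return float(np.linalg.norm(pred - gt, axis=1).mean())

def sdr(pred: np.ndarray, gt: np.ndarray, radius: float) -> float:
    """Successful detection rate: fraction of landmarks whose
    radial error falls within the given radius."""
    return float((np.linalg.norm(pred - gt, axis=1) <= radius).mean())
```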

Reviewer 2: Q1: The work is missing a study of universality. A1: Thank you for the concern. We believe that it is easy to distinguish the domain (head, hand, or chest). We will add a classification network to enable total universality in the final version, and GU2Net will automatically select the corresponding domain-specific convolutions according to the classification result.

Other details:

  • LR being [1e-4, 1e-2] means the base_lr and max_lr of the cyclic scheduler (see the sketch below).
  • The inference model was chosen as the one with minimum validation loss.
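
In PyTorch this corresponds to something like the following; only base_lr and max_lr come from the paper, while the optimizer choice and step_size_up are assumptions for illustration.

```python
import torch

model = torch.nn.Conv2d(1, 1, 3)  # stand-in for GU2Net
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# base_lr/max_lr are the [1e-4, 1e-2] from the paper; step_size_up is
# a hypothetical value. cycle_momentum=False is required because Adam
# has no momentum parameter for the scheduler to cycle.
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-2,
    step_size_up=2000, cycle_momentum=False)

scheduler.step()  # called once per iteration inside the training loop
```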

Reviewer 3: Q1: How to deal with missing landmarks? A1: There are no missing landmarks in the three datasets. To evaluate this, we crop the head image on the bottom and right sides by ratios of 0.7, 0.8, and 0.9, and then feed it into our inference model. We observe that landmarks inside the cropped image can still be precisely detected, while the missing landmarks have an extremely low response in the heatmap. Thus, we can use a threshold to filter out missing landmarks.
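
A minimal sketch of the thresholding described above, assuming per-landmark heatmaps stacked in a NumPy array; the threshold value is a placeholder, not the paper's setting.

```python
import numpy as np

def detect(heatmaps: np.ndarray, threshold: float = 0.3):
    """heatmaps: (n_landmarks, H, W). Returns an (x, y) tuple per
    landmark, or None where the peak response is too weak, i.e. the
    landmark is treated as missing (e.g. cropped out of the image)."""
    points = []
    for hm in heatmaps:
        y, x = np.unravel_index(np.argmax(hm), hm.shape)
        points.append((x, y) if hm[y, x] >= threshold else None)
    return points
```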

Q2: The statement about local features seems to be incorrect; landmarks do not necessarily share the same local features in images. A2: We agree that landmarks don’t necessarily share the same local features, but this does not conflict with our method. Our GU2Net has domain-specific parameters to learn unique features for each domain as well as shared parameters to learn common features.

Other details:

  • We resized images during training and testing.
  • The chest dataset contains only PNG images and offers no information about spacing.
  • We will extend GU2Net to support 2D, 3D, and other modalities at the same time in future work.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Authors provided a convincing rebuttal regarding their claim of a universal landmark detector. Concerns regarding the experimental evaluation were addressed and clarified. I disagree with Reviewer 3 that the evaluation of the proposed method is less than adequate. This meta-reviewer thinks that the contribution of the work is sufficiently interesting to be presented at MICCAI. The authors are encouraged to incorporate all indicated changes and to perform a final proofread to ensure the high quality of MICCAI papers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the rebuttal, the authors addressed all the important points of criticism. However, I agree with Reviewer 2 that “universality” is overstated and could mislead the reader. The sentence “a model which is learned once and works for all the tasks” gives the impression that their “You Only Learn Once” method is capable of localizing arbitrary landmarks, even in anatomies for which it was not trained. This, of course, is not true. However, in the response the authors only deal with the question of how to recognize one of the three anatomies the model is trained on. Further, I don’t think the authors have shown that learning multiple tasks is beneficial. As Reviewer 3 noted and the authors confirmed, it is a question of how much the different tasks have in common. In the ablation studies, the experiments concentrate on showing the benefit of using the architecture proposed in [16], while the question of how much simultaneous learning of tasks improves performance is neglected. Experiments where the model is trained on each task independently, and on combinations of two tasks, should answer this question. Finally, I think the work could be better motivated if the model were trained to localize the same landmark in different modalities; in that case, it would be intuitive to have a combination of unique and common features. Having said that, I think the manuscript is mature enough and has the quality needed to be published at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The submission proposes a DL-based approach for anatomical landmark detection on multiple anatomies, trained on multiple datasets. The key strength of the method is that only one single network is needed for multiple tasks, achieving SOTA performance. While this is impressive (and of interest to the MICCAI community), the reviewers raised the concern that the methodological novelty may be limited and that it would be more insightful to investigate why it achieves SOTA in most cases. In the rebuttal, the authors addressed some of the concerns, e.g., clarified that the proposed architecture is a single architecture for all tasks with fewer parameters. However, major concerns regarding the “universal” character of the proposed method, the evaluation, and the potentially wrong references remained rather open.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    20


