Authors
Qingsong Yao, Quan Quan, Li Xiao, S. Kevin Zhou
Abstract
The success of deep learning methods relies on the availability of large annotated datasets; however, curating such datasets is burdensome, especially for medical images. To relieve this burden for a landmark detection task, we explore the feasibility of using only a single annotated image and propose a novel framework named Cascade Comparing to Detect (CC2D) for one-shot landmark detection. CC2D consists of two stages: 1) self-supervised learning (CC2D-SSL) and 2) training with pseudo-labels (CC2D-TPL). CC2D-SSL captures the consistent anatomical information in a coarse-to-fine fashion by comparing cascade feature representations and generates predictions on the training set. CC2D-TPL further improves the performance by training a new landmark detector with those predictions. The effectiveness of CC2D is evaluated on a widely-used public dataset for cephalometric landmark detection, on which it achieves a competitive detection accuracy of 81.01% within 4.0mm, comparable to state-of-the-art fully-supervised methods that use many more training images.
Link to paper
DOI: https://doi.org/10.1007/978-3-030-87196-3_17
SharedIt: https://rdcu.be/cyl1H
Link to the code repository
https://github.com/ICT-MIRACLE-lab/Oneshot_landmark_detection
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
This paper presents a deep learning framework for landmark detection, CC2D, which learns from a single annotated image and a number of unannotated images. The framework first uses a self-supervised learning step to learn to approximate the landmark positions for all training images and then trains a CNN-based landmark detector using the approximated landmark positions. CC2D was evaluated on a public cephalometric dataset and achieves good performance.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- CC2D is a novel framework to automatically generate landmark annotations using a single annotated training image.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The authors report the performance to be “comparable to the state-of-the-art fully-supervised methods”. However, the results in Table 1 do not support this. The best performing 4mm SDR (successful detection rate; see the sketch after this list) is 89.85%, compared to 81.01% for the proposed method. For the clinically relevant 2mm SDR, the best performing method achieves 73.33%, compared to 49.81% for the proposed method. While the results are impressive given just a single annotated image, they are not comparable to the state of the art.
- The training/testing set-up is not clear. The authors state that they use a 150/250 image split (as in the related literature). However, from the wording it is not clear whether these are the very same images within the train/testing sets as previously reported, or whether the sets just contain the same numbers of images. It is also not mentioned how the single annotated image was chosen.
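For reference, the SDR (successful detection rate) within r mm is the fraction of predicted landmarks whose radial error falls below r. A minimal sketch, assuming a placeholder pixel spacing rather than the benchmark's official value:

import numpy as np

def sdr(pred, gt, radius_mm, spacing_mm=0.1):
    # pred, gt: (N, 2) arrays of pixel coordinates.
    # spacing_mm converts pixels to millimetres; 0.1 is a placeholder,
    # use the dataset's official spacing in practice.
    err = np.linalg.norm(pred - gt, axis=1) * spacing_mm
    return float(np.mean(err <= radius_mm))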
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- The dataset is publicly available.
- Parameter details are specified but no link to code is provided.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- I would recommend for the authors to rephrase the statement about achieving comparable results to fully-supervised state-of-the-art methods.
- “On one hand” should be “On the one hand”.
- In Figure 2 it says “1th” instead of “1st” on one occasion.
- [6] is mentioned as having “satisfactory performance”. However, looking at the paper suggests that [6] achieves state-of-the-art performance.
- “testset” should be “test set”.
- [10] and [14] do not list performance values as per Table 1 - are these the wrong references?
- Should “most of the landmarks in Fig. 2(b)” be “most of the landmarks in Fig. 4(b)”?
- Please state your overall opinion of the paper
accept (8)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This work presents an interesting first step towards automatic landmark annotation from very limited labelled data (i.e. a single annotated image), which will be of interest to the MICCAI community. However, the results are currently not competitive with fully-supervised state-of-the-art methods. While learning from very few training samples would certainly be beneficial, for clinical applications performance would be the deciding criterion when choosing a method.
- What is the ranking of this paper in your review stack?
2
- Number of papers in your stack
3
- Reviewer confidence
Confident but not absolutely certain
Review #2
- Please describe the contribution of the paper
The paper proposes a novel method for one-shot anatomical landmark detection. The idea is to 1) create pseudo-labels via template matching, where the matching is done in the feature space of a self-supervised model, and 2) train an end-to-end landmark detection model on those pseudo-labels.
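For intuition, a minimal numpy sketch of such feature-space template matching follows; the function names are illustrative and this is not the authors' implementation (which is linked above).

import numpy as np

def similarity_map(feat, query):
    # Cosine similarity between a query vector (C,) and every spatial
    # location of a (C, H, W) feature map.
    c, h, w = feat.shape
    flat = feat.reshape(c, -1)
    flat = flat / (np.linalg.norm(flat, axis=0, keepdims=True) + 1e-8)
    query = query / (np.linalg.norm(query) + 1e-8)
    return (query @ flat).reshape(h, w)

def pseudo_label(template_feat, target_feat, landmark_yx):
    # Transfer one annotated landmark from the template image to an
    # unlabeled target image: take the template feature at the landmark
    # and pick the best-matching location in the target feature map.
    y, x = landmark_yx
    sim = similarity_map(target_feat, template_feat[:, y, x])
    return np.unravel_index(np.argmax(sim), sim.shape)

Note that CC2D performs this matching as a coarse-to-fine cascade over several feature scales rather than in one shot, restricting the search at each finer level to a neighbourhood of the coarser estimate.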
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novel method and excellent idea
- Clean approach
- Ablation study presented
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The paper lacks proper statistical reporting. The authors need to re-run the experiments multiple times and report the standard errors over the runs.
- Evaluations in other domains could also strengthen the paper
- Please rate the clarity and organization of this paper
Excellent
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The method seems to be easily reproducible, but I cannot really guarantee that the results in the tables are easily replicable.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
Dear authors, please look at the weaknesses section and address the reporting issues in the camera-ready version. Otherwise, good work!
- Please state your overall opinion of the paper
accept (8)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
New general method, clean approach, and good empirical results.
- What is the ranking of this paper in your review stack?
1
- Number of papers in your stack
6
- Reviewer confidence
Confident but not absolutely certain
Review #3
- Please describe the contribution of the paper
This paper makes a preliminary attempt to learn landmark detection from only one labeled image. A coarse-to-fine self-supervised learning strategy is proposed to train the feature extractor. Experimental results show that the proposed method achieves a competitive detection accuracy compared to fully-supervised methods.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The motivation for using very few labels to train a landmark detection network is good.
- The idea of using self-supervised learning to train the feature extractor is interesting.
- The manuscript is well-written.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Not compared with other few-shot landmark detection methods, such as [1].
- The usefulness of the first stage CC2D-SSL is not validated. It can be validated by comparing with other one-shot methods, or unsupervised methods like kNN, or even the performance before training.
- It is important for one-shot methods to be able to generalize to the few-shot setting, but no few-shot experiment result is provided.
- It is unexpected that fine-tuning with pseudo-labels could significantly improve the performance (from 68.38% to 81.01% in terms of 4mm SDR), but the reason is not clearly explained.

[1] B. Browatzki and C. Wallraven. “3FabRec: Fast Few-Shot Face Alignment by Reconstruction.” CVPR 2020.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Need the original code to reproduce the result.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Please compare with at least 1 one-shot method and 1 unsupervised method.
- Please provide results of using 5 and 10 labels in training.
- Please provide more insightful analysis on the improvement of Stage 2, if possible.
- There are typos such as “1th” and “2th” (should be “1st” and “2nd”) in the manuscript.
- In stage 1, simply using random rotation and color jittering to simulate unseen test images is not realistic enough, which may limit the generalization ability of the feature extractor (an illustrative sketch of such an augmentation pipeline follows this list). Information from the unlabeled images may help if it can be naturally incorporated into stage 1.
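For concreteness, the stage-1 augmentation referred to above might look like the following torchvision pipeline; the parameter values are illustrative assumptions, not the paper's settings.

import torchvision.transforms as T

# Illustrative parameters only; the paper's exact augmentation
# settings are not reproduced here.
augment = T.Compose([
    T.RandomRotation(degrees=10),
    T.ColorJitter(brightness=0.3, contrast=0.3),
])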
- Please state your overall opinion of the paper
Probably accept (7)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper makes an early attempt at few-shot landmark detection. Although there are some flaws in the experiments, it can serve as an important baseline in this field.
- What is the ranking of this paper in your review stack?
2
- Number of papers in your stack
3
- Reviewer confidence
Very confident
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
The work presents a one-shot learning approach for medical landmark detection, evaluated on a cephalometric dataset. One-shot learning is a relevant idea in medical image analysis, since it allows costly and tedious expert annotation effort to be largely reduced, and this work presents a step towards that goal. The reviewers agree on the relevance of the work and are all in favour of acceptance. Some weaknesses are mentioned, including the need to cite the latest results on the benchmark dataset and to discuss the clinically relevant 2mm detection accuracy, where there is still quite a gap to fully-supervised methods. In case of acceptance, the final paper should be revised to address at least these issues and, if possible, the other issues raised by the reviewers. Overall, very good work that is of interest to the MICCAI community.
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).
1
Author Feedback
We thank all three reviewers for providing such valuable comments! We address their concerns below:
Q1: While the results are impressive for just using a single annotated image, they are not comparable to the state-of-the-art. A1: Yes, we will correct this in the final version. Moreover, we have fixed a bug in the training code and further improved the performance.
Q2: The training/testing set-up is not clear. The authors state that they use a 150/250 image split (as in the related literature). However, from the wording it is not clear whether these are the very same images within the train/testing sets as previously reported, or whether the sets just contain the same numbers of images. It is also not mentioned how the single annotated image was chosen. A2: We chose image #125 of the training set. We use the same images as provided by the official website. We will also report, in a footnote, the mean and standard deviation of the performance over 10 randomly chosen template images.
Q3: The paper lacks proper statistic reporting. The authors need to re-run the experiments multiple times, and report the standard errors over the runs. A3: We will repeat our experiments and report std in the final version.
Q4: Not compared with other few-shot landmark detection methods. The usefulness of the first stage CC2D-SSL is not validated. A4: We will compare our method with those methods in future work.
Q5: It is unexpected that fine-tuning with pseudo-labels could significantly improve the performance (from 68.38% to 81.01% in terms of 4mm SDR), but the reason is not clearly explained. A5: As stated in the first paragraph on page 3: “On the other hand, recent findings show that training an over-parameterized network from scratch tends to learn noiseless information first. In our case, as we cannot predict every training point as accurately as the ground truth in the SSL stage, a newly trained landmark detector can improve the performance by capturing the regular information hidden in the noisy labels produced by the CC2D-SSL stage.”
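To make the retraining step concrete, here is a minimal PyTorch sketch of the TPL idea. This is an illustration under stated assumptions, not the authors' code: the toy network, the Gaussian sigma, and the MSE heatmap loss are placeholders standing in for the real architecture and training setup.

import torch
import torch.nn as nn

def gaussian_heatmap(h, w, yx, sigma=3.0):
    # Render one Gaussian pseudo-heatmap centred on a (y, x) pseudo-label.
    ys = torch.arange(h).float().view(-1, 1)
    xs = torch.arange(w).float().view(1, -1)
    y, x = yx
    return torch.exp(-((ys - y) ** 2 + (xs - x) ** 2) / (2 * sigma ** 2))

# Toy stand-in for the real landmark detector (19 cephalometric landmarks).
detector = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 19, 3, padding=1),
)
opt = torch.optim.Adam(detector.parameters(), lr=1e-3)
mse = nn.MSELoss()

def tpl_step(image, pseudo_yx):
    # One training step on a (1, 1, H, W) image whose 19 pseudo-labels
    # came from the CC2D-SSL stage.
    _, _, h, w = image.shape
    target = torch.stack([gaussian_heatmap(h, w, p) for p in pseudo_yx])
    loss = mse(detector(image), target.unsqueeze(0))
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

The intuition matches the quoted passage: a freshly trained over-parameterized network tends to fit the signal that is consistent across many pseudo-labeled images before memorizing the individual errors of the pseudo-labels, so the retrained detector can outperform the SSL predictions it was trained on.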
Q6: It is important for one-shot methods to be able to generalize to the few-shot setting, but no few-shot experiment result is provided. A6: We will support the few-shot setting in future work.