
Authors

Yankun Lang, Hannah H. Deng, Deqiang Xiao, Chunfeng Lian, Tianshu Kuang, Jaime Gateno, Pew-Thian Yap, James J. Xia

Abstract

Dental landmark localization is a fundamental step to analyzing dental models in the planning of orthodontic or orthognathic surgery. However, current clinical practices require clinicians to manually digitize more than 60 landmarks on 3D dental models. Automatic methods to detect landmarks can release clinicians from the tedious labor of manual annotation and improve localization accuracy. Most existing landmark detection methods fail to capture local geometric contexts, causing large errors and misdetections. We propose an end-to-end learning framework to automatically localize 68 landmarks on high-resolution dental surfaces. Our network hierarchically extracts multi-scale local contextual features along two paths: a landmark localization path and a landmark area-of-interest segmentation path. Higher-level features are learned by combining local-to-global features from the two paths by feature fusion to predict the landmark heatmap and the landmark area segmentation map. An attention mechanism is then applied to the two maps to refine the landmark position. We evaluated our framework on a real-patient dataset consisting of 77 high-resolution dental surfaces. Our approach achieves an average localization error of 0.42 mm, significantly outperforming related state-of-the-art methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_46

SharedIt: https://rdcu.be/cyhQ1

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper presents some improvements to the MeshSegNet method (described in [11]) in order to better localize anatomical landmarks on 3D meshes representing dental models.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Interesting results, especially on patients with defects.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The improvements over the MeshSegNet method are not very clearly presented. The evaluation of the results is too short.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The results are not directly reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. Introduction
- Digitization is not localization: in general, digitization gives a 3D mesh.
- Another way to get a dental model is to scan a dental cast.
- What is the “standard” of care that requires localizing 60 landmarks?
- What is the “training” you mention in “consumption during training”?
- Detail the modalities of “medical images” (2D/3D, US, CT, X-ray…).
- Other ways to localize landmarks are 1) to map a template (i.e. an atlas or an average model) onto the dental model by using a non-rigid registration algorithm. If you have previously identified landmarks on the template, you can transfer them to your data (see for example https://www.pnas.org/content/108/45/18221 ) 2) to analyze the local shape of the 3D mesh, in general by using curvature values (see for example http://www.cad-journal.net/files/vol_9/CAD_9(6)_2012_747-769.pdf ). Did you analyze these kinds of methods?
- The recent reference https://arxiv.org/pdf/2012.12946.pdf could be of interest.
2. Method
- In fact, your work is an improvement of the MeshSegNet architecture. You should emphasize this point by:
  - Showing in Figure 2 the differences from Figure 1 of [11].
  - Discussing the improvements. In 2.1, you add Gaussian, maximum and minimum curvatures. But is Gaussian curvature useful, as it is the product of the maximum and minimum curvatures? How exactly do you compute these values? In 2.1, you talk about the localization path, which is new. Explain its interest. We can understand what A_S1 and A_S2 are by referring to [11], but A_L1 and A_L2 are new and we only know that A is an “adjacent matrix”, without any explanation of how to compute it. 2.2 is new and should be justified: detail what the objective is (we have only one sentence about it at the end of Section 1), and then you can give all the technical details.
3. Experiments
- Why do you perform data augmentation? How do you choose the rotation, translation and scaling parameter ranges?
- Why do you normalize the feature matrix? What is GNC?
- Explain precisely your evaluation parameters: is RMSE computed between corresponding landmarks (estimated and true ones)? What is “misdetection”?
- Figure 5 is very interesting but the discussion is much too short. How do you detect the absence of some teeth? Is it possible that some landmarks are missing on a tooth (for example a cusp which was broken) and how do you process it?
4. Conclusion
This part is uninteresting as there is neither perspective nor future work.
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The improvements over the MeshSegNet method are not very clearly presented. The evaluation of the results is too short.
What is the ranking of this paper in your review stack?

4
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

This paper presents a method for dental landmark localization named DLLNet. It hierarchically extracts multi-scale local contextual features along two paths: a landmark localization path and a landmark area segmentation path. It uses a coarse-to-fine framework to first conduct coarse detection followed by refined detection. The coarse detection is built upon a published work named MeshSegNet for teeth segmentation. The main contribution from the paper is the proposition of DLLNet, which claims to extract multi-scale local contextual features. The authors concluded that the addition of curvature features and attention mechanism had led to improved performance.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The main strengths of the paper are as follows:
- the proposition of DLLNet
- the use of Gaussian curvatures, in addition to regular geometric features
- use of attention mechanism
- results are convincing (quantitatively and qualitatively) through ablation study and visual presentations
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
This paper lacks clarity in certain areas. In particular, there are questions arising from reading the paper:
1. Although DLLNet is proposed to work on meshes with different numbers of cells, it is unclear what the DLLNet models are trained on. In particular for Fig. 1, the first DLLNet seems to be trained and then applied on the teeth partitions, whereas the second DLLNet seems to be trained and then applied on the sampled meshes (obtained from the coarse detected landmarks). Are the two DLLNet models separate and different from each other? Is it possible to train a DLLNet for each individual tooth, rather than one for all? In Fig. 1, what do the rectangles above “Teeth partition” mean?
2. Robustness of the proposed approach. The 3D scan seems to be smooth and complete. In reality, 3D scans are noisy and may contain holes and complex geometries. How will these complications affect the model performance? If there is one or more teeth missing, will the method be able to recognize that?
3. 3D scans can have texture information recorded (e.g., color). How will that information be potentially integrated into the proposed framework for improving performance?
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The network structure seems to be clearly presented. There are questions on what data are used for training the two DLLNet models.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The authors can provide discussions on robustness and generalization capability of their method related to more practical data scenarios (e.g., imperfect input scans, missing teeth, teeth from different ages of population, etc.).
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Built upon a known work (MeshSegNet), the paper proposes DLLNet by integrating data features including curvature information and attention schemes, and conducts cross-validation and ablation studies in comparison with other approaches in the field. Results are promising quantitatively and visually.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

This work targets landmark localization on 3D intraoral surface models. The task is meaningful for automating the preprocessing procedure of orthognathic surgeries, but challenging because of the high data resolution and the nature of the landmarks. While there have been a few existing works on the task, the work proposes a novel, simple yet seemingly effective method to tackle the challenges.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The work proposes a novel method for landmark localization on 3D intraoral surface meshes. To localize on high-resolution data, the method utilizes two cascaded steps, where the first step localizes regions of interest at low resolution, and the second step takes the regions and attention maps to produce accurate local predictions at high resolution. The idea is reasonable, and seems effective according to the experiments. The method is compared with several recent landmark regression works, and outperforms them by a large margin.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. In Table 1, the proposed method and all its variants are far more accurate than existing methods. What is the main contributor to this advantage? More explanation in the ablation study is needed.
2. There are also reproducibility concerns about the work. See below.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
1. What are the inputs to other methods included in the comparison? Are they working on the low resolution or high resolution? What is the highest resolution those methods can work with?
2. How were the other methods trained? More details on the training setup can help improve the reproducibility.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

More details on the experiment and training setup for the baseline models should be added. The performance boost of the proposed method (and its variants) over the existing methods should be better explained.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The work proposes a novel method for landmark localization on 3D intraoral surface meshes. The ideas of low-to-high resolution and attention mechanism look reasonable, and the experimental results show their effectiveness. However, there are certain reproducibility issues, and more detailed explanations of the performance boost can help readers.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

Dear authors, your work has intrigued the reviewers, however further clarifications of the method as well as results are needed. Given that your approach relies heavily on MeshSegNet [11], it is important that the comparison to MeshSegNet and other SOTA methods is fair and clearly explained. The limitations of the method should also be mentioned, e.g. how robust the method is.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

Author Feedback

Improvements over MeshSegNet are not clearly presented. (From Reviewer #1) A: DLLNet extends MeshSegNet to a multi-scale landmark detection network. The segmentation path in Fig. 2 detects areas where landmarks may exist (landmark RoI). With the same modules but different receptive fields, the localization path detects landmarks from these areas. By concatenation, the two paths collaboratively explore multi-scale contextual feature vectors, and output a landmark area segmentation map S and a coarse regressed heatmap H. Then, an attention mechanism is applied to further refine the initial landmark localization results, where S is used as an attention map. By taking the Hadamard product of H with S, the refined heatmap \hat{H} forces landmarks to be localized within landmark areas, eliminating misdetections due to feature similarities. This procedure also makes the training of S and H constrain each other. Finally, the landmark localization results are determined from \hat{H} as the coordinates of the mesh cell with the largest probability value. Additionally, computing \hat{H} can be regarded as local coarse-to-fine processing, since S can be viewed as a coarse landmark detection result.
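
The refinement step described in this answer can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the array shapes (one column per landmark in both H and S), the function name `refine_and_localize`, and the use of cell centers as output coordinates are all assumptions made for illustration.

```python
import numpy as np

def refine_and_localize(H, S, cell_centers):
    """Refine a coarse heatmap with a segmentation map and pick landmarks.

    Assumed shapes (hypothetical, not stated in the rebuttal):
      H            (N, L) coarse heatmap, one column per landmark
      S            (N, L) landmark-area segmentation probabilities
      cell_centers (N, 3) 3D coordinates of each mesh cell
    """
    # Hadamard (element-wise) product: S acts as an attention map on H.
    H_hat = H * S
    # For each landmark, take the mesh cell with the largest refined value.
    best_cell = np.argmax(H_hat, axis=0)
    return cell_centers[best_cell], best_cell

# Toy example: cell 1 has the highest raw heatmap value but lies outside
# the landmark area, so the refined heatmap selects cell 2 instead.
H = np.array([[0.2], [0.9], [0.8], [0.1]])
S = np.array([[0.0], [0.1], [1.0], [0.0]])
centers = np.arange(12, dtype=float).reshape(4, 3)
coords, idx = refine_and_localize(H, S, centers)
```

In the toy example, a cell with a high raw heatmap response but a low segmentation probability is suppressed, which is the misdetection-reducing effect the answer attributes to the attention step.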

Lack of explanations on performance. (From Reviewer #1, Reviewer #2, Reviewer #3) A: DLL-C outperforms MeshSegNet and other compared methods partly due to the curvature features, which are added to the input matrix because landmarks are located on cusps or fossae. DLL-CS further improves the accuracy by collaboratively performing multi-scale landmark detection via the landmark area segmentation path. For DLL-SA, the overall performance is slightly worse than that of the other two variants, because curvatures are not considered. However, DLL-SA still outperforms the other compared methods via local coarse-to-fine processing, where the landmark area segmentation results are used as attention maps, forcing each landmark to be localized within the landmark area, thus reducing misdetections. By integrating all these strategies into our framework, DLLNet ultimately achieves the highest accuracy and the lowest misdetection rate when compared to related methods.

Lack of discussion on the robustness of proposed approach. (From Reviewer #1, Reviewer #2) A: Our approach has been successfully tested on 77 patients. For the 15 patients who are partially edentulous, tooth absence can be directly detected from the pre-segmentation results.

How do you choose the rotation, translation and scaling parameter ranges? (From Reviewer #1) A: Each case has a 50% probability of being rotated, translated or rescaled along the X, Y, and Z axes, with a random parameter sampled from the ranges mentioned in Section 3.1.
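
A minimal sketch of such an augmentation is given below. The rebuttal does not state whether the 50% probability applies to each transformation independently, and the parameter ranges of Section 3.1 are not reproduced on this page. The sketch therefore assumes independent draws per transformation and takes the ranges as caller-supplied arguments; `random_augment`, `rotation_matrix`, and all argument names are hypothetical.

```python
import numpy as np

def rotation_matrix(ax, ay, az):
    """Rotation about the X, Y, then Z axes (angles in radians)."""
    cx, sx = np.cos(ax), np.sin(ax)
    cy, sy = np.cos(ay), np.sin(ay)
    cz, sz = np.cos(az), np.sin(az)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def random_augment(points, rng, max_angle_deg, max_shift, scale_range, p=0.5):
    """Randomly rotate, translate, and rescale an (N, 3) point array.

    Assumption: each transformation is applied independently with
    probability p; the ranges are placeholders chosen by the caller.
    """
    out = np.asarray(points, dtype=float).copy()
    if rng.random() < p:  # rotation about the three axes
        angles = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg, size=3))
        out = out @ rotation_matrix(*angles).T
    if rng.random() < p:  # translation along the three axes
        out = out + rng.uniform(-max_shift, max_shift, size=3)
    if rng.random() < p:  # per-axis rescaling
        out = out * rng.uniform(scale_range[0], scale_range[1], size=3)
    return out
```

In practice the same transformation would also have to be applied to the ground-truth landmark coordinates so that the labels stay aligned with the augmented surface.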

More details on experiments and training setups for the baseline models should be added. (From Reviewer #1, Reviewer #3) A: The architectures of the compared methods were implemented following their original papers. For PointNet++ and PointConv, the input is an N × 9 matrix, each row containing the 3D coordinates of a mesh cell, where N is the total number of cells. For MeshSegNet, the input is an N × 15 matrix as described in the original paper. All the compared methods were trained in the same way. In the coarse stage, each teeth partition was down-sampled to 3000 mesh cells (N = 3000) as input, and the network was trained for 30 epochs with a mini-batch size of 10. In the refinement stage, 150 mesh cells around each predicted landmark with various offsets (≤ 0.5 mm) were sampled, and all sampled mesh cells were then taken as input. The network was trained for 25 epochs with a mini-batch size of 10. All networks in the coarse and refinement stages have the same architecture.
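
The input construction and refinement-stage sampling described in this answer can be sketched as follows. Several details are assumptions rather than facts from the rebuttal: that the 9 columns are the coordinates of a triangle's three vertices, that "around" means nearest by cell centroid, and that the offset is a random displacement of at most 0.5 mm applied to the predicted landmark. The function names are hypothetical.

```python
import numpy as np

def cell_coordinate_matrix(vertices, faces):
    """Build an (N, 9) matrix from (V, 3) vertices and (N, 3) triangle indices.

    Assumption: each row holds the xyz coordinates of a cell's three vertices.
    """
    return vertices[faces].reshape(len(faces), 9)

def sample_patch(cell_matrix, landmark, rng, n_cells=150, max_offset=0.5):
    """Return indices of the n_cells cells nearest to a jittered landmark.

    Assumption: the landmark is displaced by a random offset of at most
    max_offset (mm), and proximity is measured to cell centroids.
    """
    centroids = cell_matrix.reshape(-1, 3, 3).mean(axis=1)
    direction = rng.normal(size=3)
    direction /= np.linalg.norm(direction)
    center = np.asarray(landmark, dtype=float) + direction * rng.uniform(0.0, max_offset)
    dist = np.linalg.norm(centroids - center, axis=1)
    return np.argsort(dist)[:n_cells]

# Toy mesh with random geometry, used only to exercise the two functions.
rng = np.random.default_rng(0)
vertices = rng.random((50, 3))
faces = rng.integers(0, 50, size=(400, 3))
X = cell_coordinate_matrix(vertices, faces)
patch = sample_patch(X, np.zeros(3), rng)
```

Drawing a fresh offset each time a patch is sampled yields several slightly different patches per landmark, which is one plausible reading of "various offsets".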

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

In the rebuttal, the authors clarified the important points of the reviewers’ criticism. Namely, the authors clarified the extension of MeshSegNet such that an accurate landmark localization can be achieved. Although the method is evaluated on a relatively small dataset of 77 images, the improvement over baseline methods is clear. The approach is sensitive to missing teeth, which can be seen as a limitation; however, I see this as future work which can be tackled with a pre-processing step, as the authors indicated. I find the manuscript acceptable for publication; however, the authors should include the first three rebuttal points in the camera-ready version of the manuscript.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

In this paper, the authors extended the original MeshSegNet by incorporating ‘multi-scale’ features and an ‘attention mechanism’. The results show that these additional common techniques further improved the landmark localization accuracy. I am still not fully convinced that these additions are novel enough, but the clinical usefulness for patients with defects was good. Overall, the paper has limited novelty but could be interesting to the MICCAI community.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper shows a landmark localization approach for dental 3D scans, which is heavily inspired by recent works, but experimentally shows a step forward in prediction performance. Reviewer concerns regarding the explanation of the performance improvement and the lack of details in the experimental setup have been addressed in the rebuttal. Unfortunately authors have not responded on how they will improve their manuscript to incorporate clarifications to the raised concerns. In case of acceptance, authors are encouraged to shorten the introduction to make room for the missing details in the experimental setup. Authors also should use the conclusion section not just as a pure summary of the approach, and make sure that the references section is formatted according to MICCAI standards (no use of et al., there is no problem with having another page of references this year!).
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

8


DLLNet: An Attention-based Deep Learning Method for Dental Landmark Localization on High-Resolution 3D Digital Dental Models