
# Authors

Shuntaro Mizoe, Yoshito Otake, Takuma Miyamoto, Mazen Soufi, Satoko Nakao, Yasuhito Tanaka, Yoshinobu Sato

# Abstract

We aim to elucidate the mechanism of the foot by automated measurement of the movement of its multiple bones using 2D-3D registration of bi-plane x-ray video and a stationary 3D CT. Conventional analyses allowed tracking of only 3 large proximal tarsal bones because they required manual segmentation and manual initialization of the 2D-3D registration. Learning-based 2D-3D registration, on the other hand, has been actively studied and has demonstrated a large capture range, but its accuracy is inferior to that of conventional optimization-based methods. We propose a fully automated pipeline using a cost function that seamlessly incorporates the reprojection error at landmarks in CT and x-ray, detected by off-the-shelf CNNs, into the conventional image similarity cost, combined with automated bone segmentation. We experimentally demonstrate that the pipeline allows robust and accurate 2D-3D registration for tracking all 13 tarsal bones, including the metatarsals at the foot arch, which are especially important in foot biomechanics but have been unmeasurable with previous methods. We evaluated the proposed fully automated pipeline in studies using a bone phantom and real x-ray images of human subjects. The real image study showed a registration error of 0.38 $\pm$ 1.95 mm in translation and 0.38 $\pm$ 1.20 degrees in rotation for the proximal tarsal bones.
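
As a minimal sketch of how a landmark reprojection term might be folded into an image-similarity cost, the Python below projects 3D landmarks through a 3 x 4 projection matrix and adds their mean 2D distance to the detected landmarks to a similarity score. The function names, the pinhole projection model, the sign convention, and the single weight `lam` are assumptions made for illustration; they are not taken from the paper's Eq. (1).

```python
import numpy as np

def project_points(P, X):
    """Project N x 3 points with a 3 x 4 projection matrix; returns N x 2 pixels."""
    Xh = np.hstack([X, np.ones((X.shape[0], 1))])  # homogeneous coordinates
    x = (P @ Xh.T).T
    return x[:, :2] / x[:, 2:3]

def rigid_transform(R, t, X):
    """Apply rotation R (3 x 3) and translation t (3,) to N x 3 points."""
    return X @ R.T + t

def reprojection_error(P, R, t, landmarks_3d, landmarks_2d):
    """Mean 2D distance between projected 3D landmarks and detected 2D landmarks."""
    projected = project_points(P, rigid_transform(R, t, landmarks_3d))
    return float(np.mean(np.linalg.norm(projected - landmarks_2d, axis=1)))

def combined_cost(similarity, reproj_error, lam):
    """Cost to minimize (assumed form): higher similarity lowers the cost,
    larger landmark reprojection error raises it, weighted by lam."""
    return -similarity + lam * reproj_error
```

In a bi-plane setup, the reprojection term would presumably be accumulated over both views, using only the landmarks visible in each view, before being weighted against the similarity term.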

SharedIt: https://rdcu.be/cyhPZ


# Reviews

### Review #1

• Please describe the contribution of the paper

In this study, an automated pipeline for the 4D (3D CT data over time) to X-ray video data is presented. The pipeline comprises a CNN-based segmentation, CNN-based landmark detection and an optimization-based registration part.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The full automation of the pipeline is a particular strength, making the approach likely suitable for clinical use. The application of 2D/3D registration on a patient-specific basis would be very interesting for treatment decision and optimization.
• The authors present a novel optimization function which integrates landmark-projection into the objective function
• Extensive evaluation
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The pipeline consists mainly of state-of-the-art networks (which have been correctly referenced). Nevertheless, the application and overall approach can be considered novel and useful.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors provide the code, which is a plus. The authors validate the methods on bone phantoms in addition to clinical data, which further supports reproducibility.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Method: A bone-specific weighting of the registration error was not described in the objective function. However, given the different sizes of the bones, it could make sense.

Please provide more information about the data. What was the resolution of the CT data? What devices were used for acquisition? Please state that ethical approval and informed consent were obtained.

Chapter 3.3: Was any “motion” simulated on the phantoms? Chapter 3.4: What motion was acquired for the patient data?

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The full automation of the pipeline is a particular strength, making the approach likely suitable for clinical use. The application of 2D/3D registration on a patient-specific basis would be very interesting for treatment decision and optimization.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

The paper proposes a system for fully automatic analysis of foot bones from biplane X-rays and CT images. It combines multiple established state-of-the-art technologies (UNet for bone segmentation, DeepLabCut for landmark tracking, CMAES for intensity-based registration) to achieve this. The intensity-based registration is extended with a landmark error term. The pipeline is evaluated on a custom dataset of 18 x-ray videos from 5 volunteers.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• Proposal of system for actual clinical problem with state-of-the-art technologies, extending the level of automation and thus
• Good quantitative evaluation of the steps in the pipeline
• Clear visualizations of the method, experimental setup and results
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• Evaluation of one of the claimed main contributions insufficient: it is unclear what the landmark cost function contributes to the registration results. No baseline method is provided for this step. For the other steps, it is ok not to provide baselines, since they are established
• The paper is good as a system for solving a clinical problem (contributions 1 and 3). The methodic contribution of adding the landmarks to the intensity-based registration itself would not be sufficient for MICCAI (contribution 2).
• Underwhelming results for distal tarsal and metatarsal bones for the fully automatic approach
• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
• Dataset is not open
• Data collection and annotation process is clearly outlined
• Code will be published
• Reuse of publicly available methods
• Hyperparameter values for registration weights are not given, no sensitivity analysis, no way how to be determined…
• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• The abstract mentions 13 tarsal bones, while the contributions and the literature only list 12. Typo in the abstract?
• P.2: “significantly superior accuracy”: compared to what is the accuracy superior?
• P.3: what is the actual 6 DOF parametrization used? Could be Euler angles, Rodrigues vector, quaternions, …
• P.4 Eq (1): It would be helpful if the cost functions would show what is used as the input (image, landmarks, only parameters)
• P. 4: Missing values for alpha and lambda, and how they are determined. Also not in the later experimental results
• P. 4: N was previously defined as the number of bones. It seems to be the number of landmarks here. Please use a different variable
• P. 4: Keep page margins
• P. 4: Since you are processing videos, clarify how this is handled. It seems to me the frames are processed independently. Then what is the initialization for each frame? The initial or the previous frame?
• P.5: Results are given with a precision of 3 digits. That is likely not warranted for a dataset of 35 CTs
• P.8: the small size of bones can explain increases in rotational errors. However, translational errors should not be affected as much by this. A more detailed analysis of this effect would be interesting. A related issue could be the definition of the rotation center for the evaluation. It is not clearly specified. Is it centered on each segmented object? Or is there a single rotation center at the center of the CT volume? This has a big influence on “translation errors”. This is also the reason why often target registration errors are reported in other papers.
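
The point about the rotation center can be illustrated with a short worked example. The sketch below, in Python, re-expresses one and the same rigid transform about a different center and shows that the reported "translation" changes. The 1 degree rotation and the 100 mm offset are hypothetical values chosen for illustration; nothing here is taken from the paper's evaluation protocol.

```python
import numpy as np

def rotation_z(deg):
    """Rotation matrix for a rotation of `deg` degrees about the z axis."""
    a = np.deg2rad(deg)
    return np.array([[np.cos(a), -np.sin(a), 0.0],
                     [np.sin(a),  np.cos(a), 0.0],
                     [0.0, 0.0, 1.0]])

def translation_about_center(R, t, c):
    """Translation t_c such that R @ (x - c) + c + t_c equals R @ x + t."""
    return t + R @ c - c

R = rotation_z(1.0)                          # hypothetical 1 degree rotational error
t = np.zeros(3)                              # zero translation about the volume origin
bone_centroid = np.array([100.0, 0.0, 0.0])  # hypothetical bone, 100 mm from the origin

t_c = translation_about_center(R, t, bone_centroid)
print(np.linalg.norm(t_c))  # roughly 1.75 mm of apparent translation error
```

The same pose error therefore reads as zero translation about the volume origin and as roughly 1.75 mm about the bone centroid, which is why target registration error at anatomical points is often preferred as a center-independent metric.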

Probably accept (7)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The overall paper and approach is valid and evaluated thoroughly.

A better score is not given due to the bad results on the tarsal bones and the missing evaluation of the extension of intensity-based registration with landmarks.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #3

• Please describe the contribution of the paper

This manuscript proposes a completely automated pipeline for dynamic 4-D analysis of all the bones in the foot complex using static CT and bi-planar fluoroscopy images. The proposal integrates CNNs and multiple other techniques to provide a 2D-3D registration from CT scans onto X-ray images. The authors claim to contribute a pipeline for 4D foot analysis of all metatarsal bones along with the tarsals, and the introduction of a 2D-3D registration cost term that incorporates the reprojection error of the landmarks. While the targeted application is clinically relevant, the manuscript itself is not very well organized and has flaws throughout its presentation, either in the methods or in their description. This reviewer also feels that the novelty of this pipeline (2D-3D registration using DRR and a combined cost function of image similarity and landmark positions) is not sufficiently focused upon in this manuscript.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) A fully automated system to derive 4D foot motion including all of its bones is presented. 2) The cost function to minimize 2D-3D registration errors is novel in the sense that it combines the gradient correlation similarity of the fixed and moving images with the landmark registration errors. While such an approach has been proposed before, it is novel in terms of using it in combination with CNN and DRR.
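
For readers unfamiliar with the similarity term mentioned above, the following Python sketch shows one common definition of gradient correlation: the average of the normalized cross-correlations of the horizontal and vertical gradient images. The use of `numpy.gradient` as the gradient operator and the function names are assumptions for illustration; the paper's implementation may use a different gradient filter or masking.

```python
import numpy as np

def ncc(a, b):
    """Normalized cross-correlation of two arrays of equal shape."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom > 0 else 0.0

def gradient_correlation(fixed, moving):
    """Average NCC of the row-wise and column-wise gradient images."""
    gy_f, gx_f = np.gradient(fixed.astype(float))   # gradients along axis 0, axis 1
    gy_m, gx_m = np.gradient(moving.astype(float))
    return 0.5 * (ncc(gx_f, gx_m) + ncc(gy_f, gy_m))
```

Because the measure works on gradients and normalizes them, it is unaffected by a linear rescaling of intensities, which is one reason it is often used to compare x-ray images with DRRs.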

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) Clarity and language. The organization of the manuscript makes the flow confusing. While readers can understand the rationale of the study, I got lost in all the different methods borrowed from multiple sources, and in the end it was very difficult to follow what exactly the authors’ contribution is. 2) The manuscript needs a review by a native English speaker to correct numerous grammatical and syntax errors. 3) Methods are either insufficiently explained in places or require the reader to consult another study to understand them. This in itself makes the research difficult to understand.

• Please rate the clarity and organization of this paper

Poor

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Multiple resources needed for reproducibility were not provided, for example the list of hyperparameters used.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

While the clinical problem targeted by the authors is good, the organization and subsequent write-up of the manuscript is confusing. Multiple flaws were found during reading as follows:

Introduction: The authors start directly with the aim of the study without providing any rationale, prevalence, statistics, or problem statement. Recent studies using DRR that are similar to the proposed study were not included in the introduction (https://openaccess.thecvf.com/content_CVPR_2019/papers/Liao_Multiview_2D3D_Rigid_Registration_via_a_Point-Of-Interest_Network_for_Tracking_CVPR_2019_paper.pdf).

Method: Landmark details and detection method details are not provided in section 2.2. How were the landmarks placed? What was the placement accuracy/consistency? Section 2.3 about 2D-3D registration is not clearly explained. Multiple terms in equation 1 were unexplained.

Experiment and Results: It is not clear how you performed a 3-fold cross-validation with 35 CTs. Each group in the cross-validation should have an equal number of datasets. Also, it is not clear how you obtained the landmark cost when you have 17 landmarks on CT and 12 on X-ray. How was the DRR reconstructed? What parameters were used? There is no comment about the error that might be introduced by the digital synchronization of the x-ray views. Errors related to your landmark detection in CT and X-ray were high (not sub-millimeter). So, how did you assume them to be clinically acceptable? Figure 6 is completely unreadable due to its small size (even after zooming in on my screen). 60,000 function evaluations for registration optimization does not sound normal to me. Have you checked your results for overfitting?
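
On the cross-validation question, a sample count that is not divisible by the number of folds is usually handled by letting fold sizes differ by at most one. The Python sketch below shows that convention for 35 samples and 3 folds. It is an illustration of common practice under an assumed random shuffle, not a description of how the authors actually split their CTs.

```python
import random

def k_fold_indices(n_samples, n_folds, seed=0):
    """Shuffle sample indices and split them into folds whose sizes differ by at most one."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    base, extra = divmod(n_samples, n_folds)
    folds, start = [], 0
    for i in range(n_folds):
        size = base + 1 if i < extra else base  # the first `extra` folds get one more sample
        folds.append(indices[start:start + size])
        start += size
    return folds

folds = k_fold_indices(35, 3)
print([len(f) for f in folds])  # [12, 12, 11]
```

Each CT appears in exactly one test fold, so every sample is still evaluated once even though the folds are not of equal size.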

borderline reject (5)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Poor organization and poor clarity with no focus on the novelty of the study.

• What is the ranking of this paper in your review stack?

4

• Number of papers in your stack

5

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The majority of the reviewers provided a positive review for this work. However, there are some important questions raised as well. Therefore the authors should be given a chance to address the major points before this work can be accepted. Specifically, the below points should be addressed.

Please explain how the landmarks were placed, and how you assessed the accuracy of this process.

Rev 4 mentions landmark detection errors were very high (not sub-millimeter). How does this relate to the targeted clinical application error rates? In other words, acceptable error limits for the clinical problem should be provided and discussed in the context of the obtained error metrics.

Please explain how the landmark cost function was obtained if you have 17 landmarks on CT but only 12 on X-ray.

If the authors are using previously developed work, please either include sufficient details in the text explaining each parameter, or state that you are using the same values described in the literature and provide a reference. For example, was the DRR reconstructed using prior work, or have you developed your own method? If prior work is used, please reference it and explain this in the text.

Minor: More information about the CT data should be provided (in-slice resolution). Mention in the text that ethics approval was obtained. All the terms in the equations should be explained. Figures should be readable (the font is too small).

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

# Author Feedback

We thank all the reviewers for their constructive feedback. We first address the three main comments summarized by the meta-reviewer, mainly raised by Reviewer #4, then answer the comments by Reviewers #2 and #3.

(1) Landmark detection errors were very high (not sub-millimeter). The landmark placement process and clinical relevance of the detection error are not clarified. (2) Please explain how the landmark cost function was obtained if you have 17 landmarks on CT but only 12 on X-ray (3) How was the DRR reconstructed using prior work?

Answers to each point: (1) We described the accuracy evaluation of our automated landmark detection in section 3.2, which showed a 3-4 mm error in 3D (CT) and a 2-3 mm error in 2D (x-ray video). The landmarks in both training data sets were placed manually by an expert surgeon, and another surgeon assessed the placement independently. The state-of-the-art method, which we cited as [17], reported a best landmark detection accuracy of 2.9 $\pm$ 4.4 mm on the spine CT data set. Although the accuracy heavily depends on the characteristics of the data set, such as the voxel resolution and local appearance, we believe our results are comparable to the state of the art. Identifying the landmarks is an intermediate step in the proposed pipeline. The landmarks help to improve the cost function landscape, resulting in robust optimization. We experimentally demonstrated that the registration results were insensitive to the landmark detection error (see Table 1), suggesting that the error was in an acceptable range for our 2D-3D registration application. To clarify these points, we will add the following sentences to the final paper. (in section 3.2) The landmark detection errors in CT and x-ray video were comparable to those reported in [17], where the authors applied their state-of-the-art method to the spine CT data set. (in section 3.4) The insensitivity of the registration results to the landmark detection error suggests that the error was in an acceptable range for our 2D-3D registration application.

(2) We apologize for the confusion caused by our insufficient explanation. The total number of landmarks we used was 17, as identified in 3D. We acquired two 2D views simultaneously. Since we could not find a sufficient number of 3D landmarks visible in both views, 5 landmarks were used only in one x-ray view, another 5 were used only in the other view, and the remaining 7 were used in both views. Thus, (5+7)=12 landmarks were used in each 2D view, which amounts to 17 in 3D. We will add this explanation in section 3.1.
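
The landmark counting in answer (2) can be checked with a few lines of Python. The landmark indices and the labels "view A" and "view B" below are hypothetical; only the group sizes (5, 5, and 7) come from the rebuttal.

```python
# Hypothetical landmark indices; only the group sizes come from the rebuttal.
view_a_only = set(range(0, 5))    # 5 landmarks used only in one x-ray view
view_b_only = set(range(5, 10))   # 5 landmarks used only in the other view
both_views = set(range(10, 17))   # 7 landmarks used in both views

landmarks_view_a = view_a_only | both_views
landmarks_view_b = view_b_only | both_views
all_landmarks_3d = landmarks_view_a | landmarks_view_b

print(len(landmarks_view_a), len(landmarks_view_b), len(all_landmarks_3d))  # 12 12 17
```

Under this reading, a per-view reprojection term would sum over 12 landmark pairs in each view, giving 24 2D observations in total for the 17 landmarks defined in CT.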

(3) Our DRR reconstruction and optimization methods followed the prior work [12]. We mistakenly cited [12] only in explaining the optimization part in the initial submission. We will add the citation appropriately.

[Other points raised by Reviewers #2 and #3] Q1: Was any motion simulated on the phantoms? What motion was acquired for the patient data? A1: Our target motion is the gait. Specifically, we acquired the phase from heel contact to toe-off. We moved the phantom bones by hand to simulate the gait. We will add this explanation in sections 3.1 and 3.3.

Q2: No baseline method is provided to evaluate the contribution of the landmark cost term. A2: One of the main contributions of this work is automation to eliminate manual initialization on each video frame, while all existing methods require manual initialization, whose cost is hard to evaluate quantitatively, making a fair comparison difficult. We will add this discussion in section 4.

Q3: Underwhelming results for distal tarsal and metatarsal bones A3: Our current landmarks are placed only on the surface of the proximal tarsal bones, as shown in Fig. 1. The distal tarsal and metatarsal bones are associated with landmarks placed on the bones close to them, which is one cause of the accuracy degradation. Adding landmarks on each bone is part of our ongoing work. We will add this discussion in section 4.

# Post-rebuttal Meta-Reviews

## Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal addresses most of the major concerns especially the landmark localization error results. The authors are encouraged to update the final camera-ready version by including the explanations provided in the rebuttal and expanding the discussion section as promised.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This work introduces automated 2D-3D registration of the foot bones for gait analysis. It is a well-designed pipeline, which uses appropriately placed state-of-the-art techniques in its different stages. Reviewers agree that the work has merit despite combining existing work, and I am of the same opinion. While there is room for improvement in terms of proofreading the final manuscript, I do not agree with reviewer 4 that the quality of the writing and the structure of the paper are problematic. The authors have clarified a number of issues in the rebuttal (e.g. the confusion about the number of landmarks used for registration, which work was used for DRR, and the real-world effect of registration errors), which will make the final manuscript much easier to understand after revision. I encourage the authors to incorporate all mentioned clarifications when revising the work in case of acceptance.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors addressed the main points of criticism listed by reviewers. My main concern is the lack of comparison with other methods. Namely, the authors presented interesting results using the 2D/3D registration method [15] proposed in 2006 and the similarity measure [16] proposed in 1998. Since the main contribution of the manuscript does not lie in a specific technical contribution but in the whole pipeline, I think that a comparison with some newer methods would give a better perspective of the proposed approach. I don’t consider the requirement for manual initialization to be a strong argument against having a fair comparison, because the question is always how demanding manual initialization is.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

8