
Authors

Srikrishna Jaganathan, Jian Wang, Anja Borsdorf, Karthik Shetty, Andreas Maier

Abstract

Deep Learning-based 2D/3D registration methods are highly robust but often lack the necessary registration accuracy for clinical application. A refinement step using the classical optimization-based 2D/3D registration method applied in combination with Deep Learning-based techniques can provide the required accuracy. However, it also increases the runtime. In this work, we propose a novel Deep Learning driven 2D/3D registration framework that can be used end-to-end for iterative registration tasks without relying on any further refinement step. We accomplish this by learning the update step of the 2D/3D registration framework using Point-to-Plane Correspondences. The update step is learned using iterative residual refinement-based optical flow estimation, in combination with the Point-to-Plane correspondence solver embedded as a known operator. Our proposed method achieves an average runtime of around 8 s, a mean re-projection distance error of 0.60 ± 0.40 mm with a success ratio of 97% and a capture range of 60 mm. The combination of high registration accuracy, high robustness, and fast runtime makes our solution ideal for clinical applications.
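A minimal sketch of the iterative framework described above, under the assumption that each iteration predicts 2D correspondences, weights them, and solves a small linear system for a rigid pose update; all callable names below (estimate_flow, weight_correspondences, solve_ppc, apply_update) are hypothetical stand-ins, not the authors' API:

    # Hedged sketch, not the authors' code: the learned update step is applied
    # repeatedly until the pose converges or an iteration budget is reached.
    def register(pose, n_iters, estimate_flow, weight_correspondences, solve_ppc, apply_update):
        """Iteratively refine a 6-DoF pose.
        estimate_flow(pose)                 -> (p, p_prime) 2D correspondences (RAFT-style network)
        weight_correspondences(p, p_prime)  -> per-correspondence weights (PointNet++-style network)
        solve_ppc(p, p_prime, w)            -> 6-vector pose update (fixed solver, no learned weights)
        apply_update(pose, dv)              -> new pose
        """
        for _ in range(n_iters):
            p, p_prime = estimate_flow(pose)        # learned 2D correspondence search
            w = weight_correspondences(p, p_prime)  # learned per-correspondence weights
            dv = solve_ppc(p, p_prime, w)           # PPC solver embedded as a known operator
            pose = apply_update(pose, dv)           # apply the rigid motion update
        return pose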

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_37

SharedIt: https://rdcu.be/cyhQz

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a 2D/3D registration method that uses prediction by deep neural networks iteratively to improve robustness and accuracy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novelty of this work is the integration of multiple previously proposed networks (i.e., RAFT and PointNet++) into a well-known 2D/3D registration approach, the Point-to-Plane Correspondence (PPC) model. The CNNs seem to fit nicely into the conventional PPC framework and complement the non-robust components of previous methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The evaluation is weak. As the authors briefly note at the end of Section 4, the proposed method relies heavily on the training dataset; however, the method was evaluated only on a small CBCT dataset (55 patients) of a specific target anatomy (vertebrae). The X-ray projection imaging protocol (e.g., kVp, mA, filter material, post-processing) generally varies significantly between facilities, operators, and patients. An experiment using only a single protocol and a single anatomy (i.e., training and test images are quite similar) may not provide a fair evaluation. Evaluations with images from a slightly different domain (e.g., a different facility, manufacturer, or scan protocol) are highly preferable.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This reviewer sees no problem with the checklist. In terms of reproducibility, evaluation with a publicly available dataset would be preferable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The proposed idea of integrating CNNs into the PPC model is interesting, but the requirement for a large annotated training dataset for each domain (each anatomy, view direction, and imaging protocol) is prohibitive and may be difficult to meet in clinical routine. This reviewer hopes the authors will add experiments evaluating this aspect, for example by applying the method to images from a new hospital or of a new anatomy (not present in the training dataset).

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Clinical applicability is questionable. The improvement shown in Table 1 may depend heavily on the training dataset. Conventional optimization-based methods, which do not require a training dataset, would be more realistic in a real clinical setup, where preparing a large annotated dataset for each image domain is prohibitive.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes a deep learning-based iterative 2D/3D registration framework. It learns the update step using iterative residual refinement-based optical flow estimation, combined with Point-to-Plane Correspondences. The proposed method was tested on real CBCT X-rays and was superior to the comparison methods in accuracy, robustness, and speed.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This paper applies the optical-flow-based RAFT architecture for point correspondence estimation and the PointNet++ architecture for correspondence weighting, building an end-to-end iterative registration framework.

    2) It compares against ablated PPC-based architectures on real CBCT X-ray scans.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The contribution of this paper is mostly an incremental architectural change over Ref [19], replacing the correspondence estimation block with RAFT and the correspondence weighting block with PointNet++.

    2) The training/testing X-ray images are clean CBCT acquisitions of pelvis scans. The paper does not consider challenging 2D/3D registration tasks where there is occlusion in the scene (e.g., surgical tools). Please see detailed comments below.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper does not mention any code release.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • In the introduction's review of DL-based registration methods, the authors may want to consider including the following works, which are end-to-end learning-based pipelines:

    [1] Gao, C., Liu, X., Gu, W., Killeen, B., Armand, M., Taylor, R., & Unberath, M. (2020, October). Generalizing Spatial Transformers to Projective Geometry with Applications to 2D/3D Registration. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 329-339). Springer, Cham.

    [2] Gu, W., Gao, C., Grupp, R., Fotouhi, J., & Unberath, M. (2020, October). Extended Capture Range of Rigid 2D/3D Registration by Estimating Riemannian Pose Gradients. In International Workshop on Machine Learning in Medical Imaging (pp. 281-291). Springer, Cham.

    • One of the biggest challenges in 2D/3D registration is occlusion by surgical tools or changes of the structure relative to the CT scan, such as bone fractures. For example, in its Ref [10], the images contain various surgical tools and have severe scattering/blurring conditions. The training and testing data of this paper are limited to clean-background pelvis scans. It is unclear how robust the method will be when the intra-operative X-ray images deteriorate. The authors may want to comment on this.

    • The authors claim that the advantages of the proposed method are both speed and accuracy. However, traditional optimization-based methods, such as CMAES + GradNCC, are not included in the comparison. I would be interested to see how the performance compares to one of these traditional methods.

    • From the Results table, the performance of the proposed method is very close in terms of mRPD and SR; however, the capture range is substantially increased. I am interested to learn why the proposed method is superior in capture range (from 20-25 mm to 55-60 mm) and under what conditions (e.g., view angles or object shapes).

    • The proposed method is anatomy-specific (in this case, the pelvis). For a different anatomy, the model will likely need to be retrained. How long is the training time? How sensitive is the model performance with respect to the hyperparameters in equation (3)? The authors may consider discussing these points, since they affect the translation to clinical application.

    • In Section 4, the authors acknowledge that one limitation of the proposed method is that it requires heavy annotation of real data. I am interested to understand why the authors did not use easy-to-generate DRR simulation images for training.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend borderline reject because the method is heavily developed upon related work in the literature (Ref [19]). The results are encouraging, but the performance on more sophisticated clinical data remains unclear.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The authors propose a 2D/3D registration method between 2D X-ray fluoroscopy and a 3D pre-operative volume. The method uses point-to-plane correspondences directly integrated into a neural network, so the network by itself performs a full registration update step. This enables end-to-end learning with no need to combine it with another method to refine the results. Such a method has already been described in a previous paper, but the architecture is changed to obtain better accuracy and better robustness to the initial position of the registration.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The major strength is to propose a method which combine high robustness (training), good accuracy (iterative step) and fast computation (neural network).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • From the results, it is obvious that the proposed method is better than the method in [5] (which seems to be preliminary work for this paper). However, it is not really clear why this is the case and what the significant changes between the two methods are. The differences I see between [5] and the method here are the optical flow estimation (iterative or not) and PointNet vs. PointNet++. It would be nice to clarify this part to better explain what the problems in [5] are and how they are solved in this paper.
    • One weakness is that the data are limited to CBCT volumes reconstructed from the same 2D X-ray images that are later aligned, which makes automatic ground truth possible. In reality, however, body motion/deformation and breathing hamper perfect rigid registration, and these factors are not evaluated. I can understand that ground truth here is difficult (impossible) to obtain, so evaluation is problematic.
    • I find the title a bit misleading: deep iterative with respect to what? If I understand correctly, at the neural network level the iterations are fully independent (except for the new position given as input). And during the training phase the iterations are not related at all?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • Most of the (hyper)parameters and training/evaluation process are well described.
    • How many N_FL iterations did you use for the RAFT network during training/testing? If I understand correctly, N_FL iterations are performed in RAFT per back-propagation update.
    • Pose-dependent apparent contour points w with the corresponding gradients g are selected from the surface points [22] -> maybe specify here whether the parameters used for this are the same as in [22].
    • No mention of making the code available.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Page 2

    • “Learning such an update step prediction, proposed in [5], showed significant improvement (around two times) for single-step update prediction” -> Can you specify the type of improvement? What does single-step update mean?
    • “However, the iterative application using the learned update step to the actual 2D/3D registration problem was lacking” -> I don’t understand; the 2D/3D registration problem is already addressed in [5]?
    • “but make significant architectural changes to incorporate another domain-prior, which approximately models the iterative nature of the problem” -> Can you specify what the domain prior is? What is the iterative nature of the problem? It is not clear to me what is missing in [5].

    Page 3

    • Figure 1: It should be possible to understand the figure from the caption alone, which is not the case here. Can you please add more information to the caption/figure (e.g., what is RAFT? where are phi_f (RAFT), phi_w (PointNet++), L_reg, and L_flow?)? You forgot a comma before ‘W’ in your DNN phi formulation. I don’t understand why {w,g} goes to f (the output of the optical flow estimation). Are they used to compute the flow dp and p’? It would be nice to have an equation in the Sec. 2.2 DL-Based Update Step Prediction section.
    • “2D correspondences search is performed” -> Can you detail how it is performed? (optical flow estimation?)
    • “vector b are computed from the N_cp point correspondences (p,p’)” -> How is it computed? Does N = N_cp?
    • “The weight matrix W is a diagonal matrix providing individual weights for each estimated correspondence” -> How is each individual weight estimated? (A sketch of the weighted solve in question is given after this list.)
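    A minimal sketch of the weighted PPC update being asked about above, assuming the standard weighted linear least-squares form (the notation is illustrative; the exact construction of A and b follows the paper and is not reproduced here):

        \delta v^{*} = \arg\min_{\delta v \in \mathbb{R}^{6}} \left\| W^{1/2} \left( A\,\delta v - b \right) \right\|_{2}^{2} = \left( A^{\top} W A \right)^{-1} A^{\top} W\, b

    Here W = diag(w_1, ..., w_{N_cp}) carries one weight per correspondence, A and b are assembled from the N_cp correspondences (p, p') and the contour-point geometry, and \delta v is the rigid motion update applied to the current pose estimate.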

    Page 4

    • “however we have significant changes in the network architecture to model the iterative nature of the problem more precisely” -> Same as in the introduction: can you describe the problem not solved in [5], so that the proposed improvement can be understood? Also, I don’t see significant changes in the network architecture compared to [5]. Maybe I am missing something?
    • “PPC solver layer Kppc which has no learnable weights” -> I guess the layer is differentiable?
    • “We use a version of the RAFT architecture with shared weights between the updates” -> I’m not sure I understand. Isn’t the goal of the recurrent component (the update operator) to loop through the same module? Is it possible to have a RAFT architecture without shared weights?
    • “Correspondence Weighing” -> Here you compute weights for the weighted variant of PPC, but it is not clear to me why you want to do this, how these weights are computed, and based on which metric. What gives more weight to a specific point p’?
    • “which takes in as input feature vector fφw = {w; g; n; p’} \in R^{Ncp×10}” -> I don’t understand how you arrive at 10 dimensions. What is n? Is it the plane normal, so 3 dimensions? I would think that w is 3D, g is 3D, n is 3D, and p’ is 2D -> 11 dimensions?

    Page 5

    • Equation 3 -> You should replace “registration loss” by L_reg. What is the difference between the norms ||x||_2 and ||x||^2?

    Page 8

    • “Our proposed technique performs significantly better, compared to [5] which fails for the iterative registration task using a similar learned update step. This shows modeling the iterative nature of the problem is essential for such a learned update step” -> So the difference is coming from the optical flow which is not iterative in [5]?
    • “Future research direction can be extending the proposed method to fully automatic registration” -> What do you mean by fully automatic registration? Is there any manual part in your proposed method?
    • “One challenge for using the proposed method, is that, it requires a large number of annotated training data.” -> You didn’t use any annotated training data here, right? The fluoroscopic images come from the images used to reconstruct the CBCT. If so, maybe you should clarify this in the experiments section.
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The integration of the point-to-plane correspondence method directly into the neural network, giving an end-to-end learned update step, is interesting.
    • The reported results show better robustness and faster computation than the state of the art.
  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The work proposes an approach for 2D/3D registration, specifically between radiographs and CBCT images. Reviewer concerns should be addressed in the rebuttal. In particular, there are several main concerns: 1) The writing of the paper is not easy to follow. In particular, it is not entirely clear what the contribution of the work is in relation to references [5], [19], and [22], and what drives the performance improvements. E.g., Sec. 2 states “The proposed DL-based solution acts as a drop-in replacement for the update step prediction in the original framework.” Does this mean the framework is the same and the search-based correspondence is simply replaced by the existing RAFT optical flow approach? 2) What are the significant changes that were made in comparison to the existing approaches? (See the comments of R2 regarding similarities/differences w.r.t. [5].) 3) The approach appears to be tested only on clean images and on a moderately sized dataset. How robust is the method expected to be in the presence of surgical tools, for different acquisitions, or under non-rigid deformations? 4) In what sense is the method not fully automatic?

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

Thanks to all the reviewers (R) and the meta-reviewer (MR) for their valuable and generally positive comments to improve our work. As stated by R3, our “major strength is to propose a method which combine high robustness (training), good accuracy (iterative step) and fast computation (neural network).” Compared to previously proposed state-of-the-art 2D/3D registration techniques [17,19,21], our performance is significantly better in terms of robustness (50% higher capture range, CR) and runtime (twice as fast) without sacrificing registration accuracy.

R2, R3, and MR have pointed out that the differences between our work and [5,19] are not clear. In [19], the 2D correspondence search is learned in isolation using ground-truth flow annotations. It also requires a refinement step using patch-matching-based correspondence search (ref Sec. 3.3), which is computationally expensive (ref Tab. 1). Here we use the registration loss and back-propagate through the PPC solver (embedded as a known operator) to directly learn correspondences relevant to the registration task. [5] was the pioneering work in this direction; however, it is clear from Tab. 1 that it does not work when the network is applied in iterative steps. The choice of modeling optical flow estimation using iterative refinement is crucial (ref Sec. 1), as it allows the network to learn intermediate flow steps and thus to perform well for both large and small displacements. The simple optical flow architecture used in [5] fails.
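A minimal sketch, assuming the PPC solver reduces to a weighted linear least-squares solve, of how a fixed (non-learned) solver can be embedded so that the registration loss back-propagates into the networks producing the correspondences and weights; A, b, and the toy loss below are generic placeholders, not the paper's exact formulation:

    import torch

    def ppc_solve(A, b, w):
        # Known operator: weighted least-squares update dv minimizing
        # ||diag(w)^(1/2) (A dv - b)||^2. It has no learnable parameters but is
        # differentiable, so gradients of a downstream loss reach whatever
        # produced A, b, and w (e.g. the flow and weighting networks).
        W = torch.diag_embed(w)                      # (N, N) diagonal weight matrix
        AtW = A.transpose(-1, -2) @ W                # (6, N)
        return torch.linalg.solve(AtW @ A, AtW @ b)  # (6, 1) rigid motion update

    # Toy usage: weights come from a stand-in learnable tensor; the solve itself is fixed.
    N = 32
    A, b = torch.randn(N, 6), torch.randn(N, 1)
    logits = torch.randn(N, requires_grad=True)      # pretend output of a weighting network
    dv = ppc_solve(A, b, torch.sigmoid(logits))
    dv.pow(2).sum().backward()                       # stand-in registration loss
    print(logits.grad.shape)                         # gradients flow through the solver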

According to R1, R2, and MR, the method needs to be robust to the imaged region (anatomy visible in the image) and to occlusions. This is already the case, since our dataset consists of data from both the thoracic and lumbar regions (ref Sec. 3.1), where the image content varies significantly. Occlusions in the form of surgical tools are present in some of the X-ray views used for reconstruction (Fig. 2-S1: tool present, Fig. 2-S2: no tool) but not in the volume (Fig. 2). Exemplar images from our dataset, shown in Fig. 1 and Fig. 2, illustrate both the presence of surgical tools and the variation in the imaged body region. The results indicate that our method is robust even though the test set contains all these variations.

R2 - “data of this paper is limited to a clean background pelvis scan” seems to be a misunderstanding, as we use a clinical vertebra dataset (ref Sec. 3.1); pelvis scans are not mentioned.

R1 - concerns about obtaining annotated training data and about variation in the dataset: ground truth annotations are obtained automatically since we use reconstructed CBCT volumes (also pointed out by R3). Our dataset already includes data from 3 different hospitals and 55 patients, which gives adequate variation.

R1 - “Conventional optimization-based methods, not requiring training data set, would be more realistic in a real clinical setup”: The requirement for training data is inherent to any DL-based technique. Tailored, procedure-specific DL models can significantly outperform conventional methods when such training data is available, as we demonstrate for an example procedure (spine intervention). We think this is clinically relevant, as many medical imaging algorithms target specific procedures and are not general.

R2 - “traditional optimization-based methods are not included into comparison”: [21] compares against many traditional methods and achieves the best performance. We are significantly better than [21] (ref Tab. 1).

R3, MR – non-rigid deformation: our current work targets rigid registration, which has important clinical applications, as evident from [10,17,19,21]. MR, R3 - “not fully-automatic?”: our method is automatic if the initial registration error is within the CR (here 60 mm). To obtain a fully automatic registration, an infinite CR would be required.

We would like to especially thank R3 for the many minor suggestions to improve the paper. We will address them directly in the revised version of the paper.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes a nice and practically relevant end-to-end deep learning-based method for 2D/3D registration. In particular, it includes establishing point-to-point correspondences and an optical flow estimation based on iterative refinement. This allows the entire registration, for small and large displacements, to be accomplished with the deep network approach (i.e., no subsequent refinement is required). There were some concerns raised by the reviewers, but they have all been addressed by the authors in the rebuttal. As it wasn’t initially as clear as it could have been how the proposed approach is situated with respect to previously proposed approaches, it would be highly useful to address these similarities/differences in a more easily accessible way in the final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has addressed all issues raised by the reviewers and the AC. The paper can be safely accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Strengths: an interesting idea, combining dense projection alignment with point-cloud weighting, which can be considered novel. Very robust results with clinical impact. Negative: one quite critical reviewer requested more discussion of their own related work. I would consider such criticism slightly biased and found the authors’ response convincing. I would vote for weak accept.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8


