
Authors

Fengze Liu, Ke Yan, Adam P. Harrison, Dazhou Guo, Le Lu, Alan L. Yuille, Lingyun Huang, Guotong Xie, Jing Xiao, Xianghua Ye, Dakai Jin

Abstract

In this work, we introduce a fast and accurate method for unsupervised 3D medical image registration. This work is built on top of a recent algorithm, self-supervised anatomical embedding (SAM), which is capable of computing dense anatomical/semantic correspondences between two images at the pixel level. Our method is named SAM-enhanced registration (SAME), which breaks down image registration into three steps: affine transformation, coarse deformation, and deep deformable registration. Using SAM embeddings, we enhance these steps by finding more coherent correspondences and providing features and a loss function with better semantic guidance. We collect a multi-phase chest computed tomography dataset with 35 annotated organs for each patient and conduct inter-subject registration for quantitative evaluation. Results show that SAME outperforms widely used traditional registration techniques (Elastix FFD, ANTs SyN) and the learning-based VoxelMorph method by at least 4.7% and 2.7% in Dice scores for the two separate tasks of within-contrast-phase and across-contrast-phase registration, respectively. SAME achieves performance comparable to the best traditional registration method, DEEDS (from our evaluation), while being orders of magnitude faster (from 45 seconds to 1.2 seconds).

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_9

SharedIt: https://rdcu.be/cyhPQ

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes self-supervised anatomical embedding-enhanced registration (SAME), a three-stage 3D medical image registration method built on the self-supervised anatomical embedding (SAM). SAME leverages the sparse anatomical correspondences from SAM to perform affine, coarse, and deformable registration. The method is evaluated on a private chest CT dataset, and the results are compared to 5 classical methods and 1 deep learning-based method. The registration performance is quantified using Dice scores of 35 organs.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of using SAM for image registration is novel and promising: Compared to existing deep learning-based methods, using the sparse anatomical correspondences for image registration shows a new perspective for image registration with deep learning. It is particularly useful when the anatomical correspondences are sparse and non-linear in the input scans.

    A significant improvement (in terms of Dice) over various image registration methods.

    Sufficient ablation study is provided: The ablation studies in Tables 1 and 2 adequately validate the claimed advantages of the proposed method.

    The formulation of the method is technically sound and promising.

    The paper is well-written and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of evaluation of the smoothness of the resulting deformation field: In deformable registration, there is a common trade-off between registration accuracy and smoothness of the deformation field. The authors did not evaluate the smoothness/invertibility of the deformation field, which may bias the comparison.

    Poor reproducibility: The proposed method is evaluated on a private CT dataset, and the authors will not publish the source code following the acceptance of the paper. Moreover, the proposed method builds on top of the self-supervised anatomical embedding (SAM) method, which is a relatively new learning framework with no published official implementation.

    The proposed method is evaluated only with the Dice score, which may not be sufficient to represent registration accuracy in the lung, e.g., alignment of vascular structures.

    The hyperparameter choices and settings for baseline methods (ANTs, FFD, DEEDS) are missing.

    Lack of description of the memory footprint: The correlation feature in FlowNet has heavy memory consumption during training. Reporting the GPU memory consumption during training would help determine the clinical value of the proposed method.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Poor reproducibility: The proposed method is evaluated on a private CT dataset, and the authors will not publish the source code following the acceptance of the paper. Moreover, the proposed method builds on top of the self-supervised anatomical embedding (SAM) method, which is a relatively new learning framework with no published official implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Lack of evaluation of the smoothness of the resulting deformation field: Including a deformation field analysis for all the methods (baseline methods and the proposed method) would help conduct a fair comparison, especially for non-linear registration. The quality/smoothness of the deformation field can be quantified using the Jacobian determinant.
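    The Jacobian-determinant analysis the reviewer asks for can be sketched as follows. This is an illustrative sketch, not the paper's code; it assumes the deformation is given as a dense voxel-displacement field in a NumPy array.

```python
import numpy as np

def jacobian_determinant(disp):
    """Voxel-wise Jacobian determinant of a 3D transform phi(x) = x + u(x).

    disp: displacement field u of shape (3, D, H, W), in voxel units.
    Returns an array of shape (D, H, W); values <= 0 indicate folding.
    """
    # du_i/dx_j for each component i along each axis j (finite differences).
    grads = np.stack([np.stack(np.gradient(disp[i]), axis=0) for i in range(3)])
    # Jacobian of phi: J = I + du/dx, broadcast over all voxels.
    J = grads + np.eye(3)[:, :, None, None, None]
    # Move the 3x3 matrix axes to the end and take the determinant voxel-wise.
    return np.linalg.det(np.moveaxis(J, (0, 1), (-2, -1)))

# Zero displacement (identity transform): det J = 1 everywhere.
detJ = jacobian_determinant(np.zeros((3, 8, 8, 8)))
std_jac = detJ.std()          # smoothness summary (std of the Jacobian determinant)
folded = (detJ <= 0).mean()   # fraction of folded (non-invertible) voxels
```

    A lower `std_jac` together with a zero folded fraction indicates a smoother, invertible warp; the trade-off between these quantities and Dice is exactly the potential bias the reviewer warns about.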

    Only one evaluation metric: Measuring the results with anatomical landmarks can help evaluate the image registration performance for vascular structures.

    As the performance of classical image registration methods is highly dependent on hyperparameter choices, it is necessary to report the hyperparameter choices and settings for each compared method.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is well-written and easy to follow. The technical details of the proposed method are well-explained and adequately evaluated. The results show immense potential for various clinical applications. The major weaknesses of the method are the lack of deformation field analysis and poor reproducibility.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes an unsupervised registration approach called SAME. It is built on an existing framework namely the self-supervised anatomical embedding network (SAM).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This paper is easy to understand and well-structured.
    2. Good literature review.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. Limited novelty: I think the paper does not introduce much novelty, as it is (more or less) a straightforward application of [16].

    2. Incremental performance improvements: The proposed method is said to outperform VM (and to achieve accuracy similar to DEEDS). There are many newly proposed deep learning registration approaches that outperform VM by a wide margin. Since the theoretical and methodological contribution of this paper is very limited, the proposed method would be expected to deliver a significant performance boost; however, I am not convinced by the current experimental results.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient information about reproducibility, but detailed descriptions of some parameter settings, such as how the baseline methods were implemented and tuned, are somewhat missing.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    From a reviewer’s point of view, this paper is a solid work. However, people may expect more novelty/originality from a MICCAI submission.

  • Please state your overall opinion of the paper

    Probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    1. This is a “safe” paper which conducts extensive experiments to validate something that has already been proven useful.
    2. Limited novelty.
    3. Incremental improvements on performance.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper describes a new unsupervised registration algorithm for 3D CT images called SAME. It is heavily based on self-supervised anatomical embeddings (SAM), a recently proposed algorithm to find anatomical correspondences of landmarks in images. SAME utilizes SAM in a novel way for dense deformable image registration. In the initialization step, SAM-correspondences are used to compute affine and coarse deformations without any additional training. In the fine-tuning step, VoxelMorph is utilized together with a novel loss function based on SAM features to learn a dense deformation field. The method is evaluated on a CT dataset with mono- and multi-modal (contrast enhanced) images. SAME outperforms or shows comparable performance to multiple traditional and learning-based registration frameworks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The novel use of SAM correspondences in a multi-step registration framework. The correspondences are not only used for initialization (affine and coarse) without additional training, but also incorporated in the learning of a dense displacement field via a new loss function.

    Both mono- and multi-modal CT data are used for evaluation

    Well written and structured with well designed figures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Only one dataset with a limited number of images is used for evaluation.

    A comparison to supervised learning as an upper bound is missing. How good are Dice scores of 50%?

    Some results could be added: standard deviations in Table 1 and a statistical analysis. Are the differences between the methods significant?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors based their code on VoxelMorph, whose code is freely available. So far, the code of this work is not available, but the framework used is reported, as is the hardware used for training. The dataset seems to be in-house data but is described adequately. The hyperparameter selection is not transparent (“empirically selected”); this applies both to the proposed model and to the methods used for comparison.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I enjoyed reading this paper. It is well written, and the methodology is clearly described. The proposed method uses the SAM algorithm (for finding correspondences between images in a self-supervised manner) in a novel way. It is heavily built upon SAM, but it expands the use of SAM to computing dense deformation fields. Unsupervised learning-based image registration is very challenging, and this method proposes an elegant solution.

    One weakness of the work is that the method is evaluated on only one dataset with a limited number of images. It is unclear how clinically useful alignments with a Dice score of 50% are. In Fig. 2, the results are further grouped by organ system, and there seems to be a big difference in performance between the organs; this should be discussed. Maybe this difference is the reason that standard deviations are missing in Table 1. I am also missing a statistical analysis of the results to see how significant the differences are. From Fig. 2 it appears that the ranking of the methods differs between organ groups.

    For future work, I would recommend including other datasets of different modalities. A comparison to other learning-based methods, such as a supervised approach as an upper bound or a landmark-based method, is important. The authors should perform some kind of cross-validation. It could also be interesting to try a multitask approach, where SAM and SAME are trained together.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I recommend acceptance of this paper. Unsupervised dense image registration is challenging, and this work proposes an interesting and novel solution to the problem. The previously proposed SAM algorithm finds correspondences at pixel-level and this work extends its use to dense deformation fields. Costly manual annotations are not required.

    The evaluation has still some weaknesses but I’m aware of the limited space.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work builds on the existing SAM approach to create feature vectors and uses these features as a similarity measure (both directly via a defined SAM loss and via correlation) in a VoxelMorph-style deep registration network. Hence, the innovation of this approach is moderate. The approach is tested on a medium-sized in-house chest CT dataset (n=90). The experiments demonstrate that the approach can outperform the direct VoxelMorph implementation and that both the SAM loss and its correlation variant are beneficial for registration performance. The reviewers raised various concerns which should be addressed in a rebuttal. In particular, the rebuttal should address the following concerns in detail:
    1) It is unclear how hyperparameters, similarity measures, and regularizers were chosen for the competing methods (ANTs, FFD, DEEDS); was a search for optimal parameters conducted? Did all of them use the same similarity measure? Did all of them use the same initial affine registration? What implementation of FFD was used? NiftyReg?
    2) Standard deviations for the Dice scores should be provided, and the methods should be assessed for statistically significant differences. How variable are these results, and are the observed differences statistically significant, especially in light of the relatively low Dice scores? Are such Dice scores meaningful for the targeted applications?
    3) No measures of transformation smoothness were provided. Does the approach result in smooth transformations, in particular as it predicts displacement fields rather than using more sophisticated models (e.g., stationary velocity fields) which could model larger displacements? This is especially important as the results in Table 1 appear to suggest that one of the primary drivers of performance is chaining together an affine, a coarse, and a deformable (e.g., plain VoxelMorph) deformation.
    4) Many more advanced models have been proposed since the original VoxelMorph (VM) work the proposed method builds upon. What motivates using VM as the only deep learning baseline comparison?

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We thank reviewers and AC for their constructive and encouraging comments such as “technically sound and promising”, “sufficient evaluation” and “well-written”.

Q1: Novelty (R2). A1: Integrating SAM [16] concepts to produce a top-performing deformable registration is not trivial. We break down the process into 3 steps: affine transformation, coarse deformation, and deep deformable registration. We leverage SAM embeddings to construct more spatially coherent correspondences and we develop feature representations and a new loss function with improved semantic guidance. As R1 and R3 state: “the idea of using SAM for image registration is novel and promising … shows a new perspective for registration” and “… this method proposes an elegant solution”.

Q2: Hyperparameters & implementation details of FFD, SyN, DEEDS (R1). A2: FFD was implemented using Elastix [10]. Parameters matched those of the “F” method in a pulmonary CT registration challenge [EMPIRE10 Challenge], which was the best performing FFD method in the unseen Phase-2 test. The only modification was an extra bending energy term with weight 0.01 to regularize the smoothness. For SyN (implemented in ANTs) and DEEDS (implemented by the original author), parameters were set according to those used in [15] (the largest evaluation work for inter-patient registration in abdominal CT). For the affine transform, we used the default implementation in each package. We will include the details in the revised version.

Q3: Standard deviation (std), statistical tests, and relatively low DSC (R3). A3: We have conducted paired t-tests, and SAME significantly outperforms all other methods (p<0.0001), except for DEEDS in the CE-to-NC setting. SAM-VM also significantly outperforms VM (p=2e-7). We will include this in Table 1. The DSC box plots for each organ can already be observed in Fig. 2, where the inter-quartile range/min/max and outliers are presented; this is much more informative than the std. As for Table 1, we do not report the std because it summarizes results across all 35 organs of different sizes and contrasts with wildly different DSCs, which makes its interpretation not meaningful. Our reported DSCs compare well to prior work; e.g., [6] (MICCAI 2020) reports a mean DSC of ~45% for 13 abdominal organs. Moreover, we used single-atlas evaluation, and performance can be substantially improved using multiple atlases and label fusion.
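A paired t-test of the kind described here takes only a few lines. The Dice values below are made-up placeholders for illustration, not the paper's data (assumes SciPy is available):

```python
import numpy as np
from scipy import stats

# Hypothetical per-case mean Dice scores for two methods on the same test pairs.
dice_same = np.array([0.55, 0.58, 0.52, 0.60, 0.57, 0.54, 0.59, 0.56])
dice_vm   = np.array([0.48, 0.50, 0.47, 0.53, 0.49, 0.46, 0.52, 0.50])

# Paired test: each registration pair contributes one difference, so the
# dependent-samples variant (ttest_rel) is appropriate, not ttest_ind.
t_stat, p_value = stats.ttest_rel(dice_same, dice_vm)
```

The pairing matters because both methods are evaluated on the same registration pairs, so per-case variability cancels in the differences and the test gains power over an unpaired comparison.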

Q4: Smoothness of deformation field (R1). A4: We have computed the std of the Jacobian determinant (std_jac) – DEEDS: 0.40; FFD: 0.51; SA+VM: 0.38; SA+SC: 0.40; SA+SAM-VM: 0.36; SA+SC+SAM-VM (SAME): 0.66. The higher std_jac of SAME is due to the cascading of two deformable steps, i.e., SC and SAM-VM, where SC directly matches two sets of coordinates without any constraint. Without SC, std_jac is 0.36. We will explore more sensible constraints on SC as future work. Even so, if SC is not included, SA+SAM-VM still achieves very high performance (0.52 Dice/0.36 std_jac), which is comparable to DEEDS (0.527/0.40) and better than FFD (0.494/0.51) and SA+VM (0.484/0.38). SA+SAM-VM can be adopted when smoother spatial warping is preferred. SAME and SA+SAM-VM are our main results.

Q5: Using VoxelMorph (VM) as a baseline (R2). A5: We compare against VM to demonstrate relative improvements and because VM offers a well-understood and effective baseline that does not require a lot of engineering “tricks”. In terms of absolute performance, we compare against DEEDS, which 1) has been comprehensively shown to outperform other “classic” approaches in abdominal CT by a large margin [15], and 2) has more recently matched two complex and sophisticated deep registration approaches [6,11] (MICCAI 2020, ECCV 2020). These are strong and comprehensive results.

Q6: Other evaluation metrics (R1). A6: We have further computed the average surface distance. A similar trend is observed, e.g., FFD 4.6mm, SA+VM 4.1mm, DEEDS 4.0mm, SA+SAM-VM 3.9mm, SAME 3.8mm. We will add this to Table 1.
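For reference, a symmetric average surface distance between two binary organ masks can be computed roughly as follows. This is an illustrative sketch assuming SciPy, not the authors' implementation:

```python
import numpy as np
from scipy import ndimage
from scipy.spatial import cKDTree

def average_surface_distance(a, b):
    """Symmetric average surface distance (in voxels) between binary masks a and b."""
    def surface(m):
        m = m.astype(bool)
        # Boundary voxels: in the mask but not in its erosion.
        return np.argwhere(m & ~ndimage.binary_erosion(m))
    sa, sb = surface(a), surface(b)
    d_ab = cKDTree(sb).query(sa)[0]  # nearest b-surface voxel for each a-surface voxel
    d_ba = cKDTree(sa).query(sb)[0]
    return (d_ab.sum() + d_ba.sum()) / (len(d_ab) + len(d_ba))
```

For isotropic voxels, multiplying the result by the voxel spacing yields physical distances comparable to the millimeter figures above; anisotropic spacing would require scaling the surface coordinates per axis before building the KD-trees.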




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposes a deep registration approach making use of SAM features. Results are strong. There were some concerns raised in the review regarding hyperparameters, comparisons to other methods, novelty, smoothness, and statistical significance. Those have been largely addressed by the rebuttal. Some minor concerns remain regarding the driver of the performance, in particular whether multi-step approaches to deep registration would have been able to achieve similar performance levels. Further, while the standard deviation of the determinant of the Jacobian is one way of assessing the regularity of the transformation, it might also be sensible to report the number of foldings in a final version.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper excels in combining recent methods (pre-training using semantic embedding) within U-Net based registration and provides significant improvements. The reviews were already positive (the only negative one could not find anything concretely wrong), and the concerns of the meta-reviewer have been adequately addressed. I would vote in favour of accepting this solid paper with good evaluation and a useful contribution. In future work, a direct comparison on recent Learn2Reg challenge datasets would be important.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Novelty: this paper relies heavily on the SAM [16] algorithm and breaks registration down into 3 steps (I omit the description herein). This breakdown is a natural line of thinking or practice in my opinion. Although it is assumed that no additional major results could be added, it is not understandable why “the std of DSC is not meaningful, but the std of Jacobian determinants is more interpretable”. I would prefer the former, and the latter makes the results harder for me to interpret. Jacobian determinants vary with the amount of deformation and are not a good metric for evaluating smoothness. Basically, an improvement in DSC may not mean much unless the smoothness distribution of the fields can be evaluated, which requires more than the std of Jacobian determinants. In summary, the rebuttal clarified some concerns but may raise new open concerns, with new results put into the paper without review. Finally, the contribution of breaking SAM down into 3 steps and calling it SAME is incremental.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9


