Authors

Junshen Xu, Eric Z. Chen, Xiao Chen, Terrence Chen, Shanhui Sun

Abstract

Image registration plays an important role in medical image analysis. Conventional optimization based methods provide an accurate estimation due to the iterative process at the cost of expensive computation. Deep learning methods such as learn-to-map are much faster but either iterative or coarse-to-fine approach is required to improve accuracy for handling large motions. In this work, we proposed to learn a registration optimizer via a multi-scale neural ODE model. The inference consists of iterative gradient updates similar to a conventional gradient descent optimizer but in a much faster way, because the neural ODE learns from the training data to adapt the gradient efficiently at each iteration. Furthermore, we proposed to learn a modal-independent similarity metric to address image appearance variations across different image contrasts. We performed evaluations through extensive experiments in the context of multi-contrast 3D MR images from both public and private data sources and demonstrate the superior performance of our proposed methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_21

SharedIt: https://rdcu.be/cyhP2

Link to the code repository

N/A

Link to the dataset(s)

https://www.med.upenn.edu/cbica/brats2020/data.html

Reviews

Review #1

Please describe the contribution of the paper

1) The authors present a new image registration technique based on continuous optimization dynamics via neural ODEs. 2) The method introduces a novel multi-scale architecture. 3) The method is a general learn-to-learn image registration framework and is not limited to specific transformations. 4) The method can handle multiple contrasts with a single trained network
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Eq. 2 shows that this paper implements the flow-of-images LDDMM approach using neural networks. The LDDMM flow of images was first (or among the first) proposed by Hart, Zach and Niethammer [1]. However, the authors’ work differs from [1] in that the implementation is formulated in a neural network setting. Furthermore, the proposed method introduces a novel multi-scale neural ODE architecture. The proposed method uses a 3D feature extractor. The 3D content feature is composed of 2D content features generated from N randomly selected 2D slices along different axes of the image, using a 2D feature extractor E^c. Furthermore, they use feature domain disentanglement to perform image translation from one contrast to another.

[1] G. L. Hart, C. Zach and M. Niethammer, “An optimal control approach for deformable registration,” 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, 2009, pp. 9-16, doi: 10.1109/CVPRW.2009.5204344.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

None.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Uncertain.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

None.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors present a novel image registration algorithm that shows good promise. The algorithmic details seem solid.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

This paper proposes a new method for learning-based 3D image registration. It has three main contributions: (1) employing neural ODEs instead of classical convolutional architectures; (2) proposing a multiscale neural ODE variant for multiresolution image registration; (3) learning a modality-independent similarity measure.

The method is compared with ANTs, a conventional registration framework, and VoxelMorph, a learning-based registration framework. BraTS data is used for training and evaluation. The method is additionally evaluated on an in-house dataset to test for generalizability.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This work proposes several interesting contributions to learning-based image registration:
• This is the first work employing neural ODEs for image registration.
• The multiscale variant follows the general multiresolution strategies in image registration but can be trained end-to-end.
• A learned modality-independent similarity measure is used to overcome the challenges of defining multimodal image similarity.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

• The presentation of the methodology is not clear. Without being familiar with neural ODEs, it could be difficult to understand the advantages for image registration. The motivation for this should be emphasized more. Also, I found the description of the similarity measure confusing.
• Related work is not sufficiently discussed, e.g., for learning image similarity and the application of neural ODEs in computer vision.
• The evaluation is not complete. For example, the similarity measure is not evaluated, and statistical tests for significance are missing.
• Data for evaluation is limited. Only intra-patient registration from brain MR is considered.
• Details about the training and architectures of the models are missing.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Important details on the code framework, network architecture and hyperparameter selection are missing. The data used for training is publicly available. A private dataset is used for additional testing.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The proposed approach to 3D image registration is interesting and the use of neural ODEs in this field is novel. The idea of learning a modality-independent similarity measure is also valid. However, I have some concerns about the presentation of the work and the evaluation.

The motivation for using neural ODEs is not clear enough. Without being familiar with the concept of neural ODEs, understanding the paper might be difficult. Also, I found the description of the similarity measure quite confusing. In general, related work is not sufficiently discussed, in particular for learning image similarity, but also for the application of neural ODEs in medical imaging or, more broadly, in computer vision. Detailed information on the code framework, architectures (number of layers, hyperparameters, …) and parameter selection is missing.

The authors conducted an extensive set of registration experiments, including rigid, deformable (B-splines, dense deformation fields) and hybrid. They compared their approach to a traditional framework (ANTs) and to other learning-based methods (VoxelMorph and RCN). But for all models the parameter selection remains unclear, which raises the question of how fair the comparison is. I was surprised, for example, that ANTs performed quite badly on a simple rigid registration.

The evaluation of the similarity measure is insufficient. It remains unclear how well the image2image translation works. The authors should visualize some results. It also remains unclear whether the increased performance of the proposed model in Table 1 comes from the neural ODEs, the multiscale variant, or the learned similarity measure. The ablation study in Fig. 4a is crucial here, but it is not explained or interpreted accordingly. Again, the contributions of the neural ODEs and the similarity measure are mixed. What do “without self-supervised” and “without multi-slice” mean here? Do they refer to the similarity measure?

I strongly suggest adding ODENet models trained with other image similarity measures, such as those employed in VoxelMorph, for a better comparison. VM+I2I and ANTs+I2I (comparison methods with image2image translation) perform only slightly better (or worse) than the multimodal versions. Why is that? Is it significant? For a better comparison, the registrations should be performed with the proposed similarity measure. How does MSODENet perform with the image2image translation? All experiments are carried out for intra-patient registrations. Is the similarity measure not applicable for inter-patient registrations?

For future work, I would recommend expanding the experiments for a more complete evaluation. In particular, it is important to separate the improvements obtained by the neural ODEs for optimization from those obtained by the learned similarity measure. In addition, the similarity should be extended to intra-patient data. Experiments on other multi-modal data, such as MR/CT or MR/US, would be very interesting.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I see merit in the proposed approach and would encourage the authors to extend the experimentation. But in the current form, the study is still incomplete.
What is the ranking of this paper in your review stack?

4
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

The paper proposes a multi-scale Neural ODE solution for image registration. It also proposes a perception loss that is robust across modalities.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The idea of multi-scale Neural ODE is interesting. The proposed perception loss is intuitive. The ablation study on the proposed loss is intuitive.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The claim of being the first work using Neural ODEs for image registration needs to be double-checked. There were some workshop papers working on NODEs for image registration, if I recall correctly, but I may be wrong. Some arguments are not correct, like the ones on [27] and [10]. An intuitive reason why those PDE approaches work is that they divide the registration into small steps, which can handle large deformations better. It is the metric space in which the deformation model lies that provides the advantage over those elastic models.

On the other hand, the conclusions of the paper are based on the tumor image dataset BraTS, so it makes sense that the perception loss can handle this abnormal data better than traditional MSE or NCC. It is not clear that the conclusions generalize to normal data. Actually, for this tumor dataset, simply applying a registration model is not a straightforward solution. A better strategy may be to remove the tumor first; see [1].

[1] A Deep Network for Joint Registration and Reconstruction of Images with Pathologies
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The method is clearly illustrated. However, the authors do not plan to provide the code, so it would be hard to reproduce the results.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Neural ODEs are a good tool for introducing advanced dynamic properties during the image registration process. However, in terms of performance, they typically do not bring much advantage over existing PDE-based approaches like SVF or LDDMM, which have already proven powerful enough to handle large deformations and diffeomorphisms. The complexity and instability of NODEs are in most cases higher than those of these well-designed traditional PDE approaches.

On the other hand, the topic of exploring new dynamics by learning a NODE would be interesting.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper has novelty but not too much.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This work proposes a neural ODE framework for image registration. Specifically, the neural ODE models the dynamics of the registration parameters and these dynamics are chained together at different scales. While the introduction makes a connection to fluid-based image registration models (e.g., LDDMM), it is not clear that this connection in fact holds. In LDDMM, velocity fields evolve over time and continuously deform the space. In contrast, in the proposed approach the registration parameters evolve and their value at the terminal time is then used to encode the transformation (in the presented work these are the parameters of a rigid transformation, of B-spline control points, or of a dense deformation field). No composition of transformations over time appears to occur. Hence, this connection does not appear to be correct. There are several other major concerns which should be addressed in the rebuttal (based mostly on the review comments of R2):
1) It is unclear if the improvement of the method over other methods comes from the different similarity measure or from the different transformation model, the latter being the advocated contribution of the paper. How well would the neural ODE approach compare to the competing methods with the same similarity measures, e.g., MI or I2I? Fig. 4a appears to indicate that without the learned loss (i.e., using MI) the proposed method is outperformed by a standard VoxelMorph approach. Is the reported performance gain hence mostly due to the different similarity measure?
2) How hyperparameters were selected is unclear. Was some form of grid search performed?
3) The results in Tab. 1 are not quite clear. E.g., why does it make sense to run models such as VM or ANTs without a prior rigid registration? Why does ANTs perform so poorly on the rigid registration task? And most strangely, why is ANTs using rigid and deformable transformations significantly worse than only using the non-rigid part of the model (e.g., 81.9 vs. 73.6)? In fact, this performance drop appears to hold even for the proposed model (though it is not quite as significant).
4) It seems that all evaluation results are strictly for pairs of an image and a synthetically transformed version of it. Is this the case? A registration paper should also be tested on real registration pairs.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

Author Feedback

We appreciate comments from all reviewers and the consensus about our innovation.

Relation to LDDMM (AC) Our method is not based on LDDMM. The PDE in LDDMM comes from a fluid model, while the ODE in our method comes from gradient descent with a small step size. The PDE dynamics in LDDMM are pre-defined, while in the neural ODE they are parameterized by networks that are learned.
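
The link between gradient descent and an ODE stated in this reply can be illustrated with a minimal sketch. It is not the authors' code: the scalar parameter, the quadratic loss and all function names are assumptions chosen for illustration. Gradient descent with step size h is the explicit Euler discretization of the gradient flow dθ/dt = -∇L(θ); a neural ODE would replace the hand-written gradient below with a learned network.

```python
import math


def grad_loss(theta, target):
    """Gradient of the toy loss L(theta) = 0.5 * (theta - target)**2 (an assumption)."""
    return theta - target


def euler_gradient_flow(theta0, target, step, t_end):
    """Integrate d(theta)/dt = -grad L(theta) with explicit Euler.

    Each Euler update is exactly one gradient descent step with learning rate `step`.
    """
    n_steps = round(t_end / step)
    theta = theta0
    for _ in range(n_steps):
        theta = theta - step * grad_loss(theta, target)
    return theta


def exact_gradient_flow(theta0, target, t_end):
    """Closed-form solution of the same ODE for the toy quadratic loss."""
    return target + (theta0 - target) * math.exp(-t_end)


if __name__ == "__main__":
    reference = exact_gradient_flow(0.0, 1.0, 2.0)
    for step in (0.5, 0.1, 0.05):
        estimate = euler_gradient_flow(0.0, 1.0, step, 2.0)
        print(f"step={step}: theta={estimate:.4f}, error={abs(estimate - reference):.4f}")
```

In this toy setting, smaller steps track the continuous trajectory more closely at the cost of more iterations, which is the trade-off a learned update rule would aim to improve.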

Contributors of our improvement (AC, R2) We trained VM with the learned loss under the deformable setting, whose results together with other methods in Tab. 1 are listed here.

method | Dice | RMSE(x) | RMSE(phi)
VM | 79.4 (8.7) | 8.81 (3.03) | 1.61 (0.69)
VM+I2I | 80.1 (7.8) | 8.52 (2.30) | 1.26 (0.31)
MS-ODENet(D) | 81.6 (8.1) | 6.63 (2.22) | 1.11 (0.18)
VM+learned loss | 80.5 (7.7) | 7.75 (2.16) | 1.18 (0.20)

The learned loss (VM+learned loss) outperforms MI (VM) and I2I (VM+I2I). With the same loss, MS-ODENet achieves higher accuracy than VM+learned loss. Therefore, both the MS-ODE network and the learned metric contribute to the higher accuracy of our method over the others. In Fig. 4a, the numbers of our model ‘w/o learned loss’ cannot be compared directly to those of VM* in Tab. 1, because 1) VM* loads a rigid ODE model trained with the learned loss to solve the rigid transform, while the ‘w/o learned loss’ model is a MS-ODE trained end-to-end with MI; 2) the ‘w/o learned loss’ model uses B-spline while VM* uses dense transform.
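
For readers unfamiliar with the Dice column, the following is a minimal sketch of the standard Dice overlap between two binary segmentation masks. It is not taken from the paper: the function name, the flat 0/1 list representation and the handling of two empty masks are assumptions, and the values reported above appear to be percentages.

```python
def dice_score(mask_a, mask_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two flat binary masks of equal length."""
    if len(mask_a) != len(mask_b):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for a, b in zip(mask_a, mask_b) if a and b)
    total = sum(1 for a in mask_a if a) + sum(1 for b in mask_b if b)
    if total == 0:
        # Assumption: two empty masks are treated as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


if __name__ == "__main__":
    # Hypothetical masks: a warped moving-image label and the fixed-image label.
    warped_label = [1, 1, 0, 0, 1, 0]
    fixed_label = [1, 0, 0, 1, 1, 0]
    print(f"Dice = {100 * dice_score(warped_label, fixed_label):.1f}%")
```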

Details of training, hyper-parameters and network (AC, R2) For the GAN part, we followed DRIT’s network design, training approach and hyperparameter selection. For other hyperparameters, we performed grid searches on a validation set. For dense transform, we use a UNet [26] backbone. For rigid transform, the network has 6 convolutions with kernel size 4 (# of channels: 16, 32, 64, 128, 128, 128) and two FC layers (# of channels: 128, 6). For B-spline, the network has 4 convolutions with kernel size 4 (# of channels: 32, 64, 128, 128), two resblocks, and a convolution with 3 channels.
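
As a rough aid for reading the rigid-network description, the sketch below traces how the spatial size of a feature map would shrink through six convolutions with kernel size 4. Only the kernel size and the channel list come from the reply above; the stride of 2, the padding of 1 and the 128-voxel input size are assumptions made for illustration, since the reply does not state them.

```python
# Channel widths of the six convolutions, as listed in the author feedback.
RIGID_CHANNELS = [16, 32, 64, 128, 128, 128]


def conv_output_size(size, kernel=4, stride=2, padding=1):
    """Output length of an undilated convolution along one spatial axis.

    kernel=4 is from the rebuttal; stride=2 and padding=1 are assumptions.
    """
    return (size + 2 * padding - kernel) // stride + 1


def trace_spatial_sizes(input_size, n_layers, kernel=4, stride=2, padding=1):
    """Return the spatial size after each of n_layers stacked convolutions."""
    sizes = []
    size = input_size
    for _ in range(n_layers):
        size = conv_output_size(size, kernel, stride, padding)
        sizes.append(size)
    return sizes


if __name__ == "__main__":
    # Hypothetical 128-voxel axis; the real input resolution is not given.
    for channels, size in zip(RIGID_CHANNELS, trace_spatial_sizes(128, len(RIGID_CHANNELS))):
        print(f"channels={channels:3d}, spatial size per axis={size}")
```

Under these assumptions each convolution halves the resolution, so the last feature map is small enough to be flattened into the two FC layers that output the 6 rigid parameters.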

Results in Tab. 1 (AC) Three types of simulated transforms (rigid only, deformable only and rigid+deformable) were evaluated in Tab. 1. 1) In the rigid-only setting, we simulated large motion. The rigid registration in ANTs is based on gradient descent, which only uses local information, so it tends to fail under circumstances like large motion. 2) In the deformable setting, we need no rigid registration for VM or ANTs. In the rigid+deformable setting, we initialized VM* using the rigid NODE, and performed both rigid and deformable registration in ANTs. 3) ANTs using rigid and deformable transformations is worse than using only the non-rigid part of the model because the task is harder: in the rigid+deformable experiments the model must resolve the rigid motion first.

Evaluation and Experiment data (AC, R2) To the best of our knowledge, our work is the first to formulate image registration as a multi-scale NODE. This pilot study was verified on both public and private datasets. We hope our work will inspire further efforts. The use of simulated transforms provides accurate and controllable ground truth for evaluation and is commonly used, e.g., UMIDIR, Becker MIA’17, and Hu MIA’20.

Advantages of neural ODE (R2) 1) The neural ODE is an iterative method which can refine the predicted transform step-by-step by learning to mimic the optimization step. 2) Neural ODEs are more memory efficient than cascaded networks since they use the adjoint method/checkpointing for backpropagation without keeping all intermediate results. Therefore, the training memory is O(1) vs. O(L) for an L-cascaded network (Chen NeurIPS’18). We trained a rigid ODENet with step sizes of 0.5, 0.1 and 0.05, and the memory usage was constant (8GB). Besides, other iterative methods such as RCN ran out of memory when trained with more than 2 steps, while our method can handle more than 5+5+2=12 steps.

Tumors in data (R3) We acquired data from subjects without tumors for the generalization study.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This work proposes a neural ODE approach for deformable registration. Specifically, the neural ODE models the evolution of coefficients of a chosen transformation model. While there are interesting aspects of the method many questions were raised during the review, in particular, with respect to the evaluation and comparison to other methods. Unfortunately, many of these concerns still remain after the rebuttal. For example, all results are just based on synthetic deformations (some of them very large; e.g., rotations up to 40 degrees) and hence it is unclear if the method would generalize to a real registration scenario. Further, concerns regarding the evaluation performance of competing methods for various combinations of rigid+deformable and for comparable similarity measures largely remain. Hence, this work is in my opinion not ready for publication.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

15

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Positive: all reviewers found interest in the novel use of neural ODEs for registration. In fact, several small ideas are combined (for example, learning from I2I translation). There are also several negative aspects, in particular the difficult-to-read descriptions. Future work should definitely include comparisons on public challenge datasets where more SOTA methods have been evaluated, e.g., Learn2Reg. I vote for a weak acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes to use a neural ODE framework for learning-based 3D image registration. All reviewers thought this idea is interesting and has a certain novelty. In the rebuttal, the authors have answered the main issues raised by the reviewers, such as the relation to LDDMM and the advantages of neural ODEs. I think these answers have made the paper clearer.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Multi-scale Neural ODEs for 3D Medical Image Registration