
Authors

Balder Croquet, Daan Christiaens, Seth M. Weinberg, Michael Bronstein, Dirk Vandermeulen, Peter Claes

Abstract

Registration is an essential tool in image analysis. Deep learning based alternatives have recently become popular, achieving competitive performance at a faster speed. However, many contemporary techniques are limited to volumetric representations, despite increased popularity of 3D surface and shape data in medical image analysis. We propose a one-step registration model for 3D surfaces that internalises a lower dimensional probabilistic deformation model (PDM) using conditional variational autoencoders (CVAE). The deformations are constrained to be diffeomorphic using an exponentiation layer. The one-step registration model is benchmarked against iterative techniques, trading in a slightly lower performance in terms of shape fit for a higher compactness. We experiment with two distance metrics, Chamfer distance (CD) and Sinkhorn divergence (SD), as specific distance functions for surface data in real-world registration scenarios. The internalised deformation model is benchmarked against linear principal component analysis (PCA) achieving competitive results and improved generalisability from lower dimensions.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_12

SharedIt: https://rdcu.be/cyhPT

Link to the code repository

https://gitlab.kuleuven.be/u0132345/deepdiffeomorphicfaceregistration

Link to the dataset(s)

https://www.facebase.org/chaise/record/#1/isa:dataset/RID=VWP


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors introduce a one-step non-rigid surface registration method which internally learns a probabilistic deformation model. The method makes use of a conditional variational autoencoder setup and experiments with different loss functions (Chamfer distance, Sinkhorn divergence). The method is validated on 2454 face surfaces. Their probabilistic model is competitive with a linear PCA model, and the registration results are competitive with a standard non-rigid ICP method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the method is the possibility to perform a one-step registration. Furthermore, the diffeomorphism is learned on the ambient space instead of the discretized surface, so the registration can be done at different levels of surface discretization (mesh resolutions).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weakness of the paper is the evaluation.

    The quantitative evaluation targets only surface matching and not the established correspondence, which is the main goal of the method.

    The authors mention that the CVAE model is more compact than a PCA model. It would also be helpful to see the absolute variation and not just the percentage variation each component is responsible for as in [M. Styner et al.: Evaluation of 3D Correspondence Methods for Model Building].

    The method is validated on a single data domain (faces) where a relatively large dataset (over 2000 meshes) is needed. It is difficult to know how well this method generalizes to other domains, and within the medical domain it is rarely the case that so much data is available. Furthermore, the method needs extensive manual cleanup, as mentioned in Section 3, which weakens the argument of having a one-step registration method. In contrast, CPD [A. Myronenko et al.: Point set registration: Coherent Point Drift] has an outlier parameter to cope with noise and [1] is able to register partial meshes, thus requiring less manual intervention at the cost of a few additional algorithm iterations.

    The generalization plot shows an early plateau when increasing the number of components. This suggests that the model fails to capture more of the variability in the data. The authors do not investigate whether this stems from their volume discretization, the latent space, or other model-based limitations.

    We question that the drop in specificity after rank 32 is due to overfitting. It is very unlikely behavior for a model's specificity to drop after adding more flexibility. The authors fail to justify or better explain their reasoning.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    While the paper is based on a publicly available dataset, it is not mentioned that the code or the trained network will be publicly available. We are doubtful that someone can re-implement the paper based on the presented information in reasonable time.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors mention that they use a standard non-rigid ICP (nICP) method [1] as implemented in [23]. To the best of our knowledge, there is no “standard” nICP method. Furthermore, [1] does not seem to be the nICP method implemented in [23].

    It is difficult from the images in Fig. 2 to get an idea of the registration quality. It would help to remove the checker-pattern.

    The model compactness is only given as a measure relative to the individual models, which makes it unusable for comparing models. In addition, some results are surprising and are not further discussed. The sudden drop in specificity after including more basis functions is addressed in the discussion, but with a comment that we cannot fully understand or agree with.

    It is not obvious why the authors use “incremental PCA” [19] instead of a standard PCA model (Point Distribution Model). They also mention that the “incremental PCA” is referred to as a tangent-PCA, but without a reference (t-PCA is not mentioned in [19]).

    We would encourage the authors to make the model implementation publicly available together with the paper.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    We have expert knowledge of shape modeling and question the usefulness of the evaluation. Furthermore, there is only a single showcase experiment. After reading the paper it is not clear how the method compares to other methods, how well it can perform on other problems, or what the strengths and limitations of the individual components are. While the core method might be valuable for publishing, the presentation, evaluation and discussion are not.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes a deep learning approach for 3D surface registration that learns a probabilistic deformation model. The deformation is constrained to be a diffeomorphism.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Main strength: learning a diffeomorphic deformation model with a deep learning approach is interesting for learning complex deformations of specific anatomical structures.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Main weakness: the validation of the method is not satisfactory, as it is only compared to one method (MeshMonk), which is neither introduced nor presented with its main limitations (therefore, not part of the state of the art?). Why not compare the proposed method to state-of-the-art methods, for example the ones from the introduction? There is no application to medical images.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Authors do not specify if the code is available, but the data can be downloaded from FaceBase. Re-implementing the method from the paper would need more details; however, the values of the parameters used are given.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major comments :

    • Introduction: The motivation is not clear to me. The authors mention diffeomorphic deformation models, then the popularity of deep learning, but do not state why they want to learn diffeomorphic deformations via deep learning or what the limitations of diffeomorphic models are.
    • Method: I am not sure why the Chamfer distance is considered, as it is not effective for anatomical structures and can lead to mismatches. On the other hand, why not consider a varifold or current representation of shapes (powerful for anatomical shape analysis) instead of point clouds? Is it doable in a DL architecture?
    • Results: A word about what “MeshMonk” is would help. Also, I wonder what the result would be with a state-of-the-art LDDMM approach, as I think the registration could be good too.

    Other comments :

    • Variables are not well described: q_w? Z? V? …
    • The KL divergence should be made explicit; in my view it should be written as KL(q ‖ p).
    • Page 4: if \epsilon tends to zero, OT(\alpha,\beta) is a constant as it does not depend on x and y; is this expected?
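
For context on the Chamfer-distance concern raised in the comments above, here is a brute-force sketch of the symmetric Chamfer distance between two point clouds. This is a generic illustration, not the paper's loss; the function name is an assumption. It makes the reviewer's point concrete: each point independently matches its nearest neighbour in the other cloud, so the matching need not be a bijection, which is the source of the possible mismatches.

```python
def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point clouds.

    For each point in one cloud, take the squared distance to its
    nearest neighbour in the other cloud; average both directions.
    O(|a|*|b|) brute force; practical implementations use KD-trees
    or batched GPU operations instead.
    """
    def sq_dist(p, q):
        return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

    def one_way(src, dst):
        # Each source point matches its nearest target independently:
        # several source points may pick the same target (no bijection).
        return sum(min(sq_dist(p, q) for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)
```

Because the nearest-neighbour assignment carries no anatomical meaning, two anatomically different points can be matched whenever they happen to be spatially close, which is the mismatch behaviour the comment refers to.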
  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The validation does not persuade one to choose the proposed method over other methods from the literature. Also, no results are shown on real medical imaging data. In that sense, the introduction of motivation and goals is as important as the validation (as breakthrough results are not mandatory).

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper presents a VAE architecture for unsupervised diffeomorphic surface registration. The encoder uses a PointNet++ architecture and is fed point clouds, while the decoder uses a convolutional neural network to produce a stationary velocity field defined on a regular lattice. The loss function includes symmetric matching and regularization terms, and two different matching terms (the first based on minimal distance, the other based on the Sinkhorn divergence) are compared.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    While registration networks for images and volumes have very rapidly become mainstream, point clouds and surfaces trail behind. This is probably due to the fact that convolutions are much less efficient on a non-regular lattice and that there is no equivalent to the U-Net architecture for graphs. This paper smartly pieces together multiple components developed in recent years to build a diffeomorphic registration network for surfaces. In my opinion, the most important component of this paper is the “pointcloud-to-volume” VAE, built using a PointNet encoder and ConvNet decoder. The validation section is interesting and goes beyond comparing mere accuracies: the authors show that their network trades off some loss in accuracy for a more compact representation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is a shame that no effort was made to apply the method to medical imaging data (unless faces are considered medical images – and they may well be – in which case this should be explained in the manuscript). In general, I would consider this paper out of scope. However, I do see how it can easily be applied to medical imaging data and have an impact there. I wonder how this method would fare with more complex and convoluted shapes such as brain surfaces. Does learning help to avoid spurious local minima?

    Piecing together components from different fields to propose new solutions to old problems is a valid and important part of science. However, I sometimes found it difficult to understand the exact contribution of the authors: what idea did they have that others didn’t, that allowed them to crack the problem? It is always difficult to perform a thorough literature review in a conference paper with a strict page limit. However, I feel that a few papers are missing and should have been discussed:

    • “Unsupervised learning of probabilistic diffeomorphic registration for images and surfaces”, Dalca et al., Medical Image Analysis (2019) https://arxiv.org/pdf/1903.03545.pdf While the method described in this paper requires both a surface and a volume, the use of losses for point clouds is discussed. Given that some components are common between the papers, the authors could explain how their architecture enables the move from a “volume and surface” network to a “pure surface” network.
    • “FlowNet3D: Learning Scene Flow in 3D Point Clouds”, Liu et al., CVPR (2019) https://openaccess.thecvf.com/content_CVPR_2019/papers/Liu_FlowNet3D_Learning_Scene_Flow_in_3D_Point_Clouds_CVPR_2019_paper.pdf Although it does not generate diffeomorphic maps, it seems that this method does perform non-linear registration between point clouds.
    • “ResNet-LDDMM: Advancing the LDDMM Framework Using Deep Residual Networks”, Ben Amor et al., preprint (2021) https://arxiv.org/pdf/2102.07951.pdf This one is a (pretty recent) preprint so does not necessarily need to be discussed. But the authors might be interested since it also implements diffeomorphic registration between point clouds.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The code does not seem to be available, although it seems from the manuscript that most components (the PointNet encoder, the geometric loss) are implemented in different open-source packages and could be used to reimplement the full network.
    • The dataset is publicly available
    • The parameters of the network (architecture, number of features) are provided.
    • The splitting procedure of the dataset is provided.
    • The optimization parameters are provided.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • The SVFs are smoothed as in Krebs et al. and given a smoothness penalty as in Balakrishnan et al. I wonder if removing the explicit smoothing would yield better registration results. The advantage of the smoothness penalty over the explicit smoothing is that it is balanced with the other terms and allows the velocity to be locally less smooth if it really improves the match.
    • Eq (1) is described as the variational LB (to maximize) but there is a minus sign so it is the negative of the LB (to minimize).
    • Maybe say somewhere that the different matching losses (Lmse, Lcd, Lsd) correspond to -log p(y|x).
    • While Rsmooth is a very common penalty in registration, I am not sure I understand the point of Rvert. I see it as a penalty on absolute displacement, which goes slightly against the philosophy of “large deformation” diffeomorphic models. The use of an L1 loss is even more puzzling.
    • Invertibility comes from integration of smooth velocity fields, not from symmetric losses. It is absolutely possible to have invertible/diffeomorphic transforms with a non-symmetric loss. Problems that involve a template shape and an observed or measured shape are often non-symmetric by construction.
    • It is not extremely clear to me how the “compactness” metric was computed.
    • How were “optimization-based” SVFs obtained (section 3.3)? By making the velocity fields trainable or using a completely different implementation?
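
On the invertibility point raised in the comments above: invertibility indeed comes from integrating (exponentiating) a smooth stationary velocity field. A minimal one-dimensional sketch of scaling-and-squaring exponentiation follows; it is an illustrative toy, not the paper's exponentiation layer, and the function name and grid-unit convention are assumptions.

```python
import numpy as np

def exp_svf_1d(v, n_steps=6):
    """Exponentiate a stationary velocity field by scaling and squaring.

    v: velocity sampled on a regular 1D grid, in grid units per unit time.
    Returns the displacement field of phi = exp(v), obtained by first
    scaling (phi_{1/2^n} ~ id + v / 2^n, accurate for tiny steps) and
    then composing the map with itself n times (squaring).
    """
    grid = np.arange(len(v), dtype=float)
    disp = v / (2 ** n_steps)           # scaling step
    for _ in range(n_steps):            # squaring: phi <- phi o phi
        # Composition of displacement fields:
        # new_disp(x) = disp(x + disp(x)) + disp(x), via linear interpolation.
        disp = np.interp(grid + disp, grid, disp) + disp
    return disp
```

For a constant velocity field the flow is a pure translation, so the recovered displacement equals the velocity; note that no symmetry of the data term is involved anywhere in this construction, matching the comment above.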
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There are very few learning-based registration methods for surfaces, so this paper certainly fills a void. The paper is relatively well written, and the methods easy to follow. The validation is good, although a single dataset was used, which happened to be loosely related to medical imaging.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The strength of the paper was in the learning of probabilistic deformation models under the variational auto-encoder setting.

    The authors are invited to comment on aspects of method evaluation, missing references. They should also comment on why an incremental PCA was used instead of the standard tangent PCA. Again, details of the tangent PCA should be provided as the implementation differs according to the representation and the metric used.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

We would like to thank all reviewers for their comments and will make the necessary changes.

R2, R3 did not consider faces as medical imaging data: The shape of the face is important in various medical domains including craniofacial reconstructive surgery and clinical genetics. 3D images of the face (e.g. collected using 3dMDFace) are used in both clinic and research as a low-cost radiation-free alternative or addition to traditional medical imaging modalities. We will add this to the introduction.

R1 states that in the medical domain, large datasets are rare: Medical data is becoming more widely available e.g. the UK Biobank plans to scan the brain, heart, abdomen, … of up to 100,000 participants by 2022.

R2 requested to motivate learning diffeomorphisms: We used diffeomorphisms for extra robustness (2.1. Registration Model). Furthermore, because of the general applicability and the metric induced by a diffeomorphism we aim to generalize the technique to both volumetric and surface data in the future (4. Discussion). We will clarify the choice of diffeomorphisms in the introduction.

R2 requested extra information about MeshMonk: MeshMonk was considered the optimal baseline, because it was developed, optimised and validated for face registration. It has been used in more than 30 medical, anthropological, and genetic studies since its release in 2018. In our view its major limitations are its iterative nature, with a complexity that scales with the number of vertices, and its lack of a diffeomorphic guarantee, both of which the proposed technique improves upon. We will add a summary of MeshMonk and its limitations in the introduction.

R1 argues that we did not evaluate the correspondence quantitatively: In 3.3 Validation of PDM, the data was in dense correspondence and we used exact distances between corresponding points (RMSE); the generalisability is thus reflective of the correspondence. In 3.2 Validation registration, we quantified the quality of the correspondence with model compactness.

R2 criticises the use of a single baseline: We appreciate the suggestion of adding LDDMM as a diffeomorphic baseline and will add ‘Deformetrica’ (a well-known LDDMM implementation) to the registration experiment.

R3 states that the compactness metric was unclear; R1 proposes to use the absolute variation instead of the percentage variation: The compactness is defined as the normalised cumulative variance of a PCA model fitted to the corresponding points output by (1) MeshMonk and (2) the CVAEs. MeshMonk has a better fit than the CVAE and thus there is more variance in its data. As such, the absolute variance would reflect the fit of the surfaces and not the ‘consistency’ of the corresponding points.
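As a reading aid for this definition, a minimal sketch of compactness as normalised cumulative variance of a PCA model fitted to corresponding points. It is an illustration of the stated definition only; the function name and array layout are assumptions, not the authors' code.

```python
import numpy as np

def compactness(corresponding_points, n_modes):
    """Normalised cumulative variance captured by the first n_modes
    principal components of a point-distribution model.

    corresponding_points: (n_subjects, 3 * n_vertices) array of shapes
    in dense correspondence, one flattened shape per row.
    Returns a fraction in [0, 1]; higher means a more compact model.
    """
    X = corresponding_points - corresponding_points.mean(axis=0)
    # Eigenvalues of the covariance are the squared singular values
    # of the centred data (up to a common 1/(n-1) factor, which
    # cancels in the normalisation below).
    s = np.linalg.svd(X, compute_uv=False)
    var = s ** 2
    return var[:n_modes].sum() / var.sum()
```

Because the metric is normalised by the total variance of each model's own output, it measures how consistently points correspond across subjects rather than how tightly the surfaces fit, which is the distinction the paragraph above draws.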

R1 requested clarification regarding incremental PCA and t-PCA: Incremental PCA (a concrete implementation of PCA, used to handle memory scaling in both the subject and feature dimensions) was applied to the velocity fields obtained by iterative SVF (the same registration model without the CVAE). We refer to this as tangent-PCA (PCA on tangent vectors of the Lie group) to avoid confusion with the PCA (point distribution model) in 3.2. We will add the objective function for iterative SVF.

R1 questions whether the “drop” in specificity is due to overfitting and whether the plateauing of the generalisation is due to model limitations: Performance on the test set (generalisability) plateaus, while the ability to generate examples that are close to the training set (specificity) improves (RMSE decreases). The model is thus becoming more descriptive of the training data while stagnating on the test data, i.e. it overfits. However, we agree with R1 that the cause of the overfitting is unclear. We will rephrase this comment in Section 3.3 Validation of PDM.

Missing references: We are thankful for the works cited by R3 and will add them to the introduction for completeness.

Reproducibility: As encouraged by the reviewers we will make the code publicly available and provide a link.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have responded to the reviewers’ comments satisfactorily and in great detail. Their explanation of incremental PCA vs. a direct tangent PCA is satisfactory. The authors state they will also include the objective function for iterative SVF, which should further improve understanding.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a network for surface registration based on SVF and various similarity measures that do not require correspondence. The method is tested on face registration. No clear medical problem that relates to the tackled data is mentioned in the manuscript, though the rebuttal mentions several use cases. There are several concerns with regard to the evaluation which remain after the rebuttal: 1) Comparisons are only with respect to MeshMonk, though the registration literature has proposed various algorithms which are appropriate for surface registration (for example in the LDDMM community with varifold or current similarity measures, or in the computer vision community Coherent Point Drift); 2) More importantly, the validation does not appear to be appropriate in the sense that no landmark errors are evaluated; instead it is based on RMSE, which is akin to evaluating a volumetric registration based only on the similarity measure rather than some more direct measure of correspondence quality.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    18



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper remains quite borderline after the rebuttal. Unfortunately, I feel that the rebuttal is not very strong. For example, there are several promises (e.g. that Deformetrica results will be added) but no effort to do so during the rebuttal period.

    I think that the face registration application is interesting, and while not obviously in scope, could be in scope with a better justification. To me, the fact that the application is different is what makes the paper potentially interesting to talk about at MICCAI, and I am recommending a conditional accept. However, the authors must make the following changes (without these, the paper should be rejected, as it is not quite up to the bar):

    • Clearly motivate why face alignment is appropriate for the MICCAI community
    • Add the promised citations, the expanded discussion about MeshMonk, and the additional baselines.
    • Produce code and discuss overfitting.
  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    11


