Authors
Madeleine K. Wyburd, Nicola K. Dinsdale, Ana I. L. Namburete, Mark Jenkinson
Abstract
Accurate topology is key when performing meaningful anatomical segmentations; however, it is often overlooked in traditional deep learning methods. In this work we propose TEDSNet: a novel segmentation method that guarantees accurate topology. Our method is built upon a continuous diffeomorphic framework, which enforces topology preservation. However, in practice, diffeomorphic fields are represented using a finite number of parameters and sampled using methods such as linear interpolation, violating the theoretical guarantees. We therefore introduce additional modifications to enforce it more strictly. Our network learns how to warp a binary prior, with the desired topological characteristics, to complete the segmentation task. We tested our method on myocardium segmentation from an open-source 2D heart dataset. TEDSNet preserved topology in 100% of cases, compared to 90% for the UNet, without sacrificing Hausdorff distance or Dice performance. Code will be made available at: www.github.com/mwyburd/TEDSNet.
Link to paper
DOI: https://doi.org/10.1007/978-3-030-87193-2_24
SharedIt: https://rdcu.be/cyhLQ
Link to the code repository
https://github.com/mwyburd/TEDSNet
Link to the dataset(s)
https://www.creatis.insa-lyon.fr/Challenge/acdc/
Reviews
Review #1
 Please describe the contribution of the paper
In learned diffeomorphic registration, theoretical topology-preservation guarantees may not hold in practice due to discrete representations, sampling, etc. The authors propose a number of small architectural changes to a common registration architecture (VoxelMorph), namely a split into a bulk and a fine field, Gaussian smoothing between integration layers, and additional upsampling of the learned fields before they are applied. Experiments were on image segmentation via deformation of a prior shape, as applied to ventricular myocardium segmentation (in which a ring prior should be warped with no gaps). The proposed network had zero images with topological errors in 100 held-out test images, vs. 10-12 errors for two baselines (UNet and vanilla VoxelMorph).
 Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Aiming to enforce diffeomorphisms more accurately is an excellent goal, and would be valuable to many in the community.

Empirical results (with excellent ablation studies) show strong improvements in topology preservation for the proposed method, compared to two sensible and widely used baselines (UNet and VoxelMorph).

Clear and well-written paper, including an excellent description of the specific contributions made and the experimental results (including setup, hyperparameter tuning, ablation studies, etc).

 Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Contributions sometimes read as a list of engineering changes, as opposed to something more theoretically grounded. I would have appreciated more description of the justifications behind each of the changes made (this was good for the proposed split into bulk/fine transformations, but less clear for the Gaussian smoothing and super-upsampling).

Changes do not provide theoretical guarantees on the resulting diffeomorphisms (to be clear, the authors are upfront about this). For example, the method does break down for ventricular myocardium segmentation when the prior shape is too thin (Fig 4b, prior radius = 10).

Experimental results show that the proposed method improves diffeomorphisms when deforming a small continuous ring (representing the ventricular myocardium). It is unclear whether the same improvements will occur for other applications, e.g., if small gaps in a discontinuous prior are to be preserved.

 Please rate the clarity and organization of this paper
Excellent
 Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
 No code made available. I’d encourage the authors to contact the VoxelMorph developers as they do integrate contributions from others, and the changes described here could be very useful for the wider community.
 Description of experimental results (part 3 of the reproducibility checklist) in particular is very well done.
 Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWERGUIDELINES.html
 Consider carefully proofreading the paper for the difference between displacement and velocity fields.
 Justification for the super-upsampling is unclear, as it should not fix any problems in the original grid but instead add interpolated vectors between them.
 Unclear why MaxPooling is used as the final downsampler when creating the final prediction Yhat. Is this closing small holes? Would this still be valid if you wanted to maintain small holes in the prior? Did you try anything else (e.g., AvgPooling)?
 Definition of the field regularisation loss (after eqn. 2): i,j should go from 1,1 to N_i, N_j.
 How did you tune the field regularisation weight parameter in VoxelMorph? Can you weight the field regularisation loss higher in order to have fewer incorrect topologies? Perhaps these parameters need to be tuned differently for VoxelMorph vs. the proposed TEDSNet (since there are two field regularisation losses in TEDSNet but only one for VoxelMorph).
 Please state your overall opinion of the paper
Probably accept (7)
 Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper provides some changes to a commonly used architecture for learned diffeomorphic registration networks that are easy to implement and appear to yield significantly better topological preservation. I believe that others could benefit from trying these changes in their own implementations. However, my enthusiasm for the paper is dampened by the lack of theoretical justifications, and by the fact that the experiments focus on only one type of topological preservation (preventing gaps, as opposed to maintaining gaps in the prior).
 What is the ranking of this paper in your review stack?
1
 Number of papers in your stack
5
 Reviewer confidence
Confident but not absolutely certain
Review #2
 Please describe the contribution of the paper
The authors propose an image registration model which warps a 2D binary mask of a ring (corresponding to a left-ventricle myocardium segmentation template) to short-axis cine-MR images with the goal of segmenting the LV myocardium. Several components are proposed to enforce a diffeomorphic transformation (specifically, to reduce the prevalence of negative Jacobians in the displacement fields and to reduce incorrect topologies after warping), including a tanh activation on predicted velocity fields, Gaussian smoothing of intermediate outputs of the scaling-and-squaring integration steps, and a 'super-upsample' where displacement fields are linearly interpolated to twice the resolution of the input image.
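For readers unfamiliar with the integration scheme being discussed, the following is a minimal NumPy/SciPy sketch of scaling-and-squaring with inter-step Gaussian smoothing of the kind the paper proposes. It is illustrative only, not the authors' implementation; all names and the `sigma` parameter are my own assumptions.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def scaling_and_squaring(velocity, n_steps=6, sigma=1.0):
    """Integrate a stationary 2D velocity field by scaling and squaring.

    velocity: array of shape (2, H, W), in pixel units.
    Smoothing the intermediate compositions (sigma > 0) is a sketch of the
    paper's proposed modification; it is not part of the standard algorithm.
    """
    phi = velocity / (2 ** n_steps)          # initial small displacement
    H, W = velocity.shape[1:]
    grid = np.mgrid[0:H, 0:W].astype(float)  # identity coordinate grid
    for _ in range(n_steps):
        # compose phi with itself: phi_new(x) = phi(x) + phi(x + phi(x))
        coords = grid + phi
        warped = np.stack([
            map_coordinates(phi[c], coords, order=1, mode='nearest')
            for c in range(2)
        ])
        phi = phi + warped
        if sigma > 0:                        # smooth the intermediate field
            phi = np.stack([gaussian_filter(phi[c], sigma) for c in range(2)])
    return phi
```

For a uniform velocity field this recovers the expected uniform displacement, while the smoothing step damps sharp local variations between compositions.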
 Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well written and the authors perform a thorough analysis of their method by comparing to a baseline segmentation model (UNet) and registration model (VoxelMorph), as well as performing an ablation study to determine the impact of different model components on metrics including Dice score, Hausdorff distance, percentage of negative Jacobian determinants (% |J| < 0), and topological errors. They also provide a quantitative assessment of the effect of the number of integration layers and the radius of the binary prior.
The authors illustrate that, with the inclusion of the proposed Gaussian smoothing of intermediate integration outputs, a registration model whose loss is driven by segmentations, which would normally produce a high % |J| < 0 (consistent with results in Balakrishnan et al., 2019 [3]), reduces % |J| < 0 to zero, as well as reducing the number of incorrect topologies at test time to zero. Additionally, the proposed super-upsample appears to provide slight improvements in terms of Hausdorff distance.
The paper is motivated as a means to segment a target geometry, in this case the LV myocardium, using a diffeomorphic registration of a template binary labelmap. The results demonstrate that TEDSNet indeed improves performance over a UNet and VoxelMorph in terms of % |J| < 0, HD, and correct topologies, while only performing slightly worse in terms of Dice score compared to the UNet.

 Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

It is promising that introducing Gaussian smoothing of intermediate integration outputs can reduce incorrect topologies and % |J| < 0, but it is unclear whether this may violate the invertibility of the transformation, which is also a theoretical guarantee of diffeomorphisms. Furthermore, by introducing Gaussian smoothing between each integration layer, would this not result in increasingly smooth fields being used to update the displacement as the number of integration steps grows? This seems to go against the idea that larger deformations can be more accurately modelled by using more (smaller) integration steps, as discussed in [2].
The authors motivate their work by the observation that guarantees of diffeomorphic formulations can break down in discrete settings, especially those involving large deformations. This is due to steps including computing displacements from velocity fields at discrete pixel locations via scaling-and-squaring, and sampling/deforming images based on these computed fields. The proposed Gaussian smoothing of the intermediate integration outputs, in combination with a commonly used deformation-gradient regulariser (L_grad), seems to greatly reduce % |J| < 0. However, this is demonstrated for a particular setting in the paper where no image loss terms are used; instead, the loss terms involve training data with only hard binary labelmaps, which have naturally sharp boundaries that are difficult to model with a diffeomorphic transformation. As shown in Tables I and IV of reference [3], using image loss terms generally results in very low % |J| < 0 (< 1% in Table I of [3]), while using only binary segmentations in the loss will naturally lead to significant increases in % |J| < 0 (~10% in Table IV of [3] for 3D brain image registration). The impact of using only binary labelmaps, and not images, on the resulting % |J| < 0 is understated in the paper. The comparison to VoxelMorph is not explained sufficiently clearly, raising questions about the results (see Major Comments).
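For context, the percentage of negative Jacobian determinants can be computed from a dense 2D displacement field with finite differences. This is a sketch of the usual definition of the metric, not the authors' code; the function name and array convention are my own.

```python
import numpy as np

def pct_negative_jacobian(disp):
    """Percentage of pixels with a negative Jacobian determinant.

    disp: displacement field of shape (2, H, W), where disp[0] is the
    displacement along axis 0 (rows) and disp[1] along axis 1 (cols).
    The deformation is phi(x) = x + disp(x); its Jacobian is
    I + grad(disp), estimated here with np.gradient (central differences).
    """
    duy_dy, duy_dx = np.gradient(disp[0])   # derivatives of row-displacement
    dux_dy, dux_dx = np.gradient(disp[1])   # derivatives of col-displacement
    det = (1 + duy_dy) * (1 + dux_dx) - duy_dx * dux_dy
    return 100.0 * np.mean(det < 0)
```

The identity displacement gives 0%, while a field that flips the image along one axis (a fold everywhere) gives 100%.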

 Please rate the clarity and organization of this paper
Very Good
 Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Images from the ACDC dataset are used. However, only '5 slices' from each image were used, presumably mid-ventricle slices where the topology is always a ring (as opposed to the base or apex, where crescent and circle topologies, respectively, are more common).

Code is not provided, making it difficult to understand the comparison with VoxelMorph, the implementation of which is not clearly explained.

 Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWERGUIDELINES.html
Minor Comments:

A brief description of, or reference for, the Betti numbers used to classify incorrect topology would help readers understand the measure.
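For reference, the first two Betti numbers of a 2D binary mask can be estimated with standard connected-component analysis. This is a hedged sketch, not necessarily the implementation used in the paper:

```python
import numpy as np
from scipy.ndimage import label

def betti_numbers_2d(mask):
    """First two Betti numbers of a 2D binary mask.

    b0 = number of connected foreground components (8-connectivity);
    b1 = number of holes = background components (4-connectivity)
         minus the single outer background.
    A correctly segmented ring (e.g. LV myocardium) has (b0, b1) = (1, 1).
    """
    mask = np.asarray(mask, dtype=bool)
    eight = np.ones((3, 3), dtype=int)       # 8-connectivity structure
    _, b0 = label(mask, structure=eight)
    # pad so the outer background forms one component
    bg = np.pad(~mask, 1, constant_values=True)
    _, n_bg = label(bg)                      # default 4-connectivity
    return b0, n_bg - 1
```

A filled disk gives (1, 0), so a segmentation whose Betti numbers deviate from the prior's (1, 1) would be flagged as topologically incorrect.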

In Table 1, (A3) has Gaussian smoothing and super-upsample but not L_grad, while (A1) has none of these, and (A1) produces about half as many test segmentations with incorrect topologies. This seems to suggest that Gaussian smoothing and/or super-upsampling without L_grad results in worse topologies than if they weren't used at all. This indicates the importance of L_grad, which is overlooked in the discussion.

The authors do not acknowledge previously published methods in which separate decoder branches are used to estimate 'bulk' and 'fine-tune' transformations, such as Stergios et al., "Linear and Deformable Image Registration with 3D Convolutional Neural Networks" (MICCAI RAMBO/BIA/TIA Workshop 2018). In that case, the 'bulk' transformation is modelled as a 12-parameter affine matrix and composed with a non-rigid displacement field to obtain the final transformation.

The authors refer to the field 'u' as a displacement field, e.g. in Section 2 Method, CNN: "bulk-displacement, u_bulk", and Section 2 Method, Diffeomorphic Layers: "... to enforce that the displacements were between ...". However, scaling-and-squaring is used to integrate a velocity field to obtain displacements. Velocity fields are not mentioned in the paper, and yet scaling-and-squaring is used. Please clarify.
Major Comments:
 The comparison to VoxelMorph is not clearly described, and the results comparing the methods are therefore difficult to interpret for the following reasons:

The proposed model TEDSNet passes only a single input to the CNN, i.e. the target image X, and deforms a template P to segment the LV myocardium. The authors perform a comparison with VoxelMorph in Table 1. VoxelMorph however requires two inputs to the CNN. How was the use of VoxelMorph formulated in this case?

The authors state in Section 2.1 that VoxelMorph “was modified for segmentations by including our activation function, Equation 1”. It is unclear how applying this activation function allows VoxelMorph to be “modified for segmentations”. As described in [3] (Balakrishnan, 2019), VoxelMorph can be used for segmentation simply by propagating the segmentation of a moving image to a fixed image. Why is the activation function in Equation 1 required?

The VoxelMorph method appears to produce a significant % |J| < 0 in Table 1 (24.5%), as well as incorrect topologies in 12/100 test cases based on Betti numbers. Without seeing the instances where VoxelMorph produces an incorrect topology, and without knowing how VoxelMorph is being used to create a segmentation in the first place, it is very difficult to assess the validity of this comparison. The VoxelMorph TMI paper [3] does demonstrate that, without an image loss term and learning registration exclusively from segmentation labels in an auxiliary loss, % |J| < 0 increases to about 10% (Table IV in [3]). It is understandable that this could increase in the current setting of large deformations, but the high number of incorrect topologies is surprising, and a visualisation of such cases would be helpful.

Gaussian smoothing is proposed for the intermediate outputs of integration as a way to reduce % |J| < 0 in the resulting displacement fields. The motivation for this is to avoid violating the theoretical guarantees of diffeomorphic transformations, which would risk topological preservation. Another important theoretical guarantee of a diffeomorphism is that it is invertible. Does the Gaussian smoothing of intermediate outputs during scaling-and-squaring allow this invertibility to be preserved? As shown in Fig. 4, with a larger number of integration steps, Gaussian smoothing is shown to worsen registration performance. Theoretically, without Gaussian smoothing, accuracy should increase with more steps [2] (the authors state, however, "it is known that increasing the number of composition layers amplifies instabilities within the discrete diffeomorphic fields [2]"; where is this explained in [2]?). This requires further elaboration.

 Please state your overall opinion of the paper
borderline accept (6)
 Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper introduces a method to greatly reduce negative Jacobian determinants in diffeomorphic displacement fields in the setting of image registration where binary labels are used to guide the loss. The paper is clearly presented and the method would be of interest to the scientific community. The comparison to VoxelMorph, however, is insufficiently described, and the fact that the registration is driven by labelmaps rather than image data needs to be made clear as a cause of the observed large percentage of negative Jacobian determinants.
 What is the ranking of this paper in your review stack?
1
 Number of papers in your stack
2
 Reviewer confidence
Confident but not absolutely certain
Review #3
 Please describe the contribution of the paper
This paper proposes a novel Topology-Enforcing Diffeomorphic Segmentation Network (TEDSNet), which is claimed to be the first deep learning technique to achieve 100% topological accuracy. The paper also combines spatial transformer networks (STN) and diffeomorphic displacement fields to complete segmentation as the primary task.
 Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
 This paper addresses a critical problem in medical image segmentation, which is to preserve the topology of the segmentations besides the pixel accuracy.
 Though the diffeomorphism method has been used in image registration [6,7], the way the authors adopt it for medical image segmentation is nice and elegant. The method section also provides some theoretical guarantee of topology preservation during image segmentation.
 Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
 From the method part, this paper seems to present a general method to learn to segment with correct topology, while the dataset used for validation in the experiment part is very specific. The ground truth of the dataset seems to have a single loop. More qualitative and quantitative results are needed to support the effectiveness of the proposed method.
 An ablation study of the weight parameter beta will be interesting and helpful to understand the method.
 I am also curious how this method generalises to other (medical) image segmentation tasks, and whether it works compared with other existing topology-preserving methods [5,9,13].
 Please rate the clarity and organization of this paper
Good
 Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Reproducible
 Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWERGUIDELINES.html
None
 Please state your overall opinion of the paper
Probably accept (7)
 Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
This paper addresses a critical problem in medical image segmentation and the method is nice and elegant.
 What is the ranking of this paper in your review stack?
2
 Number of papers in your stack
5
 Reviewer confidence
Confident but not absolutely certain
Primary Meta-Review
 Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
Summary: Proposes a number of architectural changes to VoxelMorph to better preserve topology by decomposing into a bulk and fine field, with Gaussian smoothing between layers and additional upsampling to the learned fields before they are applied. The diffeomorphic warps were applied as part of a segmentation model to assess behaviour when applied to ventricular myocardium segmentation.
Positives:
 Enforcing diffeomorphisms is a useful goal that should have many applications.
 Empirical results show strong improvements in topology preservation, compared to two sensible and widely used baselines (UNet and VoxelMorph).
 Clear and well-written, including a good description of the specific contributions and the experimental results.
Negatives:
 The work consists of a set of ad hoc changes, which are not necessarily theoretically justified.
 What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).
4
Author Feedback
We would like to thank all the reviewers for their comments and suggestions.
 VoxelMorph Implementation:
In this work, we compared TEDSNet's performance to the VoxelMorph (VM) registration framework. Unlike TEDSNet, VM requires two inputs, so for our adapted implementation of VM we used the shape prior as the second input channel. We previously integrated our novel activation function into VM's methods because, without this term, we found the segmentation performance to be extremely poor. However, this has now been remedied by better regulating the loss regularisation term, resulting in a much smaller percentage of folding Jacobians while the segmentation performance remained consistent, matching that reported in the original VM application [3].
 Theoretical justification for Gaussian smoothing and super-upsampling (R1 and MR):
Gaussian smoothing and super-upsampling are used in TEDSNet to encourage a true diffeomorphism. Both modifications limit "the numerical inaccuracies brought about by the discrete composition of two diffeomorphic fields", as stated in Section 3. Discrete sampling can cause numerical instabilities between voxels; these two additional terms therefore act as extra smoothing between neighbouring voxels, minimising any extreme differences in local deformations.
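As an illustration of the super-upsampling idea, a displacement field can be linearly interpolated to a finer grid before it is applied. This is a sketch under my own assumptions about the resampling scheme, not the authors' code:

```python
import numpy as np
from scipy.ndimage import zoom

def super_upsample(disp, factor=2):
    """Linearly interpolate a (2, H, W) displacement field to a finer grid.

    Vector magnitudes are rescaled by `factor` so displacements remain
    correct in the new (smaller) pixel units. Sampling the warp on this
    denser grid interpolates vectors between the original grid points.
    """
    up = np.stack([zoom(disp[c], factor, order=1) for c in range(2)])
    return up * factor   # pixel units shrink by `factor`
```

The prior can then be warped at the finer resolution and downsampled back to the image grid.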
 Gaussian Smoothing and Invertibility (R1 and R2):
R1 and R2 commented on the invertibility of the transformation after applying Gaussian smoothing between the composition layers. Invertibility is a theoretical guarantee of diffeomorphisms. However, in this work our main focus was topology preservation by encouraging a one-to-one mapping of our transformations, which we achieved, as shown by all-positive Jacobian determinants. In future work, the effect that Gaussian smoothing has on the invertibility of the generated fields will be fully investigated.
 Applications to other Topologies (R3):
R3 discussed whether TEDSNet was generalisable to other topologies. The theory behind TEDSNet should work across all topologies, with the only restriction being that the topology of the desired anatomy must be known. Due to page restrictions, we were limited to validating our methods on a single dataset. In future work, this will be expanded to different topologies over a range of medical datasets.
 Clarifications:
R2 questioned the statement: “it is known that increasing the number of composition layers amplifies instabilities within the discrete diffeomorphic fields”. This statement is misleading and should have read: “Although theoretically accuracy should increase with the number of composition layers, this requires performing more compositions that can each bring about small violations, due to the discrete nature of the fields.”. This statement will be clarified in the cameraready version.
In the cameraready version, a clarification on the field regularisation term; inclusion of the Betti number reference; and the distinction between velocity and displacement fields will be included or corrected.