
Authors

Xinrui Song, Hengtao Guo, Xuanang Xu, Hanqing Chao, Sheng Xu, Baris Turkbey, Bradford J. Wood, Ge Wang, Pingkun Yan

Abstract

Prostate cancer biopsy benefits from accurate fusion of transrectal ultrasound (TRUS) and magnetic resonance (MR) images. In the past few years, convolutional neural networks (CNNs) have proved powerful in extracting image features crucial for image registration. However, challenging applications and recent advances in computer vision suggest that CNNs are quite limited in their ability to understand spatial correspondence between features, a task in which the self-attention mechanism excels. This paper aims to develop a self-attention mechanism specifically for cross-modal image registration. Our proposed cross-modal attention block effectively maps each feature in one volume to all features in the corresponding volume. Our experimental results demonstrate that a CNN network designed with the cross-modal attention block embedded outperforms an advanced CNN network ten times its size. We also incorporated visualization techniques to improve the interpretability of our network.
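For concreteness, below is a minimal PyTorch sketch of a non-local cross-modal attention block of the kind the abstract describes, where every feature in one volume attends to all features of the other volume. This is not the authors' implementation (see the linked code repository for that); the 32-channel feature volumes follow Reviewer #1's note that P \in \mathbb{R}^{LWH \times 32}, and all other dimensions are assumptions.

```python
# Minimal sketch of a non-local cross-modal attention block in PyTorch.
# NOT the authors' exact implementation (see the linked repository);
# channel sizes here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossModalAttention(nn.Module):
    def __init__(self, in_channels: int = 32, key_channels: int = 16):
        super().__init__()
        # 1x1x1 convolutions project both feature volumes to a common space.
        self.theta = nn.Conv3d(in_channels, key_channels, kernel_size=1)  # primary
        self.phi = nn.Conv3d(in_channels, key_channels, kernel_size=1)    # cross
        self.g = nn.Conv3d(in_channels, in_channels, kernel_size=1)       # values

    def forward(self, primary: torch.Tensor, cross: torch.Tensor) -> torch.Tensor:
        # primary, cross: (B, C, L, W, H) feature volumes from the two modalities.
        b, c, l, w, h = primary.shape
        q = self.theta(primary).flatten(2).transpose(1, 2)  # (B, N, key)
        k = self.phi(cross).flatten(2)                      # (B, key, N)
        v = self.g(cross).flatten(2).transpose(1, 2)        # (B, N, C)
        # Each primary feature attends to ALL features of the other volume.
        attn = F.softmax(q @ k, dim=-1)                     # (B, N, N)
        return (attn @ v).transpose(1, 2).reshape(b, c, l, w, h)

# Usage: given (B, 32, L, W, H) CNN feature volumes mr_feat and us_feat,
# us_attended = CrossModalAttention()(us_feat, mr_feat)
```

Note that the attention matrix is N x N in the number of voxels, so in practice such a block operates on downsampled feature volumes.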

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_7

SharedIt: https://rdcu.be/cyhPO

Link to the code repository

https://github.com/DIAL-RPI/Attention-Reg

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper is aimed at developing a self-attention method for multi-modal rigid image registration; promising experiments on transrectal ultrasound and MR images are provided.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • The paper is well organized.
    • The aim of the work is clear, with clear applications.
    • The proposed attention mechanism for the task of medical image registration seems novel.
    • Every step in the registration pipeline is well justified; the explanations are clear, and the motivation behind every decision is adequately explained.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    My major concern is that registration is only constrained to rigid transformations. It is not clear to me that simpler rigid registration methodologies cannot be used in this context.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that the work is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    1-Re: Line in first paragraph under Introduction: “Feature-based methods compute the similarity between images by representing image appearances using features.” Isn’t the idea of extracting features aimed at simplifying the alignment process? Are you implying that image features in MR images that are used to guide the registration are not present in both US and MR images, i.e., modality-dependent? If so, could you provide an example demonstrating the fact?

    2-Please define how the surface registration error (page 6) is calculated.

    3-More visual results could have been provided that demonstrate the good results as well as possible pitfalls and the reasons for such pitfalls.

    4. Minor comments:
       a. Page 2: change “In out experiments…” to “In our experiments…”
       b. Page 2, last paragraph: change “spacial” to “spatial”
       c. Page 3: P \in \mathbb{R}^{LWH \times 32}, and similarly for C
       d. Page 3, 2nd paragraph: i(\cdot)
       e. Page 6: change “pertubate” to “perturbed”

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a well-written paper that covers an interesting application of cross-modal attention. It is only aimed at rigid registration, and I am not fully convinced that this is appropriate for the task.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The authors proposed a method for learning-based multi-modal MRI-TRUS image registration, which employs a non-local self-attention mechanism to establish the spatial correspondences that drive the registration process. Evaluation on an independent set of 68 MRI-TRUS volume pairs showed a reduction of the surface registration error in comparison to several state-of-the-art methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • manuscript is well structured and easy to follow
    • the proposed cross-modal attention block seems novel
    • quantitative comparison to relevant state-of-the-art approaches
    • significantly improved results over the state-of-the-art
    • without modifications, the method has great potential to solve other multi-modal registration problems
    • strong validation, based on 528/66/68 train/validation/test cases of MRI-TRUS volume pairs
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • some recent and relevant state-of-the-art methods were not covered in the background review and/or not tested (although their code is available)
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Data and code do not seem to be publicly available. However, the method description and graphical presentation in the manuscript seem detailed enough for the method to be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • VoxelMorph (ref 1) is often used in other publications as a reference deep learning-based image registration method (also because its code is available and well documented); demonstrating its performance here would help readers better understand the performance gains of the novel method, also for other potential applications

    • the PDD nets and their variants, as presented in the previous two MICCAI editions, seem to be relevant state-of-the-art and could be considered (note that code is available for both methods), e.g. see:

    Heinrich, MICCAI 2019, “Closing the Gap between Deep and Conventional Image Registration using Probabilistic Dense Displacement Networks”
    Heinrich and Hansen, MICCAI 2020, “Highly accurate and memory efficient unsupervised learning-based discrete CT registration using 2.5D displacement search”

    • references 6 and 7 are the same
  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a very solid submission with novel methodological contribution and the potential to be applied for general (multi-modal) image registration. Validation seems sufficient and the results clearly show the advantage of the method.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper presents a self-attention mechanism that maps features across two feature sets for multimodal image registration. The cross-modal attention block and the subsequent (rigid) registration module are clearly explained. The method is applied to a dataset of 650+ cases of MR and TRUS images of the prostate. Results are compared to two iterative registration methods and two CNN-based registration networks. The proposed solution significantly outperforms all other methods in terms of residual errors, with only 1/10 the number of parameters and a reduced run time.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the cross-modal attention mechanism is novel
    • the overall network design is smart and very well explained
    • experiments include objective comparison with existing methods
    • evaluating the registration network without the attention block clearly demonstrates the benefit of the self-attention mechanism contribution
    • results are very good, with a significant reduction of the network size and computation time
    • very good literature review of deep learning-based image registration
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Figure 4 is difficult to read and understand.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Since the method is very well described, it should be easy to reimplement. In addition, the experiments and the application of state-of-the-art methods to the dataset are clearly detailed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Running the network with the prostate segmentation instead of the MR image is an interesting add-on and an opening towards other applications. Results are slightly better (presumably without significance), which makes sense since the features are easier to capture on a mask, although at the price of requiring a segmentation.
    • Performance is measured with the surface registration error (SRE). It could be interesting to add other metrics, like the Hausdorff distance, to better evaluate the maximal errors and robustness (see the sketch after this list).
    • How are the ground truth rigid transforms computed and validated in your dataset?
    • “the performance of our network with segmentation label as input was consistently better, with significantly reduced SRE when compared to MSReg (p <0.001).” The significance was only achieved with the label (not image) Attention-Reg and after Stage 2? This could be clarified.
    • Fig. 3: should the “flatten” arrow start from the end of the deep registration block?
    • Fig. 4 is really difficult to read; there is an error in the caption (“Attention-Reg (label)” in the bottom row?). Feature maps are difficult to interpret here. It could be interesting to overlay the prostate contour before/after registration, next to the feature maps, to better appreciate the registration result.
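Since Reviewer #1 asks how the surface registration error is calculated and Reviewer #3 suggests adding the Hausdorff distance, here is a hedged NumPy sketch of both metrics. The SRE definition below (mean distance between corresponding prostate-surface points under the estimated vs. ground-truth rigid transforms) is a common convention, not necessarily the paper's exact formula.

```python
# Hedged sketch of surface-distance metrics: one plausible reading of SRE,
# plus the Hausdorff distance the review suggests. How the paper actually
# computes SRE is exactly what Reviewer #1 asks about, so treat these
# definitions as assumptions, not the authors' method.
import numpy as np

def apply_rigid(points: np.ndarray, R: np.ndarray, t: np.ndarray) -> np.ndarray:
    """Apply a rigid transform (3x3 rotation R, translation t) to (N, 3) points."""
    return points @ R.T + t

def surface_registration_error(surface, R_est, t_est, R_gt, t_gt) -> float:
    """Mean distance between corresponding surface points under the
    estimated and ground-truth rigid transforms (one common SRE definition)."""
    d = apply_rigid(surface, R_est, t_est) - apply_rigid(surface, R_gt, t_gt)
    return float(np.linalg.norm(d, axis=1).mean())

def hausdorff(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Hausdorff distance between point sets of shape (N, 3), (M, 3);
    captures the maximal error rather than the average."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)  # (N, M) pairwise
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```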
  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is interesting with some novelty, very well described and justified, and results are significantly improved. Very clear paper.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All reviewers commented on the important positive aspects of this work: validation on a large dataset, good description of the novel cross-modal attention modules, elegant compact network design, promise of public code release, and fair comparison. The minor critical points to further improve the presentation, or to discuss the potential for extension to non-linear transformations, should be carefully addressed in a final version, but my recommendation is accept w/o rebuttal. I would just add another small issue that should be considered before publication: to my understanding, all images are manually pre-aligned and synthetic transformations are used for training/validation and testing. This might introduce a bias for learning-based methods, which may benefit from detecting specific boundaries of image interpolation, and leads to an idealised training scenario (where all augmented transformations are centred around the true one).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

We highly appreciate the reviewers and AC for their very positive opinions and constructive comments on our work. We would like to clarify a few points in response to the reviewers’ concerns.

The reviewers commented on the limitations of rigid/linear transformation. This is a valuable point. Our method is designed for multi-modal registration. The field of deformable multi-modal registration using deep learning suffers from the common limitation of lacking annotated deformable transformations for training and validation. VoxelMorph [2] focuses on mono-modal registration, although its authors suggested the feasibility of using mutual information (MI) to supervise a multi-modal version of VoxelMorph. However, the available literature and our prior research indicate that MI is incapable of evaluating the quality of MR-TRUS alignment. Other established deformable multi-modal registration methods, e.g. [1], require multiple segmentations per image for weakly supervised training, but we do not yet have access to data of such quality. We completely agree with the reviewers that deformable registration can be very meaningful for our targeted applications. Our team plans to expand our data annotation in future work to enable deep learning-based multi-modal deformable registration.

The AC raised a great question about our data augmentation method and whether it induces an idealized training scenario. We carefully revisited our method and conclude that our augmentation does not lead to shortcut learning, since the synthetic samples are uniformly distributed throughout the solution space. Furthermore, although the synthetic registrations center around the ground truths, every synthetic data point also contributes to the testing and validation errors. If the trained model were biased towards the center of the solution space, this would have shown up in the testing phase, since many synthetic transformations are far away from the center. Additionally, the augmented images were resampled from a sub-region of the original images. Therefore, for most cases, the augmented images are filled with image content without zero-padding, which prevents the “bias for learning based methods that may benefit from detecting specific boundaries of image interpolation”. We have added more details about the image preprocessing in the final version.
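As an illustration of the augmentation scheme defended above (rigid perturbations sampled uniformly around the ground-truth alignment), a small sketch follows; the rotation and translation ranges are placeholders, not the paper's actual values.

```python
# Sketch of the kind of augmentation the feedback describes: synthetic rigid
# perturbations drawn uniformly around the ground-truth alignment. The
# parameter ranges below are illustrative placeholders only.
import numpy as np
from scipy.spatial.transform import Rotation

def sample_rigid_perturbation(max_rot_deg: float = 15.0,
                              max_trans_mm: float = 10.0):
    """Draw rotation angles and translations uniformly over the solution
    space, so training/validation/test offsets are not biased toward zero."""
    angles = np.random.uniform(-max_rot_deg, max_rot_deg, size=3)
    t = np.random.uniform(-max_trans_mm, max_trans_mm, size=3)
    R = Rotation.from_euler("xyz", angles, degrees=True).as_matrix()
    return R, t
```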

Lastly, we have made minor revisions in the final version to clarify some details, and we will also add the GitHub link to our source code there.

References
[1] Hu et al., “Weakly-supervised convolutional neural networks for multimodal image registration.” Medical Image Analysis 49 (2018): 1-13.
[2] Balakrishnan et al., “VoxelMorph: a learning framework for deformable medical image registration.” IEEE TMI 38 (2019): 1788-1800.


