
Authors

Adam Schmidt, Septimiu E. Salcudean

Abstract

Many descriptors exist that are usable in real-time and tailored for indoor and outdoor tracking and mapping, with a small subset of these being learned descriptors. In order to enable the same in deformable surgical environments without ground truth data, we propose a Real-Time Rotated descriptor, ReTRo, that can be trained in a weakly-supervised manner using stereo images. We propose a novel network that creates these fast, high-quality descriptors that have the option to be binary-valued. ReTRo is the first convolutional feature descriptor to learn a sampling pattern as part of the network, in addition to being the first real-time learned descriptor for surgery. ReTRo runs on multiple scales and has a large receptive field while only requiring small patches for input, affording it great speed. We quantify ReTRo by using it for pose estimation and tissue tracking, demonstrating its efficacy and real-time speed. ReTRo outperforms classical descriptors used in surgery and it will enable surgical tracking and mapping frameworks.



Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_27

SharedIt: https://rdcu.be/cyhQp

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents a novel convolutional descriptor for feature tracking from live stereo streams. This approach is benchmarked against existing ones and outperforms them.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Well written, with a clear presentation of the wealth of methods involved; well evaluated and convincing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    did not find any

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The answers seem OK and would allow one - with quite an effort, though - to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    -

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    see above

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper presents a convolutional descriptor for surgical environments.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The motivation to learn a rotation-invariant descriptor is sound.
    • The application scenario is important.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper is not well organized, in particular, the contribution is not well explained.
    • The method is built upon existing methods without careful explanation.
    • The experimental results are not well explained.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No data, but I think the paper can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This paper presents a convolutional descriptor for surgical environments. The application scenario is good, but the paper does not show contributions that support the acceptance of this work.

    First, the introduction is not well written. The claim that “We propose ReTRo, the first surgical descriptor that runs in real-time and learns in non-rigid environments” is not well justified. How do “surgical” descriptors clearly differ from generalized image descriptors? Then, the next part (first, second, third) seems to claim the contributions of this paper, but the contributions are not solid. This paper is built upon existing (and old) methods.

    Second, the experiments cannot justify the desired conclusions. There is a single dataset, and the comparison methods seem weird. For example, CAPS produces results only comparable to SIFT’s, which may imply that the experimental settings are not well done.

    Some designs of this work are also weird and not well explained. For example, as a deep-learning-based descriptor, ReTRo is built upon FAST, a detector proposed 15 years ago. Is there a newer detector that could improve performance?

    Overall, I think the quality of this paper does not achieve the standard of MICCAI.

  • Please state your overall opinion of the paper

    reject (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Please see Q7.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    2

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper proposes a convolutional descriptor for stereo matching in surgical environments. This descriptor is real-time and invariant to feature rotation. It performs similarly to CAPS while containing fewer parameters, and is thus more efficient and faster.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method is sound and novel, demonstrating good performance on surgical data.

    1. The idea of learning an angle image and a sparse sampling pattern within a neural network is novel
    2. Training the network in a sparse way is efficient
    3. The validation is solid, showing the potential to be used in clinical environments.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The description of the network structure needs to be improved to provide more details for readers to follow and reproduce
    2. Explanations of some terminology and notation are missing
    3. Not enough discussion regarding the clinical feasibility, although some validation work has been carried out on surgical data.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The readers will need more details of the network structure to reproduce this work

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In general, this is a well-organized paper with a clear structure. The weakness lies in the lack of detail on some steps and the limited discussion. I hope the authors can clarify the following points:

    1. This network can learn sampling patterns – If this means “this network can learn to detect keypoints”, the authors need to make this statement clearer.
    2. In terms of clinical feasibility, is the 10-degree error threshold, which was used to calculate accuracy on the Hamlyn dataset, good enough for 3D reconstruction during surgery?
    3. Why do ReTRoM and ReTRoC outperform ReTRo in Fig. 5?
    4. Explanations of some terminology and notation need to be provided (e.g. what is ReTRoM/C/B? What is Rmax?)
    5. More discussion regarding the clinical feasibility will be appreciated.
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a novel and sound method and outperforms other classic methods. Compared with other DL-based methods, it also has advantages in inference speed and memory usage.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    2

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper received quite divergent opinions. R2 and R3 raised some concrete questions and issues. In particular, R2 pointed out that the unique requirements and challenges for designing “surgical descriptors” are unclear: how do “surgical” descriptors differ from generalized image descriptors? In addition, the experimental results are not well analyzed and cannot justify the desired conclusions, and some designs of the method are not well explained. R3 mentioned that the paper lacks discussion of clinical feasibility. The AC would like to give the authors a chance to respond to these issues.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

R1 and R3 are supportive. We disagree with R2; our rebuttal is below. The clarifications provided to R3 will be added to the paper.

R2: concern about the need for surgical image descriptors. Prior learning-based descriptors are based on outdoor images. Endoscopic surgical images are very different: they stem from deformable tissue, have more specular reflections, and are strongly affected by illumination. A major difference is the limited availability of ground truth, so we cannot use the supervised learning methods employed by most convolutional descriptors. Some non-surgical descriptors that learn in a weakly-supervised manner exist, e.g. CAPS [31], but none are suitable for real-time tracking. Our descriptor, ReTRo, is unique: it is real-time, requires no ground truth, and is trained for a different domain than that addressed in prior work.

R2: concern that our contributions are not solid and are built upon old methods. Our network is inspired by old methods - hence the name - but uses a novel framework and the latest techniques. The novelty of our method is three-fold: (i) We present the first descriptor network that actually learns a sampling pattern like that of ORB [26]. Unlike ORB, we use modern machine learning and do not require greedy optimization or ground truth. This enables us to sample a learned, rotated pattern around each detected keypoint, providing a large receptive field at low computational cost. This efficiency is not achievable by simply stacking convolutional layers. (ii) We are the first to apply pose and epipolar supervision in a sparse manner. In contrast, prior descriptors such as CAPS [31] and DISK [30] run over full images. (iii) We evaluate the descriptor network on two datasets of surgical images, and provide a comprehensive comparison with well-established methods and the state of the art.

R2: we only use one dataset; the experiments do not justify ReTRo’s benefit. We disagree. We used two datasets: Hamlyn (with a split for training and testing) and SuPeR [17] (testing only). For comparison, the SuPeR paper uses only its own dataset for tissue tracking analysis, while we train on one dataset (Hamlyn) and test on two (Hamlyn and SuPeR). For demonstrating the usability of our descriptor, the annotated point tracking data from the SuPeR dataset (Fig. 3) is the most important, as it directly quantifies tracking. Only CAPS achieves performance similar to ReTRo, but it is not real-time, sparse, or low-memory.

R2: why use FAST [25], as it is an older detector? Many modern descriptors still use classical detectors; CAPS [31], for example, uses SIFT’s detected keypoints. We chose FAST because we desire a real-time detector that does not require ground truth.

R3 (6.1): ‘This network can learn sampling patterns - if this means “this network can learn to detect keypoints”, the authors need to make this statement clearer.’ Our network learns to describe keypoints, but does not detect them; it learns sampling patterns around a detected keypoint to gather descriptive features.

R3 (6.2, 6.5): clinical feasibility. For surgical perception, SuPer Deep [20] concludes there is a need for lightweight models and learnable tissue trackers; we believe ReTRo helps meet this need. Our future work will construct specialized phantoms and demonstrate how well mean Average Accuracy (from the image matching literature [13]) and tracking error (from the surgical tissue tracking literature) relate to 3D surface reconstruction quality. For the clinical feasibility of tracked annotation, we can infer that ReTRo is on par with CAPS, with most errors being under 1% of the image size (Fig. 3).

R3 (6.3): why do ReTRoM and ReTRoC outperform ReTRo in Fig. 5? Forward-backward error is affected by image smoothness. CAPS [31], ReTRoC at quarter scale, and ReTRoM at half scale likely perform better than ReTRo at full scale in this scenario because at lower resolution they are less affected by high-frequency noise.
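
The rebuttal's core idea - sampling a rotated pattern of offsets around a detected keypoint to gather descriptive features - can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed `PATTERN` offsets, the function name, and the nearest-neighbour lookup are all placeholder assumptions (in ReTRo the pattern is learned and the descriptor is produced by a network).

```python
import math

# Illustrative toy pattern: four pixel offsets around a keypoint.
# In ReTRo these offsets would be learned; here they are fixed placeholders.
PATTERN = [(4.0, 0.0), (0.0, 4.0), (-4.0, 0.0), (0.0, -4.0)]

def sample_rotated(img, x, y, angle):
    """Rotate PATTERN by `angle` (radians) around keypoint (x, y) and
    gather nearest-neighbour intensities from `img` (a 2-D list)."""
    c, s = math.cos(angle), math.sin(angle)
    samples = []
    for dx, dy in PATTERN:
        rx = c * dx - s * dy  # standard 2-D rotation of the offset
        ry = s * dx + c * dy
        samples.append(img[round(y + ry)][round(x + rx)])
    return samples
```

Rotating the pattern by the keypoint's estimated orientation is what makes the gathered samples (approximately) rotation-invariant: rotating the image and the pattern together permutes nothing, so the same physical points are sampled.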




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have clarified the major concerns raised by R2 in their rebuttal. The descriptor is based on the latest techniques and is suitable for real-time surgical applications. The proposed method addresses unique challenges of endoscopic surgical images, i.e., more specular reflections and susceptibility to illumination changes. The descriptor is validated on two datasets to demonstrate its effectiveness.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have done a good job rebutting the negative review (R2’s). The other two reviews were already positive. The paper can be accepted for MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    It’s hard for this AC to find the clinical significance of this work. The proposed method works worse than the currently available methods. It is not a good idea to use pixels to describe the error.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8


