
Authors

Haoyin Zhou, Jagadeesan Jayender

Abstract

We propose a novel stereo laparoscopy video-based non-rigid SLAM method called EMDQ-SLAM, which can incrementally reconstruct three-dimensional (3D) models of soft tissue surfaces in real-time and preserve high-resolution color textures. EMDQ-SLAM uses the expectation maximization and dual quaternion (EMDQ) algorithm combined with SURF features to track the camera motion and estimate tissue deformation between video frames. To overcome the problem of accumulative errors over time, we have integrated a g2o-based graph optimization method that combines the EMDQ mismatch removal and as-rigid-as-possible (ARAP) smoothing methods. Finally, the multi-band blending (MBB) algorithm has been used to obtain high-resolution color textures with real-time performance. Experimental results demonstrate that our method outperforms two state-of-the-art non-rigid SLAM methods: MISSLAM and DefSLAM. Quantitative evaluation shows an average error in the range of 0.8-2.2 mm for different cases.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_32

SharedIt: https://rdcu.be/cyhQu

Link to the code repository

https://github.com/haoyinzhou/EMDQ_C

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a new algorithm, called EMDQ-SLAM, for stereoscopic SLAM, suitable for laparoscopy. SLAM methods from computer vision typically do not perform well in medical applications. The proposed algorithm is demonstrated to work well on a variety of medical datasets in the abdomen. The algorithm uses SURF features for tracking, the Expectation Maximisation and Dual Quaternion (EMDQ) method to compute camera parameters and tissue deformation, graph optimisation with As-Rigid-As-Possible shape constraints to refine the shape, and multi-band blending to ensure colours are maintained.

    Overall, it’s an impressive combination of algorithms, nicely engineered and evaluated.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The major strengths appear to be algorithmic novelty leading to impressive results. Interestingly, the authors have a key insight: tracking a smaller number of keypoint nodes to reduce computational demands and obtain real-time performance.

    The algorithm appears to work well on in-vivo porcine (Hamlyn), lung, heart, and heart phantom (Hamlyn) data, which is impressive.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    None that I can see.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    As far as I can tell, the authors are not making code available yet (possibly due to anonymity constraints). Most data is publicly available. They have not promised to share their own data, presumably due to clinical constraints.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The paper was very clear, and an easy read straight through.

    Minor queries:

    1. In Fig. 2: Pictures go (a) - (d), which doesn’t match the caption.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It’s very well written, an exciting new algorithm, with good experimental work suitable for this stage of development.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes a non-rigid stereo SLAM approach for surface reconstruction in minimally invasive surgery. The approach provides dense 3D reconstruction of a surgical scene and is able to recover the 3D tissue surface under deformation. A multi-band blending method is adopted to preserve good color textures when stitching surfaces across views.
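
    To illustrate the multi-band blending idea mentioned here (a generic Laplacian-pyramid sketch, not the authors' implementation; image sizes, pyramid depth, and the seam mask below are made up for the example), low-frequency color differences are blended over a wide region while fine texture detail is preserved:

```python
import cv2
import numpy as np

def multiband_blend(img_a, img_b, mask, levels=4):
    """Blend img_a over img_b using a Laplacian pyramid ("multi-band") scheme.

    img_a, img_b: uint8 HxWx3 images of the same (power-of-two friendly) size.
    mask: float32 HxW weights in [0, 1]; 1 keeps img_a, 0 keeps img_b.
    """
    def gaussian_pyr(x):
        pyr = [x.astype(np.float32)]
        for _ in range(levels):
            pyr.append(cv2.pyrDown(pyr[-1]))
        return pyr

    def laplacian_pyr(x):
        g = gaussian_pyr(x)
        lap = [g[i] - cv2.pyrUp(g[i + 1], dstsize=(g[i].shape[1], g[i].shape[0]))
               for i in range(levels)]
        lap.append(g[levels])                       # coarsest level stays Gaussian
        return lap

    masks = gaussian_pyr(mask)
    bands = [m[..., None] * a + (1.0 - m[..., None]) * b
             for a, b, m in zip(laplacian_pyr(img_a), laplacian_pyr(img_b), masks)]
    out = bands[-1]
    for i in range(levels - 1, -1, -1):             # collapse the pyramid
        out = cv2.pyrUp(out, dstsize=(bands[i].shape[1], bands[i].shape[0])) + bands[i]
    return np.clip(out, 0, 255).astype(np.uint8)

# Toy usage: blend two random textures across a vertical seam.
a = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
b = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)
m = np.zeros((256, 256), np.float32)
m[:, :128] = 1.0
result = multiband_blend(a, b, m)
```

    OpenCV's stitching module also ships a multi-band blender, but the hand-rolled version above makes the band-wise weighting explicit.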

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The approach proposed in this paper provides a sound combination of techniques for 3D SLAM under tissue deformation. SURF features are detected to create an initial 3D point cloud based on stereo matching. An expectation maximization and dual quaternion algorithm is then used to recover the motion between frames. To decompose rigid and non-rigid motion, a least-squares fit is used to extract the rigid transformation. To mitigate error accumulation, the as-rigid-as-possible approach is used. Finally, multi-band blending is used when stitching surfaces for consistent surface visualization. The qualitative results presented in this paper demonstrate good performance of the proposed approach.
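
    For context, the "least-squares" rigid extraction mentioned above is typically a Kabsch/Umeyama fit over matched 3D points; the sketch below is a generic version of that step, not necessarily the authors' exact formulation:

```python
import numpy as np

def fit_rigid_transform(src, dst):
    """Least-squares rigid transform (R, t) minimizing ||R @ src_i + t - dst_i||^2.

    src, dst: (N, 3) arrays of matched 3D points; needs N >= 3 non-collinear points.
    """
    c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
    H = (src - c_src).T @ (dst - c_dst)              # 3x3 cross-covariance
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])  # guard against reflection
    R = Vt.T @ D @ U.T
    t = c_dst - R @ c_src
    return R, t

# Sanity check: recover a known rotation and translation from noiseless matches.
rng = np.random.default_rng(0)
R_true = np.linalg.qr(rng.normal(size=(3, 3)))[0]
if np.linalg.det(R_true) < 0:                        # force a proper rotation
    R_true[:, 0] *= -1
t_true = np.array([5.0, -2.0, 1.0])
pts = rng.normal(size=(50, 3))
R_est, t_est = fit_rigid_transform(pts, pts @ R_true.T + t_true)
assert np.allclose(R_est, R_true, atol=1e-6) and np.allclose(t_est, t_true, atol=1e-6)
```

    Such a fit needs at least three non-collinear correspondences, which is also relevant to the later question about the minimum number of matches.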

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The technical sections are missing important details, which should be added to the manuscript.

    Although the surface reconstruction/mosaic looks pretty good based on the provided figures, there is no analysis provided on the accuracy of the reconstruction.

    The comparison study presented in this paper is relatively weak. There is no direct quantitative comparison between the proposed approach and the state of the art.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Public datasets were used in this study. The code of the proposed approach is not available. Reproducing the results or re-implementing the proposed approach does not seem straightforward.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This approach starts with SURF-based sparse stereo matching. Therefore, the accuracy of the whole framework would depend on the accuracy of the SURF matching. Can the authors provide more evidence for why SURF was chosen, and how accurate it is on the surgical videos used?

    The authors use EMDQ to refine the SURF matches. It is not clear what percentage of matches survives this process, nor how sensitive the least-squares fitting is to the number of SURF matches used. What is the minimum number of matches needed to guarantee accurate recovery of the camera rotation and translation?

    It is mentioned for Equation (5) that “w_{m}” is used to handle situations where EMDQ does not distinguish inliers from outliers correctly. Can the authors provide more details on how this weight is set in practice? What values were used in the reported studies, and do they differ across datasets?

    My main concern with this paper is its comparison with the state of the art. Currently, only qualitative results are provided, and the authors claim that their results are visually more accurate. I suggest that the authors perform a more rigorous comparison so that quantitative results can be provided; visual inspection is not a preferred form of evidence in MICCAI contributions. It is understandable that some of these state-of-the-art methods may not have publicly available code. In that case, the authors could process the same datasets used by the compared methods and report the published numbers for comparison.
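
    As one generic way to obtain such numbers (assuming the reconstructed and reference point clouds have already been rigidly aligned into a common frame), per-point errors can be summarized as in the following sketch:

```python
import numpy as np
from scipy.spatial import cKDTree

def cloud_to_reference_error(recon, reference):
    """Distances from each reconstructed point to its nearest reference point.

    recon: (N, 3) reconstructed points; reference: (M, 3) reference points,
    both expressed in the same coordinate frame (e.g. after rigid alignment).
    """
    d, _ = cKDTree(reference).query(recon)
    return d.mean(), np.sqrt(np.mean(d ** 2)), d.max()   # mean, RMSE, max

# Toy example: a noisy sampling of a plane compared against a dense clean one.
rng = np.random.default_rng(1)
reference = np.c_[rng.uniform(0, 10, (5000, 2)), np.zeros(5000)]
recon = np.c_[rng.uniform(0, 10, (1000, 2)), rng.normal(0, 0.5, 1000)]
print("mean / RMSE / max (same units as the input, e.g. mm):",
      cloud_to_reference_error(recon, reference))
```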

    EM trackers have been used to provide the ground truth for camera motion. This reviewer is wondering how the EM tracker was attached to the camera. It is well known that EM sensing accuracy suffers near ferromagnetic materials; would the metallic parts of the stereo laparoscope affect the accuracy of the ground-truth data? A better tracking device for obtaining this ground truth would be an OptiTrack-like optical tracker.

    The validation section is also missing an evaluation of the accuracy of the 3D point cloud. The authors could label sparse matches on the stereo images and triangulate 3D points from a set of representative “manual” matches; the 3D errors can then be obtained by comparing those to the corresponding points in the 3D point cloud produced by the proposed approach (see the sketch below).
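
    A minimal sketch of this suggested check, assuming a rectified stereo pair with known focal length and baseline (all numbers below are made up for illustration):

```python
import numpy as np

def triangulate_rectified(uv_left, uv_right, fx, cx, cy, baseline):
    """Triangulate manually labeled matches from a rectified stereo pair.

    uv_left, uv_right: (N, 2) pixel coordinates of the same landmarks in the
    left and right images. fx, cx, cy: left-camera intrinsics in pixels
    (fy is assumed approximately equal to fx). baseline: stereo baseline,
    e.g. in mm. Returns (N, 3) points in the left-camera frame.
    """
    disparity = uv_left[:, 0] - uv_right[:, 0]   # rectified: same row, shifted column
    z = fx * baseline / disparity
    x = (uv_left[:, 0] - cx) * z / fx
    y = (uv_left[:, 1] - cy) * z / fx
    return np.stack([x, y, z], axis=1)

# Example with made-up intrinsics and two hand-labeled matches.
uv_l = np.array([[320.0, 240.0], [400.0, 260.0]])
uv_r = np.array([[300.0, 240.0], [385.0, 260.0]])
points = triangulate_rectified(uv_l, uv_r, fx=700.0, cx=320.0, cy=240.0, baseline=4.0)
print(points)   # e.g. depths of 140 mm and ~187 mm for these toy disparities
```

    The triangulated landmarks can then be compared against the reconstructed point cloud, e.g. with a nearest-neighbor query as in the earlier sketch.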

    For the runtime analysis, can the authors provide a table of runtime versus image size? It is not clear why the runtimes reported for those figures differ so much.

    Minor: in several places a reference is cited as “Ref.[x]”; please remove “Ref.” in those places. “w” is used in both Equations (5) and (6); please use different symbols for the weights in the two equations.

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This reviewer appreciates the work presented in this paper on achieving stereo SLAM under tissue deformation. However, the paper is missing details, as mentioned in my comments, and, more importantly, a rigorous quantitative comparison with the state of the art. In addition, accuracy measures for the reconstructed 3D point cloud are not provided. Overall, this paper needs significant improvement in its validation study for MICCAI acceptance.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    A depth-based SLAM method for stereo-camera laparoscopes is proposed. As in DynamicFusion, shape deformation is modeled to represent soft tissues that deform over time.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Non-rigid deformation is modeled.

    SURF-based features combined with g2o optimization seem to work well for analyzing tissue shape and deformation.

    Comparison with SOTA methods.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Error accumulation outside the field-of-view of the camera is reported by the author.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The overall clarity and organization of the paper seem to be high, although details of the implementation are not described.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I think the content of this paper is interesting enough. For better reproducibility, sharing some code or implementation details would be desirable.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The quality of the results, with sufficiently clear presentation.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper is about SLAM for stereo laparoscopy. R1 and R4 are enthusiastic, but R3 recommends rejection. Nonetheless, the arguments of R3 can all be addressed with a fairly minor revision of the paper for MICCAI. The AC recommends acceptance of the paper and encourages the authors to implement the changes suggested by R3 (for which R3 provides details) and to answer R3’s questions as well as possible in the revised paper.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

Missing contents: Due to the page limit, it is difficult to provide all details, since SLAM systems are usually complex and consist of multiple modules. For example, we select SURF with one octave layer because EMDQ obtains the displacements of the nodes by interpolating among the feature matching inliers; hence, nodes that are distant from the feature matching inliers cannot be accurately estimated. ORB feature points, however, are mainly distributed in rich-texture areas. We are aware of improved ORB methods that detect feature points uniformly across the image and have been widely used in rigid SLAM systems, but in practice we found their accuracy to be lower. This is acceptable for rigid SLAM because the rigid motion model can be estimated from a few matches; for non-rigid SLAM, however, it may reduce robustness because some image areas may have no feature inliers. SURF features are distributed uniformly across the image with high accuracy. We are not implying that SURF is the best choice, but in practice it works well.

“This method starts with SURF-based stereo matching”: this is incorrect. Stereo matching is independent of the rest of our method. Our stereo matching method processes each pair of stereo frames and outputs the depth of the image pixels, so a 3D point cloud is obtained at each time step. EMDQ-SLAM then mosaics the stereo matching results into a large mosaic. SURF is used in the mosaicking process, not in the stereo matching method.

EMDQ and SURF: Since we use adjacent video frames for tracking, the number of SURF matches is sufficient as long as the camera motion is not too fast. To maintain real-time performance, we use at most 1500 SURF feature points and 500 SURF matches; the remaining features and matches are simply discarded.
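
A minimal sketch of such a detect-and-cap step (this assumes an OpenCV build with the contrib xfeatures2d module and the nonfree SURF implementation enabled; the Hessian threshold is illustrative and not taken from the paper):

```python
import cv2
import numpy as np

# SURF lives in OpenCV's contrib "xfeatures2d" module and requires a build
# with OPENCV_ENABLE_NONFREE; the Hessian threshold here is illustrative.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=100, nOctaves=1)

def detect_capped(gray, max_keypoints=1500):
    """Detect SURF keypoints, keep the strongest ones, and describe them."""
    kp = surf.detect(gray, None)
    kp = sorted(kp, key=lambda k: k.response, reverse=True)[:max_keypoints]
    return surf.compute(gray, kp)                     # (keypoints, descriptors)

def match_capped(des1, des2, max_matches=500):
    """Brute-force match descriptors and keep the best matches by distance."""
    matches = cv2.BFMatcher(cv2.NORM_L2).match(des1, des2)
    return sorted(matches, key=lambda m: m.distance)[:max_matches]

# Toy adjacent "frames"; in practice these would be consecutive video frames.
frame1 = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
frame2 = np.roll(frame1, 5, axis=1)                   # small simulated motion
(kp1, des1), (kp2, des2) = detect_capped(frame1), detect_capped(frame2)
matches = match_capped(des1, des2)
print(len(kp1), len(kp2), len(matches))
```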

Comparisons: As mentioned in the paper, the in vivo data do not have ground truth. MISSLAM is visually much less accurate than our method, and DefSLAM does not address the 3D reconstruction problem. Hence it is not necessary to conduct more comparisons.

EM trackers: We attached the tracker to the laparoscope using tape, and in practice the setup worked well. This device has been used in our surgical navigation system for years.

Runtime: Higher image resolution leads to slower SURF detection and a larger number of template points for visualization. Due to the page limit, there is no space to show this relationship.
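
An illustrative way to tabulate this relationship, timing SURF detection alone at several image sizes (same OpenCV nonfree assumption as above; EMDQ, graph optimization, and rendering are not included):

```python
import time
import cv2
import numpy as np

# Rough timing of SURF detection alone at several image sizes; the full
# system also spends time on EMDQ, graph optimization, and texture rendering,
# which is not measured here.
surf = cv2.xfeatures2d.SURF_create(hessianThreshold=100, nOctaves=1)
base = np.random.randint(0, 256, (1080, 1920), dtype=np.uint8)
for scale in (1.0, 0.5, 0.25):
    img = cv2.resize(base, None, fx=scale, fy=scale)
    t0 = time.perf_counter()
    for _ in range(10):
        surf.detect(img, None)
    ms = (time.perf_counter() - t0) / 10 * 1000.0
    print(f"{img.shape[1]}x{img.shape[0]}: {ms:.1f} ms per detection")
```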


