Authors

Baoru Huang, Jian-Qing Zheng, Anh Nguyen, David Tuch, Kunal Vyas, Stamatia Giannarou, Daniel S. Elson

Abstract

Dense depth estimation and 3D reconstruction of a surgical scene are crucial steps in computer assisted surgery. Recent work has shown that depth estimation from a stereo image pair can be solved with convolutional neural networks. However, most recent depth estimation models were trained on datasets with per-pixel ground truth. Such data is especially rare for laparoscopic imaging, making it hard to apply supervised depth estimation to real surgical applications. To overcome this limitation, we propose SADepth, a new self-supervised depth estimation method based on Generative Adversarial Networks. It consists of an encoder-decoder generator and a discriminator to incorporate geometry constraints during training. Multi-scale outputs from the generator help to solve the local minima caused by the photometric reprojection loss, while the adversarial learning improves the framework generation quality. Extensive experiments on two public datasets show that SADepth outperforms recent state-of-the-art unsupervised methods by a large margin, and reduces the gap between supervised and unsupervised depth estimation in laparoscopic images.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_22

SharedIt: https://rdcu.be/cyhQf

Link to the code repository

N/A

Link to the dataset(s)

http://hamlyn.doc.ic.ac.uk/vision/

https://endovissub2019-scared.grand-challenge.org/Home/

Reviews

Review #1

Please describe the contribution of the paper

The authors proposed a novel method for self-supervised adversarial depth estimation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors proposed a self-supervised adversarial depth estimation method. They also applied a disparity smoothness loss and built the network across multiple scales. Two public endoscopic datasets were used to demonstrate the effectiveness of the proposed algorithm.
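For readers unfamiliar with the disparity smoothness loss mentioned above, the following is a minimal NumPy sketch of one common edge-aware formulation, averaged over several scales. The paper's exact formulation and weighting are not given on this page, so the edge-aware weighting, the function names, and the plain averaging across scales are all assumptions made for illustration.

```python
import numpy as np

def edge_aware_smoothness(disp, image):
    """Penalise disparity gradients, down-weighted where the image has edges.

    Assumption: grayscale `image` and `disp` are 2D arrays of the same shape.
    This edge-aware form is a common choice, not confirmed as the paper's.
    """
    # Absolute horizontal and vertical gradients of the disparity map.
    dx_d = np.abs(disp[:, 1:] - disp[:, :-1])
    dy_d = np.abs(disp[1:, :] - disp[:-1, :])
    # Absolute image gradients, used to relax the penalty at intensity edges.
    dx_i = np.abs(image[:, 1:] - image[:, :-1])
    dy_i = np.abs(image[1:, :] - image[:-1, :])
    return float(np.mean(dx_d * np.exp(-dx_i)) + np.mean(dy_d * np.exp(-dy_i)))

def multiscale_smoothness(disps, images):
    """Average the smoothness term over a list of (disparity, image) scales.

    Assumption: equal weight per scale (hypothetical choice).
    """
    terms = [edge_aware_smoothness(d, i) for d, i in zip(disps, images)]
    return sum(terms) / len(terms)
```

A constant disparity map gives zero loss, while a disparity ramp over a textureless image is penalised in full; the same ramp aligned with strong image edges is penalised far less.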
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The evaluation metrics for the two datasets are different. Why?
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

No code or data is available, which makes the paper hard to reproduce.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The novelty of the paper is good.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty of the proposed algorithm for depth estimation is good. The experiment results based on two datasets are better than others.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

This paper proposed an end-to-end self-supervised network for depth estimation in laparoscopic images.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. They use a re-projective sampler to reconstruct stereo images for self-supervised and adversarial learning.
2. Two datasets have been used to evaluate the proposed network.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. In the evaluation, they should also compare their results with non-learning-based methods. For depth estimation from stereo images, many non-learning-based methods perform well.
2. The two comparison methods described in [8] and [9] are both for depth estimation from a monocular camera. As we know, depth estimation from monocular images is much more challenging than from stereo images. These comparisons were not fair. Meanwhile, the proposed network only works for depth estimation from stereo laparoscopic images.
3. The qualitative results are missing, which makes the evaluation results unimpressive. Without the intermediate results, it’s difficult for readers to understand how the network works and how well it works.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

This paper is not reproducible as they won’t provide code.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. In the evaluation, they should also compare their results with non-learning-based methods. For depth estimation from stereo images, many non-learning-based methods perform well.
2. The two comparison methods described in [8] and [9] are both for depth estimation from a monocular camera. As we know, depth estimation from monocular images is much more challenging than from stereo images. These comparisons were not fair. Meanwhile, the proposed network only works for depth estimation from stereo laparoscopic images.
3. The qualitative results are missing, which makes the evaluation results unimpressive. Without the intermediate results, it’s difficult for readers to understand how the network works and how well it works.
4. Is it possible to create ground truth for this task by mounting other scanning devices onto the laparoscope and calibrating them?
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. Comparisons with non-learning methods were not performed.
2. The comparison methods in [8] and [9] are for monocular camera.
3. Qualitative evaluation results are missing
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

This paper presents an unsupervised depth estimation algorithm using generative adversarial networks (GANs). The generator is trained to estimate disparity maps between the input image pairs, which is then used to estimate depth. Depth estimates are projected onto the image planes and the discriminator tries to determine whether the image generated is the original input image or that generated by projecting estimated depth onto the image plane. Results show improvements over other unsupervised depth estimation methods.
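The disparity-to-depth step and the reprojection step summarised above can be sketched as follows. This is a minimal NumPy illustration that assumes a rectified stereo pair, grayscale images, and the standard pinhole relation depth = f * B / d; the function names and the linear-interpolation sampler are hypothetical and are not taken from the paper.

```python
import numpy as np

def disparity_to_depth(disparity, focal_length_px, baseline, eps=1e-6):
    """Convert disparity (pixels) to depth via depth = f * B / d.

    Assumption: rectified stereo; `baseline` sets the unit of the output.
    """
    return focal_length_px * baseline / np.maximum(disparity, eps)

def warp_right_to_left(right, disparity):
    """Reconstruct the left view by sampling the right image at x - d(x).

    Assumption: 2D grayscale `right` and left-view `disparity` of equal shape.
    Sampling uses linear interpolation along each row; positions that fall
    outside the image are clamped to the border.
    """
    h, w = disparity.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)
    x1 = np.minimum(x0 + 1, w - 1)
    frac = xs - x0
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * right[rows, x0] + frac * right[rows, x1]
```

In a self-supervised setup of this kind, the reconstructed left view is what gets compared against the real left image, both by a photometric loss and, in the adversarial setting described here, by the discriminator.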
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Reasoning behind the type of neural network and use of the different loss functions is intuitively explained and results are compared against several other methods.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper was fairly hard to read and follow. Once the reader is through the methods section, the reasoning makes sense. However, getting through the methods section was tough even though the actual technical method, from my understanding, is not too complicated. The methods section would benefit from a more straightforward explanation of steps.

The main weakness of the paper, however, is the lack of network details. While authors do include several parameter settings in the main paper, details about the network architecture are left to supplementary material, which unfortunately is not published along with the main paper.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Due to the network architecture details being left to supplementary materials, the method as it is explained in the paper alone may be hard to reproduce.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

One thing that would really help in clarity is to explain the method step by step as the data travels through the network. Currently, the methods section seems overly complicated.

While the number of methods compared against is definitely a positive, there are a few methods that are referenced but not compared against. Is there a reason why?

Finally, although the large number of quantitative comparisons is great, it would be nice to see at least one qualitative result. How does an SSIM of ~80 translate in terms of depth estimates?
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My recommendation is based mostly on the lack of network details in the main paper. The clarity of the methods is also a contributor although less important than network details. If the authors can include network details in the paper and better organize their methods section, this could be bumped up to borderline accept.
What is the ranking of this paper in your review stack?

4
Number of papers in your stack

6
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The reviewers agreed that the paper has some merits. However, two reviewers pointed out problems in the experiment section, e.g. the evaluation metrics for the two datasets are different, the comparison is unfair, etc. In addition, R3 finds that the description of the method, in particular the network details, is missing. Please clarify these points in the rebuttal letter.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

Author Feedback

We would like to thank all of the reviewers for their constructive comments. We have addressed all the points raised, including: i) releasing our source code, ii) comparison with non-learning methods, iii) clarification of stereo laparoscope settings, iv) adding qualitative results, and v) reorganizing the Methodology section and providing more network details in the manuscript.

Reviewer 1

Evaluation metrics: The evaluation metrics for the two datasets are different because SCARED provides per-pixel depth ground truth, while dVPN does not. Therefore, for the SCARED dataset we use the mean absolute error to compare the predicted depth map with the ground truth depth image. For the dVPN dataset, we use the SSIM metric to compare the reconstructed image with the original image, which is standard practice in unsupervised depth estimation.
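The two evaluation modes described in this response can be illustrated with a short NumPy sketch: a mean absolute error over valid ground-truth pixels, and an SSIM score between a reconstructed image and the original when no ground truth exists. The validity mask (ground truth finite and positive) and the single-window SSIM computed from global image statistics are simplifying assumptions; the usual SSIM averages the same formula over local sliding windows, and the authors' exact evaluation code is not shown on this page.

```python
import numpy as np

def masked_mae(pred_depth, gt_depth):
    """Mean absolute depth error over pixels that have ground truth.

    Assumption: missing ground truth is encoded as 0, NaN, or inf.
    """
    valid = np.isfinite(gt_depth) & (gt_depth > 0)
    return float(np.mean(np.abs(pred_depth[valid] - gt_depth[valid])))

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM from whole-image statistics (a simplification).

    Uses the standard constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2,
    where L is the dynamic range of the pixel values.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    num = (2.0 * mu_x * mu_y + c1) * (2.0 * cov + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return float(num / den)
```

The first function needs a ground-truth depth map, while the second needs only the input image and its reconstruction, which is why the two datasets are scored differently.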

Dataset and code: Both SCARED and dVPN datasets are available online. We will also release our code.

Reviewer 2

Comparison to non-learning methods: In Table 1, we did compare our framework with two non-learning methods, namely ELAS and SPS. Following this suggestion, we have also reproduced the results of ELAS and SPS in Table 2.

Comparison to monocular methods: In [8] and [9], an extension of the proposed monocular reconstruction framework was proposed for stereo reconstruction, with publicly available code. In our work, for fair comparison, we compared our framework to the stereo implementation of [8] and [9].

Qualitative results are missing: This was mainly due to page length limitations but we have created qualitative results and we have added them to the main paper by compressing the references to save space.

Generate ground truth data by mounting other scanning devices: This is a good suggestion but challenging to achieve in practice. We — and other research groups — have previously used commercial RGB-D cameras to attempt this, primarily in a laboratory or pre-clinical setting. However, the working distance of the laparoscope is shorter than most depth cameras, resulting in sparse and noisy depth maps. Another strategy is to build a system with a structured lighting device that can project an encoded pattern onto the tissue but it is difficult to generate significant amounts of accurate data in endoscopic applications. We continue to work on this problem including designing new hardware and exploring more efficient algorithms to overcome some of these limitations.

Reviewer 3

Clarity of method explanation: We have restructured the Methodology section in the main paper and present a step-by-step explanation of how the data travels through the network. A more straightforward description of the method has been given to show network details.

Compared methods: We compared our method with recent methods that satisfy the following conditions: i) they used stereo datasets; ii) they provided stereo training settings; iii) they had executable code released. We referenced other papers where the ideas were valuable to this work, even where a direct data comparison was not possible based on these criteria.

Qualitative results and SSIM score comparison to depth accuracy: Please refer to our response to Reviewer 2 question 3. We have generated and visualized the reconstructed image, depth image, and input image side-by-side, together with the associated SSIM score. These are separate metrics that can be used to understand the model performance, and the qualitative results now complement the quantitative ones to help illustrate the relationship.

More network details: We accept this critique and have now improved the description of the network and clarified the approach in the methods section. We have also moved the network figure from supplementary materials to the main body to improve the explanation of the method.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have explained the reason for using different metrics to evaluate the two datasets: SCARED provides per-pixel depth ground truth, while dVPN does not. The authors also clarified that they did compare their method with non-deep-learning methods as requested by R2 and explained the reasons for not generating ground truth data by mounting other scanning devices. Regarding the qualitative results requested by R2 and R3, the authors would add some in the revised version if space allows. In general, the AC is satisfied with the authors’ responses, as the major concerns have been addressed and clarified in this rebuttal letter.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

8

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The initial reviews pointed out some weaknesses of the paper, including comparisons to previous work and clarity. The authors used the rebuttal to address these criticisms in a reasonable way. The task at hand, 3D reconstruction in laparoscopy, is still a difficult and unresolved task. Even if the paper doesn’t bring a strong technical contribution, it’s still a useful one, as an early attempt at using GANs for this problem. I would thus recommend acceptance of the paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper proposes a method for self-supervised depth estimation in laparoscopic images using generative adversarial networks. The novelty of the approach is adequate, and the paper presents experimental results on two public datasets. The rebuttal promises to release public code, which assuages concerns about reproducibility and technical clarity. It also clarifies that empirical comparisons are conducted against the stereo versions of the methods in [8,9], which is essential to ensure the fairness of the evaluation.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

Self-Supervised Generative Adversarial Network for Depth Estimation in Laparoscopic Images