Authors
Jie Liu, Xiaoqing Guo, Yixuan Yuan
Abstract
Surgical instrument segmentation is fundamental for advanced computer-assisted systems. The variability of the surgical scene, a major obstacle in this task, leads to the domain shift problem. The unsupervised domain adaptation (UDA) technique can be employed to solve this problem and adapt the model to various surgical scenarios. However, existing UDA methods ignore the relationship among different categories, hindering the model from learning discriminative features from a global view. Additionally, the adversarial strategy utilized in these methods only narrows the domain gap at the end of the network, leading to poor feature alignment. To tackle the above-mentioned problems, we advance a semantic-prototype interaction graph (SePIG) framework for surgical instrument type segmentation to grasp the category-level relationship and further align the feature distributions. The proposed framework consists of a prototypical inner-interaction graph (PI-Graph) and a prototypical cross-interaction graph (PC-Graph). In the PI-Graph, an EM-Grouping module is designed to generate multi-prototypes that adequately represent the semantic information. Propagation is then performed upon these multi-prototypes to communicate semantic information within each domain. Aiming to narrow the domain gaps, the PC-Graph constructs hierarchical graphs upon the multi-prototypes and category centers, and conducts dynamic reasoning to exchange correlated information between the two domains. Extensive experiments on the EndoVis Instrument Segmentation 2017 to 2018 scenario demonstrate the superiority of our SePIG framework compared with state-of-the-art methods.
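The multi-prototype idea in the abstract can be illustrated with a generic EM-style estimation (essentially soft k-means): alternate between softly assigning features to prototypes and updating each prototype as a responsibility-weighted mean. This is a sketch of the general technique only, not the paper's EM-Grouping module; the distance-based responsibilities and the temperature are assumptions.

```python
import math

# Generic EM-style multi-prototype estimation (soft k-means). This is NOT the
# paper's EM-Grouping module -- just an illustration of summarizing a set of
# feature vectors into K prototypes.

def em_prototypes(feats, protos, iters=10, temp=1.0):
    """feats: list of 1-D feature vectors; protos: initial prototype vectors.
    Alternates an E-step (soft-assign each feature to prototypes by similarity)
    and an M-step (update each prototype as the responsibility-weighted mean)."""
    K, D = len(protos), len(feats[0])
    for _ in range(iters):
        # E-step: responsibilities via softmax over negative squared distances
        resp = []
        for f in feats:
            logits = [-sum((f[d] - p[d]) ** 2 for d in range(D)) / temp
                      for p in protos]
            m = max(logits)
            exps = [math.exp(l - m) for l in logits]
            z = sum(exps)
            resp.append([e / z for e in exps])
        # M-step: weighted mean of the features for each prototype
        for k in range(K):
            w = sum(r[k] for r in resp)
            protos[k] = [sum(r[k] * f[d] for r, f in zip(resp, feats)) / w
                         for d in range(D)]
    return protos
```

With two well-separated clusters of 1-D features, the two prototypes converge to the respective cluster means, which is the sense in which multi-prototypes "represent the semantic information" of a category.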
Link to paper
DOI: https://doi.org/10.1007/978-3-030-87199-4_26
SharedIt: https://rdcu.be/cyl39
Link to the code repository
https://github.com/CityU-AIM-Group/SePIG
Link to the dataset(s)
https://endovissub2017-roboticinstrumentsegmentation.grand-challenge.org/Data/
https://endovissub2018-roboticscenesegmentation.grand-challenge.org/home/
Reviews
Review #1
- Please describe the contribution of the paper
The paper proposes an unsupervised domain adaptation (UDA) framework for surgical instrument segmentation. The proposed framework incorporates the concept of multiple-prototype embedding and graph convolution embedding.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1) The idea of combining multiple-prototype embedding and graph convolution for UDA is interesting (to me).
2) The experiments show promising results comparable to state-of-the-art methods.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
I want to clarify that I am not an expert in EM-Grouping and dynamic reasoning, so I cannot assess the novelty and technical correctness of these two parts. However, I still have a few concerns about the other parts.
1) The dataset description is insufficient. I think some visualization of examples from the source and the target domain is necessary. If there is no obvious difference between examples from the two domains, have the authors considered semi-supervised learning methods? (Note: in UDA, the domain shift should be non-trivial to support a methodological contribution, if that is the claim.)
2) In Section 2.3, the discussion of contributions from previous studies is missing. First, using a domain discriminator on the predicted masks for UDA is not novel; the first work with a similar idea that I know of is [1]. [1] Unsupervised Domain Adaptation for Automatic Estimation of Cardiothoracic Ratio, MICCAI 2018.
3) In Section 2.3, the authors use a term called self-supervised loss. The related discussion is also missing. It is still an open question whether noisy pseudo-labels can be treated as self-supervision. Generating pseudo-labels for unlabeled data is not a new idea, e.g., label propagation. Various similar semi-supervised methods have been proposed to address unlabeled data in semantic segmentation.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The procedure described in the paper seems complex. I think the reproducibility might be OK, but I am unable to validate it.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
See my comments in item 4 above.
- Please state your overall opinion of the paper
borderline accept (6)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
My recommendation is based on my concerns in item 4. I am willing to upgrade my score if there is a rebuttal phase to address my concerns.
- What is the ranking of this paper in your review stack?
2
- Number of papers in your stack
5
- Reviewer confidence
Confident but not absolutely certain
Review #2
- Please describe the contribution of the paper
In this paper, the SePIG framework was proposed to address the shortcomings of UDA in surgical instrument segmentation by introducing graph-based techniques.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
This is a very useful work and the data is original. Clinical feasibility is strong.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The parameter settings are not clear, so the authors need to clarify the number of parameters, layer settings, activation functions, and optimization methods; these are not detailed in the paper. Please add a table to specify them so that readers can follow. There is no significant improvement from this work compared with the previous ones. The differences in the measurements are too small, e.g., the Dice value changes from 96% to 97% or 95% in Table 1; they may stem from systematic variance. Please provide the p-values of the models to illustrate this.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Code and data can be provided via a link for readers to reproduce the work and make comparisons.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
The authors need to add parameter settings in the revision. The major contribution of this work is the updated graph-network structure, so besides the information in Table 1, the authors need to clarify the number of parameters, layer settings, activation functions, and optimization methods; these are not detailed in the paper. Please add a table to specify them so that readers can follow.
Please add comparisons with the other listed works. The differences in the measurements are too small; they may stem from systematic variance. Please provide the p-values of the models to illustrate this.
Are there any other technical contributions besides accuracy? Please redo the comparison experiments and report other outcomes, such as computational complexity, number of parameters, runtime, and data robustness.
- Please state your overall opinion of the paper
Probably accept (7)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Clinical background is strong and method is novel.
- What is the ranking of this paper in your review stack?
2
- Number of papers in your stack
5
- Reviewer confidence
Very confident
Review #3
- Please describe the contribution of the paper
This work introduces a method to deal with the challenge of domain shift in the task of instrument type segmentation in robot-assisted surgery scenes. The proposed method, SePIG, incorporates a prototypical inner-interaction graph to represent the semantic information within the images of each domain independently and a prototypical cross-interaction graph to relate the semantic information between the two domains. The experimental validation is performed on the EndoVis 2017 -> EndoVis 2018 scenario, and the model achieves state-of-the-art results.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The main strengths of this paper are the following:
- This work has an adequate experimental validation: the main results are compared with state-of-the-art methods relevant to the studied task, and the proposed model obtains better results.
- The technical novelty of the method is relevant.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The main weakness of the paper is that the description of the proposed self-supervised method for the target-domain images does not specify how the pseudo-labels are computed or why they can be considered highly accurate.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
In the reproducibility checklist it is mentioned that the source code and pretrained models will be made publicly available, which is important to guarantee the reproducibility of the results. Additionally, the method was developed on the public benchmark datasets for instrument type segmentation, which helps promote research in the area.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
Although there is an ablation for each of the different components of the proposed model, it would be interesting to understand the relevance of each term of the loss function. The results suggest that there is a drop in performance due to the domain shift between the two datasets. However, domain changes in robot-assisted surgery scenes may be more abrupt; it would be interesting to see results of the method under a more drastic domain change, for instance, surgical scenes that involve a greater number of instrument classes.
- Please state your overall opinion of the paper
borderline accept (6)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper is well-written and has significant technical novelty. However, the proposed method is not fully clear as a whole, which compromises the reproducibility of the results.
- What is the ranking of this paper in your review stack?
1
- Number of papers in your stack
5
- Reviewer confidence
Somewhat confident
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
All reviewers consider the technical novelty of the approach sufficient and the experimental validation on public datasets adequate, recommending acceptance unanimously. The final version of the paper should include reviewers’ comments, in particular: to clarify the motivation and computation of pseudo-labels, additional experimental details, and a more structured discussion of related work.
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).
2
Author Feedback
We thank the reviewers for their positive reviews and insightful questions. Please find our response below:
Q1(R1): The domain difference between the two datasets. A1: In this work, we set EndoVis17 as the source domain and EndoVis18 as the target domain. EndoVis17 and EndoVis18 are procedure datasets recorded with the da Vinci Xi system and the da Vinci X system, respectively. Compared with EndoVis17, EndoVis18 includes more complex porcine tissue and more realistic instrument motion. The domain difference can be gauged by the performance of the model trained on EndoVis17 when evaluated on EndoVis18, which drops by 16.04% compared with the "oracle" setting. Furthermore, we show visualization results in Fig. 2(c). The qualitative and quantitative results demonstrate that the proposed model can deal with the domain shift between the source and target datasets.
Q2(R2): No significant improvement on the background class. A2: The background performance improves by 1.41% over the baseline (source only). Although the background performance does not rank first among the compared methods [12, 15, 21, 23], the proposed SePIG model achieves significant improvements on the instruments, with gains of 21.5%, 6.64%, and 10.14% for forceps, needle driver, and scissors respectively, which is more clinically relevant.
Q3(R1, R3, Meta-review): Unclear motivation of the self-supervised method and computation of pseudo-labels. A3: In unsupervised domain adaptation, we cannot access the corresponding labels in the target domain. Thus, we adopt a self-supervised method to mine supervisory signals from the unlabeled dataset. Specifically, we set different thresholds for different categories to account for the class-imbalance problem. If the predicted probability for a target pixel is higher than the threshold of its category, it is selected as a pseudo-label. We will add these details in the final version.
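The class-wise thresholding described in A3 can be sketched as follows. The function name, the threshold values, and the ignore index (255) are illustrative assumptions, not details from the paper.

```python
# Hypothetical sketch of class-wise pseudo-label selection as described in A3.
# Threshold values and the ignore index are illustrative assumptions.

IGNORE = 255  # pixels whose top probability falls below threshold are ignored

def select_pseudo_labels(probs, thresholds, ignore_index=IGNORE):
    """probs: per-pixel softmax outputs (list of per-class probability lists).
    thresholds: one confidence threshold per class, tuned per category to
    counter class imbalance. Returns one pseudo-label per pixel, or
    ignore_index when the top probability does not exceed its class threshold."""
    labels = []
    for p in probs:
        c = max(range(len(p)), key=lambda k: p[k])  # argmax class
        labels.append(c if p[c] > thresholds[c] else ignore_index)
    return labels
```

A pixel predicted as a rare instrument class can then be kept under a lower threshold than one predicted as background, which is how per-category thresholds counter the imbalance mentioned in the response.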
Q4(R1, Meta-review): Missing discussion of the loss functions relative to related work. A4: AdaptSeg [17] first proposed adversarial learning on the predicted mask, which can be regarded as low-dimensional feature alignment, and CBST [1*] first employed self-training in unsupervised domain adaptation.
Q5(R2, R3, Meta-review): Clarify the number of parameters, layer settings, activation functions, and optimization methods. A5: As mentioned in the paper, we utilize a dilated ResNet-101 pretrained on ImageNet as the backbone. We employ a 1x1 convolution as the classifier and ReLU as the activation function. The optimization method is SGD. In addition, we will release the code with the detailed model structure and experimental settings.
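For readers who want the setting before the code release, the details listed in A5 can be collected into a configuration sketch; every value not stated in the response (learning rate, momentum, batch size, etc.) is deliberately left out rather than guessed.

```python
# Training configuration restated from A5; only settings the authors
# explicitly mention are included -- everything else is unspecified.
sepig_config = {
    "backbone": "dilated ResNet-101, ImageNet-pretrained",
    "classifier": "1x1 convolution",
    "activation": "ReLU",
    "optimizer": "SGD",
}
```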
[1*] Zou, Yang, et al. “Unsupervised domain adaptation for semantic segmentation via class-balanced self-training.” Proceedings of the European conference on computer vision (ECCV). 2018.