
Authors

David Owen, Maria Grammatikopoulou, Imanol Luengo, Danail Stoyanov

Abstract

Laparoscopic cholecystectomy can be subject to complications such as bile duct injury, which can seriously harm the patient or even result in death. Computer-assisted interventions have the potential to prevent such complications by highlighting the critical structures (cystic duct and cystic artery) during surgery, helping the surgeon establish the Critical View of Safety and avoid structure misidentification.

A method is presented to detect the critical structures, using state of the art computer vision techniques. The proposed label relaxation dramatically improves performance for segmenting critical structures, which have ambiguous extent and highly variable ground truth labels. We also demonstrate how pseudo-label self-supervision allows further detection improvement using unlabelled data.

The system was trained using a dataset of 3,050 labelled and 3,682 unlabelled laparoscopic cholecystectomy frames. We achieved an IoU of .65 and presence detection F1 score of .75. The model’s outputs were further evaluated qualitatively by three expert surgeons, providing preliminary confirmation of our method’s benefits.

This work is among the first to perform detection of critical anatomy during laparoscopic cholecystectomy, and demonstrates the great promise of computer-assisted intervention to improve surgical safety and workflow.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_31

SharedIt: https://rdcu.be/cyhQt

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The paper presents a method for detecting critical structures in cholecystectomy surgery. The authors use label relaxation and show a significant improvement in performance. Self-supervision is also used to train on unlabeled data.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper tackles a very clinically relevant problem: the detection of critical anatomical structures
- The proposed method performs well on the dataset used
- The results are analysed thoroughly
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Size of labelled dataset seems fairly small
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Unsure about paper reproducibility as authors do not mention making code publicly available upon acceptance
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The paper is well written overall and tackles a very clinically relevant problem. I believe detection of critical anatomical structures in surgery can help many other applications, e.g., it can help in recognizing surgical phases based on which structures are in view. The proposed label relaxation improves performance. The use of self-supervision is also key to the paper, as utilization of unlabeled data is becoming more and more important in the surgical domain. My main concern is the size of the labelled dataset. While I understand that labelling such images for segmentation can be very time consuming, it can be hard to draw conclusions from a dataset of this size in deep learning.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The problem tackled in the paper is very clinically relevant and the results presented show promise.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

The paper proposes to tackle an essential task of critical structure detection for surgical safety. Label relaxation and pseudo labeling are applied to address this task. Experiments on the in-house dataset are conducted to validate the effectiveness of methods. User study with surgeons is also performed to verify the applicability of the method on real-world usage.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Takes the first step towards an essential task for surgical safety.
- Extensive experiments for validating the methods
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The novelty of method is weak, with two well-established methods of label relaxation and pseudo labeling directly applied to tackle this task
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility is good, with most details clearly described.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
Major:
- The definition and annotation process of the task can be better described. Why not formulate this task as a 3-class segmentation task, i.e., cystic duct, artery, and bile duct? This might better demonstrate the significance for surgical safety.
- Labeling relaxation is mainly alleviating the boundary ambiguity in annotations. If we can more correctly annotate the critical structure, is the labeling relaxation strategy not that needed?
- The training procedure of teacher-student architecture is better to be more clearly described.
Minor:
- In Fig. 1, which one (blue or green) is defined as the ground truth? How to define this region?
- Better to show the ground truth in Fig. 2.
- Cannot successfully see the supplementary video.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Take the first step to tackle one essential task for surgical safety.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

4
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

The manuscript presents an automatic approach to detect the critical structures (cystic duct and cystic artery) in laparoscopic cholecystectomy. The authors propose a heatmap-based approach to modify the ground truth annotations using euclidean distance transform to handle the ambiguous boundaries and challenging annotations. The heatmap-based approach works better than the traditional segmentation. The accuracy is further improved by exploiting the pseudo labels on the unlabeled frames and joint training with the pseudo labels and the ground truth.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The detection of critical structures (cystic duct and cystic artery) is an essential step towards safe cholecystectomy. The proposed heatmap-based approach using euclidean distance transform works better than the naive segmentation baseline and is further improved by exploiting pseudo labels on the unlabeled data.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Comparison with recent methods for label relaxation and handling of boundaries is missing
- The work is a necessary but not sufficient step towards the CVS
- Clear explanation on evaluation metric is missing
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- Datasets and code: The authors have used a private dataset. They neither provide nor mention the availability of the dataset, models, or training/evaluation code upon acceptance.
- Experimental results: No result on the different hyperparameters setting. The authors used fixed hyperparameters.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
The authors should discuss the following points:
- The sentence “This may be beneficial for guiding surgical workflow, as once CVS has been achieved then by definition the structures are already identified and dissected, meaning classification at this stage is less useful” is incorrect according to the definition of CVS [7, 8]. The detection of only two critical structures (cystic duct and cystic artery) entering into the gallbladder before clipping and cutting is one of the necessary but not sufficient criteria of CVS. Once the structures are dissected, it is impossible to achieve CVS. The author should reformulate this sentence.
- The proposed heatmap-based segmentation aims to handle the confusing annotations for the critical structures, especially at the boundaries. Recent works such as [*1, *2] handle the boundaries more effectively. How does authors’ work compare to these?
- Could the authors confirm the IoU calculation for the heatmap-based method? As the segmentation evaluation is done using the threshold, is it the same threshold being used for the heatmap-based method? Both methods should use the same threshold on the same ground truth test images for a fair comparison with the segmentation method.
- As the authors stated that the results are not sensitive to parameter tuning for making the heat maps, it would be interesting to see if the authors include the 10 pixels “ignore” border in training.
- Section 3.1, The authors should explain how the frames are selected. Are these frames selected randomly, at a fixed frequency, or hand-picked? Also, what is the rationale behind selecting the frames after the CVS? The authors should explain more on the annotation process and dataset characteristics in terms of class distribution.
- Section 3.4, Assessment from surgeons, seems inconclusive. The authors have shown a preference in one set, the opposite preference in the other set. The sentence “although one participant did note that in one frame common bile duct was detected,” further reinforces first point for the problem regarding the setup for the CVS.
Minor:
- Figure 3d is not clipping but bipolar coagulation
[*1] Yuan, Yuhui, et al. “Segfix: Model-agnostic boundary refinement for segmentation.” European Conference on Computer Vision. Springer, Cham, 2020.
[*2] Zhu, Yi, et al. “Improving semantic segmentation via video propagation and label relaxation.” Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed work provides an automatic approach to detect the critical structures, which can lead to safer cholecystectomy procedures. The authors need to compare their method with recent label relaxation approaches to establish its effectiveness.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The reviewers and I agree that the paper is of high enough quality for acceptance at MICCAI. All reviewers have reviewed the work favorably and have provided constructive feedback to improve its quality. I would ask the authors to take this feedback into account before submitting their final version.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Author Feedback

We thank the reviewers for their valuable suggestions, and for their encouragement of our work. Below, we reply to the main comments.

R1: Size of labelled dataset seems fairly small

We agree the dataset is comparatively small, and we’d like to expand upon it in future work. However, we believe it is large enough to draw initial conclusions. Many datasets (e.g. Cityscapes) are of comparable size, and related work uses similar dataset sizes (Mascagni et al, Tokuyasu et al, already referenced in the paper). We will add this discussion to limitations.

R2: The definition and annotation process of the task can be better described. Why not formulate this task as a 3-class segmentation task, i.e., cystic duct, artery, and bile duct?

We will add further discussion, and agree it might be beneficial. We touch on this when we say “[w]e hope to develop the method further by using a greater variety of anatomic classes”.

R2: Labeling relaxation is mainly alleviating the boundary ambiguity in annotations. If we can more correctly annotate the critical structure, is the labeling relaxation strategy not that needed?

We are uncertain whether this question can really be answered. If the boundaries are truly ambiguous, is it meaningful or reliable to annotate them “more correctly” beyond a certain point? Medical annotators labelled the structures with review by an anatomy specialist. It would be hard to improve the data, yet label relaxation does still help.

R2: The training procedure of teacher-student architecture is better to be more clearly described.

Thank you, we agree and we will add more information as supplementary material.

R2: Minor

Thank you for these suggestions, we will address them.

R3: The detection of only two critical structures (cystic duct and cystic artery) entering into the gallbladder before clipping and cutting is one of the necessary but not sufficient criteria of CVS. […]

Thank you for raising this. We will correct this sentence.

R3: Recent works such as [*1, *2] handle the boundaries more effectively. How does authors’ work compare to these?

These methods have a different objective, but may be relevant. Our method explicitly downweights the edges, arguing that for localisation, it can be harmful to treat them as equal in importance. These works focus on improving the model performance at the edges. They also assume the original labelled data have unambiguously correct edge annotations.

We will add the above as discussion, and we will try to add SegFix results to our ablation study in time for camera-ready.
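
To make the idea of downweighting edges concrete, the following is a minimal sketch of how a binary mask might be relaxed into a heatmap with a Euclidean distance transform, as Review #3 describes. It is an illustration under assumptions, not the authors' implementation: the function names, the `max_dist` saturation parameter, and the linear ramp are all hypothetical, and the brute-force distance computation is only suitable for tiny masks (in practice a library routine such as `scipy.ndimage.distance_transform_edt` would be the usual choice).

```python
import math

def distance_to_background(mask):
    """Euclidean distance from each foreground pixel to the nearest background pixel.

    Brute force, for illustration only. Background pixels get distance 0.
    If the mask has no background at all, every pixel gets infinity.
    """
    h, w = len(mask), len(mask[0])
    background = [(r, c) for r in range(h) for c in range(w) if not mask[r][c]]
    if not background:
        return [[math.inf] * w for _ in range(h)]
    dist = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            if mask[r][c]:
                dist[r][c] = min(math.hypot(r - br, c - bc) for br, bc in background)
    return dist

def relax_labels(mask, max_dist=2.0):
    """Turn a binary mask into a heatmap in [0, 1].

    Assumption: a linear ramp that saturates at 1 once a pixel is at least
    `max_dist` pixels inside the structure, so edge pixels carry less weight.
    """
    dist = distance_to_background(mask)
    return [[min(d / max_dist, 1.0) for d in row] for row in dist]

# A 3x3 structure centred in a 5x5 frame.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
heatmap = relax_labels(mask, max_dist=2.0)
```

In this toy case the centre pixel keeps full weight while the pixels along the annotated edge are halved, which is the kind of soft target the discussion above refers to.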

R3: Could the authors confirm the IoU calculation for the heatmap-based method? As the segmentation evaluation is done using the threshold, is it the same threshold being used for the heatmap-based method?

The same threshold is used, and we will add this to the text.
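
As a rough sketch of what using the same threshold means for the comparison, the snippet below binarizes the outputs of two models with one shared threshold and scores both against the same ground truth. The helper names, the threshold value of 0.5, and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the page does not state the value actually used.

```python
def binarize(scores, threshold):
    """Convert per-pixel scores into a boolean mask with a fixed threshold."""
    return [[s >= threshold for s in row] for row in scores]

def iou(pred, target):
    """Intersection over union of two masks of equal shape."""
    inter = union = 0
    for p_row, t_row in zip(pred, target):
        for p, t in zip(p_row, t_row):
            inter += bool(p) and bool(t)
            union += bool(p) or bool(t)
    # Assumed convention: two empty masks count as a perfect match.
    return inter / union if union else 1.0

THRESHOLD = 0.5  # hypothetical value, shared by both methods

ground_truth = [[0, 1, 1],
                [0, 1, 1]]
heatmap_output = [[0.1, 0.6, 0.9],
                  [0.2, 0.4, 0.8]]
segmentation_output = [[0.7, 0.9, 0.9],
                       [0.1, 0.8, 0.9]]

iou_heatmap = iou(binarize(heatmap_output, THRESHOLD), ground_truth)
iou_segmentation = iou(binarize(segmentation_output, THRESHOLD), ground_truth)
```

Because both outputs pass through the same `binarize` call, any difference between the two scores comes from the models rather than from the evaluation settings.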

R3: As the authors stated that the results are not sensitive to parameter tuning for making the heat maps, it would be interesting to see if the authors include the 10 pixels “ignore” border in training.

We included this in evaluation and not in training, and will clarify this in the text.
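
One possible reading of an “ignore” border at evaluation time is sketched below: pixels within a fixed distance of the annotated boundary are left out of the IoU, so small disagreements at ambiguous edges are not penalised. This interpretation is an assumption, as the page does not say where the border lies; the function names are hypothetical, and a border of 1 pixel is used on a toy mask in place of the 10 pixels mentioned above.

```python
import math

def ignore_band(target, border):
    """True where a pixel lies within `border` pixels of a pixel with the opposite label."""
    h, w = len(target), len(target[0])
    coords = [(r, c) for r in range(h) for c in range(w)]
    band = [[False] * w for _ in range(h)]
    for r, c in coords:
        band[r][c] = any(
            bool(target[r2][c2]) != bool(target[r][c])
            and math.hypot(r - r2, c - c2) <= border
            for r2, c2 in coords
        )
    return band

def masked_iou(pred, target, ignore):
    """IoU computed only over pixels that are not flagged in `ignore`."""
    inter = union = 0
    for p_row, t_row, i_row in zip(pred, target, ignore):
        for p, t, skip in zip(p_row, t_row, i_row):
            if skip:
                continue
            inter += bool(p) and bool(t)
            union += bool(p) or bool(t)
    # Assumed convention: nothing left to score counts as a perfect match.
    return inter / union if union else 1.0

# One-row toy example: the prediction overshoots the annotation by one pixel.
target = [[0, 0, 0, 1, 1, 1, 1, 1]]
pred = [[0, 0, 1, 1, 1, 1, 1, 1]]

band = ignore_band(target, border=1)
score_with_border = masked_iou(pred, target, band)
score_without_border = masked_iou(pred, target, [[False] * 8])
```

With the band applied, the one-pixel overshoot falls inside the ignored region and no longer lowers the score.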

R3: Section 3.1, The authors should explain how the frames are selected. Are these frames selected randomly, at a fixed frequency, or hand-picked?

Frames are sampled at 1fps from windows taken around CVS. We will add this to the text.
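
The sampling scheme can be sketched as follows, assuming each video has a known CVS timestamp and a window defined by a number of seconds before and after it. The function name, the window lengths, and the 25 fps video rate are hypothetical placeholders; only the 1 fps sampling rate comes from the reply above.

```python
def sample_frame_indices(cvs_time_s, video_fps, before_s, after_s, sample_fps=1.0):
    """Return frame indices sampled at `sample_fps` from a window around the CVS time.

    The window runs from `before_s` seconds before the CVS timestamp to
    `after_s` seconds after it, clipped at the start of the video.
    """
    start = max(0.0, cvs_time_s - before_s)
    end = cvs_time_s + after_s
    n_samples = int((end - start) * sample_fps) + 1
    # Compute each time from the start to avoid accumulating float error.
    times = [start + k / sample_fps for k in range(n_samples)]
    return [round(t * video_fps) for t in times]

# Hypothetical example: CVS at 100 s in a 25 fps video, 2 s window on each side.
indices = sample_frame_indices(cvs_time_s=100.0, video_fps=25.0, before_s=2.0, after_s=2.0)
```

Sampling at 1 fps from a 25 fps video keeps every 25th frame in the window, which limits near-duplicate frames in the labelled set.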

R3: Section 3.4, Assessment from surgeons, seems inconclusive. The authors have shown a preference in one set, the opposite preference in the other set.

We agree the results are mixed, but we are upfront about that - we show both sets of results separately in the table and discuss in Section 4.1. We also present significance tests in each set. We’d say “surgeons show a preference in one set and do not show a statistically significant preference in the other set”. We will clarify this further.

R3: Minor

Thank you, we will review and update this.


Detection of critical structures in laparoscopic cholecystectomy using label relaxation and self-supervision