Authors

Mengya Xu, Mobarakol Islam, Chwee Ming Lim, Hongliang Ren

Abstract

Generating surgical reports aimed at surgical scene understanding in robot-assisted surgery can contribute to documenting entry tasks and post-operative analysis. Despite impressive results, the performance of a deep learning model degrades when it is applied to a different domain and encounters domain shift. In addition, new instruments and variations in surgical tissue appear in robotic surgery. In this work, we propose class-incremental domain adaptation (CIDA) with a multi-layer transformer-based model to tackle the new classes and domain shift in the target domain and generate surgical reports during robotic surgery. To adapt to incremental classes and extract domain-invariant features, a class-incremental (CI) learning method with supervised contrastive (SupCon) loss is incorporated into the feature extractor. To generate captions from the extracted features, curriculum by one-dimensional Gaussian smoothing (CBS) is integrated with a multi-layer transformer-based caption prediction model. CBS smooths the feature embeddings using anti-aliasing and helps the model learn domain-invariant features. We also adopt label smoothing (LS) to calibrate the prediction probabilities and obtain better feature representations in both the feature extractor and the captioning model. The proposed techniques are empirically evaluated on datasets from two surgical domains: nephrectomy operations and transoral robotic surgery. We observe that domain-invariant feature learning and a well-calibrated network improve surgical report generation performance in both the source and target domains under domain shift and unseen classes, in one-shot and few-shot settings.
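
To make the smoothing idea in the abstract concrete, the following is a minimal sketch of one-dimensional Gaussian smoothing applied to a feature vector, with a kernel width that shrinks over training epochs. It is not the authors' implementation: the function names, the exponential decay schedule, the edge handling, and all numeric values are assumptions chosen for illustration.

```python
import math

def gaussian_kernel_1d(sigma, radius=None):
    """Return a normalized 1D Gaussian kernel (assumed radius: ceil(3 * sigma))."""
    if sigma <= 0:
        return [1.0]  # identity kernel: no smoothing
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth_1d(features, sigma):
    """Convolve a 1D feature vector with a Gaussian kernel, replicating edge values."""
    kernel = gaussian_kernel_1d(sigma)
    radius = len(kernel) // 2
    n = len(features)
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = min(max(i + k - radius, 0), n - 1)  # clamp index at the borders
            acc += w * features[j]
        out.append(acc)
    return out

def sigma_schedule(initial_sigma, decay, epoch):
    """Hypothetical curriculum: shrink sigma each epoch so smoothing fades out."""
    return initial_sigma * (decay ** epoch)

# Illustrative run: an impulse is blurred strongly at first, then less and less.
features = [0.0, 0.0, 1.0, 0.0, 0.0]
for epoch in range(3):
    sigma = sigma_schedule(1.0, 0.5, epoch)
    print(epoch, sigma, [round(v, 3) for v in smooth_1d(features, sigma)])
```

Starting with a wide kernel and narrowing it is the usual curriculum-by-smoothing pattern: early training sees low-frequency structure only, and finer detail is admitted as sigma decays.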

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_26

SharedIt: https://rdcu.be/cyhQo

Link to the code repository

https://github.com/XuMengyaAmy/CIDACaptioning/settings

Link to the dataset(s)

https://bit.ly/2TaaUsj

Reviews

Review #1

Please describe the contribution of the paper

This paper introduces class-incremental domain adaptation for improving surgical scene captioning performance. A transformer-based model is used for this purpose.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This is quite an interesting paper on generating surgical scene captions. In captioning, feature extraction plays an important role. The authors improve the feature extraction process by introducing a new distillation loss for CIDA.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

I think this is a great paper. The main weakness is that some variables are not explained, but I think this is due to the page limit. More detailed information is desired for reproducibility.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

More detailed information is desired for reproducibility.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
This paper introduces class-incremental domain adaptation for improving surgical scene captioning performance. A transformer-based model is used for this purpose.
1. What is nmimg1? Is this the new model of image1? Some of the equations mix programming and mathematical notation. A simpler mathematical description is requested.
2. What about the consistency of scene captioning? For example, if one cutting scene lasts 2 minutes, the caption should not change during it. How stable is the captioning?
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper achieves higher performance by combining recent techniques for laparoscopic video captioning.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

This paper introduces a domain adaptation framework, which integrates several existing techniques, for surgical video captioning. A public dataset is used as the source domain and an in-house dataset is used as the target domain.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Performance is improved on the target domain. The DA task is fairly difficult because the instrument types differ between the source and the target domain.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

One major weakness of this paper is the lack of comparison to prior art. The main contribution of this paper is a new domain adaptation framework, but comparing to only one baseline is not enough to show the advantage of the proposed framework. It is natural for the performance of a model to improve when new components or techniques are added. State-of-the-art domain adaptation methods that are applicable to the task should be compared. Please refer to [13] for possible methods to compare against. Moreover, existing domain adaptation methods should be briefly reviewed, especially those similar to the proposed one, such as [13].

Since one or a few shots of labeled data are available in the target domain, two baselines should be compared against: the first trains on the source domain and then fine-tunes on the target domain, and the second trains directly on the target domain.

This paper integrated several existing techniques into one framework but the methodology novelty is limited.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Though some details of training and testing are missing in the paper, the reproducibility is OK because the source code will be released.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

1) Sentence 1-3, page 2: “Class-Incremental (CI) learning methods can learn new instruments absent from SD but will fail if there is a domain shift in robotic surgery [4, 13].” However, paper [13], “Class-incremental domain adaptation”, deals with domain shift; it will not fail if there is a domain shift. In addition, neither of the two papers includes experiments on robotic surgery data, so they cannot support the claim made in this sentence.
2) Experiments show that the proposed framework has improved performance in both SD and TD.
3) A series of techniques is used in the proposed framework. It should be clearly stated which part of the framework is novel and which part is just an application of existing techniques. Please focus on the novel parts when summarizing the contribution of this work.
4) There are two DA scenarios, one-shot and few-shot. Is one shot only one frame in the target domain, or one series (50 seconds) in the target domain? How many shots are there in the few-shot scenario?
5) The 2nd contribution is “two-dimensional (2D) CBS in feature extractor”, but 2D CBS is never mentioned after that.
6) The last sentence of the first paragraph of section 2.2: “However, it is unsuitable to deal with domain shift due to the sensitivity of CE loss to training data.” I think reference [13] is one approach to deal with this problem. Please explain the difference and the advantage of the proposed method compared to [13]. In addition, if CE loss is the reason that CI learning is unsuitable for domain shift, then which technique in this paper solves this problem and makes CI learning suitable for domain shift, SupCon or CBS, and how? Moreover, in Table 1, it seems that CI can be used in one-shot and few-shot DA and improves performance on the target domain.
7) More details should be given on how the two datasets are used in training and testing in the different experimental scenarios.
8) If possible, the metrics should be briefly introduced.
9) Can CBS be used alone, or should it be combined with LS? In Table 1, it seems that CBS is used alone in CISC.
10) It is a little difficult to follow the idea of this paper. One reason may be that there are too many acronyms in the main text and tables, and the relationship between them is fairly complicated.
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I lean to reject because the methodology novelty is not clear and the baseline is far from being enough.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

The authors presented the class-incremental domain adaptation (CIDA), which aims to handle novel target domain classes under domain shift without the need to retrain all datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The feature extractor and Transformer-like model trained with CBS can extract the domain-invariant features, and generate the surgical report which can describe the instruments-tissue interaction.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Lack of comparison with other methods.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Very difficult.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

ResNet is employed as the feature extractor. The authors should explain in detail the current state of the art for feature extraction in endoscopic vision.

Instrument segmentation is difficult in clinical practice because of occlusion caused by bleeding. How is this resolved? Or can this model only be used in simple cases?

The authors state: “The feature extractor is trained by minimizing the class-incremental loss consists of supervised contrastive (SupCon) loss and a novel distillation loss”. Are the two losses trained step by step or at the same time? If they are trained step by step, does the result change with a different training order?

Please give the uncertainty of the parameters in the training model.

Please give the Error analysis for this model using 95%CI.

Please give the time consumption for this model in the NVIDIA RTX 2080 Ti GPU.

Please make a comparison with other novel machine learning methods.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Based on the reproducibility of the paper.
What is the ranking of this paper in your review stack?

6
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
Please see strengths and weaknesses of the paper summarized below. Please try your best to address the items under weaknesses and answer reviewer questions in your rebuttal.

Strengths:
- This paper introduces class-incremental domain adaptation for improving surgical scene captioning performance.
Weaknesses:
- The proposed framework integrates several existing techniques. It is not very clear what the novel contributions of this work are.
- Comparison with only one baseline is not sufficient.
- Some detailed questions and concerns raised by Reviewer #3 and #4 should be answered or addressed if possible.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

Author Feedback

We would like to express sincere thanks to the reviewers for their critical assessment of our work.

Critique 1: Reproducibility of the paper
Response: We share our code, annotation files, and ReadMe file at https://bit.ly/2TaaUsj. We will make all the code and datasets public after the paper is accepted.

Critique 2: It’s difficult to tell the novelty and application techniques
Response: Our DA setting has two issues: domain shift and out-of-distribution samples (novel instruments). To address the novel instruments in TD, we first design the CI learning. However, CI is not tailored to address domain shift [13]. Thus, for practical surgery, where there are both domain shift and unseen instruments, we upgrade CI to CIDA by designing a new distillation loss inspired by the SupCon loss. Inspired by 2D CBS (an applied existing technique), we also design 1D CBS to extract domain-invariant features. Our novelty also lies in new observations: 1) We applied CBS to non-image data and found that it provides better performance in both SD and TD. 2) We investigated the effect of calibration on DA and found that a well-calibrated network has better capacity in DA. 3) To the best of our knowledge, this is the first work that provides a solution to a rare problem in surgical DA. We are the first to show that the critical issue in surgical DA is not only domain shift but also the new instruments that appear in the new surgical domain. We have successfully implemented CI learning and demonstrated its benefits for surgical DA with qualitative and quantitative metrics.
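
The calibration discussed in this response is tied in the abstract to label smoothing (LS). As a minimal sketch of what LS does to a training target, assuming the common uniform-smoothing form and a hypothetical smoothing factor of 0.1 (the value used in the paper is not stated here), one could write:

```python
import math

def smooth_labels(target_index, num_classes, epsilon=0.1):
    """Return a target distribution with (1 - epsilon) on the true class
    plus epsilon spread uniformly over all classes."""
    uniform = epsilon / num_classes
    dist = [uniform] * num_classes
    dist[target_index] += 1.0 - epsilon
    return dist

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(target_dist, logits):
    """Cross-entropy between a (possibly smoothed) target distribution and logits."""
    probs = softmax(logits)
    return -sum(t * math.log(p) for t, p in zip(target_dist, probs))

# Illustrative comparison: an over-confident prediction is penalized more
# under smoothed targets than under one-hot targets.
confident_logits = [10.0, 0.0, 0.0, 0.0]
hard = smooth_labels(0, 4, epsilon=0.0)
soft = smooth_labels(0, 4, epsilon=0.1)
print("one-hot target loss :", cross_entropy(hard, confident_logits))
print("smoothed target loss:", cross_entropy(soft, confident_logits))
```

Because the smoothed target never assigns probability 1 to a single class, the loss discourages extreme logits, which is the usual argument for why LS tends to improve calibration.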

Critique 3: Lacks comparison with other baselines.
Response: To the best of our knowledge, this was the first work on surgical report generation in robot-assisted surgery at the time of submission. Therefore, there is no existing baseline to compare with. We have run new experiments with other recent baselines, X-LAN (https://arxiv.org/abs/2003.14080) and DANN (https://arxiv.org/abs/1505.07818). Our model obtained superior performance among all the baselines. We will add them to Table 1 in our revised version. We evaluate X-LAN on SD and the DA method DANN on our dataset:
- X-LAN on SD: BLEU: [0.5733, 0.5053, 0.4413, 0.3885], METEOR: 0.3484, ROUGE_L: 0.5642, CIDEr: 2.0599
- DANN on SD: BLEU: [0.5995, 0.5318, 0.4748, 0.4301], METEOR: 0.5995, ROUGE: 0.5994, CIDEr: 2.4672
- DANN on few-shot TD: BLEU: [0.6338, 0.5367, 0.4819, 0.4321], METEOR: 0.3173, ROUGE: 0.5794, CIDEr: 3.0407
We have done extensive comparisons in supervised and weakly-supervised schemes, and in-depth ablation studies, to verify the effect of each novel technique.

Critique 4: There should be a second baseline: train directly on the TD
Response: Yes, the second baselines are as follows. We will add them in the revised version.
- One shot: BLEU: [0.2408, 0.098, 0.0319, 0.], METEOR: 0.1051, ROUGE: 0.2407, CIDEr: 0.1348
- Few shot: BLEU: [0.5331, 0.4567, 0.4114, 0.3712], METEOR: 0.2738, ROUGE: 0.5348, CIDEr: 2.7496

Critique 5: Which technique makes CI suitable to handle domain shift
Response: When CI was introduced, the performance improved because of its handling of novel instruments rather than of domain shift. The new ablation study demonstrates that SupCon empowers CI to handle domain shift and out-of-distribution samples simultaneously, because our CIDA outperforms both CI (which is limited in handling domain shift) and DANN (a new baseline, which is limited in handling unseen classes). CBS plays an auxiliary role in achieving better feature extraction.
BLEU1 / METEOR / ROUGE:
- CI, SD: 0.5571 / 0.3609 / 0.5791
- CI, few-shot TD: 0.6156 / 0.3189 / 0.5973
- CI+CBS, SD: 0.5704 / 0.3528 / 0.5856
- CI+CBS, few-shot TD: 0.6185 / 0.3119 / 0.5722
- DANN, SD: 0.5995 / 0.5995 / 0.5994
- DANN, few-shot TD: 0.6338 / 0.3173 / 0.5794
- CI+SupCon (CIDA), SD: 0.6009 / 0.3963 / 0.6317
- CI+SupCon (CIDA), few-shot TD: 0.6309 / 0.3205 / 0.6046
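
For readers unfamiliar with the SupCon term referred to in this response, the sketch below computes the standard supervised contrastive loss on a small batch of embeddings. It illustrates the general SupCon formulation only, not the authors' distillation loss or training code; the temperature value, the function names, and the toy embeddings are assumptions, and embeddings are assumed to be non-zero vectors.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss averaged over anchors that have a positive.

    For each anchor i, positives are the other samples with the same label;
    the denominator runs over every sample except the anchor itself.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor without positives contributes nothing
        sims = {a: sum(x * y for x, y in zip(z[i], z[a])) / temperature
                for a in range(n) if a != i}
        m = max(sims.values())  # log-sum-exp shift for numerical stability
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0

# Toy batch: two tight clusters. Labels that agree with the clusters give a
# lower loss than labels that cut across them.
emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print("labels match clusters:", supcon_loss(emb, [0, 0, 1, 1]))
print("labels cut clusters  :", supcon_loss(emb, [0, 1, 0, 1]))
```

The loss pulls same-class embeddings together regardless of which domain they come from, which is one plausible reading of why a SupCon-style objective helps under domain shift.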

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Authors explained the technical novelty in better detail in the rebuttal. A new baseline and more comparison results are also provided. I recommend acceptance of this paper, yet authors should include the novelty and application techniques discussions in the final version if accepted.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper addresses an interesting clinical problem of surgical report generation which is only marginally explored in the literature. The authors clarified the main concerns raised by the MR and reviewers, presented the novelty and included the baseline comparisons. Please include these justifications and results in the camera ready.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Although the paper is an interesting engineering work, it does not offer strongly novel methodological components or analysis. There was a split among reviewers about the merits of the work and hence the authors are encouraged to improve the novelties and presentation of the results as suggested by the reviewers.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

22

Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation