
Authors

Negin Ghamsarian, Mario Taschwer, Doris Putzgruber-Adamitsch, Stephanie Sarny, Yosuf El-Shabrawi, Klaus Schoeffmann

Abstract

A critical complication after cataract surgery is the dislocation of the lens implant leading to vision deterioration and eye trauma. In order to reduce the risk of this complication, it is vital to discover the risk factors during the surgery. However, studying the relationship between lens dislocation and its suspicious risk factors using numerous videos is a time-extensive procedure. Hence, the surgeons demand an automatic approach to enable a larger-scale and, accordingly, more reliable study. In this paper, we propose a novel framework as the major step towards lens irregularity detection. In particular, we propose (I) an end-to-end recurrent neural network to recognize the lens-implantation phase and (II) a novel semantic segmentation network to segment the lens and pupil after the implantation phase. The phase recognition results reveal the effectiveness of the proposed surgical phase recognition approach. Moreover, the segmentation results confirm the proposed segmentation network’s effectiveness compared to state-of-the-art rival approaches.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_8

SharedIt: https://rdcu.be/cyl9N

Link to the code repository

https://github.com/Negin-Ghamsarian/AdaptNet-MICCAI2021

Link to the dataset(s)

http://ftp.itec.aau.at/datasets/ovid/LensID/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper presents an approach for lens irregularity detection in cataract surgery. Overall, the paper is well organized and easy to follow. The experimental results also show some improvement.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well organized and the topic is of interest to MICCAI audience.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is a lack of visualization in the results, which makes it hard to understand the difficulties and challenges of the problem.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work shall be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Some results should be given for illustration.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors proposed AdaptNet with some new modules. The results show some improvement compared with other methods.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper introduces a novel framework for lens irregularity detection. An RNN is used to recognize the lens-implantation phase, and a novel semantic segmentation network is used to segment the lens and pupil after the implantation phase. The experimental results outperform the state-of-the-art.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1. This paper introduces an interesting framework that can both predict the lens-implantation phase and perform the semantic segmentation.

    2. Three interesting modules are introduced in the segmentation network, which are novel and effective for the segmentation. These modules are useful for developing semantic segmentation algorithms.

    3. The experimental results are outstanding and outperform the SOTA.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. For the phase recognition, can the authors provide more detailed information about how many classes the videos have? And what is the output of the phase recognition? The current description is not clear.
    2. In the SHA block, why does a hard hyperbolic tangent function follow the conv layer, instead of using the conv layer to generate the offsets directly?
    3. The relationship between the phase recognition and the segmentation is unclear.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    None.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please clarify the points listed in 4.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the proposed method makes a technical contribution. However, some points need to be clarified.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper provides a new recurrent neural network which can be used to detect the implantation phase in cataract surgery videos. It provides also a convolutional neural network able to do semantic segmentation of the lens and pupil in eyes images/videos. Mainly, the novelty lies in the proposed modules called the cascade pooling fusion (CPF) module, the scale-adaptive feature fusion module (SSF), and the shape adaptive (SHA) block. Last but not least, it is the first time that deep learning is used for cataract surgery to detect lens dislocation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The strength points of this paper are clearly the CPF, SSF, and SHA modules, which help to perform good semantic segmentations of the pupil and of the lens, but the lack of references at critical locations in the paper makes it unclear which architectures are (or are not) contributions of this paper. Also, the numerous figures really help to understand how the complete networks are built.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The lack of references in Section 3 is very confusing: it is hard to understand what is (or is not) a contribution. Furthermore, I did not understand why the authors speak about a U-Net based segmentation network, since in the end they use a VGG16 plus two of their modules. Perhaps it is due to some skip connections, but I do not see from which module to which module they go. Furthermore, the different modules seem very promising, but the lack of technical details makes the network not reproducible. An ablation study of the network (component by component) seems essential to show the effectiveness of each of them in the complete architectures.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I am not sure that this paper can be reproduced, since there are some confusions in Section 3, even if the diagrams (Figures 1 and 2) seem very clear at first sight.

    Also, as said before, many technical details are missing or confusing: in Figure 1, second row, the green blocks seem to be sometimes features and sometimes softmax, but the legend says they are assumed to always be softmax. Also in Section 3, about the methodology, in the phrase “As the first step”, I do not understand which procedure this is the first step of. Furthermore, at the end of the paragraph “Lens & Pupil segmentation”, the authors speak about some decoding, but we do not know which convolution layer is used: is it a transposed convolution, or an upsampling followed by a convolution layer? Or something else?

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    General comment:

    I think the most important improvement of this paper would be to make clear which components (networks, blocks, modules) are new or not, to explain their functionalities separately, and also to make an ablation study showing how important each of these complicated components is in the final architectures.

    Besides, I would suggest that the authors make a diagram with the VGG16 and the two modules (represented by blocks) to make clear why it is a U-Net based architecture.

    Minor/detailed comments:

    • Thicker dotted lines would be welcome in Figure 1 so that the shared weights and the connections among the different blocks are easier to see.

    • first page: “3D” -> “2D+t” since the analyzed signals are videos,

    • page 2: the phrase “achieving accurate … segmentation” must be rephrased from (iii) to segmentation, it seems not to be well constructed.

    • page 2: “segmentation performance for” -> “segmentation performance of”?

    • page 2: “approaches in the mentioned” -> “approaches related to”?

    • page 2: “As for pruning to suboptimal results”: I do not get the meaning of this phrase, please correct it.

    • Figure 1: features and softmax seem to have the same green color, it is confusing.

    • Figure 1: second row: what is the dimension of the prediction? Is it a scalar? Please detail it.

    • Figure 1: why is average pooling used? I did not see the motivation or justification; I would suggest that the authors compare it to max pooling, which is well known to blur the signals less.

    • Figure 1: third row: what do the colors correspond to? I see gray, green, and orange colors, but I do not see a legend indicating that orange is CPF, green is SSF, and gray denotes simple features.

    • Page 3: the authors use the term “Unet” but the right term is “U-Net” for obvious reasons (the same thing applies later for the ResNet). Please correct it all over the paper.

    • Page 3: “Figure 1 demonstrates” -> I think that “Figure 1 shows” would be better.

    • Page 3+: Section 3 is globally not detailed enough, in my opinion, and for this reason I consider the paper not reproducible. For example, the post-processing is not detailed. The authors also speak about a convex polygon surrounding the pupil, but I do not see where the resolution of this polygon is given (number of nodes). Furthermore, the morphological operations the authors speak about are not detailed: are they closings? openings? dilations? erosions? And I do not see any reference that would give an insight into what has been done in this respect.

    • Page 4: In the “Phase recognition” paragraph, I think it would be nice to refer to the associated figure, so we can understand more easily that the coefficient of the DropOut is 0.5, and so on. Furthermore, the authors speak about the pretrained backbone but do not tell at this location which backbone is used.

    • Page 4, in the subsection related to “Lens & Pupil Segmentation”:

    • “It is usually being unfolded”, perhaps remove the term “being”?

    • “lens’s shape” -> “lens’ shape”

    • the acronym “CPF” has not been defined before the authors use it, the same thing applies for the acronym “SSF”.

    • AdaptNet should be in italics only once, when the authors define it, and not thereafter.

    • The VGG16’s reference is missing, and the same thing applies for ImageNet.

    • “this feature map is … decodes”: I do not understand whether the decoding is done with transposed convolutions or some other layer. Please be more specific.

    • “fed to” -> I think that “fed into” is more appropriate (many times in the paper).

    • “to the input features” must be placed before “applies a sequence”, otherwise it is very confusing.

    • “the generated feature maps … refinement”: I am sorry, but I do not understand this phrase at all; please rephrase it and be more explicit about the “refinement” procedure.

    • “shared weights”: these weights are shared between which layers? Are they shared between the decoding convolution layers? Please explain how and why.

    • what do the authors mean by “fine-grained” for features? Do they speak about fine resolution?

    • do the “upsampled semantic feature maps” come from the CPF module in the SSF module?

    • Are the scale-adaptive and shape-adaptive blocks new? If they are, the authors must be more explicit, in particular because they seem to be important contributions. However, I do not understand why they are so effective, and no ablation study justifies such complex architectures. I think this last point is crucial for this paper.

    • about the citation of paper [6], it would be nice to describe in some lines how this “Deform Conv 2D” block works so that the paper is self-contained.

    • “in case” -> “in the case”

    • Section 4:

    • “gaussian” -> “Gaussian”

    • “dice” -> “Dice”

    • About the BCE, please recall its formula to avoid ambiguities. Also, I do not know whether CE refers to binary or categorical cross-entropy. Please specify.

    • Section 5: what are the references for BiGRU and VGG19? I think it would be nice to recall at least these in this section.

    • Conclusion:

    • the term AdaptNet has already been defined before, citing it using italics is not necessary.

    • “Spatio-temporal” -> “spatio-temporal”

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Mainly, this paper is not clear enough to be reproducible (see above), but I am sure that with some improvements related to what I said before, it can become a very nice paper, on the condition that a complete ablation study and explanations are provided for the different blocks and modules.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #4

  • Please describe the contribution of the paper

    This paper addresses two main tasks: lens implantation phase recognition in videos with a recurrent neural network (RNN), and lens/pupil segmentation using a novel semantic segmentation network based on U-Net, named AdaptNet. Phase recognition is achieved with extremely high accuracy across a variety of backbones and RNN architectures (Table 1), while the proposed AdaptNet was shown to outperform existing methods on lens and pupil segmentation (Figure 3), with UNet++ coming close.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main innovation appears to be the formulation of the AdaptNet architecture, which was comprehensively compared against existing methods. The architecture of AdaptNet was described in detail in Section 3/Figure 2.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Notably, the main features of the architecture (CPF/SSF modules) appear adapted from prior work (cited as [6,7,11]). This is not in itself problematic, but the contributions of each of these new features might not have been individually explored/justified through ablative experiments.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is stated that the three datasets involved will be released upon acceptance. The proposed AdaptNet architecture is described in detail.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. While the ultimate objective is stated to be towards lens irregularity detection, it is not altogether evident as to how the presented tasks (phase detection, lens/pupil segmentation) contribute towards detecting lens irregularity. In particular, whether a lens is indeed irregular, and what the characteristics of such lens are, do not seem to be explored in the experiments. This might be clarified.

    2. It is not immediately clear as to whether the lens/pupil segmentation is evaluated for a single frame at the appropriate phase (i.e. is essentially equivalent to image segmentation), or whether multiple video frames are considered. In particular, lens instability and unfolding delay are stated to be estimated over time at the beginning of Section 3, but details on segmentation are sparse in Section 4, with most of the descriptions appearing to involve the phase recognition task. Details for the AdaptNet segmentation (e.g. number of frames/images for training/testing, parameter optimization for AdaptNet and other segmentation methods, etc.) might be further discussed.

    3. The procedure by which the ground truth was obtained for lens/pupil segmentation, might be described further. Were there any standards that were followed?

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    There is a lack of clarity in describing the segmentation task and experiments, and the individual contributions of the proposed AdaptNet model might have been dissected.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    6

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This is a very interesting paper. The project/product is helpful in clinical usage; however, many details are omitted due to the page limit and poor organization. Given four inconsistent reviews, you are accordingly invited to submit your rebuttal to address the major comments, especially to: 1) indicate your innovation components (networks, blocks, modules) and their functionalities separately, and explain how important each component is, preferably with sufficient ablation study results; 2) make a diagram with the VGG16 and the two modules (represented by blocks) to make clear why it is a U-Net based architecture; 3) explain how the ground truth for lens/pupil segmentation was obtained; 4) clarify how the presented tasks (phase detection, lens/pupil segmentation) contribute towards detecting lens irregularity; currently we can only find an example at the end of the experiments. Also, to me, the major study of this paper is lens segmentation; however, this part is not strong enough to support a MICCAI paper. The whole system/application is somewhat new, but the technical contribution is limited.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8




Author Feedback

We would like to thank all reviewers for their constructive feedback. The ablation study has been performed in response to the reviewers’ comments. We have listed the Dice and IoU percentages for two different learning rates, obtained by gradually adding the proposed modules and blocks (for lens segmentation). The results show that, regardless of the learning rate, each module and block has a positive impact on the segmentation performance. We cannot test the FFD block separately, since it is bound to the SSF module.

Learning rate | Metric   | U-Net (VGG16 backbone) | + SSF (without SHA) | + SHA block | + CPF (whole AdaptNet)
0.001         | Dice (%) | 89.94                  | 90.38               | 91.12       | 91.28
0.001         | IoU (%)  | 82.79                  | 83.54               | 84.76       | 85.03
0.002         | Dice (%) | 90.90                  | 91.22               | 92.17       | 92.62
0.002         | IoU (%)  | 84.33                  | 84.99               | 86.34       | 87.09

We can shorten the introduction and add the complete ablation experiments with explanations to the paper. Besides, we will release the PyTorch implementations of AdaptNet and phase-recognition network with the acceptance of the paper.

To Reviewer 1) We would appreciate it if you could check the output videos of the framework used for “lens unfolding time and instability” computation in the supplementary materials.

To Reviewer 2) For phase recognition, we have two classes: the implantation phase and the rest (binary classification). The hard tanh function is used to limit deformable sampling to a 3x3 window around each pixel position in the regular kernel.
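
For illustration only, a minimal sketch of this offset bounding, assuming PyTorch and torchvision's DeformConv2d; the layer names and channel handling are assumptions, not the authors' exact implementation:

    # Illustrative only: a hard tanh bounds the learned offsets to [-1, 1] pixels,
    # which keeps sampling inside a 3x3 window around each regular kernel position.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.ops import DeformConv2d

    class BoundedDeformBlock(nn.Module):
        def __init__(self, in_ch, out_ch, k=3):
            super().__init__()
            # Offset head: two offsets (dy, dx) per kernel position.
            self.offset_conv = nn.Conv2d(in_ch, 2 * k * k, kernel_size=k, padding=k // 2)
            self.deform = DeformConv2d(in_ch, out_ch, kernel_size=k, padding=k // 2)

        def forward(self, x):
            # Clip each offset to +/- 1 pixel before deformable sampling.
            offsets = F.hardtanh(self.offset_conv(x), -1.0, 1.0)
            return self.deform(x, offsets)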

To Reviewer 3 and Meta-Reviewer)

  • Regarding the new components, (I) We have introduced the first deep-learning-based framework towards irregularity detection in cataract surgery, (II) We have proposed a recurrent CNN architecture that can precisely detect the implantation phase, and (III) We have proposed a novel segmentation network.
  • Regarding novelties in AdaptNet: (a) Fusion of deformable and structured features (SHA block), (b) Fusion of successive deformable features (SSF module), (c) Pixel-wise attention map computation via shared-feature extraction (FFD block), (d) Sequential pooling and distinct inter-channel and intra-channel feature extraction (CPF module).
  • Regarding the diagram, we already have the block diagram of AdaptNet in Fig. 1 (bottom) with the blocks of VGG16 (shown as encoder blocks) and the two modules. There are four skip connections from the encoder blocks to the decoder modules (CPF and SSF), the same as in U-Net (simple black arrows); a minimal sketch of this layout is given after this list. We will revise Fig. 1 and add the suggested descriptions based on Reviewer 3’s comments to avoid ambiguities.
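
For illustration only, a high-level sketch of such a U-Net-style layout: a VGG16 encoder, four skip connections, and simple placeholder blocks standing in where AdaptNet uses its CPF and SSF modules. All names, channel counts, and the number of classes below are assumptions, not the released implementation:

    # Sketch of a U-Net-style network with a VGG16 encoder and four skip connections.
    # The "FuseBlock" is a placeholder for the CPF/SSF decoder modules.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision.models import vgg16

    class FuseBlock(nn.Module):
        """Upsamples the decoder map and fuses it with an encoder skip, as in U-Net."""
        def __init__(self, dec_ch, skip_ch, out_ch):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(dec_ch + skip_ch, out_ch, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
            )

        def forward(self, dec, skip):
            dec = F.interpolate(dec, size=skip.shape[-2:], mode="bilinear", align_corners=False)
            return self.conv(torch.cat([dec, skip], dim=1))

    class UNetLikeVGG16(nn.Module):
        def __init__(self, num_classes=3):  # assumed: background, lens, pupil
            super().__init__()
            f = vgg16().features  # load ImageNet weights here in practice
            # Five VGG16 encoder stages (64, 128, 256, 512, 512 channels).
            self.enc1, self.enc2 = f[:4], f[4:9]
            self.enc3, self.enc4, self.enc5 = f[9:16], f[16:23], f[23:30]
            self.dec4 = FuseBlock(512, 512, 512)
            self.dec3 = FuseBlock(512, 256, 256)
            self.dec2 = FuseBlock(256, 128, 128)
            self.dec1 = FuseBlock(128, 64, 64)
            self.head = nn.Conv2d(64, num_classes, kernel_size=1)

        def forward(self, x):
            s1 = self.enc1(x)           # full resolution
            s2 = self.enc2(s1)          # 1/2
            s3 = self.enc3(s2)          # 1/4
            s4 = self.enc4(s3)          # 1/8
            bottleneck = self.enc5(s4)  # 1/16
            d = self.dec4(bottleneck, s4)
            d = self.dec3(d, s3)
            d = self.dec2(d, s2)
            d = self.dec1(d, s1)
            return self.head(d)         # per-pixel class logits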

To Reviewer 3)

  • Regarding the 3D image, the surgeon can perceive the depth through the microscope.
  • Morphological operations: opening, closing, and convex-polygon fitting. For the convex polygons, we used the SciPy “ConvexHull” function; the number of nodes depends on the input shape and is not set by the user (see the post-processing sketch after this list).
  • Decoding is performed by upsampling (bilinear) as shown by dashed arrows in Fig. 1.
  • As shown in Fig. 1, bottom, the semantic feature-maps come from the CPF module for the first SSF module, and from the previous SSF modules for other SSF modules.
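
A small sketch of such a post-processing step, assuming SciPy's ndimage and ConvexHull plus Matplotlib's Path for rasterization; the structuring-element sizes and function layout are illustrative assumptions, not the authors' exact pipeline:

    # Illustrative post-processing of a binary pupil/lens mask:
    # morphological opening and closing, then a convex hull over the foreground.
    import numpy as np
    from scipy import ndimage
    from scipy.spatial import ConvexHull
    from matplotlib.path import Path

    def postprocess_mask(mask: np.ndarray) -> np.ndarray:
        # Opening removes small speckles; closing fills small holes.
        cleaned = ndimage.binary_opening(mask, structure=np.ones((5, 5)))
        cleaned = ndimage.binary_closing(cleaned, structure=np.ones((5, 5)))
        ys, xs = np.nonzero(cleaned)
        if len(xs) < 3:
            return cleaned
        # Convex hull of the foreground pixels; SciPy picks the hull vertices itself.
        points = np.stack([xs, ys], axis=1)
        hull = ConvexHull(points)
        polygon = Path(points[hull.vertices])
        # Rasterize the convex polygon back into a mask.
        yy, xx = np.mgrid[0:mask.shape[0], 0:mask.shape[1]]
        grid = np.stack([xx.ravel(), yy.ravel()], axis=1)
        return polygon.contains_points(grid).reshape(mask.shape)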

To Reviewer 5 and Meta-Reviewer)

  • Contributing towards detecting lens irregularity: The two main risk factors of lens irregularity after surgery are lens unfolding delay and lens instability. We compute these two parameters through the LensID framework. Automatic detection of these irregularities requires accurate phase recognition and lens/pupil segmentation, which is the focus of this paper. The remaining work is purely based on a statistical analysis of unfolding time and instability for different lenses from hundreds of surgeries and will be the subject of future work.
  • Lens/pupil segmentation (and subsequently the statistical analysis) starts exactly at the end of the implantation phase and stops at the end of surgery (at 25 fps).
  • Pixel-wise lens/pupil annotations (ground truth) are performed using the “Supervisely” platform based on guidelines from cataract surgeons.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors failed to respond to the issues summarized by the primary AC, especially the technical novelty, the ground-truth annotation, and the validation of the segmentation. The application is somewhat new, but maybe not the first [*], being one step beyond existing work on instrument tracking in cataract surgery videos. The major improvement is the segmentation part, which is not well disclosed in this paper. This paper focuses on describing a complex system, not a novel technical algorithm, which is not easy to state clearly in an 8-page short conference paper and is more suitable for a journal submission. In addition, adding new experimental results in the rebuttal is NOT allowed. Therefore, I recommend rejecting this paper.

    [*] Shoji Morita, et al. Real-Time Surgical Problem Detection and Instrument Tracking in Cataract Surgery, J Clin Med. 2020 Dec; 9(12): 3896.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    14



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a novel approach to eye lens irregularity detection in cataract surgery. This is an application domain I am not familiar with, but it seems to me that it explores a clinically relevant problem. Specifically, two different tasks are tackled: 1) lens implantation phase recognition in videos using a recurrent NN, and 2) lens/pupil segmentation using an adapted U-Net (AdaptNet). The results show improvements compared to other baseline strategies. Like Reviewer 3, I found the paper difficult to follow, with a lack of references and technical details. Some are now included in the rebuttal and should also be included if the paper is finally accepted. The authors could add more supplementary material if needed for those technical details. The authors have also included in the rebuttal a very interesting ablation study that explores the value of the added modules. I think that, given the novel method and correct evaluation, the paper is interesting enough to be accepted. The major drawback was overall the lack of clarity on many points raised by the meta-review, which I think are now clarified in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    12



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The proposed method addresses a highly clinical relevant problem which remains largely underexplored. The authors have sufficiently addressed all major comments in the rebuttal. These updates and justifications must be included in the camera ready.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5


