Authors

Heiko Maier, Shahrooz Faghihroohi, Nassir Navab

Abstract

OCT is a useful diagnostic tool for screening and monitoring retinal diseases, as it provides high-resolution images of the retinal layers. For fully automated semantic segmentation of OCT images, both graph-based models and deep neural networks have been used so far. Here, we propose to interpret the semantic segmentation of 2D OCT images as a sequence alignment task. Splitting the image into its constituent OCT scanning lines (A-Modes), we align an anatomically justified sequence of labels to these pixel sequences using dynamic time warping. Combining this dynamic programming approach with learned convolutional filters allows us to leverage the feature extraction capabilities of deep neural networks while enforcing explicit guarantees on the anatomical order of layers through the dynamic program. We investigate both training the feature extraction stage on its own and end-to-end learning of the alignment. The latter makes use of a recently proposed relaxed formulation of dynamic time warping that allows us to backpropagate through the dynamic program, enabling end-to-end training of the network. Complementing these approaches, we investigate a local consistency criterion for the alignment task that improves consistency in the alignment of neighbouring A-Modes. We compare this approach to two state-of-the-art methods, showing favourable results.
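
The alignment idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a precomputed cost matrix (in the paper this role is played by learned features), a simplified step pattern in which every pixel receives exactly one label, and a requirement that every label in the reference order is used at least once. The function name `align_labels_to_amode` and the cost layout are hypothetical.

```python
import numpy as np

def align_labels_to_amode(cost):
    """Align a fixed, ordered label sequence to the pixels of one A-Mode.

    cost[i, k] is the (assumed, precomputed) cost of giving pixel i, counted
    from the top of the A-Mode, the k-th label of the reference order.
    Returns one label index per pixel; the result is non-decreasing, starts
    at label 0 and ends at the last label.
    """
    n_pixels, n_labels = cost.shape
    if n_pixels < n_labels:
        raise ValueError("need at least one pixel per label")

    # acc[i, k]: minimal total cost of labelling pixels 0..i with pixel i at label k.
    acc = np.full((n_pixels, n_labels), np.inf)
    acc[0, 0] = cost[0, 0]
    for i in range(1, n_pixels):
        for k in range(n_labels):
            stay = acc[i - 1, k]                              # same label as the pixel above
            advance = acc[i - 1, k - 1] if k > 0 else np.inf  # move on to the next label
            acc[i, k] = cost[i, k] + min(stay, advance)

    # Backtrack from the last pixel, which must carry the last label.
    labels = np.empty(n_pixels, dtype=int)
    k = n_labels - 1
    for i in range(n_pixels - 1, -1, -1):
        labels[i] = k
        if i > 0 and k > 0 and acc[i - 1, k - 1] < acc[i - 1, k]:
            k -= 1
    return labels
```

Because the only allowed moves are "stay on the current label" and "advance to the next label", the returned labelling cannot violate the reference order, whatever the cost matrix looks like. A relaxed, differentiable variant would replace the `min` with a smooth minimum; that is not shown here.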

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_67

SharedIt: https://rdcu.be/cyhML

Link to the code repository

N/A

Link to the dataset(s)

http://people.duke.edu/~sf59/Chiu_BOE_2014_dataset.htm


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present an approach for segmenting retinal layers in OCT by integrating dynamic time warping (DTW) with deep learning. Two approaches for integrating the CNN were investigated: 1) using pretrained CNN features and 2) end-to-end CNN features. The evaluation was performed on a patient cohort of 10 subjects with diabetic macular edema, and showed improvements in the fluid class.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Overall, the paper is very well written, with clear descriptions of the existing literature, the two proposed methods, and the details of training and evaluation. The proposed work represents a nice integration of existing ideas for OCT segmentation with deep learning.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The primary weakness of the work is that the evaluation is somewhat lacking. The experimental results are missing several key details: 1) a baseline comparison against DTW without deep learning, 2) standard deviations for the Dice results, and 3) an evaluation of the topology of the final segmentations. The dataset used was also exceptionally small, with only 10 images in total and 2 subjects used for validation. Lastly, the proposed method does not, in general, seem to perform significantly better than the two baseline methods it is compared against. The only improvement in performance seems to be in the fluid class, but without knowing the variance of the performance, it is difficult to determine whether such differences are significant.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility is passable. An openly available dataset was used, but the authors do not appear willing to release their code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I have a few recommendations that might help improve the quality of this paper:

    -Including additional details for the evaluation as listed above (i.e. baseline DTW comparison, standard deviation of the results, analysis of the layer topology).

    -Increasing the number of subjects used for evaluation, and perhaps also demonstrating performance on a dataset acquired from a different scanner, to show that the model is not just overfitting the currently limited data.

    -In Figure 3 it is a bit unclear what the 2nd image is, and it is difficult to determine which structure each of the colors in the segmentations represents. Showing more than one example, or showing the difference in segmentation results between the methods being compared, would also be helpful.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The authors present a novel integration of deep learning into an existing approach for retinal layer segmentation. The paper is very well written, and the results suggest an improvement in the detection of the fluid class.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper
    • This paper uses dynamic time warping for the task of retinal OCT segmentation.
    • The proposed method is fully end-to-end trainable. In addition, the method is guaranteed to adhere to topological constraints.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of using dynamic time warping for OCT segmentation seems novel.
    • The investigation of related work is sufficient.
    • The authors also compare the results of DTW as post-processing with those of end-to-end training with DTW.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The experiments are insufficient. Only two papers are used for comparison, and only one of them was published in 2020, so the comparison with the latest methods is not sufficient.
    • In Table I, the best result should be shown in bold for comparison.
    • The paper should introduce DTW in detail rather than only giving a reference.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The description of the proposed method (DTW) is not clear.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Provide the details of DTW.
    • More comparative experiments are necessary.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The idea of using dynamic time warping for OCT segmentation seems novel.
    • The study of related work is sufficient.
    • The comparative experiments are insufficient.
  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Somewhat confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    There is consensus about the novelty of the proposed DTW-based segmentation method, but several questions about the experimental results were raised such as key missing details about the standard deviation of Dice coefficients, and a lack of evaluation of the topology of the final segmentation. The small sample size of the dataset also raises some concern about overfitting.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3




Author Feedback

Dear Chairs,

We would like to thank the reviewers for their constructive comments and the discussion. We are especially glad that both reviewers appreciate the novelty of our approach. Still, we would like to discuss a few points mentioned by them:

1) We will gladly provide standard deviations of our Dice scores (asked by R2). To summarize, over the 22 B-Modes in the test set, our 4 methods have standard deviations in their Dice scores that are on average, over all classes, between 0.065 and 0.076, with the largest contributions coming from the fluid class (between 0.322 and 0.332). The latter makes sense, as fluid is one of the smallest classes and is often concentrated in only one region of the B-Mode. If the network does not recognize this region as containing fluid, the Dice score for fluid in a single B-Mode can easily become 0, leading to a higher standard deviation than for the retinal layer classes. A full listing of all the standard deviations could be added to the paper if desirable.

2) R2 wishes to see a study of the topology of the final segmentations. On the specified test set, we counted the number of topological violations, defined as the number of times the method assigned a pixel to a layer (N-1) after it had already assigned at least one pixel to layer (N). Means and standard deviations of these violations over 4 training runs were: using only a CNN (no DTW at all): (27570 ± 2504); soft-DTW: (28 ± 17); pretr. only and soft-hard DTW (with and without cumulative dist. matrix): (0 ± 0). This clearly shows that using soft DTW strongly diminishes the violations that a simple CNN alone produces, and that using a “hard” DTW completely prevents violations by design. These numbers can be added to the paper for completeness.

3) R2&3 mention that we evaluate our results on only one dataset. We would like to point out that the given DUKE dataset for retinal layer and fluid segmentation, with its 110 OCT images (10 subjects x 11 images), is a comprehensive, public dataset that is popular in the field. For this reason, we think that evaluating a new approach on this dataset first is a reasonable choice.

4) R2 asked us to evaluate DTW without deep learning for the distance function. We did not add such a comparison because the only paper doing so [Source 6 in our paper] only does semi-automatic segmentation, and thus is not directly comparable to the fully automated approaches presented here. Furthermore, we do not see a straightforward way of using DTW for retinal OCT segmentation in a fully automated manner without a learned distance function. (Note that as DTW needs a distance function giving the “cost” of aligning two entries of two sequences, one would need to design such a distance function manually if one does not want to learn it. This manual design can be done rather easily when aligning two sequences of identical type, like two A-Modes, e.g. using a squared Euclidean distance as in [6]. It would be much more challenging to define it between two sequences of different kinds, e.g. an A-Mode and a symbolic series like our topological reference series, which is why we expect a data-driven approach to outperform any non-deep-learning approach here.)

5) R3 mentioned that we only compare to two baseline methods, of which only one is from 2020. We agree that we did not perform an exhaustive comparison of state-of-the-art methods. Our aim was mostly to present our novel idea to the community and to validate the applicability of DTW, in combination with deep learning, to OCT. For this, these two baselines were useful because the paper that provided them had evaluated their experiments using the same train/validation/test split that we had worked on, and because they also do joint segmentation of layers and fluid, unlike many other approaches that only segment layers. If the chairs would like, we could add more methods or recent approaches to the results section to better pin down the performance of our methods.
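
The two evaluation quantities discussed in points 1) and 2) of the rebuttal can be sketched as follows. This is an illustration under stated assumptions, not the authors' evaluation code: label maps are assumed to be integer arrays with depth along axis 0 (one A-Mode per column), class indices are assumed to increase from top to bottom in the anatomical order, and a violation is interpreted as any pixel whose label is lower than a label already seen above it in the same A-Mode. Where the fluid class sits in that ordering is not specified in the rebuttal, so the sketch simply treats every class index as part of the order. All function names are hypothetical.

```python
import numpy as np

def count_topology_violations(label_map):
    """Count pixels labelled with a lower class index than one seen above them."""
    label_map = np.asarray(label_map)
    # Highest class index encountered so far when walking down each A-Mode.
    running_max = np.maximum.accumulate(label_map, axis=0)
    return int(np.sum(label_map < running_max))

def dice_score(pred, target, cls):
    """Dice score of one class in one B-Mode; NaN if the class is absent from both."""
    p = np.asarray(pred) == cls
    t = np.asarray(target) == cls
    denom = p.sum() + t.sum()
    if denom == 0:
        return float("nan")
    return 2.0 * np.logical_and(p, t).sum() / denom

def dice_mean_std(preds, targets, cls):
    """Mean and standard deviation of per-B-Mode Dice scores, ignoring NaN entries."""
    scores = [dice_score(p, t, cls) for p, t in zip(preds, targets)]
    return float(np.nanmean(scores)), float(np.nanstd(scores))
```

The per-B-Mode Dice makes the argument in point 1) concrete: if a small fluid region is missed entirely in one B-Mode, that B-Mode contributes a score of 0 for the fluid class, which inflates the standard deviation across the test set even when the other B-Modes are segmented well.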




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal adequately addressed questions from the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have responded to the reviewers’ concerns. The authors propose to add the standard deviations of the Dice coefficients to the paper. Their response on the evaluation methodology is satisfactory, and the paper presents novel and interesting ideas.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviews are positive and consistent, and they recognize the novelty of this work. The authors’ response adds some essential details of the validation, which addresses the concerns summarised from the first round of review well. In summary, I agree to accept this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2


