
Authors

Yifan Yang, Huihui Fang, Qing Du, Fei Li, Xiulan Zhang, Mingkui Tan, Yanwu Xu

Abstract

We address the problem of Peripheral Anterior Synechiae (PAS) recognition, which aids clinicians in better understanding the progression of the type of irreversible angle-closure glaucoma. Clinical identification of PAS requires indentation gonioscopy, which is patient-contacting and time-consuming. Thus, we aim to design an automatic deep-learning-based method for PAS recognition based on non-contacting anterior segment optical coherence tomography (AS-OCT). However, modeling structural differences between tissues, which is the key for clinical PAS recognition, is especially challenging for deep learning methods. Moreover, the class imbalance issue and the tiny region of interest (ROI) hinder the learning process. To address these issues, we propose a novel Focal Contrastive Network (FC-Net), which contains a Focal Contrastive Module (FCM) and a Focal Contrastive (FC) loss to model the structural differences of tissues, and facilitate the learning of hard samples and minor class. Meanwhile, to weaken the impact of irrelevant structure, we introduce a zoom-in head to localize the tiny ROI. Extensive experiments on two AS-OCT datasets show that our proposed FC-Net yields 2.3% - 8% gains on the PAS recognition performance regarding AUC, compared with the baseline models using different backbones. The code is available at https://github.com/YifYang993/FC-Net

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_3

SharedIt: https://rdcu.be/cyl9z

Link to the code repository

https://github.com/YifYang993/FC-Net

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The paper proposes a Focal Contrastive Network for recognition of PAS, a feature of irreversible angle-closure glaucoma, from AS-OCT data. A localisation module called a zoom-in head first localises the RoI. Structural difference features of an eye under different lighting conditions (bright and dark) are extracted both globally (whole image) and locally (RoI) and classified into Normal and PAS classes. The critical contribution is the focal contrastive loss that combines focal loss and contrastive loss to measure the difference features of the paired AS-OCT sequence and handle class imbalance.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The focal contrastive loss is interesting, and seems to be a straightforward combination of focal and contrastive loss. It is well explained. Modelling of the problem, use of dataset, and the clinical application itself is good, and evaluation using metrics and ablation tests is at state of the art.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper is marred by poor English, incomplete sentences and spelling errors. Sometimes the order of presentation is not ideal. On page 2, what are AS-I and AS-II? These are undefined until Section 3. The zoom-in head seems to be a localization method, so why not identify it by that term? AS-OCT image pairs should be described briefly. In the experiments, was the training and test set split at the patient level? Also, show the number of patients in Table 1. On metrics, for class-imbalanced datasets, PR-AUC should also be reported. How are the backbone networks selected, and what is their common feature? In Table 2, significance tests should be reported, as it is difficult to evaluate improvement without them.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It is likely to be reproducible. More information on training/test split is probably required, see comment earlier on patient level split.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

See under 4
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The application and dataset are sufficiently novel, and the localisation method and Focal Contrastive Network are interesting. Experiments and reported results show improvements over other networks.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

2
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

The authors propose an automatic peripheral anterior synechiae (PAS) detection method based on anterior segment optical coherence tomography (AS-OCT). They incorporate the structural difference of tissues in the pair of dark/bright AS-OCT images into PAS classification. They also tackle the class imbalance problem with the focal contrastive loss and a decay learning strategy.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The main strength of the paper is the use of clinical prior knowledge such as structural difference of tissues in dark/bright AS-OCT into PAS classification. The paper is clinically well motivated and shows clinical feasibility of a novel dark/bright AS-OCT. The validation on real clinical data and the corresponding ablation study is convincing. Also, the proposed architecture was generalized to multiple backbone networks and different parameters for robustness check.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The main weakness of the paper is the lack of details on zoom-in head model (ZIH) and its performance. According to the ablation study, ZIH model has a significant impact on the classification performance but ZIH model itself is not validated and well discussed. If ZIH model misses the ROI, the error will be propagated to the following classification. In addition, the misalignment between dark/bright AS-OCT should be considered. In addition, the proposed focal contrastive module is based on simple fully connected layers and convergence behavior and computational time during training are not discussed.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors provide the details of the hyper-parameter and perform the ablation and sensitivity analysis. They will also provide the code and data if the paper is accepted.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The authors need to provide the detailed explanation about the zoom-in head model and validate the performance of ROI detection. Instead of using [10] based on HOG, the deep neural network for ROI detection can be developed. The misalignment between dark/bright AS-OCT needs to be addressed. In addition, focal contrastive module can be improved by a skip connection or adversarial learning and the convergence behavior and computational time can be reported.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well written and easy to read. Even though I have a few concerns about the novelty on ZIH and focal contrastive module, the paper highlights the importance of incorporating clinical domain knowledge into PAS detection. The paper is well validated on real clinical data with the sensitivity analysis and ablation study. All mathematical equations are well defined. The proposed focal contrastive loss and decay learning strategy are effective in dealing with class imbalance problem. The paper is reproducible with open-source/data and hyperparameter information.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

I would say that this paper presents a learning strategy that appears to be useful for classifying OCT sequences in terms of Peripheral Anterior Synechiae (PAS) presence. Under the (strong) assumption that there are two OCT sequences of the same eye available (one captured in bright and another one in dark illumination), such strategy consists of looking for differences in both sequences with a contrastive module loss - no structural difference could mean presence of PAS. There is also a zooming module that spatially crops the sequence, so we end up with a two-stream architecture that process the data both globally and locally before classifying each pair as containing PAS or not.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The inspiration for the presented approach is backed by clinical knowledge: if the iris is not separated from the cornea when the eye is illuminated, we are in trouble. Being a general strategy, it can be employed on top of existing architectures for classifying OCT volumes, which one can consider as backbones, as a plug-in improvement. Experiments with several backbones are provided, and the method seems to bring consistent improvements on all of them, for two different (although similar) datasets.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Technically, the main weakness I can see is the lack of detail on a critical component of the proposed system, namely the zoom-in head. It is only briefly mentioned that the authors use such a mechanism, but there are no explanations on how it is implemented. The reader is referred to [10] for details, but [10] does not seem to have anything to do with zoom-in heads that learn to crop end-to-end in CNNs, being a 2012 paper on HOG for glaucoma classification.

Another big source of confusion for me while reading the paper was the analogy shown in Fig 3 with the clock. At some point the authors mention that they “aim to classify the 12 clock regions of an eye into normal or PAS”, but afterwards it is not clear to me how this is handled in training. From Fig. 2 it would seem that the pair of rectangular OCT sequences is provided as input to the network, but from the text it seems that each “hour” in the clock represents a training sample that could contain PAS or not. What I don’t understand is, is the data labeled at the “hour” level, or at the sequence pair level?

Last, I understand that the need to have a pair of sequences, one captured under bright and another under dark conditions, is a limitation of this approach, which would be limited to scenarios in which we have this particular kind of data. Maybe this would need to be discussed in the context of the clinical significance of the paper.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
I believe there are some serious inconsistencies between what the authors claim in their checklist and what one can find in the paper. Specifically:
- A link to a downloadable version of the new dataset (if public). The authors answer “Yes”, but I could not find a link to their datasets anywhere in the paper.
- Code: The authors say that all code and pretrained models are available, but there is no indication of such availability in the paper, I think.
- The range of hyper-parameters considered, method to select the best hyper-parameter configuration, and specification of all hyper-parameters used to generate results. The authors answer “Yes”, but I could not see any mention to hyper-parameter search nor anything similar.
- An analysis of statistical significance of reported differences in performance between methods. The authors say “Yes”, but no statistical significance tests are reported anywhere.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- The zoom-in head seems to be a critical component of the method, but there is a lack of explanation on how it works. What is the size of the cropped region? Is it fixed or variable? What is the architecture of this head? And its output, is it a set of 8 coordinates that specify where to crop the entire volume/sequence? How was this initialized?
- The reader needs to understand what is the role of the “hour” regions in Fig 3a. Is the problem here classifying each slice of the clock, classifying each pair as PAS or not PAS, or classifying each of the sequences? How was the data really labeled, per-slice, per-volume, or per-pair?
- In the experiments, why was the FC module not used for the SMA-Net backbone? Also, why was the -C (=simple concatenation) experiment reported only for the NL I3D architecture? Could the authors complete this table reporting also all these missing values?
- If the datasets and the code are to be made publicly available (as promised in the reproducibility checklist), I think the authors should advertise that somewhere in the paper, preferably in the abstract.
- Is the requirement to have a pair of bright/dark sequences a serious obstacle for clinical implementation of this approach, or is the acquisition of such pairs a routine practice and this data is readily available? It would be worth adding a couple of lines discussing limitations for clinical applicability of the method, if any.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Most parts of the paper seem meaningful, with a reasonable clinical inspiration behind the architecture design. However, there is a lack of detail in some components of the proposed approach that prevents me from understanding precisely how this works. I don’t think this paper could be reproduced by anyone without more explanations, or code sharing (preferably both).
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper focuses on a well-motivated, focused clinical application. They show exhaustive experimental validation and also compare their method with latest deep neural network architectures and show improvement in performance.

The application topic is niche, but it helps that the authors proposed a clinically driven algorithm.

The writing could be improved in parts and the feedback of the reviewers should be carefully considered.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

Response Letter
We thank all the reviewers for their constructive comments and suggestions. We are pleased to see that all the reviewers recognize the clinical motivation of our paper as well as the state-of-the-art performance on two AS-OCT datasets. In this response letter, we reply to the main comments raised by the three reviewers. We will carefully address the other detailed comments in the manuscript. In addition, our code will be publicly available after the article is published.

1. The scope of the application of the algorithm: In this paper, we aim to recognize peripheral anterior synechiae (PAS), which is a glaucoma-related disease. The dimension of the input data (i.e., an AS-OCT sequence) is the same as that of a video or other 3D data. Distinguishing the differences between the pair of AS-OCT sequences is the main idea of our proposed method. Therefore, it has the potential to be extended to tasks that distinguish differences between paired videos or other 3D data.

2. Explanation for the zoom-in head module: The zoom-in head module is designed to capture the region of interest (ROI) near the junction of the iris and cornea, which is fed to the neural network as local information for analysis. In our paper, the horizontal centerline of the rectangular ROI is the horizontal line at the junction of the iris and cornea, which is close to the horizontal line with the maximum sum of pixel values on the AS-OCT slice. Considering a gray image X of size (h, w), the zoom-in head module first traverses X from top to bottom (i.e., [0, 0:w] -> [h, 0:w]), then searches for the horizontal line ([i, 0:w]) with the largest sum of pixels (which is achieved with the following PyTorch code: i=torch.max(torch.sum(X, 1),0).indices), and finally sets the horizontal line i as the centerline of the ROI. With this centerline, we can crop the AS-OCT slice at a fixed size to obtain the final ROI.
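The centerline search described in this response can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the function names find_centerline and crop_roi, the roi_height parameter, the choice to crop full-width rows only, and the clamping of the crop window at the image borders are all assumptions made for illustration, since the response only states that the slice is cropped "at a fixed size" around the centerline.

```python
def find_centerline(image):
    """Return the index of the row with the largest pixel sum.

    `image` is a gray image given as a list of h rows, each a list of
    w pixel values. This mirrors the row-sum argmax described in the
    response (hypothetical helper, not the authors' code).
    """
    row_sums = [sum(row) for row in image]
    # On ties, max() keeps the first (topmost) row.
    return max(range(len(row_sums)), key=row_sums.__getitem__)


def crop_roi(image, roi_height):
    """Crop a fixed-height band centered on the brightest row.

    Assumption: the band is shifted to stay inside the image when the
    centerline is close to the top or bottom border.
    """
    h = len(image)
    center = find_centerline(image)
    top = center - roi_height // 2
    top = max(0, min(top, h - roi_height))
    return image[top:top + roi_height]


if __name__ == "__main__":
    slice_ = [
        [0, 0, 0],
        [1, 1, 1],
        [5, 5, 5],  # brightest row, taken as the centerline
        [1, 0, 0],
        [0, 0, 0],
    ]
    print(find_centerline(slice_))
    print(crop_roi(slice_, 3))
```

Because the search is a deterministic argmax over row sums, it always returns some row, which is consistent with the statement that an ROI is found in every slice; whether that row actually coincides with the iris-cornea junction is a separate question that this sketch does not address.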
Since the zoom-in head module locates the ROI by directly analyzing image pixel values, an ROI is found in every AS-OCT slice, with no missed cases.

3. Explanation for the paired data: In our paper, we divided an eye into 12 clock hours, as shown in Fig. 3(a1-a2). The goal of our proposed FC-Net was to determine whether there were peripheral anterior synechiae (PAS) in every clock hour region. The input data corresponding to the clock hour region (h_E) is a pair of AS-OCT sequences ({x_d, x_b}), which are collected under dark and bright conditions respectively, as shown by the two cubes in the orange dashed rectangle of Fig. 2. To be more specific, the dark sequence x_d (upper cube) and the bright sequence x_b (lower cube) represent the AS-OCT sequences of the same clock hour h_E from an eye E collected in the dark room (Fig. 3(a1)) and bright room (Fig. 3(a2)), respectively. The label of this pair of AS-OCT sequences ({x_d, x_b}) is label(h_E), which represents whether this clock hour region is normal or PAS. Note that the labels for the 12 clock regions of the same eye are independent.
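The labeling scheme in this response (one training sample per clock hour, each consisting of a dark/bright sequence pair and its own label) can be sketched as a small data structure. Everything named here is hypothetical: ClockHourSample, build_eye_samples, the eye_id field, and the 0 = normal / 1 = PAS encoding are illustrative assumptions, and the sequences are left as opaque objects because the response does not specify their storage format.

```python
from dataclasses import dataclass
from typing import Any, List, Sequence


@dataclass
class ClockHourSample:
    """One sample: a clock hour h_E of an eye E (hypothetical layout)."""
    eye_id: str
    clock_hour: int    # 1..12
    dark_seq: Any      # x_d, AS-OCT sequence captured in the dark room
    bright_seq: Any    # x_b, AS-OCT sequence captured in the bright room
    label: int         # label(h_E); assumed encoding: 0 = normal, 1 = PAS


def build_eye_samples(
    eye_id: str,
    dark_seqs: Sequence[Any],
    bright_seqs: Sequence[Any],
    labels: Sequence[int],
) -> List[ClockHourSample]:
    """Pair the dark and bright sequences of each clock hour of one eye.

    Each of the 12 clock hours keeps its own label, so one eye yields
    12 independently labeled samples.
    """
    if not (len(dark_seqs) == len(bright_seqs) == len(labels) == 12):
        raise ValueError("expected exactly 12 clock hours per eye")
    return [
        ClockHourSample(eye_id, hour + 1, dark_seqs[hour],
                        bright_seqs[hour], labels[hour])
        for hour in range(12)
    ]


if __name__ == "__main__":
    dark = [f"dark_{h}" for h in range(1, 13)]      # placeholder sequences
    bright = [f"bright_{h}" for h in range(1, 13)]  # placeholder sequences
    labels = [0] * 11 + [1]                         # only hour 12 has PAS
    samples = build_eye_samples("E", dark, bright, labels)
    print(len(samples), samples[11])
```

Under this reading, the unit of labeling is the sequence pair of a single clock hour, neither the individual slice nor the whole eye.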

Distinguishing Differences Matters: Focal Contrastive Network for Peripheral Anterior Synechiae Recognition