
Authors

Yinglin Zhang, Risa Higashita, Huazhu Fu, Yanwu Xu, Yang Zhang, Haofeng Liu, Jian Zhang, Jiang Liu

Abstract

Corneal endothelial cell segmentation plays a vital role in quantifying clinical indicators such as cell density, coefficient of variation, and hexagonality. However, the corneal endothelium’s uneven reflection and the subject’s tremor and movement blur the cell edges in the image, which makes them difficult to segment; more detail and context information are needed to alleviate this problem. Due to the limited receptive field of local convolution and continuous downsampling, existing deep learning segmentation methods cannot make full use of global context and miss many details. This paper proposes a Multi-Branch hybrid Transformer Network (MBT-Net) based on the transformer and a body-edge branch. First, we use convolutional blocks to focus on local texture feature extraction and establish long-range dependencies over space, channel, and layer with the transformer and residual connections. In addition, we use the body-edge branch to promote local consistency and to provide edge position information. On the self-collected dataset TM-EM3000 and the public Alisarine dataset, the proposed method achieves an improvement over other state-of-the-art (SOTA) methods.
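The three clinical indicators named in the abstract can be illustrated with a minimal sketch. It assumes that a segmentation has already been converted into per-cell areas and per-cell neighbor counts; the function name, argument names, and the use of the population standard deviation are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, pstdev


def endothelial_indicators(cell_areas_um2, neighbor_counts, region_area_mm2):
    """Return (cell density, coefficient of variation, hexagonality).

    cell_areas_um2  -- area of each segmented cell (hypothetical input)
    neighbor_counts -- number of neighboring cells for each cell
    region_area_mm2 -- area of the analysed image region
    """
    n = len(cell_areas_um2)
    # Cell density: number of cells per square millimetre.
    density = n / region_area_mm2
    # Coefficient of variation: spread of cell area relative to its mean.
    cv = pstdev(cell_areas_um2) / mean(cell_areas_um2)
    # Hexagonality: fraction of cells with exactly six neighbors.
    hexagonality = sum(1 for k in neighbor_counts if k == 6) / n
    return density, cv, hexagonality
```

All three values depend on closed, correctly placed cell boundaries, which is why blurred or broken edges in the segmentation directly affect the indicators.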

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_10

SharedIt: https://rdcu.be/cyhLC

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper presents an extension of the standard UNet for medical image segmentation. It consists of two core branches: the first is a hybrid convolution-transformer branch, which is used to extract short- and long-range information; the second is the body-edge branch, which works as an auxiliary task and predicts the cell boundaries.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The authors explore the transformer architecture in the field of medical segmentation.
2. The paper is well written and well organized.
3. The approach shows improvements on two datasets compared with conventional methods such as UNet and UNet++.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Academic novelty is slightly lacking. First, the transformer branch simply inserts the residual transformer block (transformer encoder) from [14] into the central module of UNet and makes no design choices specific to cell segmentation. Second, it is already common to add an auxiliary head that predicts object boundaries and to supplement the main segmentation features with the corresponding features.
2. Beyond performance, for a new network architecture the authors did not report comparative metrics such as parameter counts and FLOPs against the other top methods.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Very clear description, can be reproduced
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. The proposed transformer branch has a design similar to that of: Chen, J., Lu, Y., Yu, Q., Luo, X., Adeli, E., Wang, Y., … & Zhou, Y. (2021). Transunet: Transformers make strong encoders for medical image segmentation. arXiv preprint arXiv:2102.04306. In Sec. 3.2 the authors compare against it and claim: “Long-range dependencies are established through the transformer in both the encoder and decoder”. However, as shown in Figure 2, the authors insert the residual block into the center of the encoder-decoder network, which does not differ from TransUNet.
2. The proposed body-edge prediction head has been widely explored. The authors discuss some of these papers but do not compare with them. It would be better to compare with the highly relevant methods.
3. The authors should provide comparison with SOTA architectures in terms of parameters or FLOPs.
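The kind of comparison requested in point 3 can be approximated analytically. The following is a minimal sketch that counts parameters and multiply-accumulate operations (MACs) for one convolutional layer and one self-attention layer. The function names are hypothetical, the attention count ignores softmax, normalization, and the feed-forward sublayer, and none of the numbers come from the paper.

```python
def conv2d_cost(c_in, c_out, kernel, h_out, w_out, bias=True):
    """Parameters and MACs of a standard 2D convolution (square kernel)."""
    weights = kernel * kernel * c_in * c_out
    params = weights + (c_out if bias else 0)
    # Each output position applies every weight once.
    macs = weights * h_out * w_out
    return params, macs


def self_attention_cost(tokens, dim):
    """Parameters and MACs of one self-attention layer (projections + attention)."""
    # Query, key, value, and output projections: four dim x dim matrices with biases.
    params = 4 * (dim * dim + dim)
    # Projections cost 4 * tokens * dim^2; the score matrix and the weighted
    # sum each cost tokens^2 * dim, so attention grows quadratically in tokens.
    macs = 4 * tokens * dim * dim + 2 * tokens * tokens * dim
    return params, macs
```

For example, `conv2d_cost(3, 64, 3, 224, 224)` returns `(1792, 86704128)`. The quadratic term in `self_attention_cost` shows why the spatial resolution at which attention is applied matters for the computational cost.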
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty of idea
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

This paper proposes a U-Net-shaped architecture for the segmentation of corneal endothelial cells that combines a CNN (for local feature extraction) and a transformer (for global information aggregation). The framework also outputs body and edge predictions and merges them to obtain the final prediction. The proposed network achieves better performance on a self-collected dataset and a public dataset compared with other SOTA methods.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. A specially designed network structure with Transformer, decoupled edge/body
2. Comprehensive experimental results and ablation study
3. Superior performance over other methods
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The details of the network structure are missing, such as conv-e1, conve02, etc. Are any upsampling or downsampling operations included? A detailed figure should be added to the supplementary material to make this clear.
2. While the proposed architecture is interesting, it may not benefit the specific task, i.e., cell segmentation. A cell is comparatively small in an input image, so intuitively the global information aggregated by the transformer may not be helpful for a local cell. If the authors can convince me otherwise, I can raise my rating.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Satisfactory
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. Why is Conv-e1 fused into the edge feature? Is there any ablation study on this? Also, why not fuse it into the body feature?
2. Are there any experiments on replacing the Transformer layers with other modules that capture long-range dependencies, such as GCN or non-local blocks?
3. Are there any ablation studies on the kernel size of the Canny operator and the Gaussian blurring? How do they affect the performance?
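The body/edge decomposition that these questions refer to can be illustrated with a minimal sketch that splits a binary mask into an edge map and a body map. It deliberately substitutes a simple 4-neighbor boundary test for the Canny operator and omits Gaussian blurring, so it is not the authors' label-generation procedure; the function name and the list-of-lists mask format are assumptions.

```python
def split_body_edge(mask):
    """Split a binary mask (list of rows of 0/1) into (body, edge) masks.

    A foreground pixel is an edge pixel if any 4-connected neighbor inside
    the image is background; all other foreground pixels are body pixels.
    """
    h, w = len(mask), len(mask[0])
    body = [[0] * w for _ in range(h)]
    edge = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            on_boundary = False
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and not mask[ny][nx]:
                    on_boundary = True
            if on_boundary:
                edge[y][x] = 1
            else:
                body[y][x] = 1
    return body, edge
```

With this rule the edge is always one pixel wide. An edge detector with a tunable kernel followed by blurring would produce wider, softer edge labels, which is the design choice the ablation question asks about.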
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The experiments part.

I will raise the score if the author can address my concerns.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

The paper describes a network to segment images of the corneal endothelium. From the segmentation, several clinically relevant biomarkers can be extracted (although this paper does not evaluate that point). This paper proposes a network that combines U-net, transformers, and a multi-branch approach (body-edge predictions). The network is tested on two datasets, and the proposed network is compared against several other networks. An ablation study is also performed to evaluate some small tuning decisions in the network. Results are convincing and the network seems to be promising.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Well written. Good introduction (see point 7 for improvements). Evaluated in two very different datasets (one is quite simple, but the other is very challenging). Experiments are well described and performed. I find very interesting the comparison with the other 5 basic networks (section 3.2) and the ablation study (which basically clarifies that the inclusion of a transformer is beneficial – not much, but it certainly is). An overall good paper, that deals with an important problem.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
There are no major weaknesses. I am quite pleased with their work. However, I find some points for improvements (the authors already filled the 8 pages, so this is more like a suggestion):
- The authors claim that the use of transformers helps to capture the global context. This is very interesting for this type of cell segmentation (some cells might be very large, where “global features” are more critical). Table 2 seems to suggest that it is true. However, I would have preferred a visual comparison in which the use of transformers clearly shows a benefit (similar to Fig. 5, but comparing a network with and without transformers and pointing out some cells that were segmented better when transformers are used).
- While the quantitative results are good, the qualitative ones (Figure 2) are not so much. Yes, the proposed network is better than the ones used for comparison, but there are many small mistakes in Figure 2-e that would affect (greatly) the estimation of the corneal parameters. I am specifically referring to edges that are not completely connected, or small isolated edge-pixels. How do the authors plan to solve that in the next step to estimate the corneal parameters? Certainly, the paper only covers up to the accuracy of the segmentation, but that is just one part of the problem (if one looks at the current state of the art, it is actually the simplest part of the problem).
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It seems feasible to reproduce the network. Many details have been given (code would certainly be preferred).
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
These are small mistakes and suggestions:
- Abstract, line 2: coefficient “of” variation
- Abstract, line 11: “we” instead of “We”
- Page 2, first line: correct the accent in Fabijańska.
- Table 1. In TM-EM3000, the best SE is UNet++ and not TransUNet
- No need to repeat the scores in the text once they are given in the tables. Just discuss the improvement and/or point out the cases that are worth observing.
- In Reference 12, the journal name is in all capital letters. Please fix.
- Why do you use a batch size of 1? It is usually better to have a larger batch size. You could have gotten better results by just using 4-8 images in the batch (as many as your GPU memory allows).
- Did you train using both datasets together or independently? I think you did it independently. However, it would have been nice to check whether a network trained with both types (all together) would have learnt better features and, thus, performed better (probably not, but it is worth checking).
- (As a suggestion) I miss the references to the newest methods used for corneal cell segmentation. Your most recent reference is Fabijanska (2018) and Al-Fahdawi (2018), and you make a reference to Ruggeri paper (2010; because you use their dataset, although their most recent paper is from 2016), but there are 2-3 other groups that have worked on this field and it would be nice to make a brief reference in the Introduction (just a reference, mentioning that all use U-nets or similar deep learning methods, which are then comparable to your method; no need for more than that). These are the most recently published papers (all post 2018):
- Vigueras-Guillén JP, van Rooij J, Engel A, Lemij HG, van Vliet LJ, Vermeer KA. Deep learning for assessing the corneal endothelium from specular microscopy images up to one year after ultrathin-DSAEK surgery. Translational Vision Science & Technology. 2020; 9 (2), 49-49.
- Daniel MC, Atzrodt L, Bucher F, Wacker K, Böhringer S, Reinhard T, Böhringer D. Automated segmentation of the corneal endothelium in a large set of ‘real-world’ specular microscopy images using the U-net architecture. Nature Scientific Reports. 2019; 9(1):4752.
- Vigueras-Guillén JP, Sari B, Goes SF, Lemij HG, van Rooij J, Vermeer KA, van Vliet LJ. Fully convolutional architecture vs sliding-window CNN for corneal endothelium cell segmentation. BMC Biomedical Engineering. 2019; 1:4.
- Nurzynska K. Deep learning as a tool for automatic segmentation of corneal endothelium images. Symmetry. 2018; 10(3):60.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Well-written paper, with a novel network that combines several concepts. Experiments were correctly done and reported. It has good performance, and it can be used in other types of segmentation problems. It is a good conference paper.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All reviewers acknowledge that the paper is well organized and well presented. The experiments are well described and performed. Two reviewers were concerned that the proposed method was not optimized for cell segmentation. Please address this concern and all other comments to further improve the quality of the manuscript. Overall, the paper can be interesting for MICCAI.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

To reviewer1:

We use the transformer to capture long-range dependencies in encoder3, encoder4, decoder3, and decoder4, whereas TransUNet uses ViT’s Transformer layers only in the encoder.

A lightweight design and a comparison of computational cost will be included in our future work.

To reviewer2:

Since corneal endothelial cells present a highly repetitive pattern, capturing long-range dependencies helps to infer the morphology of local cells when interference factors such as blurred edges are present.

To reviewer3:

Thank you very much for the comments on the small writing mistakes and for the recommended related work.

We will consider the suggestions: in future work we will add visual results with and without the transformer and explore in depth which cells’ segmentation the transformer affects most.


A Multi-Branch Hybrid Transformer Network for Corneal Endothelial Cell Segmentation