
Authors

Mengjun Cheng, Zishang Kong, Guoli Song, Yonghong Tian, Yongsheng Liang, Jie Chen

Abstract

Gastrointestinal polyps are the main cause of colorectal cancer.
Given the variation of polyps in size, color, and texture, and the poor optical conditions of endoscopy, polyp segmentation remains a challenging problem. In this paper, we propose a Learnable Oriented-Derivative Network (LOD-Net) to refine the accuracy of boundary predictions for polyp segmentation. Specifically, it first calculates eight oriented derivatives at each pixel of a polyp. It then selects the pixels with large oriented-derivative values to constitute a candidate border region of the polyp. Finally, it refines the boundary prediction by fusing the border-region features with the high-level semantic features computed by a backbone network. Extensive experiments and ablation studies show that the proposed LOD-Net outperforms state-of-the-art methods by a significant margin on the publicly available datasets CVC-ClinicDB, CVC-ColonDB, Kvasir, ETIS, and EndoScene. For example, on Kvasir we achieve an mIoU of 88.5% vs. 82.9% by PraNet, and on ETIS we achieve an mIoU of 88.4% vs. 72.7% by PraNet. The code is available at https://github.com/midsdsy/LOD-Net.
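The three steps in the abstract can be illustrated with a hand-crafted stand-in. The sketch below is an assumption-laden illustration, not LOD-Net itself: it computes fixed finite differences along eight compass directions on a single-channel map (each divided by the step length, 1 or √2), whereas LOD-Net learns its oriented derivatives on backbone feature maps. A fraction of the maximum response serves as a hypothetical selection threshold; `oriented_derivatives`, `candidate_border`, and `ratio` are illustrative names.

```python
import numpy as np

# Eight unit steps (dy, dx): N, NE, E, SE, S, SW, W, NW.
OFFSETS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def oriented_derivatives(feat: np.ndarray) -> np.ndarray:
    """Fixed finite differences along eight directions, shape (8, H, W).

    Each difference is divided by the step length (1 or sqrt(2));
    image borders are handled by edge replication.
    """
    h, w = feat.shape
    padded = np.pad(feat.astype(np.float64), 1, mode="edge")
    out = np.empty((8, h, w), dtype=np.float64)
    for k, (dy, dx) in enumerate(OFFSETS):
        neighbour = padded[1 + dy:1 + dy + h, 1 + dx:1 + dx + w]
        out[k] = (neighbour - feat) / np.hypot(dy, dx)
    return out


def candidate_border(feat: np.ndarray, ratio: float = 0.5) -> np.ndarray:
    """Mark pixels whose strongest oriented response is >= ratio * global max."""
    strength = np.abs(oriented_derivatives(feat)).max(axis=0)
    peak = strength.max()
    if peak == 0:  # flat map: no border at all
        return np.zeros_like(strength, dtype=bool)
    return strength >= ratio * peak


# Toy map: a bright 4x4 square on a dark 8x8 background.
f = np.zeros((8, 8))
f[2:6, 2:6] = 1.0
print(candidate_border(f).astype(int))
```

On this toy map the selected pixels form a ring straddling the square's edge, while the flat interior of the square and the distant background are rejected.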

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_68

SharedIt: https://rdcu.be/cyhMM

Link to the code repository

https://github.com/midsdsy/LOD-Net

Link to the dataset(s)

https://github.com/DengPingFan/PraNet

Reviews

Review #1

Please describe the contribution of the paper

This paper provides a Learnable Oriented-Derivative Network that introduces the boundary information in the segmentation and boosts the segmentation quality, especially in low-contrast regions.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is easy to follow. The figures are of good quality.
- Learning derivative at the boundary area and incorporating the boundary information in segmentation is a novel idea.
- The results look promising.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The manuscript lacks a baseline experiment that does not use any orientation gradient strategy. The baseline would be essential to demonstrate the effectiveness of the proposed idea.
- The model is built upon Mask R-CNN, which is not wise as Mask R-CNN is for multi-object instance segmentation. Since the dataset images are limited, it would be better to employ a lightweight architecture or detector.
- Some descriptions of the manuscript are not clear. Figure 4 is not even mentioned in the manuscript.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The code is publicly released.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- On page 2: “we find that in feature maps, oriented derivatives of pixels in boundary regions are larger than those of other pixels” is not substantiated with figures or statistics.
- On page 3: “However traditional methods ignore oriented derivative other than orientation of gradients, whose representation capability is insufficient” is not clear. Which traditional methods do you refer to?
- Equations 1 and 2 should switch positions. Besides, what is the definition of the normalized parameter D?
- In Equation 4, what is o?
- It would be better to introduce section 2.3 together with Figure 2.
- Figure 4 is not even mentioned in the manuscript.
- Experiments:
  - No validation set is employed; what is the criterion to stop training? How is overfitting avoided?
  - Mask R-CNN is a very large architecture. As each image in the manuscript contains only one object, it is not wise to use a multi-object detector. It would be better to report the number of model parameters and the inference speed.
  - Table 2 lacks the baseline experiment without any orientation strategy.
Other comments:
- Page 3: “a adaptive”->”an adaptive”
- Page 5: “Soomth”->”Smooth”
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The idea of addressing the boundary is novel.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Somewhat confident

Review #2

Please describe the contribution of the paper

This paper presents a novel method that fuses learned oriented-derivative features on the object boundary with high-level semantic features within the object, achieving good results on four benchmark datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The idea is interesting and novel. To the best of my knowledge, this is the first work that combines learned oriented-derivative features on the object boundary with high-level semantic features within the object for object segmentation.
- Based on the experimental results, the proposed method is effective.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Some important ablation studies using the same network backbone and metrics are missing.
- There is a MICCAI 2020 paper, “Learning Directional Feature Maps for Cardiac MRI Segmentation”, that shares a similar idea of leveraging features related to the object boundary. Though the enhanced feature is very different, a comparison between the two would still be appreciated.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have released the code. Though I have not checked the code, I believe one can reproduce the results based on the released code.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Some of my suggestions are given below (see also the weaknesses section):
- Show some qualitative illustrations in the paper.
- Move the ablation study from the supplementary file into the main paper and use the same metrics.
- Add a comparison with the related work that shares a somewhat similar idea.
- Proofread the paper. Some typos: “a Adaptive”->”An adaptive”; “Fig.Number”->”Fig. Number”.
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Interesting idea and good results (see the strengths)
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

2
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

Authors propose a Learnable Oriented-Derivative Network (LOD-Net) to refine the accuracy of boundary predictions for polyps. Specifically, it firstly calculates eight oriented derivatives at each pixel for a polyp. It then selects those pixels with large oriented-derivative values to constitute a candidate border region of a polyp. It finally refines boundary prediction by fusing border region features and also those high-level semantic features calculated by a backbone network.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Novelty: This paper aims to enhance the representation around boundary and refine the boundary prediction. The proposed solution of highlighting border region in feature map is novel.
- Experiments: Authors utilize four public polyp datasets to evaluate the proposed method and conduct extensive experiments to demonstrate the effectiveness of the proposed method. Moreover, the source code is available, which is a positive aspect of this paper.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- In Figure 2, the ground truth seems to be an input of the network, since the data flow starting from the ground truth is concatenated with high-level semantic features for further processing. How is the ground truth of a test image fed to the network to obtain the final segmentation results during inference?
- The expression of o_{i,j} in Equation (4) has not been defined.
- The calculation of the evaluation metrics in this paper is inconsistent with that of SOTA methods, so the comparison results in Table I are not convincing. I have checked the evaluation code provided by the authors and that of PraNet [8]. In the provided code, the authors first calculate the total intersection and union over the whole dataset and then calculate mIoU and mDice. In the code of PraNet, mIoU and mDice are calculated as the mean of the per-image IoU and Dice scores rather than over the whole dataset. In practice, the former usually leads to higher mIoU and mDice scores. However, the authors directly borrow the results reported by PraNet [3], which is unfair.
- The ablation study is insufficient. The main missing experimental result is the quantitative result of the baseline model (Mask R-CNN), i.e., with all the proposed components ablated, compared with ‘ours’. Moreover, visualizations of the learned offset map or oriented derivatives should be presented to help readers better understand the proposed method.
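To make the averaging issue in the third weakness concrete, the sketch below contrasts per-image averaging (as the review attributes to PraNet) with dataset-level accumulation (as the review attributes to the authors' code). It is a simplified, foreground-only illustration on boolean masks; it does not reproduce either repository's evaluation code, and scoring an empty prediction on an empty mask as 1.0 is an assumed convention.

```python
import numpy as np


def _counts(pred: np.ndarray, gt: np.ndarray):
    """Intersection, union and |pred| + |gt| for two boolean masks."""
    inter = int(np.logical_and(pred, gt).sum())
    union = int(np.logical_or(pred, gt).sum())
    return inter, union, int(pred.sum() + gt.sum())


def per_image_scores(preds, gts):
    """Score every image separately, then average (mIoU, mDice)."""
    ious, dices = [], []
    for p, g in zip(preds, gts):
        inter, union, total = _counts(p, g)
        ious.append(inter / union if union else 1.0)  # empty vs empty -> 1.0 (assumed)
        dices.append(2 * inter / total if total else 1.0)
    return float(np.mean(ious)), float(np.mean(dices))


def dataset_level_scores(preds, gts):
    """Pool intersection/union over the whole dataset, then divide once."""
    inter = union = total = 0
    for p, g in zip(preds, gts):
        i, u, t = _counts(p, g)
        inter, union, total = inter + i, union + u, total + t
    return inter / union, 2 * inter / total


# One large polyp segmented perfectly, one small polyp missed entirely.
gt_a = np.zeros((32, 32), dtype=bool)
gt_a[4:14, 4:14] = True           # 100 polyp pixels
pred_a = gt_a.copy()
gt_b = np.zeros((32, 32), dtype=bool)
gt_b[20:22, 20:22] = True         # 4 polyp pixels
pred_b = np.zeros_like(gt_b)
preds, gts = [pred_a, pred_b], [gt_a, gt_b]
print(per_image_scores(preds, gts))      # (0.5, 0.5)
print(dataset_level_scores(preds, gts))  # about (0.962, 0.980)
```

The missed small polyp costs half the score under per-image averaging, but barely registers under dataset-level pooling because the large polyp's pixels dominate the accumulated counts.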
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Authors utilize four benchmark datasets to evaluate the proposed method, and the source code is available.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- The quantitative result of the baseline model (Mask R-CNN) should be provided to demonstrate the effectiveness of the proposed method.
- It is suggested that visualization of learned offset map or oriented derivatives should be presented to help reader better understand the proposed method.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Considering the novelty of the paper and the unconvincing experimental results, I suggest ‘borderline reject’.

This paper proposes an effective idea to improve prediction accuracy around the boundary in polyp images. I am open to raising my rating if the authors can address my questions.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

7
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper presents a novel method that fuses learned oriented-derivative features on the object boundary with high-level semantic features within the object, achieving good results on four benchmark datasets. However, some important ablation studies using the same network backbone and metrics are missing, such as the baseline model Mask R-CNN. Moreover, some descriptions in the manuscript are unclear, including the number of model parameters, the parameter settings, and the criterion for stopping training.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

We thank the area chair and reviewers for their appreciation of and suggestions on our work. Below, please find our clarifications regarding their major concerns.

For Reviewer #2, Reviewer #3, Reviewer #5, and the meta-reviewer: we provide more quantitative and qualitative results on our GitHub page, https://github.com/midsdsy/LOD-Net.

Ablation studies with baseline Mask R-CNN. We included the suggested ablation studies in the supplementary material due to space limitations, for example, 78.5 vs. 76.0 on CVC-ClinicDB (seen) and 54.9 vs. 51.6 on ETIS-LaribPolypDB (unseen). Moreover, in terms of the metrics for polyp segmentation, our proposed method also outperforms Mask R-CNN; for example, mDice and mIoU improve by 1.4% and 2.6%, respectively, on Kvasir (seen) and by 4.1% and 7.0%, respectively, on ETIS-LaribPolypDB (unseen).

Descriptions of manuscript. Specifically, the normalized parameter D in Eq. 2 and Eq. 3 is the Euclidean distance between the current pixel and the sampled pixel. The o_{ij} in Eq. 4 is the predicted oriented derivative.
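Read together with this explanation of D, an oriented derivative can be pictured as a finite difference toward a sampled location divided by the distance to it. The sketch assumes the form (f(p + Δ) − f(p)) / D with D = ‖Δ‖₂ and a possibly fractional offset Δ read by bilinear interpolation; this is one reading of the rebuttal, not the paper's actual Eq. 2 or Eq. 3, and `bilinear` and `normalized_oriented_derivative` are hypothetical names.

```python
import numpy as np


def bilinear(feat: np.ndarray, y: float, x: float) -> float:
    """Sample a 2-D map at fractional (y, x), clamped to the map border."""
    h, w = feat.shape
    y = min(max(y, 0.0), h - 1.0)
    x = min(max(x, 0.0), w - 1.0)
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    wy, wx = y - y0, x - x0
    top = (1 - wx) * feat[y0, x0] + wx * feat[y0, x1]
    bottom = (1 - wx) * feat[y1, x0] + wx * feat[y1, x1]
    return float((1 - wy) * top + wy * bottom)


def normalized_oriented_derivative(feat, p, offset) -> float:
    """Hypothetical form: (f(p + offset) - f(p)) / D with D = ||offset||_2."""
    dy, dx = offset
    d = float(np.hypot(dy, dx))  # Euclidean distance, current -> sampled pixel
    if d == 0.0:
        return 0.0
    sampled = bilinear(feat, p[0] + dy, p[1] + dx)
    return (sampled - float(feat[p[0], p[1]])) / d


# Ramp map f(y, x) = 2x: its derivative along a unit vector (uy, ux) is 2 * ux.
ramp = np.tile(2.0 * np.arange(8), (8, 1))
print(normalized_oriented_derivative(ramp, (3, 3), (0.0, 1.5)))  # 2.0
print(normalized_oriented_derivative(ramp, (3, 3), (1.2, 1.6)))  # about 1.6
```

Dividing by D makes the value independent of how far the sampled pixel lies: on the ramp, the offset (1.2, 1.6) has length 2, and the result 1.6 equals 2 × 0.8, the directional derivative along the unit vector (0.6, 0.8).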

Training settings. We set hyper-parameters such as the learning rate and batch size following the settings commonly used for object detectors, and all parameter settings are open-sourced. We stop training based on the convergence of the loss function: the model reaches a total loss of about 0.16 after 7k iterations and stays relatively stable in the following iterations. Our model has 15.28M parameters in total, only slightly more than Mask R-CNN (14.76M) and fewer than PraNet (30.49M) and HarDNet (17.42M).
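The rebuttal says training stops once the loss converges but does not state the exact rule. One common way to operationalize that is a plateau check on windowed loss means; the sketch below is such a check, and its window size and tolerance are hypothetical values, not the authors' settings.

```python
from typing import Sequence


def has_converged(losses: Sequence[float], window: int = 500, tol: float = 0.01) -> bool:
    """Plateau check: compare the mean loss of the last `window` iterations with
    the mean of the window before it; stop when the relative change is <= tol."""
    if len(losses) < 2 * window:
        return False  # not enough history yet
    recent = sum(losses[-window:]) / window
    previous = sum(losses[-2 * window:-window]) / window
    return abs(previous - recent) <= tol * abs(previous)


# Toy curves: a loss still falling steadily vs. one hovering at a constant value.
falling = [1.0 - 1e-4 * i for i in range(2000)]
flat = [0.16] * 1000
print(has_converged(falling), has_converged(flat))  # False True
```

Comparing window means rather than single iterations keeps mini-batch noise from triggering an early stop; without a validation set, as Review #1 notes, such a rule detects convergence of the training loss but not overfitting.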

Visualization. We provided the visualization of learned oriented derivatives in our Github page.

For Reviewer #2:

Lightweight architecture. In datasets such as Kvasir and CVC-ClinicDB, some images contain multiple polyps, so a multi-object detector like Mask R-CNN is a suitable baseline model. In addition, our proposed oriented-derivative representation can be implemented in both single-object and multi-object detectors, as long as their mask head is replaced with ours.

For Reviewer #5:

GT of test images. Fig. 2 shows the workflow of our proposed method during training. During inference, we do not use the ground truth of test images; instead, we use the predicted pixel-wise oriented derivatives both to generate the 28×28×256 feature map and as input to the adaptive thresholding module and the following steps.

Evaluation metrics. For the calculation of mDice and mIoU, we use the metric code of mmSegmentation, an open-source project by OpenMMLab with 1.8k stars. Besides the difference mentioned by the reviewer, note that our code uses a simpler threshold selection policy than PraNet: PraNet's code computes the mean metric value over all images by averaging over thresholds in [1:-1/255:0], while our code uses only a fixed threshold of 0.5. Under the same metrics as PraNet, we achieve an mIoU of 88.45% vs. 82.87% by PraNet on Kvasir (seen) and 88.37% vs. 72.71% by PraNet on ETIS (unseen). More results are shown on our GitHub page, and we will update the results in the final manuscript.
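The threshold policy alone can shift the score of the same prediction. The sketch scores one probability map under a fixed 0.5 threshold and under the mean over 256 thresholds from 1 down to 0 in steps of 1/255, mirroring the range [1:-1/255:0] quoted above. It is an illustration under assumptions (a `>=` comparison, per-image Dice only), not PraNet's evaluation toolbox or the mmSegmentation code.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice score of two boolean masks; an empty pair scores 1.0 (assumed convention)."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return float(2 * inter / total) if total else 1.0


def dice_fixed(prob: np.ndarray, gt: np.ndarray, thr: float = 0.5) -> float:
    """Binarize once at a fixed threshold, as the rebuttal describes for LOD-Net."""
    return dice(prob >= thr, gt)


def dice_threshold_sweep(prob: np.ndarray, gt: np.ndarray) -> float:
    """Average Dice over 256 thresholds 1, 1 - 1/255, ..., 0 (cf. [1:-1/255:0])."""
    thresholds = np.linspace(1.0, 0.0, 256)
    return float(np.mean([dice(prob >= t, gt) for t in thresholds]))


# Ground truth: left half of a 4x4 image. Prediction: 0.9 on the polyp, 0.3 elsewhere.
gt = np.zeros((4, 4), dtype=bool)
gt[:, :2] = True
prob = np.where(gt, 0.9, 0.3)
print(dice_fixed(prob, gt))            # 1.0
print(dice_threshold_sweep(prob, gt))  # about 0.798
```

Here the fixed threshold yields a perfect mask, while the sweep is pulled down by thresholds above 0.9 (empty mask) and at or below 0.3 (background included), so the two policies should not be mixed within one comparison table.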


Learnable Oriented-Derivative Network for Polyp Segmentation