
Authors

Yutian Shen, Xiao Jia, Max Q.-H. Meng

Abstract

Automatic polyp segmentation in the screening system is of great practical significance for the diagnosis and treatment of colorectal cancer. However, accurate segmentation in the colonoscopy images still remains a challenge. In this paper, we propose a hard region enhancement network (HRENet) based on an encoder-decoder framework. Specifically, we design an informative context enhancement (ICE) module to explore and intensify the features from the lower-level encoder with explicit attention on hard regions. We also develop an adaptive feature aggregation (AFA) module to select and aggregate the features from multiple semantic levels. In addition, we train the model with a proposed edge and structure consistency aware loss (ESCLoss) to further boost the performance. Extensive experiments on three public datasets show that our proposed algorithm outperforms the state-of-the-art approaches in terms of both learning ability and generalization capability. In particular, our HRENet achieves a mIoU of 92.11% and a Dice of 92.56% on Kvasir-SEG dataset. And the model trained with Kvasir-SEG and CVC-Clinic DB retains a high inference performance on the unseen dataset CVC-Colon DB with a mIoU of 88.42% and a Dice of 85.26%. The code is available at: https://github.com/CathySH/HRENet.
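For readers unfamiliar with the two metrics reported in the abstract, the sketch below computes Dice and IoU for a single pair of binary masks. It is a minimal illustration in plain Python, not the authors' evaluation code: the function name `dice_and_iou` and the toy masks are made up for this example, and the way per-image scores are averaged into mIoU is not specified on this page.

```python
def dice_and_iou(pred, target):
    """Return (dice, iou) for two flat binary masks of equal length."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    # Pixels that are foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred, target))
    pred_area = sum(pred)
    target_area = sum(target)
    union = pred_area + target_area - intersection
    if union == 0:
        # Both masks are empty: treat as a perfect match (a convention, assumed here).
        return 1.0, 1.0
    dice = 2 * intersection / (pred_area + target_area)
    iou = intersection / union
    return dice, iou


# Hypothetical 8-pixel masks, flattened to 1-D for brevity.
pred = [1, 1, 1, 0, 0, 0, 0, 0]
target = [0, 1, 1, 1, 0, 0, 0, 0]
dice, iou = dice_and_iou(pred, target)
print(f"Dice = {dice:.4f}, IoU = {iou:.4f}")
```

For one mask pair the two scores are linked by Dice = 2·IoU / (1 + IoU), so they rank a single prediction identically even though their numeric values differ.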

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_53

SharedIt: https://rdcu.be/cyhMw

Link to the code repository

https://github.com/CathySH/HRENet

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

- The authors use a hard region attention map to generate a sampling grid, which is then used to resample and enhance features.
- An edge loss is proposed to preserve the consistency of the predicted object boundary.
- The method achieves state-of-the-art performance on three benchmark datasets.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

An interesting idea: enhancing features by sampling the features of the hard region. Comprehensive experiments.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The idea of this paper is very similar to the following paper: [1] Recasens, A., Kellnhofer, P., Stent, S., Matusik, W., Torralba, A.: Learning to zoom: a saliency-based sampling layer for neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 51–66 (2018). There are two main differences. 1) How the input of the grid generator is constructed: in paper [1], important regions are used to generate the grid, whereas in this paper the authors use the hard regions. It should be noted that the “easy” regions may also be important, and undersampling them may lose critical information. 2) Paper [1] implements image-level resampling, while in this paper the authors conduct feature-level resampling. The authors should compare and discuss this difference. Moreover, in the experiment section, it would be better if the authors provided the results of paper [1], since the two works address highly similar tasks and this paper is derived from it.
2. The novelty of adaptive feature aggregation seems limited: it is simply a combination of a non-local operation, a deformable convolution, and an SE block.
3. Edge losses have been explored in many medical segmentation tasks; however, the authors did not cite, discuss, or compare with them. Since this is listed as a contribution of the paper, the authors should provide a full discussion and comparison.
4. The authors should provide quantitative metrics such as the number of parameters and FLOPs to make a fair comparison with other methods.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Yes, I think it can be reproduced
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Please refer to “list the main weaknesses of the paper”
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. The paper borrows too much from paper [1] but does not discuss or compare with it. Besides, the novelty of the AFA module and the edge loss is limited.
[1] Recasens, A., Kellnhofer, P., Stent, S., Matusik, W., Torralba, A.: Learning to zoom: a saliency-based sampling layer for neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 51–66 (2018)
What is the ranking of this paper in your review stack?

4
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

A deep learning polyp segmentation method is proposed. The method mainly includes three contributions: (1) an informative context enhancement (ICE) module where the mapping is based on [15], (2) an adaptive feature aggregation (AFA) module, and (3) the edge and structure consistency aware loss (ESCLoss). Experimental results demonstrate the effectiveness of the proposed method.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The proposed method targets the exploration of features in hard regions for better polyp segmentation. For this purpose, the authors combine the ICE, AFA and ESCLoss. The idea is well motivated and the experimental results support it.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

While the method is interesting, I notice that the ESCLoss is a combination of BCE, Dice, an edge penalty loss (the focal loss), and an SSIM-based structure loss. There is no further discussion of the effectiveness of each part of the loss. Moreover, it is unclear how the weights between those losses are chosen.
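To make the weighting question concrete, here is a minimal plain-Python sketch of a weighted sum of three of the terms named above (BCE, Dice, and a focal term). It is not the authors' ESCLoss: the weights, the `gamma` value, and all function names are assumptions for illustration, the focal term is applied to every pixel rather than only to edge pixels, and the SSIM-based structure term is omitted for brevity.

```python
import math

EPS = 1e-7  # clamp probabilities away from 0 and 1 to keep log() finite


def bce_loss(probs, targets):
    """Mean binary cross-entropy over flat lists of probabilities and 0/1 labels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1 - EPS)
        total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
    return total / len(probs)


def dice_loss(probs, targets, smooth=1.0):
    """Soft Dice loss: 1 minus the smoothed Dice coefficient."""
    inter = sum(p * t for p, t in zip(probs, targets))
    return 1 - (2 * inter + smooth) / (sum(probs) + sum(targets) + smooth)


def focal_loss(probs, targets, gamma=2.0):
    """Mean focal loss: cross-entropy down-weighted on well-classified pixels."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, EPS), 1 - EPS)
        pt = p if t == 1 else 1 - p  # probability assigned to the true class
        total += -((1 - pt) ** gamma) * math.log(pt)
    return total / len(probs)


def combined_loss(probs, targets, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three terms; the default weights are hypothetical."""
    w_bce, w_dice, w_focal = weights
    return (w_bce * bce_loss(probs, targets)
            + w_dice * dice_loss(probs, targets)
            + w_focal * focal_loss(probs, targets))


# Toy predictions for four pixels.
probs = [0.9, 0.8, 0.3, 0.1]
targets = [1, 1, 0, 0]
loss_equal = combined_loss(probs, targets)
loss_bce_only = combined_loss(probs, targets, weights=(1.0, 0.0, 0.0))
print(loss_equal, loss_bce_only)
```

Setting a weight to zero removes that term, which is one simple way to run the per-term ablation the reviewer asks for.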
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility response is good. The authors list the detailed information point by point on the checklist. Public datasets are used and the authors will provide the code and pre-trained model.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

It would be nice to investigate further, in future work, how to combine those losses (BCE, Dice, focal loss, structure loss) and the necessity and effectiveness of each.

Minor: Please add the unit (%) of the reported results in Tables.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed method is interesting and well explained. Extensive experiments including comparison with SOTA methods and ablation studies, on suitable public datasets show the effectiveness of the idea.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

Authors propose a hard region enhancement network (HRENet) based on an encoder-decoder framework. The contributions can be summarized into three aspects. Firstly, an informative context enhancement (ICE) module is designed to explore and intensify the features from the lower-level encoder with explicit attention on hard regions. Secondly, an adaptive feature aggregation (AFA) module is developed to select and aggregate the features from multiple semantic levels. Thirdly, the segmentation model is optimized with a proposed edge and structure consistency aware loss (ESCLoss).
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Although the proposed ICE borrows the idea from [15], it is a smart way to enhance the features from the lower-level encoder with explicit attention on hard regions, so as to improve the polyp segmentation performance in uncertain regions.
- AFA is developed to aggregate features from different semantic levels, including the enhanced features of ICE module and those passed from encoder and the previous decoder block.
- The proposed edge and structure consistency aware loss is a hybrid objective function to optimize the proposed HRENet, where the proposed structure consistency loss is somewhat novel and aims at adapting HRENet to different scales.
- Authors utilize three benchmark datasets to evaluate the proposed method, and the comprehensive experiments are convincing.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- HRENet is proposed for enhancing the polyp segmentation performance in uncertain regions. It should be compared with other hard sample mining methods, such as focal loss.
- The proposed AFA module is a combination of state-of-the-art components, including a self-attention module [19], deformable convolution [4] and SE attention [*1], but the last one has not been cited in the manuscript.
- The term “Down-concatenations” only appears in the experiment section; it should also be mentioned in the methodology part.
- It is not clear whether Lds is still calculated when the ICE module is ablated. I wonder whether the contribution of the ICE module is owing to the introduced deep supervision loss.
[*1] Hu, Jie, Li Shen, and Gang Sun. “Squeeze-and-excitation networks.” Proceedings of the IEEE conference on computer vision and pattern recognition. 2018.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Authors utilize three benchmark datasets to evaluate the proposed method, and the comprehensive experiments are convincing. Moreover, the source code will be published, which is a positive aspect of this paper.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- It would be better for the authors to report the computational resource demands, including training time, inference time and the number of trainable parameters. There is a concern that the proposed ICE and AFA modules introduce a large number of parameters, and it is unclear whether the proposed method can achieve real-time segmentation, which is of high importance in clinical practice.
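The parameter concern can be made concrete with back-of-the-envelope counts. The sketch below uses the standard formulas for a 2D convolution and for a squeeze-and-excitation block built from two fully connected layers. The layer sizes are hypothetical and are not taken from HRENet, and the SE count assumes both layers carry biases, which some implementations omit.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w, bias=True):
    """Return (parameters, multiply-accumulates) of a standard 2D convolution."""
    weights = kernel * kernel * c_in * c_out
    params = weights + (c_out if bias else 0)
    # One multiply-accumulate per weight per output location.
    macs = weights * out_h * out_w
    return params, macs


def se_block_params(channels, reduction=16):
    """Parameters of an SE block: two fully connected layers, biases included."""
    hidden = channels // reduction
    squeeze = channels * hidden + hidden      # channels -> hidden
    excite = hidden * channels + channels     # hidden -> channels
    return squeeze + excite


# Hypothetical decoder layer: 3x3 conv, 64 -> 64 channels, 88x88 output map.
params, macs = conv2d_cost(64, 64, 3, 88, 88)
se_params = se_block_params(64)
print(f"conv: {params} params, {macs} MACs; SE block: {se_params} params")
```

Summing such counts over all layers gives the figures reviewers usually expect alongside measured inference time.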
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

I gave the overall score mainly considering the novelty of paper.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposes a hard region enhancement network for polyp segmentation. An informative context enhancement (ICE) module on hard regions, an adaptive feature aggregation module and an edge and structure consistency loss are combined to achieve good performance on three benchmark datasets. The reviewers raised several concerns, including comparison with other sample mining methods (especially the paper by Recasens et al., ECCV 2018, mentioned by R1), the unclear contribution of the ICE module, and the need for further discussion of the different parts of the loss. Overall, all reviewers gave positive comments. Therefore, a decision of provisional accept is recommended.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

As introduced in the paper, the ICE module is implemented to complement the features for each decoder block to improve the segmentation. Features from the encoder block and the previous decoder block can be used to segment the so-called “easy” regions, while the ICE module mainly provides information for difficult-to-classify pixels: these pixels are identified from the decoder feature, and a grid is generated to guide the corresponding feature-level resampling. In paper [1], by contrast, a saliency map is generated from a low-resolution version of the input image to guide image-level resampling for tasks like fine-grained object classification.
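The general idea of weight-guided resampling can be sketched in one dimension. The code below uses inverse-CDF sampling, a simpler stand-in chosen for illustration: it is neither the kernel-based grid generator of [1] nor the ICE module, and it uses nearest-neighbour lookup where a real network would interpolate. The names `sampling_positions` and `resample`, and the toy hardness weights, are hypothetical.

```python
def sampling_positions(hardness, num_samples):
    """Map uniform quantiles through the inverse CDF of the hardness weights.

    Cell i spans [i, i + 1). Cells with larger weight receive more samples.
    """
    total = sum(hardness)
    if total <= 0:
        raise ValueError("hardness weights must have a positive sum")
    # cdf[i] is the cumulative weight before cell i (len(hardness) + 1 entries).
    cdf = [0.0]
    for h in hardness:
        cdf.append(cdf[-1] + h / total)
    positions = []
    i = 0
    for j in range(num_samples):
        q = (j + 0.5) / num_samples  # centre of the j-th uniform bin
        while i < len(hardness) - 1 and cdf[i + 1] < q:
            i += 1
        mass = cdf[i + 1] - cdf[i]
        frac = (q - cdf[i]) / mass if mass > 0 else 0.5
        positions.append(i + frac)
    return positions


def resample(features, positions):
    """Nearest-neighbour lookup of features at fractional positions."""
    last = len(features) - 1
    return [features[min(int(p), last)] for p in positions]


# Toy example: the middle cell is marked four times "harder" than the others.
hardness = [1, 1, 4, 1, 1]
features = [10, 20, 30, 40, 50]
positions = sampling_positions(hardness, 8)
resampled = resample(features, positions)
print(resampled)
```

With uniform weights the sampling reduces to the identity, which makes the effect of a non-uniform map easy to see: the heavily weighted cell is sampled more densely at the expense of its neighbours.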

The effectiveness of each part of the combined loss has been examined. For example, the HRE model trained with only the supervision loss achieves a mIoU of 91.45% and a Dice of 91.60%, which is lower than the model trained with the combined loss, showing the effectiveness of the combination. Several other kinds of losses have also been examined, and their performance was not as satisfactory as that of the current model. Due to the page limit, I did not include these experimental results. As for the weights chosen for the different parts of the combined loss, I have not conducted experiments on these weight parameters. These will be further investigated in future work.

Many thanks to all the reviewers for their helpful suggestions.

[1] Recasens, A., Kellnhofer, P., Stent, S., Matusik, W., Torralba, A.: Learning to zoom: a saliency-based sampling layer for neural networks. In: Proceedings of the European Conference on Computer Vision (ECCV). pp. 51–66 (2018)


HRENet: A Hard Region Enhancement Network for Polyp Segmentation