
# Authors

Xiaoqi Zhao, Lihe Zhang, Huchuan Lu

# Abstract

More than 90% of colorectal cancer is gradually transformed from colorectal polyps. In clinical practice, precise polyp segmentation provides important information in the early detection of colorectal cancer. Therefore, automatic polyp segmentation techniques are of great importance for both patients and doctors. Most existing methods are based on U-shape structure and use element-wise addition or concatenation to fuse different level features progressively in decoder. However, both the two operations easily generate plenty of redundant information, which will weaken the complementarity between different level features, resulting in inaccurate localization and blurred edges of polyps. To address this challenge, we propose a multi-scale subtraction network (MSNet) to segment polyp from colonoscopy image. Specifically, we first design a subtraction unit (SU) to produce the difference features between adjacent levels in encoder. Then, we pyramidally equip the SUs at different levels with varying receptive fields, thereby obtaining rich multi-scale difference information. In addition, we build a training-free network “LossNet” to comprehensively supervise the polyp-aware features from bottom layer to top layer, which drives the MSNet to capture the detailed and structural cues simultaneously. Extensive experiments on five benchmark datasets demonstrate that our MSNet performs favorably against most state-of-the-art methods under different evaluation metrics. Furthermore, MSNet runs at a real-time speed of $\sim$70fps when processing a $352 \times 352$ image. The source code will be publicly available at https://github.com/Xiaoqi-Zhao-DLUT/MSNet.

# Link to paper

SharedIt: https://rdcu.be/cyhLE

N/A

# Reviews

### Review #1

• Please describe the contribution of the paper

This paper proposed a novel multi-scale subtraction network for automated polyp segmentation. In addition, a training-free loss network is implemented for contextual level information supervision. Extensive experiments show the proposed method provides higher performance than recent literature.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• Good organization of the paper, easy to read.
• Extensive experiment with ablation studies and comparison to other methods
• Promising real-time performance for clinical applications.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• The subtraction definition looks interesting but lacks a mathematical explanation. It is similar to a ResNet block with + replaced by - and some minor changes. As a consequence, it is difficult to see the difference between the subtraction unit and a ResBlock. The authors should include more discussion of this.
• The concept of LossNet has been applied successfully in many applications, and also in this paper. But without any fine-tuning, the ImageNet-pretrained parameters might correspond to a different input domain, i.e. the binary segmentation domain vs. the natural RGB domain. I am not sure the authors’ approach is reasonable. In addition, the L2 distance might not be the optimal metric, as the feature space is a more complex manifold rather than a standard Euclidean space.
• Why are all evaluation results presented as means without standard deviations (STD)? Without STD, it is hard to judge the distribution or whether the differences are statistically significant.
• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Statistical results are not sufficient

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

As discussed in the weaknesses above, the paper still has some limitations. The overall result is OK, but more effort is needed to improve the paper.

• Please state your overall opinion of the paper

probably reject (4)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Limited motivation of the component design and experimental results.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

7

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

This paper proposes a multi-scale subtraction network for polyp segmentation. A subtraction unit is proposed to extract lower-/higher-order cross-level features. The designed loss function is utilized to optimize the results from both structure and details.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. This paper designs a novel multi-scale subtraction polyp segmentation network combining the higher-order and lower-order features, which is a simple and easy strategy to follow.

2. The experiment section is sufficient since the proposed model is comprehensively investigated and it shows its superiority among other methods.

3. The authors utilize a new training loss to min the structure and details during the backward phase, which is interesting and useful.

4. The overall results are pretty good and achieve real-time speed.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The motivation for the subtraction unit is not clear. The description in the method section is based only on technical motivation, not on intuitive motivation.

2. No ground truth is shown in Fig. 1.

3. The publishers in the references are not consistent. Some are abbreviated, some are not. Please check them ([3,21,24]) carefully.

4. Minor issues: the text in Fig. 5 is too small, and a more detailed description in the title of Fig. 2 would be better.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

According to the authors, they will release the training code once the paper is accepted.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

In addition to the weaknesses presented in Sec. 4, other issues should be addressed.

1. Please provide a more intuitive motivation for the network design.

2. Discuss future work.

3. Provide some failure cases.

• No failure cases.
• Please state your overall opinion of the paper

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. State-of-the-art performance with real-time inference, which will provide a useful toolbox for doctors.

2. Solid experiments.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #3

• Please describe the contribution of the paper

The authors present a novel network for polyp segmentation from colonoscopy images. They introduced a subtraction unit to get the difference between features of adjacent levels and apply it at different levels. The network makes quick predictions and presents good results.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The network architecture is interesting: the nested structure reminds me of UNet++, while the authors do not perform multi-level supervision.
2. Tables and figures are well presented.
3. The quantitative results tested with different datasets are quite stable, substantially better than others.
4. Ablation study is performed and showed good results.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. If I understand correctly, the authors use the Kvasir and CVC-ClinicDB datasets for training and the other datasets for testing. In this case, Kvasir and CVC-ClinicDB are considered source domains, while the others are target domains for domain adaptation/generalization. Considering the quantitative results, the proposed method does not seem significantly better than PraNet on Kvasir, CVC-T and CVC-ClinicDB. Thus, the gain looks like better domain generalization performance on unseen domains, which is not the main goal of this network. This is not discussed.
2. The claimed contribution about LossNet is not convincing. For me, it is very similar to style loss or perceptual loss, which is commonly used in tasks including style transfer, inpainting, etc.
• Please rate the clarity and organization of this paper

Excellent

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors claim they will release the code if the paper is accepted, and the datasets they used are public benchmarks, so it would be easy to reproduce their results.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. Add statistical tests for the results.
2. Further improve the description of network structure. The inspiration of MSNet is not well explained.
3. Acronyms such as FPN are mentioned but not explained.
• Please state your overall opinion of the paper

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. the network architecture is interesting
2. the overall results are good, different aspects of metrics are considered, ablation study is performed
3. figures and tables are well presented, paper is well organized
• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper develops a Multi-scale Subtraction Network for polyp segmentation. However, the major concern is the novelty of this paper. The major idea is very similar to “Deep Layer Aggregation” and “Deep High-Resolution Representation Learning for Human Pose Estimation”. The difference is that this paper uses subtraction while those papers use “+”. The authors should explain the motivation for subtraction. The concept of LossNet is similar to the perceptual loss, which has been applied successfully in many applications. A clearer illustration of the contribution should be added to clarify the effectiveness of the proposed model.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

# Author Feedback

We thank the reviewers for their helpful comments. Below we answer some key questions.

Q1: The motivation and novelty of the subtraction unit. A: We indeed use the subtraction unit (SU) to replace element-wise addition when fusing features of different scales. The SU computes the absolute difference of its two input features. In the multi-scale subtraction module, the horizontal cascade of SUs continually computes the differential features between adjacent scales, which are then added to the main-scale feature. Because of the subtraction operation, the resulting features fed to the decoder have much less redundancy among different levels, and their scale-specific properties are significantly enhanced. Thus, in the decoder, the interference between different levels is weakened and the task-aware region can be attended to more easily. In contrast, the addition operation makes the feature of each scale fed into the decoder cover the information of two scales (itself and its deeper neighbor), so the decoder has to handle very redundant features. In addition, the SUs effectively build a constraint to extract multi-level separable information, which helps the network converge faster and better. To verify this, we replace all SUs with element-wise addition units and compare the performance on all five datasets. Our MSNet achieves average gains of 4.43%, 9.19%, 5.05%, 2.10%, 2.27% and 20.07% in terms of mDice, mIoU, wFm, Sm, Em and MAE, respectively. This significant improvement further shows that the SU is a very solid contribution.
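The contrast between the two fusion operators described in the answer to Q1 can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names `subtraction_unit`, `addition_unit` and `aggregate_differences` are hypothetical, features are assumed to be already resized to a common shape and flattened into lists of floats, and any learned layers the paper may apply around the difference are omitted.

```python
def subtraction_unit(a, b):
    """Element-wise absolute difference of two same-sized feature vectors."""
    return [abs(x - y) for x, y in zip(a, b)]


def addition_unit(a, b):
    """Element-wise addition, the fusion operator the SU replaces."""
    return [x + y for x, y in zip(a, b)]


def aggregate_differences(levels):
    """For each level, add the cascade of difference features to the main-scale feature.

    Order 0 holds the input levels; order n at position i is the SU of the two
    neighbouring order n-1 features at positions i and i+1 (assumed wiring).
    """
    pyramid = [levels]
    while len(pyramid[-1]) > 1:
        prev = pyramid[-1]
        pyramid.append(
            [subtraction_unit(prev[i], prev[i + 1]) for i in range(len(prev) - 1)]
        )
    outputs = []
    for i in range(len(levels)):
        # Collect every order that still has an entry at position i.
        rows = [order[i] for order in pyramid if i < len(order)]
        outputs.append([sum(values) for values in zip(*rows)])
    return outputs


if __name__ == "__main__":
    shallow, deep = [1.0, 2.0], [1.0, 0.0]
    print(subtraction_unit(shallow, deep))  # shared response in channel 0 cancels
    print(addition_unit(shallow, deep))     # shared response in channel 0 is doubled
    print(aggregate_differences([shallow, deep, [0.0, 0.0]]))
```

In this toy case the component that both levels share vanishes under subtraction and is doubled under addition, which is the redundancy argument made in the rebuttal.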
The experimental results on different datasets are listed as follows (each cell reads element-wise addition variant vs MSNet):

| Dataset | mDice | mIoU | wFm | Sm | Em | MAE |
|---|---|---|---|---|---|---|
| ColonDB | 0.697 vs 0.755 | 0.630 vs 0.807 | 0.676 vs 0.737 | 0.811 vs 0.836 | 0.839 vs 0.883 | 0.055 vs 0.041 |
| ETIS | 0.680 vs 0.719 | 0.621 vs 0.664 | 0.636 vs 0.678 | 0.817 vs 0.840 | 0.820 vs 0.830 | 0.020 vs 0.020 |
| Kvasir | 0.872 vs 0.907 | 0.815 vs 0.862 | 0.853 vs 0.893 | 0.897 vs 0.922 | 0.929 vs 0.944 | 0.039 vs 0.028 |
| CVC-T | 0.868 vs 0.869 | 0.803 vs 0.807 | 0.846 vs 0.849 | 0.921 vs 0.925 | 0.940 vs 0.943 | 0.010 vs 0.010 |
| ClinicDB | 0.886 vs 0.921 | 0.840 vs 0.879 | 0.874 vs 0.914 | 0.928 vs 0.941 | 0.944 vs 0.972 | 0.015 vs 0.008 |
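The average gains quoted in the answer to Q1 can be checked against the per-dataset numbers above. The sketch below assumes that the first value of each pair belongs to the element-wise addition variant and the second to MSNet, and that a "gain" is the relative improvement computed per dataset and then averaged over the five datasets (with the sign flipped for MAE, where lower is better). Both are assumptions about how the figures were derived; the names `RESULTS` and `average_relative_gain` are hypothetical.

```python
# (addition variant, MSNet) pairs per dataset, copied from the rebuttal text.
# The ordering within each pair is an assumption inferred from the stated gains.
RESULTS = {
    "ColonDB": {"mDice": (0.697, 0.755), "mIoU": (0.630, 0.807), "wFm": (0.676, 0.737),
                "Sm": (0.811, 0.836), "Em": (0.839, 0.883), "MAE": (0.055, 0.041)},
    "ETIS": {"mDice": (0.680, 0.719), "mIoU": (0.621, 0.664), "wFm": (0.636, 0.678),
             "Sm": (0.817, 0.840), "Em": (0.820, 0.830), "MAE": (0.020, 0.020)},
    "Kvasir": {"mDice": (0.872, 0.907), "mIoU": (0.815, 0.862), "wFm": (0.853, 0.893),
               "Sm": (0.897, 0.922), "Em": (0.929, 0.944), "MAE": (0.039, 0.028)},
    "CVC-T": {"mDice": (0.868, 0.869), "mIoU": (0.803, 0.807), "wFm": (0.846, 0.849),
              "Sm": (0.921, 0.925), "Em": (0.940, 0.943), "MAE": (0.010, 0.010)},
    "ClinicDB": {"mDice": (0.886, 0.921), "mIoU": (0.840, 0.879), "wFm": (0.874, 0.914),
                 "Sm": (0.928, 0.941), "Em": (0.944, 0.972), "MAE": (0.015, 0.008)},
}

LOWER_IS_BETTER = {"MAE"}


def average_relative_gain(metric):
    """Mean over datasets of the per-dataset relative improvement, in percent."""
    gains = []
    for scores in RESULTS.values():
        baseline, ours = scores[metric]
        delta = (baseline - ours) if metric in LOWER_IS_BETTER else (ours - baseline)
        gains.append(delta / baseline)
    return 100.0 * sum(gains) / len(gains)


if __name__ == "__main__":
    for metric in ("mDice", "mIoU", "wFm", "Sm", "Em", "MAE"):
        print(f"{metric}: {average_relative_gain(metric):.2f}%")
```

Under these assumptions the script reproduces the quoted 4.43%, 9.19%, 5.05%, 2.10%, 2.27% and 20.07% to two decimals, which suggests the rebuttal averages per-dataset relative gains rather than taking the relative gain of the averaged scores.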

Q2: The motivation and novelty of LossNet. A: LossNet is similar in form to the perceptual loss, but our motivation is different. Using LossNet removes the need for a manual and complex design of the loss function. Moreover, high segmentation accuracy is required for the body as well as the contour of the lesion. Because the inputs are binary segmentation masks, LossNet can directly target the geometric features of the lesion and perform joint supervision from the contour to the body, thereby improving the overall segmentation accuracy. In Fig. 3, the visualized feature maps qualitatively support our use of LossNet, that is, it can implement detail-to-structure supervision at the feature levels. Table 3 shows that LossNet does lead to an important performance improvement. In style transfer and inpainting tasks, by contrast, the perceptual-like loss is mainly used to speed up the convergence of GANs, recover high-frequency information and ease checkerboard artifacts, but it does not bring an obvious accuracy improvement. The reviewers also note that many papers across multiple tasks have used a perceptual-like loss, but this work is the first to use it in the binary segmentation task.
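The general shape of such feature-level supervision can be sketched as follows. This is a toy sketch under strong assumptions, not the authors' LossNet: the ImageNet-pretrained network mentioned in the reviews is replaced by a stand-in frozen extractor built from repeated 2x2 average pooling, masks are plain nested lists, and the names `avg_pool2x2`, `frozen_features` and `feature_level_loss` are hypothetical. What it keeps is the structure discussed above: no trainable parameters in the loss network, and an L2-style distance between the features of the predicted mask and of the ground-truth mask, summed from fine to coarse levels.

```python
def avg_pool2x2(mask):
    """Halve the resolution of a 2-D list by averaging non-overlapping 2x2 blocks."""
    return [
        [(mask[r][c] + mask[r][c + 1] + mask[r + 1][c] + mask[r + 1][c + 1]) / 4.0
         for c in range(0, len(mask[0]) - 1, 2)]
        for r in range(0, len(mask) - 1, 2)
    ]


def frozen_features(mask, num_levels=3):
    """Stand-in for a frozen extractor: the mask itself plus coarser pooled versions."""
    features, current = [], mask
    for _ in range(num_levels):
        features.append(current)
        current = avg_pool2x2(current)
    return features


def mean_squared_error(a, b):
    """Mean of squared element-wise differences between two same-sized 2-D lists."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)


def feature_level_loss(prediction, ground_truth, num_levels=3):
    """Sum of per-level L2-style distances; fine levels see detail, coarse ones structure."""
    pred_feats = frozen_features(prediction, num_levels)
    gt_feats = frozen_features(ground_truth, num_levels)
    return sum(mean_squared_error(p, g) for p, g in zip(pred_feats, gt_feats))


if __name__ == "__main__":
    gt = [[0.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 1.0, 0.0],
          [0.0, 1.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 0.0]]
    pred = [[0.0, 0.0, 0.0, 0.0],
            [0.0, 1.0, 0.0, 0.0],
            [0.0, 1.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 0.0]]
    print(feature_level_loss(pred, gt))
```

With a 4x4 mask and three levels, the features have sizes 4x4, 2x2 and 1x1, so a single wrong pixel is penalised both as a local detail error and as a change in the coarse summary of the lesion.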

# Post-rebuttal Meta-Reviews

## Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper develops a Multi-scale Subtraction Network for polyp segmentation. However, the major concern is the novelty of this paper. The major idea is very similar to “Deep Layer Aggregation” and “Deep High-Resolution Representation Learning for Human Pose Estimation”. The difference is that this paper uses subtraction while those papers use “+”. The concept of LossNet is similar to the perceptual loss, which has been applied successfully in many applications.

Although the authors address the motivation and clarify some aspects of the novelty, this paper is still just a minor modification compared with other methods. Polyp segmentation is a hot topic that has been developed quite well in the medical imaging community. Many recent papers in MedIA and TMI should be added for comparison. Therefore, I would reject this paper due to its limited novelty.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

15

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The whole paper is well written and verifies the effectiveness and novelty of the proposed method. The authors responded to the major concerns, such as the motivation for using subtraction and LossNet. I think the rebuttal addresses the questions, and the paper reaches the minimum requirement for publication. Overall, I am satisfied with the paper.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper presents a multi-scale subtraction network for polyp segmentation, and the experimental results show clear improvement over other methods. Although I agree with R2 and R3 about the main strengths of this manuscript, I would suggest that the authors provide more evidence on the mechanism of subtraction and why it achieves better performance in the final version.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

12