
# Authors

Zewen Liu, Timothy Cootes

# Abstract

Many medical and biological applications involve analysing vessel-like structures. Such structures often have no preferred direction and a range of possible scales. We take advantage of this self-similarity by demonstrating a CNN based segmentation system that requires far fewer parameters than conventional approaches. We introduce the Multi Angle and Scale Convolutional Unit (MASC) with a novel training approach called Response Shaping. In particular, by reflecting and rotating a single oriented kernel we can generate four versions at different angles. We show how two basis kernels can lead to the equivalent of eight orientations. This introduces a degree of orientation invariance by construction. We use Gabor functions to guide the training of the kernels, and demonstrate that the resulting kernels generally form rotated versions of the same pattern. Invariance to scale can be added using a pyramid pooling layer.

A simple model containing a sequence of five such blocks was tested on the CHASE-DB1 dataset and achieved better performance than the benchmark with only $0.6\%$ of the parameters and $25\%$ of the training examples. The resulting model is fast to compute, converges more rapidly and requires fewer examples to achieve a given performance than more general techniques such as U-Net.
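As an illustration of the kernel-sharing idea in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes the basis kernels are Gabor functions at 11.25° and 33.75°; those angles, the Gabor parameters and the function names `gabor_kernel`, `four_orientations` and `eight_orientation_bank` are all hypothetical choices made for this example. It shows that 90° rotation and reflection alone turn two kernels into eight differently oriented ones without adding weights.

```python
import numpy as np


def gabor_kernel(theta, size=7, sigma=2.0, wavelength=4.0, gamma=0.5):
    """Even Gabor kernel with ridge orientation theta (radians).

    All parameter values are illustrative assumptions, not taken from the paper.
    """
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
    return envelope * np.cos(2.0 * np.pi * xr / wavelength)


def four_orientations(base):
    """Four variants of one square kernel, built only by permuting its weights."""
    flipped = base.T  # reflection about the main diagonal
    return [base, np.rot90(base), flipped, np.rot90(flipped)]


def eight_orientation_bank(base_a, base_b):
    """Stack the variants of two basis kernels into an (8, size, size) bank."""
    return np.stack(four_orientations(base_a) + four_orientations(base_b))


# Hypothetical basis angles, chosen so the eight results are evenly spaced.
BASE_A = gabor_kernel(np.deg2rad(11.25))
BASE_B = gabor_kernel(np.deg2rad(33.75))
BANK = eight_orientation_bank(BASE_A, BASE_B)
```

For a ridge at angle θ, a 90° rotation gives θ + 90° and the diagonal reflection gives 90° − θ, so each basis kernel yields the orientations {θ, θ + 90°, 90° − θ, 180° − θ}. With the assumed base angles, the bank covers 11.25°, 33.75°, ..., 168.75° in steps of 22.5°, while only two kernels' worth of weights would need to be learned.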

SharedIt: https://rdcu.be/cyhWg

# Reviews

### Review #1

• Please describe the contribution of the paper

The authors introduce a method for vessel segmentation that better captures the curvilinear structure of vessels. Specifically, a multi-angle conv (MAC) unit is used to preserve rotation invariance. The MAC unit takes all filters into consideration when computing the orientation response, whereas standard methods simply choose the maximum value from the responses of a set of filters. Additionally, a pyramid pooling operator is used to ensure scale invariance. The proposed method is evaluated on a public dataset. Experiments and ablation analysis demonstrate the effectiveness of the proposed method.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The contributions of the paper are:

1. The authors propose to exploit the self-similarity of the curvilinear structure of vessels. As a result, the introduced network requires fewer parameters than conventional approaches. Unlike standard methods, which take the maximum value of the filter outputs, the introduced multi-angle conv unit combines the information from the whole set of filters.

2. A pyramid pooling layer is applied to ensure scale invariance of vessel segmentation.

3. The authors provide a line of experiments to evaluate the proposed method on a public dataset.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

I found the paper interesting. I have several questions about the paper.

1. The proposed MAC unit is intended to leverage the self-similarity of the curvilinear structure of vessels. However, it is not clear how the self-similarity information is actually used.

2. Similarly, the MAC unit is able to use the information from all filters in a set, but the advantage of response shaping is not clear.

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors list most of the details of the method in the paper. It should be straightforward to reproduce the results.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

1. The overall presentation can be improved. The newly introduced MAC unit is designed to capture the self-similarity of the vessel. From the description, equations and figures, it is difficult for readers to understand the underlying motivation. For example, it would be clearer to compare with the standard method, which simply takes the maximum value of the responses from a set of filters.

2. It would be better to provide a detailed analysis or description of the figures. For example, it is not clear what the bottom-right figures illustrate, which makes it hard for readers to understand M or m in Eqn. 2 and Figure 2.

3. Since there are multiple public vessel datasets, it would be great to provide comparison results on other datasets in the experiments. Also, there are more up-to-date results on the CHASE_DB1 dataset.

borderline reject (5)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My recommendation and comments are based on two aspects.

1. I am not convinced that the proposed response shaping is able to capture the self-similar structure of vessels. Although the simple method takes the maximum value of a set of filter outputs, the network as a whole can still fuse information from all filters.

2. The overall presentation can be improved substantially. The motivation for the MAC unit design and its difference from simple steerable methods are not clear.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

5

• Reviewer confidence

Somewhat confident

### Review #2

• Please describe the contribution of the paper

The authors have introduced a novel model for analysing curvilinear structures which are composed of self similar elements at arbitrary orientation and scale. The system learns a set of filters which can be transformed easily to produce responses at a range of angles. The authors show that this can be extended to include a range of scales. According to the results, the resulting model is very parameter efficient.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

In the reviewer's opinion, the main strengths of the paper are:
- The arbitrary orientations and scales that can be analysed.
- The reduction in the number of parameters.
- The competitive results.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

In the reviewer's opinion, the main weaknesses of the paper are:
- Lack of an exhaustive comparison with state-of-the-art methods (including those published after the U-net implementation referenced in the paper and evaluated on the same database).
- The system is not completely described.
- The authors do not provide the total number of parameters (or a comparison of this number with other systems), and no information on computational time is available.
- The results are good, but quite similar to those of the state-of-the-art methods.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reviewer could not repeat the results as the code is not available.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The paper is quite good, but in the reviewer's opinion some points should be extended:
- There is a lack of an exhaustive comparison with state-of-the-art methods (including those published after the U-net implementation referenced in the paper and evaluated on the same database).
- The system is not completely described.
- The authors do not provide the total number of parameters (or a comparison of this number with other systems), and no information on computational time is available.
- The results are good, but quite similar to those of the state-of-the-art methods.
- A discussion section justifying the clear benefits of this structure over the recent (2020 and 2021) ones should be added.

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

- There is a lack of an exhaustive comparison with state-of-the-art methods (including those published after the U-net implementation referenced in the paper and evaluated on the same database).
- The system is not completely described.
- The authors do not provide the total number of parameters (or a comparison of this number with other systems), and no information on computational time is available.
- The results are good, but quite similar to those of the state-of-the-art methods.
- A discussion section justifying the clear benefits of this structure over the recent (2020 and 2021) ones is missing.
- The reviewer was not able to repeat the results (no code found).

• What is the ranking of this paper in your review stack?

4

• Number of papers in your stack

1

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

This article proposes a new unit for CNN networks called MASC Unit. This unit learns oriented filters to specifically handle curvilinear structures such as blood vessels in images. This unit is designed to approximate rotation-invariant filters by sharing filters between similar orientations (up to $\frac{\pi}{2}$ rotations and reflection) to significantly reduce the number of parameters of the model. A scale-invariance strategy is finally used to handle multiscale curvilinear structures. The authors conducted experiments (including ablative studies) on the public CHASE-DB1 dataset of retinal images, and compared their results with the state of the art.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The novelty of the proposed approach
• The approach provides very good results while using far fewer parameters
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

My main concern is that the authors did not explain their work clearly enough for it to be reproducible. They should better describe their global architecture and the parameters. To that end, the authors may find in the detailed comments a few remarks to help them improve the clarity of their article.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The article lacks the details on the architecture and parameters needed to reproduce it.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• The authors should add a short paragraph or section to describe their global architecture (Fig 1 (a-b)), and explain what the numbers mean in x-x Conv and x-x MASC Blocks.
• The authors do not explain nor discuss the architecture of the MASC Block. This should be also detailed.
• The authors should explain why and how two MAC Units are used per MASC Unit (are they applied on the same input? do they share weights?) and how they handle their outputs (what is “message integration”?).
• Section 3.1 describes the intuition behind the proposed learnt oriented filters. The authors start by describing the simple approach and explain that their approach takes into account the response from all orientations instead of the orientation associated with the strongest response. If I understand correctly, this is their key idea to learn the oriented filters. The authors should better illustrate this point by adding a figure (e.g. what to expect with the simple implementation versus their proposed method).
• Regarding the second initialization strategy, the authors mentioned “During optimization, we vary the parameters defining the Gabor kernel and the elements of $B_i$”. How are the filters $B_i$ initialized?
• The authors should provide the hyperparameter values used in their work: batch size, number of epochs, the parameters for the Gabor filter initialization, etc.
• The authors should consider performing cross-validation to improve the statistical significance of their results, especially as some of the performance figures in the ablative study seem very similar.
• For future work it would be interesting to apply the proposed architecture to images containing both curvilinear and non-curvilinear structures. I would also be very interested to see this work extended to the 3D case.

Overall, the figures are very confusing and should be significantly reworked.

• Figure 1
• It is not clear where the MASC Unit is in Figure 1 (b). Figure 1 (b) shows parts of the MASC Unit, such as MAC Unit and the pyramid pooling, which are displayed again in Figure 1 (c). To avoid confusion, the authors should only display MASC Unit in Figure 1 (b).
• Figure 1 (c) is also very confusing. In the text, $R(x)$ is supposed to be the output of one MAC Unit without pyramid pooling, and the normalization factor is supposed to be applied on $R(x)$. However in Figure 1 (c) the normalization factor is applied on the output of the pyramid pooling. The authors should add a separate figure for MAC Unit to clarify this matter.
• The L2 pooling “node” mentioned in the caption should appear in the figure.
• Figure 2 is not clear at all. What does each column represent? What do both rows in “initial” and “after training” represent? What do the 5 images named correlation 1, 2… 5 represent? The authors mentioned that they correspond to the matrix M. Is there one M per MASC Block? If so, why not 10 (one for each MAC Unit used)? In any case, the authors should comment on this figure in the text to explain what they observe, both on the graphs and on the images.
• Figure 3 (c) seems interesting but is not commented on anywhere in the text. Please comment on this figure.
• Figure 3 (d) is too small and seems very similar to Figure 2. The associated text states “ The shape response approach has encouraged the kernels to represent rotated versions of the same pattern.”. This is a very interesting statement, but it does not seem substantiated by this figure in its current form. This point should be better explained and illustrated.

Typos and minor corrections

• The acronym MACU should be defined.
• Abstract: “to achieve a given performance than…” ==> “to achieve a better performance than…”
• Sec 3.4 “The input is to two nodes”. I assume there is a typo here but I am not sure what the authors mean. Please clarify.
• Sec 3.4 “By stacking 5 MASC blocks which includes 2 independent 8-directional MAC Unit, a MASC-5-2-8 model is illustrated in 1(a).”. This sentence is not clear, please rephrase it.
• Sec 4: The authors mentioned “the Ground truth of 1stHO”, please explain what “1stHO” is.
• Sec 4: “The plan of testing our algorithm with other tasks and initialize it compatible is in our future work.”. This sentence is not grammatically correct, please rephrase.
• Sec 4, ablation experiments: “As a baseline replaced all MACU blocks with 8-8 convolutional layer with 3 × 3 kernel size (using more parameters), MASC-replaced. This demonstrating the benefits of the MASC Units.”. Please clarify this sentence. Overall, the phrasing of the ablation experiments section should be improved.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The article is overall well written and I am really excited by this work. Steerable filters were very popular for curvilinear structure segmentation, but they are computationally costly and complex to use in practice. Proposing a deep learning-based implementation is therefore very interesting.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

3

• Reviewer confidence

Confident but not absolutely certain

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All reviewers recognized the relevance of the proposed approach and its efficiency in terms of parameter reduction and performance. The reviewers suggested clarifying several aspects of the work in terms of justification (capturing the self-similarity of vessels) and pipeline (why and how two MAC Units are used per MASC, applied to which inputs, etc). References to some important works also appear to be missing, such as: Bekkers et al., Roto-translation covariant convolutional networks for medical image analysis (2018), in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, pp. 440–448; Andrearczyk et al., Local Rotation Invariance in 3D CNNs (2020), in: Medical Image Analysis, 65(101756).

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

# Author Feedback

We thank the reviewers for their useful comments. We have made minor changes to the paper to clarify some sections as requested. Here we address the three main points in the reviews.

1) What is the motivation for the work? We aimed to design a system to locate curvilinear structures that takes advantage of prior knowledge of their symmetries in order to minimise the number of model parameters used. We thus created a set of filters which are encouraged to be rotated versions of a base filter shape. Though MASC is not inspired by the idea of ‘matched filters’, the two share some common characteristics. The MASC approach can also be thought of as a variant of an ‘attention’ mechanism: the directional response vector V can be regarded as the query vector, and the optimal response matrix M can be treated as the keys. Unlike the common self-attention unit, the encoders for query and key are the same here. Also, it does not use context information to compute the keys but uses a learned bank of directional patterns. Compared to ViT (vision transformer) on ImageNet, a medical task typically does not contain many different kinds of objects. This mechanism is different from ‘self-attention’. In the paper, we describe it as ‘self-similar’ and name it response shaping.

2) Explain the MASC Unit more clearly, particularly the response shaping. The general pattern of curvilinear structures is part of the prior knowledge encoded in the model. We leverage this strong prior assumption by initializing the filters with Gabor-like ridged shapes. A natural approach would be to apply each filter at a point and choose the maximum response as the main output. However, when training such a system the information from each example only contributes to one of the filters at a time. As an improvement, the model not only encourages the convolutional responses to be strong but also expects the responses to follow a precise distribution across the direction channels. This process is the response shaping, and the distribution is defined in the matrix M. Compared to the approach of taking the simple maximum, where only a single directional response is used, a MASC unit focuses on the distribution and reuses the information from the unmatched directions. These unmatched directional kernels are not random filters, so their responses are also informative. Depicting the input pattern with N correlation scores related to N other known patterns (filters) is more accurate than only using the maximum. From the perspective of gradient flow, the proposed method includes more parameters in propagation, which strongly benefits the training.

3) Availability of code: We will make a reference implementation of the code publicly available.
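The contrast drawn in point 2 between taking the maximum and response shaping can be sketched as follows. This is a toy illustration under stated assumptions, not the formulation in the paper: M is taken to be a hypothetical matrix whose row k is a Gaussian-shaped, unit-norm response profile centred on direction k, and the shaped score is taken to be the cosine similarity between the directional response vector V and each row of M. The names `ideal_response_matrix`, `max_pooled` and `shaped_scores` are invented for this example.

```python
import numpy as np


def ideal_response_matrix(n_dirs=8, width=1.0):
    """Hypothetical M: row k is the unit-norm response profile expected across
    all direction channels when the structure is aligned with direction k."""
    idx = np.arange(n_dirs)
    diff = np.abs(idx[:, None] - idx[None, :])
    circ = np.minimum(diff, n_dirs - diff)   # circular distance between channels
    m = np.exp(-0.5 * (circ / width) ** 2)   # peak at k, smooth fall-off
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def max_pooled(v):
    """Baseline: keep only the strongest directional response."""
    return float(np.max(v))


def shaped_scores(v, m, eps=1e-8):
    """Cosine similarity between v and every row of M, one score per direction."""
    return m @ (v / (np.linalg.norm(v) + eps))


M = ideal_response_matrix()
peak = float(M[2].max())

# Two response vectors with the same maximum in channel 2.
v_ridge = M[2].copy()          # responses distributed as M expects
v_spike = np.zeros(8)
v_spike[2] = peak              # isolated spike: same maximum, wrong distribution

# Gradient of each output with respect to the eight directional responses.
grad_max = np.zeros(8)
grad_max[np.argmax(v_ridge)] = 1.0   # max: only the winning channel gets gradient
grad_shaped = M[2]                   # un-normalised score M[2] @ v: every channel does
```

In this toy setting `max_pooled` cannot tell `v_ridge` from `v_spike`, while `shaped_scores` rates the ridge-like distribution higher. The gradient of the maximum reaches a single direction channel, whereas the gradient of the shaped score reaches all eight, which is one way to read the remark about including more parameters in propagation.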