
# Authors

Huimin Huang, Nan Zhou, Lanfen Lin, Hongjie Hu, Yutaro Iwamoto, Xian-Hua Han, Yen-Wei Chen, Ruofeng Tong

# Abstract

Semi-supervised learning (SSL) algorithms have attracted much attention in medical image segmentation because they use unlabeled data to ease the challenge of acquiring pixel-wise annotations. However, most existing SSL methods neglect the geometric shape constraints of objects, leading to unsatisfactory boundaries and non-smooth objects. In this paper, we propose a shape-aware semi-supervised 3D medical image segmentation network, named 3D Graph-S2Net, which incorporates flexible shape information and learns duality constraints between semantics and geometrics in the graph domain. Specifically, our method consists of two parts: a multi-task learning network (3D S2Net) and a graph-based cross-task module (3D BGCM). The 3D S2Net improves the existing self-ensembling model (i.e., the Mean-Teacher model) by adding a signed distance map (SDM) prediction task, which encodes richer features of object shape and surface. Moreover, the 3D BGCM explores the co-occurrence relations between the semantic segmentation and SDM prediction tasks, so that the network learns stronger semantic and geometric correspondences from both labeled and unlabeled data. Experimental results on the Atrial Segmentation Challenge confirm that our 3D Graph-S2Net outperforms state-of-the-art methods in semi-supervised segmentation.

# Link to paper

SharedIt: https://rdcu.be/cyl2J


# Reviews

### Review #1

• Please describe the contribution of the paper

The paper presents a semi-supervised learning strategy for 3D medical image segmentation, which integrates a CNN-based segmentation network with a graph convolutional network and trains them in the mean-teacher framework. For the CNN part, it adopts a multi-task design that predicts both segmentation mask and SDM. The proposed method is evaluated on the Left Atrium dataset with comparisons to the prior work.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The paper presents a hybrid neural network trained with the mean-teacher strategy, which seems reasonable for the semi-supervised segmentation.
• The proposed method achieves good results on the Left Atrium benchmark.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• The novelty of this work is low in several aspects and the paper lacks a discussion on the related work:
• The two-branch design of segmentation mask and SDM has been used in [8].
• The graph-based feature enhancement is widely used in the semantic segmentation literature, such as non-local neural networks (CVPR 2018), and latent GNNs (ICML 2019).
• The mean-teacher framework is proposed in [10] and a two-branch segmentation network with MT training has been validated in [8].
• The presentation of the main component, the BGCM network, is unclear, and adopting such a complex design is not well justified.
• In Sec 2.2, some of the notations are not clearly defined. What are ‘C’ and ‘L’ in the graph projection?
• Why do you need a bilateral graph? Why not simply use a non-local neural network?
• Experimental evaluation is not very convincing due to the following reasons:
• While the paper presents an ablative study on its BGCM module, it lacks comparisons with the default non-local graph neural network.
• The overall improvement over the prior SOTA [8] is underwhelming, around 1% in both 16 and 8 labeled settings. For the visualization, it seems to be a biased selection of results, as it shows the UA-MT is better than SASSNet but in Table 2, UA-MT is worse than SASSNet on average.
• Only one dataset is evaluated and the paper should include at least one more dataset for testing.
• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The submission includes code, but parts of the paper's presentation lack clarity.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

See item 4 above for detailed comments.

• Please state your overall opinion of the paper

probably reject (4)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper has several over-claims on its contributions, and its experimental evaluation lacks convincing results to justify its model design.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

This work proposes a shape-aware semi-supervised 3D medical image segmentation framework called Graph-S^2Net. 3D Graph-S^2Net mainly consists of two parts: a 3D S^2Net that is formulated as a mean-teacher multi-task learning network that tries to predict both the segmentation map and signed distance map; and a 3D BGCM that is formulated as a bilateral graph convolutional network that tries to explore the relations between the semantic maps and signed distance maps. By conducting extensive experiments, the authors show that the proposed method outperforms the state-of-the-art methods on the Atrial Segmentation Challenge dataset.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The paper proposes different auxiliary tasks for semi-supervised medical image segmentation, such as predicting the signed distance map (handled by 3D S^2Net) and learning the correspondence between semantic segmentation and the signed distance map (learned by 3D BGCM); it is surprising to see that all of these tasks work well together and contribute to the final performance improvement. (2) The paper does a good job on the ablation study to show the effectiveness of each model component. (3) Overall, the paper is well written and clearly presented to readers. Figures are well made to illustrate the main network architectures and experiment results.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

(1) From the paper alone, it is a little hard for readers to see how the 3D BGCM is trained and updated. Upon checking the code, if I understand it correctly, the training of 3D BGCM is also guided by Lsg and Lsdm. This point should be made clearer in the paper. (2) The motivations of some operations inside 3D BGCM are not well described. For example, the graph projection that maps the feature map X onto a set of node features H in the graph domain, which requires conv, anc, multiplication and softmax layers, is not well motivated. Admittedly, I am not an expert on graph projection, and would therefore love to know more about the motivations behind this design. Is this the common way to do a graph projection? If so, what is the purpose of each step? If not, why is this design better than other graph projection methods? I think answering these questions would help with the clarity of this paper and better motivate the readers. (3) The authors only provide model evaluation on a single dataset, which may not be sufficient to validate the model's effectiveness. The paper would be stronger if the authors could provide validation on other datasets.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors provide code for this submission, which covers the construction of the proposed model architecture. This will help increase the reproducibility of the paper. However, the complete training code is not provided. Providing the complete code would greatly help reproducibility.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Please find the comments in the section above. Other minor suggestions are: (1) How did the authors obtain the SSL experimental results? Did the authors run each experiment multiple times and report the mean of the validation metrics? Can the authors also report the mean and variance over different runs and compare them between the 16-labeled and 8-labeled cases? Usually, we will see a larger variance if the labeled data is too limited in SSL. (2) It may be better to label the horizontal and vertical axes in Fig 3 (c) to make it self-explanatory, although the authors state them in the figure caption.

• Please state your overall opinion of the paper

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall this paper is clearly written and well motivated. The proposed method 3D Graph-S^2Net is novel and proven to be effective in 3D segmentation tasks. Extensive experiments are done to show it surpasses the current state-of-the-art methods.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

(i) They propose a 3D Graph-S2Net to enforce semantic and geometric constraints in semi-supervised medical image segmentation. It combines a multi-task learning framework (3D S2Net) and a graph-based cross-task module (3D BGCM) reasoning between tasks. (ii) They propose a 3D BGCM to enforce duality constraints between semantics and geometrics by using bilateral graph convolution, which globally mines the intra-task and inter-task relations.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The authors propose a 3D BGCM to enforce duality constraints between semantics and geometrics by using bilateral graph convolution, which globally mines the intra-task and inter-task relations.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1, I’m not sure if you modified the template, but I find that the space near some functions and figures is smaller than in other papers. On the MICCAI2021 website, a document says that “Using commands like \vspace and \hspace in LaTeX is strictly prohibited.” and “No modifications to the templates are permitted. Failure to abide by the formatting guidelines will result in immediate rejection of the paper.” https://miccai2021.org/files/downloads/MICCAI2021-Submitting-to-MICCAI-Avoiding-Desk-Reject.pdf 2, Some papers have already optimized the segmentation label and SDM together, hence it may not be your contribution.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors have already published their code.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

1, Were all the compared methods in your experiments implemented by yourselves? You should cite results reported by other papers on the same dataset. 2, I think your method would consume a lot of GPU memory; you should list your hardware environment.

• Please state your overall opinion of the paper

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

They combine the CNN and GCN in the same framework and outperform the state-of-the-art methods.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper presents a semi-supervised 3D medical image segmentation network, including a multi-task learning framework and a graph convolutional network. The multi-task learning framework adds an SDM prediction task to improve the existing self-ensembling model. The graph convolutional network tries to explore the relationship between the semantic segmentation and SDM prediction tasks. The proposed method is evaluated on the Left Atrium dataset with comparisons to the prior work. The paper presents a hybrid neural network whose components work well together and contribute to the final performance improvement. The motivations of some operations inside 3D BGCM are not clearly described, and it is unclear whether such a complex design is necessary or justified. The results of testing on only one dataset are not convincing enough.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

# Author Feedback

Comment 1: The motivations of some operations inside 3D BGCM are not clearly described, and it is unclear whether such a design is justified. (Reviewer #2, Meta Review)

Compared with previous graph-based methods, 3D BGCM has two main novelties: (1) Bilateral graph reasoning models the intra- and inter-task relations, which achieves higher performance (justified by our experiments); the detailed motivation is given in [Comment 3]. (2) Graph projection aggregates pixels whose features are similar to each anchor into one node, which has a lower computational cost (the justification is given in Step 2, point (2) below).

The motivation of each operation (step) inside graph projection is summarized below:

• Step 1: We reduce the size of the input feature map $X$ via convolutions ($\phi$ and $\rho$) with stride $\gamma$.
• Step 2: We use an average pooling ($Anc$) with stride $\varepsilon$ to obtain the anchors of the nodes. These anchors represent the centers of each region of pixels, and their benefits are twofold: (1) The pooling leads to compact representations by averaging over features to remove redundancy. (2) Benefiting from the strides $\varepsilon$ and $\gamma$, the computation cost in Step 3 is reduced from $O(H^2W^2T^2C)$ in non-local modules to $O(H^2W^2T^2C/(\gamma\varepsilon )^3)$.
• Step 3: We multiply $\phi(X)$ by the anchors to capture the similarity between the anchors and each pixel, and obtain the projection matrix $P$.
• Step 4: We use a softmax layer to constrain the range of $P$ to (0,1).
• Step 5: We map $X$ into the graph domain by multiplying $\rho(X)$ and $P$, so that each node represents a region in the images.
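The five graph-projection steps above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the strided convolutions $\phi$ and $\rho$ are replaced by a hypothetical stand-in (subsampling with stride `gamma` followed by a per-voxel linear map), the pooling windows are assumed to be non-overlapping, and the softmax axis (over nodes, for each pixel) is an assumption, since the rebuttal does not specify it. The names `graph_projection`, `w_phi` and `w_rho` are illustrative only.

```python
import numpy as np

def graph_projection(x, w_phi, w_rho, gamma=2, eps=2):
    """Project a feature map x of shape (H, W, T, C) onto graph nodes.

    Returns (nodes, p): node features (num_nodes, C_out) and the
    projection matrix p of shape (num_pixels, num_nodes).
    """
    # Step 1 (stand-in for strided convs phi and rho): subsample with
    # stride gamma, then apply a per-voxel linear map.
    xs = x[::gamma, ::gamma, ::gamma, :]
    phi = xs @ w_phi                      # (h, w, t, c)
    rho = xs @ w_rho                      # (h, w, t, c)
    h, w, t, c = phi.shape

    # Step 2: anchors via average pooling of phi with stride eps
    # (assumes h, w and t are divisible by eps).
    anchors = phi.reshape(h // eps, eps, w // eps, eps, t // eps, eps, c)
    anchors = anchors.mean(axis=(1, 3, 5)).reshape(-1, c)   # (num_nodes, c)

    # Step 3: similarity between every pixel and every anchor.
    logits = phi.reshape(-1, c) @ anchors.T                 # (num_pixels, num_nodes)

    # Step 4: softmax keeps the entries of P between 0 and 1
    # (normalising over nodes is an assumption of this sketch).
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p = p / p.sum(axis=1, keepdims=True)

    # Step 5: aggregate rho(X) into node features using P.
    nodes = p.T @ rho.reshape(-1, rho.shape[-1])            # (num_nodes, c)
    return nodes, p

# Usage with random data: an 8x8x8 map with 4 channels becomes 8 nodes.
rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 8, 4))
nodes, p = graph_projection(x, rng.standard_normal((4, 3)), rng.standard_normal((4, 3)))
print(nodes.shape, p.shape)  # (8, 3) (64, 8)
```

With `gamma=2` and `eps=2`, the 512 input voxels are reduced to 64 pixels and 8 anchors, so the similarity matrix in Step 3 has 64 x 8 entries instead of 512 x 512.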

Comment 2: Experiments on one dataset were not convincing enough. (Reviewer #1, 2, Meta Review)

Additional experiments on the pancreas were conducted to further illustrate the generalization performance of our model. We used the NIH pancreas segmentation dataset [3] (62 CT scans for training, 20 CT scans for testing), which is widely used in semi-supervised segmentation. The comparison results with 12 labelled training scans are shown below; our 3D Graph-S2Net outperformed the other existing methods. Each entry lists (Dice, Jaccard, ASD, 95HD):

• V-Net: (70.63%, 56.72%, 6.29, 22.54)
• EM [28]: (75.31%, 61.73%, 3.88, 11.72)
• CCT [29]: (76.58%, 62.76%, 3.69, 12.92)
• MT [10]: (75.85%, 61.98%, 3.40, 12.59)
• UA-MT [12]: (77.26%, 63.82%, 3.06, 11.90)
• SASSNet [8]: (77.66%, 64.08%, 3.05, 10.93)
• Our 3D Graph-S2Net: (78.77%, 65.13%, 2.08, 8.17)

Comment 3: Why do you need a bilateral graph, instead of a non-local network? (Reviewer #1)

Our bilateral graph has two main advantages: (1) Unlike the non-local network, which only captures intra-task relations, our bilateral graph convolution module can mine the intra- and inter-task relations simultaneously, which explores co-occurrence relations and diffuses information between tasks. (2) Lower computational cost: given the feature map $X\in \mathbb{R}^{H\times W\times T\times C}$, the non-local network requires a computation of $O(H^2W^2T^2C)$, while our graph convolution has a lower complexity of $O(|N|^2C+|N|C^2)$, where $|N|$ is the number of nodes ($|N|\ll HWT$).
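To make the two complexity expressions above concrete, the short sketch below evaluates them for one set of sizes. The sizes ($H=W=T=16$, $C=32$, $|N|=64$) are hypothetical values chosen only for illustration and are not taken from the paper; the functions count only the terms stated in the rebuttal and ignore constant factors.

```python
def non_local_cost(h, w, t, c):
    """Operation count of the non-local term O(H^2 W^2 T^2 C)."""
    return (h * w * t) ** 2 * c

def graph_conv_cost(n_nodes, c):
    """Operation count of the graph-convolution term O(|N|^2 C + |N| C^2)."""
    return n_nodes ** 2 * c + n_nodes * c ** 2

# Hypothetical sizes, for illustration only.
h = w = t = 16
c = 32
n_nodes = 64

nl = non_local_cost(h, w, t, c)
gc = graph_conv_cost(n_nodes, c)
print(nl)  # 536870912
print(gc)  # 196608
print(nl > gc)  # True
```

For these assumed sizes the graph-convolution term is smaller by more than three orders of magnitude, which is the effect of $|N|\ll HWT$.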

Comment 4: The novelty of this work is low in several aspects. (Reviewer #1)

Compared with previous work, our method has two main novelties: (1) We observe the duality constraints between segmentation and SDM prediction: the segmentation provides smoothness and continuity constraints, while the SDM enforces global shape and boundary constraints. Inspired by this, we design a 3D BGCM to explore co-occurrence relations and diffuse information between tasks, which achieves better results than Ref. [8] in Table 2. (2) Compared with the non-local network, our graph-based method has a lower computational cost and directly reasons over regions with explicit semantic meaning [15, 16, 20]. Additionally, our bilateral graph module can mine the intra- and inter-task relations simultaneously, a capability that is lacking in non-local networks and latent GNNs.

# Post-rebuttal Meta-Reviews

## Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper presents a semi-supervised 3D medical image segmentation network, including a multi-task learning framework and a graph convolutional network. The multi-task learning framework adds an SDM prediction task to improve the existing self-ensembling model. The graph convolutional network tries to explore the relationship between the semantic segmentation and SDM prediction tasks. The authors did well in the rebuttal. Most of the questions and concerns raised by the reviewers (including the meta-reviewer) are well addressed.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The proposed method, 3D Graph-S^2Net, seems novel and effective in 3D segmentation. Extensive experiments have shown its superior performance over some state-of-the-art methods.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes a semi-supervised segmentation approach that leverages geometrical constraints. This is enforced by correlating a predicted signed distance function with semantic segmentations using graph super-nodes.

One reviewer questions the novelty relative to [8] (two-branch design with distance function and segmentation) and the comparison choices, and notes confusion in the notation.

A second reviewer notes that it is “surprising that all tasks work well together”, and questions the motivation and explanation of the cross-task module.

A third reviewer questions novelty of the dual branch, and also complains about spacing violation.

There is a consensus that the work relies on existing methods, which lowers the novelty claims; however, their combination shows improvements in atrial segmentation, a difficult task. This is only partially addressed in the rebuttal, since the joint use of a distance function and segmentation has already been proposed in [8]. The improvements appear to be due to the use of a bilateral graph, which may be insufficient to claim strong novelty. The authors have also provided an additional experiment, which is beyond the role of a rebuttal; final experiments should be provided at submission time.

For these reasons, Recommendation is toward Rejection.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

21