
# Authors

Song Wang, Yuting He, Youyong Kong, Xiaomei Zhu, Shaobo Zhang, Pengfei Shao, Jean-Louis Dillenseger, Jean-Louis Coatrieux, Shuo Li, Guanyu Yang

# Abstract

Renal chamber segmentation on CT images is of great significance to the diagnosis and treatment of kidney diseases. However, due to the particularity of 3D renal CT images, fine annotation is difficult. It is therefore necessary to develop a weakly supervised method that can train the segmentation model on a weakly labeled dataset. In this article, we propose a novel deep learning framework, named the Cycle Prototype Network. This framework has three main contributions: 1) we propose a cyclic prototype learning framework for weakly supervised learning, which forms a regularization through the reverse prediction from the query set to the support set, making the model more robust in the weakly supervised scenario; 2) we propose a Bayes weakly supervised learning module based on multimodal prior knowledge, which learns prior knowledge from multimodal unlabeled data and performs error correction autonomously, thereby expanding the training set and improving the generalization ability of the model; 3) we introduce a fine decoding feature extraction network that combines location information with fine-grained feature information and learns detailed features better. We verified the Cycle Prototype Network on a dataset of 60 three-dimensional CT images. The Dice scores for cortex and medulla prediction reached 0.7936 and 0.7905 respectively, a significant improvement over other methods. These results show that our proposed method has great application potential.

SharedIt: https://rdcu.be/cyl26

# Reviews

### Review #1

• Please describe the contribution of the paper

In this paper, the authors propose three innovative structures to address the problems in kidney segmentation caused by unclear boundaries, thin structures and large anatomical variation in 3D CT images of the kidney. Cycle Prototype Learning is used to improve the robustness of the model. The Bayes Weakly Supervised Module generates accurate pseudo labels. The Fine Decoding Feature Extractor combines global and local information to achieve fine segmentation. The experimental results in the paper show that the proposed framework improves on the typical prototype model PANet by about 20%.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

This paper presents a Fine Decoding Feature Extractor (FDFE) for fine-grained feature extraction. It combines global morphology information and local detail information to obtain feature maps with sharp detail, so the model will achieve fine segmentation on thin structures.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

This paper is based on the cycle network; however, the comparison methods do not include any method of this kind.

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The author can make the code public.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Comments: Regarding 3D renal compartment segmentation on CT images, this paper proposes a weakly supervised learning framework, the Cycle Prototype Network. The authors propose three innovative structures to address the problems in kidney segmentation caused by unclear boundaries, thin structures and large anatomical variation in 3D CT images of the kidney. Cycle Prototype Learning is used to improve the robustness of the model. The Bayes Weakly Supervised Module generates accurate pseudo labels. The Fine Decoding Feature Extractor combines global and local information to achieve fine segmentation. The experimental results in the paper show that the proposed framework improves on the typical prototype model PANet by about 20%.

My concerns regarding the proposed framework are listed as follows.

1. What are CTA and CTU images?
2. It is mentioned in Section 2.2 that the prior knowledge extraction process uses the different appearance of compartments in CTA and CTU images to produce a prior prediction of renal compartments in this paper. The definition of prior knowledge here is rather vague. If we can give an accurate text definition or picture description, it will help to enhance the persuasiveness of the paper.
3. It is mentioned in Section 2.3 that a combination of morphology and detail information makes the output feature maps have sharp detail features. However, in Fig. 2 (c), there is no morphology shown. The following text does not point out in detail how morphology is combined with detailed information to solve the problem of thin structure. It would be better to talk about it at this point.
4. What is the difference between FDFE and U-Net? Why can the FDFE extract the preferable feature? Please justify the contribution of this paper.
5. Do FDFE share parameters during forward and reverse?
6. As mentioned in “CPL structure for consistency regularization,” support and query are different samples. How can a labeled sample be used to predict an unlabeled one?
7. Is a sample a set of CT images?
8. What is the role of mask average pooling?
9. The training process is confusing: are there only 4 images in the training set? By the way, the title involves “….Weakly-supervised …”
10. In Table 1, I notice that U-Net without prototypical methods already reaches a Dice of 72.7%. Combined with prototypical methods, U-Net might exceed the current best Dice of 78.4%.
11. This paper is based on the cycle network; however, the comparison methods do not include any method of this kind.

borderline reject (5)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper conducts renal compartment segmentation. The application is novel, but the paper is not well organized.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

2

• Reviewer confidence

Very confident

### Review #2

• Please describe the contribution of the paper

The authors propose a Cycle Prototype Learning (CPL) framework and a Cycle Prototype Network (CPNet), which consists of a so-called Bayes Weakly Supervised Module (BWSM) and a Fine Decoding Feature Extractor (FDFE). It is highly influenced by the PANet paper. The authors use their own dataset and train on only 4 of 30 well-annotated images, achieving a Dice score of 79.1%.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Interesting problem: weakly annotated datasets are common in the medical and biomedical field. The qualitative and quantitative results seem good. It is interesting that only four fully annotated images were needed. It is a good example and variation of PANet suited to their problem.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper tackles many things and it’s hard to say what the real impact is here. It lacks a more detailed ablation study. It seems that methods used for comparison are not well suited.

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The method will be hard to reimplement, since it requires specific data and the data is not public.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

A better ablation study focusing on the most impactful part of the work would be great to see.

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method seems to work well. It seems to be a good extension of PANet. I am not convinced that the comparison is fair, though.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

3

• Reviewer confidence

Somewhat confident

### Review #3

• Please describe the contribution of the paper

This paper introduces a novel weakly supervised learning approach for 3D renal compartment segmentation. A novel Cycle Prototype Learning (CPL) is proposed to learn consistency, a Bayes Weakly Supervised Module (BWSM) to produce accurate pseudo labels, and a Fine Decoding Feature Extractor (FDFE) to learn fine-grained features. I find the idea of this paper is interesting and novel, and the proposed method achieves promising results.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- This paper is well-written and structured, and explains all novel designs clearly.
- The idea of cycle prototype learning is novel and interesting.
- The use of prior knowledge and Bayes theory to improve the generalization ability makes sense.
- FDFE combines location information and detail information to extract fine-grained features, which can capture local details more reliably.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Generally, this paper is interesting and well written. Here, we only provide some minor comments. It is not clear why the proposed method can improve the generalization ability. It would be much better if the authors could also provide some experiments to highlight this point. The codes and models are not provided.

• Please rate the clarity and organization of this paper

Excellent

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The codes and models are not provided.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

This paper introduces a novel weakly supervised learning approach for 3D renal compartment segmentation. A novel Cycle Prototype Learning (CPL) is proposed to learn consistency, a Bayes Weakly Supervised Module (BWSM) to produce accurate pseudo labels, and a Fine Decoding Feature Extractor (FDFE) to learn fine-grained features. I find the idea of this paper interesting and novel, and the proposed method achieves promising results. However, it is not clear why the proposed method can improve the generalization ability. It would be much better if the authors could also provide some experiments to highlight this point.

strong accept (9)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Generally, I like the idea of this work. I feel the use of weakly supervised learning approach for 3D renal compartment segmentation is very interesting. The experiments clearly show its superiority compared to existing alternatives.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The authors introduce a pipeline for kidney segmentation, in which the core of the framework is based on weakly supervised learning. The authors introduce a novel Cycle Prototype Learning (CPL) to learn consistency, a Bayes Weakly Supervised Module (BWSM) to produce accurate pseudo labels, and a Fine Decoding Feature Extractor (FDFE) to learn fine-grained features. Local data sets were used for evaluation, and training used only 4 out of 30 well-annotated images. They achieved a Dice score of 79.1%, an increase of around 20% compared with the prototype model PANet [16]. Overall, the paper quality is satisfactory and it addresses an important problem for biomedical image analysis, i.e., weakly annotated datasets. The evaluation seems promising as well. However, there are some points to be addressed to highlight the paper's significance. For the data in Table 1, provide the p-values of an appropriate statistical test. Another point is the minimum number of training data sets needed for acceptable performance: Fig. 5(b) is not very informative beyond n=5. This should be addressed or at least discussed in the paper. Please add more details about the mask average pooling (benefits, effect, etc.). Please add the symbols (is, qs, ys, etc.) to Figure 2. Moreover, the authors introduce three main components/structures, and the results lack details about the ablation study in Table 1 needed to see their impact. One more point is that the compared methods are standard deep learning methods, whereas the work is based on the cycle network; Table 1 does not contain this kind of network.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

# Author Feedback

We want to thank all three reviewers and the AC for their very positive appreciation of our work: a) Large innovation (R1: “innovative structures”; AC: “novel”; R3: “idea of this paper is interesting and novel”). b) Clinical significance (R2: “Interesting problem”; AC: “important problem for biomedical image analysis”). c) Great writing (R3: “well-written and structured”, “explains clearly”; AC: “quality is satisfactory”). d) Great performance (R2: “qualitative and quantitative results seem good”; R3: “improve the generalization”).

We thank the AC for the complete summary and helpful suggestions:

-Why our method improves generalization (R3): Our innovations significantly improve generalization via consistency regularization, pseudo-label generation and fine-grained feature extraction. Our CPL structure forces the features to be consistent between the forward and reverse processes, providing a consistency regularization that improves generalization. Our BWSM enlarges the training dataset by combining prior knowledge with unlabeled data, which yields better generalization. Our FDFE spatially aligns the features from different paths for fine-grained feature fusion, thus achieving better generalization for thin structures in a large space.
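To illustrate the forward and reverse consistency idea in the paragraph above, here is a minimal NumPy sketch. It is not the authors' implementation: the helper names `class_prototypes` and `predict`, the flattened `(N, C)` feature layout, the cosine-similarity assignment and the toy values are all assumptions made for this example, and the mismatch rate at the end only stands in for whatever differentiable loss is used in training.

```python
import numpy as np


def class_prototypes(features, labels, num_classes):
    """Mean feature vector per class (assumes every class has at least one sample).

    features: (N, C) array, one feature vector per voxel.
    labels:   (N,) integer class labels.
    """
    return np.stack([features[labels == k].mean(axis=0) for k in range(num_classes)])


def predict(features, protos):
    """Assign each feature vector to the prototype with the highest cosine similarity."""
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return (f @ p.T).argmax(axis=1)


# Hypothetical toy features: a labeled support set and an unlabeled query set.
support_feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
support_labels = np.array([0, 0, 1, 1])
query_feats = np.array([[0.8, 0.2], [0.2, 0.8]])

# Forward process: prototypes from the labeled support set label the query set.
query_pred = predict(query_feats, class_prototypes(support_feats, support_labels, 2))

# Reverse process: prototypes from the predicted query labels re-label the support set.
support_pred = predict(support_feats, class_prototypes(query_feats, query_pred, 2))

# Consistency check: the reverse prediction should reproduce the known support labels.
cycle_error = np.mean(support_pred != support_labels)
```

In this toy case the reverse prediction recovers the support labels, so `cycle_error` is zero. A nonzero value would signal that the forward prediction on the query set was inconsistent with the labeled support set.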

-Misunderstanding of our comparison (R1, R2): a) Actually, a cycle network has been compared in our experiment. PANet in Tab. 1 is a cycle network. It uses a reverse process, which is the typical characteristic of a cycle structure, although its authors did not claim or analyze this in their study. Our model achieves 22.4% higher Dice than PANet, owing to our further study of the prior knowledge from unlabeled data and the fine-grained features in FDFE. PANet only works on fully labeled datasets, which is a major limitation.

b) Our comparisons are convincing. We compared with two classical deep learning networks (U-Net, SegNet), which provides convincing evidence for the superiority of our CPL in our complex, few-label situation. We also compared with a classical cycle-based prototype network (PANet), which provides convincing evidence for the generalization gained from our FDFE and BWSM.

-Misunderstanding of our ablation studies (R2): Actually, we have analyzed the details of our ablation study (Tab. 1) in the “Analysis of innovations” section, which clearly illustrates the impact of each innovation. Our BWSM embeds prior knowledge from unlabeled images, yielding a 2.2% Dice improvement. Our FDFE extracts fine-grained features, achieving a 20.6% improvement over PANet. Compared with the standard DL models, the cycle prototype structure yields an 18.9% improvement owing to the cycle regularization.

-Our informative ablation of label amount (AC): a) Our ablation (n<6) is informative in showing how our performance changes. Fig. 5(b) demonstrates that model performance increases as the label amount increases. The increase slows and tends to flatten once n reaches 4.

b) Our ablation (n<6) also provides evidence for the label amount chosen in our few-label task. As demonstrated in Fig. 5(b) and emphasized in our paper, the performance increase becomes slow at n=4, so 4 is the minimum number of training data sets for acceptable performance.

-Thanks for the other minor suggestions: *CTA is CT Angiography and CTU is CT Urography [doi: 10.1148/rg.263055186, 10.1007/s00330-007-0792-x].

*Prior knowledge is the grayscale difference caused by the contrast agent at different stages of metabolism (CTA, CTU). Our BWSM extracts this difference to generate pseudo labels, thus improving the model’s generalization.

*FDFE shares parameters within our CPL structure, which provides the consistency regularization.

*Our FDFE outperforms U-Net. Our FDFE spatially aligns the global features in the decoder with the local features from the skip connections via up-pooling, thus enabling fine-grained feature fusion. In contrast, U-Net loses spatial alignment during up-sampling, which limits the fusion of fine features.

*Masked average pooling is used to extract prototypes, each of which represents a class, from the features within that class’s region.
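As an illustration of the masked average pooling described above, the following is a minimal NumPy sketch rather than the authors' code. The function name `masked_average_pooling`, the `(C, D, H, W)` feature layout, the `eps` guard and the toy values are assumptions made for this example.

```python
import numpy as np


def masked_average_pooling(features, mask, eps=1e-8):
    """Average the feature vectors that fall inside one class region.

    features: (C, D, H, W) array, one C-dimensional vector per voxel.
    mask:     (D, H, W) binary array, 1 inside the class region.
    Returns a prototype vector of shape (C,).
    """
    mask = mask.astype(features.dtype)
    # Zero out voxels outside the region, then sum over the spatial axes.
    masked_sum = (features * mask[None]).sum(axis=(1, 2, 3))
    # Divide by the number of voxels in the region; eps guards an empty mask.
    return masked_sum / (mask.sum() + eps)


# Hypothetical toy input: 2 channels on a 1x2x2 volume.
features = np.array([[[[1.0, 2.0], [3.0, 4.0]]],
                     [[[10.0, 20.0], [30.0, 40.0]]]])
mask = np.array([[[1, 0], [0, 1]]])

# Only the two masked voxels contribute, giving approximately [2.5, 25.0].
prototype = masked_average_pooling(features, mask)
```

Each class gets its own mask and therefore its own prototype, and voxels of a new image can then be labeled by comparing their features with these prototypes.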

# Post-rebuttal Meta-Reviews

## Meta-review #1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have properly clarified the concerns/issues raised.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal has clarified several major concerns raised in the first round review. The authors have compared with a cycle network via the comparison with PANet. The ablation study has demonstrated the improvement of the proposed method.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal clearly addressed concerns raised during review.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3