
Authors

Himashi Peiris, Zhaolin Chen, Gary Egan, Mehrtash Harandi

Abstract

Segmentation of images is a long-standing challenge in medical AI. This is mainly due to the fact that training a neural network to perform image segmentation requires a significant amount of pixel-level annotated data, which is often unavailable. To address this issue, we propose a semi-supervised image segmentation technique based on the concept of multi-view learning. In contrast to prior art, we introduce an adversarial form of dual-view training and employ a critic to formulate the learning problem in multi-view training as a min-max problem. Thorough quantitative and qualitative evaluations on several datasets indicate that our proposed method outperforms state-of-the-art medical image segmentation algorithms consistently and comfortably. The code is publicly available at https://github.com/himashi92/Duo-SegNet.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_40

SharedIt: https://rdcu.be/cyl2K

Link to the code repository

https://github.com/himashi92/Duo-SegNet

Link to the dataset(s)

https://www.kaggle.com/c/data-science-bowl-2018

http://medicaldecathlon.com/


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper aims at semi-supervised medical image segmentation. The authors propose an adversarial co-training network comprising two segmentation networks and a critic network. The segmentation networks perform dual-view segmentation, and the critic network predicts a confidence map. The proposed method is evaluated on three different medical datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Semi-supervised medical image segmentation is important in real practice.
    2. The authors use three medical datasets for evaluation.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The technical novelty is somewhat incremental, as the whole framework is very similar to an existing work.
    2. The experiments are insufficient to demonstrate the conclusions.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide some code in the supplemental material.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The paper is not clear about multi-view or dual-view images. Are the dual-view images from the same subject but different scanning views, or are they random unpaired images from different subjects?
    2. The critic network estimates confidence maps for predicted segmentations and ground-truth segmentations. The adversarial losses (Eq. 6 and Eq. 7) suggest that the final goal of the critic network is to distinguish the predicted segmentation distribution from the ground-truth segmentation distribution. What is the advantage of using confidence map estimation, instead of using a classifier that directly distinguishes predicted segmentations from ground-truth segmentations as the adversarial loss (see the sketch after this list)?
    3. The authors say, “In practice, we find that tuning the hyper-parameters of the network is not difficult at all and the Duo-SegNet works robustly as long as these parameters are defined in a reasonable range.” However, no experiment shows how sensitive the network is to changes in the hyper-parameters, and there is no explanation of how to decide the “reasonable range”.
    4. Paper [8], which the authors introduced and cited in the related work, has a similar framework architecture (same co-training networks, dual-view setting and L_u loss) and addresses the same problem (semi-supervised image segmentation). However, no comparison of the proposed method with paper [8] is shown in the experiments.
    5. Similar to the above point, the authors say that in paper [8] “the diversity is achieved via adversarial examples following VAT idea. We note that adversarial examples, from a theoretical point of view, cannot guarantee diversity… ” and then explain the superiority of their method in contrast to [8]. I think it would be better to compare quantitative or qualitative results of the proposed method and paper [8] to demonstrate this point of view.
    6. From Table 1, the improvement is quite subtle when comparing the proposed method with VAT on all datasets (in most cases the Dice improvement is about +0.01). Is this a significant improvement?
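
    To make the contrast in comment 2 concrete, the following is a minimal PyTorch sketch (an editor's illustration with invented layer sizes, not the paper's actual architecture) of the two discriminator designs: a pixel-wise critic that outputs a confidence map versus a classifier that outputs a single real/fake score per mask.

        import torch
        import torch.nn as nn

        class MapCritic(nn.Module):
            # Pixel-wise critic: returns a confidence value per pixel, which
            # can localize unreliable regions and supervise unlabeled data.
            def __init__(self, in_ch=2):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(in_ch, 32, 3, padding=1), nn.LeakyReLU(0.2),
                    nn.Conv2d(32, 1, 3, padding=1), nn.Sigmoid())
            def forward(self, mask):              # mask: (B, C, H, W)
                return self.net(mask)             # (B, 1, H, W) confidence map

        class ScalarClassifier(nn.Module):
            # Plain real/fake classifier: one global score per mask, with no
            # spatial information about where a prediction looks wrong.
            def __init__(self, in_ch=2):
                super().__init__()
                self.net = nn.Sequential(
                    nn.Conv2d(in_ch, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
                    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 1),
                    nn.Sigmoid())
            def forward(self, mask):
                return self.net(mask)             # (B, 1) global score

        masks = torch.rand(2, 2, 64, 64)          # two fake 2-class soft masks
        print(MapCritic()(masks).shape)           # torch.Size([2, 1, 64, 64])
        print(ScalarClassifier()(masks).shape)    # torch.Size([2, 1])

    The practical difference is that the map critic's output is spatial, so it can be fed back as a per-pixel supervision signal on unlabeled images, whereas the scalar score carries no information about where a prediction fails.
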
  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of this paper is somewhat incremental, as the main framework is similar to paper [8]. The experiments are insufficient to demonstrate the conclusions.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper introduces a semi-supervised medical image segmentation framework, which is composed of two parallel U-Nets for co-training and a critic network to compute an adversarial loss. Given labeled data, the two parallel U-Nets are trained with normal supervised loss and adversarial loss; given unlabeled data, in addition to the adversarial loss, a similarity loss is defined on the outputs of the two U-Nets, which has a similar form to cross-entropy. The segmentation models and the critic are trained in a min-max manner. The proposed method is evaluated on three datasets with relatively small size, on which it outperforms several other semi-supervised learning approaches.
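
    To fix ideas, here is a minimal PyTorch sketch of one min-max round as described above (an illustration under stated assumptions: the tiny convolutional stand-ins replace the paper's U-Nets and critic, and the loss weights lam_agree, lam_adv are placeholders rather than the paper's values).

        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        # Tiny stand-ins for the paper's two U-Nets and CNN critic
        # (assumption: 2 output channels = background/foreground).
        f1 = nn.Conv2d(1, 2, 3, padding=1)
        f2 = nn.Conv2d(1, 2, 3, padding=1)
        critic = nn.Sequential(nn.Conv2d(2, 1, 3, padding=1), nn.Sigmoid())

        opt_seg = torch.optim.Adam(list(f1.parameters()) + list(f2.parameters()), lr=1e-4)
        opt_cri = torch.optim.Adam(critic.parameters(), lr=1e-4)
        bce = nn.BCELoss()
        lam_agree, lam_adv = 1.0, 0.1          # placeholder loss weights

        def want(c_out, value):                # critic target: all ones or all zeros
            return bce(c_out, torch.full_like(c_out, value))

        def seg_step(x_l, y_l, x_u):
            # Min step: supervised CE + agreement on unlabeled + fool the critic.
            logits = [f(x) for f in (f1, f2) for x in (x_l, x_u)]
            p1_l, p1_u, p2_l, p2_u = [F.softmax(z, dim=1) for z in logits]
            sup = F.cross_entropy(logits[0], y_l) + F.cross_entropy(logits[2], y_l)
            agree = -(p2_u.detach() * torch.log(p1_u + 1e-8)).sum(1).mean() \
                    - (p1_u.detach() * torch.log(p2_u + 1e-8)).sum(1).mean()
            adv = sum(want(critic(p), 1.0) for p in (p1_l, p1_u, p2_l, p2_u))
            loss = sup + lam_agree * agree + lam_adv * adv
            opt_seg.zero_grad(); loss.backward(); opt_seg.step()

        def critic_step(x_l, y_l, x_u):
            # Max step: ground truth is "real", both networks' predictions "fake".
            with torch.no_grad():
                preds = [F.softmax(f(x), dim=1) for f in (f1, f2) for x in (x_l, x_u)]
            y_1hot = F.one_hot(y_l, 2).permute(0, 3, 1, 2).float()
            loss = want(critic(y_1hot), 1.0) + sum(want(critic(p), 0.0) for p in preds)
            opt_cri.zero_grad(); loss.backward(); opt_cri.step()

        x_l, x_u = torch.rand(2, 1, 64, 64), torch.rand(2, 1, 64, 64)
        y_l = torch.randint(0, 2, (2, 64, 64))
        seg_step(x_l, y_l, x_u); critic_step(x_l, y_l, x_u)
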

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The combination of co-training and adversarial loss is novel.

    2. Employing a critic network to provide supervision signal for unlabeled data is shown to be very effective.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The employed datasets are relatively small, and only 20% of each dataset is used for testing, which makes the results less convincing.

    2. The compared methods are all generic machine learning or CV methods, which may not have been well tuned for medical datasets. In fact, there have been quite a lot of semi-supervised learning methods in medical image segmentation, ranging from post-processing-based methods [1] and adversarial/attention-based methods [2] to co-training-based methods [3], etc. Why is none of these semi-supervised methods for medical image segmentation mentioned?

    [1] Bai, W., Oktay, O., Sinclair, M., Suzuki, H., Rajchl, M., Tarroni, G., … & Rueckert, D. (2017, September). Semi-supervised learning for network-based cardiac MR image segmentation. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 253-260). Springer, Cham.

    [2] Nie, D., Gao, Y., Wang, L., & Shen, D. (2018, September). ASDNet: attention based semi-supervised deep networks for medical image segmentation. In International conference on medical image computing and computer-assisted intervention (pp. 370-378). Springer, Cham.

    [3] Xia, Y., Liu, F., Yang, D., Cai, J., Yu, L., Zhu, Z., … & Roth, H. (2020). 3D semi-supervised learning with uncertainty-aware multi-view co-training. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (pp. 3646-3655).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility is good.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The proposed method is called a dual-view approach. However, from Algorithm 1, it seems that the two U-Nets are actually trained with different data sequences rather than different views, which may compromise the power of the co-training paradigm.

    2. In the ablation study, only the efficacy of the critic is studied, and it can be observed that the model performance degrades by a large margin without the assistance of the critic, falling behind both VAT and mean teacher. I am curious what role the co-training is playing in the model. Is the co-training structure necessary when the critic is present?

    3. It is mentioned that the 3D volumes are sliced into 2D images before being fed to the model; how are the final DSC and MAE then obtained? Are they computed on 2D slices or on 3D volumes?

    4. Similarly, when using 5% of the training data, does that mean 5% of the total slices, or do you first pick 5% of all subjects and then convert those subjects to slices? If it means 5% of all slices, there is a risk of information leakage, since slices from the same 3D scan can be very similar (especially when they are near each other).

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The combination of the critic and co-training models is novel and interesting, and semi-supervised learning is indeed a valuable research direction. Hence I hold a positive attitude towards this paper. However, as mentioned above, there is a lack of comparison to related works in the MIA area, and the small sizes of the employed datasets also weaken the reliability of the results. Overall, I recommend borderline acceptance for this paper.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper presents a novel semi-supervised medical image segmentation network. It reformulates multi-view learning, originally developed for classification, to adapt it to segmentation. Since multi-view learning requires learning multiple hypotheses, the authors propose a dual U-Net network (F1 and F2), where both outputs are connected to a critic network (which is simply a discriminator) to ensure that the outputs of the dual U-Net are close to the ground-truth distribution and, more importantly, to establish a kind of agreement loss between the multiple views. Such a formulation permits generating similar predictive distributions for both labeled and unlabeled data. Moreover, the way F1 and F2 are connected to each other and trained simultaneously permits them to learn from one another during training. The authors validate their work on 3 different public datasets of different modalities and annotation settings, showing enhanced results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors presented a novel algorithm for semi-supervised segmentation, where labeled and unlabeled data are used. The novelty lies in adapting the multi-view learning scheme from classification to segmentation and adding a critic network that quantifies the agreement or disagreement across the views. The views here are somewhat similar to ensemble models, where we can train multiple networks and get multiple predictions, with the main difference that the learned hypotheses in multi-view learning are integrated with each other, so that one network assists the prediction of the other. The authors used U-Net as the basic architecture to build their dual network. Each network is fed with labeled and unlabeled data. With the labeled data, a supervised loss is used to guide the network. With the unlabeled data, the authors established a symmetric form of cross-entropy and made it act as an agreement loss during dual-view training, which permits using unlabeled data. The main idea behind this loss is that the two segmentation heads should generate similar segmentation masks for unlabeled data. This idea is novel and valid, as it can help the community that relies on pseudo-labeling to adopt such an approach to correct noisy predictions. The authors professionally leveraged the min-max formulation to produce high-confidence predictions for unlabeled data. Such a formulation permits identifying confident parts of the prediction mask and therefore allows enforcing agreement across the views.
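
    As a hedged illustration of this agreement loss (the paper's exact formulation may differ in normalization and weighting), a symmetric cross-entropy between the two views' softmax outputs can be written as:

        import torch

        def agreement_loss(p1, p2, eps=1e-8):
            # Symmetric cross-entropy between two softmax outputs of shape
            # (B, C, H, W): each view serves as a soft target for the other,
            # pushing the two heads toward the same mask on unlabeled images.
            ce_12 = -(p2.detach() * torch.log(p1 + eps)).sum(dim=1).mean()
            ce_21 = -(p1.detach() * torch.log(p2 + eps)).sum(dim=1).mean()
            return ce_12 + ce_21

        p1 = torch.softmax(torch.randn(2, 2, 64, 64), dim=1)
        p2 = torch.softmax(torch.randn(2, 2, 64, 64), dim=1)
        print(agreement_loss(p1, p2))   # large when the two views disagree
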

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The critic network as well as the dual U-Net networks seem to depend heavily on the ground-truth distribution. Therefore, we would expect such a module to behave poorly if not enough labeled samples are provided. Theoretically, the total number of labeled training samples would have to be greater than the total number of unlabeled samples to be able to capture the data distribution in such a formulation. However, in this paper, we see that the network is able to learn from very few examples, under 5% of annotations; that is, only around 26, 90 and 146 annotated labels per task are used, compared to thousands of unlabeled data. My concern arises from the perspective that the authors must focus on this issue and clearly clarify why the network did not fall into bias or overfitting problems: how was the network able to capture such a complicated data distribution? Is there any kind of implicit regularization done at the level of adversarial learning? Does the network inherently use pseudo-labels with a high-confidence map?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors clearly presented their approach and their loss functions. The algorithm is presented on page 5. The authors should clarify the architecture presented in Figure 1 in its caption, even if they mention that it relies on U-Net. The data used in this paper are publicly available, and the authors provided the citations and the links. The authors provided code, and I would imagine they will make it publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The concept of multi-view learning must absolutely be defined.
    2. In general there are different scenarios for multi-view learning; one of them, the one that fits the described problem, should be clearly described.
    3. The authors state on page 2, “The consensus principle provides an efficient way with strong theoretical properties to benefit from unlabeled data as shown for example in the celebrated work of Blum and Mitchell [3].” -> How did they benefit from unlabeled data? Please illustrate.
    4. Critic network: please clarify the concept earlier and specify that it reduces to a discriminator as in adversarial learning. Moreover, related work on adversarial models must be discussed and cited.
    5. Page 2, “we formulate the learning as a min-max problem by allowing a critic to stand in as a quantitative subjective referee.” -> Please elaborate on what “quantitative subjective referee” refers to in your approach.
    6. Please provide a description of the procedure illustrated in Figure 1; make it self-contained.
    7. Equation (7): what is the effect on the network bias? What will this function do if it is always told that the example being fed is not from the ground truth?
    8. In the experimental results, the authors are encouraged to provide more illustration of the quality of the results when only 5% labeled data are used. Why does the model not overfit? How can it still achieve good performance, and why?
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper's novelty lies in reformulating the multi-view learning scheme from classification to segmentation while integrating it with a critic network based on recent advances in adversarial learning. The work shows enhanced performance when a small labeled set is used, and it shows good leverage of unlabeled data. I think the wider MICCAI community would benefit from this approach and its formulation.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper combines co-training and adversarial learning. The concept of a discriminator may be more easily understood than that of a critic. The main concerns about this paper are (1) the comparison to [8] in terms of performance; (2) the comparison to other co-training-based semi-supervised algorithms in the field of MIA.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7




Author Feedback

We thank all the reviewers for their invaluable comments. We start the rebuttal by providing answers to general comments of the reviewers and follow up with answers to specific questions (Q) from each reviewer (R). We will use REV for “revised version”.

General Comments: Multi-view learning (MVL) makes use of multiple distinct representations of data and benefits from the resulting relationships to achieve accurate models. Here, “representations” can either be various views of the original data or features. Generally speaking, our algorithm can exploit different scanning views to learn segmentation masks. However, in our work, we focus on a more challenging problem and empirically use single images (slices) to train the networks. It seems that the term multi/dual-view, from a learning perspective, might be confused with views of volumetric scans for medical data. We will clarify this in REV. Our main contribution is to design a learning algorithm that can perform segmentation on single slices using two networks. Furthermore, our algorithm makes use of unlabeled data to exchange information between the dual views, addressing the MVL requirement (complementary models that learn from self-reliant features).

R1-Q2 (classifier instead of critic): Since we deal with unlabeled data, a classifier cannot be used directly to compare predictions against ground truth (GT), as there is no GT for unlabeled scans. In contrast, the adversarial loss can be applied as it only requires knowledge of whether its input is fake or real.
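
The point can be illustrated with a short sketch (an illustrative example, not the paper's exact implementation): the adversarial term for an unlabeled prediction only needs a real/fake target for the critic's confidence map, never a GT mask.

    import torch
    import torch.nn as nn

    bce = nn.BCELoss()

    def unlabeled_adv_loss(critic, pred_mask):
        # Loss term for the segmentation networks on an unlabeled image:
        # push the critic's confidence map toward "real" (all ones). No
        # ground-truth mask is needed, only the real/fake target.
        conf = critic(pred_mask)              # e.g. (B, 1, H, W) confidence map
        return bce(conf, torch.ones_like(conf))

    critic = nn.Sequential(nn.Conv2d(2, 1, 3, padding=1), nn.Sigmoid())
    pred = torch.softmax(torch.randn(2, 2, 64, 64), dim=1)
    print(unlabeled_adv_loss(critic, pred))
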

R1-Q4, Q5 (comparison with co-training): We evaluated the Deep Co-Training (DCT) method on 5% labeled spleen data. This leads to a DSC of 80.10%. Our algorithm achieves a DSC of 82.53%, a significant improvement (+2.43%) over DCT.

R1-Q6 (subtle improvement in DSC): The improvements in DSC are not subtle (as DSC reflects pixel-based accuracy). We also note that recent papers (e.g., BiO-Net [1] in MICCAI) show a similar trend (about 1% improvements in DSC).

R2-Q2 (effect of co-training): We performed an extra experiment with only one network (L_u = 0) and the model ends up with a DSC of 75.61%, justifying the importance of the dual view.

R2-Q3 (3D volumes sliced into 2D images): We follow the practice described in recent MICCAI papers [1] (evaluation metrics are computed on 2D slices and then averaged).
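
As an illustration of this protocol (a sketch of the evaluation convention, not the paper's exact code), the per-slice DSC computation and averaging look as follows.

    import numpy as np

    def dice_2d(pred, gt, eps=1e-8):
        # DSC for one binary 2D slice; eps guards empty slices (0/0).
        inter = np.logical_and(pred, gt).sum()
        return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

    def mean_slice_dice(pred_vol, gt_vol):
        # pred_vol, gt_vol: boolean arrays of shape (num_slices, H, W);
        # score each 2D slice, then average over all slices.
        return float(np.mean([dice_2d(p, g) for p, g in zip(pred_vol, gt_vol)]))

    pred = np.random.rand(10, 64, 64) > 0.5
    gt = np.random.rand(10, 64, 64) > 0.5
    print(mean_slice_dice(pred, gt))      # about 0.5 for random masks
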

R2-Q4, R3-Q8 (labeled/unlabeled split, overfitting): There is no leakage, as the subjects in the test data are different from those in the training data. For our experiments, we use 5% of the total slices in the training set as labeled data. Here, our assumption is that the radiologist has annotated a few slices from each volume (for 5% of the spleen dataset, 4 to 5 random slices) and the rest of the slices are unlabeled. We conjecture that overfitting is avoided because (1) the use of unlabeled data along with the critic ensures the model does not just learn patterns in the labeled data, and (2) the criss-cross exchange of confidence maps between the two networks during training (for unlabeled data) reinforces the benefit of using the critic. We also note that co-training is widely used for robust learning, implying the framework is less prone to overfitting. We could not, however, find a theoretical justification to accompany this.
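
A hypothetical sketch of this per-volume split (illustrative only; the slice ids and helper name are invented):

    import random

    def split_slices(volumes, labeled_frac=0.05, seed=0):
        # volumes: list of per-subject slice-id lists. A few random slices of
        # each volume become "labeled"; the rest are "unlabeled".
        rng = random.Random(seed)
        labeled, unlabeled = [], []
        for vol in volumes:
            k = max(1, round(labeled_frac * len(vol)))  # e.g. 4-5 slices/volume
            chosen = set(rng.sample(vol, k))
            labeled += [s for s in vol if s in chosen]
            unlabeled += [s for s in vol if s not in chosen]
        return labeled, unlabeled

    vols = [[f"subj{i}_slice{j}" for j in range(90)] for i in range(3)]
    lab, unlab = split_slices(vols)
    print(len(lab), len(unlab))            # 12 258: a few labeled per volume
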

R3-Q3 (benefit from unlabeled data): In our algorithm, we benefit from unlabeled data via (1) the criss-cross exchange of confident regions, and (2) improving the critic, which in essence minimizes an upper bound on the error. To justify this, we conducted an experiment on the spleen dataset without unlabeled data (training the two models with only 5% labeled data) and achieved a DSC of 76.67%.

R3-Q7 (Eq. 7): The reviewer is correct in that the critic needs to see ground-truth masks in order to be effective. Feeding the critic with only predicted masks would break down the algorithm.

Other comments are duly noted and REV will address them fully.

References

  1. BiO-Net: Learning recurrent bi-directional connections for encoder-decoder architecture. MICCAI 2020.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper combines co-training and adversarial learning. The concept of a discriminator may be more easily understood than that of a critic. The paper is well written and easy to understand. The experimental results are promising. The main weaknesses of this paper are the lack of novelty and the lack of comparison to other co-training algorithms (e.g., those mentioned by Reviewer #1). It simply adds a discriminator to a 2D co-training framework, while both have been well studied in the previous literature. The rebuttal does not address the primary AC’s concern: “The main concerns of this paper are (1) the comparison to [8] in terms of performance; (2) comparison to other co-training-based semi-supervised algorithms in the field of MIA.” Direct discussion comparing to that literature was expected in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    14



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main points of criticism of R1 (the only reviewer who rejected the paper) have been sufficiently addressed. The authors report results for deep co-training [8] in the rebuttal. I hope the authors will also find space to include this in the final submission. With the other two reviewers rating the paper positively in the initial review, I also recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the criticism about the lack of comparisons to [8]. My major concern about this paper is the lack of clarity in defining what multi-view/dual-view means. Only from the Algorithm 1 definition on page 5 can one understand that it refers to different subsets of labeled training data rather than the more traditionally accepted definitions. In fact, all 3 reviewers were confused about this point and commented on it. This needs to be clarified and properly defined early on in the paper. I am on the fence about this paper because, on one hand, I feel that a re-review is necessary to ensure clarity about this critical point; on the other hand, the reviewers were otherwise generally positive about the paper, and the criticism about the lack of comparison to [8] was addressed in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10


