
Authors

Raunak Dey, Yi Hong

Abstract

We introduce a neural network framework that uses adversarial learning to partition an image into two cuts, with one cut falling into a reference distribution provided by the user. This concept tackles the task of unsupervised anomaly segmentation, which has attracted increasing attention in recent years due to its broad applications in tasks with unlabelled data. This Adversarial-based Selective Cutting network (ASC-Net) bridges the two domains of cluster-based deep learning methods and adversarial-based anomaly/novelty detection algorithms. We evaluate this unsupervised learning model on the BraTS brain tumor segmentation, LiTS liver lesion segmentation, and MS-SEG2015 segmentation tasks. Compared to existing methods such as the AnoGAN family, our model demonstrates tremendous performance gains in unsupervised anomaly segmentation tasks. Although there is still room to further improve performance compared to supervised learning algorithms, the promising experimental results shed light on building an unsupervised learning algorithm using user-defined knowledge.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_23

SharedIt: https://rdcu.be/cyl5X

Link to the code repository

https://github.com/raun1/ASC-NET

Link to the dataset(s)

https://www.med.upenn.edu/cbica/brats2019/data.html

https://competitions.codalab.org/competitions/17094

https://smart-stats-tools.org/lesion-challenge


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a new Generative Adversarial Network (GAN) for segmenting lesions from medical images, and tests the model on brain tumours, liver lesions, and multiple sclerosis lesions. The proposed architecture consists of a single U-Net-based encoder, 2 U-Net-based decoders, and a discriminator. One of the decoders serves to separate the entire region of interest from the background, and the other selects anomalies within the region of interest. For test datasets, the authors looked at MS-SEG2015, consisting of 21 scans from 5 patients with multiple sclerosis; BraTS 2019, consisting of 335 T1 scans of brain tumours from 259 subjects; and LiTS, consisting of 130 abdominal CT scans of patients with liver lesions.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Using dual decoders for outputting two separate segmentation masks for the original image and using an exclusionary loss function to promote their separation is quite creative for separating intensity peaks in the test datasets.

    • The model improved results significantly relative to other GANs.

    • The custom dice loss to minimize the intersection of the two segmentation regions was well thought out and suits the pipeline.

    • This work is the first to apply an unsupervised segmentation algorithm to the BraTS 2019 and LiTS liver lesion public datasets.
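To make the exclusionary idea above concrete, here is a minimal sketch of a Dice-style overlap term that is minimized to push the two predicted masks apart. This is our illustration, not the authors' code: the function name is ours, and we assume soft masks given as flat lists of values in [0, 1].

```python
def exclusion_dice(mask_a, mask_b, eps=1e-7):
    """Soft Dice overlap between two predicted masks (flat lists of
    values in [0, 1]). Minimizing this term pushes the two cuts to
    be disjoint; maximizing the usual Dice would reward overlap."""
    inter = sum(a * b for a, b in zip(mask_a, mask_b))
    denom = sum(mask_a) + sum(mask_b) + eps
    return 2.0 * inter / denom
```

Driving this term to zero forces the two cuts to share no pixels, whereas a standard Dice loss against a ground-truth mask rewards overlap; this sign flip is what makes the loss "exclusionary".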

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The model was compared to only a single other model, another GAN that was not developed for medical imaging applications.

    • Slices from a given individual appear in both the training and test datasets at the same time. The authors should not pool all slices at once and randomly draw from that pool; separating the training and validation sets by patient would prevent crosstalk between them. This occurred with one of the test datasets but not the others.

    • Model novelty is limited beyond incorporating an established U-Net architecture into a GAN setup.

    • The model still achieved pretty low Dice scores on selected validation sets.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provided sufficient details for model parameters on the encoder, datasets, and training schemes. In terms of preprocessing, the method used for rescaling images (e.g., bilinear interpolation) should be included. A better description of the encoder inputs would aid reproducibility, as would including the hardware used for training.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • The results and motivations for the work should exist beyond outperforming a single model. There are other methods used for segmentation; how does the proposed method compare to supervised learning or other unsupervised learning techniques?

    • The paper would benefit from reporting other metrics beyond the Dice coefficient.

    • The use of erosion and dilation in postprocessing may alter boundaries and adversely bias the implementation in a real-world setting. Even though it improves the Dice score, it should be replaced with other morphological postprocessing.

    • Please include details on the experimentation or justification for selected hyperparameters (filters, normalization, dropout)

    • Fig. 2: Please include colour bars on the images and axis divisions on the graphs.

    • Page 1, Introduction: 1st paragraph: remove passive voice from this paragraph

    • Page 2, Introduction: 2nd paragraph: This sentence is confusingly split with commas and contains unnecessary adverbs, “Meanwhile, one branch connects to a GAN’s discriminator network, which allows introducing the knowledge contained in the reference image distribution. With the discriminator component aiding, the network can separate images into softly disjoint regions; that is, the generation of our selective cuts is under the constraint of the reference image distribution.”

    • Page 3, Introduction: 2nd bullet: poor wording “Besides, our method outperforms the AnoGAN family and other popular methods presented in [3] on the publicly available MS-SEG2015 dataset.”

    • Page 3, Main Module: replace “The M” with “M”.

    • Page 3, Main Module: replace “like the” with “similar to the”.

    • Page 6, MS-SEG2015: Please include the scan type used in the MS-SEG2015 dataset. The resizing does not match the other two datasets.

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Incomplete validation, poorly done or unjustified pipeline steps, and limited novelty weigh down this paper, and the comparison is limited to a single other model that was not developed for medical imaging.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper proposes an approach to unsupervised anomaly detection named the Adversarial-based Selective Cutting network (ASC-Net). The main idea is to leverage adversarial learning to partition an input image into two cuts, with one cut in the reference distribution. The method is benchmarked on multiple datasets, where promising results are achieved.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed idea is interesting and well-motivated. The proposed model is also quite simple yet effective.

    • The evaluation is conducted on multiple datasets, and the model seems quite generic and achieves promising results on all settings.

    • Overall the paper is clearly-written and very easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Section 4 discusses the differences from the AnoGAN family, but the major differences (especially w.r.t. methodology) with competing methods could be further elaborated, to clearly establish the technical contribution of the paper.

    • Also, the experimental analysis could be improved to further explain the reason behind the improvements. Currently the paper is quite thin on experimental and theoretical analysis, so it is hard to understand why the method works or to gain further insights from the paper.

    • The choice of clustering (thresholding T) seems ad hoc, without sufficient justification. Also, this part is not reflected in the training loss, so the trained model might be sub-optimal.

    • Runtime (training & inference) should be reported to better understand the computational complexity.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Major details are provided, but it could well be that some of the minor details are neglected in the main paper. It would be better if the code were provided to ensure reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please address the concerns in the weakness section.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed idea is interesting, and the model is shown to be effective. However, I have some concerns around the actual technical contribution compared to previous work, limited experimental analysis, and some missing details.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper
    • This paper focuses on anomaly detection (binary segmentation) by introducing an unsupervised two-cut split guided by a reference distribution using GANs.
    • Three components are introduced in the proposed framework:
      • The main module: a dual-branch network based on a simplified version of CompNet that constructs the normal and abnormal components of the raw input image
      • A reconstructor R followed by threshold-based pixel clustering (segmentation)
      • A discriminator based on a user-defined reference image distribution
    • The authors propose a two-stage training strategy for the proposed anomaly detection:
      • Step 1: Cycle-training of the GAN framework
      • Step 2: Adding generator (M)-augmented images to form a pseudo reference distribution that further trains the GAN framework

    The proposed framework has been extensively evaluated on three publicly available datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The authors proposed a novel two-cut algorithm to achieve anomaly detection, with the discriminator used only for detecting the reference distribution, therefore without the explicit need to reconstruct the image using a GAN.
    • The authors conducted a comprehensive evaluation of the proposed framework using three independent publicly available datasets, and compared with the state-of-the-art performance for each dataset. Besides quantitative evaluation results, the paper contains representative figures that clearly visualize the results of the proposed framework when applied to different datasets and tasks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Although the overall results are presented both qualitatively and quantitatively, it is not clear to me how much contribution each component brings to the proposed framework.
    • Specifically, multiple loss functions were introduced in this work, but I did not see the clear purpose of using the reconstructor R followed by threshold-based clustering. What is the benefit of this step as opposed to simply using I_{wc} to define the anomaly?
    • Also, it is not clear how much improvement the second round of training provides compared to the first round.
    • Different numbers of cycles for the two training stages were used for the three different tasks. The authors should explain the reason for choosing these hyperparameters, to inform the reader for the purpose of reproducibility.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The network architecture and parameters are laid out in detail.
    • The details about the public datasets used are mentioned clearly. Different numbers of cycles for the two training stages were used for the three different tasks; the authors should explain the reason for choosing these hyperparameters to inform the reader for reproducibility.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    • Page 3, Section 2.1: Better to include the reconstructor R explicitly in Figure 1.
    • Pages 5-6: It would be better to separate Section 3, “Applications”, into two distinct sections: “Materials” (or “Experimental Data”) and “Results”.
    • Page 4: Should there not be a subscript 2 in the MSE loss formula?
    • Figure 6 shows sample images with different setups, but no details about their training parameter setups are mentioned.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • A novel two-cut approach was proposed to achieve anomaly detection in an unsupervised (or semi-supervised) manner.

    • The experimental setup is clearly presented, including a comprehensive evaluation of the proposed framework using three independent publicly available datasets, and shows promising results when compared with the state-of-the-art performance for each dataset.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work presents an U-Net trained in an adversarial fashion for the segmentation of anomalies, i.e. tumors and lesions, in different organs and modalities.

    The method is novel and, generally, clearly explained, although the reviewers have some technical remarks. The main concerns come from the reported results. On the one hand, as pointed out by the reviewers, the obtained results are rather low. On the other hand, these results are much lower than those reported by the methods participating in the original challenges. The authors fail to position themselves against the methods reported in those challenges. In the specific case of the MS-SEG challenge, the authors compare to [3]. That work studies the potential of autoencoders for anomaly detection, but its goal is not to develop a method for this task; therefore, the comparison is not a fair one. Please better explain the advantages of the presented work w.r.t. the works presented in each of the challenges.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10




Author Feedback

We thank the reviewers and the meta reviewer for the valuable comments.

The main concerns raised by the reviewers are the experimental comparison and the reported results. In this submission, we tackle the problem of unsupervised anomaly segmentation in medical images. Towards the same task, Ref. [3] conducted an exhaustive study comparing multiple existing autoencoder-based, VAE-based, and GAN-based models, including the methods in [4, 8, 23, 24] that were originally proposed for anomaly detection in medical images. Since the MS-SEG challenge is the only public dataset used in that study, we compare ours to the methods in [3] by reporting our results on the same dataset. Compared to the best Dice scores in [3], we achieve significant performance gains of 23.24% before post-processing and 20.40% after post-processing. Also, on the BraTS 2019 dataset, we compare with f-AnoGAN, which performs the best after post-processing in [3]. Experiments show that f-AnoGAN has difficulty reconstructing the normal images required for anomaly segmentation, while our method obtains a 63.67% Dice score for brain tumor segmentation. By contrast, Ref. [4] reports a score of ~50% on the BraTS 2017 dataset. Although there is still a performance gap between our method and existing supervised methods for these challenges, our proposed model outperforms the state-of-the-art models for unsupervised anomaly segmentation. Our encouraging results shed light on building an unsupervised learning approach for addressing the tedious labeling issue in the medical domain.

Different from the AnoGAN family, our method handles the unsupervised anomaly segmentation problem in a completely new way. In particular, the AnoGAN family operates as a reconstruction-based method and needs faithful reconstruction of normal images to function properly. In contrast, we treat anomaly segmentation as a constrained two-cut problem, which only requires a semantic, reduced reconstruction for clustering. We argue that our goal is to obtain an anomaly segmentation mask, not a perfect reconstruction, which is not essential for solving the original task. The integration of U-Nets and a GAN provides an effective solution for directly performing anomaly segmentation in an unsupervised manner. Our approach opens up a new avenue for unsupervised anomaly segmentation.

Regarding the contribution of each module, the main module generates two selective cuts to semantically reconstruct the original image for clustering, while the discriminator brings user-defined knowledge into the normal branch in the main module. The other anomaly branch has a disjoint loss to separate it from the normal branch. Moreover, under the constraint of the reconstructor, we prevent the anomaly branch from generating an empty image if anomalies exist. Additionally, to obtain the anomaly segmentation mask, we choose the reconstructor’s output, not the anomaly branch output, because of the consistent appearance of the reduced reconstruction across applications. We will improve our description of the module contributions in the final version.
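As an illustration of the threshold-based clustering step that turns the reconstructor's output into the final anomaly mask, here is a minimal sketch. This is our own illustration under assumed conventions: the function name is ours, the reconstructor output is assumed to be a flat list of intensities in [0, 1], and the threshold T is the user-chosen value discussed in the reviews.

```python
def threshold_cluster(recon, t):
    """Binarize the reconstructor's output into an anomaly mask.

    recon: flat list of reconstructed pixel intensities in [0, 1]
    t: clustering threshold T (application-specific, chosen by the user)
    """
    return [1 if v > t else 0 for v in recon]
```

Because T is applied only at inference and is not reflected in any training loss, Reviewer #2's concern that the trained model might be sub-optimal with respect to this step applies directly to this stage.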

We thank the reviewer for pointing out the potential issue of using erosion and dilation as post-processing. In this submission, we use erosion and dilation as simple operators to further improve the connectivity of the generated anomaly mask. Even without this post-processing step, our method outperforms all methods studied in [3]. In future work, we will explore a better post-processing step as suggested by the reviewer.
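For readers unfamiliar with this post-processing step, below is a pure-Python sketch of a morphological opening (erosion followed by dilation) on a binary 2D mask. This is our hypothetical illustration using a 4-neighbourhood, not the authors' actual implementation; function names and border handling are our assumptions.

```python
def _morph(mask, rule):
    """Apply a 4-neighbourhood morphological rule to a binary 2D mask."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            neigh = [mask[i][j]]
            if i > 0:
                neigh.append(mask[i - 1][j])
            if i < h - 1:
                neigh.append(mask[i + 1][j])
            if j > 0:
                neigh.append(mask[i][j - 1])
            if j < w - 1:
                neigh.append(mask[i][j + 1])
            out[i][j] = rule(neigh)
    return out

def erode(mask):
    # A pixel survives only if it and all its neighbours are set.
    return _morph(mask, lambda n: int(all(n)))

def dilate(mask):
    # A pixel is set if it or any of its neighbours is set.
    return _morph(mask, lambda n: int(any(n)))

def open_mask(mask):
    # Opening = erosion followed by dilation; removes isolated speckles.
    return dilate(erode(mask))
```

Opening removes isolated false-positive pixels, which is why it can improve mask connectivity and the Dice score; the reviewer's concern is that the erosion step also shrinks true lesion boundaries, which the subsequent dilation does not fully restore.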

We will report the runtime of our model, provide more justification of the choice of the threshold for clustering, and improve the description of experimental details and settings in the final version. Also, we will release the source code after acceptance.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The ideas presented in this work are interesting, as highlighted by the reviewers. However, I find that the work is light both in the methods, since there is no sufficient analysis of the components of the methodology (as highlighted by the reviewers), and in the experiments, which was one of the strongest remarks brought up in the reviews.

    The rebuttal has tried to provide answers to these points, but properly addressing them in the paper would require a further round of reviews, which is not achievable for a conference.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    16



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I agree with the reviewers and the meta-reviewer on the interest and the novelty of the approach. It has also been evaluated on three separate datasets. In the rebuttal, the authors have clarified how their method compares to others on the same public dataset via the results reported in [3], although this should be better explained in the final text itself.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers’ concerns regarding the presented work are justified. In the rebuttal, however, the authors clarify their position that this is a work of unsupervised segmentation. Still, I agree with R1’s assessment that the pipeline is poorly explained. Also, the figures are not informative and there are no tables comparing the results. The authors should also have provided information about runtime in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6


