
Authors

Laura Daza, Juan C. Perez, Pablo Arbeláez

Abstract

The reliability of Deep Learning systems depends on their accuracy but also on their robustness against adversarial perturbations to the input data. Several attacks and defenses have been proposed to improve the performance of Deep Neural Networks under the presence of adversarial noise in the natural image domain. However, robustness in computer-aided diagnosis for volumetric data has only been explored for specific tasks and with limited attacks. We propose a new framework to assess the robustness of general medical image segmentation systems. Our contributions are two-fold: (i) we propose a new benchmark to evaluate robustness in the context of the Medical Segmentation Decathlon (MSD) by extending the recent AutoAttack natural image classification framework to the domain of volumetric data segmentation, and (ii) we present a novel lattice architecture for RObust Generic medical image segmentation (ROG). Our results show that ROG is capable of generalizing across different tasks of the MSD and largely surpasses the state-of-the-art under sophisticated adversarial attacks.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_1

SharedIt: https://rdcu.be/cyl3D

Link to the code repository

https://github.com/BCV-Uniandes/ROG

Link to the dataset(s)

http://medicaldecathlon.com


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper aims to contribute to the literature on developing segmentation algorithms that are robust to adversarial attacks: first, by extending the AutoAttack framework for use in segmenting volumetric medical image data, and second, by proposing a new lattice architecture for performing segmentation, termed RObust Generic medical image segmentation (ROG). Both qualitative and some quantitative comparisons with other methods in the presence of adversarial attacks are shown in the paper and the supplementary material.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Addressing the problem of adversarial attacks in medical image segmentation is an important goal. The results shown in Figure 1 clearly tell the most promising part of the story: the new ROG lattice approach, trained with Free AT, outperforms nnUNet and ROG trained on clean images.

    The ROG lattice, shown in Figure 2, has some level of architectural novelty, especially due to its reduced number of parameters vs. UNet++, one of the most current lattice-like approaches.

    Consideration of class imbalances during training is another positive.

    It is a strength of the paper that the authors evaluate their approach against PGD and AutoPGD-CE over different tasks.

    The visual results shown in Figure 5 and figures 2 and 3 in the supplementary material are all interesting and promising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The results on the MSD test set (reported as Dice scores) in Table 2, while claimed by the authors as quite positive for their method, do not really show that ROG significantly outperforms the alternative methods (MPU Net, nnUNet, etc.) on the 10 tasks that are reported. Perhaps I am missing something, but the raw Dice numbers reported are not all that supportive of ROG.

    To some extent the paper reads more like a general computer vision/ICCV/CVPR paper than a MICCAI paper. The results are reported without any real insight regarding the actual impact of adversarial attacks, nor whether the test sets truly represent what could occur in medical image acquisitions. While not something that should discount this effort for MICCAI, the authors need to consider the impact and consequences of the adversarial attacks that may occur in medical image data.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The discussion of reproducibility is reasonable, although more information regarding the number of training and testing runs would be helpful. The statistics reported are generally reasonable.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The paper generally reads well, although more insight into the real-world adversarial attacks that medical image segmentation algorithms could face would have been helpful. The MSD tasks are simply different organ segmentations; discussion or testing regarding the actual use of these results for disease quantification and outcome prediction would be even more helpful. Further explanation of the lattice architecture, and putting the results of Table 2 in a more complete context, would be helpful as well.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Bringing the entire problem of adversarial attack more in focus for the medical image segmentation community is important and there are several mildly novel ideas in the paper. Furthermore, Figure 1 does seem to show that the ROG lattice method is clearly useful in the face of the adversarial attacks. My rating is slightly lowered due to the somewhat formulaic style of the paper, reading a bit more like a general computer vision paper than a medical image analysis paper.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper explored adversarial robustness on medical image segmentation. The major contribution includes (i) setting up a benchmark to evaluate adversarial robustness on MSD and (ii) proposing a new architecture for robust medical image segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This paper investigates the adversarial robustness of medical image segmentation networks, which is barely noticed in this field.
    2. Experiments are done on MSD which contains 10 tasks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. One major weakness is the motivation of the paper. Do adversarial examples actually exist in medical AI? I personally have only seen artificial adversarial examples for medical deep learning systems. The value of this topic needs to be further justified.
    2. The justification of ROG. ROG only contains very common units from other segmentation networks (separable convolutions, instance normalization, etc.). Except for efficiency, which is basically another topic, I cannot see any motivation for this design with respect to improving adversarial robustness. Nor do the experiments prove it, because no numerical comparisons of adversarial robustness are set up between ROG and nnUNet/C2FNAS, only fully supervised training results, which are out of scope for this paper. A few example figures cannot prove it, since one can always find an example that visually illustrates an advantage.
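
    For reference on one of the units mentioned above: instance normalization normalizes each channel of each sample over its spatial dimensions. A minimal NumPy sketch for volumetric data follows (a generic illustration; the function name and tensor layout are assumptions, not the paper's implementation):

```python
import numpy as np

def instance_norm(x, eps=1e-5):
    """Instance normalization for volumetric data.

    x has layout (N, C, D, H, W); statistics are computed per sample
    and per channel over the spatial axes (D, H, W)."""
    mean = x.mean(axis=(2, 3, 4), keepdims=True)
    var = x.var(axis=(2, 3, 4), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)
```

Unlike batch normalization, the statistics do not depend on other samples in the batch, which is convenient for the small batch sizes typical of volumetric segmentation.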
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Easy to reproduce if code is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please see the weakness part of this review. The motivation and the design/justification of ROG really need to be improved. Minor: as for the title, “robust” usually has other meanings, such as domain robustness; this paper only investigates adversarial robustness.

  • Please state your overall opinion of the paper

    reject (3)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Motivation: why are adversarial examples becoming a concern for medical image segmentation? ROG: the design and justification are very unclear.

  • What is the ranking of this paper in your review stack?

    7

  • Number of papers in your stack

    8

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper introduces adversarial attacks into the Medical Segmentation Decathlon (MSD) benchmark scenario. It then presents a new segmentation method (RObust Generic medical image segmentation, ROG) for the MSD. On top of that, it adds adversarial training to the method to address the “robustness challenge” (the susceptibility to adversarial attacks). Finally, it evaluates ROG and adversarial training on the MSD test set.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-written and easy to follow. Contributions are clearly outlined, easy to understand, and significantly simplify the review process.

    The paper introduces an extension to the AutoAttack framework that makes AutoAttack applicable to segmentation. While adversarial attacks on segmentation are no novelty, this extension appears to streamline adversarial attacks for the scenario.

    The paper introduces a lattice architecture for the neural network, which in turn is no novelty by itself (lattice architectures are the common backbone for segmentation with NAS and a generalization of the U-Net architecture). While this architecture does not achieve state-of-the-art performance (the authors claim it is state-of-the-art, see weaknesses), it appears to be more robust to adversarial attacks, even when not trained with adversarial training.

    The paper introduces adversarial training to the MSD (I am not aware of other published work on adversarial training in medical image segmentation, only two pre-prints).

    The paper documents extensive results well, presenting reference performance numbers for the MSD test set, a robustness evaluation, and per-task and per-category vulnerability comparisons and analysis (my only criticism here: for multi-category tasks, it is difficult to gather which category is associated with a specific color).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Relevance. While adversarial attacks are interesting from an understanding point of view, I have yet to find a good argument beyond that. Here is my adversarial defense: cryptographically sign the image, cryptographically sign the segmentation. Solved! (And this should probably be done independently of adversarial attacks.) Even the imperceptibility requirement is unclear to me: if an attacker can modify the image, why would the attacker not just replace it with a completely different image?

    Performance on MSD: the presented method does not reach state-of-the-art performance. The authors claim “competitive performance on clean images”, but I think this is overreaching: the difference to the state of the art is significant, a 9% delta on average Dice, and for Dice performance numbers one percent is often the basis for publication.
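
    For context, the Dice score referenced here measures the overlap between a predicted mask and the ground truth, so a percentage-point difference reflects a real change in overlap. A minimal sketch of the binary Dice coefficient in NumPy (a generic illustration, not the paper's evaluation code; the smoothing term eps is an assumption to avoid division by zero on empty masks):

```python
import numpy as np

def dice_score(pred, target, eps=1e-7):
    """Dice = 2|A intersect B| / (|A| + |B|) for binary masks.

    Returns a value in [0, 1]; 1 means perfect overlap."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)
```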

    Minor: I assume robustness is evaluated on the validation split, which might bias the ROG robustness results, since design decisions might have been made based on validation performance. (However, this is unclear from the paper. While technically data leakage, this is probably acceptable within the scope of the MSD.) The text in figures is often too small.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper uses the publicly available medical segmentation decathlon dataset and promises to publish the code of this paper. As a consequence, reproducibility should be good. Without the code, reproduction of results might still be possible, since the paper reports hyperparameters. However, the exact training/validation split for the dataset remains unknown.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Relevant citations for adversarial training in medical image segmentation: https://www.medrxiv.org/content/10.1101/2021.01.17.21249704v1.full https://arxiv.org/abs/2006.13555

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is well written and evaluated and provides sufficient novelty in my opinion. (The method novelty itself is very limited, as this is just an assembly of methods introduced elsewhere).

    The issue is relevance, as outlined in the weaknesses. So the question is: why should we care about adversarial attacks? And as this paper does not provide understanding of adversarial attacks, what value does the paper contribute to the field? To me this is basically “something the authors did”… but the practical impact of adversarial attacks and defenses remains unclear (i.e. just cryptographically sign…).

    The fact that the authors promise to publish the source code after acceptance may be just that value (testing for adversarial robustness may be streamlined).

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers generally agree that there is a sufficient (but not outstanding) level of novelty and value in the paper, e.g. the extension of AutoAttack; however, there is disagreement on the relevance of adversarial attacks in the context of medical AI. The authors should justify the motivation, taking into account the reviewers’ questions in this context. There were also concerns relating to insufficient justification for ROG and somewhat unimpressive results. These should also be addressed in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7




Author Feedback

We thank the reviewers for their valuable insights and for their interest in the impact of our work. In particular, R1 and R2 acknowledge the importance of addressing adversarial attacks in medical segmentation, a problem that is “barely noticed in this field”. R3 notes the contribution of our framework for streamlining the evaluation of adversarial robustness. Also, R1 and R3 agree on the significance of the extensive experimentation we conducted on the Medical Segmentation Decathlon and on the advantages of performing adversarial training in this framework. We now respond in detail to the concerns raised by the reviewers:

Motivation: R2 and R3 express doubts about the relevance of adversarial examples in the medical field. In recent years, the study of adversarial attacks has gained interest in the MICCAI community because of the brittleness of deep learning methods in the face of imperceptible perturbations [16,23]. This behaviour has raised concerns about the deployment of DL systems in real-life scenarios [17], due to the increasing confidence in automatic methods [15] and the ever-growing risk of cyberattacks on hospitals [38]. Therefore, to advance towards robust medical segmentation, it is crucial to reliably assess the robustness of algorithms designed for this task. Following R1’s suggestion, we will highlight the possible impacts of adversarial attacks on medical images.

Architectural Design: R1 and R3 question the need for a new architecture for general medical segmentation, given the existence of methods that were designed for the MSD. We observe that the study of this task is very recent and only three published references have addressed it on the complete MSD framework. However, they all present practical issues that prevent their adoption for the main goal of our paper: studying the unexplored dimension of adversarial robustness in medical segmentation.
First, the architecture of [32] is optimized through expensive Neural Architecture Search and lacks publicly available code. Second, the method of [13] uses an ensemble of networks for each MSD task. Third, ROG generalizes better than the method of [24] on the MSD. The single lattice architecture in ROG is light, easy to train, and generalizes well across the MSD. Moreover, its computational efficiency is critical when studying adversarial robustness, as adversarial learning typically involves longer training and inner optimization loops that significantly increase the computational needs.

Performance on Clean Images: ROG obtains competitive results in most tasks of the MSD with a substantial increase in efficiency. Specifically, compared with [32], ROG has 6.5x fewer parameters and 5x fewer FLOPs, at the cost of 6% average Dice over the 10 tasks. More importantly, the main contribution of our work is a framework to measure a new dimension of medical segmentation, one in which accuracy on clean images is only the first point of the evaluation curves of Fig. 1 (R1). Our main empirical results provide a strong baseline that can be effectively protected against SOTA adversaries.

Benchmark Details: We share R3’s minor concern about a possible bias when evaluating robustness, and we took the following measures to prevent it: the methods with Free AT were tested on the validation data, since submissions to the MSD server are limited. Also, to ensure generalization, all design choices were made based on the results on two of the ten tasks in the MSD. We will clarify this in the final version.
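
The "inner optimization loops" mentioned above are attack iterations run inside every training step. A minimal NumPy sketch of an L-infinity PGD attack follows (a generic illustration under assumed names, not the authors' implementation; the paper instead uses Free AT, which amortizes this cost by reusing each backward pass to update both the model and the perturbation):

```python
import numpy as np

def pgd_attack(x, grad_fn, eps=0.03, alpha=0.01, steps=10):
    """L-infinity PGD: step in the sign of the loss gradient, then
    project back onto the eps-ball around the clean input x.

    grad_fn(z) must return the gradient of the loss w.r.t. z."""
    x_adv = x + np.random.uniform(-eps, eps, size=x.shape)  # random start
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_fn(x_adv))  # gradient ascent step
        x_adv = np.clip(x_adv, x - eps, x + eps)         # project onto eps-ball
    return x_adv
```

Running such a loop on every training batch multiplies the cost of training by roughly the number of attack steps, which is why architectural efficiency matters for adversarial robustness studies.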

Broader Impact: our work can serve as a basis for studying the critical, yet largely unexplored, dimension of adversarial robustness in the medical domain. In addition to addressing security concerns about highly sensitive data, this new field has the potential to enhance the interpretability of deep learning representations [10]. We thank R3 for the suggested citation: [38] Joel, M. et al.: Adversarial Attack Vulnerability of Deep Learning Models for Oncologic Images. (2021)




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper makes two contributions: 1) an extension of the AutoAttack framework to medical image segmentation, and 2) a robust general-purpose medical image segmentation neural network architecture.

    With respect to the first contribution, some reviewers questioned whether adversarial attacks in the medical domain are a real problem and whether simpler solutions, such as cryptographically signing an image, could not be considered. The rebuttal partially addresses these criticisms. Cyber-attacks on hospitals are mentioned, which appears to be a valid example. While the problem of adversarial attacks might be less of an issue for medical image analysis, the topic has been gathering attention at MICCAI, and the extension of AutoAttack to medical image segmentation is a good contribution in my opinion.

    The criticisms with respect to the second contribution, i.e. ROG, were related to its lower performance on clean images on MSD tasks and the lack of proper motivation for the architecture from the point of view of robustness. On this front, the rebuttal is less convincing to me. As in the paper, the rebuttal emphasizes the lower number of parameters in ROG but does not clarify how this is beneficial for adversarial robustness. Overall, I agree with R3 that the claims about clean image performance are overreaching, and the rebuttal answers this by stating that the SOTA methods on MSD are not applicable to this problem because they use neural architecture search or ensembles of networks. However, it is not clear to me why the use of such techniques would rule out adversarial training. Overall, I think the paper is a borderline reject due to these issues that remain with the ROG approach.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    20



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper revisits the problem of adversarial attacks in medical segmentation. This is an important field, which could also yield more robust models given the limited number of annotations. The proposed benchmark may attract people’s attention to this field. The main weakness of this paper is the introduction of ROG, which does not show any benefit or necessity. Why can’t the user use UNet/UNet++ or another network directly? In my opinion, if the authors deleted the portion on ROG and focused on giving more details on the adversarial attack part, the paper would be stronger and more useful to readers; e.g. the authors could add more description of AutoAttack, more evaluation of other forms of attacks, and more results on the impact of adversarial attacks.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Most of the major concerns (such as the motivation raised by R2) have been well addressed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6


