
Authors

Jeremy Tan, Benjamin Hou, Thomas Day, John Simpson, Daniel Rueckert, Bernhard Kainz

Abstract

Supervised learning of every possible pathology is unrealistic for many primary care applications like health screening. Image anomaly detection methods that learn normal appearance from only healthy data have shown promising results recently. As an alternative to image reconstruction-based and image embedding-based methods, we propose a new self-supervised method to tackle pathological anomaly detection. Our approach originates in the foreign patch interpolation (FPI) strategy, which has shown superior performance on brain MRI and abdominal CT data. We propose a better patch interpolation strategy, Poisson image interpolation (PII), which makes our method suitable for applications in challenging data regimes. PII outperforms state-of-the-art methods by a good margin when tested on surrogate tasks like identifying common lung anomalies in chest X-rays or hypoplastic left heart syndrome in prenatal fetal cardiac ultrasound images. Code available at https://github.com/jemtan/PII.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_56

SharedIt: https://rdcu.be/cyl6w

Link to the code repository

https://github.com/jemtan/PII

Link to the dataset(s)

https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes an outlier detection method based on self-supervision. The main idea is a pretext task in which Poisson image interpolation is used to mimic abnormality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The outlier detection task has clinical meaning.
    • Self-supervision is a current trend.
    • PII seems to generate better image patches than FPI, and the method achieves the best results both qualitatively and quantitatively compared to the other methods.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Even though PII looks better than FPI, I think the novelty of this work is limited on the technical side.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient details about the models/algorithms, datasets, and evaluation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Since ChestX-ray14 covers 14 diseases, some of which are quite obvious and others not, the authors could present performance for each disease to show which kinds of outliers see the most significant improvement.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I think the novelty is somehow limited from a technical perspective. But the problem itself has some clinical meaning and their design of the method sounds reasonable.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This work uses self-supervised training techniques called FPI and PII to introduce structural defects into normal training samples, and then trains a U-Net to localize the defect region. As the whole process is done without labeled anomalies, it can be seen as an unsupervised abnormal-region segmentation task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes an efficient method for anomaly detection. Experimental results show that the method detects anomalous regions very well. The paper is well organized.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main concern about this paper is its novelty. It is very similar to the CutPaste method (CVPR 2021); however, the fake regions are generated in a better way, with some theoretical foundation. It would be good to also test CutPaste and report its results in comparison to the proposed method.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The implementation details should be explained more thoroughly in the paper. However, the method seems reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Compare your method with the method mentioned above. There are several self-supervised methods that have been exploited for anomaly detection; referring to them and discussing why yours is better than those methods would be valuable.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper proposes an efficient method that is able to detect anomalies. There are some concerns about the similarity of the idea to the CutPaste method, but I think the idea is somewhat different.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The paper addresses the problem of detecting subtle outliers in medical images. It proposes using the Poisson seamless image blending technique to create subtle patch variations in the training set for self-supervised anomaly detection. The method is evaluated on two datasets, in comparison to SOTA anomaly detection methods. The results show improved performance in terms of average precision.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the Poisson interpolation, introduced in graphics for seamless image blending, is used here in the context of patch interpolation to create more natural and subtle patch variations than the convex combination proposed in Tan et al.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • except for the interpolation method, the paper strongly builds on the methodology introduced earlier by Tan et al. for self-supervised anomaly detection (same binary cross-entropy loss, same wide residual encoder-decoder)
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have referenced the public dataset used in the evaluation. They give sufficient detail on the architecture used, as well as the optimization parameters, the training time, and so on. It is also mentioned in the paper that the code will be made available by the time of the conference.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The context of the paper is well introduced. It tackles a topic of high interest in the MICCAI community. The related work section is well organized; it follows the same structure as the paper by Tan et al.

    The method is clearly presented, both the Poisson image editing, as well as the general framework for anomaly detection. Figure 1 provides a clear illustration of the limits of the FPI method and the contribution of the paper. It could be useful to indicate what values of alpha were chosen for the two illustrative cases (subtle and dramatic).

    Regarding the experimental setup, two datasets were considered, one of which is public (chest X-ray) with a relatively large number of images. The evaluation metric is well defined and appropriate.

    The quantitative results show the superiority of the proposed method over state-of-the-art anomaly detection methods. However, the qualitative results could benefit from more description. In the examples illustrated in Figure 3, the PII method seems to correctly detect image-level abnormalities in the case of Mass and HLHS. However, it is of great interest to know whether the image areas with a higher score (PII) actually correspond to physiological abnormalities. In other words, at the pixel level, how does the detection perform in localizing the abnormality (especially for the HLHS result)? Just a few words in the results section would be enough. Also, the paragraph above Fig. 3 is a bit unclear: why and how is FPI sensitive to sharp edges (it does not seem so in Fig. 3)?

    A few typos to correct:

    • page 2 in Contribution: “usefulNess”
    • page 4 last paragraph: replace 4 by “four”
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well organized and clear. The experimental setup is rigorous. The problem tackled is, I believe, of high interest to the community. The novelty is mostly incremental with reference to the paper of Tan et al. on Foreign Patch Interpolation.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    There are concerns around the paper’s novelty but the practical relevance is good. Authors should provide further clarification regarding the technical novelty especially with respect to the papers cited by R2 and R3.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

We thank the reviewers for their positive comments and constructive feedback!

One of the main concerns is the novelty of PII (ours) compared to FPI and CutPaste. The biggest difference between PII and FPI is the way that patches are combined. FPI uses a convex combination of two patches, whereas PII uses Poisson blending. Because patches come from different images, a simple convex combination can lead to obvious artifacts/discontinuities to which the model can easily overfit. Poisson blending automatically matches intensity levels of different patches and blends them seamlessly. This creates more subtle irregularities during training and it helps the model to better detect real abnormalities, even in challenging cases where data is less uniform (e.g. chest X-ray and fetal ultrasound).
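The convex combination used by FPI, which PII replaces, amounts to a pixel-wise mix; the sketch below is a hypothetical illustration (array names and shapes are assumptions, not the paper's code):

```python
import numpy as np

def fpi_patch(dest_patch, src_patch, alpha):
    """FPI-style convex combination of two same-shaped patches.

    With patches drawn from different images, intensity mismatches
    survive this pixel-wise mix and show up as visible seams -- the
    artifact that Poisson blending avoids.
    """
    return (1.0 - alpha) * dest_patch + alpha * src_patch
```

Because the mix is purely pixel-wise, any global intensity offset between the two source images remains visible at the patch border.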

Most self-supervised tasks use geometric transformations (Golan and El-Yaniv, 2018); image filtering, Sobel edge detection or blurring (Tack et al., 2020); simple linear combinations of pixels, e.g. MixUp (Zhang et al., 2018); patch extraction, e.g. CutOut (DeVries and Taylor, 2017); or patch re-arrangement, e.g. Jigsaw (Noroozi and Favaro, 2017). We believe PII is one of the first self-supervised methods that solves partial differential equations on the fly to generate training samples dynamically. To achieve this, we use multiprocessing to generate samples in parallel. Our Poisson blending method is adapted from Poisson Image Editing (Perez et al. 2003). The original Poisson Image Editing method takes the image gradient from both patches and discards the weaker gradient. But in PII, image gradients are scaled by an interpolation factor, which allows us to control which image gradients take precedence. This creates more variety in training samples, i.e. more ways in which two patches can be combined. It also helps create a self-supervised task with varying degrees of difficulty, ranging from very subtle to more prominent structural differences.
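A minimal sketch of this kind of gradient-domain blending, using a plain Jacobi solver (the function name, the exact gradient-mixing rule, and the solver choice are illustrative assumptions, not the implementation in the paper or repository):

```python
import numpy as np

def poisson_blend(dest, src, alpha=0.5, iters=500):
    """Blend two same-shaped 2D patches by solving a Poisson equation.

    The guidance field scales each patch's gradients by an
    interpolation factor alpha (a simplified stand-in for the
    gradient-scaling rule described in the rebuttal).
    """
    # Guidance field: interpolated forward-difference gradients.
    gx = (1.0 - alpha) * np.diff(dest, axis=1) + alpha * np.diff(src, axis=1)
    gy = (1.0 - alpha) * np.diff(dest, axis=0) + alpha * np.diff(src, axis=0)

    # Discrete divergence of the guidance field.
    div = np.zeros_like(dest)
    div[:, :-1] += gx
    div[:, 1:] -= gx
    div[:-1, :] += gy
    div[1:, :] -= gy

    # Jacobi iterations for the discrete Poisson equation, with a
    # Dirichlet boundary fixed to the destination patch.
    f = dest.copy()
    for _ in range(iters):
        f[1:-1, 1:-1] = 0.25 * (
            f[:-2, 1:-1] + f[2:, 1:-1] + f[1:-1, :-2] + f[1:-1, 2:]
            - div[1:-1, 1:-1]
        )
    return f
```

Fixing the boundary to the destination patch is what makes the blend seamless: intensity offsets between the two source images are absorbed by the solver instead of appearing as a sharp discontinuity at the patch border.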

Regarding CutPaste, as far as we know, it was uploaded to arXiv on April 8, 2021, about one month after the MICCAI submission deadline. There are several major differences. CutPaste takes a patch from one image and moves it to a new location in the same image, whereas PII combines patches from different images, creating more variety of combinations, which are more specific to medical imaging problems. CutPaste is a binary classification task at the image or patch level, whereas PII regresses different degrees of abnormality and directly predicts pixel-wise abnormality scores. As seen in the CutPaste experiments (Li et al., 2021), their method is suitable for detecting manufacturing defects such as cracks or scratches. However, many medical abnormalities, such as pneumonia, do not have sharp discontinuities. Meanwhile, congenital heart defects occur organically (during heart formation) which leads to cardiac structures that are very natural looking, albeit pathological. This is unlike breakages seen in manufacturing defects. PII is able to tackle these challenging cases by creating irregularities that are blended naturally into the surrounding context.

We will modify the submission to clarify these points. The reviewers also make great suggestions for additional experiments and comparisons. Due to limited space in the current submission, we plan to follow these suggestions in future work.


