
Authors

Minh-Son To, Ian G Sarno, Chee Chong, Mark Jenkinson, Gustavo Carneiro

Abstract

Longitudinal imaging forms an essential component in the management and follow-up of many medical conditions. The presence of lesion changes on serial imaging can have significant impact on clinical decision making, highlighting the important role for automated change detection. Lesion changes can represent anomalies in serial imaging, which implies a limited availability of annotations and a wide variety of possible changes that need to be considered. Hence, we introduce a new unsupervised anomaly detection and localisation method trained exclusively with serial images that do not contain any lesion changes. Our training automatically synthesises lesion changes in serial images, introducing detection and localisation pseudo-labels that are used to self-supervise the training of our model. Given the rarity of these lesion changes in the synthesised images, we train the model with the imbalance robust focal Tversky loss. When compared to supervised models trained on different datasets, our method shows competitive performance in the detection and localisation of new demyelinating lesions on longitudinal magnetic resonance imaging in multiple sclerosis patients. Code for the models will be made available on GitHub.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_63

SharedIt: https://rdcu.be/cyl89

Link to the code repository

https://github.com/toson87/MSChangeDetection

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

Authors present a method to detect changes in MS lesions on follow-up imaging. First, they use a dataset of baseline-followup image pairs that do not show change and simulate change using a VAE. Next, they train a Siamese U-Net that takes the baseline + simulated-followup-with-change images, to predict a change map. This approach is then evaluated on a real data set with changing MS lesions.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Authors address a relevant problem of change detection in MS. They have a properly sized data set to demonstrate the effectiveness of their method.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

In my opinion, simulating or generating synthetic images is a complicated exercise when actual data showing change are available. The authors do not explain why they use this complex approach instead of simply training the Siamese network on images with change. A baseline comparison with a supervised method is lacking.

In addition, it is unclear to what extent change can be detected. Data set characteristics that describe the amount of change between baseline and follow-up are missing. Big changes are of course easier to detect, but subtle changes are clinically more meaningful. It is unclear what amount of change is detectable with this method.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

In my opinion, the paper is a bit hard to understand at first and some items are hidden in the Supplementary. It appears the basic steps are described in the paper and reproducing should be possible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

A baseline comparison should be added, where the Siamese network is directly trained on the Change dataset.

In my opinion, this is not a ‘true’ anomaly detection technique. This is a regular semantic segmentation method, albeit with a very complicated data augmentation / generation method followed by a regular Siamese net. In a ‘true’ anomaly detection approach, the target or structure of interest is unknown. That is not the case here, because the authors generate the target and then train a regular neural network. This method is also not unsupervised: the authors know that the NoChange set does not contain change, hence it is supervised. I’d suggest removing the terms ‘unsupervised’ and ‘anomaly detection’ entirely.

Figure 1 is hard to understand and the caption is not very informative.

The comparison with the WMH method of Hongwei Li is a bit unfair. First, this method was not trained on the used data set. Second, this method is designed for a different pathology (WMH of presumed vascular origin; not MS) although with very similar MRI characteristics. Next, no baseline performance on lesion detection is reported. I guess this method does not even detect the lesions in the dataset of the authors; hence it is not surprising it also does not detect any change. I’d suggest to either remove it or re-train this method from scratch on the dataset of the authors.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is an interesting approach, although authors should better explain the rationale behind their method.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

2
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

In this work the authors present an algorithm for self-supervised lesion change detection in Longitudinal Multiple Sclerosis Brain Imaging, which does not require detailed, voxel-level annotated training sets. They demonstrate high detection rates for lesion changes in multiple sclerosis imaging, which are comparable to fully-supervised models.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The strength of the paper lies in the original idea of using a dataset where no longitudinal changes in MS lesions were present and then simulating changes and employing this as training for longitudinal lesion change detection and segmentation with a siamese U-net. The performance of the proposed method seems to also outperform other published methods [21].
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The authors claim that they are on the same level as other standard methods presented in the literature [9,19], but no results are presented. Why not state the numbers from these works and/or evaluate them on a subset of your Change dataset? For example, since you have manual labels available, you could use 75% of your Change dataset to train the algorithms presented in [9,19] and then make a quantitative comparison on the rest of the Change data.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Code: The authors state that they will share their code on GitHub. Data: The dataset is a clinical dataset and it is not clear if this can be made available.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

I thank the authors for their work and putting focus on this important clinical topic.

Besides the major issue of comparison to other state-of-the-art algorithms, which you might not be able to address, the other minor issues I have are the following: 1) Though you provide a thorough and detailed method description, there’s a complete lack of description regarding your population. 2) It’s also not stated at what time interval the images were taken and whether they were all from the same type of scanner, since you utilize multi-site data. These details would benefit the article. 3) It would also be interesting to see the distribution of changes in the Change dataset in terms of how many are newly appearing lesions as in your example (a) (clinically this could be more relevant) and how many increase in size as in your example (e). Can the model cover these cases equally well?
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors address an important clinical problem, use a novel approach, and evaluate it on a large dataset. The only minor issue I have with this paper is the lack of a proper comparison to other state-of-the-art algorithms.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Somewhat confident

Review #3

Please describe the contribution of the paper

This paper introduces an unsupervised anomaly detection method to detect changes in lesions on serial imaging. Their approach is twofold: a VAE to generate synthetic non-lesion-based images, and then a Siamese 3D U-Net combined with the focal Tversky loss function to detect any change (anomaly) in serial imaging.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The paper is well written and tackles a very interesting problem seen in the medical community of detecting a change in lesions on serial imaging.
2. The set of experiments consists of a good amount of model comparison along with different datasets to experiment on.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The authors should state the size of the non-lesion data in all the datasets used for the experiments, as they have used a data augmentation and increment approach.
2. In the experiments and evaluation section the authors mention using a mini-batch size of 2. The concern is that optimising with a batch size of 2 is not generally considered good practice.
3. The authors have experimented with just BCE and FTL. I suggest the authors explore more loss functions: https://ieeexplore.ieee.org/abstract/document/9277638
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

N/A
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. The authors should state the size of the non-lesion data in all the datasets used for the experiments, as they have used a data augmentation and increment approach.
2. In the experiments and evaluation section the authors mention using a mini-batch size of 2. The concern is that optimising with a batch size of 2 is not generally considered good practice.
3. The authors have experimented with just BCE and FTL. I suggest the authors explore more loss functions: https://ieeexplore.ieee.org/abstract/document/9277638
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

By treating lesion change as an anomaly, the paper proposes an unsupervised learning approach built on some advanced ML techniques. This paper is a good example of how ML can be applied in the medical community even when little data is available.
What is the ranking of this paper in your review stack?

4
Number of papers in your stack

7
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper covers an interesting topic and all reviewers highlight the clarity and innovation of the work. Comments regarding the direction of the experiments and clarification of some details should be taken into account.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

Author Feedback

We thank the reviewers for their thoughtful comments.

Reviewer 1. Many different types of changes need to be detected and accounted for in diagnostic radiology practice. These may include appearance of new or disappearance of old lesions e.g. demyelinating plaques in multiple sclerosis (MS); changes to existing lesions e.g. growth of tumors; and changes to existing anatomical structures e.g. enlargement of ventricles. It is exceedingly difficult to obtain detailed annotations for all possible changes in order to train a fully-supervised model. For this reason, a direct comparison with a fully-supervised model is not possible, since there are no longitudinal datasets with annotations of ALL relevant changes.

Our approach is therefore based on the premise that perturbing the latent space of the variational autoencoder allows synthesis of a large variety of changes. That is, the change discrimination network is exposed to a broad distribution of possible changes, beyond what can be reasonably obtained by manual pixel-/voxel-wise annotations. For this reason, our method is in fact an anomaly detection approach, as the target change is unknown. Supervision is provided only in the NoChange pairs; the learning of lesion changes is self-supervised by the perturbation and augmentation method.
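
The perturbation idea described above can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: `toy_decode`, `perturb_latent` and `synthesise_change` are hypothetical names, the toy decoder merely stands in for a trained variational autoencoder, and deriving the pseudo-labels by thresholding the voxel-wise difference is an assumption made for illustration only.

```python
import random


def toy_decode(z):
    """Hypothetical stand-in for a trained VAE decoder.

    Each latent coordinate controls a block of four neighbouring "voxels".
    """
    return [zi for zi in z for _ in range(4)]


def perturb_latent(z, n_coords, scale, rng):
    """Add zero-mean Gaussian noise to a random subset of latent coordinates."""
    chosen = set(rng.sample(range(len(z)), n_coords))
    return [zi + rng.gauss(0.0, scale) if i in chosen else zi
            for i, zi in enumerate(z)]


def synthesise_change(z, decode, n_coords=1, scale=1.0, threshold=0.1, seed=0):
    """Return a synthetic follow-up image with its pseudo-labels.

    Assumption: the localisation pseudo-label is the set of voxels whose
    intensity moved by more than `threshold`; the detection pseudo-label
    is 1 if any voxel changed and 0 otherwise.
    """
    rng = random.Random(seed)
    baseline = decode(z)
    followup = decode(perturb_latent(z, n_coords, scale, rng))
    mask = [1 if abs(f - b) > threshold else 0
            for f, b in zip(followup, baseline)]
    label = 1 if any(mask) else 0
    return followup, mask, label


# Latent code of a NoChange scan (all zeros in this toy example).
z = [0.0] * 8
followup, mask, label = synthesise_change(z, toy_decode, n_coords=1, scale=2.0)
print(sum(mask), "of", len(mask), "voxels flagged; pair label =", label)
```

Because only a few latent coordinates are perturbed, the synthesised change stays sparse, which is the imbalance that motivates the loss used to train the change discrimination network.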

To demonstrate the utility of our method, we report high lesion change detection rates in longitudinal MS imaging, even though our method is not specifically targeted at detecting MS lesion changes. The caveats of our comparison with the WMH method of Li Hongwei have been discussed. As indicated by the Reviewer, the FLAIR characteristics of WMH of vascular origin are very similar to those of MS lesions. However, the comparison also serves to demonstrate that detecting changes in lesion segmentations does not necessarily produce an optimal solution for segmenting lesion changes.

Our test dataset includes annotations on MS lesion changes only, and this is reflected in the high false positive detection rate. It is unknown if these false positives reflect other, non-MS lesion changes. Quantifying these and the extent of change that can be detected is beyond the scope of our work and requires further evaluation in a clinical setting.

We will update the caption for Figure 1 and include more information.

Reviewer 2. A direct comparison with other methods in the literature we cited was not possible as either the model was not available in the public domain [9], or was trained on an additional MRI T1 sequence [19].

We did not collect patient demographics data. Imaging was performed at multiple sites, on different scanners, and at different spatial resolutions. The time interval between scans ranged from 3 to 15 months. Assessing the distribution of changes in the Change dataset is an interesting question. The clinical relevance of the detected changes needs to be considered in the context of the patient’s clinical background. Unfortunately, since the clinical information provided in the imaging reports of our test dataset lacks detail, it is difficult to correlate changes with the patient’s history, e.g. the type of MS (relapsing-remitting, primary progressive, etc.). As mentioned above, changes beyond MS lesion changes are likely to have been detected, but the usefulness and relevance of all changes detected using our method require further clinical evaluation.

Reviewer 3. As described in the manuscript, 192 × 192 × 16 crops from 237 NoChange pairs of scans were used during training. Our model was trained on 3D data on a single GPU. Due to GPU memory constraints, it was not possible to train with a batch size greater than 2 without sacrificing model complexity. Note that during training, both the autoencoder and discriminator are loaded on the GPU. The choice of loss function did influence the ability to detect and segment localized changes. Exploration of other loss functions, as suggested by the reviewer, will enable further optimization of the model.
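
For readers unfamiliar with the focal Tversky loss (FTL) discussed here, the following is a minimal plain-Python sketch of one common formulation. It is not taken from the authors' code: the convention that `alpha` weights false negatives and `beta` weights false positives, the exponent `1 / gamma`, and the values `alpha=0.7`, `beta=0.3`, `gamma=4/3` are all assumptions chosen for illustration, and the parameters used in the paper are not stated on this page.

```python
def tversky_index(probs, targets, alpha=0.7, beta=0.3, eps=1e-7):
    """Tversky index between predicted probabilities and binary targets.

    Assumed convention: alpha weights false negatives, beta false positives.
    """
    tp = sum(p * t for p, t in zip(probs, targets))
    fn = sum((1.0 - p) * t for p, t in zip(probs, targets))
    fp = sum(p * (1.0 - t) for p, t in zip(probs, targets))
    return (tp + eps) / (tp + alpha * fn + beta * fp + eps)


def focal_tversky_loss(probs, targets, alpha=0.7, beta=0.3, gamma=4.0 / 3.0):
    """Focal Tversky loss: (1 - TI) ** (1 / gamma), clamped at zero."""
    ti = tversky_index(probs, targets, alpha, beta)
    # Clamp so that rounding error can never yield a negative base.
    return max(0.0, 1.0 - ti) ** (1.0 / gamma)


# Toy flattened volume: 10 changed voxels among 1000 (strong class imbalance).
targets = [1] * 10 + [0] * 990

perfect = [float(t) for t in targets]      # every changed voxel found
missed_all = [0.0] * 1000                  # predicts "no change" everywhere
missed_half = [1.0] * 5 + [0.0] * 995      # 5 false negatives
over_segmented = [1.0] * 15 + [0.0] * 985  # 5 false positives

loss_perfect = focal_tversky_loss(perfect, targets)
loss_missed_all = focal_tversky_loss(missed_all, targets)
loss_missed_half = focal_tversky_loss(missed_half, targets)
loss_over_segmented = focal_tversky_loss(over_segmented, targets)
```

With `alpha > beta`, missing five changed voxels costs more than adding five spurious ones, and predicting "no change" everywhere is heavily penalised even though it is correct for 99% of the voxels. This is why overlap-based losses of this kind are often preferred over plain BCE when the target is rare.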


Self-supervised Lesion Change Detection and Localisation in Longitudinal Multiple Sclerosis Brain Imaging