Authors

Chenghao Liu, Xiangzhu Zeng, Kongming Liang, Yizhou Yu, Chuyang Ye

Abstract

Convolutional neural networks (CNNs) have greatly improved the performance of brain lesion segmentation. However, accurate segmentation of brain lesions can still be challenging when the appearance of lesions is similar to normal brain tissue. To address this problem, in this work we seek to exploit the information in scans of healthy subjects to improve brain lesion segmentation, where anatomical priors about normal brain tissue can be taken into account for better discrimination of lesions. To incorporate such prior knowledge, we propose to register a set of reference scans of healthy subjects to each scan with lesions, and the registered reference scans provide reference intensity samples of normal tissue at each voxel. In this way, the spatially adaptive prior knowledge can indicate the existence of abnormal voxels even when their intensities are similar to normal tissue, because their locations contradict the prior knowledge about normal tissue. Specifically, with the reference scans, we compute anomaly score maps for the scan with lesions, and these maps are used as auxiliary inputs to the segmentation network to aid brain lesion segmentation. The proposed strategy was evaluated on different brain lesion segmentation tasks, and the results indicate the benefit of incorporating the anatomical priors using our approach.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_18

SharedIt: https://rdcu.be/cyhLK

Link to the code repository

https://github.com/lchdl/NLL_anomaly_detection

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

In this work, the spatially adaptive prior knowledge can indicate the existence of abnormal voxels even when their intensities are similar to normal tissue. Authors compute anomaly score maps for the scan with lesions, and these maps are used as auxiliary inputs to the segmentation network to aid brain lesion segmentation.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The normality representation achieved by registering a set of reference scans of healthy subjects to each scan with lesions is shown to improve the segmentation results.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The computation time should be mentioned.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Seems OK. I would advise the authors to make the code available.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Please comment on the potential effect of different registration methods. There are many alternatives to ANTs. What would be the effect of using a default method for spatial normalization (as used in SPM, for example)?
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The normality representation achieved by registering a set of reference scans of healthy subjects to each scan with lesions is shown to improve the segmentation results.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

Authors compute tissue intensity prior probabilities for brain MR images. Next, they create maps indicating where a new brain MR image deviates from the computed probabilities. This is fed as an extra channel into nnU-Net, which then has a better performance in segmenting lesions.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method of the authors is a blend between anomaly detection (identifying lesions without training data) and regular semantic segmentation (nnU-Net). The approach of using anatomical priors from healthy subjects is interesting. The evaluation and validation are good, with properly sized data sets and relevant metrics and applications.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The paper's main weakness is that it is neither a good anomaly detection method nor a good semantic segmentation method. There are prior publications on ‘pure’ anomaly detection (i.e. only looking at healthy data and thus essentially building intensity priors), like DOI:10.1117/12.2216358, DOI:10.1016/j.media.2020.101952, DOI:10.1148/ryai.2021190169, DOI:10.1038/s41598-021-87013-4 (and a lot more when including arXiv).

Next to that, the results are not very convincing. I commend the authors for providing mean + sd and performing statistical testing, but it also shows that the improvements are marginal (+ 0.03 DSC) and mostly non-significant.

Given that the two strategies used are relatively simple (intensity or distribution based), I would assume nnU-Net should be able to compute similar features internally if provided with normal data as well. My assumption is that nnU-Net by itself is already quite good at computing features for both the background and foreground classes (normal and abnormal), and hence including this specific feature does not add much.
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper is clear, well-written, and from my understanding can be exactly reproduced from the description provided by the authors.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

I think this is an OK paper, but the limited novelty combined with the limited results does not make it very attractive. In my opinion, the authors could make a choice: either go for a proper / pure anomaly detection system, or continue on this path but look for a modification that has a more significant impact on the results compared to the baseline.

When looking at Figure 2, it seems that the proposed scores by themselves are already good enough to identify the lesion. Why don’t authors evaluate this by itself as an anomaly detection method? On the other hand, Figure 2 might be the best-case example, but in that scenario more representative examples would be welcomed.

Symmetry based anomaly detection approaches were also proposed by Samuel Botter Martins. Authors might consider including a reference in the Introduction.

The work of Baur (ref [2], and more in the literature) is mainly evaluated on bright / hyperintense lesions. I think (but have no proof) that the work of Baur has an implicit bias towards bright / hyperintense anomalies. This also shows in your results: worse performance for baseline+[2] in Table 1 and better results in Table 3, compared to the baseline. Because of this potential implicit bias, and because the work of Baur has never been demonstrated on hypo- or iso-intense lesions, I feel it is a bit unfair to include it in Table 1 as a ‘competing method’.

Authors should clarify how MRI intensity values are standardized during acquisition or reconstruction. MRI can produce vastly different intensities for the same tissue / protocol / etc. I assume the included data sets have undergone intensity standardization, which could play a large role in the two presented strategies.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The work is nice, interesting, and appears scientifically sound. The method is of limited novelty and the results are moderate.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

2
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

The authors address the problem of brain lesion segmentation when the appearance of lesions is similar to normal brain tissues. The segmentation of such lesions is improved by learning the anatomical priors of normal tissues, computing anomaly score maps for the scan with lesions and using these maps as auxiliary inputs to the segmentation network.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The problem of the segmentation of lesions with an appearance similar to normal brain tissue is nicely demonstrated with a limitation of the state-of-the-art (SOTA).
- The paper is well-written and easy to follow.
- Each component in the methodological contribution is supported with visualization.
- The proposed method is validated by comparison versus SOTA. The experiments are supported by quantitative and qualitative evaluation on two lesion segmentation applications.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Limitations of the method are not discussed (For example, demonstrating cases with lower Dice score, along with possible analysis of the cases with low Dice scores). As a reader, such discussion helps in understanding where the method could fail.
- The results in Figures 3 and 4 would have been more explanatory if the FP and FN pixels were highlighted in different colors, in addition to TP.
- Although the proposed method is compared with nnUNet, the comparison of the method with the previously published lesion segmentation approaches on the publicly available ATLAS dataset is missing (Please refer to the table 5 in the paper: “Review: Application of Deep Learning Method on Ischemic Stroke Lesion Segmentation, Zhang Y., Liu S., Li C., Wang J.”)
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- From the reproducibility point of view, an open-source implementation of the proposed framework is not provided; releasing one would in general benefit the community.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Did the authors investigate vanilla UNet instead of nnUNet? It would be interesting to check whether similar results are obtained irrespective of the network architecture.
- Which intensity normalization technique was used?
- Were the cases selected randomly for Figures 3 and 4? Images with the best, mean and worst Dice scores could have been selected to demonstrate results across the full range of performance.
- Inference time is not reported.
Minor comments:
- page2: Reformat very long sentence: Based on the observation that although the intensity of brain lesions can be similar to the normal tissue elsewhere, the locations of these lesions help to discriminate them, we propose to use the scans of healthy subjects to provide voxelwise reference intensities of normal tissue, so that abnormal intensity at each voxel can be better identified.
- page2: grammatically incorrect: In addition, the intensities of the registered scans at each voxel can also be considered samples drawn from a distribution of normal tissue intensities at that voxel.
- page3: of the reference images or a distribution -> of the reference images and a distribution
- page6: Rephrase: The nnUNet model trained with no auxiliary information - i.e., with the intensity image input only was considered the baseline method for comparison.
- page7: We have also computed the average -> We also computed the average
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Good methodological contribution supported by in-depth evaluation.

Visualization of results could be improved a little (as explained in section 7), along with the discussion on limitation of the proposed method.
What is the ranking of this paper in your review stack?

8
Number of papers in your stack

2
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The authors combine anomaly detection and semantic segmentation to improve lesion segmentation. All reviewers favor accepting the work, and I think the paper presents a good proof of concept and is suitable for the MICCAI conference. The reviewers however suggest using more up-to-date network architectures and presenting more comprehensive experimental results (including runtimes), which seem to be addressable in a future extended version.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Author Feedback

Reviewer 1 (R1) and Reviewer 3 (R3) are interested in the computation time. Each registration from a reference image to a test image took about 5 to 20 minutes, depending on the image dimension, and 10 such registrations need to be performed for each test subject. The anomaly score computation and network inference took about 1 minute in total for a test image. We will better clarify this. Note that the multiple registrations can be performed in parallel and even replaced by faster GPU-based and/or learning-based registration, such as VoxelMorph, for acceleration. These improvements can be explored in future work. Future work will also explore the impact of different registration methods as suggested by R1.
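To make the timing figures above concrete, the following back-of-the-envelope sketch turns them into a per-subject wall-clock estimate. The function name and the assumption that registrations are scheduled in equal-length rounds across workers are illustrative only; the per-registration range, the count of 10 registrations, and the roughly 1 minute for scoring plus inference come from the rebuttal.

```python
import math

# Figures quoted in the rebuttal.
N_REFERENCE_SCANS = 10      # registrations per test subject
MINUTES_PER_REG = (5, 20)   # reported range per registration
POST_REG_MINUTES = 1        # anomaly score computation + network inference


def registration_wall_time(n_refs, minutes_per_reg, n_workers=1):
    """Wall-clock minutes for n_refs registrations on n_workers parallel workers.

    Assumes (for illustration) that every registration takes the same time
    and that workers process them in synchronous rounds.
    """
    rounds = math.ceil(n_refs / n_workers)
    return rounds * minutes_per_reg


for workers in (1, 10):
    low = registration_wall_time(N_REFERENCE_SCANS, MINUTES_PER_REG[0], workers)
    high = registration_wall_time(N_REFERENCE_SCANS, MINUTES_PER_REG[1], workers)
    print(f"{workers} worker(s): {low + POST_REG_MINUTES} to "
          f"{high + POST_REG_MINUTES} minutes per test subject")
```

Under these assumptions, sequential processing costs roughly 51 to 201 minutes per subject, while running all 10 registrations in parallel brings the estimate down to roughly 6 to 21 minutes.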

R1 and R3 also ask about the code availability. In this work, we used the open-source software ANTs and nnU-Net to implement our method, and the computation of the anomaly score is also simple. We will document the whole pipeline on GitHub and provide the details of each step, including the script for computing the anomaly score maps, so that the community can easily reproduce our method.

Reviewer 2 (R2) and R3 ask about how MRI intensities were standardized. If the reference images and patient images are from the same domain, i.e., they are acquired on the same scanner with the same acquisition parameters, then only z-score normalization is performed by the nnU-Net when images are fed into the network. No other standardization is performed. For example, such preprocessing was applied to the in-house DWI dataset used in our experiments. If the reference images and patient images come from different domains, i.e., they are acquired on different scanners possibly with different scanning parameters as well, then histogram matching is needed. For example, for the ATLAS dataset used in this work, the reference images are from one domain and the patient images are from another domain. Then, the histogram of each reference image is matched to a random patient image before computing the anomaly score. The z-score normalization is still performed when the intensity image is fed into the network. We will better clarify the intensity normalization procedure.
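The two normalization steps described above can be sketched as follows. This is a minimal stdlib-only illustration on flat lists of voxel intensities, not the authors' script and not nnU-Net's preprocessing code; the function names `z_score` and `match_histogram` are hypothetical, and the quantile-mapping form of histogram matching is one common variant assumed here.

```python
from bisect import bisect_right
from statistics import mean, pstdev


def z_score(values):
    """Subtract the mean and divide by the standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def match_histogram(source, target):
    """Map each source intensity to the target intensity at the same quantile.

    source: intensities of a reference (healthy) image
    target: intensities of a patient image from the other domain
    """
    src_sorted = sorted(source)
    tgt_sorted = sorted(target)
    n_src, n_tgt = len(src_sorted), len(tgt_sorted)
    matched = []
    for v in source:
        # Rank of v in the source: number of source voxels <= v (at least 1).
        rank = bisect_right(src_sorted, v)
        # Index of the same quantile in the sorted target (integer ceiling).
        idx = (rank * n_tgt + n_src - 1) // n_src - 1
        matched.append(tgt_sorted[idx])
    return matched


# Cross-domain case: match the reference to a patient image first.
reference = [0, 1, 2, 3]
patient = [10, 20, 30, 40]
reference_matched = match_histogram(reference, patient)

# z-score normalization is applied when the intensity image enters the network.
patient_normalized = z_score(patient)
```

In the same-domain case described in the rebuttal, the `match_histogram` step would simply be skipped and only the z-score step would apply.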

R2 believes that nnU-Net itself can learn the anomaly internally given the normal data. However, if the 10 normal maps are fed into the network directly, it would require much more GPU memory to run the algorithm and could lead to a reduced patch size, which loses the global image context and adversely affects the segmentation performance. Therefore, we choose to compute anomaly maps before the network performs lesion segmentation.
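The channel-count argument above can be illustrated with a small sketch. The two scores below are assumptions chosen for illustration (a minimum absolute intensity difference, and a negative log-likelihood under a per-voxel Gaussian fitted to the reference intensities); they follow the "intensity or distribution based" description in Review #2 but are not claimed to match the authors' exact formulation. Images are represented as flat lists of voxel intensities, and all function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def intensity_score(test_voxel, ref_voxels):
    """Smallest absolute difference to any reference intensity at this voxel."""
    return min(abs(test_voxel - r) for r in ref_voxels)


def gaussian_nll_score(test_voxel, ref_voxels, eps=1e-6):
    """Negative log-likelihood under a Gaussian fitted to the reference samples.

    The Gaussian model is an illustrative assumption, not the paper's method.
    """
    mu = mean(ref_voxels)
    sigma = max(pstdev(ref_voxels), eps)  # avoid division by zero
    return (0.5 * math.log(2 * math.pi * sigma ** 2)
            + (test_voxel - mu) ** 2 / (2 * sigma ** 2))


def anomaly_maps(test_image, registered_refs):
    """Compute two anomaly score maps from voxel-aligned reference images."""
    per_voxel_refs = list(zip(*registered_refs))  # reference samples per voxel
    a_int = [intensity_score(t, refs)
             for t, refs in zip(test_image, per_voxel_refs)]
    a_nll = [gaussian_nll_score(t, refs)
             for t, refs in zip(test_image, per_voxel_refs)]
    return a_int, a_nll


# Toy example: two voxels, three registered reference scans.
test_image = [1.0, 5.0]
registered_refs = [[1.0, 1.0], [1.2, 1.1], [0.8, 0.9]]
a_int, a_nll = anomaly_maps(test_image, registered_refs)

# Network input: the intensity image plus the precomputed maps.
network_input = [test_image, a_int, a_nll]
n_channels_direct = 1 + len(registered_refs)  # if references were fed directly
```

With 10 reference scans, feeding the registered references directly would mean 11 input channels, whereas precomputing the maps keeps the input at the intensity image plus one channel per score map, which is the memory trade-off the rebuttal describes.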

R3 suggests that different network architectures can be investigated, for example, vanilla U-Net or the networks developed for the ATLAS dataset. We selected the nnU-Net as it has consistently performed well for a wide range of medical image segmentation tasks and is considered state-of-the-art. We have actually experimented with vanilla U-Net before using nnU-Net. But its baseline performance was much worse than that of nnU-Net, where extensive image preprocessing and postprocessing are designed to boost the segmentation performance. Therefore, we did not report the results achieved with the vanilla U-Net. In addition, although several networks have been developed for the ATLAS dataset as well, the baseline performance of nnU-Net is comparable to the results reported in the literature. Therefore, in this work, we only used the nnU-Net as the baseline method, but the investigation of other network structures can definitely be explored in future work.

R3 is interested in the limitation of the proposed method. As mentioned above, registration is time-consuming, which is a major limitation. But it can be addressed with accelerated learning-based registration in future work. We will briefly add this discussion.

Other minor issues will also be addressed.

Improved Brain Lesion Segmentation with Anatomical Priors from Healthy Subjects