
Authors

Xinru Zhang, Chenghao Liu, Ni Ou, Xiangzhu Zeng, Xiaoliang Xiong, Yizhou Yu, Zhiwen Liu, Chuyang Ye

Abstract

Brain lesion segmentation provides a valuable tool for clinical diagnosis, and convolutional neural networks (CNNs) have achieved unprecedented success in the task. Data augmentation is a widely used strategy that improves the training of CNNs, and the design of the augmentation method for brain lesion segmentation is still an open problem. In this work, we propose a simple data augmentation approach, dubbed CarveMix, for CNN-based brain lesion segmentation. Like other “mix”-based methods, such as Mixup and CutMix, CarveMix stochastically combines two existing labeled images to generate new labeled samples. Yet, unlike these augmentation strategies based on image combination, CarveMix is lesion-aware, where the combination is performed with attention to the lesions and a proper annotation is created for the generated image. Specifically, from one labeled image we carve a region of interest (ROI) according to the lesion location and geometry, and the size of the ROI is sampled from a probability distribution. The carved ROI then replaces the corresponding voxels in a second labeled image, and the annotation of the second image is replaced accordingly as well. In this way, we generate new labeled images for network training and the lesion information is preserved. To evaluate the proposed method, experiments were performed on two brain lesion datasets. The results show that our method improves the segmentation accuracy compared with other simple data augmentation approaches.
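
For readers who prefer code, the following is a minimal NumPy sketch of the mixing step described in the abstract. It assumes a signed-distance reading of “carving an ROI according to the lesion location and geometry”, and all names (carve_mix, lam, etc.) are illustrative rather than taken from the authors’ repository; the official implementation is linked below.

```python
# Minimal sketch of the lesion-aware mixing described above, assuming a
# signed-distance interpretation of "carving" the ROI. Not the authors'
# implementation; see the linked repository for the official code.
import numpy as np
from scipy.ndimage import distance_transform_edt

def carve_mix(img_a, lbl_a, img_b, lbl_b, lam):
    """Carve a lesion-shaped ROI from labeled image A and paste it into B.

    img_a, img_b : 3D intensity volumes of identical shape
    lbl_a, lbl_b : binary lesion masks of the same shape
    lam          : scalar controlling the ROI size, assumed to be drawn
                   from a probability distribution as stated in the abstract
    """
    # Signed distance to the lesion boundary of image A:
    # negative inside the lesion, positive outside.
    dist_outside = distance_transform_edt(lbl_a == 0)
    dist_inside = distance_transform_edt(lbl_a > 0)
    signed_dist = dist_outside - dist_inside

    # Thresholding the signed distance preserves the lesion geometry:
    # lam > 0 dilates the lesion-shaped ROI, lam < 0 shrinks it.
    roi = signed_dist <= lam

    # Replace the corresponding voxels and annotations of image B.
    new_img = np.where(roi, img_a, img_b)
    new_lbl = np.where(roi, lbl_a, lbl_b)
    return new_img, new_lbl
```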

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_19

SharedIt: https://rdcu.be/cyhLL

Link to the code repository

https://github.com/ZhangxinruBIT/CarveMix.git

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a data augmentation method to produce training data for brain lesion segmentation. Given a pair of annotated training images, the proposed method carves a region of interest (ROI) from one image according to the lesion location and geometry and uses it to replace the corresponding voxels in the other labeled image.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a lesion-aware augmentation method specific to brain lesion segmentation. Compared with other augmentation methods, the experiments show that it achieves better performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The discussion of the results could be more detailed, especially in pointing out limitations of the method.
    • It is not clear whether the comparison of the augmentation methods (CarveMix, Mixup, and CutMix) with TDA was fair, since TDA was also performed for the synthetic images. Does this mean that the nnU-Net trained with CarveMix, for example, was trained with more images than the one trained only with TDA (Table 2)?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Authors provided code and data and specified the hyperparameters used.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Reference [8], cited for the state-of-the-art performance of brain lesion segmentation, is from 2017 and should be updated.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents a novel approach for data augmentation with good results, but it does not provide a detailed discussion of them.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper proposes an augmentation method to improve brain lesion segmentation. The idea is to carve an ROI around a brain lesion and produce synthetic images by pasting it into another image.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Augmentation is an important topic, particularly in the medical field, due to the lack of data. The method is simple and innovative at the same time.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It seems that the images need to be well registered; otherwise, the ROI may end up in ‘unrealistic’ areas. Brain size and shape vary from one subject to another, and a lesion may end up in a ventricle. The ATLAS data are publicly available, so the results should be compared with state-of-the-art methods. The reason that Mixup (0.42, 0.57 DC) and CutMix (0.24, 0.07 DC) failed so badly is not clear and was not discussed well.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I have no comment on this

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Investigating why the other two methods (Mixup and CutMix) performed so poorly would strengthen the paper.

  • Please state your overall opinion of the paper

    Probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method may be innovative; however, the evaluation is not properly done.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The authors present a new data augmentation method for lesion segmentation that is lesion-aware. The method is based on the CutMix and Mixup approaches, where cropped regions of different training images are combined. Instead of cropping random regions or linearly combining the labels, they crop the lesioned area and label of one image and combine it with the other one. According to the results, CarveMix outperforms the other data augmentation approaches used for comparison.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The manuscript is well-written and easy to follow.
    • The idea is novel, interesting and simple. It does not require any specific architecture and could be used in any segmentation application as is.
    • The method tries to preserve the lesion geometry and its surroundings. Thus, the cropping is lesion-aware and not random.
    • The results show an improvement in generalisation over other data augmentation techniques without the need to acquire new images or use complex architectures.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I expected to see results with a baseline method that does not use data augmentation. While not necessary, it would be a good way to gauge how much improvement that technique can provide.
    • The baseline method (TDA) is part of all the other approaches. So, technically, the final results of the CarveMix method are not truly independent of the TDA approach (this is pointed out by the authors on page 7) and might actually be benefiting from conventional data augmentation, too.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors used both public data and a publicly available repository. The newly developed strategies are also clearly explained in detail in the manuscript and should be easy to reproduce. Regarding the data, the authors used a public dataset and a smaller private one. For the private one, some basic details are given (scanner, b-value, resolution and number of images). While I think that some more details on the DWI data wouldn’t hurt (like the number of diffusion directions, pre-processing, etc.), I believe these details are not necessary.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    While I believe the manuscript is well-written and clear in general, there are some changes I think could further improve the paper:

    • I think “Algorithm 1” is unnecessary. The pseudo-code is mostly text, which is covered in detail in the description of the method, and it does not provide any new information. That space could be used for something else. For example, Figure 2 illustrates qualitative results for training with the whole dataset (the 100% case). I was really curious about the qualitative results in the worst-case scenario (12.5% of the training dataset), and I believe they would make a bigger impact.
    • There are also some small details missing. For the evaluation part, are the image pairs for CarveMix, CutMix and Mixup the same? If the pairs are different, I am not sure the comparison is actually fair. How was the train/test split done? Was it random? Did the authors choose the testing cases according to specific criteria? I assume it was random, but if that was the case, it might have been interesting to run some experiments with other splits to see whether CarveMix always outperforms the other approaches.
    • Since the paper focuses on strokes, which usually appear only once per brain, I was interested in the authors’ insights on CarveMix for other diseases where multiple lesions might be present (such as multiple sclerosis). Is the method valid as is, or should it be reformulated to define M_i accordingly?
    • This is a final minor comment, but it is worth pointing out that while the approach is lesion-aware, it is not necessarily “structure”-aware. If not careful, lesions might be “pasted” into healthy structures or tissue. Using non-linear registration after cropping could potentially solve that, albeit at the cost of introducing other issues. I think the manuscript could benefit from a couple of sentences addressing that.
  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While I believe there are parts of the manuscript that could be slightly improved, the idea is both interesting and novel. Moreover, the fact that such a simple technique for augmenting an existing dataset can produce such an improvement with a well-studied segmentation framework (nnU-Net), without acquiring extra images or using more complicated adversarial architectures, is worth mentioning.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This submission proposes a lesion-aware data augmentation method for brain lesion segmentation. The proposed method is interesting and novel, and the paper is recommended for acceptance. However, the reviewers have some concerns about the presentation of the method, whether the lesion locations in the generated images are reasonable, the evaluation of the proposed method, the baselines, etc. Also, are there any artifacts on the boundaries of the lesion in a generated image? Will these artifacts bias the lesion segmentation? Please try to address these concerns and questions in the final version. Also, please double-check the numbers in Table 2, especially the CutMix column.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

Reviewer 2 (R2) and Reviewer 3 (R3) are interested in whether unrealistic synthetic images are generated. Like Mixup, CutMix, and other augmentation methods based on image mixing, we do not explicitly require that the synthetic images be realistic, and thus it is possible to have artifacts in the generated training images. However, existing works have shown that unrealistic synthetic images can still improve network training despite the distribution shift, and there exists a tradeoff between the distribution shift and augmentation diversity (Raphael Gontijo-Lopes et al., ICLR 2021). We have also tried performing image registration before mixing the images, so that the synthetic images are more realistic; however, this did not lead to improved segmentation performance. This clarification also relates to the question of Reviewer 1 (R1) about the limitations of the proposed method. One limitation is that it is unknown how to achieve an optimal tradeoff between the diversity and realism of the synthetic images, and it would be interesting to explore this topic empirically or even theoretically in future work. We will better clarify that unrealistic synthetic images are allowed in the proposed method.

R1 wonders whether CarveMix, Mixup, and CutMix used more training samples than TDA. We would like to clarify that the same number of epochs and the same number of batches per epoch were used for each method during network training. Since TDA was performed randomly online, all methods used the same number of training samples. Therefore, the comparison with TDA is fair, and we will better clarify this.

R2 suggests that the state-of-the-art methods for the ATLAS dataset should also be compared. We selected nnU-Net as it has consistently achieved state-of-the-art performance in a variety of medical image segmentation tasks. nnU-Net shows that, with carefully designed preprocessing and postprocessing, even the standard U-Net structure achieves segmentation performance comparable to that of more advanced network architectures. Our results indicate that this is also the case for the ATLAS dataset, where the baseline performance of nnU-Net is comparable to the results reported in the literature (for example, see Kehan Qi et al., MICCAI 2019 and Zeju Li et al., TMI 2020). Please also note that our augmentation method is agnostic to the network structure. Therefore, even if there is a network that outperforms nnU-Net, our method can still be integrated with that network for improved performance.

R2 wonders why the performance of Mixup and CutMix was poor when only 12.5% of the training data was used. The poor performance could be due to the scarcity of true annotated data, which limits the diversity of the generated data. Moreover, Mixup and CutMix can generate many unrealistic images. Since the total number of synthetic and real training images was set to 1000, when true annotated images with realistic appearances were scarce, the unrealistic images could overwhelm the real images, leading to degraded network training. Future work could further explore the cause of the poor performance of Mixup and CutMix when the amount of real training data is small. Note that our method still performed decently given the small number of annotated training scans.
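
For illustration, the sketch below shows the kind of mixing performed by the two baselines; this is illustrative NumPy code, not the exact implementation used in the experiments, and the function names and box construction are assumptions. Mixup blends whole volumes and their labels, while CutMix pastes a random box irrespective of the lesion location, which is why both can produce unrealistic training samples when few real scans are available.

```python
# Illustrative sketch of the Mixup and CutMix baselines on 3D volumes.
# Function names and the box construction are illustrative assumptions,
# not the exact implementation used in the experiments.
import numpy as np

def mixup(img_a, lbl_a, img_b, lbl_b, lam):
    # Global linear blending of intensities and labels: lesions from both
    # scans are attenuated and the labels become soft, which can look
    # unrealistic for brain lesion images.
    new_img = lam * img_a + (1 - lam) * img_b
    new_lbl = lam * lbl_a + (1 - lam) * lbl_b
    return new_img, new_lbl

def cutmix(img_a, lbl_a, img_b, lbl_b, lam, rng=None):
    # A random box from image A replaces the same region of image B,
    # regardless of where the lesion lies, so the box may miss the lesion
    # entirely or cut through it.
    rng = np.random.default_rng() if rng is None else rng
    shape = np.array(img_a.shape)
    box = np.maximum(1, (shape * (1 - lam) ** (1 / 3)).astype(int))
    corner = [rng.integers(0, s - b + 1) for s, b in zip(shape, box)]
    sl = tuple(slice(c, c + b) for c, b in zip(corner, box))
    new_img, new_lbl = img_b.copy(), lbl_b.copy()
    new_img[sl] = img_a[sl]
    new_lbl[sl] = lbl_a[sl]
    return new_img, new_lbl
```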

R3 is interested in the results without any data augmentation, as well as the results of CarveMix without TDA. We have actually performed such experiments, which are not reported due to the page limit. The results without any augmentation are worse than the results of TDA, and the results of CarveMix are better when it is integrated with TDA. Since in practice TDA is always applied in nnU-Net, we believe it is fairer to use the results achieved with TDA as the baseline performance, and for a fair comparison all other methods are also integrated with TDA. We will better clarify this in the paper.

Other minor issues will also be addressed.


