
Authors

Zhendong Liu, Van Manh, Xin Yang, Xiaoqiong Huang, Karim Lekadir, Victor Campello, Nishant Ravikumar, Alejandro F. Frangi, Dong Ni

Abstract

The performance of deep segmentation models often degrades due to distribution shifts in image intensities between the training and test data sets. This is particularly pronounced in multi-centre studies involving data acquired using multi-vendor scanners, with variations in acquisition protocols. It is challenging to address this degradation because the shift is often not known a priori and hence difficult to model. We propose a novel framework to ensure robust segmentation in the presence of such distribution shifts. Our contribution is three-fold. First, inspired by the spirit of curriculum learning, we design a novel style curriculum to train the segmentation models using an easy-to-hard mode. A style transfer model with style fusion is employed to generate the curriculum samples. Gradually focusing on complex and adversarial style samples can significantly boost the robustness of the models. Second, instead of subjectively defining the curriculum complexity, we adopt an automated gradient manipulation method to control the hard and adversarial sample generation process. Third, we propose the Local Gradient Sign strategy to aggregate the gradient locally and stabilise training during gradient manipulation. The proposed framework can generalise to unknown distribution without using any target data. Extensive experiments on the public M&Ms Challenge dataset demonstrate that our proposed framework can generalise deep models well to unknown distributions and achieve significant improvements in segmentation accuracy.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_43

SharedIt: https://rdcu.be/cyhMl

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper presents a method for training segmentation networks that are robust to style changes and slight domain shift (for instance, different vendors or acquisition protocols). The authors propose a training algorithm that does not involve any network modification.

The method follows a curriculum strategy (i.e., starting from ‘easy’ examples and progressively making them harder), using a style transfer network to slowly shift the examples toward a common style. The resulting algorithm is faster both to train and to run at inference than the leading methods from the challenge that the authors compare against.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper is very well written and easy to follow
- The Figures and Table are well made and add value to the paper
- Thorough evaluation, with many reported metrics and many relevant baseline methods
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

No major weakness
Please rate the clarity and organization of this paper

Excellent
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Without access to the code, it might be a bit difficult to reproduce exactly the authors’ results, though it would be possible to re-implement something quite close, based on Algorithm 1 and other experimental details.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

It is not fully clear if the method requires the style transfer at inference. This should be clarified in the camera ready version.
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper is well written, with a good motivation, and presents a clear method that works. The evaluation is thorough.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

This paper describes a curriculum-learning-based approach for cross-domain (centre, vendor and disease) image segmentation. The model first leverages the existing WaveCT-AIN method to transfer a target image into the style of the source domain. The stylized image is then iteratively updated via the adversarial gradient of a simultaneously trained segmentation network. For stability, the adversarial gradient is smoothed locally via average pooling and up-sampling.
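The local smoothing step mentioned above (average pooling followed by up-sampling of a gradient map) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `local_smooth`, the block size `k`, the nearest-neighbour up-sampling and the toy gradient values are all assumptions made for illustration.

```python
import numpy as np

def local_smooth(grad: np.ndarray, k: int = 2) -> np.ndarray:
    """Average-pool a 2D gradient map over k x k blocks, then upsample
    back to the input size by repeating each pooled value (assumed scheme)."""
    h, w = grad.shape
    assert h % k == 0 and w % k == 0, "sketch assumes sizes divisible by k"
    # Group pixels into non-overlapping k x k blocks and average each block.
    pooled = grad.reshape(h // k, k, w // k, k).mean(axis=(1, 3))
    # Nearest-neighbour upsampling back to (h, w).
    return np.repeat(np.repeat(pooled, k, axis=0), k, axis=1)

# Toy gradient map with one isolated peak (value 8.0).
grad = np.array([[1.0, 3.0, 0.0, 0.0],
                 [5.0, 7.0, 0.0, 8.0],
                 [0.0, 0.0, 2.0, 2.0],
                 [0.0, 0.0, 2.0, 2.0]])
smoothed = local_smooth(grad, k=2)
print(smoothed)
```

In this toy case the isolated peak of 8.0 is spread over its 2 x 2 block, which is the kind of damping of sudden peaks that the review describes.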
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The overall framework seems to be novel. Leveraging adversarial samples and style transfer together for robust cross domain segmentation is reasonable. The idea of local gradient smoothing approach itself is interesting. It normalizes the gradient via average filtering which helps alleviate sudden peaks and stabilize the training process.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The presentation of the methodology is poor. The description is too high-level, with no technical details disclosed. The entire approach relies heavily on a pretrained style transfer network taken from the existing literature. If that network does not reach a certain accuracy, the entire approach can easily fail.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Overall this approach is likely to be reproducible, since the algorithm is a rather straightforward combination of established components. The dataset is publicly available as well.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
Overall the idea of this paper is interesting. However, the details are not well presented and the approach has certain limitations.
1. The entire approach assumes that the style transfer does a good job. Otherwise, weighting between the stylized image (which could be a random image if WaveCT-AIN is not well trained) and the training image will simply produce a random noisy image, which will certainly not be suitable for training a segmentation network.
2. Is local gradient smoothing really a good approach for this segmentation task? Using the authors' notation, let f() denote the segmentation network. This network essentially approximates a 2D piecewise constant function. Assuming the output of f is smooth, the gradient near the segmentation boundary is then very high. Adding gradient smoothing may lead to a poor approximation. Please clarify.
3. I checked the website of the M&Ms challenge. This challenge covers multiple anatomical structures: LV, RV and MYO. Which one is the target in the experiments of this paper? Please add more details about the dataset and task.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Overall I think the idea of this cross domain segmentation work is interesting. The results seem promising compared to top methods in the open challenge. The ablation study well justifies the necessity of each component.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

The authors propose a novel framework to ensure robust segmentation in the presence of distribution shifts, evaluated on the MICCAI 2020 M&Ms Challenge.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Unlike data augmentation strategies, the paper proposes a different way to improve model generalization. The authors modify a style transfer model, WaveCT-AIN, with a “style fusion” operation, and then add “local gradient sign” to stabilize the training procedure.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The motivation for proposing this new method is not correct.
The authors claim “Although DA methods can mitigate model overfitting to some degree, they cannot guarantee the ability of deep models to generalize multi-centre, multi-vendor data, typically encountered in real clinical scenarios.” However, the lab of Fabian Isensee (author of nnUNet) also took part in M&Ms and ranked 1st (https://arxiv.org/pdf/2011.07592.pdf). Their strategy simply uses suitable data augmentation and outperforms the authors' results.

What’s more, the authors compare computation times in Table 2. However, “P1” (the 1st-ranked entry in M&Ms, from the nnUNet team) needs only one segmentation model and does not need sophisticated modules such as “Style Transfer” or complicated training strategies such as “Local Gradient Sign”, which means that its training/inference time is also likely to be shorter than the authors'.
2. Lack of novelty. The authors only introduce WaveCT-AIN (ISBI 2020, itself inspired by WaveCT) with a small modification. “Style Fusion” is two Hadamard product operations and one summation operation; “Local Gradient Sign” adds average pooling, ReLU and upsampling operations to the fast gradient sign method (Goodfellow et al.).
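To make the reviewer's description of “Style Fusion” concrete (two Hadamard products and one summation), a minimal NumPy sketch follows. It assumes the fusion is a pixel-wise convex blend with weights `weight` and `1 - weight`; that complementary weighting, the function name `style_fusion` and the toy values are assumptions for illustration, not details confirmed by the paper or the reviews.

```python
import numpy as np

def style_fusion(x_orig: np.ndarray, x_style: np.ndarray,
                 weight: np.ndarray) -> np.ndarray:
    """Blend an original and a stylized image pixel by pixel.

    Two element-wise (Hadamard) products and one sum. The weights are
    clipped to [0, 1] so the result stays between the two inputs
    (an assumption of this sketch).
    """
    weight = np.clip(weight, 0.0, 1.0)
    return weight * x_style + (1.0 - weight) * x_orig

# Toy 2 x 2 "images" with constant intensity.
x_orig = np.full((2, 2), 0.2)
x_style = np.full((2, 2), 0.8)
weight = np.array([[0.0, 0.5],
                   [1.0, 0.25]])
fused = style_fusion(x_orig, x_style, weight)
print(fused)
```

A weight of 0 returns the original pixel, a weight of 1 returns the stylized pixel, and intermediate weights interpolate between the two.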
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I believe the numerical results of this paper are true because I also followed the M&Ms challenge in 2020.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

“Local Gradient Smoothing” is an interesting method, and the adversarial strategy could be used in many tasks. I encourage the authors to extend the experiments to other fields rather than only medical image segmentation. Only if its performance, with those simple operations in “Local Gradient Sign”, is better than that of a normal method such as “SCL” across various tasks can the reviewer accept it.
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The numerical results are actually not impressive and the modification of WaveCT-AIN is not big.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

5
Reviewer confidence

Very confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper focuses on the important problem of domain shift. The proposed model leverages WaveCT-AIN, which makes the novelty low. The experiments fail to compare with some of the MICCAI 2020 M&Ms results.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Author Feedback

We thank all the reviewers (R) for their thorough reviews. The novelty is clarified thoroughly and the comparisons with all M&Ms methods have been fairly conducted and will be given in the final version.

Q1: Method novelty. (#R2, #R3) A1: Our work has remarkable novelty regarding the methodology and performance. First, as far as we know, this is the first investigation of curriculum learning for the domain shift problem in medical image segmentation. It provides a novel and promising approach for the community (open-source code) with superior performance in the M&Ms Challenge. It clearly proves that gradually learning with adversarial, hard examples contributes to strong generalization capacity at very low computation cost. Second, we propose a new design for adversarial-attack-based hard example generation. Unlike FGSM, which uses the adversarial signals to directly perturb the original images for the attack, we use the signals to balance the image fusion for the attack. The fusion controls the attack and avoids harmful generation. This modification brings significant improvement in training stability over FGSM. Third, we devise the important Local Gradient Sign (LGS) module for effective and general gradient manipulation. To our knowledge, LGS is the first module that can soften the adversarial attack in a local manner and protect the curriculum learning from crashing. For clarity, our focus is the novel investigation of curriculum learning. Style transfer (ST) is only a part of sample initialization. We chose WaveCT-AIN as a basic ST for its high quality and speed.
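The contrast drawn in A1 between FGSM and a fusion-based attack can be sketched on a toy problem. FGSM perturbs the input directly as x + eps * sign(grad). The alternative below instead moves the fusion weights along the sign of the loss gradient, so the attacked image always stays between the original and the stylized image. The quadratic toy loss, the convex fusion form, the function names and all numeric values are assumptions for illustration only, not the authors' code.

```python
import numpy as np

def toy_loss_grad(x: np.ndarray, target: np.ndarray) -> np.ndarray:
    """Gradient of the toy loss 0.5 * sum((x - target)**2) w.r.t. x."""
    return x - target

def fgsm_step(x: np.ndarray, grad_x: np.ndarray, eps: float) -> np.ndarray:
    """Standard FGSM: perturb the input along the gradient sign."""
    return x + eps * np.sign(grad_x)

def fusion_weight_step(w, x_orig, x_style, grad_x, eps):
    """Assumed alternative: perturb the fusion weights, not the pixels.

    With x = w * x_style + (1 - w) * x_orig, the chain rule gives
    dL/dw = dL/dx * (x_style - x_orig).
    """
    grad_w = grad_x * (x_style - x_orig)
    return np.clip(w + eps * np.sign(grad_w), 0.0, 1.0)

x_orig = np.array([0.2, 0.6])
x_style = np.array([0.8, 0.1])
target = np.array([0.0, 0.0])
w = np.array([0.9, 0.1])
eps = 0.3

x = w * x_style + (1 - w) * x_orig
grad_x = toy_loss_grad(x, target)

x_fgsm = fgsm_step(x, grad_x, eps)
w_new = fusion_weight_step(w, x_orig, x_style, grad_x, eps)
x_fused_adv = w_new * x_style + (1 - w_new) * x_orig

lo = np.minimum(x_orig, x_style)
hi = np.maximum(x_orig, x_style)
print("FGSM leaves the [orig, style] range:", bool(np.any(x_fgsm > hi)))
print("Fusion attack stays inside it:",
      bool(np.all((x_fused_adv >= lo) & (x_fused_adv <= hi))))
```

Under these assumptions both attacks increase the toy loss, but only the fusion-based one is bounded by the two source images, which is one possible reading of "the fusion controls the attack".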

Q2: Code and more applications. (#R1, #R3) A2: We have released our code on GitHub via an anonymized link (https://github.com/MICCAI-1393/LSCL). We are collecting images and validating the proposed method on more challenging applications, and will summarize the results in the journal version.

Q3: Details about the inference. (#R1, #R3) A3: In our design, the ST and LGS operations are only required during training for data augmentation and attack mitigation of hard samples, respectively. During testing, the trained model is lightweight and efficient without using these two modules.

Q4: Role of style transfer. (#R2) A4: ST is only an optional module of our system, used to generate initial samples and increase the variability of the training set. We chose WaveCT-AIN to obtain both high-quality generation and a good quality-efficiency balance, owing to its pre-training on a large dataset and its high generalization ability.

Q5: Comparison with M&Ms Challenge. (#R2, #R3) A5: As the reviewers indicate and the M&Ms Challenge requires, our targets are the LV, RV and MYO. We adopted the same evaluation criteria and ranking method as the M&Ms Challenge. We have presented fair comparisons with all competition methods in M&Ms. From the comprehensive view of efficacy and efficiency, our method achieves the best results. Due to space limitations, we only reported the P2-P3 teams in M&Ms. We will add more details in the final version.

Q6: Motivation and comparison with nnUnet. (#R3) A6: Our proposed method addresses a strong clinical need. nnUnet is indeed a promising solution with all P1-P3 (ranked 1st to 3rd in M&Ms) teams in the M&Ms challenge. However, it needs both exhaustive data augmentation and heavy model ensembling/post-processing. Its training is very complicated and time-consuming (the training/inference time of P1 is 60h/1s). In this regard, we propose a general and light-weight 2D framework with comparable performance to P1 and much higher efficiency (the training/inference time of our method is only 5h/0.2s).
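The efficiency claim in A6 can be put into ratios with a short calculation. The four timings are the ones quoted in the rebuttal (60h/1s for P1, 5h/0.2s for the proposed method); the variable names are illustrative.

```python
# Timings quoted in the rebuttal: training in hours, inference in seconds.
train_p1_h, infer_p1_s = 60.0, 1.0
train_ours_h, infer_ours_s = 5.0, 0.2

train_speedup = train_p1_h / train_ours_h    # ratio of training times
infer_speedup = infer_p1_s / infer_ours_s    # ratio of inference times

print(f"training speedup: {train_speedup:.0f}x")
print(f"inference speedup: {infer_speedup:.0f}x")
```

On the quoted figures this corresponds to roughly 12 times faster training and 5 times faster inference than P1.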

Q7: Details of Local Gradient Sign. (#R2, #R3) A7: For clarity, LGS only operates on the gradient maps during training, rather than the segmentation prediction. Therefore, LGS does not enlarge the prediction loss or magnify the gradient around the segmentation edge. LGS aims to reduce the influence of adversarial perturbations for stable curriculum learning.

Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have clarified the novelty in the rebuttal and they will include all the teams in M&Ms challenge in the final version.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The view of the reviewers ranged widely with respect to the technical novelty and the experimental results presented in this paper. After reading the paper, reviews and the rebuttal my opinion is that this paper has sufficient technical novelty and promise in terms of experimental results.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

8

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

After reading the author’s rebuttal and the paper I am borderline with this paper, leaning towards rejection. I feel some claims in the rebuttal are overclaimed and not supported in the experiments (e.g., superior performance or the claimed significant improvement in the training stability over FGSM, which is not demonstrated). Selecting samples in the target domains has been investigated in the medical segmentation literature (see [1] for example), and this was never discussed. More importantly, despite the complexity of this approach, the main improvement seems to come from the Test-Time-Augmentation, which diminishes the real impact of the proposed model. Furthermore, important concerns from reviewers are not explicitly addressed in the rebuttal, but given as promises (e.g., comparison to additional approaches or results in another dataset).
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7


Style Curriculum Learning for Robust Medical Image Segmentation