
Authors

Chen Chen, Kerstin Hammernik, Cheng Ouyang, Chen Qin, Wenjia Bai, Daniel Rueckert

Abstract

Deep learning-based segmentation methods are vulnerable to unforeseen data distribution shifts during deployment, e.g. changes in image appearance or contrast caused by different scanners, or unexpected imaging artefacts. In this paper, we present a cooperative framework for training image segmentation models and a latent space augmentation method for generating hard examples. Both contributions improve model generalization and robustness with limited data. The cooperative training framework consists of a fast-thinking network (FTN) and a slow-thinking network (STN). The FTN learns decoupled image features and shape features for image reconstruction and segmentation tasks. The STN learns shape priors for segmentation correction and refinement. The two networks are trained in a cooperative manner. The latent space augmentation generates challenging examples for training by masking the decoupled latent space in both channel-wise and spatial-wise manners. We performed extensive experiments on public cardiac imaging datasets. Using only 10 subjects from a single site for training, we demonstrated improved cross-site segmentation performance and increased robustness against various unforeseen imaging artefacts compared to strong baseline methods. In particular, cooperative training with latent space augmentation yields a 15% improvement in average Dice score compared to a standard training method.
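As a rough illustration of the channel-wise and spatial-wise masking described in the abstract, the sketch below zeroes out random channels or random spatial locations of a latent feature map. It is a minimal NumPy sketch, not the authors' implementation: the (C, H, W) latent shape, the independent drop probability `p`, and the function names are assumptions made for illustration.

```python
import numpy as np


def random_channel_mask(z, p, rng):
    """Zero out whole channels of a latent code z with shape (C, H, W).

    Each channel is dropped independently with probability p (an assumed scheme).
    """
    keep = rng.random(z.shape[0]) >= p  # True = keep this channel
    return z * keep[:, None, None]      # broadcast the channel mask over H and W


def random_spatial_mask(z, p, rng):
    """Zero out whole spatial locations (all channels at once) with probability p."""
    _, h, w = z.shape
    keep = rng.random((h, w)) >= p      # True = keep this (row, col) location
    return z * keep[None, :, :]         # broadcast the spatial mask over channels


rng = np.random.default_rng(0)
z = rng.standard_normal((64, 8, 8))     # hypothetical latent code: 64 channels, 8x8 grid
z_channel_masked = random_channel_mask(z, p=0.3, rng=rng)
z_spatial_masked = random_spatial_mask(z, p=0.3, rng=rng)
```

Channel-wise masking removes entire feature maps, whereas spatial-wise masking removes the same locations in every channel. In the paper, masked codes are decoded into harder training examples; the decoder is omitted from this sketch.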

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_14

SharedIt: https://rdcu.be/cyl3Q

Link to the code repository

https://github.com/cherise215/Cooperative_Training_and_Latent_Space_Data_Augmentation

Link to the dataset(s)

https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html

https://www.ub.edu/mnms/

Reviews

Review #1

Please describe the contribution of the paper
The paper presents two contributions:
1. targeted latent space data augmentation
2. an end-to-end framework that first generates segmentation maps and then refines them.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

I like the idea of targeted latent space data augmentation. I think the second contribution is not so significant or novel.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Some parts of the paper are not very clear, for example the latent code decoupler.
- I think generally the latent space data augmentation needs more explanation and analysis.
- The term “target masking” is very confusing; I think the authors mean “targeted masking”.
- Since the masking is done in a latent space with very low resolution, and if the targeted masking focuses on the more deterministic locations, e.g. cardiac structures, I wonder whether the proposed method can be applied to small-object segmentation, such as lesions, where lesion information may not be present at such low resolutions.
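The reviewer's concern can be made concrete with a sketch of targeted spatial masking, in which the locations judged most important are masked instead of random ones. This is a hypothetical NumPy sketch: it assumes importance is scored by the magnitude of a gradient array `grad` with the same shape as the latent code `z`; the function name and the scoring rule are illustrative assumptions, not the paper's confirmed procedure.

```python
import numpy as np


def targeted_spatial_mask(z, grad, p):
    """Zero out the fraction p of spatial locations with the highest saliency.

    z, grad: arrays of shape (C, H, W). Saliency per location is assumed to be
    the absolute gradient summed over channels.
    """
    _, h, w = z.shape
    saliency = np.abs(grad).sum(axis=0).ravel()  # one score per (row, col)
    k = int(np.floor(p * h * w))                 # number of locations to mask
    keep = np.ones(h * w, dtype=bool)
    if k > 0:
        top_k = np.argsort(saliency)[-k:]        # indices of the k most salient locations
        keep[top_k] = False
    return z * keep.reshape(1, h, w)             # same locations masked in every channel


z = np.ones((2, 4, 4))
grad = np.zeros((2, 4, 4))
grad[:, 1, 2] = 5.0                                # make location (1, 2) the most salient
masked = targeted_spatial_mask(z, grad, p=1 / 16)  # masks exactly one of 16 locations
```

On such a coarse grid, each latent cell summarizes a large image patch. If, hypothetically, a 128x128 image were encoded to an 8x8 grid, each cell would cover a 16x16-pixel patch, so a lesion less than 16 pixels across would fall within at most a 2x2 block of cells that also encode the surrounding tissue.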
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I think the paper is reproducible. Many items in the reproducibility checklist are satisfied.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

I think the idea of latent data augmentation is interesting. However, I wonder whether it actually solves the problem of domain generalization. In Table 1, the proposed method performs significantly worse than a baseline model on the M&Ms dataset and only performs better on synthetic domain shifts such as random motion ghosting. In general, the experiment section of the paper is not easy to read. The authors mention three runs for the experiments; please provide standard deviations for these runs. The explanation of Figure 2 can also be improved.
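The request for standard deviations over the three runs amounts to reporting the mean and spread of one score per run. A minimal sketch, assuming the per-run score is a Dice overlap between binary masks and that the sample standard deviation (`ddof=1`) is wanted; the function names are hypothetical.

```python
import numpy as np


def dice(pred, target, eps=1e-8):
    """Dice overlap 2 * |A & B| / (|A| + |B|) between two binary masks.

    eps keeps the ratio defined when both masks are empty (it then returns 1.0).
    """
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    return (2.0 * intersection + eps) / (pred.sum() + target.sum() + eps)


def mean_and_std(run_scores):
    """Mean and sample standard deviation of one score per training run."""
    scores = np.asarray(run_scores, dtype=float)
    return scores.mean(), scores.std(ddof=1)
```

For example, `mean_and_std([0.80, 0.82, 0.84])` gives a mean of approximately 0.82 and a standard deviation of approximately 0.02 (toy numbers, not results from the paper).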
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Even though I think it is an interesting approach, the lack of clarity in the experiment section and the significantly worse performance on a real-data domain (M&Ms) incline me towards borderline rejection of this paper.
However, I won’t be upset if the paper is accepted.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

This paper proposed a two-stage image segmentation pipeline that utilizes a base shape segmentation and a secondary refinement network in conjunction with latent space masking for improved generalization capabilities under distribution shifts, which is done by encouraging cooperation between both networks for segmentation robustness towards generated hard latent samples.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- To the best of my knowledge, the proposed two-stage segmentation training setup with joint latent-space augmentation is novel.
- Experimental results are convincing, showing improvements over existing image-space augmentation methods.
- The motivation behind the latent space masking schemes seems sensible.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Latent data augmentation is not novel and has been done in, e.g., various metric learning approaches, such as Lin et al., “Deep Variational Metric Learning”, Duan et al., “Deep Adversarial Metric Learning”, or Zheng et al., “Hardness-Aware Deep Metric Learning”. The latter in particular encourages meaningful latent sample generation to improve generalization under distribution shifts. Rephrasing the claims and discussing related work more thoroughly is therefore encouraged. In addition, ablation experiments comparing the proposed approach to existing latent-space manipulation work would offer additional insights into the actual benefits of the proposed method.
- Comparison to non-learned refinement methods such as via e.g. Conditional random fields (see for example Christ et al. “Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields”) would better highlight the actual benefit of joint cooperative training of a FTN and STN.
- It would be very helpful if the authors provided convergence figures to show how such a cooperative two-network training setup lengthens or hurts training convergence compared to a single-network baseline, and thus to clarify the cost of opting for the proposed setting.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

I would argue that the paper is not easily reproducible, as crucial information such as the exact augmentations used is not made available, and some training settings are unclear (e.g. how exactly is p sampled?). Even taking the supplementary material into account, my opinion still stands, as it only provides the network architecture used.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Please see the “main weaknesses” section for suggestions on what should be improved.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The relative improvements of the cooperative training compared to the single network performance are encouraging, and the (mostly) improved performance of the latent-space augmentation method compared against image-space augmentation methods in my eyes should warrant acceptance as a whole.

However, a better discussion of existing work, additional training details, and further information on the computational impact of the two-network cooperation would help place the actual benefits of the proposed method.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

The method is inspired by the two-system model in the human behavioural sciences, where a fast-thinking system makes intuitive judgments and a slow-thinking system corrects them with logical inference. This strategy allows the method to exploit context information to improve segmentation in complex cases.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method presents a new formulation for image segmentation that combines data augmentation in the latent space with multi-task learning. This latent space contains abstract representations of both image and shape features, and challenging examples can be generated by manipulating it. The multi-task learning achieves good results with a limited dataset.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The method could be compared with other state-of-the-art left ventricle segmentation results to validate its importance. The computational time is not reported, so the efficiency cannot be assessed.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The method is reproducible: it uses public datasets, all information about parameters is shared, and the code is available on GitHub.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The authors present an interesting idea to improve segmentation by devising new data augmentation and multi-task learning approaches. The method could be used in cardiac imaging; comparing its results with other strategies would better evaluate its performance. It would also be nice to see the computational time.
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is clearly described, with potential to be used in cardiac imaging and complex scenes.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

4
Reviewer confidence

Somewhat confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The reviewers found this paper interesting and found that it includes novel aspects. The experimental evaluation was also considered a strength, but the lack of comparisons to non-learned refinement methods was noted as a weakness. Issues were raised regarding low clarity and low reproducibility. It was also noted that latent space augmentation in general is not novel and that the work should be put in context in this regard. The authors should address these criticisms in their rebuttal.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

Author Feedback

We thank the AC and all reviewers for agreeing that the proposed cooperative training with latent space data augmentation is interesting. R2 and R3 agree that our method provides ‘encouraging’ and ‘convincing’ improvements over baseline methods for domain-generalized and robust segmentation. The AC acknowledges its novelty, and R3 highlights its potential to be applied to other applications.

There are 3 major criticisms from reviewers, summarised by AC: 1) lack of comparison to non-learning based refinement methods, 2) the clarity and reproducibility, 3) the novelty of the latent space data augmentation component.

1) R2 suggests comparing our method to non-learning-based refinement methods, e.g. conditional random field (CRF) [1]. We argue that such a comparison may not be necessary. Recently, a denoising autoencoder (DAE)-based refinement [2] has demonstrated the superiority of learning-based methods against the commonly used CRF. In Table 2, we compare our method to this advanced method [2]. Results show that our method greatly outperformed it (Dice score 0.6901 vs 0.6077).

2) R1 raises concerns about the clarity of the methodology, such as missing details about the ‘latent code decoupler’. We have described its architecture in Sec. 2.1 and put a detailed diagram in Supplementary Fig. 5. We will improve its clarity in a revised version and cross-reference the diagram.

R2 criticizes the reproducibility of this paper, raising questions about the “exact data augmentation and training settings such as how exactly is p sampled?” We note that the paper states: ‘p is randomly selected from [0%, 50%]’ (Sec. 3, p. 6). In addition, we will publish the code of our method, as mentioned in the paper. R1 and R3 both support us: ‘The method is reproducible’ (R1, R3); ‘using public datasets and all information about parameters is shared’ (R3).
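The rebuttal states that the masking ratio p is randomly selected from [0%, 50%]. One reading of that, sketched below under assumptions, is to draw p uniformly once per training iteration from a seeded generator, so the sequence of ratios can be reproduced. The uniform distribution, the per-iteration schedule, and the names `P_MAX`, `sample_mask_ratio`, and `sample_ratios` are assumptions, not details confirmed by the paper.

```python
import numpy as np

P_MAX = 0.5  # upper bound of the masking ratio (50%), as stated in the rebuttal


def sample_mask_ratio(rng, p_max=P_MAX):
    """Draw a masking ratio p uniformly from [0, p_max) (uniformity is assumed)."""
    return float(rng.uniform(0.0, p_max))


def sample_ratios(seed, n_iterations):
    """Ratios for n_iterations steps; the same seed always yields the same sequence."""
    rng = np.random.default_rng(seed)
    return [sample_mask_ratio(rng) for _ in range(n_iterations)]


ratios = sample_ratios(seed=42, n_iterations=5)
```

Fixing the seed is what makes the sampled ratios, and therefore the masked training examples, repeatable across runs.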

3) R2 argues that “latent space data augmentation is not novel and has been done in various metric learning approaches”. This may be an important misunderstanding: we augment data in a different way. Our method is based on feature masking and has the advantage of generating various ‘realistic’ hard images and segmentations to benefit training (Fig. 3). By contrast, the metric learning work recommended by R2 [3] performs feature interpolation and requires paired images from the same/different categories to generate synthetic data. That method was not originally designed for segmentation tasks. To the best of our knowledge, our work is the first to explore the benefit of latent space data augmentation for robust segmentation in the medical domain. This is also reflected in the views of R1 and R3.
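The distinction drawn here can be illustrated by comparing the inputs each strategy needs. In this hypothetical sketch, feature masking perturbs a single latent code, whereas interpolation-style synthesis (shown as plain linear interpolation, a simplification rather than the exact hardness-aware formulation of [3]) needs a pair of codes. The function names and shapes are illustrative assumptions.

```python
import numpy as np


def mask_features(z, keep):
    """Feature masking: one latent code plus a binary keep-mask of the same shape."""
    return z * keep


def interpolate_features(z_anchor, z_other, lam):
    """Linear interpolation: requires a second latent code z_other.

    lam = 0 returns z_anchor, lam = 1 returns z_other.
    """
    return z_anchor + lam * (z_other - z_anchor)


rng = np.random.default_rng(0)
z_a = rng.standard_normal((4, 8, 8))                  # code of one image
z_b = rng.standard_normal((4, 8, 8))                  # code of a second, paired image
keep = rng.random((4, 8, 8)) >= 0.3                   # hypothetical random keep-mask
hard_from_one = mask_features(z_a, keep)              # needs only z_a
hard_from_pair = interpolate_features(z_a, z_b, 0.5)  # needs z_a and z_b
```

Masking therefore needs no pairing step, which is the practical difference the rebuttal emphasizes.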

The remaining concerns are minor. R1 is concerned about our performance on the M&Ms dataset. We point out that our method clearly outperforms standard training and 3 baseline methods on the M&Ms dataset and, more importantly, achieves the highest average Dice score across the 6 datasets (Table 1). The strongest SOTA method (Adv Bias), however, performs better only on the 2 datasets involving bias fields and intensity shifts, which it was designed for, and it performs worst on the ‘RandSpike’ dataset. Our method outperforms it on all other datasets. R1 also asks about applicability to small-object segmentation. We think our method can be applied to various tasks, as it is generic and flexible. We thank all reviewers for their constructive suggestions and for pointing out typos (e.g. R1: ‘target masking’ -> ‘targeted masking’), and we will incorporate them accordingly in a revised version.

[1] Christ et al. “Automatic liver and lesion segmentation in CT using cascaded fully convolutional neural networks and 3D conditional random fields.” In MICCAI’16.

[2] Larrazabal et al. “Anatomical priors for image segmentation via post-processing with denoising autoencoders.” In MICCAI’19.

[3] Zheng et al. “Hardness-aware deep metric learning.” In CVPR’19.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal has satisfactorily answered the questions put forth by the reviewers. More specifically, the differences from existing latent space augmentation methods were clarified, and the lack of comparisons to non-learned refinement techniques was justified. The authors are encouraged to clarify the same issues in the final version of their paper. A couple of minor issues, such as lower performance on some datasets and questions about small-object segmentation, remain; however, overall the paper passes the bar for acceptance in my opinion.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper adds latent space-based data augmentation to medical image segmentation under a multi-task learning framework. The paper is well written and easy to understand. The experimental results are convincing. The authors address most of the issues raised by the reviewers. Given the complexity of the algorithm, it is recommended that the code be open-sourced.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

All the reviewers found the proposed method interesting and novel in the first round. Reviewer 1 gave a borderline rejection due to the lack of clarity in the experiment section, as indicated in the review. Overall, the paper introduces an interesting training approach based on latent space data augmentation. The main criticisms (lack of comparison to non-learning-based refinement methods, reproducibility, and novelty of latent space augmentation) are all satisfactorily addressed.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2


Cooperative Training and Latent Space Data Augmentation for Robust Medical Image Segmentation