
Authors

Dani Kiyasseh, Albert Swiston, Ronghua Chen, Antong Chen

Abstract

Deep learning algorithms for cardiac MRI segmentation are highly dependent upon abundant, labelled data. However, current healthcare scenarios are characterized by abundant, unlabelled data and scarce, labelled data that are located across disparate medical centres. To account for this, we propose a self-supervised semi-supervised meta-learning framework, $\text{S}^{4}$ML, that leverages distributed labelled and unlabelled data to quickly and reliably perform cardiac MRI segmentation given scarce, labelled data from a potentially different distribution. We show that $\text{S}^{4}$ML outperforms baseline methods when transferring across medical centres, cardiac chambers, and MR sequences. We also show that this behaviour holds even in extremely low-data regimes, indicating the utility of $\text{S}^{4}$ML when abundant, labelled data are unavailable.

SharedIt: https://rdcu.be/cyl1s

N/A

https://www.ub.edu/mnms/

Reviews

Review #1

• Please describe the contribution of the paper

This paper addresses a key challenge in the community of generalising deep learning based segmentation networks to diverse multi-centre data, given limited labelled data and unlabelled data. The proposed meta learning approach (S4ML) is employed to segment the left atrium in cardiac LGE-MRI data, demonstrating that learned features can be transferred across segmentation tasks (cardiac chambers), and multi-sequence cardiac MRI data from multiple centres.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

(1) The paper addresses real world challenges in cardiac MRI segmentation - dealing with limited labelled data, accommodating domain shifts native to multi-centre data, and generalising across multi-sequence MR data.

(2) The proposed meta learning framework is generic and agnostic to the segmentation network architecture employed, useful for integration with existing approaches to improve robustness and generalisability to unseen data exhibiting domain shifts.

(3) The proposed approach is trained and evaluated using open source challenge data, promoting reproducibility in the future.

(4) The data and network architecture ablation experiments are well conceived and informative, and the results obtained using the SSML approaches in the lower data regimes are particularly impressive.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

(1) No statistical significance tests have been conducted or reported for the segmentation errors presented in Fig. 2. These must be included to enable effective comparison of all methods. Similarly, no significance tests are reported in Tables 1 and 2, and error standard deviations are not included either. These must also be included for effective comparison.

(2) While DICE and HD are well established metrics for assessing segmentation accuracy, quantifying and comparing a clinical index such as LA volume using the predicted segmentations (for each method, with respect to the ground truth) will indicate whether specific approaches are better at preserving this clinical biomarker. This would provide insights to the clinical value of the proposed framework.
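The distinction drawn above between an overlap metric and a clinical index can be illustrated with a minimal sketch. It assumes binary masks flattened to lists of 0/1 and a hypothetical voxel size; the function names `dice` and `volume_ml` are illustrative and do not come from the paper.

```python
def dice(pred, truth):
    """Dice overlap between two binary masks given as flat sequences of 0/1."""
    intersection = sum(p * t for p, t in zip(pred, truth))
    total = sum(pred) + sum(truth)
    # Two empty masks are treated as a perfect match by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def volume_ml(mask, voxel_volume_mm3):
    """Volume of a binary mask in millilitres (1 ml = 1000 mm^3)."""
    return sum(mask) * voxel_volume_mm3 / 1000.0


# Toy flattened masks; real masks would be 3D volumes.
truth = [0, 1, 1, 1, 1, 0, 0, 0]
pred = [0, 0, 1, 1, 1, 1, 0, 0]
voxel_mm3 = 0.625 * 0.625 * 2.5  # hypothetical voxel spacing in mm

dice_score = dice(pred, truth)
volume_error = volume_ml(pred, voxel_mm3) - volume_ml(truth, voxel_mm3)
print(dice_score, volume_error)
```

In this toy case the prediction is shifted by one voxel, so the overlap is imperfect while the volume is unchanged. This is why reporting a volume error alongside Dice and HD can add information.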

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors utilised open source challenge data for all experiments presented in the paper, promoting reproducibility in the future (following public release of the code).

The manuscript would benefit from a clearer description of the number of unlabelled and labelled samples used in each set of experiments. This detail is currently entirely consigned to the supplementary material and should be part of the main body of the manuscript.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

This reviewer recommends that the authors reorganise their paper such that the description of the training, validation and test data used in all experiments is part of the main body of the manuscript. This can be swapped with the section on convergence for example. This would improve the overall clarity of the paper.

Additionally, as mentioned above, it is imperative that the statistical significance of the presented segmentation errors is evaluated and reported in the manuscript to enable effective interpretation.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The proposed framework addresses key challenges in medical image segmentation, is generic and flexible, lending itself to integration with existing segmentation approaches. The experiments are well motivated and designed and the results, particularly in the low labelled data regime are impressive. Overall, this paper will be of significant interest to the MICCAI community.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Very confident

Review #2

• Please describe the contribution of the paper

The paper proposes a method that combines meta-learning and contrastive learning to achieve cross-center and cross-chamber segmentation given scarce labelled data. In particular, the paper presents an interesting scenario, i.e., transfer learning across cardiac chambers.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Novel application: transfer learning across centers and across cardiac chambers. While transfer learning and modality adaptation have been important topics in medical image analysis, cross-chamber transfer learning is clearly a novel application.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Experimental results: judging from the reported results, the performance advantage is not significant. In Figure 2 and Table 1, the proposed method shows no consistent improvement across different fractions of the training samples or different network architectures. Sometimes the result is even lower than with random initialization. Therefore, these results cannot validate the effectiveness of the proposed method.

Configuration inconsistent with the claimed experimental aim. While the authors claim that the proposed method can be used with scarce labelled data, the training samples in this work, even for F=0.1, do not seem that scarce (355 slices).

Unclear description. While meta-learning is used as a framework, it is not clearly described what the T learning tasks are during meta-learning. It seems that some classification and regression tasks are used. These need to be clearly described.

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility seems OK. Training and test code will be provided.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The procedure of the meta-learning task should be clearly described. More description is needed of the motivation for cross-chamber transfer learning, its challenges, and the corresponding solutions in this work. More convincing results are required.

The names of the different methods can easily cause confusion. Please check the superscripts on the RHS and LHS of Equation 3. The label in Fig. 3 is unclear: what is the gray curve?

probably reject (4)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Unconvincing results for validating the effectiveness of the method.

• What is the ranking of this paper in your review stack?

4

• Number of papers in your stack

5

• Reviewer confidence

Very confident

Review #3

• Please describe the contribution of the paper

This paper proposed a self-supervised semi-supervised meta-learning framework to leverage labeled and unlabeled multi-site data for annotation-efficient cardiac segmentation. They have validated their performance on two datasets.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

++ The problem they tackled, i.e., how to utilize labeled and unlabeled multi-site data, is interesting.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

– Regarding their problem settings, in the first paragraph on Page 2, there is a sentence, i.e., “Specifically, we leverage left ventricular…. LGE MRI data.”. In multi-site learning, why is it practical to quickly adapt a model previously trained at one site for one organ to a different site to segment a different organ?

– In the methods, the unlabeled and labeled data from each site are not properly denoted mathematically. This may cause confusion about whether each site has both labeled and unlabeled data, or only one of them.

– It is unclear why self-supervised learning is brought into their framework and how it would help to solve their problem.

– The key components in their framework, i.e., contrastive learning and Meta-learning, are directly adopted from previous works. What are the technical contributions in this paper?

– In their experiments, they did not mention the proportion of labeled and unlabeled data. Also, since the 2020 Cardiac M&M dataset contains data from four sites, why split the training data by proportion instead of by site? It could be better to split the four sites into four folds, randomly pick three of them for meta-training and take the remaining one for testing, and then perform four-fold cross-validation.

– Although the top panel of Fig. 2 is colorful, a table with concrete numbers would be more helpful for comparing performance with the SOTA methods.
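The site-wise split suggested in the list above can be sketched as follows. This is a minimal illustration of holding out each site once; the site labels are hypothetical placeholders, and the snippet does not reflect the authors' actual data split.

```python
def leave_one_site_out(sites):
    """Yield (meta_train_sites, test_site) pairs, holding out each site once."""
    for held_out in sites:
        train = [s for s in sites if s != held_out]
        yield train, held_out


sites = ["site_A", "site_B", "site_C", "site_D"]  # hypothetical site labels
folds = list(leave_one_site_out(sites))
for train, test in folds:
    print(f"meta-train on {train}, test on {test}")
```

With four sites this yields four folds, each meta-training on three sites and testing on the remaining one, so every site serves as unseen test data exactly once.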

• Please rate the clarity and organization of this paper

Satisfactory

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

In the method, the descriptions of the input and training details are unclear. Without code available, I am not positive about it.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. The motivation of your problem setting should be emphasized, such as why a framework that could deal with both labeled and unlabeled data in multi-site learning is more practical.

2. Several descriptions in the method are unclear, such as the mathematical notation for the inputs, the overall training strategy and objective function, and the meaning of $\theta$ and $\theta_{t}$ in Fig. 1. Also, abbreviations like SSML and SML should be explained where they first appear.

3. In the experiment, the performance would be easier to observe if the top figure in Fig.2 could be a table with numbers. Also, in the table, it would be better if you could indicate which methods are the proposed methods, i.e., change “SSML+SML” to “SSML+SML (Ours)”.

probably reject (4)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The descriptions of the method are unclear. Since the major components, such as contrastive learning and meta-learning, are directly adopted from others’ work, the novelty seems incremental.

• What is the ranking of this paper in your review stack?

5

• Number of papers in your stack

5

• Reviewer confidence

Somewhat confident

Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This paper proposed a self-supervised semi-supervised meta-learning framework to leverage labeled and unlabeled multi-site data for annotation-efficient cardiac segmentation. Although this paper is well written, the reviewers raised some issues, such as the lack of statistical tests to make the experimental results more convincing, the motivation for the problem setting and for introducing self-supervised learning into the framework, and some unclear descriptions. Detailed comments have been provided to the authors, who should address them carefully for further consideration.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

6

Author Feedback

We thank the reviewers for providing us with valuable feedback. We group your comments and address them with tags [R#].

Motivation [R3, R4]. We believe that our methods are motivated by a realistic and increasingly common clinical setting characterized by three components. 1) Cardiac MRI data are located at distinct medical centres (e.g., health system with centres at different geographical locations). 2) Some medical centres contain specific MR sequence data (e.g., cine MRI) that are abundant and either a) labeled with segmentation maps of a cardiac chamber (e.g., LV Endo), or b) unlabeled with these maps due to the burden associated with labelling. 3) Other medical centres contain scarce data (from another MR sequence, e.g., LGE) that are labeled with segmentation maps of a different cardiac chamber (e.g., LA Endo). Such scarcity could be due to the presence of a rare or novel medical condition or due to limited medical imaging infrastructure. We will modify the introduction to clarify this motivation.

Contributions [R4]. We believe that our framework offers several benefits relative to baselines. Previous work has assumed that either a) all data are stored in a centralized server, b) only unlabeled data, or c) only labeled data are available for exploitation. In contrast, we design a unified pre-training framework that is flexible enough to exploit both abundant, labeled and unlabeled data located at distinct medical centres. Given cardiac MRI data from a previously unseen MR sequence (LGE) and located at a previously unseen centre, our goal is to segment a previously unseen cardiac chamber (LA Endo) quickly (with few training epochs) [evidence in Fig. 3] and efficiently (with few data points) [evidence in Fig. 2]. Such a system has broad applicability and allows researchers and medical practitioners to exploit any and all datasets at their disposal (labeled, unlabeled, and regardless of size and MR sequence). It also adds value to settings with limited computational resources and data (particularly that which is currently inaccessible in silos and thus not exploited). This, in turn, can contribute to improving patient outcomes.

Statistical significance of results [R2, R3]. For each fraction, F, and metric (Dice and Hausdorff), we identify the top-performing method and conduct an exhaustive set of paired t-tests to substantiate our findings (all findings here: https://tinyurl.com/S4MLCode). As a concrete example, we find that 94% of the 18 paired tests (Dice metric at F=0.25 and F=0.5) conducted between SSML -> SML Leap and the remaining methods exhibit p < 0.01. This suggests that our framework of exploiting both labeled data (to learn parameters relevant to segmentation) and unlabeled data (to learn rich and generalizable representations) during meta-training is beneficial. We will modify Fig. 2 to reflect these findings. [R4] We also have tables to reflect the raw values underlying Fig. 2 (please visit https://tinyurl.com/S4MLCode).
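For readers unfamiliar with the test named above, a paired t-test compares two methods on the same cases by testing whether the mean of the per-case differences is zero. The sketch below computes only the t statistic and degrees of freedom using the standard library; the Dice values are hypothetical and are not results from the paper. A p-value could then be obtained from a t-distribution, for example with a library routine such as SciPy's `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees_of_freedom) for a paired t-test on matched samples."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical per-case Dice scores for two methods on the same test cases.
method_a = [0.86, 0.84, 0.88, 0.81, 0.85]
method_b = [0.82, 0.83, 0.84, 0.80, 0.81]
t_stat, dof = paired_t_statistic(method_a, method_b)
print(t_stat, dof)
```

Because the test is paired, each case acts as its own control, which is appropriate when all methods are evaluated on the same held-out scans.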

Implementation details [R4]. To reflect the most challenging, yet realistic, setting of “transfer across medical centres, cardiac chambers, and MR sequences”, we chose to meta-train on the Cardiac M&M dataset and meta-test on the Atrial Segmentation dataset. We had also conducted internal experiments on the relatively less challenging setting of transfer across only medical centres (using the four sites in the Cardiac M&M dataset), obtaining results consistent with those shown in the manuscript. [R2] We will also replace Fig. 3 in the manuscript with Table 3 in Appendix A to facilitate reproducibility.

Reproducibility [R4]. We have shared an anonymized version of our code, raw results, and network parameters here: https://tinyurl.com/S4MLCode.

Clarity [R2, R3, R4]. We have improved the description of the methods, our use of mathematical symbols, and Fig. 1, to improve clarity. We will also refer to SSML + SML as “Vanilla S4ML” and SSML -> SML as “Sequential S4ML” to avoid confusion.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The work aims to tackle a challenging task, the weakly supervised segmentation of LA MRI images, via a meta-learning framework. The method seems sound, and the results are OK. In the rebuttal, the authors have answered the reviewers’ concerns well, such as the lack of statistical tests, the motivation for the problem setting, and the introduction of self-supervised learning into their framework.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

8

Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper addresses some key challenges in the MICCAI community. The reviews have highlighted the need to strengthen the motivation for the problem setting and to add statistical tests for the results. I found that the authors’ rebuttal responded reasonably to the concerns raised by the reviewers, and hence I recommend acceptance of this paper.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

7

Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

While the application scenario of this work is interesting, I align with the reviewers’ concerns related to the proposed methodology and raise an important question about the empirical validation. The authors’ answer concerning the contribution (R4) is not convincing. While the use of unlabeled data under a distributional shift has not been explored, the empirical validation does not demonstrate that current semi-supervised techniques fail in this scenario. Furthermore, it is not true that current methods exploit b) only unlabeled or c) only labeled data, as there exist many semi-supervised segmentation methods that leverage both, sometimes resorting to self-training. On the technical side, this work basically borrows existing methods from the literature, some of them questionable (for example, Reptile is not the best meta-learning strategy, especially under domain shift), which makes the methodological contribution marginal.
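As background on Reptile, the meta-learning strategy named in this meta-review, the sketch below shows its first-order update on a toy one-dimensional problem: adapt the shared parameter to one task with a few gradient steps, then move the shared parameter a fraction of the way toward the adapted value. The quadratic tasks, learning rates, and the names `theta` and `theta_t` are illustrative assumptions, not the paper's configuration or the authors' implementation.

```python
def inner_sgd(theta, target, lr=0.1, steps=5):
    """Adapt theta to one task with loss 0.5 * (theta - target) ** 2."""
    for _ in range(steps):
        grad = theta - target  # gradient of the toy quadratic loss
        theta = theta - lr * grad
    return theta


def reptile(theta, task_targets, meta_lr=0.5, epochs=50):
    """Reptile outer loop: move theta toward each task-adapted theta_t."""
    for _ in range(epochs):
        for target in task_targets:
            theta_t = inner_sgd(theta, target)
            theta = theta + meta_lr * (theta_t - theta)
    return theta


task_targets = [1.0, 2.0, 3.0]  # hypothetical toy tasks
theta = reptile(0.0, task_targets)
print(theta)
```

The update needs no second-order gradients, which keeps it simple. The learned initialisation settles between the task optima, so it is only a compromise across the meta-training tasks.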

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

10