
Authors

Nanqing Dong, Irina Voiculescu

Abstract

A label-efficient paradigm in computer vision is based on self-supervised contrastive pre-training on unlabeled data followed by fine-tuning with a small number of labels. Making practical use of a federated computing environment in the clinical domain and learning on medical images pose specific challenges. In this work, we propose FedMoCo, a robust federated contrastive learning (FCL) framework, which makes efficient use of decentralized unlabeled medical data. FedMoCo has two novel modules: metadata transfer, an inter-node statistical data augmentation module, and self-adaptive aggregation, an aggregation module based on representational similarity analysis. To the best of our knowledge, this is the first FCL work on medical images. Our experiments show that FedMoCo can consistently outperform FedAvg, a seminal federated learning framework, in extracting meaningful representations for downstream tasks. We further show that FedMoCo can substantially reduce the amount of labeled data required in a downstream task, such as COVID-19 detection, to achieve reasonable performance.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_36

SharedIt: https://rdcu.be/cyl4j

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors transfer the MoCo contrastive learning approach to a federated learning setting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • first application of MoCo in federated setting

    • novel way of aggregating models after federated training round. The key novelty is the way to calculate importance of each update. Tailored to contrastive learning.

    • statistical augmentation to account for nodes with different distributions in the data

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • missing explanations and references (e.g. oracle)
    • over-complication of simple aspects (e.g. fedavg)
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

datasets, models, methods, and hyperparameters are clearly stated → good reproducibility

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• reduce overcomplication in the paper’s descriptions

    • back up statements (e.g. why use ReLU, which cuts off all negative values, when sigmoid keeps values > 0)

    • the standard deviations of the results overlap, yet the authors claim that their method outperforms FedAvg → an evaluation of statistical significance should be done

    • federated learning is not a protocol but a collection of techniques; it is also not sufficiently privacy-preserving on its own, as it only gives control over data governance and ownership (see Kaissis et al. 2020)

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The overall approach is interesting for the miccai community. There is some novelty to the proposed method, but often this is marginal (statistical augmentation + aggregation). The description of methods lacks clarity and justifications.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper proposes a novel model aggregation technique based on contrastive learning, which the authors show performs better than federated averaging.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method could adequately address some of the inherent problems in the federated averaging approach, namely sensitivity to domain shift across collaborators and the time required for model convergence.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The authors describe a “warm-up” period, which in my opinion is not required if a good initial model is used. Additionally, they propose multiple epochs of local training, which can push the trained models’ gradients at local sites farther apart; bringing them to convergence may become harder as the dataset and the federation grow. This is a major concern in a real-world scenario, and appropriate experimentation is needed to address it in this paper.

    The authors seem to have also missed some very related literature:

    • https://doi.org/10.1038/s41598-020-69250-1
    • https://doi.org/10.1038/s41746-021-00431-6
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method and datasets being used are described in detail, but without the source code reproducibility is a concern.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Although I agree with the authors that federated averaging takes much longer for convergence, in practical terms, if appropriate data harmonization is done well across the participating nodes and a model trained on publicly available data (for the problem at hand) is used for initialization, model convergence is fairly quick.

    I urge the authors to perform the appropriate experiments to address the main weakness of the paper I mentioned above.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Additional experimentation is needed, especially by using a strong initial model as the common starting point for all participants.

  • What is the ranking of this paper in your review stack?

    5

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel FL scheme using contrastive learning, metadata transfer, and self-adaptive aggregation to utilize unlabeled distributed data and address the non-IID problem. The method is validated on chest X-ray (CXR) data from different sources.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Strengths: 1) How to utilize unlabeled data and address the non-iid distributed data are two important problems in FL. 2) The authors added a few new elements to FL and performed ablation studies in the experiment. 3) The writing is overall clear, and the methods are easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Main Weakness: 1) A few wrong statements: 1-1) “Second, FL assumes non-IID data distribution” — this is not a default assumption of FL; it can be a characteristic of, or a challenge in, FL. 1-2) “Note, FedAvg is robust against domain shift cross data nodes.” — this is incorrect; please see [1-3]. 2) Method: For the self-adaptive aggregation part, Eq. 7 assigns larger weights to local models with a smaller r_k (representational similarity). A smaller r_k indicates a larger “difference between the ranks of i-th elements of the lower triangular of RDM for f{\theta^{t-1}} and RDM for f{\theta^t_k}.” The authors then state that “smaller rk indicates there is a bigger update in the representations, i.e., node k has extracted more meaningful representations.” I question the claim that a small r_k means more meaningful representations. If I understand correctly, this seems opposite to the optimization philosophy in FL for convergence purposes, where one tries to bound the client shift (differences between the local and global weights or gradients) in each local update round [1-3]. 3) Experiment: 3-1) The authors address the important non-IID problem but only compare FedMoCo with FedAvg. Comparing with FL methods specifically designed for non-IID data (e.g. [1-2]) would make the comparison more convincing. 3-2) The ablation study results in Tables 3-4 show comparable performance (seemingly not significantly different) across the three variations of FedMoCo. I could not find a discussion of whether metadata transfer and self-adaptive aggregation are really helpful in the experiments.

    References: [1] Li, Tian, et al. “Federated optimization in heterogeneous networks.” MLSys (2020). [2] Karimireddy, Sai Praneeth, et al. “SCAFFOLD: Stochastic controlled averaging for federated learning.” ICML (2020). [3] Li, Xiang, et al. “On the convergence of fedavg on non-iid data.” ICLR (2020).
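For concreteness, the aggregation scheme under discussion can be sketched as follows. This is an illustrative reading only, not the authors’ code: it assumes r_k is the Spearman rank correlation between the condensed lower triangles of the global and local RDMs, and that weights grow as r_k shrinks (the function names and the `1 - r` weighting are hypothetical).

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm_lower_triangle(features):
    # Representational dissimilarity matrix (RDM): pairwise distances between
    # feature vectors; pdist returns the condensed lower triangle directly.
    return pdist(features, metric="correlation")

def aggregation_weights(global_feats, local_feats_per_node):
    # r_k: Spearman rank correlation between the global model's RDM and each
    # local model's RDM. Per the reviewer's reading of Eq. 7, a SMALLER r_k
    # (a bigger representational update) receives a LARGER aggregation weight.
    g = rdm_lower_triangle(global_feats)
    r = np.array([spearmanr(g, rdm_lower_triangle(f))[0]
                  for f in local_feats_per_node])
    w = 1.0 - r              # smaller similarity -> larger weight (illustrative)
    return w / w.sum()       # normalized aggregation coefficients
```

Under this reading, a node whose representations moved furthest from the previous global model dominates the average, which is exactly the tension with bounded-client-shift FL optimization that the reviewer points out.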

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Most of the implementation details, such as platform, network architectures, learning parameters are listed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In addition to the comments in “Main Weakness”, I have the following minor comments.

    1) “In contrast to previous works [26, 7, 3], we add a ReLU function between the last fully-connected layer and a L2-normalization layer, which will project all extracted features into a non-negative feature space.” What is the motivation of your modification?

    2) Many hyperparameters (e.g., \eta) are introduced but not well discussed. It would be helpful if the authors could provide some justification.

    3) Typos: “there are a CNN encoder”; “and \theta_k^0 is nominated by \theta_k^t” → do you mean “\theta_0^t is dominated by”? Otherwise I hope the authors can explain the statement further.

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall this paper provides an interesting idea and addresses the important problems in FL, which is also realistic in medical image analysis. However, I have concerns regarding the methods, experiments, and some statements in the submission. Please see the comments in “Main Weakness” and “Detailed Comments.”

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers agree that the combination of contrastive learning with an FL setting is interesting and that the suggested method may have some novelty, but there are major concerns. The key strengths of the submission are that it proposes a method for a relevant topic and an aggregation scheme for FL based on the importance of each local update. However, there are major concerns regarding the description of the method as well as the conducted experiments. I would like to invite the authors to address in particular the issues raised related to:

    • Statistical Significance of the improvements
    • Experiments with stronger initial model and comparison to FL that can tackle non-iid data
    • Clarification of several statements and descriptions
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7




Author Feedback

We would like to thank the meta-reviewer for the invitation and all reviewers for their constructive comments. We wish to clarify that our main contribution is not a new FL method; rather, we propose federated contrastive learning on medical data. We are not aware of any similar studies in the literature.

(1) Statistical significance of the improvements: We perform paired comparisons between the proposed method and the baseline methods, initialized with the same random seed. Both the original and new results suggest that, given the same initial model, the proposed method consistently outperforms the baseline methods. We run a paired-sample t-test with an upper-tailed alternative hypothesis, and the p-values are less than 0.01. We will clarify these improvements in the final version.
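Such a paired, upper-tailed test can be sketched in a few lines with SciPy. The accuracy numbers below are made up for illustration and are NOT taken from the paper; only the test design (pairs sharing the same seed and initial model) mirrors the rebuttal.

```python
from scipy.stats import ttest_rel

# Hypothetical per-seed downstream accuracies; each pair shares the same
# random seed and initial model, matching the paired design in the rebuttal.
fedmoco = [96.1, 95.8, 96.4, 95.9, 96.2]
fedavg  = [95.2, 95.0, 95.4, 95.1, 95.3]

# Upper-tailed paired t-test: H1 says the FedMoCo mean exceeds the FedAvg mean.
stat, p = ttest_rel(fedmoco, fedavg, alternative="greater")
print(f"t = {stat:.2f}, p = {p:.5f}")
```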

(2) Experiments with stronger initial model: We agree with R2 that a strong initial model could accelerate convergence. However, we wish to emphasize that our aim is to show that the proposed method can work without any pre-trained weights, in an unsupervised fashion [3,4,5,7]. This is one of our contributions. As R2 suggests, we have re-trained the model using weights pre-trained on ImageNet (via PyTorch). There is no performance gain in LCP. We think this is because of the domain shift between ImageNet and CXRs: ImageNet pre-trained weights make the model converge faster at the cost of a slight drop (2-3%) in performance. Since we work on unlabeled data, the benefit of initializing federated contrastive learning with models pre-trained on unrelated datasets (e.g. ImageNet vs medical data) remains an open question. We will include this discussion in the final version.

(3) Comparison with FL methods that tackle non-IID data: We wish to clarify that the optimization goals of unsupervised contrastive learning and supervised learning are not aligned, especially under FL. The papers suggested by R3 for FL on non-IID data are mainly designed and validated for supervised methods; none of them addresses contrastive learning specifically. They therefore do a similar job to FedAvg but do not directly improve contrastive learning performance. By contrast, our method is designed to improve federated contrastive learning on medical data, which happens to be non-IID. To ease this concern, we have compared FedMoCo with FedProx (Federated optimization in heterogeneous networks) in one of our scenarios (Table 3). FedProx (95.17) achieves results similar to FedAvg (95.25) but lower on average than FedMoCo (96.02). Moreover, for the downstream task in Table 5, we have FedProx (86.98), FedAvg (87.24), and FedMoCo (91.56). We conjecture that the proximal term of FedProx does not help in contrastive learning. Further discussion of supervised FL can fit in the final version. Also, we wish to highlight that FedAvg does not make any assumptions about the learning paradigm, which makes it an ideal robust baseline.
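The FedProx proximal term mentioned above is small enough to sketch. This is a generic illustration of the technique from Li et al. (MLSys 2020), not the rebuttal’s actual implementation; the function name, the list-of-arrays weight representation, and mu = 0.01 are all hypothetical.

```python
import numpy as np

def fedprox_local_loss(task_loss, local_weights, global_weights, mu=0.01):
    # FedProx augments each client's local objective with a proximal term
    # (mu / 2) * ||w - w_global||^2, penalizing local weights that drift
    # away from the global model during multiple local epochs.
    prox = sum(np.sum((w - g) ** 2)
               for w, g in zip(local_weights, global_weights))
    return task_loss + 0.5 * mu * prox
```

Since the term only bounds weight drift and says nothing about the contrastive objective itself, the rebuttal’s conjecture that it does not directly help contrastive learning is at least plausible on its face.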

(4) Clarification of several statements and descriptions: First, regarding R3’s question “2) Method”, we aim to increase the learning difficulty of contrastive learning so as to learn better representations [4,7,15,23]. Second, we use ReLU because the Box-Cox power transformation (Eq. 3) requires non-negative values. Third, the Oracle is MoCo trained on a single node. These minor issues will be easy to address in the final version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes contrastive FL for the medical domain, and the overall feedback is quite positive. In the rebuttal, the authors addressed the raised concerns convincingly. In particular, they performed a significance test and an additional experiment with a pretrained model (as requested by R2), and will clarify the statements.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes to apply contrastive learning with an FL scheme on medical images. It effectively utilizes unlabeled data and achieves good performance. The idea is somewhat interesting, but the implementation (data augmentation + module aggregation) seems relatively incremental. The authors have roughly addressed the comments in the rebuttal.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    11



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main criticisms of Reviewers and Meta-reviewers mainly concern the apparently marginal improvement obtained with the proposed framework, as well as the lack of a comparison with respect to FL paradigms more robust to the non-IID scenario. There is also a general lack of clarity and some wrong statements made in the manuscript.

    The rebuttal reports the statistical quantification of the improvements, which appear to be significant at the level p<0.01 (upper-tailed alternative hypothesis, no correction for multiple comparisons). No improvement was reported when adopting pre-trained models on the client side, while FedProx was tested to include a benchmark robust to non-IID clients (Table 4, although Table 3 is mentioned in the rebuttal). Overall, the results seem to indicate a positive contribution of the proposed approach.

    Although the experimental improvements do not appear striking, the reported results seem encouraging and address a relevant problem in the application of FL to medical imaging. The methodological contribution was also found interesting and sufficiently motivated. The rebuttal properly addressed some of the major remarks. On a side note, the authors are encouraged to fix and clarify several of the statements made throughout the paper (e.g. the privacy-preserving properties of FL, FedAvg and the non-IID setting).

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9


