
Authors

Quande Liu, Hongzheng Yang, Qi Dou, Pheng-Ann Heng

Abstract

Federated learning (FL) has emerged as an increasingly popular way for distributed medical institutions to collaboratively train deep networks. However, existing FL algorithms only allow the supervised training setting, while realistically, most hospitals cannot afford the intricate data labeling due to a lack of budget or expertise. This paper studies a new yet practical FL problem, named Federated Semi-supervised Learning (FSSL), which aims to learn a federated model by jointly utilizing the data from both labeled and unlabeled clients (i.e., hospitals). We present a novel approach for this problem, which improves over the traditional consistency regularization mechanism with a new inter-client relation matching scheme. The proposed learning scheme explicitly connects the learning across labeled and unlabeled clients by aligning their extracted disease relationships, thereby mitigating the deficiency of task knowledge at unlabeled clients and promoting discriminative information from unlabeled samples. We validate our method on two large-scale medical image classification datasets. The effectiveness of our method is demonstrated by clear improvements over state-of-the-art methods, as well as a thorough ablation analysis, on both tasks.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_31

SharedIt: https://rdcu.be/cyl4e

Link to the code repository

https://github.com/liuquande/FedIRM

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper explores a novel method to train a model using a semi-supervised mechanism, which could tremendously reduce the annotation cost for clinical sites participating in a federation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The method is a novel mechanism that leverages the unlabelled datasets present at clinical sites (which can be expensive and time-consuming to annotate with acceptable accuracy) to substantially increase the accuracy of a trained model.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The weaknesses are mostly English issues. Suggested grammar changes:

    • Abstract:
      • “…most hospitals in realistic…” > “…realistically, most hospitals…”
      • “state-of-the-arts” > “state-of-the-art methods”
    • Introduction:
      • “…which allows to learn…” > “…which allows a model to learn…”
      • “widely-existing” > “common”
      • “…which halts in an…” > “…which has an…”
      • “…the relationships exist naturally among…” > “…natural relationships exist among…”
      • “high-related” > “highly related”

    Furthermore, it seems that the authors have skipped some related literature in the domain:

    • https://doi.org/10.1038/s41746-021-00431-6
    • https://doi.org/10.1038/s41598-020-69250-1

    The authors should define “FedIRM” prior to its first usage.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method and the datasets being used are described in detail, but without the source code, reproducibility is a concern in this domain.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    It would be highly recommended to also share the associated code.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    If the paper did not have grammatical inconsistencies and also provided the source code, I would have selected “strong accept”.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The paper proposes an approach for semi-supervised federated learning to take advantage of unlabeled data on some clients.

    It computes the average probability distribution per class from clients with labeled data, and uses it to regularize the predictions on unlabeled clients (using a symmetric KL divergence).
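
    For concreteness, a minimal sketch of the regularizer as described in this review (hypothetical PyTorch code, not the authors' implementation; the uniform-row fallback for classes missing from a batch is an assumption):

```python
import torch
import torch.nn.functional as F

def relation_matrix(logits, labels, num_classes):
    """Class-relation matrix M: row c is the average softmax prediction
    over the samples whose (pseudo-)label is c. Sketch only."""
    probs = F.softmax(logits, dim=1)
    rows = []
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            rows.append(probs[mask].mean(dim=0))
        else:
            # Class absent from this batch: fall back to a uniform row
            # (an assumption; the paper may handle this differently).
            rows.append(torch.full_like(probs[0], 1.0 / num_classes))
    return torch.stack(rows)

def symmetric_kl(m_labeled, m_unlabeled, eps=1e-8):
    """Symmetric KL divergence between the relation matrix received from
    the labeled clients and the one estimated on an unlabeled client."""
    p = m_labeled.clamp_min(eps)
    q = m_unlabeled.clamp_min(eps)
    kl_pq = (p * (p / q).log()).sum(dim=1)
    kl_qp = (q * (q / p).log()).sum(dim=1)
    return (kl_pq + kl_qp).mean()
```

    On an unlabeled client, the labels fed to `relation_matrix` would have to come from the model's own predictions, which is where the uncertainty-based sample selection described in Review #3 enters.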

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Evaluation was performed on real data from two popular datasets (the ICH detection dataset and ISIC 2018). Comparisons were made to recent approaches. The reported results claim superior performance.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Lack of novelty, analysis, and careful evaluation. The method first estimates the average probability per class from a minibatch of labeled data. It seems that such an estimate will be different at each iteration (depending on the minibatch) and dependent on the minibatch size (an analysis would be nice). Furthermore, if the network is perfectly trained, the “relation matrix M” becomes almost the identity (with 1 indicating the correct class and zeros otherwise). In this case, what is the justification for estimating M, instead of just using the identity for M directly?

    If M is the identity, then this semi-supervised approach becomes very similar to a conventional pseudo-labeling approach with a cross-entropy loss. It would be nice to see the performance of the method when M is the identity.

    Finally, the main novelty (that I can see) is to estimate M on the fly from labeled minibatches (instead of using the identity), which by itself is a small contribution. Please correct me if my understanding is wrong.

    – The use of real data is appreciated, but it’s difficult to judge the performance, since custom data splits were used. Perhaps consider adding a toy example on CIFAR/MNIST data (which many other FL methods use).

    – Finally, the absolute performance improvement seems very minor, 2-3% compared to a simple FedAvg algorithm (on 2 clients). From Tables 1 and 2, it looks like adding 8 additional clients (with more unlabeled data) to your method only improves performance by a couple of percent.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper itself is clear and the main method can be reproduced. The data splits and setup are not very clear. The parameters used for the other (comparison) approaches are also not very clear.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    see above

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    lack of novelty

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper proposes a semi-supervised federated learning scheme for image classification. There are labeled clients and unlabeled clients in the scheme. The labeled clients are trained in the standard way. The unlabeled clients are trained with a loss that enforces prediction consistency among different perturbations of each sample. To further facilitate the learning at the unlabeled clients, a novel disease relation modeling loss based on KL divergence is proposed. In each federated learning round, the mean feature vector (MFV) of each class is computed at the labeled clients and shared with the unlabeled clients. The KL divergence is computed between the MFV computed locally at the unlabeled client and the MFV from the labeled clients. The MFV at the unlabeled client is computed on high-quality samples selected by an uncertainty-based sample selection procedure. The experiments are conducted on two public datasets. The results seem promising.
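
    A short sketch of the per-class mean feature vector (MFV) computation summarized above (hypothetical code; the `extract_features` hook and the sum/count bookkeeping are assumptions, not the authors' implementation):

```python
import torch

@torch.no_grad()
def class_mean_features(model, loader, num_classes, feat_dim, device="cpu"):
    """Per-class mean feature vectors over a client's local data (sketch).
    Sums and counts are kept separately so a server could, in principle,
    also aggregate the statistics across several labeled clients."""
    sums = torch.zeros(num_classes, feat_dim, device=device)
    counts = torch.zeros(num_classes, device=device)
    for images, labels in loader:
        labels = labels.to(device)
        feats = model.extract_features(images.to(device))  # assumed feature hook
        sums.index_add_(0, labels, feats)
        counts += torch.bincount(labels, minlength=num_classes)
    return sums / counts.clamp_min(1).unsqueeze(1)  # (num_classes, feat_dim)
```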

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This may be the first semi-supervised federated learning method for medical image classification.
    2. The presentation and organization of the paper are clear. Necessary details are included.
    3. In the reviewer’s opinion, semi-supervised learning is an essential part of federated learning. The proposed method enables unlabeled clients, who do not have any labeled data, to participate in the model training. This could push federated learning to another level of scalability.
    4. Technically, enforcing consistency via the mean feature vectors from the labeled clients adds robustness to the original semi-supervised learning framework, because it takes into account the relations among different unlabeled and labeled samples, whereas standard semi-supervised learning based on consistency against perturbations does not consider such connections.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The uncertainty-based training sample selection at the unlabeled client can only choose the easy samples for training. Could this result in an undertrained model that cannot handle hard cases? Is the uncertainty threshold h fixed throughout the training, or is it adjusted? How is this threshold selected?
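
    For concreteness, a hedged sketch of one common entropy-based filter of the kind this question concerns (the fixed threshold h and the entropy criterion are assumptions; the paper's exact rule and schedule may differ):

```python
import torch

def select_confident(probs, h):
    """Keep unlabeled samples whose predictive entropy is below h (sketch).
    probs: (batch, num_classes) softmax outputs; returns a boolean mask."""
    entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1)
    return entropy < h
```

    One common remedy for the undertraining concern raised here is to anneal h from strict to loose over training so that harder samples are gradually admitted; whether and how the paper does this is exactly what the question asks.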
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although the details of the training are presented in the paper, it may still be difficult to reproduce because the authors did not confirm whether or not they will release the code. The datasets used in the paper are public datasets.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. The uncertainty-based training sample selection at the unlabeled client can only choose the easy samples for training. Could this result in an undertrained model that cannot handle hard cases? Is the uncertainty threshold h fixed throughout the training, or is it adjusted? How is this threshold selected?
  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. This may be the first semi-supervised federated learning method for medical image classification.
    2. The presentation and organization of the paper are clear. Necessary details are included.
    3. In the reviewer’s opinion, semi-supervised learning is an essential part of federated learning. The proposed method enables unlabeled clients, who do not have any labeled data, to participate in the model training. This could push federated learning to another level of scalability.
    4. Technically, enforcing consistency via the mean feature vectors from the labeled clients adds robustness to the original semi-supervised learning framework, because it takes into account the relations among different unlabeled and labeled samples, whereas standard semi-supervised learning based on consistency against perturbations does not consider such connections.
  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Overall the reviews are quite positive, and they all agree that a semi-supervised FL approach for the medical domain would have a major impact, bringing together annotated and unannotated data sets. In particular, the strength of this paper is not only the importance of the addressed problem for medical federated learning, but also the reported performance and the overall clarity of the paper. However, this paper is not (!) the first one suggesting semi-supervised learning for medical FL, and the reviews differ in terms of their evaluation of the novelty as well as of the improvement over existing methods. I would like the authors to address these concerns in a rebuttal, in particular the relation of the proposed approach to pseudo-labeling with a cross-entropy loss.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

We thank the AC and reviewers for their valuable time. We are delighted to see that R1 and R4 are highly supportive of our paper, highlighting the “essential problem we studied” and the “novelty of our method”, which can “push federated learning to another level of scalability”, with the “promising performance” on “two popular medical image classification tasks”.

R2 also highlights our superiority over recent methods on the two tasks, with a remaining concern about the difference between our method and pseudo-labeling, and several minor questions regarding the performance improvements and evaluation. This rebuttal carefully addresses these comments point by point and supports the responses with experiments. Code will be released.

[To reviewer #2]

  1. Our method vs. pseudo-labeling (identity matrix). i) Pseudo-labeling on unlabeled data usually requires the close assistance of knowledge from labeled data to ensure the correctness of the generated pseudo labels. However, such assistance is lost in FSSL, where a local dataset can be completely unlabeled. This largely decreases the quality of pseudo labels in FSSL, and hence pseudo-labeling cannot effectively exploit the unlabeled samples. ii) In contrast, our method differs in its core idea of explicitly utilizing the knowledge from labeled clients to assist the learning at unlabeled clients. During the training process before model convergence (i.e., before the model is perfectly trained), the relation matrix from labeled clients is in fact not the identity (cf. Fig. 2(a)) and reflects crucial relation information among classes (see the toy illustration after this list). Learning from this dynamically estimated matrix can effectively mitigate the loss of task knowledge at unlabeled clients with the information from labeled clients, thus facilitating the learning from unlabeled samples under the data decentralization constraint. iii) We also experimented on the ICH task using the identity matrix, observing that our method (87.56%) outperformed it (85.24%) by 2.32% AUC.

  2. Clarification on novelty. The main challenge in FSSL lies in the isolation of labeled and unlabeled clients, which leads to a lack of task knowledge at unlabeled clients, thus increasing the difficulty of exploiting unlabeled samples. Our method explicitly addresses this by utilizing the knowledge of labeled clients to assist the learning at unlabeled clients through aligning their class relationships. This idea has not been explored before, yet it addresses a previously unsolved key problem in FSSL (novelty endorsed by R1 and R4).

  3. Improvements over FedAvg. i) We kindly clarify that our improvements, i.e., 4.16% and 1.81% AUC on the two tasks (2.98% on average), are comparable with the literature, e.g., 2.85% in [30] with 80% unlabeled data, and 3.28% in [5] with 86% (more) unlabeled data. ii) We also performed a paired t-test for significance analysis (see the sketch after this list). The p-values of 0.0022 and 0.0073 (<0.05) demonstrate our significant improvements over FedAvg on the two tasks.

  4. Analysis of batch size. The mean feature vectors at labeled and unlabeled clients are estimated over the whole local dataset and over each minibatch, respectively. We experimented with the effect of batch size, observing that our method is insensitive to changes in it. For instance, increasing it from 12 to 48 in steps of 12 incurs a maximum variation of just 0.4% AUC. We consider that aligning the relation matrix across clients is crucial, yet the effect of batch size on this process is less critical than one might imagine.

  5. MNIST experiments. We experimented on MNIST, observing that our method (99.21%) showed clear improvements in accuracy over FedAvg (97.34%) under 80% unlabeled clients, and is comparable to the supervised upper bound (99.68%).
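
To make point 1.ii concrete, here is a toy illustration (hypothetical numbers, reusing the relation-matrix idea sketched under Review #2) of why the relation matrix is far from the identity before convergence:

```python
import torch
import torch.nn.functional as F

# Toy example: mid-training logits are not fully confident, so each
# class-relation row keeps inter-class similarity information that hard
# pseudo-labels would discard. Numbers are made up for illustration.
logits = torch.tensor([[2.0, 1.5, 0.2],   # a class-0 sample, class 1 close behind
                       [0.1, 2.2, 1.8]])  # a class-1 sample, confused with class 2
labels = torch.tensor([0, 1])
probs = F.softmax(logits, dim=1)
for c in range(2):
    print(c, probs[labels == c].mean(dim=0))
# Row 0 is roughly [0.56, 0.34, 0.09], not the one-hot identity row [1, 0, 0].
```

And a sketch of the paired t-test mentioned in point 3 (the per-run AUC arrays below are placeholders, not the paper's numbers):

```python
from scipy import stats

# Hypothetical per-run AUCs for FedAvg and for the proposed method.
fedavg_auc = [83.1, 83.6, 82.9, 83.4, 83.2]
ours_auc = [87.3, 87.8, 87.1, 87.9, 87.7]

t_stat, p_value = stats.ttest_rel(ours_auc, fedavg_auc)
print(f"paired t-test: t={t_stat:.3f}, p={p_value:.4f}")  # p < 0.05 => significant
```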

[To AC] We kindly clarify that we intended to express that this work is one of the pioneering works studying FSSL for medical image analysis. We agree with the AC that our work is not the first one suggesting this, and in fact we also discussed the related study [29] in our submission. We shall remove the related claim, which may cause misunderstanding, from our paper.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes a novel approach for semi-supervised FL for the medical domain. Overall the reviews are very positive, and the authors addressed the minor issues that were raised by the reviewers. In particular, they clarified the relation to pseudo-labeling in detail and will tone down the claim regarding novelty. Considering the original submission, the reviews, and the rebuttal, this is overall quite a strong paper!

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers are positive about the design of the proposed method; the major issues raised include language and writing, novelty, and method details. The rebuttal provides clarifications to most of the questions raised by R2. In my opinion, although this may not be “the first semi-supervised federated learning for medical image classification”, this work still provides a viable solution for clients with different levels of annotation, and thus it makes a decent contribution. The performance gain, although not very large, is still acceptable for such a task, especially considering the unlabeled portion. I would say that with this rebuttal the authors successfully addressed some important questions, and thus I suggest acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The main contribution of the study, semi-supervised learning in federated learning for training on labeled and unlabelled clients, was overall judged positively by the reviewers. The experimental results were also found satisfactory by most reviewers for the proposed application. In this sense, the criticism towards the experimental validation concerned the lack of thorough evaluation: the current results are based on custom dataset splits to define the clients' data, and the improvement with respect to FedAvg was found marginal for the proposed application with very few clients. The main recommendation of the meta-reviewer was to better position the study with respect to alternative SSL approaches based on pseudo-labelling with a cross-entropy loss. The rebuttal emphasises the following points:

    • Standard pseudo-labeling may not work in federated applications, where entire local datasets can be unlabelled. The proposed method explicitly uses the knowledge from the labeled clients.
    • The relation matrix is not necessarily the identity, especially during the initial iterations, and therefore its estimation is justified (experimental results are provided).
    • The improvement over FedAvg is statistically significant for both tasks.
    • The framework is robust to the batch size.
    • An additional experiment on MNIST still shows an improvement over FedAvg (80% unlabelled clients).

    As AC, overall I agree on the novelty of the framework and the relevance of the problem addressed here. It is still true that the study should be better positioned with respect to the state of the art. If pseudo-labeling is not effective in the proposed application, it seems necessary to demonstrate this point by including this benchmark in the validation. Moreover, the positioning with respect to the state of the art should be rediscussed in the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10


