
Authors

Xianjing Liu, Bo Li, Esther Bron, Wiro Niessen, Eppo Wolvius, Gennady Roshchupkin

Abstract

Confounding bias is a crucial problem when applying machine learning in practice, especially clinical practice. We consider the problem of learning representations independent of multiple biases. In the literature, this is mostly solved by purging the bias information from the learned representations. We expect, however, that this strategy harms the diversity of information in the representation, and thus limits its prospective usage (e.g., interpretation). We therefore propose to mitigate the bias while keeping almost all information in the latent representations, which also enables us to observe and interpret them. To achieve this, we project latent features onto a learned vector direction and enforce independence between the biases and the projected features, rather than all learned features. To interpret the mapping between projected features and the input data, we propose projection-wise disentangling: sampling and reconstruction along the learned vector direction. The proposed method was evaluated on the analysis of 3D facial shape and patient characteristics (N=5011). Experiments showed that this conceptually simple method achieves state-of-the-art fair prediction performance and interpretability, demonstrating its potential for clinical applications.
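The projection step described in the abstract can be sketched as follows. This is a minimal illustration under assumed names (`z` for the latent codes, `w` for the learned direction, `t` for the target, `s` for the bias variables), not the authors' implementation:

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)

def projection_losses(z, w, t, s):
    """Project latent codes z (n x d) onto direction w (d,), then score the
    projection against the target t (n,) and each bias column of s (n x k).
    The target loss rewards high |corr| with t; the bias loss penalises any
    correlation with each bias, enforcing independence of the projection."""
    p = z @ (w / (np.linalg.norm(w) + 1e-8))   # projected features, shape (n,)
    target_loss = 1.0 - abs(pearson(p, t))      # encourage |Corr(p, t)| -> 1
    bias_loss = sum(abs(pearson(p, s[:, j])) for j in range(s.shape[1]))
    return target_loss, bias_loss               # encourage Corr(p, s_j) -> 0
```

Only the projected features, not the full latent code, are constrained to be independent of the biases, which is how the method keeps the rest of the latent information intact.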

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_78

SharedIt: https://rdcu.be/cyl6V

Link to the code repository

https://github.com/tsingmessage/projection_wise_disentangling_FRL

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    A representation learning method that keeps the information relevant to the task, while forgetting confounding biases.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Nothing to highlight.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The proposed method appears to be a specific case of adversarial training in which the bias is predicted from the representation by a very simple, correlation-based ‘classifier’. From this perspective, I do not see much novelty in the work. The correlation-based metrics chosen for the experimental evaluation artificially favour the proposed method.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Likely reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The proposed method appears to be a specific case of adversarial training in which the bias is predicted from the representation by a very simple, correlation-based ‘classifier’. From this perspective, I do not see much novelty in the work. The correlation-based metrics chosen for the experimental evaluation artificially favour the proposed method.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method appears to be a specific case of adversarial training in which the bias is predicted from the representation by a very simple, correlation-based ‘classifier’. From this perspective, I do not see much novelty in the work. The correlation-based metrics chosen for the experimental evaluation artificially favour the proposed method.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper proposes a representation learning framework that determines a direction capturing maximum task-relevant information and minimal bias-related information. This is achieved by optimizing correlations with respect to the target and bias variables. The method is evaluated on a large facial shape dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The correlation-based losses are effective.
    2. The applicability to multiple bias variables is a plus.
    3. The two datasets are large (thousands of training samples).
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Accuracy in the second experiment is low (AUC=0.587).
    2. Some experimental details are missing.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility of the baseline results solely from the paper is unlikely because details of the baseline methods are not described in detail.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. Some experimental details are missing. How are the correlation metrics measured for the baseline methods? Those methods do not estimate projections, so what is the correlation measured with respect to? The note in Table 1, “X refers to the data characteristic”, is really confusing.

    2. I want to challenge the motivation that existing methods optimizing the global constraint MI(Z,s)->0 would reduce the diversity of information in Z. This claim does not seem rigorous, because only information related to s is removed from Z (all other diversity can be preserved).

    3. Can the authors provide citations for the association between facial shape and height (the two seem unrelated)? The clinical motivation for predicting height from facial shape is not clear.

    4. Shouldn’t it be BR-Net instead of BP-Net in [7]?

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The major concern is the low AUC in the second experiment, but other than that I found no major flaw in the technical proposal.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The paper presents a methodology to reduce confounding biases when predicting a given outcome. The paper uses a projection-wise disentanglement strategy over an autoencoder. The approach is tested in two settings: predicting an attribute (BMI, gender, etc.) and a clinical setting, specifically predicting maternal alcohol consumption during pregnancy.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The proposed method is technically sound and shows higher accuracy than other methods while controlling the correlation values of the designated biases.
    • The methodology of the approach is correctly described and easy to follow.
    • The interpretation of the results is clearly explained.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The motivation of the work is not clearly stated. The general idea of reducing confounding biases in outcome prediction is clear, but the impact of this solution on facial phenotypic data and population studies is not. What are common fairness issues in these types of studies? How do fairer and more interpretable representations help the current state of these studies?
    • The experimental setup needs to be better detailed. The implementation details contain hyperparameter definitions, but most of the subsection is devoted to other state-of-the-art models.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I think the reproducibility of the paper can be improved with a better structured, more detailed experimental setup. Please see my comments under the comment section.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major comments:

    • Fairness is a contextualized term [0]; the definition depends on the context/application of what is fair in a given setting. Clearly stating the definition of an unfair and/or fair representation for this paper would improve the problem statement. See [1] for examples regarding fairness in disentanglement.
    • It is not clear to me what the unfair outcomes are for the two proposed experiments. Is there overestimation by the models for a given BMI range or gender?
    • In the evaluation section, beyond showing a decrease in the correlation factor while maintaining high accuracy, I recommend adding other well-known fairness metrics in future work to assess how the method improves the fairness of the predicted outcome; see [0, 1] for examples.
    • Subsection 3.1, Datasets and Task, needs to be better structured. Please list the current unfair outcomes in each setup. Is there underestimation or overestimation for a given ethnicity or gender? A table with the dataset descriptions and experiment assumptions would help to understand the setup better.
    • A related-work section detailing the current state of solutions (taking content from Section 3.2, Implementation details, since these are comparison methods rather than implementation details) would improve the paper and better differentiate the novelty of the proposed approach.

    Minor comments:

    • Abstract: I would reword “to solve the bias problem” as “to mitigate problems related to bias.” Bias is not something we can completely solve; rather, we can mitigate it.
    • Subsections 4.1 and 4.2 could have more meaningful titles, e.g., “Phenotype prediction for gender, height, and BMI” and “Prediction of maternal alcohol consumption during pregnancy.”

    References:
    [0] Verma, S. and Rubin, J., 2018. Fairness definitions explained. In 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), pp. 1-7. IEEE.
    [1] Locatello, F., Abbati, G., Rainforth, T., Bauer, S., Schölkopf, B. and Bachem, O., 2019. On the fairness of disentangled representations. NeurIPS 2019.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is technically sound and shows higher accuracy than other methods while controlling the correlation values of the designated biases. Nonetheless, the manuscript is not clear on the fairness issues and their connection to the particular use case in 3D facial shape analysis. There is some clarity regarding the application in the results and interpretation sections, but this should be stated in the Introduction and discussed in the final section.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    3

  • Reviewer confidence

    Somewhat confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper received mixed reviews and the authors are therefore invited to submit a rebuttal. In particular, several aspects of the paper were challenged by the reviewers: experimental details are missing; the motivation of the work is not well presented; the association between facial shape and height, and its clinical relevance, is completely unclear; a proper definition of fairness in the context of this paper is lacking; and the dataset is not properly explained.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8




Author Feedback

We thank the reviewers for their reviews and the positive comments about our paper. They appreciate that the proposed framework 1) is technically sound, 2) uses an effective loss whose applicability to multiple biases is a plus compared with existing works, 3) achieves higher accuracy in two clinical applications on a large dataset while mitigating the effect of biases, and 4) is described clearly. In this rebuttal, we provide a point-wise response to the comments of all reviewers.

Novelty. R#1 Q4: the proposed method is a specific type of adversarial training with the bias predicted from representations, and thus lacks novelty. We respectfully disagree, because our method 1) does not predict the bias (Eq. 3), and 2) does not involve any adversarial training process. We would also like to highlight that our work is novel in proposing, for the first time, a correlation loss for vector searching in latent space, and in enabling feature interpretation for fair prediction.

Motivation of the application. R#4 Q4: the motivation of the application was not clear. For epidemiological studies, the bias (confounding) problem is crucial: it can create an association that is not true, or one that is true but misleading, leading to a wrong diagnosis or therapy strategy. The bias problem becomes significant for AI as it is used more and more for medical data analysis. With the proposed framework, we aim to mitigate bias effects in prediction using an auto-encoder. While we applied our framework to 3D facial images due to our clinical interest, the approach is quite general and can be used for other types of data. We have added this to the Introduction to strengthen the motivation. R#2 Q7: as the reviewer suggested, we include a reference on the association between height and facial shape and its clinical relevance [r0].

Definition of fairness. R#4 Q4&Q7 questioned the definition of fairness and what the unfair outcomes are in the context of this paper. In general (Introduction), “a fair representation means it contains no information on sensitive attributes (i.e., bias, 𝑠)”. In our application, a fair prediction means the prediction ^t is unbiased by s; fairness in disentanglement means disentangling facial features related to the target but not confounded by bias. Unfair results can be seen for the baseline ‘VAE-reg’ (Fig. 2-3, Tables 1-2). In the revised version, we replaced |Corr| by Corr, which also indicates a positive or negative relation (underestimation or overestimation, in R#4’s words).

Results and implementations. R#2 Q4: “the low accuracy (AUC=0.587) in experiment 2”. The number is not impressive because the effects of maternal alcohol consumption are small. However, it is already a large improvement over existing methods (Table 2).

R#2 Q4&7: the evaluation metric for the baselines was unclear. The metric is the Pearson correlation coefficient Corr(^t,s) (Section 3.3), which measures the dependency between the prediction ^t and the bias s. The metric was applied to all methods. R#1 Q4: “the metrics artificially favour our method”. Both the baselines and our method aim to guarantee that the prediction ^t is unbiased by s. Technically, the baselines aim to drive the dependency between Z and s to zero. Since ^t is derived from Z, ^t should also have no dependency on s. Thus the baselines also encourage the linear dependence Corr(^t, s) to be zero (Fig. A1), so the metric does not artificially favour our method.
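As a concrete illustration of this evaluation metric (a sketch with invented numbers, not the authors' code), the Pearson correlation Corr(^t, s) between predictions and a bias variable can be computed as:

```python
import numpy as np

# Hypothetical predictions t_hat and a binary bias variable s (values
# invented for illustration). A fair predictor should drive Corr(t_hat, s)
# towards zero for every bias variable.
t_hat = np.array([0.2, 0.8, 0.4, 0.9, 0.1])   # model predictions
s     = np.array([1.0, 0.0, 1.0, 0.0, 1.0])   # bias, e.g. encoded sex

corr = np.corrcoef(t_hat, s)[0, 1]   # Pearson correlation coefficient
print(f"Corr(t_hat, s) = {corr:.3f}")
```

Here the strong negative correlation would signal that the predictions are biased by s; both the baselines and the proposed method aim to push this value towards zero.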

R#2 & R#4: suggestions for more details on the implementation and dataset.
The implementation configurations were stated in Section 3.2. In the revised version, we include baseline details in the supplementary file and release the code of all methods. To better explain the dataset, we cite the profiles of the cohort study and add a table to the supplementary file to further describe the data characteristics.

[r0] Tobias M. et al. Cross-ethnic assessment of body weight and height on the basis of faces. Personality and Individual Differences, 2013.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal has addressed most of the concerns raised to some extent. The authors are encouraged to address the comments in the final paper, if the paper is ultimately accepted.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    R1 had two concerns about the work, which the authors seem to have rebutted (although I would need to read and digest the paper to be absolutely sure). The other reviewers tended towards acceptance, although they still had some concerns relating to how fairness is defined and when it is appropriate to try to address issues of fairness, to which the authors provided their answers. Although confounds were mentioned, nothing was said about selecting appropriate confounds according to causality models (see e.g. Simpson’s paradox).

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    One main concern is that the AUC is low (though > 50%). Another concern was the novelty of the method. However, the application is novel.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5


