
# Authors

Benoit Dufumier, Pietro Gori, Julie Victor, Antoine Grigis, Michele Wessa, Paolo Brambilla, Pauline Favre, Mircea Polosan, Colm McDonald, Camille Marie Piguet, Mary Phillips, Lisa Eyler, Edouard Duchesnay

# Abstract

Traditional supervised learning with deep neural networks requires a tremendous amount of labelled data to converge to a good solution. For 3D medical images, it is often impractical to build a large homogeneous annotated dataset for a specific pathology. Self-supervised methods offer a new way to learn a representation of the images in an unsupervised manner with a neural network. In particular, contrastive learning has shown great promise by (almost) matching the performance of fully-supervised CNNs on vision tasks. Nonetheless, this method does not take advantage of available meta-data, such as participant’s age, viewed as prior knowledge. Here, we propose to leverage continuous *proxy* meta-data, in the contrastive learning framework, by introducing a new loss called $y$-Aware InfoNCE loss. Specifically, we improve the positive sampling during pre-training by adding more positive examples whose *proxy* meta-data are similar to the anchor's, assuming they share similar discriminative semantic features. With our method, a 3D CNN model pre-trained on $10^4$ multi-site healthy brain MRI scans can extract relevant features for three classification tasks: schizophrenia, bipolar diagnosis and Alzheimer’s detection. When fine-tuned, it also outperforms 3D CNNs trained from scratch on these tasks, as well as state-of-the-art self-supervised methods. Our code is publicly available at https://github.com/Duplums/yAwareContrastiveLearning

# Link to paper

SharedIt: https://rdcu.be/cyl1w


# Reviews

### Review #1

• Please describe the contribution of the paper

The paper proposes a novel approach to contrastive learning for 3D MRI classification that takes metadata into account to improve positive sampling. Samples that are similar according to the metadata will be embedded closer in the embedding space, providing a means of controlling the embedding. Experiments illustrate that a model pre-trained on a large set of 3D T1 MRI brain scans achieves promising performance on three binary classification benchmark datasets when given age as meta-data.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The way of including metadata into the contrastive loss is novel and interesting and suits the presented application setting well. The paper is well written/organized, the problem well motivated, and achieves considerable improvements over the presented baselines. Extensive experiments are conducted to validate the benefit.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The method section should be expanded slightly. It is unclear whether minimising the proposed loss in (3) is still maximising a lower bound on the mutual information (conditioned on y).

A baseline is missing that combines the “standard” InfoNCE loss (SimCLR) with an additional L1 loss for age prediction (on top of the representation), to see whether a simple multi-task learning approach could achieve representations that are as useful or more useful. This is especially relevant since age prediction with data augmentation performs comparably to the proposed method.
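The multi-task baseline requested here can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' or the reviewer's implementation: the function names, the `age_weight` trade-off parameter, and the default temperature are all hypothetical, and the similarities are assumed to be precomputed scores between one anchor and its candidate views.

```python
import math


def info_nce(sims, positive_index, temperature=0.1):
    """Standard InfoNCE for one anchor: -log softmax at the positive index."""
    logits = [s / temperature for s in sims]
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -(logits[positive_index] - log_z)


def l1_age_loss(predicted_ages, true_ages):
    """Mean absolute error between predicted and true ages."""
    return sum(abs(p - t) for p, t in zip(predicted_ages, true_ages)) / len(true_ages)


def multitask_loss(sims, positive_index, predicted_ages, true_ages,
                   age_weight=1.0, temperature=0.1):
    """Hypothetical SimCLR + age baseline: InfoNCE plus a weighted L1 age term."""
    contrastive = info_nce(sims, positive_index, temperature)
    regression = l1_age_loss(predicted_ages, true_ages)
    return contrastive + age_weight * regression
```

With `age_weight=0.0` the sketch reduces to plain InfoNCE, which makes it easy to check how much the age term alone contributes.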

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

While code is not provided, the authors state that code will be provided upon acceptance. Overall, relevant implementation details are provided in sufficient detail in the manuscript.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

In addition to the comments in 4), I was wondering why all experiments in Table 1 are performed on 20% for the majority of datasets, while for the AD vs HC case, you consider 30%. Also, several of the references that you cite as arXiv papers have been published and the references should be updated accordingly. Note also, RBF stands for Radial basis function.

• Please state your overall opinion of the paper

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Despite the weaknesses outlined above, I believe that the use of metadata to control the embedding space is appealing and will be of interest for the MICCAI community, where contrastive learning approaches are applied extensively.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

4

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

The authors introduce a self-supervised learning procedure that takes advantage of the meta-data available in the dataset. Through this procedure they increase the number of positive examples by adding samples whose proxy meta-data (such as age) are similar to those of the anchor sample. The assumption is that these examples share similar semantic features. The authors propose a new loss, the y-Aware InfoNCE, which weights the InfoNCE log term with a mean RBF value between the anchor’s meta-data (age) and the positive and negative pairs considered in contrastive learning. Thereby the authors aim to solve the inherent issue in contrastive learning that images of the same class (but not the exact same image) are pushed apart like any other negative sample coming from a different class.
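The weighting described in this summary can be illustrated with a small sketch. It follows only the description given in this review, not the exact loss (3) of the paper: normalising the RBF weights so that they sum to one, and all function and parameter names, are assumptions made for illustration. Each candidate's log-softmax term is weighted by an RBF kernel on the age difference to the anchor.

```python
import math


def rbf_weight(y_a, y_b, sigma):
    """RBF (Gaussian) kernel on the metadata difference, e.g. age in years."""
    return math.exp(-((y_a - y_b) ** 2) / (2.0 * sigma ** 2))


def log_softmax(sims, temperature):
    """Numerically stable log-softmax over temperature-scaled similarities."""
    logits = [s / temperature for s in sims]
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return [l - log_z for l in logits]


def info_nce(sims, positive_index, temperature=0.1):
    """Standard InfoNCE with a single positive, kept for comparison."""
    return -log_softmax(sims, temperature)[positive_index]


def y_aware_info_nce(sims, anchor_y, ys, sigma, temperature=0.1):
    """Sketch of a metadata-weighted InfoNCE for one anchor.

    sims[k] is the similarity between the anchor and candidate k,
    ys[k] is the metadata of candidate k (assumed layout).
    """
    log_probs = log_softmax(sims, temperature)
    weights = [rbf_weight(anchor_y, y, sigma) for y in ys]
    total = sum(weights)
    # Weighted average of the per-candidate InfoNCE log terms.
    return -sum(w / total * lp for w, lp in zip(weights, log_probs))
```

In this sketch a very small `sigma` puts all the weight on candidates with the same metadata as the anchor, so the loss falls back to standard InfoNCE, while a very large `sigma` treats every candidate as an equally weighted positive.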

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The paper uses available meta-data in self-supervised learning. These meta-data are most of the time already available in medical imaging datasets. This paper proposes an interesting way to integrate prior knowledge in form of meta data.
• paper is easy to read
• good ablation studies
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

-The biggest weakness of the paper is lack of comparison with more contrastive learning (or non contrastive such as BYOL) methods. Over the past year many contrastive learning methods have been proposed and it seems the authors only make the comparison with SimCLR.


• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It is sufficiently good.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• Compare with more contrastive methods and BYOL like methods
• Standard deviations and uncertainty estimates are computed over cross-validation folds, whereas averages over random seeds might be more informative.
• The final results after fine-tuning have confidence intervals of +/- 3.7 points, which makes the statistical validation questionable. In addition, nothing can be concluded on the BD and AD tasks when comparing Contrastive Learning & age prediction against simple age prediction.
• Based on Table 1, it seems that age prediction alone is much more beneficial than SimCLR alone, and it's not clear whether using SimCLR with age prediction improves over age prediction.
• I think a missing baseline is to see whether an ensemble of a model fine-tuned on age prediction and SimCLR would perform worse than the proposed weighted contrastive learning.
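The seed-averaging suggestion in the comments above can be sketched as follows. This is only an illustration with hypothetical helper names; the mean ± one standard deviation overlap check is a crude heuristic assumed here for simplicity, not a statistical test and not the procedure used in the paper.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-seed scores."""
    mean = statistics.mean(scores)
    std = statistics.stdev(scores) if len(scores) > 1 else 0.0
    return mean, std


def intervals_overlap(scores_a, scores_b):
    """True if the mean +/- std intervals of two methods overlap."""
    mean_a, std_a = summarize_runs(scores_a)
    mean_b, std_b = summarize_runs(scores_b)
    return abs(mean_a - mean_b) <= std_a + std_b
```

Calling `summarize_runs` on the AUC values obtained from several random seeds of the same configuration gives the mean and spread to report; when `intervals_overlap` returns `True` for two methods, their difference should not be read as conclusive.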
• Please state your overall opinion of the paper

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Even though I think the experiments and baselines are not complete, I like the general idea and I think it's novel.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Very confident

### Review #3

• Please describe the contribution of the paper

This work presents a novel contrastive learning method for 3D MRI classification tasks. The authors explore a way to incorporate available continuous meta-data (such as patient’s age) into the existing contrastive learning framework by introducing a new loss function, termed y-Aware InfoNCE loss. The effectiveness of the proposed method is evaluated on multiple public MRI datasets (such as BHB, SCHIZCONNECT-VIP, BIOBD, BSNIP, ADNI-GO). The reviewer appreciates the novelty of this work while having concerns about the experimental results.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. Leveraging meta-data for contrastive learning in medical image tasks is an essential yet not fully explored topic. This work proposes a novel y-Aware loss to incorporate this information into the learning process, which has scientific merits for this community;
2. The methodology is well presented and easy to follow;
3. Ablation study for the hyper-parameter sigma is helpful to understand the behavior of the proposed method;
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. In Table 1, it seems that with DenseNet, the proposed Age-Aware Contrastive Learning doesn’t significantly outperform (or is even lower than) the simple Age Prediction pre-training on some categories. Can you provide more discussion and insight on this result?
2. Why don’t the authors report Age Prediction and Age Prediction w/ D.A for the UNet backbone?
3. The authors only validate the proposed method on MRI data. As it’s a general method for medical image tasks, more imaging modalities (such as X-rays) should also be evaluated;
• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The experimental settings are introduced in the paper. And the authors will release the code. So the reproducibility should be good.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. The authors should further improve the organization of the experiment part. For example, the numbering formats are not consistent. The subsection Evaluation of the representation uses the format of 1-, 2-…, while the subsection of Importance of σ and T in the positive sampling uses the format of i), ii)…;
2. To claim “semantically similar samples are “closer” than semantically different samples”, ablation study is necessary to compare the embedding distance of some exemplars;
3. The fonts in Fig 2 are too small to read, and the authors should make the fonts larger;
4. The authors should improve the writing of the experiment. Please ensure the experimental settings are well introduced. For example, what is the meaning of “D.A” in Table 1?
5. Please cite “Vu, Yen Nhi Truong, Richard Wang, Niranjan Balachandar, Can Liu, Andrew Y. Ng, and Pranav Rajpurkar. “MedAug: Contrastive learning leveraging patient metadata improves representations for chest X-ray interpretation.” arXiv preprint arXiv:2102.10663 (2021)” as a relevant work for incorporating meta-data for contrastive learning in medical image tasks;
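The check requested in item 2 above (comparing embedding distances of exemplars) could be sketched as below. This is a hypothetical illustration: it uses age proximity as a stand-in for semantic similarity, and the function names and the `max_gap` threshold are assumptions, not part of the paper.

```python
import math


def cosine_distance(u, v):
    """1 - cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def mean_distance_by_age_gap(embeddings, ages, max_gap):
    """Mean pairwise distance for pairs with similar vs dissimilar ages.

    Returns (mean distance of pairs with |age gap| <= max_gap,
             mean distance of all other pairs).
    """
    close, far = [], []
    for i in range(len(ages)):
        for j in range(i + 1, len(ages)):
            d = cosine_distance(embeddings[i], embeddings[j])
            if abs(ages[i] - ages[j]) <= max_gap:
                close.append(d)
            else:
                far.append(d)

    def mean(xs):
        return sum(xs) / len(xs) if xs else float("nan")

    return mean(close), mean(far)
```

If the first returned value is clearly smaller than the second on held-out scans, that would support the claim that samples with similar metadata are embedded closer together.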
• Please state your overall opinion of the paper

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

My initial rating is primarily based on the novelty of this work and the concerns about the experimental results.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

6

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

This submission proposes a novel y-Aware loss that leverages continuous proxy metadata in the contrastive learning framework to improve the performance of the self-supervised method for 3D MRI classification. All the reviewers have positive comments on the novelty of the proposed method and the writing of the manuscript. The reviewers have concerns about the experiment design, the comparative experiments and the experimental results. The authors are advised to address those questions in the rebuttal letter. Other questions should be addressed if space allows.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

5

# Author Feedback

We thank the Reviewers (R1, R2, R4) for their insightful comments which, we hope, will help us to improve the quality of the paper.

(R1: “A baseline that combines SimCLR with age prediction” + R2: “… it is not clear whether using simclr with age prediction makes an improvement.”) As suggested by R1 and R2, we have added a new multi-task approach combining SimCLR and age regression. We have updated Fig. 2 accordingly and added the following sentence: “… compared to the multi-task approach SimCLR+Age, the features extracted by our method are less sensitive to the site where images come from. This shows that our technique is the only one that efficiently uses the BHB dataset by making the features learnt during pre-training less correlated to the acquisition sites.” In the updated Fig. 2-b, we outperform SimCLR+Age by more than +5% AUC on AD vs HC and SCZ vs HC, and by more than +2.5% AUC on BIP vs HC.

(R2: “Compare with contrastive methods and BYOL”) As recommended by R2, we have included the results of MoCo (contrastive method) and BYOL (non-contrastive method). We have updated Fig. 2 accordingly. As with SimCLR, the results are significantly worse than those of the proposed method (-8% AUC for SCZ vs HC and AD vs HC in Fig. 2-b).

(R4: “Why the authors don’t report Age Prediction and Age Prediction w/ D.A for UNet backbone?”) We thank R4 for pointing this out. We have added Age Prediction with Data Augmentation to Table 1 for UNet as well. As with DenseNet, our pre-training remains better than or similar to age supervision (+1% AUC on AD vs HC and +0.5% AUC for SCZ vs HC).

(R1: “… minimising the proposed loss is still maximising a lower bound on the mutual information…”) We agree with R1 that this is an important question. We are actually working on it and it seems that the proposed loss is indeed maximising a lower bound on the conditional mutual information. We plan to address this point, giving more mathematical details, in a future journal extension.

(R1: “While code is not provided…” + R4: “And the authors will release the code.”) As written in the abstract and p.5, our code is already publicly available in an anonymous repository.

(R1: “why all experiments in Table 1 are performed on 20% …, while for the AD vs HC … 30%”) We originally used 20% and 30% for the two data-sets respectively since we wanted the same number of subjects in both cases (N=100). We clarified this point by writing the absolute number of training subjects used for fine-tuning.

(R1: “Several of the arxiv references… have been published” + R4: “Please cite … as a relevant work”) We thank the Reviewers for pointing this out. We have modified the paper accordingly.

(R2: “Final results after fine-tuning makes the statistical validation questionable” and R4: “it seems the proposed method doesn’t significantly outperform Age Prediction”) We agree that the proposed method is not statistically significantly better than age prediction when fine-tuned on the same site. However, the main contribution of the paper is methodological. We propose a technique to leverage continuous (and categorical) proxy metadata with SimCLR. We will investigate in the future whether the combination of other metadata (e.g. sex) can improve the performance of our representation.

(R4: “The authors only validate the proposed method on MRI data”) Here, we focused on 3 different neuroimaging applications using more than 10K MRI data. The extension to other modalities (e.g. X-ray) is left for future work.

(R4: “ the numbering formats are not consistent” + “The fonts in Fig 2 are too small to read” + “The authors should improve the writing of the experiment”) We have modified these points as suggested in the updated version.

(R4: “To claim “semantically similar samples are “closer” than semantically different samples”, ablation study is necessary”) We agree with R4. We added in the Supplementary a figure comparing a 2D UMAP representation of the features learnt with our model vs SimCLR, confirming our claim.

# Post-rebuttal Meta-Reviews

## Meta-review #1 (Primary)

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

Most of the concerns have been addressed in the rebuttal period. Updates reported in the rebuttal letter should be included in the final version.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

8

## Meta-review #2

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors have addressed the reviewers' concerns in their rebuttal. They are encouraged to address the concerns in their final paper too. There are novel components in the method design. The proposed method builds upon several recent works on contrastive learning.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

4

## Meta-review #3

• Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

It is clear that the paper has an interesting novel technical contribution. Adding meta-data to a contrastive loss is well appreciated by all the reviewers. All reviewers requested additional experiments. The authors answered them well in their rebuttal. These additional experiments definitely improve the quality of the article. I believe adding them to the paper should not be very difficult.

I do not agree with R4’s comment regarding the need to demonstrate on x-ray images and citing a pre-print on the same topic.

Overall, the paper proposes an interesting idea and presents a large experimental validation. Adding more prior work in this comparison improves the quality even further.

• After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept

• What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

9