
Authors

Dewen Zeng, Yawen Wu, Xinrong Hu, Xiaowei Xu, Haiyun Yuan, Meiping Huang, Jian Zhuang, Jingtong Hu, Yiyu Shi

Abstract

The success of deep learning heavily depends on the availability of large labeled training sets. However, it is hard to obtain large labeled datasets in the medical image domain because of strict privacy concerns and costly labeling efforts. Contrastive learning, an unsupervised learning technique, has proven powerful in learning image-level representations from unlabeled data. The learned encoder can then be transferred or fine-tuned to improve the performance of downstream tasks with limited labels. A critical step in contrastive learning is the generation of contrastive data pairs, which is relatively simple for natural image classification but quite challenging for medical image segmentation due to the existence of the same tissue or organ across the dataset. As a result, when applied to medical image segmentation, most state-of-the-art contrastive learning frameworks inevitably introduce many false negative pairs and result in degraded segmentation quality. To address this issue, we propose a novel positional contrastive learning (PCL) framework to generate contrastive data pairs by leveraging the position information in volumetric medical images. Experimental results on CT and MRI datasets demonstrate that the proposed PCL method can substantially improve the segmentation performance compared to existing methods in both the semi-supervised and transfer learning settings.
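
For readers skimming the reviews below, the core pairing rule discussed throughout can be summarized in a few lines. The following is a minimal editor's sketch, not the authors' released code: it assumes each 2D slice is assigned a position normalized by its volume's depth and that two slices count as a positive pair when their normalized positions differ by less than a threshold t (the hyperparameter Reviewers #1 and #3 ask about); the function and variable names are hypothetical.

```python
# Editor's sketch of the positional pairing rule described in the abstract and
# Review #3. The depth normalization and the strict "< t" comparison are assumptions.
import numpy as np

def positional_positive_mask(slice_indices, volume_depths, t=0.1):
    """Return a boolean matrix M where M[i, j] is True if slices i and j are
    treated as a positive pair, i.e. their depth-normalized positions differ
    by less than the threshold t (regardless of which volume they come from)."""
    pos = np.asarray(slice_indices, dtype=float) / np.asarray(volume_depths, dtype=float)
    mask = np.abs(pos[:, None] - pos[None, :]) < t
    np.fill_diagonal(mask, True)  # every slice is a positive of its own augmented view
    return mask

# Toy mini-batch: two volumes with 4 and 5 slices respectively.
idx = [0, 1, 2, 3, 0, 1, 2, 3, 4]      # slice index within its own volume
depth = [4, 4, 4, 4, 5, 5, 5, 5, 5]    # depth of the volume each slice comes from
print(positional_positive_mask(idx, depth, t=0.15).astype(int))
```

Under this reading, the threshold t directly controls how many positives each anchor receives, which is the trade-off discussed in the author feedback below.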

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_21

SharedIt: https://rdcu.be/cyl1L

Link to the code repository

https://github.com/dewenzeng/positional_cl

Link to the dataset(s)

https://github.com/XiaoweiXu/Whole-heart-and-great-vessel-segmentation-of-chd_segmentation

https://www.creatis.insa-lyon.fr/Challenge/acdc/databases.html

https://zmiclab.github.io/projects/mmwhs/

http://segchd.csail.mit.edu/


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose a contrastive learning-based method for volumetric medical image segmentation. Specifically, a new positional contrastive learning method is proposed to generate positive/negative pairs according to the position of the slices. Extensive experimental evaluations on 4 datasets validate the effectiveness of the proposed method. The main contribution of this paper is the proposed positional contrastive learning method, which could also be useful for other applications and medical modalities.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The idea of using positional information to design positive and negative pairs for contrastive learning is interesting and makes sense for the specific segmentation application. This positional-based idea could also be useful for other medical applications.

    • The effectiveness of the proposed method is demonstrated by extensive experiments on 4 datasets, with better performance than other selected methods. Details are shown in Tables 1 and 2.

    • The authors provide detailed discussions for the experimental results, which bring insights to the community.

    • The paper is well-written and easy to follow.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Given that there is prior work [2] that also uses positional information for 3D medical images, the novelty of the proposed method is somewhat limited.

    • The idea of predicting position as a self-supervised learning task is not new in the medical domain, e.g., [1, *2]. [1] Self-Supervised Ultrasound to MRI Fetal Brain Image Synthesis, TMI 2020. [*2] Learning to map 2D ultrasound images into 3D space with minimal human annotation, MedIA 2021.

    • The threshold t is a key hyperparameter in the proposed method. But a detailed discussion is lacking and there is no experimental analysis of this parameter.

    • The claim at the end of Sec. 3 is not convincing. Why is the standard contrastive loss [3], which has only one positive pair, a drawback? Why is the proposed method, which uses many more positive pairs, better? The effectiveness of contrastive learning also depends on the negative pairs. This claim is also not validated in the experiments.

    • Missing reference for contrastive learning when it is first mentioned in the Introduction.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code was not submitted. But it should not be difficult to reproduce the method in this paper, given that sufficient details were provided and the datasets used are all publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • The authors are suggested to do a more thorough literature review for self-supervised learning in the medical domain, especially for contrastive learning based approaches. Examples but not limited to: [3] Models Genesis: Generic Autodidactic Models for 3D Medical Image Analysis, MICCAI 2019, MedIA 2020. [4] Self-Supervised Contrastive Video-Speech Representation Learning for Ultrasound, MICCAI 2020. [5] Parts2Whole: Self-supervised Contrastive Learning via Reconstruction, MICCAI workshop 2020. [6] Contrastive Rendering for Ultrasound Image Segmentation, MICCAI 2020. [*7] Self-supervised feature learning for 3d medical images by playing a rubik’s cube, MICCAI 2019.

    • It would be better to report the performance for the setting without data augmentation, to isolate the influence of augmentation.

    • It would be better to analyse the influence of the number of positive pairs and negative pairs.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, the paper is well-written with a simple but effective idea for self-supervised contrastive learning. The authors also provide sufficient experimental evaluations with thorough analysis and discussion for the results. All these aspects contribute to the pros of the paper. Although there are some concerns as mentioned above, I believe this paper and the corresponding findings could be interesting to the audience of MICCAI 2021 and those issues could be addressed in a minor revision. As a result, I would recommend acceptance.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This work designs a new contrastive learning framework for 3D medical images. The position information of each slice is used to distinguish positive/negative samples, in order to mitigate the problem of false negative pairs. The motivation behind this is that adjacent slices in the same volume or corresponding slices in different volumes have similar content. The performance on different datasets has shown the effectiveness of the introduced framework. The authors also conduct transfer learning to verify whether the learned representations are transferable.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This work proposes a novel contrastive learning framework by involving the position information of each slice, which is suitable for 3D medical images.
    • The results on different datasets show that it outperforms the random initialization setting, other self-supervised learning settings, and other methods for 3D medical images. The authors also conduct a transfer learning experiment to show the effectiveness of the proposed framework.
    • The paper is well-written and well-organized.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Not sure whether the assumption holds that volumes of different patients are aligned for these datasets. It would be better to analyze this.
    • Is PCL based on the SimCLR framework? It would be better to provide more details on the data augmentations used in the contrastive learning. Some data augmentations are not suitable for medical images.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code for SimCLR is already public. The authors have provided some details about the experiments. I think it is possible to reproduce the work of this paper since the datasets are public as well. As mentioned in the authors’ reproducibility response, they plan to release the code and trained models.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    It would be better to analyze the assumption that volumes of different patients are aligned for the different datasets.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A novel contrastive learning framework is proposed in this work. Adding position information is suitable for 3D medical images, which is helpful for the community. As such, I choose accept as my rating at the current stage.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The authors utilized the positional information of 2D slices embedded in 3D volumetric medical images to reduce false-negative samples and improve conventional contrastive learning (e.g., SimCLR). Specifically, for each anchor image x, its positive samples include its augmented view and all 2D slices whose position is close to x. The proposed PCL is evaluated on 4 segmentation tasks in both semi-supervised and transfer learning settings, and demonstrates superior performance compared with several state-of-the-art self-supervised learning baselines, including SimCLR and GCL.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The manuscript is well written and illustrated, making it very easy to follow.
    • The authors clearly point out the false-negative problem when conventional contrastive learning methods are applied on 2D slices from 3D volumetric medical images, and proposed a novel yet simple self-supervised learning method to address the limitation.
    • The proposed PCL demonstrates superior performance compared with several self-supervised learning baselines, including PIRL, SimCLR, and GCL.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • On page 4, the authors mention that the corresponding 2D slices in different volumes could be defined as positive pairs if the volumes of different patients are “perfectly aligned.” However, this is generally not possible considering patient movement and the different protocols and machines used during acquisition. Therefore, how is the alignment problem handled during the training of PCL?
    • In the supplementary, what is the average number of positive samples with different thresholds? Especially when the threshold is small, the positive samples will only contain the augmented view most of the time, and PCL will be downgraded to SimCLR. However, PCL is insensitive to different thresholds and significantly outperforms SimCLR even with the smallest threshold. Therefore, what is the main reason that PCL still significantly outperforms SimCLR with a small threshold? Or to what degree will PCL be downgraded to SimCLR?
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Fairly possible. The datasets used in this paper are publicly available, and the main idea is an improved version of SimCLR, which has official implementation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. Since the 3D segmentation tasks are solved with 2D sliced-based methods, the ImageNet-pretrained model would still serve as a strong baseline and should be included. Additionally, the baselines did not incorporate recent progress in self-supervised learning. For example, SwAV [1] and BYOL [2] show superior performance compared with SimCLR. It would be more comprehensive to include SwAV and BYOL as baselines.
    2. On the other hand, it would be interesting to see how the segmentation tasks can be solved with 3D volume-based self-supervised learning methods such as Models Genesis [3] and Rubiks Cube [4].
    3. The title emphasizes “segmentation,” and PCL is only evaluated on segmentation tasks. However, PCL is not specifically designed for segmentation but is rather a general self-supervised learning approach. Therefore, the title may be misleading to some extent, and the impact of PCL may be restricted.
    4. On page 1, the authors mention that “due to the strict privacy concern…acquiring such large labeled datasets is usually prohibitive…a large amount of unlabeled image data…is generated every day all around the world.” The logic here has some flaws, since privacy concerns exist for both labeled and unlabeled images. Please consider rephrasing these two sentences.

    [1] Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P. and Joulin, A., 2020. Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882. [2] Grill, J.B., Strub, F., Altché, F., Tallec, C., Richemond, P.H., Buchatskaya, E., Doersch, C., Pires, B.A., Guo, Z.D., Azar, M.G. and Piot, B., 2020. Bootstrap your own latent: A new approach to self-supervised learning. arXiv preprint arXiv:2006.07733. [3] Zhou, Z., Sodha, V., Siddiquee, M.M.R., Feng, R., Tajbakhsh, N., Gotway, M.B. and Liang, J., 2019, October. Models genesis: Generic autodidactic models for 3d medical image analysis. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 384-393). Springer, Cham. [4] Zhuang, X., Li, Y., Hu, Y., Ma, K., Yang, Y. and Zheng, Y., 2019, October. Self-supervised feature learning for 3d medical images by playing a rubik’s cube. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 420-428). Springer, Cham.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The manuscript is well written and illustrated.
    • The authors identify one limitation of conventional contrastive learning methods and proposed a novel yet simple approach to address it.
    • The performance is superior compared with several existing self-supervised methods.
    • There are some missing details/explanations (e.g., the alignment problem, performance of PCL with small thresholds) which may potentially downgrade this paper.
  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviews for this paper are throughout positive. The idea of integrating positional information in the context of contrastive learning is appealing and the experiments are convincing. There are a couple of details missing and the issue raised regarding the alignment problem should be clarified in the paper.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3




Author Feedback

We thank all reviewers and the meta-reviewer for their time and recognition of this paper’s contributions. Here, we focus on addressing some of the reviewers’ concerns and questions.

Reviewer #1:

  1. Why not use the standard contrastive loss? Response: We use the contrastive loss in Section 3.3 because it is neat and can handle multiple positive pairs at the same time, so that our PCL can be better utilized. Of course, one can also use the standard contrastive loss [3] to handle these positive pairs pair by pair (which is what paper [2] did). The implementation would be a bit redundant, but the performance would not change much.
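
To make this answer concrete, below is a minimal editor's sketch of a contrastive loss that averages the log-likelihood over multiple positives per anchor, in the spirit of the supervised contrastive loss. It is an assumption about the general form of the loss in Section 3.3, not the authors' released implementation; the function name, temperature value, and the handling of the anchor's own row are all hypothetical.

```python
# Editor's sketch of a multi-positive contrastive loss (SupCon-style).
# Not the authors' implementation; the exact form in Section 3.3 may differ.
import torch
import torch.nn.functional as F

def multi_positive_contrastive_loss(features, pos_mask, temperature=0.1):
    """features: (N, d) embeddings of the N augmented slices in a mini-batch.
    pos_mask:  (N, N) torch.bool matrix marking which pairs count as positives."""
    features = F.normalize(features, dim=1)
    sim = features @ features.t() / temperature                  # pairwise similarities
    n = sim.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=sim.device)
    sim = sim.masked_fill(self_mask, float('-inf'))              # never contrast a view with itself
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)   # log-softmax over all other samples
    pos_mask = pos_mask & ~self_mask                             # positives other than the anchor row
    pos_counts = pos_mask.sum(dim=1).clamp(min=1)
    # average the log-likelihood over all positives of each anchor, then over anchors
    loss = -(log_prob.masked_fill(~pos_mask, 0.0)).sum(dim=1) / pos_counts
    return loss.mean()

# Example usage with random embeddings and a symmetric positive mask.
feats = torch.randn(8, 32)
mask = torch.rand(8, 8) > 0.7
mask = mask | mask.t() | torch.eye(8, dtype=torch.bool)
print(multi_positive_contrastive_loss(feats, mask))
```

In practice, `pos_mask` would be a positional positive-pair matrix such as the one sketched after the abstract.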

  2. It would be better to see the performance for the setting without data augmentation. Response: We have experimented with fine-tuning without data augmentation. In fact, the performance of both random initialization and PCL degrades, though the former much more significantly. In other words, the gains of PCL become more significant. We do not report these results in the paper because data augmentation is a common technique that people normally use.

  3. It would be better to analyze the influence of the number of positive pairs and negative pairs. Response: The influence of positive pairs can be seen in Table 1 in the supplementary material (though we did not have space to discuss it in the paper). Basically, there exists an optimal threshold with the best learning performance: when the threshold is too small, the number of positive pairs in a mini-batch decreases and the information in adjacent slices cannot be fully utilized (the false-negative rate increases). However, when the threshold is too large, the number of positive pairs increases, and so does the false-positive rate, which can decrease the learning performance.

Reviewer #2:

  1. Not sure whether the assumption holds that volumes of different patients are aligned for these datasets. It would be better to do an analysis on it. Response: The volumes in the CHD (and ACDC) datasets were acquired with a standard protocol to capture the same anatomical areas, so they were already roughly aligned at acquisition time. Like [2], we did not use an external tool to align the volumes in any of the datasets. We believe rough alignment is enough for PCL to work.

  2. Is PCL based on the SimCLR framework? It would be better to provide more details on the data augmentations. Response: Yes, it is based on SimCLR. We only use weak data augmentation for contrastive learning; details can be found in Section 4.1. Code will be released for reproducibility.

Reviewer #3: 1: How is the alignment problem solved during the training of PCL? Response: In our analysis, we assume perfect alignment for ease of discussion. In our experiments, all data are only roughly aligned because they capture the same anatomical areas; no external tools are used to align them. Please refer to Reviewer #2 Q1.

2: What is the average number of positive samples with different thresholds? Why does the smallest threshold still outperform SimCLR? Response: Even with the smallest threshold we use (in the supplementary material), PCL does not degrade to SimCLR: the average number of positive samples for CHD (z-axis dimension about 130~300) is 6.7, and for ACDC (z-axis dimension about 10) it is 23.6. We have added the average number of positive samples to the table in our revised version. We will investigate the extreme case (threshold close to 0) in our future work – thank you for your suggestion!

3: Compare to SwAV and BYOL? Response: Both SwAV and BYOL are based on learning features from two different views of the same data; our method of generating multiple positive pairs can actually build on them to leverage the structural information in medical images. Thank you for the suggestion; we will investigate these directions in our future work.


