
Authors

Quan Liu, Peter C. Louis, Yuzhe Lu, Aadarsh Jha, Mengyang Zhao, Ruining Deng, Tianyuan Yao, Joseph T. Roland, Haichun Yang, Shilin Zhao, Lee E. Wheless, Yuankai Huo

Abstract

Contrastive learning is a key technique of modern self-supervised learning. The broader accessibility of earlier approaches is hindered by the need for heavy computational resources (e.g., at least 8 GPUs or 32 TPU cores) to accommodate large-scale negative samples or momentum encoders. The more recent SimSiam approach addresses such key limitations via stop-gradient, without momentum encoders. In medical image analysis, multiple instances can be obtained from the same patient or tissue. Inspired by these advances, we propose a simple triplet representation learning (SimTriplet) approach for pathological images. The contribution of the paper is three-fold: (1) The proposed SimTriplet method takes advantage of the multi-view nature of medical images beyond self-augmentation; (2) The method maximizes both intra-sample and inter-sample similarities via triplets from positive pairs, without using negative samples; and (3) Recent mixed precision training is employed so that training requires only a single GPU with 16GB memory. By learning from 79,000 unlabeled pathological patch images, SimTriplet achieved 10.58% better performance compared with supervised learning. It also achieved 2.13% better performance compared with SimSiam. Our proposed SimTriplet can achieve decent performance using only 1% of the labeled data. The code and data are available at https://github.com/hrlblab/SimTriplet.
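
For readers who want to ground the method description above, below is a minimal PyTorch sketch of how a SimTriplet-style objective could look, based solely on the abstract: intra-sample similarity (two augmentations of one patch) and inter-sample similarity (an adjacent patch) are maximized via negative cosine similarity with stop-gradient, and no negative samples are used. The `encoder`, `predictor`, and the exact weighting of the two terms are illustrative assumptions, not taken from the released repository.

```python
# Hedged sketch of a SimTriplet-style loss (assumptions noted in comments);
# see the authors' repository for the actual implementation.
import torch
import torch.nn.functional as F

def neg_cosine(p, z):
    # Negative cosine similarity; z is detached ("stop-gradient"), as in SimSiam.
    return -F.cosine_similarity(p, z.detach(), dim=-1).mean()

def simtriplet_loss(encoder, predictor, x1a, x1b, x2):
    # x1a, x1b: two augmentations of the same patch (intra-sample pair).
    # x2: an augmentation of a spatially adjacent patch (inter-sample pair).
    z1a, z1b, z2 = encoder(x1a), encoder(x1b), encoder(x2)
    p1a, p1b = predictor(z1a), predictor(z1b)
    # Intra-sample term (symmetrized, as in SimSiam).
    intra = 0.5 * (neg_cosine(p1a, z1b) + neg_cosine(p1b, z1a))
    # Inter-sample term: pull adjacent-patch embeddings together.
    # Equal weighting of the two terms is an assumption.
    inter = neg_cosine(p1a, z2)
    return intra + inter
```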

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_10

SharedIt: https://rdcu.be/cyl1A

Link to the code repository

https://github.com/hrlblab/SimTriplet

Link to the dataset(s)

https://drive.google.com/drive/folders/14Cg-QuOCPVrynpuFI_jFqRqzTj2rNk4d?usp=sharing


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors describe a triplet sampling approach (SimTriplet) to learn representations from pathology images for downstream analysis tasks (e.g. segmentation). SimTriplet uses adjacent patches in addition to data (self-)augmentations, and doesn’t need negative samples. Furthermore, the authors present an efficient implementation that works on a single GPU.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    How to efficiently learn useful representations from unlabeled data while exploiting some form of data consistency (adjacent patches, slices or multi-view) is an important topic for medical image analysis, where annotations are expensive to come by. So this paper is a nice contribution to the field.

    The authors present an efficient implementation that works on a single GPU. Thus, this solution can be widely used by the community as opposed to other approaches that require much larger compute resources.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The writing lacks clarity in my view. There is not enough guidance for the reader. For example, I find the motivation for the current approach in the introduction hard to follow. Without knowing all the references in detail, it is hard to follow the paper. But I think a conference paper, too, should be as self-contained as possible.

    The empirical findings need more discussion. I found the differences with supervised learning quite unexpected. Maybe I have missed something, but to me it is not at all obvious that self-supervised representation learning should improve the segmentation that much. One reason could be that the manual annotation actually seems rather coarse-grained, which could bias the comparison towards methods that produce very smooth results. I am not saying this is a problem; I was just surprised, and a deeper discussion is needed here.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code should be made available. Otherwise it might be hard to reproduce the method.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • In the introduction (e.g., p. 2) it would be good to clarify how the problems that the cited references are trying to address are connected, and how it all fits into a common line of reasoning. It would be nice to lead the reader here.

    • The method uses neighboring patches. E.g., the caption of Fig. 2 states: "Adjacent image pairs are sampled from unlabeled pathological images (left panel) for triplet representation learning (right panel). The GigaPixel pathological images provide large-scale 'positive pairs' from nearby image patches for SimTriplet." I wonder what happens at boundaries between tissue types. Here the assumption would break down, I suppose. Maybe this is not really an issue, but it would be very helpful to comment on this aspect.

    • An MLP predictor is used in the middle path of the network. It would be good to motivate and explain its role.

    • For how many epochs did you train the self-supervised part? This would be useful to know in order to judge the differences in performance with respect to the fully supervised model (trained for 100 epochs).

    • The average of 5 predictions is reported on the test set. How much variability did you find?

    • It would be very important to discuss the difference in performance between SimTriplet and the supervised baseline in more detail.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am not entirely convinced I understand the exact value of this contribution and its implications just from reading the paper alone. But the method is very interesting from a practical point of view. So if the manuscript were rewritten to better explain the motivation, the method details and also discuss the result a bit more, it would be a good paper for MICCAI.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper proposes a method called SimTriplet that builds upon the recent self-supervised approach SimSiam by incorporating the multi-view nature of pathological images. The presented method maximizes both intra-sample and inter-sample similarities via triplets from positive pairs, without using negative samples, and utilizes recent mixed precision training so that training requires only a single GPU with 16GB memory. The proposed method achieved superior performance compared with benchmarks, including a supervised learning baseline and the SimSiam method, for tissue type classification from Whole Slide Images (WSIs).

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    -The motivation of this paper is sound and the proposed method is simple yet effective.

    -The paper is well-written, and it is easy to follow.

    -Experimental results such as model performance on partial training data demonstrate strong generalizability of the proposed method, e.g., only using 1% labeled data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper seems limited in novelty; it is a simple combination of MICLe [2], based on multi-view contrastive learning, and the SimSiam method. Having said that, the authors compared their proposed method with MICLe and SimSiam in isolation and under different pre-training schemes, and show improvement. The self-supervised baselines are limited, and the experiment section could be improved, for example, if the authors compared their method with the recent self-supervised clustering-based scheme (Caron et al., 2020).

    Caron, M., Misra, I., Mairal, J., Goyal, P., Bojanowski, P., and Joulin, A., 2020. Unsupervised learning of visual features by contrasting cluster assignments. arXiv preprint arXiv:2006.09882.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Regarding the reproducibility of this work, I feel the description in the method section is sufficient to understand the details of the implementation and training steps thoroughly. Having said that, the authors plan to release the code only upon acceptance.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors might give more visualization examples, such as t-SNE plots of the reported experiments, to demonstrate the representations learned by the proposed loss objective.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the method borrows heavily from existing techniques, the paper has merits.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a self-supervised representation learning approach using a new triplet loss that only requires positive samples (under different data augmentations), which simplifies the training process, allowing it to run on a single GPU. Both reviewers agree that the paper has technical and practical merits. They raise some questions that I would like to see answered by the authors during the rebuttal. In particular: (1) can the authors better motivate the approach, so the reader does not need to read all references to understand the main points of the paper? And (2) can the paper discuss in more depth the differences with supervised learning?

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

We thank the reviewers and the AC for the insightful reviews, and we summarize the comments into two major and two minor concerns.

[Major concerns]

  1. Clearly illustrate the motivation and innovation

     Both the AC and R1 asked for clearer motivation, especially regarding the novelty compared with MICLe and SimSiam.

     –The first motivation of SimTriplet is to propose a contrastive learning scheme centered on histopathological images. Our SimTriplet is motivated by the unique advantage that nearby small patches in a gigapixel histopathology image typically represent the same tissue type, thereby providing extra real "positive pairs" (from two images) beyond data augmentation (on one image) in the prior art (Fig. 2). With this advantage, the proposed SimTriplet simultaneously minimizes both intra- and inter-sample distance (Fig. 3), while SimSiam only minimizes the former. Unlike MICLe, SimTriplet does not require any negative samples and is optimized for pathological images. These innovations are visually presented in Fig. 1. Meanwhile, we clearly credited the motivating MICLe and SimSiam methods in the original introduction.

     –Another motivation is to make contrastive learning (typically memory-intensive) available to research labs that do not have access to advanced computational facilities. As suggested by R2, we ran a new experiment with another benchmark, SwAV (Caron et al., 2020). The F1 and balanced ACC of SwAV are 0.53 and 0.60, which are inferior to those of our SimTriplet (0.65 and 0.72) using the same batch size of 128 within 16GB of GPU memory. SwAV typically requires a large batch size (4096) to achieve outstanding performance. This shows that the proposed SimTriplet can learn an effective embedding with a small batch size and limited memory, due to the dual "stop-gradient" design (Fig. 1), which doubles the number of learnable images in a mini-batch with trivial added memory consumption. Moreover, we introduced mixed precision training to improve memory efficiency.
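
As a companion to the memory argument above, here is a hedged sketch of a mixed precision training step using PyTorch's `torch.cuda.amp` (an assumption about the framework; the released code may differ). It reuses the `simtriplet_loss` sketch shown earlier: `autocast` runs the forward pass in float16 where numerically safe, and `GradScaler` prevents gradient underflow, roughly halving activation memory so that a batch of 128 can fit on a single 16GB GPU.

```python
# Hedged sketch of mixed precision training with PyTorch AMP; the dual
# stop-gradient design lives inside simtriplet_loss (defined earlier).
import torch

scaler = torch.cuda.amp.GradScaler()

def train_step(encoder, predictor, optimizer, x1a, x1b, x2):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        # Forward pass in mixed precision (float16 where numerically safe).
        loss = simtriplet_loss(encoder, predictor, x1a, x1b, x2)
    scaler.scale(loss).backward()  # scale loss to avoid fp16 gradient underflow
    scaler.step(optimizer)         # unscales gradients, then optimizer.step()
    scaler.update()                # adjust the scale factor for the next step
    return loss.item()
```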

  2. Why does self-supervised representation learning outperform supervised learning?

     Both the AC and R1 asked for more discussion of the superior performance of self-supervised learning compared with supervised learning.

     –First, annotations are expensive in pathology. Self-supervised learning first learned an effective embedding from 79,000 unannotated image patches (from 79 patients) and was then fine-tuned with 5,000 annotated image patches (from 5 patients). By contrast, supervised learning only learned from the 5,000 annotated image patches (from 5 patients), thereby yielding inferior performance on testing images (Table 1).

     –Second, learning from large unlabeled data is the key advantage of self-supervised learning. Its superior performance over supervised learning has been shown in prior art (e.g., SimCLR and BYOL).
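
To make the two-stage recipe above concrete, here is a minimal sketch of the supervised fine-tuning stage applied to the pretrained encoder; the supervised baseline would run the same loop from a randomly initialized encoder. The linear head, the optimizer, and the `encoder.output_dim` attribute are illustrative assumptions, not details confirmed by the paper.

```python
# Hedged sketch of stage 2 (supervised fine-tuning on the small labeled set);
# stage 1 is the self-supervised pretraining of `encoder` on unlabeled patches.
import torch
import torch.nn as nn

def finetune(encoder, labeled_loader, num_classes, epochs=100, lr=1e-3):
    # Attach a classification head to the (pretrained) encoder.
    head = nn.Linear(encoder.output_dim, num_classes)  # output_dim: assumed attribute
    model = nn.Sequential(encoder, head)
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    ce = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for patches, labels in labeled_loader:
            opt.zero_grad()
            loss = ce(model(patches), labels)
            loss.backward()
            opt.step()
    return model
```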

[Minor concerns]

  1. Boundary cases discussion

     R1 asked what happens when a pair of patches is sampled at a boundary between tissue types.

     –A single histopathology image (like our data) contains more than 10,000 image patches. When we randomly sample 500 pairs, the probability that a pair shares the same tissue type is much higher than the probability that it spans different tissue types (see Fig. 2 or 4). Moreover, the boundary is typically not a hard "cut" between tissue types, which further mitigates the potential issue.

  2. Sensitivity of hyper-parameters

     –For R1's question on the variability of the results, we now report that the standard deviation of the balanced ACC of SimTriplet is SD = 0.012, which is better than supervised learning (SD = 0.028) and SimSiam (SD = 0.016).

     –For R1's question about the number of epochs, we trained for the same 100 epochs in supervised fine-tuning to allow a fair comparison with supervised learning.

     –R1 asked about the role of the MLP predictor. The previous SimSiam paper revealed that the role of the predictor is to avoid feature collapse.

     –We will provide t-SNE plots as suggested by R2, though our Fig. 4 already provides similar visual evidence.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal provides a convincing answer to the first question (approach motivation). However, the answer to the second question, about the superior performance of self-supervised learning in comparison with supervised learning, remains unclear in my opinion, given that it is not true in general that self-supervised learning has been shown to be superior (e.g., SimCLR and BYOL). The rebuttal did not answer R2's concern about a comparison with Caron et al.'s paper. Even with these issues, I believe the paper should be accepted to MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This work proposed a triplet loss that makes use of nearby-patch similarity in large whole-slide histopathology images. The method is very simple and closely related to existing approaches (raising some concerns regarding novelty). However, the work shows that this simple approach can lead to improvements. Both reviewers appreciated this work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers had questions regarding method clarity and motivation, discussion of the results, and novelty; these issues needed to be addressed in the rebuttal. The rebuttal provided satisfactory responses with some additional results. The final version should address these comments and improve the method description, as R1 suggested.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5


