
Authors

Jing Ke, Yiqing Shen, Xiaoyao Liang, Dinggang Shen

Abstract

Generative adversarial networks (GANs) have become prevalent in color normalization techniques that assist deep learning analysis of H&E-stained histopathology images. The widespread adoption of GANs has effectively released pathologists from the heavy manual workload of conventional template image selection. However, the transformation may cause significant information loss or produce undesirable results such as mode collapse, which can degrade performance in the subsequent diagnostic task. To address this issue, we propose a contrastive learning method with a color-variation constraint, which maximally retains the recognizable phenotypic features during the training of a color-normalization GAN. In a self-supervised manner, discriminative tissue patches across multiple tumor types are clustered and taken as the salient input to the GAN. Empirically, the model is evaluated on public datasets of large cohorts covering different cancers from TCGA and Camelyon16. We show better phenotypic recognizability along with improved performance in histology image classification.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_55

SharedIt: https://rdcu.be/cymbh

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    In this work, the authors propose incorporating contrastive learning into CycleGAN for WSI color normalization. Two additional constraints are introduced: 1) a self-supervised tissue cluster label is generated as the latent input of the GAN; 2) a contrastive loss is introduced to preserve the semantic information of the color-normalized image.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Color normalization of WSI images is an important topic. Combining a contrastive loss and CycleGAN is novel. The method has been tested on 5 public datasets and compared with 7 other related methods. Promising results have been achieved.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The method writing needs to be improved. It is hard to follow, and some critical details of the method are missing.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Experiments are based on public data. Some implementation details are missing. No code given.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The title “Contrastive Learning Based Stain Normalization” is not accurate for the proposed method; “Incorporating Contrastive Learning in CycleGAN for Stain Normalization” would be more appropriate. The description of CycleGAN can be simplified to make room for more critical details.

    The authors propose a self-supervised pre-clustering model. How is this model implemented? How are the cluster labels incorporated as latent input to the U-Net? Four losses are introduced, while Fig. 2 only shows three of them; L_p is missing.

    The training configuration of the CycleGAN is confusing: 1) It is not clear which dataset is selected as the target domain in the experiment. Four datasets are used. Did the authors use them in turn as the target domain, or is the target domain a virtual common domain across datasets? 2) Did the authors train a separate CycleGAN for each source-target domain pair, or is only one normalization network trained in this paper? If only one is trained, the distribution of the source domain is not consistent, and the style-reverse network (B->A) could be confused and fail.

    The motivation for the definition of theta and mu in equation 6 is not clear.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is relatively novel and a comprehensive comparison is given. However, the method description needs to be improved before acceptance.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    8

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    Authors have proposed a GAN based approach for stain normalization. The GAN is trained in combination with a clustering loss and contrastive loss. The latter is used to handle the mode collapse that arises in the GAN.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors have built upon an existing approach (StainGAN) for stain normalization and introduce modifications to make it robust and improve performance. The first modification is the use of “Augmented CycleGAN” instead of “CycleGAN”; the output of soft-clustering is provided as latent input to the GAN.

    Although the authors claim to use a CycleGAN, the reviewer thinks that the paper actually uses an Augmented CycleGAN (https://arxiv.org/pdf/1802.10151.pdf). However, there is no reference to it. Why is this paper not acknowledged?

    The second modification is the introduction of the contrastive learning task, which helps address issues such as mode collapse. Also, an ablation study has been provided to highlight the contribution of these changes. The methodology has been verified empirically (similarity metrics) and diagnostically (classification).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    According to the reviewer, certain shortcomings of the paper are as follows:

    1. Clustering loss: The authors claim, “We design a self-supervised method to cluster morphologically homogeneous patches across….” (page 2, last para). Elsewhere it is also described as a self-supervised approach (page 3, last para). However, equation (1) is the loss of an unsupervised task, which the authors themselves describe as “and the unsupervised soft-clustering model as fs : X s → [0, 1]^d……” (page 4, first para). Why such a conflict? Also, no information has been provided about “d” in the training details.
    2. Contrastive learning: Two paired images x^A′,x^A′′ are constructed. These depend on \theta. However, \theta also depends upon these two paired images. So, how are these values initialized and updated during training? There is no information about this in the training details.
    3. Experiment set-up: The model requires a source image and a target image (or A and B). However, no information is provided for either dataset regarding the source images and target images that are used for training.
    4. The authors claim, “In our method, a more generalized GAN is trained towards multiple cancer diseases…” (page 3). Again, no information is provided in the training details or experimental setup that supports this. As a consequence, the incorporation of the clustering loss is not clear.
    5. Are the provided results on the training set, validation set, or test set? Also, what are the sizes of the datasets and the splitting ratio? This information is entirely missing.
    6. Classification results: Again, no information is given about the experimental setup or the training and testing details.
  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper is missing many important details, and solely based on the provided information, it is not possible to reproduce the results. However, the authors have mentioned that the code will be released publicly.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors should provide all the necessary training details, experimental setup, and dataset information. This information is essential for understanding the proposed approach. Please refer to point (4). There is also a need to explain the clustering task in more detail.

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The reviewer’s decision is based on the lack of details in the experimental setup, training, and datasets. In the absence of this information, the reviewer is unable to understand the incorporation of the clustering loss. There also seems to be some ambiguity; for example, clustering is proposed as self-supervised, but it seems to be unsupervised. Additional training details could clarify this aspect. The results section includes a comparison with other methods and an ablation study, but due to the lack of the above information, the reviewer is unable to draw some important inferences from these results.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The paper proposes a framework for H&E slide stain normalization, based on a generative adversarial network and contrastive learning. The authors claim good results compared to SOTA works, without the need for annotations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Paper well-written; Interesting method with promising results; Dataset with good size and very good pathology variability; Good SOTA comparison;

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Unclear presentation of the classification task setup/results

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    All data is available but, without case identification, the results are not reproducible. Although code is not available, the method is clearly described and seems reproducible on another dataset.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In Section 3.1, the authors say that, in addition to the direct assessment of the normalisation quality, they also assess its impact on a subsequent classification task by comparing the proposed method with state-of-the-art works. For that, they rely on the NCT-CRC-HE-100K-NORM dataset and another subset from CAMELYON16, and they use ResNet-18 as the classifier backbone. However, the results presented in Table 2 are reported for the TCGA collections, and no result for the NCT-CRC dataset is presented. Also, if these datasets were used, and knowing that they are not annotated at the pixel/patch level but contain only slide labels, how did the authors aggregate patch predictions into a single prediction per slide (the NCT-CRC has only small patches, individually labelled)? This comparison and these results should be revised, clarified, and improved in case of acceptance. Moreover, for a better understanding of the importance of the normalisation step, the classification results should also include the performance of the classifier on raw data.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well written and overall clear, except for the evaluation of the impact of the normalisation method on the classification task. The presented results (SSIM and FSIM) are very promising and beat the state-of-the-art methods, but the classifier setup/results should be revised and clarified.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a framework for H&E slide stain normalization, based on a generative adversarial network and contrastive learning. The main strength of the paper is the combination of contrastive loss and CycleGAN. The following points should be addressed in the rebuttal: 1) details about the implementation; 2) details about the experimental setup, training, and datasets.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7




Author Feedback

To Meta-Reviewer: The concerns about the details of the experimental setting, implementation, and datasets are answered point by point below.

To Rev#1: 1. Details of the method: The proposed framework minimizes the incidence of mode collapse by the GAN in the color normalization task, as well as unreasonable output images being fed as input to the succeeding classification network. The designed constrained style transfer with contrastive learning effectively suppresses outliers in the mutual information extraction (Section 2.2).

2. Code availability and reproducibility (also to Rev#2, #3): We temporarily use https://anonymous.4open.science/r/MICCAI-5EB0/main.py to meet the anonymity requirement. We will also publish the images and a readme document as a package under our full-name account upon paper acceptance. With this elaboration, the work is thoroughly reproducible.

3. Implementation details (also to Rev#2, Meta-Rev Q1): The Self-Supervised Learning (SSL) framework clusters patches prior to the training stage of color normalization; the SSL model is also progressively trained together with the GAN via the loss L_p. The clustered output logits, i.e., the input to the final softmax layer of the SSL model, are then concatenated with the feature vectors at the bottom layer of the U-Net. The U-Net architecture, routinely adopted in medical image segmentation, is employed to transfer styles between the two domains for its outstanding performance and compact architecture. L_p is an implicit loss term, represented by the two boxes tagged ‘SSL’ in Fig. 2.
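For concreteness, the following is a minimal PyTorch sketch, not the authors' code, of how soft-cluster logits could be concatenated with U-Net bottleneck features as described above; the module name, channel sizes, and the 1x1 fusion convolution are illustrative assumptions.

```python
# Minimal sketch (not the authors' implementation) of conditioning a U-Net
# bottleneck on SSL soft-cluster logits by concatenation. Channel sizes and
# the 1x1 fusion convolution are illustrative assumptions.
import torch
import torch.nn as nn

class ClusterConditionedBottleneck(nn.Module):
    def __init__(self, feat_channels=512, n_clusters=8):
        super().__init__()
        # 1x1 convolution fuses image features with the broadcast cluster logits
        self.fuse = nn.Conv2d(feat_channels + n_clusters, feat_channels, kernel_size=1)

    def forward(self, bottleneck_feats, cluster_logits):
        # bottleneck_feats: (B, C, H, W) features at the bottom layer of the U-Net
        # cluster_logits:   (B, d) soft-cluster output logits from the SSL model
        b, _, h, w = bottleneck_feats.shape
        z = cluster_logits.view(b, -1, 1, 1).expand(-1, -1, h, w)  # broadcast spatially
        return self.fuse(torch.cat([bottleneck_feats, z], dim=1))  # concatenate and fuse

# Usage with random tensors:
block = ClusterConditionedBottleneck(feat_channels=512, n_clusters=8)
feats = torch.randn(2, 512, 16, 16)
logits = torch.randn(2, 8)
out = block(feats, logits)  # shape (2, 512, 16, 16)
```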

4. Training configuration (also to Rev#2, Meta-Rev Q2): Four datasets are used in our experiments, each containing a variable number of slides from multiple hospitals, and each is trained with a separate CycleGAN independently. In each dataset, the slides from an individual hospital are allocated to either domain A or domain B, where the allocation criterion is data balance between the two domains. In this fashion, both the A->B transfer and its style-reverse network B->A work well. There is no training/test split for style transfer; we use all images to compute the SSIM and FSIM scores.
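A small illustrative sketch of the described allocation, under the assumption that slides are grouped by hospital within each dataset and greedily assigned to domain A or B so that the two domains stay roughly balanced; slide IDs and hospitals are made up, and one CycleGAN would then be trained per dataset.

```python
# Illustrative sketch of the per-dataset domain allocation described above.
# Slide IDs and hospitals are made up; the real allocation criterion in the
# paper may go beyond simple count balancing.
from collections import defaultdict

slides = [("slide_01", "hospital_1"), ("slide_02", "hospital_2"),
          ("slide_03", "hospital_1"), ("slide_04", "hospital_3"),
          ("slide_05", "hospital_2"), ("slide_06", "hospital_3")]

by_hospital = defaultdict(list)
for slide_id, hospital in slides:
    by_hospital[hospital].append(slide_id)

domain_a, domain_b = [], []
# Greedily give each hospital's slides to the currently smaller domain,
# keeping domains A and B roughly balanced.
for hospital, ids in sorted(by_hospital.items(), key=lambda kv: -len(kv[1])):
    (domain_a if len(domain_a) <= len(domain_b) else domain_b).extend(ids)

print(domain_a, domain_b)
```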

To Rev#2: 1. Augmented CycleGAN: An Augmented CycleGAN learns many-to-many mappings by cycling over the original domains augmented with auxiliary latent spaces, i.e., it maps (z1, x1) in domain A to (z2, x2) in domain B, where z1 and z2 are the latent inputs and x1 and x2 denote images. However, our framework only maps the (z1, x1) input to the x2 output, without a latent label z2 as output in domain B. We will cite the Augmented CycleGAN paper to eliminate the confusion.
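A hedged sketch of the mapping difference stated here: a generator that takes an image plus latent cluster logits as input but outputs only an image (no latent output in domain B), unlike Augmented CycleGAN's paired latent mapping. The layer choices and sizes are placeholders, not the paper's architecture.

```python
# Sketch (layers and sizes are placeholders) of a generator that maps
# (z1, x1) in domain A to an image x2 in domain B, with no latent output z2,
# as described in the rebuttal.
import torch
import torch.nn as nn

class GeneratorAtoB(nn.Module):
    def __init__(self, n_clusters=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3 + n_clusters, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3, kernel_size=3, padding=1), nn.Tanh(),
        )

    def forward(self, x1, z1):
        b, _, h, w = x1.shape
        z = z1.view(b, -1, 1, 1).expand(-1, -1, h, w)   # broadcast latent logits
        return self.net(torch.cat([x1, z], dim=1))      # output: image x2 only

g = GeneratorAtoB(n_clusters=8)
x2 = g(torch.randn(1, 3, 64, 64), torch.randn(1, 8))    # shape (1, 3, 64, 64)
```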

2. Clustering loss: The SSL model is trained prior to the CycleGAN; the L_p in Eq. 1 uses the pretrained SSL model to improve performance. Hence the former statement is consistent with the latter. d is a hyper-parameter describing the dimensionality of the cluster output vectors.

3. Contrastive learning: \theta, x^A', and x^A'' are mutually conditioned; they are updated on the fly as training proceeds. \theta is initialized to 1/2, i.e., the average is initially assigned to both x^A' and x^A'' (Para. 1 in Sec. 2.2).
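A minimal sketch of one possible reading of this update, assuming (following Meta-review #2's description of interpolated paired images) that the two views are convex combinations of the original image and the GAN output with mixing weight theta; the exact formulation in the paper may differ.

```python
# Illustrative only: paired views as convex combinations of the original image
# x_A and the GAN output G(x_A), with mixing weight theta. With theta = 0.5
# (the stated initial value) both views equal the average of the two images.
import torch

def make_contrastive_pair(x_a, g_x_a, theta):
    x_a_prime = theta * x_a + (1.0 - theta) * g_x_a
    x_a_double_prime = (1.0 - theta) * x_a + theta * g_x_a
    return x_a_prime, x_a_double_prime

theta = 0.5  # initial value per the rebuttal; updated on the fly during training
x_a, g_x_a = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
x_p, x_pp = make_contrastive_pair(x_a, g_x_a, theta)
```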

4. Classification setup (also to Rev#3, Meta-Rev Q2): After applying color normalization to all images, 70% of the images in each dataset are used to train a robust classifier and 30% are held out for testing. The reported results are obtained on the test sets.
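A short sketch of this held-out protocol under stated assumptions: a random 70/30 split and a ResNet-18 backbone (as noted in Review #3), with random tensors standing in for the color-normalized patches.

```python
# Sketch of the 70/30 held-out classification protocol described above.
# Random tensors stand in for color-normalized patches; real data loading,
# transforms, and the training loop are omitted.
import torch
from torch.utils.data import TensorDataset, random_split
from torchvision import models

images = torch.rand(100, 3, 224, 224)        # placeholder normalized patches
labels = torch.randint(0, 2, (100,))         # placeholder tumor / non-tumor labels
dataset = TensorDataset(images, labels)

n_train = int(0.7 * len(dataset))            # 70% train, 30% held-out test
train_set, test_set = random_split(dataset, [n_train, len(dataset) - n_train])

model = models.resnet18(num_classes=2)       # classifier backbone used for evaluation
```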

To Rev#3: Annotations for classification (also to Meta-Rev Q2): The NCT-CRC dataset is originally released with patch-level annotations. Camelyon16 provides pixel-level annotations, which are aggregated to patch-level annotations. The patch-level annotations for the three TCGA subsets used are publicly accessible. Hence, we can compare the aggregated classification performance for tumor identification. We demonstrate the universality of our method across different mainstream annotation strategies.

We appreciate all your suggestions. Any statement omitted in the previously submitted version will be included in the final version.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a novel method combining contrastive learning and CycleGAN for histopathology stain normalization. The method has been tested on 5 public datasets and compared with 7 other related methods. An ablation study on the components of the model was also conducted. The rebuttal sufficiently clarifies some details of the model.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper combines contrastive learning and CycleGAN for histological image colour normalisation, in which contrastive learning solves the problem of mode collapse by using interpolated paired images as a replacement for the original and GAN-generated image pair. As pointed out by the reviewers and AC, the general idea of the paper is innovative and the evaluations are comprehensive. However, some methodological details are missing in the original manuscript, including details of the clustering method and evaluation details of the normalization. These issues have been mostly addressed in the rebuttal, and I therefore support the paper's acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal effectively addresses the main concerns of the reviewers. The paper is meritorious and effectively uses contrastive learning. In the final version, the authors should further clarify the steps of the clustering approach.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8


