
Authors

Cong Cong, Sidong Liu, Antonio Di Ieva, Maurice Pagnucco, Shlomo Berkovsky, Yang Song

Abstract

Hematoxylin and Eosin (H&E) stained histopathology images provide important clues for diagnostic and prognostic assessment of diseases. However, similar tissues can be stained with varying colours, which significantly hinders the diagnostic process and the training of deep learning models. Various Generative Adversarial Network (GAN) based stain normalisation methods have thus been proposed as a preprocessing step for downstream classification or detection tasks. However, most of these methods are based on either unsupervised learning, which suffers from large discrepancy between domains, or supervised learning, which requires a target domain and only utilises the target domain images. In this work, we propose to leverage Semi-supervised Learning with GAN to incorporate the source domain images in the learning of stain normalisation without requiring their corresponding ground truth data. Our approach achieves highly effective performance on two classification tasks for brain and breast cancers.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_56

SharedIt: https://rdcu.be/cymbi

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a generative adversarial network for stain normalisation via colorisation of grayscale images or hematoxylin channel. The authors use a novel two-decoder architecture that introduces a consistency-based regularisation. The method is tested on two datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The semi-supervised approach to color normalization is interesting. The proposed conditional GAN includes an adversarial loss, a feature-based content loss, an earth mover’s distance loss on color distributions, and the proposed consistency loss. All of them seem to be relevant when normalizing histological images.
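As a rough illustration of the earth mover's distance term mentioned above, the sketch below computes the EMD between two one-dimensional colour histograms as the summed absolute difference of their cumulative distributions. The function names and the choice of unit-spaced bins are assumptions for illustration; the review does not state how the paper formulates its EMD loss.

```python
from itertools import accumulate


def normalise(hist):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def emd_1d(hist_a, hist_b):
    """Earth mover's distance between two histograms over the same
    unit-spaced bins (hypothetical helper, not the paper's loss)."""
    if len(hist_a) != len(hist_b):
        raise ValueError("histograms must have the same number of bins")
    cdf_a = accumulate(normalise(hist_a))
    cdf_b = accumulate(normalise(hist_b))
    # In 1-D, the EMD equals the L1 distance between the two CDFs.
    return sum(abs(a - b) for a, b in zip(cdf_a, cdf_b))
```

Moving all mass from the first bin to the third costs two bin widths, so `emd_1d([1, 0, 0], [0, 0, 1])` evaluates to 2, while identical histograms give 0.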

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    In the training pipeline, it is not completely clear when the output from each decoder and the output of the network are used.

    The paper investigates the impact of different inputs, i.e., the hematoxylin component and grayscale image on histopathology normalization and classification, but does not take into account the impact on structure preservation, which is a major concern in medical images.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The datasets are publicly available and the parameters are correctly described. The results can, in theory, be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The description of the training pipeline in section 2.3 should clarify when the output used is from decoder 1, decoder 2, or the combination of both (average) in eq.(4) and eq.(5). Also in eq.(5), the consistency loss seems to take as reference the pseudo mask of the source image. Shouldn’t it be an image from the target domain?

    The pseudo-mask is selected from the target in terms of mean and standard deviation of pixel colours of the overall target domain. This seems to be equivalent to the reference image required by most methods in the state-of-the-art and should not only represent the target domain but also have desirable H&E properties. It would be fair to use the same pseudo-mask and reference image for the competing methods. If this is the case, the reference image in Fig 2.g is not a good H&E sample as both stains are not easily identified.
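The selection rule described in this comment can be sketched as follows: compute the per-channel mean and standard deviation over all target-domain pixels, then pick the target image whose own statistics are closest. Images are represented as flat lists of (r, g, b) tuples to keep the sketch dependency-free, and the squared Euclidean distance over the six statistics is an assumption, since the distance measure is not specified here.

```python
from statistics import fmean, pstdev


def channel_stats(pixels):
    """Per-channel mean and standard deviation of a list of (r, g, b)
    pixels, returned as a flat 6-tuple (means first, then stds)."""
    channels = list(zip(*pixels))
    means = [fmean(c) for c in channels]
    stds = [pstdev(c) for c in channels]
    return tuple(means + stds)


def select_pseudo_mask(target_images):
    """Index of the target image whose colour statistics are closest
    to those of the whole target domain (hypothetical helper)."""
    all_pixels = [p for image in target_images for p in image]
    domain_stats = channel_stats(all_pixels)

    def distance(image):
        # Assumed metric: squared Euclidean distance over the 6 statistics.
        return sum((a - b) ** 2
                   for a, b in zip(channel_stats(image), domain_stats))

    return min(range(len(target_images)),
               key=lambda i: distance(target_images[i]))
```

Using the same selected image as the reference for competing methods, as suggested above, would only require passing `target_images[select_pseudo_mask(target_images)]` to each of them.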

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper introduces a semi-supervised method for stain normalization. The proposed method is compared with traditional methods and some SOTA methods to demonstrate the efficacy. Several loss terms were also incorporated to supervise the training. Extensive experiments, including ablation experiments, were conducted to verify the proposed method.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Downstream tasks such as IDH and BreakHis classification were carried out to validate the method, which is essential for this kind of image pre-processing procedure. The paper emphasized the importance of the downstream task, which was usually overlooked in previous studies.
    2. Several existing methods were properly integrated in the work, and the results show the efficacy of components such as the EMD loss.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The definition of semi-supervised in the paper is not proper.
    2. The paper is not well written and is difficult to follow, especially in the Methods section.
    3. Ablation studies of the important components in the framework are missing.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method is adequately clarified with proper implementation details. I think the reproducibility of the paper is not a big issue.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Stain normalization is an important pre-processing step for deep/machine learning methods as pathologists can easily adapt to different stain styles, while stain style deviation poses a challenge for the generalization of deep learning models. This paper presents a semi-supervised colorisation model for stain normalization.

    I have several concerns about this paper. Please see my detailed comments below:

    1. The methodology of this work seems problematic. Semi-supervised learning utilizes the unlabeled data to enhance the supervised learning of labeled data. While in this paper, the source images (to be transformed to the target’s stain style) were treated as unlabeled data. The unlabeled data is not an optional but an essential ingredient for the stain style translation. In fact, there is no need to label the source images. You can just use it.

    2. The method is based on the assumption that the hematoxylin components of images with different stain styles are the same. However, no reference is given for this assumption in the paper, and as far as I know, it may not be valid.

    3. The paper is not well written and is difficult to follow, especially in the Methods section. For example, Fig. 1 is very confusing. The authors try to combine the conditional GAN and the semi-supervised learning framework in a single figure. Even after some time, I still cannot figure out the overall framework of the proposed method. I strongly recommend that the authors redraw Fig. 1 in two parts, corresponding to Section 2.1 and Section 2.2, respectively. In addition, Eq. 5 is confusing and may be erroneous. As the inputs are unlabelled data, why is y_s included in Eq. 5? Also, why are L_gan and L_content still employed here? In Section 3.1, \lambda_L1 and \lambda_content cannot be found in the text.

    4. The choice of the pseudo mask seems crucial to the method. The authors choose only one target domain image, whose pixel mean and standard deviation are closest to those of the overall target images. Is the proposed method robust to this choice? How will the performance be influenced if other pseudo masks are used? This is missing and not discussed in the paper.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    see above.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    In this paper, a semi-supervised network is proposed to solve stain normalization as an image colourization task. The proposed model incorporates the source domain images in the image colourization learning and repaints both source and target domain images with the same colour distribution. The proposed approach was evaluated using two public datasets. Results demonstrated that the proposed method leads to a significant improvement in the classification of histopathology images.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is mostly well written and clear.
    2. Methods are well formalised and straightforward.
    3. Experiments on 2 public datasets and analysis are presented.
    4. The use of generative models with two parallel decoders is interesting.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1. The authors should review the well-known stain normalization methods in the introduction.
    2. A study is needed to evaluate the performance of the proposed method in improving nuclei segmentation algorithms.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provided sufficient details about the model, dataset and evaluation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In my opinion, the following points need to be improved.

    1. The authors need to review more stain normalization methods in the introduction and explain their limitations. I have listed some methods:

    a) Vahadane et al.: Structure-preserving color normalization and sparse stain separation for histological images. IEEE Transactions on Medical Imaging 35(8), 1962–1971 (2016).

    b) BenTaieb, A., Hamarneh, G.: Adversarial Stain Transfer for Histopathology Image Analysis. IEEE Transactions on Medical Imaging 37(3), 792–802 (2018).

    c) Shafiei, S., Safarpoor, A., Jamalizadeh, A., Tizhoosh, H.R.: Class-Agnostic Weighted Normalization of Staining in Histopathology Images Using a Spatially Constrained Mixture Model. IEEE Transactions on Medical Imaging 39(11), 3355–3366 (2020).

    d) Janowczyk, A., Basavanhally, A., Madabhushi, A.: Stain Normalization using Sparse AutoEncoders (StaNoSA): Application to digital pathology. Computerized Medical Imaging and Graphics 57, 50–61 (2017).

    2. More datasets and experiments would strengthen the paper, maybe for future work.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • important problem is addressed
    • method is well demonstrated on two large datasets
    • method is concisely described
    • datasets and results nicely described
  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    2

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors have proposed a semi-supervised adversarial approach for stain normalization in histopathology images. I agree with R2 that some aspects of the method are not clear, Fig. 1 is confusing, and ablation studies are lacking. My other concern is with the choice of datasets. Most stain normalization methods choose the CAMELYON16 and CAMELYON17 datasets for evaluation. While the authors' decision to opt for the TCGA IDH and BreakHis datasets is commendable, those should have been in addition to the CAMELYON datasets. R3 also raises a valid point about not citing some important methods, especially Vahadane et al. (TMI 2016) and BenTaieb et al. (TMI 2018). While there are merits to the paper, the authors need to clarify specific aspects of the method as pointed out by R2.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

We thank the Reviewers and Area Chair for your valuable time and comments.

AC: Choice of datasets In this study, we chose TCGA IDH for its diagnostic value and BreakHis for its multi-magnification setup. Moreover, both datasets provide patches with predefined training/test splits, which allows us to focus on stain normalisation and conduct direct comparisons with state-of-the-art methods reported on these two datasets. We will include CAMELYON16/17 in our future work, which however requires re-implementation of state-of-the-art models on CAMELYON16/17 for fair comparison.

R1-1: Training pipeline & pseudo-mask Sorry for the confusion. G(x) in Eqs (1, 4, 5) is the output from two decoders. Losses are calculated separately for the two decoders and summed for model updating. Thank you for pointing out the error in Eq (5) where input of the consistency loss should be y’ (pseudo-mask) instead of y^S. We have modified our paper accordingly. The pseudo-mask is indeed equivalent to the reference image for the competing methods. In original Fig.2, the reference image was randomly sampled from the target domain. We will replace it with the pseudo-mask in the final version.

R1-2: Structure preservation The content loss aims to preserve high-level features (last paragraph, Section 2.1). We have also tested more explicit structural constraints for nuclei but did not see an improvement in classification accuracy.

R2-1: Definition of semi-supervised We treat stain normalisation as an image colourisation task that colourises the inputs with the target domain stain colours. For target domain images, we used a supervised learning strategy with paired training data (original H&E stained colour image as ground truth and corresponding grayscale or hematoxylin component as input). Such paired/labelled training data are only available in the target domain but not the source domain. To enhance the supervised learning of labelled data, our work focuses on incorporating the unlabelled source domain images in training the GAN model. We design a two-decoder structure with consistency regularisation and pseudo mask as ground truth for the source domain, so that images from both target and source domains can be used during training. Therefore, the overall learning framework becomes semi-supervised.
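The data setup described in this reply can be sketched as below: target-domain images yield paired samples (grayscale input, original colour image as ground truth), while source-domain images are paired only with the pseudo mask. The function names, the dictionary layout, and the BT.601 luminance weights are assumptions for illustration; the rebuttal does not specify how grayscale inputs are computed.

```python
def to_grayscale(pixels):
    """Luminance of each (r, g, b) pixel using the common BT.601
    weights (an assumed conversion, not confirmed by the authors)."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in pixels]


def build_training_samples(target_images, source_images, pseudo_mask):
    """Assemble paired (target) and unpaired (source) training samples.

    Images are flat lists of (r, g, b) tuples. Target samples use their
    own colour image as reference; source samples fall back to the
    pseudo mask, since no ground truth exists for them.
    """
    samples = []
    for image in target_images:
        samples.append({"input": to_grayscale(image),
                        "reference": image,
                        "paired": True})
    for image in source_images:
        samples.append({"input": to_grayscale(image),
                        "reference": pseudo_mask,
                        "paired": False})
    return samples
```

The `paired` flag is what a training loop would use to decide between a supervised reconstruction loss and the consistency regularisation.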

R2-2: Hematoxylin component The hematoxylin component provides an alternative to grayscale images of the H&E stained colour images. Same as grayscale images, hematoxylin components are computed for individual images [21] and are not assumed to be constant across images.

AC, R2-3: Method description and Fig. 1 We will modify Fig.1 as suggested. We have fixed the typo in Eq (5): y_s should be y’ (pseudo-mask). The consistency loss in Eq (5) helps incorporate source domain images for semi-supervised learning. For both domains, L_GAN is used to enforce the generator to produce realistic images and L_content preserves the high-level visual features. L_MAE, which reduces the difference between the generated RGB and the original RGB, is replaced by consistency loss for source domain images. The two lambda parameters will be added to Eqs (3, 4, 5). We will also revise Section 2 to describe our methods more clearly.
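One way to read the loss description in the replies above: each decoder contributes L_GAN plus a weighted L_content, plus a weighted L_MAE for target images or the consistency loss for source images, and the per-decoder totals are summed. The sketch below assumes the individual loss values are already computed; the function name, the dictionary keys, the default weights, and applying the L1 weight to the consistency term are all assumptions. The EMD term mentioned by R1 is left out because the rebuttal does not say where it enters.

```python
def generator_objective(decoder_terms, domain,
                        lambda_content=1.0, lambda_l1=1.0):
    """Sum the per-decoder losses for one batch (hypothetical helper).

    decoder_terms: one dict per decoder with keys 'gan', 'content',
    and 'mae' (target domain) or 'consistency' (source domain).
    """
    if domain not in ("target", "source"):
        raise ValueError("domain must be 'target' or 'source'")
    # L_MAE is replaced by the consistency loss for source-domain images.
    pixel_key = "mae" if domain == "target" else "consistency"
    total = 0.0
    for terms in decoder_terms:
        total += (terms["gan"]
                  + lambda_content * terms["content"]
                  + lambda_l1 * terms[pixel_key])
    return total
```

For example, two decoders with target-domain terms (0.5, 0.2, 0.1) and (0.4, 0.3, 0.2) for (gan, content, mae), weighted by `lambda_content=10` and `lambda_l1=100`, give 12.5 + 23.4 = 35.9.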

AC, R2-4: Ablation study for pseudo-mask We have conducted further experiments as suggested. We define two different masks: 1) a target domain image that is structurally most similar (measured by SSIM) to the source domain input, and 2) a randomly chosen image from the target domain. The results show that our model has < 1% accuracy drop when using different masks, whereas other competing methods show 2-5% drop in accuracy. This result validates our design of the pseudo mask and shows robustness compared to other methods.
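The first alternative mask in this ablation, the target image most structurally similar to the source input, could be selected along the lines of the sketch below. It uses a single-window (global) SSIM over flattened grayscale images, which is a simplification of the usual locally windowed SSIM; the function names and the constants for an 8-bit dynamic range are assumptions, and the rebuttal does not describe the exact SSIM configuration used.

```python
from statistics import fmean


def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized grayscale images
    given as flat lists (simplified, no sliding window)."""
    if len(x) != len(y):
        raise ValueError("images must have the same number of pixels")
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    mean_x, mean_y = fmean(x), fmean(y)
    var_x = fmean([(a - mean_x) ** 2 for a in x])
    var_y = fmean([(b - mean_y) ** 2 for b in y])
    cov = fmean([(a - mean_x) * (b - mean_y) for a, b in zip(x, y)])
    return (((2 * mean_x * mean_y + c1) * (2 * cov + c2))
            / ((mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)))


def most_similar_target(source, targets):
    """Index of the target image with the highest SSIM to the source."""
    return max(range(len(targets)),
               key=lambda i: global_ssim(source, targets[i]))
```

The second alternative, a randomly chosen target image, would simply replace `most_similar_target` with a random index.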

AC, R3-1: Review & evaluation We will add the references. We are also working on evaluating the methods on other datasets and tasks including nuclei segmentation, for preparing a journal paper. Many thanks for your suggestion.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    In the rebuttal, the authors address the main concerns around the justification of the semi-supervised approach and the explanation of Fig. 1. The authors also report numbers for ablation studies. In my opinion, the responses are adequate.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper proposes a semi-supervised network to solve stain normalization as an image colourization task in which grayscale or hematoxylin-channel images are used as input and the network outputs colourised images according to the target domain style. Like R2, I did not find the description of “semi-supervised” clear enough in the original manuscript. Only after the authors’ explanation in the rebuttal did I realise that it is paired training in the target domain but not in the source domain. Nevertheless, I find the use of “labelled” and “unlabelled” data confusing; it would be better to say “paired training” or “non-paired training”. Moreover, such a “semi-supervised” method is part of the proposed image colourization, but not a special contribution to general stain normalization, as other GAN-based normalization techniques do not need “supervised” data either. A more important problem I found is the extraction of hematoxylin images using “colour deconvolution” by Ruifrok [21] (stated in the rebuttal). As far as I know, Ruifrok [21] is a deconvolution method based on a reference colour matrix, which is obtained experimentally using slides stained with pure hematoxylin or eosin only. Since we are talking about a colour shift between the source and target domains, one should use a source/target-specific colour matrix to do the colour deconvolution. So the “grayscale” method would be the one that is useful in practice, but it does not outperform other methods on the IDH dataset. In summary, I do not feel the presented method is well motivated and justified, and I therefore suggest rejecting the paper.
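To make the colour-matrix concern concrete, the following is a minimal sketch of reference-matrix colour deconvolution: RGB values are converted to optical density and then expressed as a combination of fixed stain vectors. The hematoxylin and eosin vectors are the values commonly quoted for this kind of deconvolution and are assumed here, not taken from the paper; the third vector is taken as the cross product of the first two, and all function names are hypothetical.

```python
import math

# Commonly quoted reference stain vectors in RGB optical density
# (assumed values for illustration).
HEMATOXYLIN = (0.65, 0.70, 0.29)
EOSIN = (0.07, 0.99, 0.11)


def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return tuple(x / norm for x in v)


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def det3(m):
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def stain_matrix(hematoxylin=HEMATOXYLIN, eosin=EOSIN):
    """Rows are unit stain vectors: hematoxylin, eosin, residual."""
    h, e = unit(hematoxylin), unit(eosin)
    return (h, e, unit(cross(h, e)))


def optical_density(rgb, background=255.0):
    # Clamp at 1 to avoid log(0) on fully dark pixels.
    return tuple(-math.log10(max(c, 1.0) / background) for c in rgb)


def deconvolve(rgb, stains):
    """Stain concentrations (H, E, residual) of one RGB pixel, solving
    sum_j c_j * stains[j] = OD(rgb) with Cramer's rule."""
    od = optical_density(rgb)
    system = [[stains[col][k] for col in range(3)] for k in range(3)]
    d = det3(system)
    concentrations = []
    for j in range(3):
        replaced = [[od[k] if col == j else stains[col][k]
                     for col in range(3)] for k in range(3)]
        concentrations.append(det3(replaced) / d)
    return tuple(concentrations)
```

Because the result depends entirely on `stains`, deconvolving source and target images with one shared reference matrix, rather than a matrix estimated per domain, is exactly where a colour shift between domains could leak into the extracted hematoxylin channel.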

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    16



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a semi-supervised adversarial learning method for stain normalization in histopathological images, which can benefit downstream image classification tasks. The rebuttal addresses most of the reviewers’ concerns. In addition, the authors say that they will improve the clarity of the paper, including revising the method description, modifying Figure 1, and adding more related references.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    11


