
Authors

Apostolia Tsirikoglou, Karin Stacke, Gabriel Eilertsen, Jonas Unger

Abstract

The scarcity of labeled data is a major bottleneck for developing accurate and robust deep learning-based models for histopathology applications. The problem is notably prominent for the task of metastasis detection in lymph nodes, due to the tissue’s low tumor-to-non-tumor ratio, resulting in labor- and time-intensive annotation processes for the pathologists. This work explores alternatives on how to augment the training data for colon carcinoma metastasis detection when there is limited or no representation of the target domain. Through an exhaustive study of cross-validated experiments with limited training data availability, we evaluate both an inter-organ approach utilizing already available data for other tissues, and an intra-organ approach, utilizing the primary tumor. Both these approaches result in little to no extra annotation effort. Our results show that these data augmentation strategies can be an efficient way of increasing accuracy on metastasis detection, but foremost increase robustness.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_60

SharedIt: https://rdcu.be/cyl6A

Link to the code repository

N/A

Link to the dataset(s)

https://datahub.aida.scilifelab.se/10.23698/aida/lnco

https://camelyon17.grand-challenge.org/Data/

https://datahub.aida.scilifelab.se/10.23698/aida/drsk


Reviews

Review #1

  • Please describe the contribution of the paper

    This work explores alternatives on how to augment the training data for colon carcinoma metastasis detection when there is limited or no representation of the target domain. Both an inter-organ approach utilising already available data for other tissues, and an intra-organ approach, utilising the primary tumour were explored.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well written, the methods chosen for data augmentation are common in the field;
    2. A very important problem of dealing with limited datasets through data augmentation is studied;
    3. Interesting and systematic experiments were conducted.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. A relatively small dataset for generalisation of the results was used (37 patients, how many images?);
    2. Some more details on the datasets (number of images, positive/negative, annotations, demographics, if possible) will help the reader to understand what was used for the experiments;
    3. Some more information on training Cycle-GAN and adding more images of the results of augmentation would be interesting for the reader, maybe, if possible, could be added in the Appendix.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good that public data was used. However, sharing the source code and trained models will be highly appreciated.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. A relatively small dataset for generalisation of the results was used (37 patients, how many images?);
    2. Some more details on the datasets (number of images, positive/negative, annotations, demographics, if possible) will help the reader to understand what was used for the experiments;
    3. Some more information on training Cycle-GAN and adding more images of the results of augmentation would be interesting for the reader, maybe, if possible, could be added in the Appendix.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A very important problem of dealing with limited datasets through data augmentation by domain adaptation was studied systematically on public data

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The paper proposes a method that uses inter-organ and intra-organ datasets to compensate for the limited availability of training data. Detailed experiments are conducted to consider different cost scenarios, and interpretations of the results are provided.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper addresses one of the major problems faced in the field using data manipulation techniques.
    2. The levels of difficulty with similar and varying external datasets (breast…..skin) from the same or different organs are discussed
    3. The methods are analyzed against three cost categories to evaluate the best configuration to be used
    4. Applicable in real-world settings and might be helpful in solving problems in several use cases
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It would be helpful if the authors discussed in a little more detail how patches were sampled from the images. Was the sampling done at random, or was there a control to ensure that bad patches (which are very common in WSI) don’t get picked up?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducible with the information given. Scalability is not discussed at length.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    No major weaknesses. Experimental details could have been more specific to make the paper easily reproducible

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The quality of work; experiments conducted and the interpretations are very helpful to the community.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper proposes a method that uses inter-organ and intra-organ datasets to compensate for the limited availability of training data. The strengths of the paper include: 1) a very important problem of dealing with limited datasets through data augmentation is studied; 2) systematic experiments were conducted; 3) the paper is well written and easy to follow; 4) real-world results are explained.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

We would like to thank the reviewers for their comments and the provisional acceptance recommendation. We will address the main points they indicated to improve the main and supplementary manuscripts.

  1. Regarding datasets and Cycle-GAN training details, as well as the sampling strategy:
    • We refer the reviewers and the readers to the supplementary document for details on the datasets and the experimental setup. Table 1 in the supplementary material shows the number of patch images for the training and testing sets of the colon, breast, and skin datasets. Specifically, the 37 patients of the colon adenocarcinoma dataset correspond to 119,474 tumor and 149,580 non-tumor patch images. These 37 patients were randomly split into 32 patients for training and 5 for testing. Table 2 in the supplementary document provides further details on the train and test sets of the colon adenocarcinoma dataset, as well as on its limited training data sub-sets: the number of patch images in total (tumor/non-tumor) and per location source (hospital), the latter broken down into lymph node tumor, primary tumor, and non-tumor patches, each pre- and post-class balancing. We will complement the dataset descriptions with the numbers of whole slide images (although we emphasize that training and evaluation were conducted on patch level) and with additional information for the breast and skin datasets. Unfortunately, apart from the hospital source, no other demographic details for the dataset cases are available.
    • We used one of the publicly available TensorFlow-1 implementations (https://github.com/vanhuyz/CycleGAN-TensorFlow) of the vanilla image-to-image translation network, trained for 250,000 iterations, for all experiments. We explored Cycle-GAN transformations in two ways: 1) per class (tumor/non-tumor) adaptations, and 2) one joint network for both of the classes. We will add the source code reference in the main paper, and complement the supplementary material with more Cycle-GAN domain transformation visual examples.
    • For the colon dataset, we sampled using a random uniform grid with 128 microns between the sample points. This corresponds to 256 pixels when sampling at a resolution of 0.5 microns per pixel (i.e., approximately 200 times magnification). We set the patch size to 256×256, meaning that the patches were sampled side-by-side without overlapping. In total, 269,054 patches from non-tumor, primary tumor, and lymph node tumor tissue were extracted. Each patch was assigned the label based on the annotation of the center pixel in the patch. The same sampling strategy was followed for the breast and skin datasets, resulting in 200,770 and 277,193 extracted patches respectively. We will add the sampling details to the supplementary document.
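
The sampling strategy described in the last point can be illustrated with a minimal Python sketch. It is not the authors' code: it assumes a regular, zero-offset grid (the feedback does not say how randomness enters the "random uniform grid"), and the names `grid_patch_origins`, `label_from_center`, and `toy_mask` are hypothetical, introduced only for illustration. It shows how the 128-micron spacing becomes a 256-pixel stride at 0.5 microns per pixel, and how a patch takes the label of its center pixel.

```python
MICRONS_PER_PIXEL = 0.5   # sampling resolution stated in the feedback
GRID_SPACING_UM = 128     # distance between sample points, in microns
PATCH_PX = 256            # patch side length in pixels

# 128 microns / 0.5 microns per pixel = 256 pixels between sample points,
# so 256 x 256 patches tile the image side by side without overlap.
STRIDE_PX = int(GRID_SPACING_UM / MICRONS_PER_PIXEL)


def grid_patch_origins(width_px, height_px, patch_px=PATCH_PX, stride_px=STRIDE_PX):
    """Return top-left (x, y) corners of all patches that fit fully inside the image.

    Assumption: a regular grid starting at (0, 0).
    """
    xs = range(0, width_px - patch_px + 1, stride_px)
    ys = range(0, height_px - patch_px + 1, stride_px)
    return [(x, y) for y in ys for x in xs]


def label_from_center(mask, x, y, patch_px=PATCH_PX):
    """Label a patch with the annotation value at its center pixel.

    `mask` is a hypothetical 2-D annotation array indexed as mask[row][col].
    """
    return mask[y + patch_px // 2][x + patch_px // 2]


# Toy 1024 x 768 annotation mask: 1 = tumor in the top-left 256 x 256 block, 0 elsewhere.
toy_mask = [[1 if (r < 256 and c < 256) else 0 for c in range(1024)] for r in range(768)]

origins = grid_patch_origins(1024, 768)
labels = [label_from_center(toy_mask, x, y) for x, y in origins]
```

On a real whole slide image, the mask would come from the pathologists' annotations, and a tissue filter could be applied before labeling; neither is specified in the feedback.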

  2. Regarding reproducibility and scalability:
    • The deep classifier, implemented in TensorFlow-2, and all the trained models are available upon request. The trained models comprise 52 Cycle-GANs and 985 classifiers, i.e., (4 groups × 49 sub-set experiments + 1 full-set experiment) × 5 runs; see Table 3 in the supplementary document.
    • Since the presented results are based on relative trends, we expect that the proposed intra- and inter-organ augmentation approaches will scale to other cancer types with similar observations. We believe that utilizing primary tumor and already available inter-organ data, combined with further research on optimal augmentation size, domain adaptation strategies, and/or other image synthesis methods, will enable deep learning-based diagnostic tools for several cancer types for which training data are scarce.
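
The counts quoted in this feedback can be cross-checked with a few lines of Python. This is only an arithmetic sketch using the numbers stated above; the variable names are illustrative and not taken from the authors' code.

```python
# Colon adenocarcinoma dataset: patch counts stated in point 1
tumor_patches = 119_474
non_tumor_patches = 149_580
total_colon_patches = tumor_patches + non_tumor_patches

# Patient-level split stated in point 1
train_patients, test_patients = 32, 5
total_patients = train_patients + test_patients

# Classifier count stated in point 2:
# (4 groups x 49 sub-set experiments + 1 full-set experiment) x 5 runs
groups, subset_experiments, full_set_experiments, runs = 4, 49, 1, 5
classifiers = (groups * subset_experiments + full_set_experiments) * runs

print(total_colon_patches, total_patients, classifiers)
```

The three values agree with the 269,054 patches, 37 patients, and 985 classifiers reported in the feedback.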


