
Authors

Hongyi Duanmu, Shristi Bhattarai, Hongxiao Li, Chia Cheng Cheng, Fusheng Wang, George Teodoro, Emiel A. M. Janssen, Keerthi Gogineni, Preeti Subhedar, Ritu Aneja, Jun Kong

Abstract

In triple negative breast cancer (TNBC) treatment, early prediction of pathological complete response (PCR) from chemotherapy before surgical operations is crucial for optimal treatment planning. We propose a novel deep learning-based system to predict PCR to neoadjuvant chemotherapy for TNBC patients with multi-stained histopathology images of serial tissue sections. By first performing tumor cell detection and recognition in a cell detection module, we produce a set of feature maps that capture cell type, shape, and location information. Next, a newly designed spatial attention module integrates such feature maps with original pathology images in multiple stains for enhanced PCR prediction in a dedicated prediction module. We compare it with baseline models that either use a single-stained slide or have no spatial attention module in place. Our proposed system achieves accuracies of 78.3% and 87.5% for patch- and patient-level PCR prediction, respectively, outperforming all other baseline models. Additionally, the heatmaps generated from the spatial attention module can help pathologists in targeting tissue regions important for disease assessment. Our system presents high efficiency and effectiveness and improves interpretability, making it highly promising for immediate clinical and translational impact.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_53

SharedIt: https://rdcu.be/cymbf

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This report develops a multi-module system using multi-stain histopathology images (H&E and two stains) to predict pathologic complete response for triple negative breast cancer patients. The system has three modules: a cell detection module, a spatial attention module, and a pCR module.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • the combination of the three modules has some novelty, and the use of multi-stain images is a challenging problem that requires more work.
    • the use of both H&E and IHC is novel.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The novelty claimed by the authors is greatly exaggerated, for the following reasons:

    • many reports have been published even in breast cancer going back to 2011 using digital pathology for clinical outcome prediction.
    • the description of the clinical use case shows lack of expertise in diagnosis and treatment of breast cancer.
    • especially in the discussion section, the authors truly go overboard and claim this is the first study of predicting pCR, whereas many reports have used digital pathology in breast cancer and beyond to predict treatment outcome and survival. These methods are in essence the same as the ones reported here for the specific application in triple negative breast cancer and the authors’ definition of pCR.
    • spatial attention has already been used widely in digital pathology across several cancer applications.
    • details are missing on the actual multi-module model, in particular the amount of training data that is available for the CDM module that is hand curated by pathologists/human experts.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    hard to judge

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Negative PCR should be better defined and reported using the RECIST criteria, e.g., for all non-PCR patients, whether they had partial response, stable disease, or progressive disease. The heterogeneity of the response according to RECIST criteria is an important factor determining the complexity of this problem.

    It is not clear how predicting PCR after chemotherapy treatment will avoid unnecessary chemotherapy use in future patients. Since the model is trained to predict treatment response after this treatment, how can the treatment then be avoided in the future?

    The clinical use case and the unmet clinical need are not properly introduced and suggest a lack of expertise in the literature on breast cancer diagnosis and treatment. For example, the claim that only one report has been published in breast cancer using digital pathology for treatment outcome prediction is greatly exaggerated. Beck et al., Science Translational Medicine 2011, already reported on the use of digital pathology to predict breast cancer survival. Also, of the three references mentioned that studied PCR before, two are in the area of rectal cancer and not in breast cancer.

    The authors are advised not to focus overly on the clinical use case, but rather to present the work as a novel method that is applied within the area of breast cancer as a test case, instead of claiming novelty in the clinical use case. The authors likely had access to this breast cancer cohort as a matter of convenience rather than because it is their primary field of interest.

    Minor – In this sentence “especially the absence of cancer cells in pathology images of tissue samples dissected during surgery” change “during” to “after”, as pathological analysis of tissues is typically not done in a real-time fashion.

    The caption of Figure 1 should be expanded to better explain the components of the figure. Also MLTMA is not defined.

    Methods:

    • What is meant by serial imaging? The authors should provide a proper definition. Are multiple images available at different time points during treatment? “Serial” suggests different time points, whereas more likely the authors have multiplex imaging with several stains done at the same time. This needs to be better defined.

    • The CDM module is trained on expert-annotated cells of interest. However, how many cells were manually labeled, what are the cell types, and were both cell locations and cell types labeled? This is important information to be able to judge if a CDM model can be properly trained and if enough training examples are present.

    • How is the SAM module trained? What is the loss function, or what is optimized here? Spatial attention in digital pathology models was introduced several years ago, but no mention is made of any of this literature. E.g. Tomita N, Abdollahi B, Wei J, Ren B, Suriawinata A, Hassanpour S. Attention-Based Deep Neural Networks for Detection of Cancerous and Precancerous Esophagus Tissue on Histopathological Slides. JAMA Netw Open. 2019;2(11):e1914645. doi:10.1001/jamanetworkopen.2019.14645

    It is not completely clear from Figure 2 how the three modules (CDM, SAM, and PCR) are connected to each other. What are the loss functions for each module, and how are they combined? How are the CDM module and SAM module connected to the PCR module?

    Results:

    • What is meant by “… adjacent tissue sections…”?

    • Table 1: it is not clear if these are results on the training set or test set?

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    see above.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    In “Spatial Attention-based Deep Learning System for Breast Cancer Pathological Complete Response Prediction with Serial Histopathology Images in Multiple Stains,” the authors propose a deep learning system incorporating multiple stain types and spatial cellular information to predict pathologic complete response (PCR) to neoadjuvant chemotherapy in breast cancer. The authors co-register H&E, Ki67, and PHH3 stained images, then apply a cell detection model to form spatial attention maps that are provided together with the original images to a prediction model.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The coupling of a cell detection module and a prediction module is an intriguing proposal and seems to make a substantial difference in training a successful prediction model in a low data setting
    • The co-registration of H&E with IHC stains is also an interesting addition and seemed to make a strong improvement over H&E only predictions also
    • The application addressed (prediction of PCR to NAC) is currently under-explored in the context of digital pathology
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There are missing details regarding the dataset and clinical context that make it tough to assess the performance and impact of the model. Critically, it is not clear whether these predictions are made on pre-treatment or post-treatment samples.
    • The description of the training approach is insufficiently detailed. In particular, the data used for training the CDM remains unclear and if there was any overlap between the data used to train and validate the prediction model.
    • The dataset consists of only 75 patients total between training and testing
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Mostly good, but the dataset description and protocol for CDM training are missing key details.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • The paper is missing some key clinical details. For instance, are these biopsy or surgical samples? Most critically, no details are provided on when these samples were collected. There is a substantial difference in both difficulty and clinical impact between predicting pCR on treatment-naive samples vs. mid or post-treatment samples. The phrasing “A total of 75 NAC treated TNBC cases are collected” seems to imply post-treatment samples were used, which would decrease enthusiasm for the work.
    • The direct comparison the authors make to the results of [7] in the discussion is likely a bit unfair, as that study investigated PCR prediction via TIL density among all breast cancer subtypes, and there is substantial evidence in the radiology literature suggesting that predicting PCR within molecular subtypes is easier than general response prediction [Cain et al. 2019, Breast Cancer Research and Treatment; Braman et al. 2017, Breast Cancer Research]. The performance of [7]’s TIL-based approach would likely improve in a triple negative cohort, due to the known importance of TILs in TNBC specifically [Gao et al. 2020, BMC Cancer].
    • While I am glad the authors reserved a large portion of their patients for validation, the dataset still remains quite small
    • Details on how the cell detection model was trained are insufficient. Were the same training-testing splits used? Did images overlap between training the two models? If testing set samples were used to train the CDM, this could be inflating results
    • Additionally, the cell detection model is said to classify cell types, but no details are given on what those cell types are
    • “However, a precise PCR prediction remains a challenging and until now, unsolved problem” is too lofty a claim given the low sample size and results that are comparable to many previous works on PCR prediction in the radiology literature
    • Given the comparison with the results of [7] in the discussion, it would perhaps be more informative to compare quantitatively by investigating the performance of TIL density in the authors’ dataset, if they are one of the cell types identified by the CDM
    • Given the sizeable boost the authors see when adding the SAM component, I would be curious as to how well a SAM-only model would perform. I am curious to know to what extent the network is using information about cell types/positions from the CDM directly vs. using that information to process the image itself.
    • A worthy addition to the discussion section would be recent work from Bychkov et al. Nature Sci Rep 2021, who predict treatment outcomes indirectly from H&E images by training a model to predict HER2 expression and investigating its association with survival in recipients of HER2-targeted therapy
    • The paper would benefit from an explanation of what biology the Ki67 and PHH3 stains capture and why they would be advantageous in outcome prediction
    • An interesting analysis for future work would be to compare the raw CDM output with the spatial attention maps to quantify what cell types/patterns contribute most to PCR/Non-PCR prediction
    • It was surprising to see that the H&E CNN-only prediction model performance was essentially random. This makes me wonder how much information might be coming from the SAM alone.
  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I was conflicted on my recommendation for this paper. There are some interesting technical contributions here, namely the coupling of cell detection and prediction and the registration of H&E images with IHC stains. However, the paper also has some notable issues and is missing key details, most critically the clinical details of the dataset and the training of the CDM, which could hopefully be added in the camera-ready version. My most considerable reservation is the lack of detail on when the tissue samples were collected. If they were obtained prior to neoadjuvant therapy (as opposed to during or after), I would instead rate this paper as “borderline accept (6)”.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper
    • This work develops a system to predict pathological complete response (PCR) from chemotherapy before surgery using novel deep learning tools of spatial attention in the convolutional network framework. If a particular patient is predicted to have a poor PCR, then chemotherapy can be avoided, and other treatments can be administered.
  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • This paper tackles an important problem which has not yet been solved by taking advantage of the spatial structure of cells in histopathology images.
    • The paper proposes to incorporate adjacent tissue sections stained differently to obtain the maximum information about a tissue slice for PCR prediction
    • The proposed deep learning system is intuitive, and mimics a real pathologist’s workflow.
    • The paper is well written, with the different modules explained in good depth.
    • The ability of the method to yield post-hoc explainability using spatial attention maps is an added bonus.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The paper does not sufficiently compare PCR prediction with baselines. It is recommended to run the setup with multiple folds/runs to capture the variance of the methods and more accurately quantify the differences of the methods, for both tables 1 and 2.
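
The multi-fold evaluation recommended here can be sketched as follows. This is a minimal illustration, not the authors' protocol: the helper names `patient_folds` and `cross_validate` and the `evaluate` callback are hypothetical. Folds are formed over patient IDs rather than patches, so that no patient contributes patches to both the training and the test side of a fold.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple


def patient_folds(patient_ids: Sequence[str], k: int, seed: int = 0) -> List[List[str]]:
    """Shuffle the unique patient IDs and deal them into k disjoint folds."""
    unique = sorted(set(patient_ids))
    random.Random(seed).shuffle(unique)
    return [unique[i::k] for i in range(k)]


def cross_validate(
    patient_ids: Sequence[str],
    k: int,
    evaluate: Callable[[List[str], List[str]], float],
    seed: int = 0,
) -> Tuple[float, float]:
    """Run k train/test rounds and return (mean, sample std) of the scores.

    `evaluate(train_patients, test_patients)` is a hypothetical callback that
    trains a model on the first group and returns a score on the second.
    """
    if k < 2:
        raise ValueError("k must be at least 2 to estimate a variance")
    folds = patient_folds(patient_ids, k, seed)
    scores = []
    for i, test_patients in enumerate(folds):
        train_patients = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(evaluate(train_patients, test_patients))
    return statistics.mean(scores), statistics.stdev(scores)
```

Reporting the mean together with the standard deviation for each column of a results table makes it possible to judge whether differences between methods exceed fold-to-fold variation.
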
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The model architecture and training are described. However, the GPU(s) used and the strategy used to select the best model are not noted.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • In the introduction, it should be explicitly stated that Ki-67 and PHH3 markers can be obtained from the IHC images, before mentioning their use with H&E images.
    • In Fig 2, it should be noted that only H&E images are used for the cell detection module, while all of them are together used for the prediction module (if my understanding is correct)
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper addresses an important problem in an innovative way with interpretability and novelty. This will be of interest to the MICCAI community.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper develops a multi-module system using multi-stain histopathology images (H&E and two stains) to predict pathologic complete response for triple negative breast cancer patients. The strengths of the paper include: 1) the coupling of a cell detection module and a prediction module; 2) the use of both H&E and IHC. The following points should be addressed in the rebuttal: 1) proper literature review; 2) discussion of clinical use; 3) details regarding the dataset, especially whether these predictions are made on pre-treatment or post-treatment samples; 4) the description of the training approach.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8




Author Feedback

Literature review: We claimed “Our model is the first work on predicting pCR with multi-stained histopathology images of serial tissue sections” (Page 7). We do not intend to claim our study is “the first study of predicting pCR” (reviewer 1). In our work, we propose to jointly use images of the adjacent biopsy cuts from the same patients obtained BEFORE neoadjuvant therapy to predict pCR. We found only one study on breast cancer pCR prediction with pathology images. The reference “Beck et al. Science Translational Medicine 2011” from reviewer 1 is for survival prediction, not pCR prediction.

Clinical use: Our data are pre-NAC biopsies, which means they were collected BEFORE neoadjuvant therapy (reviewers 1 and 2). In response to reviewer 2 on the biological rationale for the biomarkers, “Pannu et al., Oncotarget, 2015” reports that aggressive TNBCs promote cell proliferation with faster cell cycling kinetics and enhanced cell cycle progression. These enhanced cell cycle kinetics can be captured by Ki67 and PHH3 stains in adjacent tissue sections. Ki67 and PHH3 are proliferation and mitosis markers, respectively. Mitosis is the cell cycle phase that marks the “real” cell division. A tumor cell can be Ki67-positive yet not enter the mitosis phase until much later. Therefore, a combinatorial analysis of Ki67 and PHH3 together can capture “important” risk predictive information on the aggressiveness of the tumor that we cannot glean from either Ki67 or PHH3 alone. Note that the performance of the H&E-based prediction model is much worse than that of the model using multi-stained image triplets jointly. This supports our biological assumption that a combinatorial analysis of H&E, Ki67, and PHH3 can capture “important” risk predictive information (reviewer 2).

Dataset: Our data are pre-NAC biopsies, which means they were collected BEFORE neoadjuvant therapy. We define pCR as having no evidence of residual invasive carcinoma in both the breast tissue and regional lymph nodes with the Residual Cancer Burden (RCB) value being zero. Non-pCR covers varying levels in response with evidence of residual invasive carcinoma. Note that RCB value is calculated based on the lymph nodes and Primary Tumor Bed.

Training approach: We trained the CDM to detect only tumor cells and TILs, using 868 pathology images at 40x magnification, each 1,024x1,024 pixels in size, as our biological hypothesis is that tumor cells and TILs, together with two biomarkers (Ki-67 and PHH3), can provide information for pCR prediction. This detection dataset is independent of the 1,038 40x pathology images used for pCR prediction and includes 53,314 tumor cells and 20,966 TILs manually labeled and classified by pathologists. Using stochastic gradient descent (SGD) as the optimizer and the YOLOv4 loss function (“Bochkovskiy et al., arXiv, 2020”), the CDM is trained for 200 epochs with a learning rate of 0.001 on one NVIDIA V100 GPU. Once the CDM is fully trained, it is frozen during the later training of the other modules to reduce the computational burden. With the trained CDM fixed, the SAM and PM are then trained with SGD at a learning rate of 0.001 for 100 epochs, using the cross-entropy loss for pCR prediction.

SAM is designed to transfer the H&E image-derived CDM output to spatial attention maps that can be overlaid with adjacent IHC images. Therefore, SAM cannot be trained or run separately from CDM. We present comparison results from a system with only PM (i.e. cols. 1 and 2 in Tables 1 and 2) and a system with CDM+SAM+PM (i.e. cols. 3 and 4 in Tables 1 and 2). These comparison results clearly demonstrate that spatial attention enhances pCR prediction.
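
To make the idea of turning detection output into an overlayable map concrete, here is a deliberately simplified sketch. It is not the SAM described in the paper, which is a trained module; it only illustrates, under stated assumptions, how cell detections from one section could be rendered as a weight map and applied pixel-wise to a co-registered image. The `(row, col, confidence)` detection format, the Gaussian rendering, and the function names are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical detection format: (row, col, confidence in [0, 1]).
Detection = Tuple[float, float, float]


def detection_heatmap(
    detections: Sequence[Detection], height: int, width: int, sigma: float = 2.0
) -> List[List[float]]:
    """Render detections as a map in [0, 1] by placing a Gaussian bump at each
    detected cell center and keeping the maximum response per pixel."""
    heat = [[0.0] * width for _ in range(height)]
    for r0, c0, conf in detections:
        for r in range(height):
            for c in range(width):
                d2 = (r - r0) ** 2 + (c - c0) ** 2
                bump = conf * math.exp(-d2 / (2 * sigma ** 2))
                heat[r][c] = max(heat[r][c], bump)
    return heat


def apply_attention(image: List[List[float]], heat: List[List[float]]) -> List[List[float]]:
    """Re-weight a single-channel, co-registered image pixel-wise by the map."""
    return [
        [px * w for px, w in zip(img_row, heat_row)]
        for img_row, heat_row in zip(image, heat)
    ]
```

In this toy version, pixels near detected cells keep most of their intensity and distant pixels are suppressed. A learned attention module would instead derive the weights from the detection feature maps during training.
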

Although our dataset includes 75 patients, each patient has 3 whole slide images of adjacent biopsy cuts. Additionally, our pCR prediction system is trained and tested on 29,631 and 46,029 image patches in the size of 512x512 pixels, respectively. Therefore, we have a large scale of training and testing data.
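
The relationship between patch-level and patient-level results can be illustrated with a small sketch. The rebuttal does not state how patch predictions are combined into a patient-level decision, so the majority vote below is an assumption, as are the function names and the 0.5 thresholds.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def patient_level_predictions(
    patch_probs: Iterable[Tuple[str, float]], threshold: float = 0.5
) -> Dict[str, bool]:
    """Aggregate (patient_id, pCR probability) pairs by majority vote.

    A patch votes pCR when its probability reaches `threshold`; a patient is
    called pCR when more than half of its patches vote pCR (ties -> non-pCR).
    """
    votes: Dict[str, List[bool]] = defaultdict(list)
    for patient_id, prob in patch_probs:
        votes[patient_id].append(prob >= threshold)
    return {pid: sum(v) / len(v) > 0.5 for pid, v in votes.items()}


def accuracy(pred: Dict[str, bool], truth: Dict[str, bool]) -> float:
    """Fraction of patients in `truth` whose predicted label matches."""
    return sum(pred[pid] == label for pid, label in truth.items()) / len(truth)
```

Under a voting rule of this kind, individual misclassified patches can be outvoted by the rest of a patient's patches, which is one way a patient-level accuracy can end up higher than the patch-level accuracy. Note also that patches from the same patient are not independent samples, so the effective sample size for patient-level claims remains the number of patients.
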




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper develops a multi-module system using multi-stain histopathology images (H&E and two stains) to predict pathologic complete response for triple negative breast cancer patients. The combination of multi-stain images for PCR is novel work and the paper shows fairly good results. The rebuttal also sufficiently addresses the concerns about clinical use, data set and training details.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This manuscript presents a deep multi-task learning framework to predict pathological complete response (PCR) for triple negative breast cancer. The framework takes as input images with multiple stains and uses a spatial attention mechanism to fuse feature information from multi-stain input images. A combination of multiple modules might be novel, but the cell detection module is based on YOLOv4, and spatial attention has been widely used in digital pathology (pointed out by R1). Also, many technical details are missing and the clarity of the presentation needs to be significantly improved. The paper might need another round of major revision before publication.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    18



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a deep learning system incorporating multiple stain types and spatial cellular information to predict pathologic complete response (PCR) to neoadjuvant chemotherapy in breast cancer. Reviewers have concerns about whether predictions are made pre- or post-therapy, the literature review, the size of the dataset, and other implementation details. In the rebuttal, the authors confirm that their data are indeed pre-therapy and also clarify other issues. Generally, I feel the paper is interesting, as it shows a useful framework for integrating multiple stain types and has clinical relevance. The method of integrating the cell detection and prediction modules also has some novelty. Though the dataset is not particularly large and so cannot support a more solid claim, it might still be worth accepting the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    12


