Authors

Zeyu Gao, Bangyang Hong, Xianli Zhang, Yang Li, Chang Jia, Jialun Wu, Chunbao Wang, Deyu Meng, Chen Li

Abstract

The histological subtype of papillary (p) renal cell carcinoma (RCC), type 1 vs. type 2, is an essential prognostic factor. The two subtypes of pRCC share a similar pattern, i.e., the papillary architecture, yet differ subtly in their cellular and cell-layer level patterns. However, these cellular and cell-layer level patterns can hardly be captured by existing CNN-based models in large-size histopathological images, which makes it difficult to apply these models directly to such a fine-grained classification task. This paper proposes a novel instance-based Vision Transformer (i-ViT) to learn robust representations of histopathological images for the pRCC subtyping task by extracting finer features from instance patches (by cropping around segmented nuclei and assigning predicted grades). The proposed i-ViT takes the top-K instances as input and aggregates them to capture both the cellular and cell-layer level patterns using a position-embedding layer, a grade-embedding layer, and a multi-head multi-layer self-attention module. To evaluate the performance of the proposed framework, experienced pathologists were invited to select 1162 regions of interest from 171 whole slide images of type 1 and type 2 pRCC. Experimental results show that the proposed method outperforms existing CNN-based models by a significant margin.
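
The aggregation step described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes that the instance, position, and grade embeddings are combined by element-wise addition, uses a single attention head with identity projections instead of the multi-head multi-layer module, and pools by averaging. The function names `build_tokens`, `self_attention`, and `mean_pool` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_tokens(instance_emb, position_emb, grade_emb):
    """Combine the three embeddings of each instance by element-wise sum.

    Assumption: summation is used here for illustration only.
    """
    return [
        [a + b + c for a, b, c in zip(inst, pos, grade)]
        for inst, pos, grade in zip(instance_emb, position_emb, grade_emb)
    ]


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity projections."""
    dim = len(tokens[0])
    attended = []
    for query in tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        weights = softmax(scores)
        attended.append(
            [sum(w * value[d] for w, value in zip(weights, tokens)) for d in range(dim)]
        )
    return attended


def mean_pool(tokens):
    """Average all tokens into one image-level vector (an assumed pooling)."""
    count = len(tokens)
    return [sum(t[d] for t in tokens) / count for d in range(len(tokens[0]))]


# Three toy instances with 2-dimensional embeddings.
instance_emb = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
position_emb = [[0.1, 0.1], [0.1, 0.1], [0.1, 0.1]]
grade_emb = [[0.0, 0.2], [0.0, 0.2], [0.0, 0.2]]

tokens = build_tokens(instance_emb, position_emb, grade_emb)
image_vector = mean_pool(self_attention(tokens))
```

In a real model the queries, keys, and values would come from learned projections, and the image-level vector would feed a classifier for type 1 vs. type 2.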

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_29

SharedIt: https://rdcu.be/cymak

Link to the code repository

https://github.com/ZeyuGaoAi/Instance_based_Vision_Transformer

Link to the dataset(s)

https://dataset.chenli.group/home/prcc-subtyping


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents a cancer subtyping method using histopathology images. A transformer-based method is designed, integrating cellular and cell-layer level features. Evaluation is conducted on a private dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Method design is sound and has some novelty.
    • Various aspects of evaluation have been conducted.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Method description is not very clear and should be improved.
    • The compared approaches (Table 1) are hardly representative of the current state of the art in image classification. More advanced deep learning methods, especially non-transformer-based ones, should be included in the comparison.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method description should be improved so that the work can be reproduced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    See above.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Method seems well designed. Performance comparison with more advanced deep learning methods would be good.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposed to use the vision transformer to subclassify renal cell carcinoma from histopathology images. Initially, a segmentation and classification network were used to segment the nuclei and then classify them into different grades. A vision transformer was then applied to the patches (based on the segmentation) with the embedding of the grades to produce classification results. The experimental results with the self-collected dataset appear to be reasonable.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well organized, which makes it easy for the reader to follow. The application of the vision transformer to histopathology images is novel and interesting.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It seems that the proposed method is a straightforward implementation of transformer with a minor modification of the embedding. Therefore, it’s difficult to understand the technical contribution to the field.

    The evaluation materials seem to be problematic. A total of 1162 ROIs across 171 patients were randomly divided into training, validation and test datasets, which means that both the training and test datasets may contain images derived from the same patient.

    A number of hyperparameters were set without justification. Consequently, it will be difficult to reproduce the results.

  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The proposed method was developed with a self-organized dataset. There are also a number of empirically derived hyperparameters, which may make the results difficult to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    There is no comparison to existing methods that are optimized for histopathology image classification. Therefore, it is hard to tell whether the proposed method improves on the state of the art.

    The manuscript claims that CNN-based methods have difficulty capturing features at the cellular and cell-layer level. However, the reason for this is difficult to understand.

    Following on from the above, the authors also used CNN-based methods for the first-stage detection. A tiny CNN was also applied to embed the patch features. This seems to contradict the point mentioned above.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application of vision transformer to histopathology images is novel and interesting. However, many sections of the manuscript need to be better explained and justified.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper deals with the classification of pRCC cases into subtypes from the observation of WSIs. However, types 1 and 2 are very similar, and distinguishing them in large images like WSIs is not an easy task, even with state-of-the-art CNN models. The studied model is a Transformer called the instance-based Vision Transformer (i-ViT). The validation is performed on 171 WSIs, and 1162 ROIs within them, coming from the TCGA-KIRP database.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    It is a very interesting mixture of geometric considerations (graph-based) and new architecture (Transformer) for a very difficult classification task for which even physicians hardly (30%) agree.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I would say the layout and the order in which information is delivered throughout the text. To make it more impactful, I would rearrange a few things.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Code and dataset for nuclei segmentation will be made available. For the rest, the description is clear enough to reproduce.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Abstract: “invited to selected” should be “to select”, I guess.

    Introduction: in Fig. 1 there are no “red circles” but red outlines. I do not really see the point: CNNs are able to learn subtle differences, right? So why is it so difficult to distinguish different architectural patterns, even if they are similar? There must be another justification. The large-image issue can work. What is really interesting in this paper is how deep learning architectures usually devised for small images or patches can be useful when the relationships and geometry or topology of elements matter.

    Page 3: “we propose an novel” should be “a novel”. Please rephrase “The central idea of the proposed i-ViT is to capture instances features first by extracting instance-level patches that each patch includes a nucleus with part of the surrounding background.” I find that the paragraph “To tackle the aforementioned challenge” would be better positioned in the method part.

    Method: how do you get the nucleus grade for prediction? The answer is in the experiments part, which is OK but a bit late for me in the course of reading, as it is critical to the method. What is the difference from graph-based methods applied to histological images, apart from the Transformer innovation? I guess it is related to the 4 methods in the Subtyping of pRCC part of the experiments section, but please make it clearer which are more related to mainstream graph-based methods (possibly with hand-crafted features) and which to the learned-feature ones. Please elaborate on these previous research works.

    Experiment: 3.2 “nuclei segmentation” → “Nuclei segmentation”. 3.2 Results: the results are convincing for the i-ViT method. I would be curious to see the generalization of i-ViT-H and i-ViT on a totally different dataset. “The performances of the huge and middle scale models are competent in most of the parameter settings,”: is “competent” the right word?

    Conclusion: “Subtyping of pRCC has poor clinical consistency, the diagnostic difference between our pathologists and TCGA is more than 35%, so it is necessary to design an automated subtyping model for pRCC.” This could be stated in the introduction. It would then explain the difficulty of the task even for physicians and stress the value of assisting them. I really liked the work. I think it could be more impactful by taking into account my remarks about the global presentation, in the sense that some pieces of information should be put earlier in the text and the introduction partly rearranged into the methods.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The classification problem is rather difficult (even for experts in the field) and the proposal is interesting, as it blends a bit of geometric/topological representation (architecture of the tissue) with state-of-the-art Transformer methods.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    There are mixed opinions. Some noted lack of technical contribution, some noted lack of clarity and some noted problems in evaluation. These are the main issues that should be addressed in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

Thank you for your time and constructive comments. We are glad to have this opportunity to discuss our work.

About the technical contribution, we summarize it in the following three aspects: Firstly, this paper focuses on a representative and difficult histological-image-based cancer subtyping task, papillary renal cell carcinoma (pRCC) subtyping. The difficulty comes from distinguishing the similar cell geometry of the subtypes in the large-size WSI, which is a challenge even for pathologists. To our knowledge, few deep-learning-based methods have been proposed for this or a similar subtyping task that requires fine-grained features. We believe that it is meaningful to address such a task, which is critical in real-world scenarios.
Secondly, we propose a new framework, the instance-based Vision Transformer, for pRCC subtyping. As explained in the introduction section, the two key features for distinguishing type 1 from type 2 are the randomly distributed cellular-level and cell-layer-level patterns in the large-scale WSI. We design a two-stage framework. In the first stage, we segment the nuclei areas out of the original image as instance patches with position information and assign a grade to them; this is implemented by a Micro-Net trained jointly on the nuclei segmentation and grade classification tasks. In the second stage, we select the same number of instances for each image according to nuclei size and grade, then learn the position, grade, and instance embeddings. Then we apply a Transformer-like mechanism to fuse all the embeddings of an image and learn an image-level representation for classification. Our framework is meticulously designed to capture the cellular-level and cell-layer-level patterns for pRCC histological subtyping. It integrates existing advances in medical image segmentation and classification, rather than being a simple and straightforward implementation of the Transformer. Note that, for the first stage, pixel-level labels (for segmentation) and nuclei grade labels (for grading) are available, providing good guidance for the CNN to learn features from a small area.
Also, for the instance-level embedding, the instance size is at the cell level, where a CNN can effectively capture the features. However, existing CNN-based methods for image classification can hardly capture the fine-grained features (cellular-level and cell-layer-level patterns) from a large-size image when constrained by coarse-grained image-level labels (like type 1 or 2). This is why we apply CNNs in our framework but consider existing CNN models ineffective for pRCC subtyping.
Thirdly, our model is evaluated on a real-world dataset and shows promising results. We believe our work can bring insights for solving similar classification tasks. We also state in the paper (section 3.1) that both our code and dataset will be publicly available.
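
The instance selection in the second stage (keeping the same number of instances per image according to nuclei size and grade) could be sketched as follows. This is only an illustration under stated assumptions: the rebuttal does not give the exact ranking rule, so the sketch ranks by predicted grade first and nucleus area second, and pads with `None` when an image has fewer than K nuclei. The `Instance` fields and the `select_top_k` helper are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Instance:
    x: float      # nucleus centroid, horizontal pixel coordinate
    y: float      # nucleus centroid, vertical pixel coordinate
    area: float   # area of the segmented nucleus in pixels
    grade: int    # predicted nucleus grade from the first stage


def select_top_k(instances: List[Instance], k: int) -> List[Optional[Instance]]:
    """Return exactly k entries per image.

    Assumed ranking: higher grade first, then larger area.
    Assumed padding: None placeholders when fewer than k nuclei exist.
    """
    ranked = sorted(instances, key=lambda n: (n.grade, n.area), reverse=True)
    selected: List[Optional[Instance]] = list(ranked[:k])
    selected.extend([None] * (k - len(selected)))
    return selected


nuclei = [
    Instance(x=10, y=10, area=50.0, grade=1),
    Instance(x=20, y=5, area=80.0, grade=3),
    Instance(x=5, y=30, area=120.0, grade=2),
    Instance(x=40, y=40, area=60.0, grade=3),
]
top_two = select_top_k(nuclei, 2)
```

The centroid coordinates are kept on each selected instance so that a position embedding can be computed for it afterwards.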

About the evaluation, to our knowledge, few existing methods focus on pRCC subtyping. Therefore, we compare our model to four baselines. The hyperparameters that potentially influence the performance of our framework are analyzed in the sensitivity analysis (section 3.3). Besides, to quote the second paragraph of section 3.1, “We randomly divide the dataset into three subsets in patient-level.” Therefore, images from one patient do not appear in both the training and test sets.
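
A patient-level split of the kind quoted above can be sketched as below. The split fractions, the seed, and the `split_by_patient` helper are assumptions made for illustration, since the rebuttal does not state the ratios used. The point of the sketch is that patients, not ROIs, are shuffled and partitioned, so all ROIs of one patient land in the same subset.

```python
import random
from collections import defaultdict


def split_by_patient(rois, fractions=(0.7, 0.1, 0.2), seed=0):
    """Split (patient_id, roi_id) pairs into train/val/test by patient.

    The fractions are illustrative and apply to the number of patients,
    not the number of ROIs.
    """
    by_patient = defaultdict(list)
    for patient_id, roi_id in rois:
        by_patient[patient_id].append((patient_id, roi_id))

    patients = sorted(by_patient)
    random.Random(seed).shuffle(patients)

    n_train = round(len(patients) * fractions[0])
    n_val = round(len(patients) * fractions[1])
    groups = (
        patients[:n_train],
        patients[n_train:n_train + n_val],
        patients[n_train + n_val:],
    )
    return tuple([roi for p in group for roi in by_patient[p]] for group in groups)


# Toy data: 10 patients with 3 ROIs each.
rois = [(f"P{p}", f"P{p}_roi{r}") for p in range(10) for r in range(3)]
train, val, test = split_by_patient(rois)
train_patients = {p for p, _ in train}
val_patients = {p for p, _ in val}
test_patients = {p for p, _ in test}
```

Checking that the three patient sets are pairwise disjoint is a cheap way to confirm that no patient leaks between training and test data.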

About the clarity, we appreciate the detailed comments on improving the clarity of our paper. We will seriously consider the advice, rephrase the inappropriate expressions, rearrange some paragraphs, and improve the clarity in the final version.

That’s all for our response. We sincerely hope this can resolve the concerns and our paper can be accepted.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal seems satisfactory. There is some novelty in the method design. Main issue is clarity. The final version should be revised to address the reviewers’ concerns.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper introduces a two-stage framework for subtyping of papillary renal cell carcinoma: nuclei segmentation and classification followed by vision Transformer-based cancer subtyping. Most concerns are addressed in the rebuttal, such as technical contribution, comparison with other methods and parameter sensitivity analysis. In addition, the authors will improve the clarity of the presentation (they promise this in the rebuttal).

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a two-stage framework for subtyping of papillary renal cell carcinoma in histopathological images. As described in the manuscript and emphasized in the rebuttal, stage 1 is a nuclei segmentation, where patches containing nuclei are selected as instance features, which are further aggregated together with their positions in a vision transformer (stage 2) for final classification. The methodology framework is very well justified. In my opinion, the application of the vision transformer in conjunction with nuclei segmentation has a sufficient amount of novelty, as it adapts the vision transformer to their own application and solves a very important and relevant problem. The technical and implementation details are also addressed in the rebuttal. Plus, the authors promise to release the code as well as the trained model, which makes the method more reproducible. I support accepting the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5


