
Authors

Marta Wojciechowska, Stefano Malacrino, Natalia Garcia Martin, Hamid Fehri, Jens Rittscher

Abstract

Detection of the early onset of fibrosis is critical for detecting long-term damage and identifying potential loss of organ function. While formal grading systems for fibrosis have been established, we argue that a quantitative analysis of fibrosis patterns will improve diagnostic quality and help to standardise clinical reporting. Here we use deep learning to identify elementary fibrosis patterns. Subsequently, a graphical model is used to model the spatial organisation of the fibrosis patterns. Our experimental results demonstrate that this approach correlates well with established clinical grading. The presented method has the potential to be applied to histology in other organs (e.g. kidney).

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_21

SharedIt: https://rdcu.be/cyl90

Link to the code repository

https://github.com/mkatw/gnn-fibrosis

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    Assessing the level of fibrosis plays an important role in the diagnosis of liver disease. In this work, an automated system is designed to classify WSIs into different fibrosis grades (F0-F3). First, an ImageNet-pretrained CNN is used to extract features from the patches of each WSI, and a graph representation of the WSI is then constructed from the feature clustering results. Finally, the classification result is obtained through graph convolution followed by attention pooling.
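    The following minimal sketch makes the tile-clustering step in this summary concrete by running k-means over per-tile feature vectors. It is not the authors' code. The 512-dimensional random vectors are hypothetical stand-ins for ResNet18 tile features. The function name `kmeans`, the deterministic farthest-point initialisation and the iteration cap are also assumptions made for illustration. For comparison, scikit-learn's `KMeans` defaults to k-means++ initialisation.

```python
import numpy as np

def kmeans(features, k=4, n_iter=100, seed=0):
    """Lloyd's k-means with farthest-point initialisation; returns (labels, centroids)."""
    features = np.asarray(features, dtype=float)
    rng = np.random.default_rng(seed)
    # Farthest-point initialisation: start from a random tile, then repeatedly
    # add the tile that is farthest from all centroids chosen so far.
    chosen = [int(rng.integers(len(features)))]
    min_d = ((features - features[chosen[0]]) ** 2).sum(axis=1)
    for _ in range(1, k):
        nxt = int(min_d.argmax())
        chosen.append(nxt)
        min_d = np.minimum(min_d, ((features - features[nxt]) ** 2).sum(axis=1))
    centroids = features[chosen].copy()
    for _ in range(n_iter):
        # Assign each tile to its nearest centroid (squared Euclidean distance).
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its tiles (keep it if the cluster is empty).
        new_centroids = np.array([
            features[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Hypothetical data: four well-separated groups of 25 "tiles" with 512-d features.
rng = np.random.default_rng(1)
group_means = np.eye(4, 512) * 10.0
tile_features = np.vstack([m + 0.1 * rng.standard_normal((25, 512)) for m in group_means])
labels, centroids = kmeans(tile_features, k=4)
```

    Because k-means is unsupervised, nothing guarantees that a given cluster corresponds to a particular collagen density. As the authors note in their feedback, the clusters were interpreted through qualitative evaluation.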

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Grading liver fibrosis from pathological images is an interesting medical application, which has not been sufficiently studied in the existing literature.
    2. Some visualizations of the experiments are provided, which aid understanding.
    3. The writing of this paper is clear and easy to follow.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In general, the technical novelty is weaker than the novelty of the application. There are some good works on learning graph representations from WSIs, such as [1-2]. Compared with these methods, the cluster-based method proposed in this paper is not clearly more novel. It is recommended to add a relevant comparison and analysis.
    2. Do the edges of the graph (determined by the clustering result) have any medical relationship to fibrosis? It is recommended to add more analysis to Figure 6.
    3. There are some typos in the caption of Figure 1; it should be F0-F3, right?
    4. As can be seen from Table 2, the experimental results are not very strong. The algorithm achieves satisfactory performance only when F0-F2 are merged into one class and F3 is kept as a separate class. However, Table 1 shows only three F3 samples in the test set. Could this introduce validation bias? It is suggested that the authors expand the F3 portion of the dataset and provide more experimental results and analysis.
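    The sketch below illustrates the concern about having only three F3 test samples. It computes a 95% Wilson score interval for a per-class recall estimated from n = 3 cases. It does not use the paper's actual predictions. Instead, it assumes the best case, in which all three F3 biopsies are classified correctly. The function name `wilson_interval` is hypothetical.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2))
    return centre - half_width, centre + half_width

# Best case for the F3 class: 3 of 3 test biopsies correctly identified.
low, high = wilson_interval(3, 3)
print(f"F3 recall 95% CI: [{low:.2f}, {high:.2f}]")  # F3 recall 95% CI: [0.44, 1.00]
```

    Even a perfect score on three cases is compatible with a true recall well below 0.5. A larger F3 test set would therefore make the reported performance more convincing.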

    References

    [1] Adnan M, Kalra S, Tizhoosh H R. Representation Learning of Histopathology Images using Graph Neural Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020: 988-989.
    [2] Lu W, Graham S, Bilal M, et al. Capturing Cellular Topology in Multi-Gigapixel Pathology Images. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020: 260-261.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors state that their source code will be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please refer to the weaknesses section.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Insufficient comparison with related methods; the writing is good.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The paper develops a dataset for liver fibrosis stages (METAVIR standard) and proposes using a GCN for WSI classification.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper achieves good performance, provides detailed analysis of results, and is prepared to release the data and code.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are several issues with the dataset description, technical details, method comparison, and references. See comments below.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The data and code will be made available later.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I assume the dataset used here (Table 1, XXXX from organisation ***) is introduced in this paper? However, the description of the data acquisition is too vague. “The biopsies have been labelled by an expert pathologist”: what level of annotation is provided, collagen-level or only class-level? What is the size of the WSIs?

    Section 4.1, Tile subtyping: “an ImageNet pretrained ResNet18 model” was used here. Pretrained models derived from natural images can be, and sometimes are, used for medical image analysis. However, another ResNet model was trained on the same dataset but used only for method comparison. Why not use this ResNet as the pretrained model? K-means clustering is an unsupervised method, so I doubt whether simply setting k = 4 gives a precise subtyping: do all cluster 4 points represent dense collagen, without outliers? Also, why select only cluster 3 tiles among the 4 clusters?

    Section 4.3, Graph convolutional layers: reference [11] is not a state-of-the-art graph convolutional/neural network model, so why use [11] as the GCN classification model? For example, GIN, GMT, etc. were published after [11] and have shown better performance.

    Section 4.4, Attention layer: attention is commonly used in GCN models (see GAT or GMT, for example), yet this section gives no reference for the attention layer. The authors should either cite and use a published attention GCN layer or explain why the published attention methods are not suitable for this task.

    Section 5 Results: as mentioned in the paper: “Graph Convolutional Networks (GCNs) have been applied to histopathology”, but there is no comparison with any of [13, 17, 18]. The baseline (ResNet18) seems to be too weak. Figure 4 and Figure 5 say little about the superiority of the proposed method.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper is easy to follow, but there are several issues that need further clarification.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper presents a method for liver fibrosis stage classification with a graph convolutional network (GCN) classifier. A pretrained ResNet18 model extracts image features, which are then clustered with k-means for collagen subtyping at the tile level. Using the high-collagen tiles as landmarks, a tissue graph is constructed with weights derived from the ResNet18 image features. After the graph layers, an attention layer aggregates the node vectors into a set of 3 vectors, one for each output class. The method is trained and tested on 271 PSR-stained NAFLD liver biopsies staged with the METAVIR standard.
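    The graph-convolution and attention-pooling steps summarised above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' implementation. The propagation rule is the standard symmetric-normalised GCN layer of Kipf and Welling; the review does not confirm that this is the paper's reference [11]. The per-class attention pooling with query vectors `Q` is one hypothetical way to turn N node vectors into one vector per output class. All weights are random.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN step: ReLU(D^-1/2 (A + I) D^-1/2 H W), D = degree matrix of A + I."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))  # degrees are >= 1 thanks to self-loops
    A_norm = A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)

def softmax(x, axis):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for numerical stability
    return e / e.sum(axis=axis, keepdims=True)

def class_attention_pool(H, Q):
    """Pool (N, F) node vectors into (C, F) class vectors using (C, F) queries Q."""
    alpha = softmax(Q @ H.T, axis=1)  # (C, N): attention over nodes, separately per class
    return alpha @ H, alpha

rng = np.random.default_rng(0)
N, F_IN, F_HID, C = 6, 512, 64, 3
# Hypothetical tile graph: two star-shaped subregions centred on tiles 0 and 3.
A = np.zeros((N, N))
for i, j in [(0, 1), (0, 2), (3, 4), (3, 5)]:
    A[i, j] = A[j, i] = 1.0
H0 = rng.standard_normal((N, F_IN))  # stand-in tile features
H1 = gcn_layer(A, H0, 0.05 * rng.standard_normal((F_IN, F_HID)))
pooled, alpha = class_attention_pool(H1, rng.standard_normal((C, F_HID)))
logits = (pooled * rng.standard_normal((C, F_HID))).sum(axis=1)  # one score per class
probs = softmax(logits, axis=0)
```

    Each row of `alpha` sums to one over the nodes, so each class vector is a weighted average of node embeddings. Inspecting `alpha` is one way such a model could indicate which tiles drove each class score.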

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. This work fully leverages the spatial distribution of high collagen regions by graph convolutional networks and uses such information for liver staging classification.
    2. This work may present a high translational value for research and clinical practice.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. For computational efficiency, the presented workflow is applied to images downsampled by a factor of 32. As a result, it does not leverage the full image detail for classification.
    2. The resulting tile subtyping results may present spatially disconnected tiles of cluster 4. It is not clear how these tiles are combined for connected regions.
    3. It is not clear why features extracted from ResNet-18 model pretrained on ImageNet can well differentiate 4 clusters of tiles with different collagen densities.
    4. Only a ResNet baseline is compared with the proposed method (with and without the attention layer). It would be better to include additional state-of-the-art classification systems for comparison.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The reproducibility of this paper is good. It provides clear descriptions of the system architecture, the dataset used for performance testing, the number of samples in each class, the training-testing data splits, and the baseline methods. The code is also publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. As the presented workflow is applied to images downsampled by a factor of 32, it does not leverage the full image detail for classification. Therefore, any way to improve the computational efficiency of the method without losing full image resolution would be highly valuable.
    2. In 4.1, it is not clear how regions of high collagen content are formed by connecting spatially disconnected tiles of cluster 4.
    3. As the resulting graph depends on the tile size in Section 4.1, it would be important to discuss how the tile size affects the end classification performance.
    4. In section 4.1, it says “we found that setting k = 4 results in easily interpretable clusters with tiles containing increasing amount of collagen.” Is this conclusion true for other datasets?
    5. Please clarify why features extracted from ResNet-18 model pretrained on ImageNet can well differentiate 4 clusters of tiles with different collagen densities.
    6. In 4.1, it says “To address the problem of variation in staining slides are binarised”. What method is used for binarization?
    7. As the batch size can be a decisive factor in final system performance, please clarify why the batch size is 1 for the ResNet and 64 for the GCN. Is this related to memory limitations?
    8. Only a ResNet baseline is compared with the proposed method (with and without the attention layer). It would be better to include additional state-of-the-art classification systems for comparison.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work fully leverages the spatial distribution of high-collagen regions with graph convolutional networks and uses this information for liver fibrosis staging. The code is public. Therefore, this work may have high translational value for research and clinical practice.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Based on the reviewers’ comments, we think this is a high-quality paper with significant technical contributions. Authors should address reviewers’ comments in the camera-ready submission.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3




Author Feedback

We would like to thank the reviewers for recognising the technical contributions of our paper, in particular the clinical importance of our work and its translational value. We think that the very constructive comments will improve our manuscript.

R1:

C1: Compare with [1-2]. A: Our method is unique in that it reflects the underlying structure of liver tissue, which is not present in other organs. Adnan et al. [1] use only 10% of tiles to construct representative tissue graphs from very large cancer sections, which would not work for the spatially limited liver biopsies. Also, the work by Lu et al. [2] is cell-specific, whereas our slides show fibres rather than cell nuclei.

C2: More analysis in Fig 6. A: We will expand this in the paper.

C3: Typos in Fig 1. A: The caption is correct.

C4: Will the small number of F3 samples bring validation bias? A: Indeed, the small number of class F3 samples is a limitation in our study and results solely from data unavailability.

R2:

C1: Dataset description. A: Thank you for pointing this out. We will clarify this in the paper.

C2: Why not use classification ResNet as the pretrained model? A: The two ResNet models were used for images presenting very different scenes. The one used for method comparison was trained on 32x downsampled WSIs, whereas the one used for feature extraction was applied to 256x256 full resolution tissue patches. We do not think that these models can be used interchangeably.

C3: Is k=4 clusters precise? Why use only cluster 3? A: Indeed, k = 4 does not guarantee the most precise tissue subtyping. However, in our work we performed the clustering to identify regions of dense collagen, which are used as centre nodes in the subsequent graph construction. All tiles (from all clusters) are included in the constructed graph.
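The role of clustering described in this answer can be sketched as below: dense-collagen tiles are selected as centre nodes, and every tile stays in the graph. Two parts are assumptions. The first is the per-tile collagen fraction, for example the fraction of foreground pixels in a binarised tile. The second is the rule of picking the cluster with the highest mean fraction; the authors report that they assessed the clusters qualitatively. The function name `select_centre_tiles` is hypothetical.

```python
import numpy as np

def select_centre_tiles(labels, collagen_fraction):
    """Return indices of the tiles in the cluster with the highest mean collagen fraction."""
    labels = np.asarray(labels)
    collagen_fraction = np.asarray(collagen_fraction, dtype=float)
    clusters = np.unique(labels)
    cluster_means = np.array([collagen_fraction[labels == c].mean() for c in clusters])
    densest = clusters[cluster_means.argmax()]
    return np.flatnonzero(labels == densest)

# Hypothetical example: six tiles assigned to four k-means clusters (0-3).
labels = [0, 1, 2, 3, 3, 1]
collagen_fraction = [0.05, 0.20, 0.40, 0.80, 0.70, 0.25]
centres = select_centre_tiles(labels, collagen_fraction)
print(centres.tolist())  # [3, 4]
```

Only the choice of centre nodes depends on the clustering. All remaining tiles still become graph nodes.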

C4: Comparison with other GNNs. A: We will include a comparison with other GNN models in the camera-ready version of the paper. We note that the paper proposing GMT was submitted only two weeks before the MICCAI ’21 submission deadline.

C5: Use of attention layer. A: The suggested use of attention is complementary to our approach, and we will include these layers in our analysis.

C6: Comparison with [13, 17, 18]. A: [13, 18] are cell-nucleus-specific methods, which cannot be applied to our slides, as they do not contain nuclei. [17] was developed to identify regions of high similarity, which was not the objective of our paper. Our baseline is the method used in [16], the only paper we identified that addresses the same task. We will revise Fig. 4 and Fig. 5 in the camera-ready version.

R3:

C1: Workflow applied to downsampled images? A: It is not true that our proposed method is applied to downsampled slides. As stated in Section 4, 32x downsampled slides are used by the baseline method (ResNet18). The proposed method uses full-resolution slides (Section 4.1). We will further clarify in the camera-ready text.

C2: How to make regions of high collagen content? A: Our pipeline is constructed to connect all tiles into subregions centred around the tiles with high collagen content. This is done to mimic the underlying tissue architecture.
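A minimal sketch of attaching every tile to a subregion centred on a high-collagen tile follows. This page does not describe the exact connection rule. The nearest-centre assignment by grid distance, the star-shaped edges and the function name `build_centre_graph` are therefore hypothetical. Ties are resolved in favour of the centre listed first.

```python
import numpy as np

def build_centre_graph(coords, centre_idx):
    """Attach every tile to its spatially nearest centre tile.

    coords: (N, 2) tile grid positions (row, col).
    centre_idx: indices of tiles used as centre nodes (e.g. dense-collagen tiles).
    Returns (edges, region): an undirected edge list of (centre, tile) pairs and,
    for every tile, the index of the centre tile whose subregion it belongs to.
    """
    coords = np.asarray(coords, dtype=float)
    centre_idx = np.asarray(centre_idx)
    # Euclidean distance from every tile to every centre tile: shape (N, M).
    d = np.linalg.norm(coords[:, None, :] - coords[centre_idx][None, :, :], axis=2)
    region = centre_idx[d.argmin(axis=1)]  # a centre tile maps to itself (distance 0)
    edges = [(int(c), int(i)) for i, c in enumerate(region) if i != c]
    return edges, region

coords = [(r, c) for r in range(4) for c in range(4)]  # 4x4 tile grid, index = 4*r + c
edges, region = build_centre_graph(coords, centre_idx=[0, 15])
```

The resulting edge list could be converted into the adjacency matrix consumed by a GCN layer.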

C3: Effect of tile size. A: We chose the tile size specifically to match the size of biologically relevant tissue regions. We will clarify this in the paper.

C4: Is k=4 suitable for all datasets? A: Indeed, this observation may not be true for other datasets. We will clarify this in the paper.

C5: Why do ResNet-extracted features differentiate clusters? A: The results of clustering were assessed by a qualitative evaluation.

C6: Binarization method? A: Segmentation using a CNN. We will clarify this in the paper.

C7: Is batch size chosen related to memory limitation? A: Yes. We will clarify this in the paper.

C8: Compare with more methods. A: We will expand our method comparison in the camera-ready version.


