Authors

Pietro Antonio Cicalese, Syed Asad Rizvi, Victor Wang, Sai Patibandla, Pengyu Yuan, Samira Zare, Katharina Moos, Ibrahim Batal, Marian Clahsen-van Groningen, Candice Roufosse, Jan Becker, Chandra Mohan, Hien Van Nguyen

Abstract

Computer Aided Diagnosis (CAD) systems for renal histopathology applications aim to understand and replicate nephropathologists’ assessments of individual morphological compartments (e.g. glomeruli) to render case-level histological diagnoses. Deep neural networks (DNNs) hold great promise in addressing the poor intra- and interobserver agreement between pathologists. This being said, the generalization ability of DNNs heavily depends on the quality and quantity of training labels. Current “consensus” labeling strategies require multiple pathologists to evaluate every compartment unit over thousands of crops, resulting in enormous annotative costs. Additionally, these techniques fail to address the underlying reproducibility issues we observe across various diagnostic feature assessment tasks. To address both of these limitations, we introduce MorphSet, an end-to-end architecture inspired by Set Transformers which maps the combined encoded representations of Monte Carlo (MC) sampled glomerular compartment crops to produce Whole Slide Image (WSI) predictions on a case basis without the need for expensive fine-grained morphological feature labels. To evaluate performance, we use a kidney transplant Antibody Mediated Rejection (AMR) dataset, and show that we are able to achieve 98.9% case level accuracy, outperforming the consensus label baseline. Finally, we generate a visualization of prediction confidence derived from our MC evaluation experiments, which provides physicians with valuable feedback.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_31

SharedIt: https://rdcu.be/cymam

Link to the code repository

https://github.com/pcicales/MICCAI_2021_aglom

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The authors presented an MC sampling method for assessing antibody-mediated rejection from renal histopathological images, specifically glomerular crops, to overcome the need for fine-grained structural annotation. They introduced a case-level architecture, in which they first use a CNN to process the input images and then MorphSet to compare the embeddings of the input images with learned prognostic vectors. On top of that, they implemented another MC sampling step to aggregate the predictions, and used these predictions to represent the confidence level of the final prediction through probability density curves.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1-The topic is of potential interest. 2-Hope this can be extended to other related diseases (e.g., liver cancer, lung cancer, prostate cancer, glioma, etc.). 3-Comprehensive ideas. 4-References are adequate.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1-Hard to follow since it has a lot of math (I got lost at times). 2-The results need a table comparing the three models in terms of accuracy, sensitivity, and specificity.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It might be reproducible.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

1-Very comprehensive math that needs more elaboration to make it easy for the reader. 2-Figure 2 needs some clarification on the figure itself (e.g., feature embedding). 3-An algorithm is needed to summarize the steps. 4-Reporting accuracy, sensitivity, and specificity will make the results more representative. 5-Who performed the manual segmentation? Was it a pathologist? One or more? If more than one, how likely is it that interobserver variability affected the final results?
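
For reference, the three metrics requested in this review can be computed from binary case-level labels as in the minimal sketch below. The labels and predictions are made-up toy values, not results from the paper, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, and specificity for binary labels (1 = AMR, 0 = non-AMR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        "accuracy": (tp + tn) / len(pairs),
        # Sensitivity: share of true AMR cases that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Specificity: share of non-AMR cases that were correctly cleared.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
    }


# Toy example (hypothetical labels, for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
```

Unlike an ROC curve, all three values depend on the threshold used to turn model scores into the binary predictions in `y_pred`.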
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The paper has novel contributions and demonstrate good results as well.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

3
Reviewer confidence

Somewhat confident

Review #2

Please describe the contribution of the paper

The authors proposed a novel method based on the attention mechanism to process images (patches) extracted from renal histology samples for diagnosing Antibody-Mediated Rejection (AMR). Particularly, AMR crops of each case are sampled using Monte Carlo (MC) sampling scheme and then the selected images are passed through a CNN (EfficientNet-B0) to generate image embeddings and then those embeddings are processed through the proposed SEDR block and attention mechanism to be abstracted and find relevant information. Finally, image-level information is aggregated and processed to generate case-level AMR (present/absent) prediction. The contribution lies in the MC sampling method and the proposed attention mechanism, called MorphSet. A powerful advantage of their method is that it does not need image-level annotations. Also, due to the MC sampling scheme, the model provides uncertainty for case-level predictions.
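
As a rough illustration of the MC sampling and aggregation this review describes, the sketch below repeatedly draws a subset of one case's crops, scores each subset, and summarizes the spread of the scores. It is not the authors' implementation: `toy_predict_set` is a hypothetical stand-in for the CNN encoder and MorphSet block, the per-crop scores are made up, and sampling without replacement within a draw is an assumption.

```python
import random
import statistics


def mc_case_prediction(crops, predict_set, set_size=8, n_draws=100, seed=0):
    """Repeatedly sample crops from one case and collect case-level probabilities."""
    rng = random.Random(seed)
    k = min(set_size, len(crops))
    probs = []
    for _ in range(n_draws):
        subset = rng.sample(crops, k)  # assumption: no replacement within a draw
        probs.append(predict_set(subset))
    mean = statistics.fmean(probs)    # aggregated case-level probability
    spread = statistics.pstdev(probs)  # small spread suggests a stable prediction
    return mean, spread, probs


def toy_predict_set(subset):
    """Hypothetical stand-in for the set model: the mean of per-crop scores."""
    return sum(subset) / len(subset)


# Hypothetical per-crop AMR scores for a single case.
crop_scores = [0.9, 0.8, 0.95, 0.2, 0.85, 0.7, 0.9, 0.6, 0.75, 0.88]
mean, spread, probs = mc_case_prediction(
    crop_scores, toy_predict_set, set_size=4, n_draws=200
)
```

The list `probs` is what a density plot of prediction confidence would be drawn from; a narrow distribution far from 0.5 would correspond to a confident case-level call.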
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

Novel application and dataset. A novel approach to use attention mechanism to do confident prediction based on weakly labelled samples. Comparison of their method with a fully supervised method.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Lack of enough validation data (which is stated in the manuscript). Lack of performance comparison with SOTA methods.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Although the explanation in the paper makes the method look somewhat hard to re-implement, it is possible to implement it based on the provided information. However, the results are not reproducible, as the data is not publicly available.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

1- What is the SE attention layer? Do you mean Squeeze-Excitation block? In this case, please cite the related paper. 2- In Fig. 3, it would be nice to have box plots that summarize the peak and std of the predicted probability densities for all AMR examples (it would be interesting to see the same information for non-AMR predictions as well). 3- In the last paragraph of Section 2.1, it is stated that each image was segmented using QuPath. What do you mean by “glomerular compartment unit”? (Do you mean nuclei?) What information has been extracted from these objects, and where has this information been used? Was it only an extra input for the pathologists to label images more accurately, or was it used in the model training process (the methodology does not mention where in the model it was used)? Please elaborate on this part to avoid confusion and ambiguity. 4- It is recommended to add a block to Fig. 1 (b) to represent the concatenation operation (after global pooling). 5- What exactly is rff? Is it a trainable linear layer? 6- It is recommended to mention that the proposed method is trained in an end-to-end manner, to avoid confusion with the MIL-based methods that utilize embeddings extracted from pretrained models. 7- Please also report case-level accuracy for the EfficientNet-B3 baseline. 8- Is it possible for the MorphSet approach to determine the most attenable image when making a case-level prediction? 9- It would be nice to investigate the effect of the number of sampled images on the case-level prediction performance (accuracy and uncertainty). 10- It would be great to compare the performance of the proposed model with a deep learning and voting based MIL model which is now state-of-the-art in weak learning algorithms (like Xu et al. [15] or Ilse et al. “Attention-based Deep Multiple Instance Learning”).
11- A possible shortcoming of the proposed method in comparison to attention-based MIL methods that analyze WSIs is that here we need to select the glomeruli image (patches) manually. Having said that, for other MIL method it may be required to segment glomeruli regions before instance extraction to achieve such high results. This comment requires more experiments that authors can consider in their future work.
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Novel approach. Very well written and organized (free of grammatical/spelling errors). The proposed method is relevant to the field and can be useful for other applications as well.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

MorphSet, an architecture mapping the combined encoded representations of Monte Carlo (MC) sampled glomerular compartment crops to produce Whole Slide Image (WSI) predictions on a case basis without the need for expensive fine-grained morphological feature labels, was introduced. The architecture was evaluated on kidney transplant Antibody Mediated Rejection dataset.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. An important problem of training algorithms without fine-grained morphological feature labels is tackled;
2. The model is evaluated on a relevant dataset and the performance is compared against two state-of-the-art models;
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. Evaluation on more datasets will help to understand generalisability of the architecture;
2. It’d be interesting to see where the model made mistakes (AUC of 0.999 is already very high though) as well as understand where the proposed model outperformed the EfficientNet, for example;
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Sharing the code and the data will help for reproducibility of the paper.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1. Evaluation on more datasets will help to understand generalisability of the architecture;
2. It’d be interesting to see where the model made mistakes (AUC of 0.999 is already very high though) as well as understand where the proposed model outperformed the EfficientNet, for example;
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. An important problem of training algorithms without fine-grained morphological feature labels is tackled;
2. The model is evaluated on a relevant dataset and the performance is compared against two state-of-the-art models;
What is the ranking of this paper in your review stack?

6
Number of papers in your stack

7
Reviewer confidence

Somewhat confident

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The authors introduced a method based on the attention mechanism and Monte Carlo (MC) sampling to process histology patches for diagnosing Antibody-Mediated Rejection (AMR). The application is novel and is of great interest for personalized medicine in renal rejection. Technically, the use of an attention mechanism to make confident predictions based on weakly labelled samples is interesting. In the evaluation, the performance is compared against two state-of-the-art models. Please refocus the abstract on the significance and contribution; the introductory part of the abstract is too long. Also, in Fig. 2, the authors mention “significant performance improvement”; how did the authors come to this conclusion? I do not see any statistical analysis performed to support the authors’ claim. Likewise, in Fig. 3, the high confidence is obvious only in the left graph. Thus, a compelling argument is required, or statistical significance should be provided (with p-values). I would recommend reporting the case-level accuracy for the EfficientNet-B3 baseline, as indicated by R2. Other valuable constructive comments have been raised by the reviewers to enhance the quality of the submission.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Author Feedback

We would like to thank the reviewers for their constructive comments, which we believe will help us improve upon the quality of this submission as well as the future work we intend to pursue in this direction. There were some key considerations highlighted by the meta-reviewers which we will address here in our rebuttal.

Phrasing and organization of the abstract: We intend to reorganize the abstract to emphasize the importance of our work, and will reduce the amount of background content there.

Significance of the results: We agree that the wording of the caption in figure two is misleading; we will adjust the phrasing to avoid any confusion. We will make our argument for the performance improvements more compelling by elaborating further in the captions of figures two and three, as well as in the results and discussion sections of the paper.

Case-level accuracy of EfficientNet-B3 baseline: The result reported in the ROC curve represents the unbiased case level accuracy of the EfficientNet-B3 baseline. The input probability values represent the percentage of glomeruli which were classified as AMR by the compartment-level classifier. We did this as opposed to fixing some classification threshold for AMR case level predictions (i.e. > 50% of glomeruli classified as AMR constitutes an AMR case prediction) because pathologists do not generate case level diagnoses by using a hard set threshold on their glomerular assessments. Reporting accuracy in this way would therefore not be particularly meaningful from a medical standpoint, whereas reporting our results using an ROC curve allows us to avoid this confusion. To clarify with respect to our task, AMR is divided by the dominant Banff classification (Becker 2018) into chronic active, active and chronic forms. The lesions qualifying for chronicity are scored on the most severely affected glomerulus, of which a single one would suffice. AMR activity can be diagnosed with a single glomerulus with a microthrombus, or at least 25% of glomeruli affected by glomerulitis. Since we have lumped together all three forms of AMR, a fixed threshold would not be meaningful.
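
The argument above can be illustrated with a small sketch: each case is scored by the fraction of its glomeruli classified as AMR, and the ROC curve is obtained by sweeping a threshold over those scores instead of fixing one. The four cases below are hypothetical toy data, and the helper names are assumptions for illustration, not the authors' code.

```python
def case_score(glomerulus_preds):
    """Fraction of glomeruli in a case predicted as AMR (1) by a compartment classifier."""
    return sum(glomerulus_preds) / len(glomerulus_preds)


def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by sweeping a threshold over every observed score."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Trapezoidal area under the ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical cases: (per-glomerulus predictions, true case label).
cases = {
    "case_a": ([1, 1, 0, 1], 1),
    "case_b": ([0, 0, 0, 1], 0),
    "case_c": ([1, 0, 0, 0, 0, 0], 1),  # AMR case with a single affected glomerulus
    "case_d": ([0, 0, 0, 0], 0),
}
scores = [case_score(preds) for preds, _ in cases.values()]
labels = [label for _, label in cases.values()]
points = roc_points(scores, labels)
area = auc(points)
```

In this toy data, `case_c` is a true AMR case with only one of six glomeruli flagged, so a fixed 50% cutoff would call it non-AMR; the ROC sweep reports performance across all cutoffs rather than committing to one.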

Other reviewer comments:
* Squeeze Excitation citation: We will be sure to include the citation for the Squeeze Excitation attention mechanism used within the MorphSet architecture and will clarify the meaning of SE in the paper.
* Dataset generation, terminology: We will clarify how the data was segmented by a single pathologist using the QuPath software, and will elaborate on the medical terminology (e.g. compartments) which was used throughout the paper, as well as other terms (e.g. rff) which were not made clear.
* End-to-end training: We will clarify that our model is indeed trained in an end-to-end fashion, which provides a significant advantage over other MIL approaches which use pretrained model feature embeddings.
* Attenable image extraction: We will add a sentence in the future work section to highlight this point; we are currently working on improving the architecture so that we can identify particularly discriminative images.
* Methods clarification: We will improve the wording of the methods section so that the mathematical foundation of our approach is made clearer.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The rebuttal cleared most of the previous comments. Addition of argumentative discussion for performance evaluation, missing references, and the case-level accuracy are necessary in the final version.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposes a novel method based on Monte Carlo sampling and an attention mechanism for diagnosing Antibody-Mediated Rejection (AMR). One of the key strengths is that it only requires weakly labelled histology patches and not fine-grained morphological feature labels of the whole slide image. The reviews were quite positive, highlighting the novelty of the method and the clarity of the paper. The authors did not address the suggestion of comparing to more SOTA methods, but considering the page limit of MICCAI, I find this acceptable. The concerns that were raised by the meta-reviewer were discussed and addressed in the rebuttal.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

8

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors seem to address most of the reviewers’ comments sufficiently and can make changes in the final camera-ready version of the paper. In particular, they explain why the ROC curves are most useful. However, Figure 3 also needs a better caption. They have not addressed the meta-reviewer’s concern about statistical significance. The x and y axes of the plots also need labels to make clear what they show. If the authors can address this as well, I recommend acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

9

MorphSet: Improving Renal Histopathology Case Assessment Through Learned Prognostic Vectors