
Authors

Amoon Jamzad, Alice Santilli, Faranak Akbarifar, Martin Kaufmann, Kathryn Logan, Julie Wallis, Kevin Ren, Shaila Merchant, Jay Engel, Sonal Varma, Gabor Fichtinger, John Rudan, Parvin Mousavi

Abstract

PURPOSE: Deployment of deep models for clinical decision making should provide not only predicted outcomes but also insight into how decisions are made. Considering the interpretability of Transformer models and the power of graph networks in analyzing the inherent hierarchy of biological signals, a combined approach would be the next-generation solution in computer-aided interventions. In this study, we propose a framework for classification and visualization of surgical mass spectrometry data using a Graph Transformer model to empower the interpretability of breast surgical margin assessment.
METHODS: Using the iKnife, 144 burns (103 normal, 41 cancer) were collected and converted to multi-level graph structures. A Graph Transformer model was modified to output the intermediate attention parameters of the network. Besides an ablation study and a prospective study, we propose multiple attention visualization approaches to facilitate interpretability.
RESULTS: In a 4-fold cross-validation experiment, an average classification AUC of 95.6% was achieved, outperforming baseline models. We could also distinguish and visualize clear patterns of attention difference between burns. For instance, cancerous and normal burns gather more attention in the lower and higher subbands of the spectra, respectively. Looking at cancer subtypes prospectively, a pattern of cancer progression was also observed in the attention features.
CONCLUSION: Graph Transformers are powerful in providing high network interpretability. When paired with proper visualization, they can be deployed for computer assisted interventions.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_9

SharedIt: https://rdcu.be/cyl74

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    In this paper, the authors propose the use of Graph Transformer networks to empower the interpretability of breast surgical margin assessment using the iKnife modality.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The clinical background is strong and useful. The data is original. Demonstration of clinical feasibility is enough. Evaluation of this work is sufficient.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The model parameter settings were not provided. There is no significant improvement in this work compared with previous ones. Are there any technical contributions besides the accuracy?

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The code and data could be provided via a link so that readers can reproduce the work and make comparisons.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The major contribution of this work is updating the Graph Transformer network structure. So, besides the information in the figures, the authors need to clarify the number of parameters, layer settings, activation functions, and optimization methods. These are not described in detail in the paper. Please add a table specifying them so that readers can follow up.

    Please add comparisons with the other listed works and provide the p-values of the models to illustrate the better performance.

    Please redo the validation experiments and verify other outcomes, such as computational complexity, number of parameters, running time, data robustness, etc.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Strong application and model. Evaluation of this work is also sufficient.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The authors propose to extend the concept of Transformers and graph neural networks to surgical margin detection, and to better interpret the classification predictions. The proposed method is tested on only one dataset.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Thorough experiments

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    ++ A crucial problem is that the authors claim, without substantiating or investigating it, that the values given by the Softmax function can generally be used as confidences, while many works show the opposite for modern neural networks, e.g. https://arxiv.org/pdf/1706.04599.pdf
    ++ There are no tests comparing the proposed solution with other algorithms. Without these comparisons, it is difficult for the reader to know whether the proposed method actually performs better than state-of-the-art models.
    ++ One dataset is not enough to prove the efficiency of the proposed method.
    ++ The proposed work is incremental, but due to the lack of comparison with existing approaches, it is hard to ascertain its efficacy. It is insufficient to compare only with the model that you are improving on.
    ++ The proposed method should be written in algorithm format. This would dramatically improve readability.
    ++ Few papers are cited. Please add more references so that readers can see current trends in the broader domain.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No links to either the proposed dataset or the code are provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please see above

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Lack of technical contribution. Only existing methods are used. No comparison to other methods.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    Re-excision rates for women with invasive breast cancer undergoing breast-conserving surgery have decreased in the past decade but remain substantial. This high positive margin rate after lumpectomy is mainly due to the inability to assess the entire surface of an excised specimen accurately during surgery. Many techniques have been proposed to accurately and rapidly detect cancer cells on the surface of excised breast specimens, including mass spectroscopy (iKnife). The authors propose using a Graph Transformer model and a GNN to better assess the surgical specimens. In addition, the authors evaluate multiple attention visualization approaches to increase the interpretability of these models.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The 4-level hierarchical graph is a nice visual for displaying a spectrum.

    The authors very nicely explain their network architecture, and the ablation burns are a clever surrogate for positive margins.

    Interpretability of models are included in these experiments. An important component that most authors don’t bother to incorporate.

    The authors also evaluate on a small prospective test set of 4 patients.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Table 1 nicely shows the performance of the Graph Transformers. The performance metrics continue to improve with 3 Graph Transformer layers (GTLs). What was the rationale for stopping at 3 GTLs?

    Although ablation burns were used, the models should have been evaluated on surgical margins with residual tumors. Burns, especially those caused by surgical cautery, do not have the heterogeneity of tumor cells, which have skip lesions and different depths of tumor invasion.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although the authors do a nice job explaining their network architecture, insufficient details were provided for the explainability component, specifically how the multiple attention visualizations were generated.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors propose a set of elegant experiments, and the XAI aspect is very interesting. However, my enthusiasm is dampened by two issues. First, rather than ablation, the authors should have looked at positive margin rates and seen how their models performed compared to a pathologist. Second, iKnife and other mass spectroscopy techniques are really research tools in this setting. As a result, the DL techniques proposed in this paper are quite far from clinical translation.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Positive surgical margins are a good problem to address, but the authors' approach using mass spectroscopy makes it a niche solution. As a result, I wonder about the general interest in this paper.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose a Graph Transformer model to empower the interpretability of breast surgical margin assessment. The model performs classification and visualization of surgical mass spectra. It is difficult to ascertain the efficacy of the proposed model due to the lack of comparison with existing methods.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9




Author Feedback

The reviewers acknowledged the clarity of our manuscript, strength of the clinical application, thorough experiments, and the significance of model interpretability included in this work. They also asked for clarification on comparison with other methods, gold standard labeling, and the implementation of the attention mechanism. Minor modifications were also requested which will be addressed in the paper within the page limit.

*Comparison with other methods
In our paper we compared the performance of our graph transformer (GT) model (AUC=95%) with a 3-dense-layer non-graph model (AUC=85%) and found the improvement to be statistically significant (p-value 0.001). To address the reviewers' recommendation, we have also implemented and added two new baseline models for comparison: a non-graph CNN model (AUC=87.6%) and a non-transformer graph model with 3 graph convolutional layers [Kipf 2017] (AUC=91.8%). Our model outperforms the new baselines (p-value 0.008 and 0.022, respectively). While our proposed GT model achieves high performance, performance is not the only goal of our study. Our contributions also include i) representing and generating a multi-level hierarchical graph with rich node features that preserve the critical positional information for intra-operative mass spectrometry, for margin detection; and ii) interpretation of the model and visualization of the transformer attentions to provide feedback to clinicians. As R3 nicely articulated, “Interpretability of models are included in these experiments, an important component that most authors don’t bother to incorporate”. Particularly for computer assisted interventions (CAI), the interpretability of the results is often as critical as performance. We will add the baselines to our table and rephrase the last paragraph of the introduction to clarify these contributions.

*Gold standard labeling & Performance on other Data
Our manuscript is a contribution to CAI in which the data was collected in the clinic and represents an accumulation of 2 years of acquisition. While it would be ideal to have true mixed-margin (mixed cancer and normal cells) burns to test our application, it is not possible to explicitly collect these without compromising routine pathology assessment. As our intraoperative mass spectrometry approach, iKnife, is a destructive method, we are only given sampling opportunities from excised tissue at locations that are confidently homogeneous, and at the discretion of a pathologist. Therefore, as the reviewers mention, our data is the ideal surrogate to utilize. The performance of graph transformer networks has been evaluated on several graph datasets, including ZINC (molecular dataset), CLUSTER (synthetic dataset with 6 node clusters), and PATTERN (graph dataset with binary node labels) [Dwivedi 2021]. We will add clarification on both of these points in the paper.

*Attention mechanism & softmax
We agree with our reviewer that “the prediction probability from a softmax distribution has a poor direct correspondence to confidence” [Hendrycks 2017], and do not use softmax for any confidence calculations. Our model uses a softmax to generate normalized weighted scores that represent the attention distribution within the layer. Unlike confidence, which corresponds to the final classification prediction, the attention scores contribute to updating the node features in each graph layer of our model.
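
The distinction above can be illustrated with a minimal sketch, which is not the implementation from the paper. It assumes a single attention head, plain scaled dot-product scores, and no learned projections; the names `softmax`, `attend`, `query`, `keys`, and `values` are hypothetical. The softmax here only normalizes the scores of one node over its neighbours, and the resulting weights are used to update that node's feature. They are also returned so they could be visualized.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Single-head scaled dot-product attention for one node over its neighbours.

    Returns (updated_feature, attention_weights). The weights sum to 1 and
    describe how much each neighbour contributes to the update; they are not
    a confidence in any class prediction.
    """
    d = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys
    ]
    weights = softmax(scores)
    dim = len(values[0])
    updated = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return updated, weights


# Toy example (made-up numbers): one node attending to three neighbours.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
updated, weights = attend(query, keys, values)
```

In this toy case the first neighbour, whose key is most aligned with the query, receives the largest weight, and the updated feature is a weighted average of the neighbour values.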

*Code availability & Parameters
The code will be made available on the authors' GitHub. The ablation process for layer hyperparameters (# GT layers, # hidden features, # attention heads) is mentioned in the paper. We tried up to 4 GT layers and found that the performance plateaued after 3. We decided to use 3 layers to minimize the computational complexity and avoid overfitting. Following recommendations from reviewers, we have added details of # parameters (490K), optimizer (Adam, lr=1e-4), batch size (16), and activation (ReLU) to the manuscript.
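
For readers who want these settings in one place, they could be collected in a small configuration object, sketched below. The class and field names (`TrainingConfig`, `num_gt_layers`, and so on) are hypothetical and are not taken from the authors' code. The number of hidden features and attention heads is not stated in this rebuttal, so those fields are left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainingConfig:
    """Settings reported in the rebuttal; field names are illustrative only."""

    num_gt_layers: int = 3                  # performance plateaued after 3 layers
    optimizer: str = "Adam"
    learning_rate: float = 1e-4
    batch_size: int = 16
    activation: str = "ReLU"
    approx_num_parameters: int = 490_000    # reported as 490K
    hidden_features: Optional[int] = None   # not stated in the rebuttal
    attention_heads: Optional[int] = None   # not stated in the rebuttal


config = TrainingConfig()
```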

New references have been added as well.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose a Graph Transformer model to empower the interpretability of breast surgical margin assessment. The model performs classification and visualization of surgical mass spectra. The authors did well in the rebuttal. Most of the questions and concerns raised by the reviewers (including the meta-reviewer) are well addressed.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper uses a graph transformer for surgical margin characterization. The major concern, i.e., the lack of baselines, was addressed well in the rebuttal. Meanwhile, the visualization and explanation perspective is particularly appealing. I recommend that the paper be accepted. The paper could be further strengthened if the authors cited other work on explaining GNNs, e.g., GNNExplainer.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes to apply graph transformers to improve interpretability of surgical mass spectrometry.

    One reviewer finds the use of iKnife data original but notes a limited contribution and a lack of key technical details.

    A second reviewer highlights a lack of comparison with other algorithms, limited evaluation, and lack of technical contribution.

    A third reviewer also highlights a limited evaluation, notably on residual tumors, and a lack of technical details on the multiple attention visualizations.

    A consensus exists on a limited technical contribution, as an existing method has been used on new data, and on a limited validation. A major issue is the missing comparison with existing or equivalent methods, which hinders appreciation of the contribution. To my knowledge, the authors did not propose graph transformers. The rebuttal did not clarify the novelty of the contribution. This is despite the use of data that is original in the medical image analysis field. New experiments with new baselines were carried out post-submission; this is beyond what should be evaluated at submission time. We encourage a new submission.

    For these reasons, the recommendation is toward rejection.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    23


