
Authors

Talha Qaiser, Stefan Winzeck, Theodore Barfoot, Tara Barwick, Simon J. Doran, Martin F. Kaiser, Linda Wedlake, Nina Tunariu, Dow-Mu Koh, Christina Messiou, Andrea Rockall, Ben Glocker

Abstract

Whole body magnetic resonance imaging (WB-MRI) is the recommended modality for diagnosis of multiple myeloma (MM). WB-MRI is used to detect sites of disease across the entire skeletal system, but it requires significant expertise and is time-consuming to report due to the great number of images. To aid radiological reading, we propose an auxiliary task-based multiple instance learning approach (ATMIL) for MM classification with the ability to localize sites of disease. This approach is appealing as it only requires patient-level annotations where an attention mechanism is used to identify local regions with active disease. We borrow ideas from multi-task learning and define an auxiliary task with adaptive reweighting to support and improve learning efficiency in the presence of data scarcity. We validate our approach on both synthetic and real multi-center clinical data. We show that the MIL attention module provides a mechanism to localize bone regions while the adaptive reweighting of the auxiliary task considerably improves the performance.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_74

SharedIt: https://rdcu.be/cyl9l

Link to the code repository

https://github.com/biomedia-mira/atmil

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a MIL framework for analysing whole-body MRI for multiple myeloma (MM) classification. Because MM affects the bones, the first stage segments the bones from the MRI and refines the segmentations with post-processing. In the second stage, tiles extracted from the bones are passed to the MIL model for classification. An auxiliary task, anatomical label prediction, is used to help the flow of the gradient. The main task is performed after attention pooling: patients are classified as non-active, focal or diffuse. Knowledge from the auxiliary task is transferred to the main task by minimising the Fisher divergence between the two gradients.
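
The attention pooling step mentioned above can be illustrated with a minimal sketch. It assumes a common form of attention-based MIL pooling (a scalar score per instance, a softmax over the bag, then a weighted sum of instance features); the scoring vector `w` and the function name `attention_pool` are hypothetical, and this is not the paper's exact formulation or the code in the linked repository.

```python
import math

def attention_pool(instances, w):
    """Pool a bag of instance feature vectors into one bag vector.

    instances: list of equal-length feature vectors (one per tile).
    w: hypothetical scoring vector (an assumption, not from the paper).
    Returns (attention_weights, bag_vector).
    """
    # One scalar score per instance: tanh of a linear projection.
    scores = [math.tanh(sum(wi * hi for wi, hi in zip(w, h))) for h in instances]
    # Numerically stable softmax over the instances of the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Bag representation: attention-weighted sum of instance features.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(weights, instances)) for d in range(dim)]
    return weights, bag

# Toy bag of three 2-D instances (made-up numbers).
weights, bag = attention_pool([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]], [1.0, 1.0])
```

The weights sum to one, so each weight can be read as the relative contribution of a tile to the patient-level prediction, which is what makes the localisation maps possible with patient-level labels only.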

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is nicely organised and written. Very good introduction, nice background on the disease and current method of diagnosis in clinics.
    • MIL approaches are explained well. The authors give a clear explanation of their method as well as the details required to reproduce it.
    • The authors provide a good discussion and compare properly with similar state-of-the-art methods.
    • Adaptive weighting to combine the gradients of the two tasks seems to be a good extension of previous attention-based MIL methods.
    • The method could be clinically significant for the diagnosis of MM, helping experts find myeloma patterns thanks to the attention mechanism.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    A question that remains after reading the paper is whether the attention mechanism points to the parts of the bone that are relevant for the diagnosis. In the introduction, the authors state that “lesions can be scattered across the skeletal system and make up only a small fraction (≈≤ 10%)”. Fig. 4 (D) is not really in line with this statement, as the attention covers almost all of the bones.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The implementation details are nicely described and the paper seems to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Although the main and auxiliary tasks are very similar, in the last steps of training the gradients from the auxiliary task can probably be noisy and prevent convergence of the main task. The F1 scores in Table 2 support this idea: “MIL + att” alone does better than “MIL + att + uniform”, and a more sophisticated way of combining the two tasks leads to better results. Something that does not seem right is “MIL + att + WL”. The other rows of the table show that the auxiliary task helps training a lot, and the model without the auxiliary task also seems to be fine. Yet in “MIL + att + WL” the auxiliary task not only fails to help, it even performs worse than “MIL + att + uniform”. If gamma = 0.5 and the model is trained for 1000 epochs, a bigger beta value is probably desirable, because after only 10 epochs (when 1% of training is done) the coefficient falls below 0.0001 and the auxiliary task becomes useless. Revising the training of “MIL + att + WL” might place it more correctly in the table.

    • There is a lot of room for improvement in the qualitative assessment. Asking an expert to look at a few images to find the myeloma patterns, and demonstrating that the model pays attention to them, would greatly clarify the interpretability of the model while also introducing the reader to the problem. If the patches are 3D, shouldn’t the attention be visualised in 3D as well?

    • As a later extension of the work, it would be interesting to have more auxiliary tasks. They will probably not all contribute equally to training, so could w_m be different for each of them, or be learned somehow? More work in this direction seems very interesting.

    • In Table 1, accuracy is reported for different numbers of instances, but the caption says “each bag contains 100 instances”.

    • Looking at Table 2, it is a bit unclear how using a weighted loss (MIL+att+WL) results in a lower F1-score than using the same weights for both losses during training (MIL+att+uniform), or than not using the auxiliary task at all (MIL + att).

    • The dataset could be visualised better. Showing what myeloma patterns look like in fat and water or b900 scans could help the reader understand what the authors are searching for.
    • How big is the MR image and how many 16x16x16 patches are obtained roughly per sample? Attention maps seem coarse.
    • Although the method is proposed for M auxiliary tasks, it is tested only against one.
    • The limitations, cases the model fails to classify properly and future improvements are not mentioned.
    • It is unclear what mistakes the model makes. A confusion matrix or the per-class area under the PR curve could help in understanding the model’s performance.
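
The confusion matrix suggested in the last comment can be sketched as follows. This is a minimal illustration for the three patient-level classes named in the reviews (non-active, focal, diffuse); the helper names and the toy labels are invented for the example and are not results from the paper.

```python
CLASSES = ["non-active", "focal", "diffuse"]

def confusion_matrix(y_true, y_pred, classes=CLASSES):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix

def per_class_f1(matrix):
    """One-vs-rest F1 score for each class, in row order."""
    scores = []
    for k in range(len(matrix)):
        tp = matrix[k][k]
        fn = sum(matrix[k]) - tp                  # true class k, predicted otherwise
        fp = sum(row[k] for row in matrix) - tp   # predicted k, true class differs
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return scores

# Toy labels, invented for illustration only.
y_true = ["non-active", "focal", "diffuse", "focal"]
y_pred = ["non-active", "diffuse", "diffuse", "focal"]
matrix = confusion_matrix(y_true, y_pred)
f1 = per_class_f1(matrix)
```

Reading the off-diagonal entries row by row shows which classes are confused with each other, for instance whether focal cases tend to be predicted as diffuse, which a single averaged F1 score hides.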
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although some minor problems exist in reporting of the results on the dataset, in general their method is interesting and can be a step forward for the community in training of MIL models using auxiliary tasks.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The document describes a MIL method with an attention mechanism.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A major plus of this work is that only patient-level labels are needed for training.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no review of the state of the art or comparison of the results with other published methods on the same field.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I believe the paper contains enough detail to allow for reproducibility.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The document describes a Multiple Instance Learning method with an attention mechanism. A major plus of this work is that only patient-level labels are needed for training.

    Some comments include:

    • Some acronyms, such as MM, are not defined. Note that the abstract is not an integral part of the document, so a definition in the abstract does not count; acronyms should be defined again in the main text.
    • Please refrain from using common expressions such as “search of a needle in a haystack” in scientific writing.
    • I cannot find a review of the state of the art.
    • Although Table 2 presents the use of [14,20,21,22,10], I cannot find a comparison of the results with other state-of-the-art results on the same (or similar) problems.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The methodology used seems correct and the work has practical applicability.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper proposed a MIL framework for the diagnosis of multiple myeloma from WB-MRI. An attention mechanism is used to identify local regions with active disease. Additionally, they integrated an adaptive weighting scheme to leverage an auxiliary task to support the learning of the main task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and easy to follow. It includes comprehensive experiments on both a public dataset and a medical dataset and compares the proposed approach with existing MIL techniques.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There are some typos and grammatical errors that need to be fixed for the final submission.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The approach was tested on both a medical and a public dataset, which increases the reproducibility and generalizability of the technique. The details of the model were clear.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    There are some typos and grammatical errors that need to be fixed for the final submission.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    • The paper is well written and the details are clear.
    • They reported comprehensive experiments and compared the technique with existing approaches.
    • The proposed method was applied to both a medical and a public dataset.
  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose a MIL framework for the analysis of whole-body MRI for multiple myeloma (MM) classification. The framework consists of two steps. The first segments bones from the MRI; the segmentation is also used to sample instances. The second classifies patients (bag label) as non-active, focal or diffuse using multiple instance learning (MIL) with an attention mechanism and an adaptive weighting scheme that adjusts the contribution of an auxiliary task. The paper is generally well written, provides a nice explanation to justify the design of the method, and gives a sufficient comparison to other state-of-the-art MIL approaches (code is also available). Yet there are some important issues that need to be addressed in the rebuttal: 1) It is not entirely convincing that the attention mechanism points to the parts of the bone relevant for the diagnosis (R1). If this is difficult to evaluate on a real dataset, it should be straightforward to evaluate on the synthetic Morpho-MNIST dataset. 2) The other issue, raised by R3, concerns the state of the art for MM classification. It needs to be either included in the comparison or at least addressed in the introduction.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We would like to thank the reviewers and AC for their valuable time and insightful comments on our paper. The feedback seems very positive and we are grateful for the suggestions for future work. There are two main points raised by the reviewers, a) efficacy of attention mechanisms on real-world or synthetic data, and b) discussion of state-of-the-art methods for multiple myeloma (MM) classification.

Regarding the first point, the current clinical procedure for MM diagnosis with whole-body MR (WB-MR) is based on visual assessment with a disease pattern categorization on the regional/bone-level. Hence, our current goal was to generate attention maps by aggregating individual instance predictions within segmented bone regions. Localised annotations/delineations of disease patterns are not easily available and not generated as part of the clinical process. Due to the nature of MM, lesions can be scattered across the skeletal system and may cover a few or almost all of the bones. In Fig. 4, we exemplify two common scenarios where patients may have disease only in a fraction of bones or spread across multiple bones. This is shown separately for both disease patterns (diffuse and focal).

The AC is right that it is challenging to evaluate the attention maps on real data in the absence of lesion segmentations. Demonstrating the efficacy of attention mechanisms on Morpho-MNIST, however, is possible. As presented in the paper, we had transformed MNIST digits to mimic different disease patterns, as shown in Fig. 3. To validate the effectiveness of identifying meaningful instances via the attention mechanism, after processing each bag, we can sort the instances in descending order with respect to their attention weights (as defined in eq 2). We observe that digits with local thickness and with global & local thickness (representing disease categories) attain higher attention weights for positive bags, confirming the efficacy of attention. These illustrations will be added as supplementary material.
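
The ranking check described above can be sketched in a few lines. This is an illustration under assumptions: the function name `rank_by_attention`, the instance tags and the attention values are all made up for the example and are not taken from the paper or its supplementary material.

```python
def rank_by_attention(attention_weights, instance_tags, top_k=3):
    """Return the top_k (tag, weight) pairs of a bag, highest attention first."""
    order = sorted(
        range(len(attention_weights)),
        key=lambda i: attention_weights[i],
        reverse=True,
    )
    return [(instance_tags[i], attention_weights[i]) for i in order[:top_k]]

# One hypothetical positive bag: made-up attention weights and perturbation tags.
weights = [0.05, 0.40, 0.10, 0.45]
tags = ["plain", "local thickness", "plain", "global & local thickness"]
top = rank_by_attention(weights, tags, top_k=2)
```

If the attention mechanism behaves as intended, the instances carrying the disease-like perturbation should appear at the head of this ranking for positive bags.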

Regarding the second point, diffusion-weighted WB-MR is a new diagnostic tool, only recently being recommended for patients with myeloma but not yet widely used. To our knowledge, there is very little if any previous work on developing computational methods for MM classification in WB-MR. Prior work mostly focused on PET/CT and is not directly applicable. We will clarify this in the introduction and add pointers to the literature on PET/CT MM imaging.

R2 has made great suggestions for an extended version. We will add the requested details about the image data and patch sizes, and we have a more detailed analysis including a confusion matrix readily available which can be added to the supplement. R2 further makes good observations about the behaviour of the ‘MIL+att+WL’ variant, and we agree that this would be interesting to investigate further, as the training setup might be sub-optimal. We also agree that the investigation into multiple auxiliary tasks is an interesting direction for future work. Thanks for the suggestions.

R3 asks for a discussion of previous work which is related to the second point above. To our knowledge, there are currently no computational approaches for MM classification using WB-MR. We will make this more clear in the introduction and add pointers to other works in the context of PET/CT imaging.

R4 points out some minor corrections which have been incorporated.




Post-rebuttal Meta-Reviews

Meta-review #1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have addressed all concerns and I happily recommend acceptance of the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    While the authors only partially addressed the concerns in their rebuttal, the paper addresses an innovative application with sound methodology and evaluation. I therefore recommend acceptance of the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I think the authors have done a good job in addressing the reviewers’ questions. I recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2


