
Authors

Siyuan Liu, Kim-Han Thung, Weili Lin, Pew-Thian Yap

Abstract

Deep learning based image quality assessment (IQA) is useful for automatic quality control of medical images but requires a large amount of training data. Though using multi-site data can significantly increase the training sample size and improve the performance of the IQA model, there are technical and legal issues involved in the sharing of patient data across different sites. When data are not sharable, devising a single IQA model that is applicable to all sites is challenging. To overcome this problem, we introduce a multi-site incremental IQA (MSI-IQA) method for structural MRI, which first trains an IQA model at one site, and then sequentially and incrementally improves the IQA model at other sites using transfer learning and consensus adversarial representation adaptation (CARA), without explicit data sharing between sites.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_36

SharedIt: https://rdcu.be/cyl8w

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors designed a multi-site incremental IQA system for automatic QA of structural MRI using multiple deep learning approaches.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The proposed method applies several deep learning approaches to automatically assess the quality of cross-site MR images without retraining the model from scratch when new data are added. Overall this is an interesting and potentially very useful system.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    MR images at this age range show dramatic structural alterations, which are not validated or discussed in the authors’ results and discussion.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Matches what the authors claimed in the reproducibility checklist.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Overall this is a very interesting, straightforward, and user-friendly design with a lot of potential. The only concern I have is that MR images acquired between 6 and 12 months of age show many structural alterations, such as the inversion of GM/WM intensity contrast and rapidly changing partial volumes. I am not sure whether transfer learning is able to address these issues, and whether they would influence the image QA. I also look forward to seeing more results on aging images using the method proposed by the authors.

  • Please state your overall opinion of the paper

    ground-breaking (10)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The organization of the article, the novelty, and the results.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes a deep-learning-based incremental multi-site IQA pipeline for pediatric structural MR images that avoids the need for inter-site data sharing.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well organized, and the proposed methods are reasonable for the task addressed in the paper.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The numerical results are too weak to support the new pipeline, as the authors only list an accuracy table; there is no parameter tuning and no sensitivity analysis. The pipeline also combines several existing modules, such as transfer learning and random forests.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I am not 100% confident that authors (not specific to this paper) will release the data/code, as many researchers agree to do so but refuse after acceptance. I do not think this question should be decisive for acceptance, but it is a good way to encourage open-access code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I would suggest that the authors conduct more numerical experiments and provide more evidence for the proposed idea.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The innovation in terms of methodology seems limited, and the numerical experiments do not sufficiently illustrate the method’s effectiveness.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Somewhat confident



Review #3

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel framework that performs image quality assessment via incremental training on data from different sites without sharing the actual images. The method is based on transfer learning and adversarial representation adaptation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of developing a framework that avoids sharing the data is great and the clinical applications for this can be relevant.

    The paper is well written and easy to follow.

    Although some elements of the pipeline are fairly standard, the proposed method seems to provide some contributions.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    There is no ablation study on the proposed approach.

    The approach is not compared with other state-of-the-art solutions or baselines, so it is hard to evaluate the possible improvement and the actual contribution of the proposed method.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although the code and the data are not publicly available, the paper provides enough details to make the method reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    What is the difference between the proposed approach and approaches proposed in the domain adaptation field?

    An ablation study, obtained by disabling the different blocks of the proposed pipeline, should be presented to clarify the contribution of each element of the pipeline.

    From the proposed experiments, it is obvious that using more data (although not directly) in training would increase the classification performance. Another experiment I would suggest is comparing the proposed method with a simple baseline obtained by training the network on all the images (from both source and target domains) in one go, without transferring the model between sites. Does your approach achieve similar performance?

    Comparison with other existing approaches is missing in the experiments.

    Other possible combinations of source and target domains should have been included in the experiments (e.g., swapping the source and target domains).

    It is not clear to me what the difference is between existing and new data in the dataset. Additionally, what is the purpose of the unlabelled data in the dataset, which seem not to be used in the experiments? The overall description of the dataset should be improved.

    The flow in Fig. 1 is not easy to follow. The four stages seem disconnected, and it is not clear which models or blocks are transferred between these stages.

    The authors mention that the pipeline was developed with computational time in mind, but no experiments are conducted on the speed of the approach.

    The three criteria described in Section 2.2, “Slice Self-training”, are not clear to me. Can you be more explicit about the iterations and the probability mentioned there?

    In Section 2.3, the authors propose to relabel slices with high confidence. Can you explain who performs this relabelling and how high confidence is determined?

    Minor: What does the following sentence mean? “Note that due to site and age differences, the data from these three datasets may have different distributions, which we resolve using transfer learning and adversarial learning in MSI-IQA”

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A comparison with other state-of-the-art approaches is essential to understand the contribution of the proposed method.

    An ablation study that illustrates the improvements provided by each of the different components would be relevant to understanding the importance of the different parts of the pipeline.

    The unlabelled data in the dataset seem not to be used in the paper.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper works on image quality assessment for clinical applications. The reviewers reached consensus on the novelty and relevance of the work. Although the overall comments are positive, concerns about the experimental design and the lack of comparisons were also raised. I invite the authors to address these issues in the rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

Reviewer #3

  • No parameter tuning and sensitivity analyses: The confidence thresholds in the slice/volume self-training procedures are used to relabel or remove slices/volumes with noisy labels and to retain slices/volumes that can be classified with high confidence for network retraining. The thresholds are set according to the training accuracy. The high sensitivity, specificity, and accuracy results in Tables 1 and 2 indicate the effectiveness of our method.

Reviewer #4

  • Difference between the proposed method and domain adaptation methods: Most domain adaptation methods require all training data to be correctly annotated and are not designed for our setting, where incorrect labels may exist in the training data. We overcome this problem by employing fine-tuning-based transfer learning instead of domain adaptation in Stage 2. In addition, we employ domain adaptation in Stage 3, i.e., CARA, for incremental adaptation to newly acquired images at the target site.
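
    A minimal sketch of the Stage 3 idea, adversarial representation adaptation, follows. This is a generic ADDA-style alignment loop, not the exact CARA formulation; the function names, optimizers, and losses are illustrative assumptions.

    # Generic adversarial feature alignment (ADDA-style sketch); the exact CARA
    # losses and architectures are not reproduced here. Only encoder weights and
    # features cross the site boundary -- never the images themselves.
    import torch
    import torch.nn as nn

    def adapt_encoder(ref_encoder, tgt_encoder, disc, ref_loader, tgt_loader,
                      epochs=10, lr=1e-4):
        bce = nn.BCEWithLogitsLoss()
        opt_d = torch.optim.Adam(disc.parameters(), lr=lr)
        opt_t = torch.optim.Adam(tgt_encoder.parameters(), lr=lr)
        ref_encoder.eval()  # the reference encoder stays frozen
        for _ in range(epochs):
            for xr, xt in zip(ref_loader, tgt_loader):  # image batches only
                # 1) Discriminator learns to tell reference vs. target features apart.
                with torch.no_grad():
                    fr = ref_encoder(xr)
                ft = tgt_encoder(xt).detach()
                logits = disc(torch.cat([fr, ft]))
                labels = torch.cat([torch.ones(len(fr), 1), torch.zeros(len(ft), 1)])
                loss_d = bce(logits, labels)
                opt_d.zero_grad(); loss_d.backward(); opt_d.step()
                # 2) Target encoder learns to fool the discriminator, i.e., to map
                #    new images into the reference feature distribution.
                loss_t = bce(disc(tgt_encoder(xt)), torch.ones(len(xt), 1))
                opt_t.zero_grad(); loss_t.backward(); opt_t.step()
        return tgt_encoder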

  • Lack of ablation study: Tables 2 and 3 show the quantitative results on existing and new data at the target site, respectively, which can be considered ablation studies. Specifically, “source-site training” corresponds to MSI-IQA with Stage 1, and “individual target-site training” corresponds to MSI-IQA with Stage 2. MSI-IQA with Stages 1-3 outperforms these two ablated variants on both existing and new data at the target site, indicating the effectiveness of each element of our method.

  • Comparison with existing approaches: To the best of our knowledge, this is the first attempt to use a small dataset with label noise to train a deep neural network for multi-site incremental IQA of pediatric MRI. Other deep learning-based IQA methods are all trained on clean datasets whose labels are correctly annotated.

  • What are unlabeled images used for? The source-site unlabeled images are utilized in slice self-training to augment the small labeled dataset. Slices that can be classified confidently are selected for slice self-training.

  • Difference between existing and new data at the target site: The existing target-site data are images of 10- to 13-month-old subjects, and the new target-site data are images of 13- to 24-month-old subjects.

  • How are weights transferred across the four stages? The weights of the source-site-trained encoder and classifier from Stage 1 are used as the initial weights of the encoder and classifier in Stage 2, respectively. In Stage 3, the weights of the encoder trained with existing target-site images are used as the initial weights of the encoder trained with newly acquired target-site images. In Stage 4, no weights are transferred.
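
    The following hypothetical PyTorch sketch illustrates this weight hand-off; the placeholder architectures stand in for the paper's actual shallow residual network.

    import copy
    import torch.nn as nn

    def make_encoder():
        # Placeholder; the paper uses a shallow residual network for SQA.
        return nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                             nn.AdaptiveAvgPool2d(1), nn.Flatten())

    def make_classifier():
        return nn.Linear(8, 2)  # pass/fail quality labels (assumed)

    # Stage 1: encoder and classifier trained at the source site (loop omitted).
    src_encoder, src_classifier = make_encoder(), make_classifier()

    # Stage 1 -> Stage 2: only weights cross the site boundary, never images.
    tgt_encoder, tgt_classifier = make_encoder(), make_classifier()
    tgt_encoder.load_state_dict(src_encoder.state_dict())
    tgt_classifier.load_state_dict(src_classifier.state_dict())
    # ... fine-tune tgt_encoder and tgt_classifier on existing target-site data ...

    # Stage 2 -> Stage 3: the fine-tuned encoder initializes the encoder that is
    # adapted (via CARA) to newly acquired target-site images.
    new_encoder = copy.deepcopy(tgt_encoder)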

  • Lack of computational results: We use lightweight models for SQA and VQA, i.e., a shallow residual network for SQA and a random forest for VQA, for speed. The computational costs are 10.71 ms per slice and 355 ms per volume, respectively.
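
    Below is a hypothetical sketch of how such a two-level design can be wired together: the slice-level network (SQA) produces per-slice quality probabilities, and a random forest (VQA) classifies the volume from a summary of those probabilities. The histogram features here are an assumption for illustration, not the paper's exact feature set.

    import numpy as np
    import torch
    from sklearn.ensemble import RandomForestClassifier

    def volume_features(slice_probs, bins=10):
        # Summarize a volume by the histogram of its slice-level "pass" probabilities.
        hist, _ = np.histogram(slice_probs, bins=bins, range=(0.0, 1.0))
        return hist / max(len(slice_probs), 1)

    def score_volume(slice_model, volume):
        # volume: (num_slices, 1, H, W) tensor holding the slices of one scan
        with torch.no_grad():
            probs = torch.softmax(slice_model(volume), dim=1)[:, 1].numpy()
        return volume_features(probs)

    # VQA: a random forest over per-volume feature vectors, e.g.
    #   X = np.stack([score_volume(slice_model, v) for v in volumes])
    #   rf = RandomForestClassifier(n_estimators=100).fit(X, volume_labels)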

  • What are “iteration” and “probability” in slice self-training? An “iteration” is one repetition of the relabeling and retraining process. “Probability” refers to the output probability of the classifier.

  • Relabeling slices with high confidence: Our method compares the maximum probability with a confidence threshold to determine which slices to relabel or remove. If the predicted label is identical to the prediction in the previous iteration, or the label is predicted with high confidence, i.e., the maximum probability exceeds a threshold, the slice is kept in the training data; otherwise, the slice is removed.
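
    This keep/relabel/remove rule can be summarized in a short sketch; the threshold tau is a free parameter, which the response above states is set from the training accuracy.

    import numpy as np

    def self_training_step(probs, prev_preds, tau=0.9):
        """probs: (N, C) classifier output probabilities for N slices;
        prev_preds: (N,) labels predicted in the previous iteration;
        tau: confidence threshold (the value here is an assumption)."""
        preds = probs.argmax(axis=1)  # current predicted labels
        conf = probs.max(axis=1)      # maximum class probability
        # Keep a slice if its prediction is stable across iterations or is made
        # with high confidence; kept slices are relabeled with the current prediction.
        keep = (preds == prev_preds) | (conf > tau)
        return np.flatnonzero(keep), preds[keep]

    # Example: three slices, binary pass/fail
    probs = np.array([[0.95, 0.05], [0.55, 0.45], [0.40, 0.60]])
    prev = np.array([1, 1, 1])
    kept, labels = self_training_step(probs, prev)
    # Slice 0 is kept and relabeled (high confidence), slice 1 is removed
    # (unstable and low confidence), slice 2 is kept (stable prediction).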




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed the main concerns on ablation experiments and comparison to previous work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    9



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal addressed some points, but some important points, especially those raised by R4, remain unaddressed or insufficiently addressed. The authors position their paper in a very narrow niche of “multi-site incremental IQA with label noise” and claim that comparisons to other SOTA solutions are not warranted because those are not designed to work with label noise. I believe it is not a priori clear that related work would automatically fail in the noisy-label scenario; rather, this assumption should be empirically verified. In a related vein, we never see the benefit of individual components such as the slice self-training in an ablation study. So it remains open how much the mechanisms against label noise even add to the method.

    I believe this is promising work that needs some more evaluation and, due to its large number of components, should perhaps be presented in a less compact medium such as a journal submission.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    15



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper proposes cross-site image quality assessment of MR images based on incremental training without sharing the raw images. The request for more numerical experiments, comparisons to the state of the art, and an ablation study has only been partly answered. But overall, the reviews are positive and the authors addressed most of the concerns.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5


