
Authors

Mengda Guan, Yuanyuan Lyu, Wanyue Cao, Xingwang Wu, Jingjing Lu, S. Kevin Zhou

Abstract

The quality of a chest X-ray image, or radiograph, which is widely used in clinics, is a very important factor that affects doctors’ clinical decision making. Since there has been no chest X-ray image quality database so far, we conduct the first study of perceptual quality assessment of chest X-ray images by introducing a Chest X-ray Image Quality Database, which contains 2,160 chest X-ray images obtained from 60 reference images. In order to simulate the real noise in X-ray images, we add different levels of Gaussian noise and Poisson noise, the types most commonly found in X-ray images. Mean opinion scores (MOS) have been collected by performing user experiments with 74 subjects (25 professional doctors and 49 non-doctors). The availability of MOS allows us to design more effective image quality metrics. We use the database to train a blind image quality assessment model based on deep neural networks, which attains better performance than conventional approaches in terms of the Spearman rank-order correlation coefficient and the Pearson linear correlation coefficient. The database and the deep learning models are available at https://github.com/ICT-MIRACLE-lab/CXIQ.
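For concreteness, the sketch below shows one plausible way to apply the two degradations named in the abstract (Poisson noise, which dominates at low dose, and additive Gaussian noise) to a normalized radiograph. The function name, parameter values, and photon-count scaling are illustrative assumptions, not the authors’ exact simulation settings.

```python
import numpy as np

def degrade(image, gauss_sigma=0.02, dose=0.5, rng=None):
    """Add Poisson (quantum) and Gaussian (electronic) noise to an image
    normalized to [0, 1]. All parameter values here are illustrative."""
    rng = np.random.default_rng() if rng is None else rng
    photons = 1e4 * dose                       # lower 'dose' -> fewer photons -> noisier
    noisy = rng.poisson(image * photons) / photons
    noisy = noisy + rng.normal(0.0, gauss_sigma, size=image.shape)
    return np.clip(noisy, 0.0, 1.0)
```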

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_30

SharedIt: https://rdcu.be/cyl8q

Link to the code repository

https://github.com/ICT-MIRACLE-lab/CXIQ

Link to the dataset(s)

https://github.com/ICT-MIRACLE-lab/CXIQ


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces a dataset for CXR image quality assessment. The authors also evaluated several well-known CNN models and reported the results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • dealing with the lack of CXR quality dataset
    • collecting subjective scores and producing MOS
    • reasonable perturbations inspired by references
    • sharing the dataset
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Nothing significant to mention
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    High if they share the dataset and trained models.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • I suggest rephrasing “our method” as something else, because using benchmark CNN networks for quality assessment is not new and I do not think it is the contribution of the paper (whereas reporting benchmark accuracies is).
    • If you are going to produce a benchmark, I suggest fixing the train/test or cross-validation splits and sharing them (plus the results of the benchmark networks) as metadata for the dataset.
    • There is a concern about data leakage. You mentioned selecting 80% of the data randomly for the training phase. If the test and train subsets include the same subjects, I do not believe the benchmark results constitute blind quality assessment. We should seriously consider this; maybe that is the reason the results are so high.
    • Another concern is that, while you are sharing the dataset and hopefully the trained models, the accuracy of deep models on this dataset is already high, so there would be little room for improving the results; only the shared models themselves would then be of use. To deal with this concern, I suggest producing more artifacts, such as annotation marks made by people, the effect of the ribs, adversarial methods for CNNs, etc.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The chance of subject leakage between the train and test sets degrades the score of the paper in my opinion. So, just the shared dataset would be the important contribution.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The authors conducted a subjective experiment on a chest X-ray image database after simulating several noise conditions. They then use the annotated data to train deep learning models for objective evaluation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The database with subjective results, which according to the authors will be made public. The study of different deep learning models for objective evaluation.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The lack of clarity about the training procedure.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Seems possible to reproduce if the authors share the database as they plan.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors should consider

    1) Add a citation to the paper L. Lévêque et al., “On the Subjective Assessment of the Perceived Quality of Medical Images and Videos,” 2018 Tenth International Conference on Quality of Multimedia Experience (QoMEX), 2018, pp. 1-6.

    2) Be clear about how many subjects evaluated each image.

    3) The noise levels in the plots of Fig. 3 should be shown in a larger font, as should the x-axis and y-axis labels.

    4) I do not agree that the deep learning models produce very good results, as stated by the authors on page 7. The fact that the correlations are higher does not mean that they are good for all the tested deep learning models.

    5) The authors are also not very clear about how the training is conducted. I believe that one part was used for training while the other was used for testing (train on Part I and test on Part II, and train on Part II and test on Part I), but this is not completely clear. The authors also do not say how they obtain the aggregated results.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It is a paper that brings added value because of the new database that will be made public. This will attract many citations.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    1) The authors conducted a study of perceptual quality assessment of chest X-rays and introduced a chest X-ray image quality database. 2) The authors adopted and improved an existing subjective experimental methodology to better evaluate the quality of medical images. 3) The authors trained blind image quality assessment models built on deep neural networks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors presented a novel solution and dataset for medical image quality assessment.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not clear what the heatmap visualization means. The authors need to provide more reasoning about the heatmaps, and perhaps conduct more experiments, since right now they just show that the model pays attention to the lung region but nothing more specific.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The details described are enough to reproduce the pipeline, and the authors mentioned that they plan to open source the entire study.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors may consider adding a couple of sentences about future work. It would be clearer if the authors specified exactly which models they use. How many layers for DenseNet, ResNet, ResNeXt, Wide ResNet, and VGG?

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novelty, clinical translation and applicability.

  • What is the ranking of this paper in your review stack?

    5

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors collected a dataset for assessing chest radiograph quality. A large number of subjects participated in the experiments of evaluating the image quality. Medical image quality assessment is widely used, especially for CT and MRI reconstruction. In contrast to providing a subjective score to the images, some objective evaluation metrics, such as SSIM and GMSD, have been proposed and used before.

    However, there are many details missing about the methods and results. The low-quality images are generated by intentionally adding noise. What is the distribution of the qualities of the original images? It would be better to demonstrate the quality assessment of some real high-quality and low-quality images to show the practical value. The technical part of assessing the quality is trivial, based on a set of well-known CNN models. The heat map is not meaningful. In the rebuttal, I hope the authors can clarify the concerns raised by all the reviewers.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3




Author Feedback

We thank the AC and reviewers for their valuable suggestions and comments. In the following, we reply to the main points raised by the AC and reviewers. We will revise the final paper carefully to address all comments from all reviewers.

-Q: What is the distribution of the qualities of the original images? A: We aimed to select original images of high quality and did the following to guarantee this. We first selected a few public datasets with good image quality and then gave these datasets to three physicians for comparison. After comparison, they chose the public dataset with the best image quality as the source dataset used in our paper. Then, with the physicians’ help, we further selected 60 images from this dataset as the original images of our Chest X-ray Image Quality Database.

-Q: Assess the quality of some real high-quality and low-quality images to show the practical value. A: To the best of our knowledge, there is no public real X-ray dataset containing the same images with different levels of noise for the requested quality assessment. Instead, we selected 14 frames from 14 private X-ray videos of the heart, chest, etc., one per video. The videos were collected with different doses; thus, the images are contaminated with different levels of noise. However, since the contents of these images differ, it is inappropriate to invite subjects to directly score such images. We only performed the subjective comparison (as in Part I) to obtain MOS for these 14 images. Then, we used the pre-trained model to predict MOS. The predicted MOS matches the subjective MOS for these 14 images: the SROCC is 0.889 and the PLCC is 0.884. The result suggests that a deep learning model could potentially be applied to real images to predict their perceptual quality.
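The SROCC and PLCC quoted here are the standard Spearman rank-order and Pearson linear correlation coefficients between subjective and predicted MOS; they can be computed directly with SciPy. The scores in the toy example below are invented for illustration and are not the paper’s data.

```python
import numpy as np
from scipy.stats import spearmanr, pearsonr

def correlation_metrics(mos_subjective, mos_predicted):
    """Return (SROCC, PLCC) between subjective and predicted MOS values."""
    srocc, _ = spearmanr(mos_subjective, mos_predicted)
    plcc, _ = pearsonr(mos_subjective, mos_predicted)
    return srocc, plcc

# Toy scores, invented for illustration only.
subjective = np.array([4.1, 3.2, 2.8, 4.6, 1.9])
predicted = np.array([3.9, 3.4, 2.5, 4.4, 2.2])
print(correlation_metrics(subjective, predicted))
```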

-Q: The concern about data leakage. A: We divided the dataset into training and test sets by group (a total of 60 groups), instead of shuffling all the images. Thus, there is no data leakage problem here.
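One way to realize such a group-wise 80/20 split is sketched below, assuming the 2,160 images divide into 60 groups of 36 degraded versions each (2,160 / 60); the variable names and the use of scikit-learn’s GroupShuffleSplit are assumptions for illustration, not necessarily the authors’ implementation.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

image_ids = np.arange(2160)                # one id per degraded image
group_ids = np.repeat(np.arange(60), 36)   # 36 degraded versions per reference image

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(image_ids, groups=group_ids))

# No reference image (group) appears in both subsets, so there is no leakage.
assert not set(group_ids[train_idx]) & set(group_ids[test_idx])
```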

-Q: About training details. A: Each image in CXIQ has three MOSs: the MOS of Part I, the MOS of Part II, and the Aggregate MOS. The method for obtaining these MOSs is described in Section 2.1 ‘Data construction’ on Page 4. We train and test five commonly used networks separately for each type of MOS, so we do not train on Part I and test on Part II, or train on Part II and test on Part I. The results are shown in Table 1 of the paper. For the network models, we chose DenseNet121, VGG19, ResNet50, ResNeXt101, and Wide ResNet50.
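As a rough picture of what training one of these networks for MOS regression could look like, the sketch below attaches a single-output head to a pretrained ResNet50 and uses an MSE loss; the choice of loss, pretrained weights, and input handling are assumptions, not the authors’ documented configuration.

```python
import torch.nn as nn
from torchvision import models

class MOSRegressor(nn.Module):
    """Pretrained backbone with a scalar regression head for MOS prediction
    (a sketch under assumed settings, not the authors' exact setup)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        backbone.fc = nn.Linear(backbone.fc.in_features, 1)  # predict a single MOS value
        self.backbone = backbone

    def forward(self, x):          # x: (N, 3, H, W)
        return self.backbone(x).squeeze(1)

model = MOSRegressor()
criterion = nn.MSELoss()           # regress predicted scores toward the target MOS
```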

-Q: How many subjects evaluate each image? A: The number of subjects varies per image. In Part I, 57 groups are evaluated by 2 subjects per image and 3 groups are evaluated by 3 subjects per image. In Part II, since only professional doctors participate, each image is evaluated by 1 subject. We will disclose which subjects evaluated each image.
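If the per-image ratings are direct opinion scores, the MOS is simply their per-image average regardless of how many subjects rated each image; the ratings in the tiny sketch below are hypothetical, and the paper’s exact aggregation (including how comparison results are converted to scores) is the one described in its Section 2.1.

```python
import pandas as pd

# Hypothetical ratings: one row per (image, subject) opinion score.
ratings = pd.DataFrame({
    "image": ["img_001", "img_001", "img_002", "img_002", "img_002"],
    "score": [3.0, 4.0, 2.0, 2.5, 3.0],
})

# MOS = mean opinion score per image, whether 1, 2, or 3 subjects rated it.
mos = ratings.groupby("image")["score"].mean()
print(mos)
```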

-Q: About the heatmap. A: The meaning of the heatmap is explained in Section 4.2 ‘Visualization’ on Page 8.

-Q: Places where the wording in the paper is not rigorous. A: We thank the reviewers for pointing these out.

  1. Page 7, ‘Fourth, our models based on deep neural network achieve very good results, and this suggests that deep neural networks successfully learn perceptual characteristics of X-ray images.’ This is indeed not rigorous. Not all deep learning models can achieve better results than traditional methods; so in the final version, we will revise it to ‘…, and this seems to suggest that deep neural networks have good potential for learning perceptual …
  2. As suggested, we will rephrase ‘our method’ as ‘deep learning method’.

-Q: Other concerns. A: As suggested by the reviewers, we will fix the train/test split and share it, together with the results of the benchmark networks, as metadata for the dataset. Further, we are considering adding more forms of noise in the future and inviting more physicians to participate in the subjective assessment.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    I do not think the paper is sufficient for MICCAI publication. Rather than directly collecting unbiased images and evaluating their quality, the authors collected 60 high-quality images and then added noise to generate low-quality images. The authors ignored objective quality metrics. The authors evaluated 14 images in the rebuttal, but I still think the work is of limited practical use.

    Although the reviewers gave relatively high overall scores, the ranking is consistently low across all reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    15



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This AC has no further comments since the authors addressed the main comments well.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    While the methodology is not new (using off-the-shelf CNNs), the problem that this paper intends to address is an important and understudied one. The authors have addressed the major concerns from the reviewers, and I believe this perceptual quality dataset and framework will be useful and of interest to the MICCAI society. However, the issues raised by the reviewers need to be addressed in the camera-ready version. Furthermore, the consideration of only noise, but not other perceptual artifacts such as motion and metal artifacts, could be addressed in future work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    15


