
Authors

Jiajin Zhang, Hanqing Chao, Xuanang Xu, Chuang Niu, Ge Wang, Pingkun Yan

Abstract

The extensive use of medical CT has raised a public concern over the radiation dose to the patient. Reducing the radiation dose leads to increased CT image noise and artifacts, which can adversely affect not only the radiologists’ judgement but also the performance of downstream medical image analysis tasks. Various low-dose CT denoising methods, especially the recent deep-learning-based approaches, have produced impressive results. However, the existing denoising methods are all downstream-task-agnostic and neglect the diverse needs of the downstream applications. In this paper, we introduce a novel Task-Oriented Denoising Network (TOD-Net) with a task-oriented loss leveraging knowledge from the downstream tasks. Comprehensive empirical analysis shows that the task-oriented loss complements other task-agnostic losses by steering the denoiser to enhance the image quality in the task-related regions of interest. Such enhancement in turn brings general boosts on the performance of various methods for the downstream task. The presented work may shed light on the future development of context-aware image denoising methods.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87231-1_43

SharedIt: https://rdcu.be/cyhV0

Link to the code repository

https://github.com/DIAL-RPI/Task-Oriented-CT-Denoising_TOD-Net

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a task-oriented low-dose CT denoising network that uses knowledge from downstream tasks. The method is empirically validated on liver and kidney segmentation tasks. Image enhancement results are reported for both the ROI and the whole image, with significant improvement over the baselines.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper has a very good structure and it is a pleasure to read.
    • The authors propose a great solution to an important problem.
    • To the best of my knowledge the proposed approach is innovative enough.
    • Good presentation of the results.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • There is no quantification of the dose reduction potential of the proposed model, which certainly takes away some of the paper's value for potential practical use.
    • Relatively weak denoising baselines.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Should be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major:

    • How would these synthetic low-dose CTs fare against low-dose CTs acquired in a real setting, for this particular application? Did you consider using any regular denoising dataset?
    • It is not clear how much dose is reduced through this low-dose synthesis. And what would be the clinical value of it?
    • Would be nice to see the results of TOD-Net without WGAN D?
    • Details on the segmentation networks are not provided. Are the architectures (U-Net, V-Net, Res-U-Net, Vox-ResNet) used in their original forms?
    • How were the hyperparameters selected? For example, is 0.5 the optimal choice for \lambda?
    • Were the models given similar opportunity in terms of the #parameters and the execution time?
    • Cross-dataset evaluation will be more interesting, in terms of the generalizability test.
    • It looks like the TOD-Net performance on LiTS is better than the NDCT pretrained performance for each of the models. What’s the explanation for that?
    • Literature review is not thorough enough.
    • In Table 1, are the percentage SSIM and RMSE scores reported? It should be clarified.

    Minor:

    • What is ‘NLCT’ on page 2?
    • The authors could consider reporting detailed L1-WGAN and Perceptual-WGAN results too.
    • The authors should use {LiTS}, {CT} for the references.
    • What’s the batch size used for training TOD-Net?
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although there are some issues on the clarity and experimental setup, the innovative problem formulation and the results could be worth sharing with the community.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The contributions are as follows -

    Addition of a downstream-task-oriented section, compared with existing implementations and ideas, to improve denoising.

    Extensive comparison between pre-existing methods and the proposed method on two datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The primary strength is the performance gain obtained from the additional task-oriented downstream module. It is always advisable to provide an additional task for the framework to solve; this paper does just that and demonstrates a nice way to regularize the framework to solve the task without overfitting. Since the framework has multiple tasks to solve, with one being primary, its parameters are utilized effectively.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The major weakness of this paper is that the authors do not mention their preprocessing steps. I have worked on the LiTS liver lesion dataset, and without prior preprocessing the foreground contrast/clarity shown in the images is never achieved.

    E.g., the LDCT images in Fig. 1 are preprocessed LiTS dataset images.

    This does not hamper the credibility of the performance gain, since all the frameworks tested use the same images. It is still questionable how much of a performance gain could be obtained if TOD-Net had to work on the original unprocessed images.

    Figure 2 - Seg Pred (T(G(x))) shows an output image which is far from the ground truth. I would love to see a few more samples of the performance. While the proposed method outperforms the methods it is compared with in the paper, it does not obtain state-of-the-art performance on the LiTS dataset for liver segmentation.

    Some papers which have obtained over 95% Dice are listed below.

    1) Dey, Raunak, and Yi Hong. “Hybrid cascaded neural network for liver lesion segmentation.” 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020.
    2) Yuan, Yading. “Hierarchical convolutional-deconvolutional neural networks for automatic liver and tumor segmentation.” arXiv:1710.04540, 2017.
    3) Li, Xiaomeng, et al. “H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes.” IEEE transactions on medical imaging 37.12 (2018): 2663-2674.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is reproducible and all datasets used are public.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The primary changes I would ask the authors to do would be to use the denoising framework on completely unprocessed images. It would demonstrate the maximum capabilities of the framework.

    The paper already has a nice idea and improves on previous works. It just can be extended to work with unpreprocessed images to show the full potential.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    It outperforms previously used methods for the task of denoising.

    The idea of adding an additional downstream task is good, should be used more widely, and deserves a spot at MICCAI.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The paper proposes a denoising network for denoising low-dose CT images, with a downstream network to produce outputs that are both realistic and task-relevant.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The major strength of the paper is the use of a task-oriented network to learn denoised outputs which are enhanced for the downstream task instead of keeping a task-agnostic network. Although task-oriented networks are used in non-medical contexts [1] [2], it seems to be novel in the context of medical CT denoising.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The major weakness of the method is in the evaluation criteria. The data required in an actual clinical setting (a noisy low dose CT image, its corresponding denoised image, and segmentation, or other task-related data) may be difficult to obtain. In the paper, corresponding low-dose CT images are obtained by using a CT simulator and are used for training. In such a case, evaluation should also have been done on real low-dose CT images to evaluate and compare performance, and domain gap between simulated and real low-dose images. Some parts of the paper are redundant which should be changed (more in detailed comments).

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although there is no code, the implementation details including architecture, datasets, baselines, etc. are explained meticulously, therefore the work looks more or less reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    -> The paper proposes to use a task-oriented network for low-dose CT denoising, with the outputs enhanced for a task (such as segmentation). The problem has interesting clinical applications. The paper is generally well written and implementation details are mentioned clearly.

    -> One of my major concerns is the evaluation of the method. For all of the methods, synthetic data is used as low-dose CT images. Due to the nature of the problem formulation, the method requires a low-dose image and its corresponding denoised version, which may be very difficult to obtain. Therefore, the method needs to perform well when trained on simulated data and then tested on real low-dose CT images in terms of downstream task performance.

    -> Is the task network pre-trained with the clean images and their corresponding segmentations, or is it jointly trained with the TOD-Net? It seems a bit unclear in the paper. Depending on the answer, the training dynamics may differ (feedback to the downstream network may also come from generated images during training).

    -> In Table 1, MSE-WGAN seems to perform better than VGG-WGAN in almost all cases. However, each of the three metrics in Table 1 (SSIM, PSNR, RMSE) has MSE-like terms, which may be making MSE-WGAN favorable. The paper can show qualitative results (or other metrics with perceptual-style losses) to show that MSE-WGAN results are indeed better. Moreover, instead of using a pretrained VGG (which is trained on natural images), one can train a task-agnostic network for the medical images (train a VGG-based autoencoder, for example).

    -> Section 2.3 and Figure 2 are very redundant. Since \hat{x} is the output that is used in each of the four losses in equations 4-8, an easy calculation of the chain rule will imply the presence of the term d\hat{x}/d\theta_G in the gradient of each of these losses. The entire section simply shows the presence of this term in all gradients, which is not very useful in terms of adding context to the rest of the paper.

    -> Similarly, Figure 2 shows the gradients with respect to the output image. I’m not sure about the conclusion derived from the figure. Also, it may be useful to visualize all gradients in the same colormap to compare the differences.
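
    The Table 1 point above, that PSNR and RMSE carry MSE-like terms, can be made concrete with a minimal sketch. It assumes images normalized to [0, 1] (so `data_range=1.0`); the function names are illustrative and are not taken from the paper or its code. SSIM is left out because its relation to MSE is less direct.

```python
import numpy as np

def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.mean((a - b) ** 2))

def rmse(a, b):
    """RMSE is simply the square root of the MSE."""
    return float(np.sqrt(mse(a, b)))

def psnr(a, b, data_range=1.0):
    """PSNR in dB; a decreasing function of the MSE for a fixed data range."""
    m = mse(a, b)
    if m == 0.0:
        return float("inf")  # identical images
    return float(10.0 * np.log10(data_range ** 2 / m))
```

    Because both quantities depend on the images only through the MSE, a denoiser trained with an MSE loss is optimized directly for them, which is the bias the reviewer is pointing at.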

    Typos/low-level errors in the paper:

    -> “nerual” should be “neural”.

    -> “denoisor” should be “denoiser”.

    -> In equations (1), (2), (3), argmin is used. Note that argmin will return the optimal parameter of the objective function, and not the function itself. Please correct the notation in these equations (remove argmin and use it only to get \theta^*).
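
    One way to write the notation the last point asks for is sketched below. The symbols \mathcal{L}, x, and y are generic placeholders assumed for illustration; they are not the paper's actual equations (1)-(3).

```latex
\theta_G^{*} \;=\; \arg\min_{\theta_G}\; \mathcal{L}\bigl(G(x;\theta_G),\, y\bigr),
\qquad
G^{*} \;=\; G(\,\cdot\,;\theta_G^{*})
```

    Here argmin returns only the optimal parameters \theta_G^{*}, and the optimal network G^{*} is then defined separately as the network evaluated at those parameters.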

    [1] Fang, Kuan, et al. “Learning task-oriented grasping for tool manipulation from simulated self-supervision.” The International Journal of Robotics Research 39.2-3 (2020): 202-216.
    [2] Liu, Fang, Licheng Jiao, and Xu Tang. “Task-oriented GAN for PolSAR image classification and clustering.” IEEE transactions on neural networks and learning systems 30.9 (2019): 2707-2719.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea behind the paper is interesting and has a very good application. However, the evaluation can be improved substantially. Moreover, there are sections which are not useful in the paper (see detailed comments). Therefore, my current recommendation is a borderline rejection.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All three reviewers agree this is a novel paper with clinical importance. The paper is in general well written and the results are well presented. However, the reviewers all raised questions regarding the methodology details, including pre-processing, validation, segmentation method, and comparison with SoTA. I have noticed that the upper panel of Figure 3g uses blue for the entire segmented region, which means false positive, but Figure 3h actually uses red in part of the same region to indicate correct segmentation. Thus blue was misused in Figure 3g and covers more than it should. The recommendation is therefore an invitation to rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

We are grateful to the reviewers and AC for acknowledging our contributions and considering our work as “a great solution to an important problem”. The reviewers’ questions mainly regard the methodology details including image preprocessing, validation scheme, segmentation model and comparison with SoTA methods. In this rebuttal, we clarify several important details. We will release our source code on GitHub once the paper is accepted so readers can find all the details in the code.

  1. Preprocessing steps not mentioned (R2). For all the datasets, the pixel intensity is clamped with a window of [-200, 200] HU and normalized to [0, 1]. In Sec. 3.1, to train a denoising network, we first resized the images to have an in-plane resolution of 1.5mm×1.5mm and a slice spacing of 1mm. Then, the axial slices are center cropped/zero padded to a size of 256×256 pixels. To generate training batches, the 3D LDCT images and segmentation ground truth are split into multiple overlapping 256×256×32 sub-volumes. In the validation and test phase, the denoising network works directly on the original volume with a size of 512×512×#slices.
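
The intensity windowing and the crop/pad step described above can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' released code: the function names are hypothetical, the resampling to 1.5mm×1.5mm×1mm is omitted, and the extraction of overlapping sub-volumes is not shown.

```python
import numpy as np

def window_and_normalize(volume_hu, lo=-200.0, hi=200.0):
    """Clamp intensities to [lo, hi] HU and rescale linearly to [0, 1]."""
    clipped = np.clip(np.asarray(volume_hu, dtype=np.float32), lo, hi)
    return (clipped - lo) / (hi - lo)

def center_crop_or_pad(slice_2d, size=256):
    """Center crop and/or zero pad a 2D axial slice to size x size."""
    out = np.zeros((size, size), dtype=slice_2d.dtype)
    h, w = slice_2d.shape
    ch, cw = min(h, size), min(w, size)          # extent actually copied
    src_top, src_left = (h - ch) // 2, (w - cw) // 2
    dst_top, dst_left = (size - ch) // 2, (size - cw) // 2
    out[dst_top:dst_top + ch, dst_left:dst_left + cw] = (
        slice_2d[src_top:src_top + ch, src_left:src_left + cw]
    )
    return out
```

Note that with this ordering a zero-padded pixel has the same value as a pixel at or below -200 HU after normalization.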

  2. Lacking validation on real LDCT images (R1, R3). As mentioned by R3, a dataset including paired LDCT and NDCT with segmentation annotations is hard to obtain. Thus, to evaluate the performance of our TOD-Net on real CT images, we add a new experiment applying it on the original LiTS and KiTS datasets. We evaluate the denoiser performance with 5 segmentation networks. The results show that TOD-Net brings significant improvement for all segmentation networks on KiTS. On LiTS, all the networks perform better except ResUNet, whose performance decreases slightly with no significant difference.

     –              | UNet | VNet | ResUNet | VoxResNet | H-DenseUNet
     Org Img (LiTS) | 94.3 | 92.8 | 93.5    | 92.1      | 92.9
     TOD-Net (LiTS) | 95.0 | 93.4 | 93.2    | 92.4      | 93.5
     P-value (LiTS) | 0.04 | 0.04 | 0.08    | 0.06      | 0.04
     Org Img (KiTS) | 90.9 | 89.6 | 89.8    | 89.4      | 91.1
     TOD-Net (KiTS) | 91.5 | 90.4 | 91.0    | 90.4      | 91.8
     P-value (KiTS) | 0.03 | 0.03 | 0.01    | 0.01      | 0.02

  3. Segmentation model structure & hyperparameter selection (R1). In our work, all the segmentation networks (U-Net, V-Net, Res-U-Net, Vox-ResNet and the newly added H-DenseUNet) use the same structures as described in the original papers. The hyperparameter λ, the weight of the MSE loss, was empirically set to 0.5, which gave satisfactory performance.

  4. Comparison with additional SoTA methods (R1, R2)

R1: Baseline denoising models relatively weak. Following R1’s suggestion, we further compared the TOD-Net with a more recent denoising model, WGAN-SA-AE (WSA) [1]. The results below show that our TOD-Net outperforms WSA on low-dose LiTS and KiTS datasets for both denoising and segmentation.

     Methods        | SSIM (%) | PSNR | UNet (Dice %) | ResUNet (Dice %)
     WSA (LiTS)     | 73.1     | 22.7 | 93.0          | 91.5
     TOD-Net (LiTS) | 76.7     | 23.3 | 93.9          | 92.8
     WSA (KiTS)     | 67.2     | 22.0 | 88.9          | 88.8
     TOD-Net (KiTS) | 69.3     | 22.0 | 90.2          | 90.0

[1] Li et al., SACNN, IEEE TMI, pp. 2289-2301, 2020.

R2: TOD-Net not tested with SoTA segmentation networks. Following R2’s suggestion, we add H-DenseUNet into our experiments. The segmentation Dice scores (%) of H-DenseUNet are given below.

     Datasets | NDCT | LDCT w/o denoiser | LDCT w/ TOD-Net
     LiTS     | 92.9 | 85.7              | 92.7
     KiTS     | 91.1 | 83.1              | 91.5

It can be seen that H-DenseUNet did NOT get a Dice score > 95%. This is mainly because the data split is different from the original challenge. We split the challenge training set into training, validation, and test sets. The segmentation models are trained with less data and evaluated on a different test set than in the original challenge. However, this result clearly demonstrates the benefit for segmentation of using TOD-Net for denoising.
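
For readers unfamiliar with the Dice scores quoted in this rebuttal, a minimal sketch of the metric on binary masks is given below. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for illustration; the rebuttal does not specify how the authors compute or aggregate Dice.

```python
import numpy as np

def dice_score(pred, target):
    """Dice coefficient 2|A∩B| / (|A| + |B|) for two binary masks."""
    pred = np.asarray(pred).astype(bool)
    target = np.asarray(target).astype(bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        return 1.0  # assumed convention: two empty masks agree perfectly
    return float(2.0 * intersection / denom)
```

Multiplying the returned value by 100 gives a percentage on the same scale as the tables above.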

  5. Typo in the caption of Fig. 3. Thanks to the AC for pointing out the typo! We fixed the last sentence in the caption of Fig. 3 as “The red, blue, and green in (g) and (h) depict the true positive, false negative, and false positive regions, respectively.”




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Reviewers’ comments have been sufficiently addressed with additional experimental results and comparison. This paper will be a valuable addition to MICCAI this year.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Three knowledgeable reviewers provide mixed comments, R3 being the most negative. His/her main concern is about the lack of validation on real LDCT images. As the rebuttal mentions, it is infeasible to collect a dataset of images with both LDCT and regular-dose CT. However, in the rebuttal, the authors make an attempt to validate the effectiveness of low-dose denoising based on segmentation performance. To me, this is a reasonable way of validating on real images. Therefore, I downplay R3’s negative comments and recommend acceptance of the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have clarified the image preprocessing and added a set of experiments in the rebuttal that have substantially improved the quality of the paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5


