
Authors

Junyu Chen, Evren Asma, Chung Chan

Abstract

A convolutional neural network (ConvNet) is usually trained and then tested on images drawn from the same distribution. Generalizing a ConvNet to various tasks often requires a complete training dataset consisting of images drawn from the different tasks. In most scenarios, it is nearly impossible to collect every possible representative dataset in advance; new data may only become available after the ConvNet is deployed in clinical practice. A ConvNet, however, may generate artifacts on out-of-distribution testing samples. In this study, we present Targeted Gradient Descent (TGD), a novel fine-tuning method that can extend a pre-trained network to a new task without revisiting data from the previous task while preserving the knowledge acquired from previous training. To a further extent, the proposed method also enables online learning of patient-specific data. The method is built on the idea of reusing a pre-trained ConvNet’s redundant kernels to learn new knowledge. We compare the performance of TGD to several commonly used training approaches on the task of positron emission tomography (PET) image denoising. Results from clinical images show that TGD generated results on par with training-from-scratch while significantly reducing data preparation and network training time. More importantly, it enables online learning on the testing study to enhance the network’s generalization capability in real-world applications.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_3

SharedIt: https://rdcu.be/cyl3F

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present Targeted Gradient Descent (TGD), a novel fine-tuning method that can extend a pre-trained network to a new task without revisiting data from the previous task while preserving the knowledge acquired from previous training. It enables online learning that adapts a pre-trained network to each testing dataset to avoid generating artifacts on unseen features.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Instead of blindly fine-tuning all the kernels in specific layers or retraining the entire network on a mixture of new and old labels, it may be more sensible to retrain precisely the “meaningless” kernels so that they adapt to the new task, while the “useful” kernels are preserved to retain the knowledge acquired from the prior training on a larger dataset (with wider coverage of the data distribution). This is an interesting idea, and it makes sense.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    This paper neither mentions nor compares with the many methods of a similar flavor in lifelong learning, such as elastic weight consolidation. Recycling redundant weights to learn new knowledge is well known in that literature.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    See above

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    See above

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    In this paper, the authors propose a novel fine-tuning method, Targeted Gradient Descent (TGD), that reuses a pre-trained ConvNet’s redundant kernels to learn new knowledge. The proposed method can extend a pre-trained network to a new task without revisiting data from the previous task while preserving the knowledge learned from previous training. The method showed effectiveness on the PET image denoising task, where TGD can be used to fine-tune an existing denoising ConvNet to adapt it to a new reconstruction protocol using substantially fewer training studies. Using TGD for online learning also showed improved artifact control on unseen features during testing.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The idea of identifying and using redundant kernels in a pre-trained network for incremental learning while retaining the learned knowledge is innovative, and can potentially have great clinical impact in image denoising as well as other medical imaging tasks. The writing is also good.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The dataset for evaluation is too small, with only 2 patient studies.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The details in the paper are sufficient.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. Expanding the testing dataset would make this paper more convincing. If not possible, please address this limitation in the paper.
    2. In section 2, what is the difference between X and Y? I don’t think Y is defined.
    3. Section 4.1, second paragraph, “where the bladder’s shape is early the same…”, the “early” should be “nearly” or removed.
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Novel method. Great potential clinical impact. Good writing.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper proposes a fine-tuning method, Targeted Gradient Descent (TGD), which reuses the redundant kernels in a pre-trained network.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper proposes a novel method to score the meaningfulness of feature maps and to make the meaningless feature maps trainable by adding a Targeted Gradient Descent layer into the ConvNet.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The KSE threshold is defined by the authors based on the results in Fig. 4. This threshold is very important: is it suitable for all datasets? More evaluations are needed.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper provides sufficient details about the models/algorithms, datasets, and evaluation.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please make the images as large as possible so that they can be seen clearly. In the sentence “where the bladder’s shape is early the same as that of the input image”, is there a typo in “early”?

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the method.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This work proposed a new fine-tuning method that can extend a pre-trained network to a new task without revisiting data from the previous task while preserving the knowledge acquired from previous training. Overall, reviewers recognized the novelty and contribution of this work. Potential improvements include comparison with alternative methods, and evaluation on other tasks / datasets.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

Reviewer #1 Comments:

  1. This paper neither mentions nor compares with the many methods of a similar flavor in lifelong learning, such as elastic weight consolidation; recycling redundant weights to learn new knowledge is well known in that literature. Response: We thank the reviewer for the suggestion. We have included the following related articles on lifelong learning in the revised manuscript.

[1] Karani, N., Chaitanya, K., Baumgartner, C., & Konukoglu, E. (2018). A lifelong learning approach to brain MR segmentation across scanners and protocols. In International Conference on Medical Image Computing and Computer-Assisted Intervention (pp. 476-484). Springer, Cham.
[2] McClure, P., Zheng, C. Y., Kaczmarzyk, J. R., Lee, J. A., Ghosh, S. S., Nielson, D., et al. (2018). Distributed weight consolidation: a brain segmentation case study. In Proceedings of the 32nd International Conference on Neural Information Processing Systems (pp. 4097-4107).
[3] Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A. A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., et al. (2017). Overcoming catastrophic forgetting in neural networks. Proceedings of the National Academy of Sciences, 114(13), 3521-3526.

Reviewer #2 Comments:

  1. The dataset for evaluation is too small, with only 2 patient studies.
  2. Expanding the testing dataset would make this paper more convincing. If not possible, please address this limitation in the paper. Response: We thank the reviewer for the suggestion. We have included more patient studies in the camera-ready version up to the page limit. It is also important to emphasize that the low-BMI studies consist of 10 i.i.d. noise realizations. The quantitative metrics (i.e., ensemble bias and ensemble CoV) were calculated based on the 10 noise realizations.
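
For readers unfamiliar with these metrics, a minimal NumPy sketch of one common way to compute ensemble bias and ensemble CoV from noise realizations is shown below; the paper's exact definitions may differ, and all array names, shapes, and values here are illustrative assumptions:

```python
import numpy as np

# Illustrative data: 10 i.i.d. noise realizations of a denoised image,
# plus a reference image (e.g., a high-count reconstruction). All names
# and shapes are assumptions for illustration only.
rng = np.random.default_rng(1)
ref = np.ones((16, 16))                              # reference image
realizations = ref + 0.1 * rng.standard_normal((10, 16, 16))

ensemble_mean = realizations.mean(axis=0)

# Ensemble bias: deviation of the ensemble mean from the reference,
# expressed as a percentage of the reference mean.
bias = 100.0 * (ensemble_mean - ref).mean() / ref.mean()

# Ensemble CoV: voxel-wise standard deviation across realizations,
# normalized by the ensemble mean, then averaged (e.g., over an ROI).
cov = 100.0 * (realizations.std(axis=0, ddof=1) / ensemble_mean).mean()

print(f"bias = {bias:.2f}%, CoV = {cov:.2f}%")
```

With 10 realizations of ~10% Gaussian noise around a unit-intensity reference, the bias hovers near 0% and the CoV near 10%, as expected.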

  3. In section 2, what is the difference between X and Y? I don’t think Y is defined. Response: We are sorry for the confusion caused. Here, X and Y represent, respectively, the input and output feature maps of a convolutional layer. Their relationship can be written as Y_n = \sum_{c=1}^{C} W_{n,c} * X_c, where * denotes the convolution operation and W_{n,c} denotes the kernel connecting the c^th input channel to the n^th output channel. Note that the bias and activation are omitted for simplicity. We have clarified this in our revision.
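
The relationship above can be illustrated with a small NumPy sketch; the naive convolution helper, the shapes, and the variable names are assumptions for illustration only (like most deep-learning frameworks, it actually computes cross-correlation):

```python
import numpy as np

def conv2d_valid(x, w):
    """Naive 2-D cross-correlation (valid mode), as used in most ConvNets."""
    H, Wd = x.shape
    k = w.shape[0]
    out = np.zeros((H - k + 1, Wd - k + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + k, j:j + k] * w)
    return out

C, N, k = 3, 4, 3                           # channels, kernels, kernel size (illustrative)
rng = np.random.default_rng(0)
X = rng.standard_normal((C, 8, 8))          # input feature maps X_c
Wk = rng.standard_normal((N, C, k, k))      # kernels W_{n,c}

# Y_n = sum_{c=1}^{C} W_{n,c} * X_c   (bias and activation omitted)
Y = np.stack([sum(conv2d_valid(X[c], Wk[n, c]) for c in range(C))
              for n in range(N)])
print(Y.shape)  # (4, 6, 6)
```

Each output feature map Y_n thus aggregates the responses of its N-th kernel stack across all C input channels, which is why redundant kernels can be identified and retrained per output channel.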

  4. Section 4.1, second paragraph, “where the bladder’s shape is early the same…”, the “early” should be “nearly” or removed. Response: We have corrected this typo in our revision.

Reviewer #3 Comments:

  1. Please make the image as large as possible in order to see it clearly.
  2. In the sentence “where the bladder’s shape is early the same as that of the input image,” is there a typo for “early”? Response: We thank the reviewer for the suggestion. We have corrected the typo and made the figures larger in our revision.


