
Authors

Yang Yang, Zhiying Cui, Junjie Xu, Changhong Zhong, Ruixuan Wang, Wei-Shi Zheng

Abstract

Current deep learning models are characterised by catastrophic forgetting of old knowledge when learning new classes. This poses a challenge in intelligent diagnosis systems where initially only training data of a limited number of diseases are available. In this case, updating the intelligent system with data of new diseases would inevitably downgrade its performance on previously learned diseases. Inspired by the process of learning new knowledge in human brains, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e.g. with Gaussian mixture models, and naturally kept from forgetting in continual learning. Experiments on two skin image sets showed that the proposed approach outperforms state-of-the-art approaches which even keep some images of old classes during continual learning of new classes.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_38

SharedIt: https://rdcu.be/cyl6c

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper describes a novel continual learning method which uses a fixed pre-trained CNN feature extractor, the Gaussian mixture models to represent feature distributions of old classes, and a Bayes classifier. Only the GMMs parameters from old classes need to be stored in memory for the classification of all classes. The authors conducted extensive experiments to compare their proposed method with multiple representative continual learning methods in the literature on two skin image datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The authors conducted extensive experiments comparing their proposed method with multiple representative continual learning methods from the literature on two skin image datasets, and their method outperforms those methods on both. They also analyzed the robustness of the model and the effect of the feature extractor. The paper is very well written and easy to follow. The technical analysis and presentation also seem quite solid.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    One suggestion regarding further experimental testing: the paper performs continual learning (adding new classes) within each skin image set separately; how about doing continual learning from one skin set to the other?

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work described in this paper should be reproducible with the right skill set. I didn’t see code availability discussed. Reproducibility will require an experienced machine learning researcher.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors could better describe the disease characteristics and the overlap between diseases. The clinical side of the datasets is absent. The paper is skewed toward the machine learning method, which is important but insufficient.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The work is timely and important. Medical image datasets are famously imbalanced and small, but new data is anticipated; moreover, new data coming from different sources, containing different diseases, or drawn from different distributions can impact learning. The strategy builds on continual learning, which holds promise in its next generation (and could be adopted/adapted by other researchers in their work).

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The presented work introduces a methodology for updating a trained model to classify an increasing number of classes without undergoing catastrophic forgetting. This is achieved by training a CNN-based feature extractor and then storing the parameters of an S-component Gaussian mixture model (GMM) for the distribution of each of the K features of every class. They thereby obtain an effective knowledge-storing procedure costing only 2 * K * S values per class. Since this process does not require retraining the system, and every new class is stored independently of the others, catastrophic forgetting cannot occur. The method is evaluated against several other methodologies and outperforms the state of the art in continual learning.
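
    As a rough illustration of the per-class storage cost described above (the sizes below are hypothetical, not taken from the paper): with K = 512 feature dimensions and S = 3 components per feature, each class costs 2 * K * S floats, one mean and one variance per component per feature.

    ```python
    K, S = 512, 3  # hypothetical: feature dimensions, GMM components per feature
    values_per_class = 2 * K * S    # one mean and one variance per component per feature
    bytes_per_class = values_per_class * 4  # assuming float32 storage
    print(values_per_class, bytes_per_class)  # 3072 values, 12288 bytes (~12 KB) per class
    ```

    At roughly 12 KB per class, this is far cheaper than rehearsal-based methods that keep raw exemplar images of old classes in memory.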

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The general idea of the paper shows a nice and seemingly novel combination of the representative power of deep learning and the stable distribution representations of GMMs. The method is well explained and nicely motivated by an analogy to human learning behavior and the way humans might form knowledge (feature extraction learned at an early age, knowledge built up later with the extraction fixed). The design of the method follows this intuition nicely and is elegantly applied to the problem using GMMs. It is evaluated on two skin lesion datasets, but is in principle applicable to other tasks as well. The experimental setup is explained in enough detail to allow reproduction of the experiments. The evaluation is done against several methods and focuses on different aspects of the methodology to highlight its strength compared to the state of the art. First, the paper evaluates the overall performance for different numbers of new classes learned each time. Then, it evaluates the impact of the number of GMM components and of the feature extractor used, already addressing the prerequisites for the method to work properly.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The paper refers to the continuous integration of new classes into the methodology as “learning”, but the image samples of a new class are no longer analyzed effectively. The feature extractor is frozen, so the methodology has to rely entirely on features that were learned previously. The GMMs only encode the known feature distributions for a new class; they do not learn new features. The reasoning behind this is explained nicely, but strong feature coverage would have to be guaranteed: if a new class showed features not previously seen by the system, there would be no way to integrate them into the process. Additionally, since the extracted features are crucial for the method to work (as shown by the authors’ own evaluation), it could be beneficial to evaluate their meaning using, e.g., a form of feature disentanglement. Several methods are addressed, but some that seem to outperform iCaRL, such as Gradient Episodic Memory (GEM) from NIPS 2017, are not compared. Finally, there are some minor writing errors in the text.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provided a detailed description of their experimental setup. They explain their training procedure and specify the hyperparameters and network setup. Additionally, they use publicly available data. According to the reproducibility checklist, the code and scripts will be made available as well.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The method of storing class-specific knowledge within GMM distributions is very elegant. Although it is nicely explained why a fixed feature representation is beneficial in this setup, this aspect should motivate a stronger evaluation of the extracted features; as mentioned above, feature disentanglement could be an option. A validation set is mentioned for setting up the GMM hyperparameters; it would be good to also specify in a few sentences how this process was performed. Overall, the work has been nicely prepared.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The idea of storing class-specific knowledge within distributions describable by a few parameters is very efficient and encodes the knowledge within the method explicitly instead of relying on e.g. old example images. The mindset behind this idea is interesting in my opinion and provides a contribution that could be interesting for the community as well, especially in the important field of continuous learning.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    In conventional continual learning settings, the feature extraction part changes as learning tasks are continually added; this is one possible cause of catastrophic forgetting in neural networks. The authors propose a generative model for continual learning on top of a parameter-fixed pre-trained feature extractor to achieve consistent performance benefits in continual learning settings.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The fixed feature extractor is the key to the proposed work. The entire process is: 1) pretraining of the feature extractor, 2) approximation of p(f|c) using GMMs, 3) prediction of p(c|x) with Bayes rule using the p(f|c) from step 2. After step 1, all the pretrained parameters are fixed, which guarantees the robustness of continual learning, e.g., final-round learning performance is consistent across different class orders.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    As the authors mention in the literature review, the training data for the feature extractor is very important, since all of the following rounds will be based entirely on the pretrained ‘fixed’ feature extractor. However, it is very difficult to collect large-scale, well-curated training data at the initial stage of continual learning, and this is precisely why continual learning is important in real applications.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Some ‘YES’ items in the checklist are not specified in the paper (e.g., computing infrastructure details, average runtime, etc.), but those items seem less important to the scope of this work. The proposed method works consistently well under variations of the experimental settings (e.g., task order), which is one of the major strengths in this research area; hence, reproducibility is particularly important here. The authors checked ‘YES’ for the public release of the code.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Target application limitation:

    The experimental setup is “learning different classes continually in the same task, i.e., skin image classification”. In more detail, skin153 and skin40 within the skin193 dataset were used for training the feature extractor and for the continual learning experiments, respectively. The proposed work outperforms previous works in this limited experimental setting. However, some of the previous works were originally presented for a different setting, e.g., continual learning across different tasks such as ImageNet, VOC, CUB, Scenes, Places365, etc. From this perspective, the proposed method is limited in terms of target application: the experiments in this paper did not show that it also works well for continual learning across different tasks.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the proposed method is limited in terms of target applications and requires quality training data at the beginning, it works well in the specific experimental setting considered (i.e., continual learning of different classes within the same task). More importantly, it showed consistent performance across different orders of the learning rounds.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Summary: a novel continual learning method using a fixed pre-trained CNN feature extractor, with Gaussian mixture models representing the feature distributions of old classes, and a Bayes classifier. Only the GMM parameters of the old classes need to be stored in memory.

    Positives:

    • Well motivated and novel idea that uses the stable distribution representations of GMMs to avoid catastrophic forgetting.
    • Extensive experiments show the proposed method outperforms several representative baseline methods (though not Gradient Episodic Memory (GEM) from NIPS 2017) on two skin image datasets. Results are likely to generalise to other data.
    • Well written, with information provided to reproduce the work.

    Negatives:

    • The feature extractor is not continuously updated, so it may miss features that appear later in the training.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

N/A


