Paper Info Reviews Meta-review Author Feedback Post-Rebuttal Meta-reviews

Authors

Jingyang Zhang, Ran Gu, Guotai Wang, Lixu Gu

Abstract

In clinical practice, a desirable medical image segmentation model should be able to learn from sequential training data from multiple sites, as collecting these data together could be difficult due to storage costs and privacy restrictions. However, existing methods often suffer from the catastrophic forgetting problem on previous sites when learning from images from a new site. In this paper, we propose a novel comprehensive importance-based selective regularization method for continual segmentation, aiming to mitigate model forgetting by maintaining both shape and reliable semantic knowledge for previous sites. Specifically, we define a comprehensive importance weight for each model parameter, consisting of shape-aware importance and uncertainty-guided semantics-aware importance, by measuring how sensitive a segmentation’s shape and reliable semantic information are to that parameter. When training the model on a new site, we adopt a selective regularization scheme that penalizes changes to parameters with high comprehensive importance, preventing the shape knowledge and reliable semantics related to previous sites from being forgotten. We evaluate our method on prostate MRI data sequentially acquired from six institutes. Results show that our method outperforms many continual learning methods in relieving the model forgetting issue. Code is available at https://github.com/jingyzhang/CISR.
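
The selective regularization scheme described in the abstract follows the general importance-weighted penalty pattern common to this family of methods. A minimal sketch, assuming flat parameter arrays and hypothetical names (this is not the authors' actual code, and the comprehensive importance weights themselves are computed as in the paper's Eqs. 2 and 5):

```python
import numpy as np

def selective_reg_penalty(theta, theta_old, importance, lam=1.0):
    """Importance-weighted quadratic penalty on parameter drift.

    theta:      current parameters (flat array)
    theta_old:  parameters learned on previous sites
    importance: per-parameter comprehensive importance weights
    lam:        regularization strength (hypothetical hyperparameter)
    """
    return lam * np.sum(importance * (theta - theta_old) ** 2)

def total_loss(seg_loss, theta, theta_old, importance, lam=1.0):
    """Loss on a new site = segmentation loss + selective regularization,
    so that highly important parameters are discouraged from changing."""
    return seg_loss + selective_reg_penalty(theta, theta_old, importance, lam)
```

Parameters with zero importance are free to adapt to the new site, while parameters with large importance are effectively frozen near their previous values.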

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_37

SharedIt: https://rdcu.be/cyhMf

Link to the code repository

https://github.com/jingyzhang/CISR

https://github.com/liuquande/SAML

Link to the dataset(s)

https://github.com/liuquande/SAML


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper strives to improve continual learning on streams of different datasets when applied to medical images. In this framework, datasets from different acquisitions are used for training in sequence, without access to previously used datasets. This paper introduces a novel method for continual learning based on penalizing changes to “important” weights with respect to both the shape and certainty of the segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This paper addresses a critical need in the current climate of data privacy. Continual learning addresses limitations of both fine-tuning (catastrophic forgetting) and federated learning (implementation difficulty).

    This paper is well-written and the results are very clear. There is a definite improvement over existing methods. In addition, this method does appear to be quite applicable.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The definition of the variables used in the loss equations is unclear. The variables should be included in the figure describing the process to better allow the reader to see where the losses are calculated. For example, the figure does not make clear how L_seg and L_sp are calculated, and it is equally unclear from the text where variables such as r_n and T_e are used.

    It is unclear whether the equations for the importance values are novel or based on previous techniques. If they are not novel in themselves, this is not a detriment to the technique, as their use is novel, but they should be cited.

    Standard deviations of the metrics in Table 1 should be included. In addition, statistical tests should be performed to determine whether the improvements of the methods are significant.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    This paper should be easily reproducible with code to be provided by the authors, as it also uses available challenge data.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    There are few comments beyond the weaknesses highlighted above.

    Figure 1 requires additional attention to make sure it matches what is in the text. The addition of the variables from the text would be very helpful here. Adding the ground truth as part of the figure to describe T_r and L_e would also be helpful.

    The source of equations 2 and 5 should be cited or explicitly described as novel.

    In the dataset section of Experiments, additional description of the dataset (such as imaging parameters) should be included.

    How were the images resized? Were images interpolated to a common resolution?

    A more common abbreviation for Dice Similarity Coefficient is DSC.

    A description of how hyperparameters affect the outcome would be a welcome addition to the discussion.

    Statistical testing and standard deviations should be included in Table 1.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This paper addresses a very important topic and shows promising results. The writing and description of the technique is clear and should be very applicable.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    The authors propose a new neural network model architecture that allows for continuous learning, mitigating the problem of the model forgetting previously learned information. The model is trained sequentially on data from six different sites and is trained to be aware of both shape and global topology. The authors could show that their model retains information learnt from the very first site after the network was further trained on all five other sites. The proposed approach could outperform three other state-of-the-art continual learning methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and structured, and the authors provide qualitative and quantitative results. The introduced method seems novel and indeed beneficial from a clinical application standpoint. Both multi-center studies for which data cannot be easily shared and longitudinal studies that include data from different scanners/annotators could profit from the proposed sequential, continuous learning approach. Another strong aspect is the ablation study, which helps to understand the different impacts of USmAI and SpAI (both boundary and topology awareness). Here the authors could show that both individual mechanisms help to improve segmentation performance over the best-performing state-of-the-art method here (i.e., MAS). They could show that the combination (CISR) of both approaches further improved the evaluation metrics. I also appreciate that the authors compared their method not to one but three state-of-the-art methods, which gives the reader confidence in the reported superiority of the introduced approach. Qualitative results also show not only the results on the very first (A) and very last site (F), but also on an intermediate site (D), which I think is great for providing a full picture of the performance. A bonus of the applied method is the uncertainty map, which could be useful for interpretation. The authors also offer a sensible explanation of why performance goes down after training on site C, as well as a brief outlook on future directions (including more sites).

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    From an application standpoint, I think one shortcoming is that the baselines do not include individual models trained on individual sites. If the continually learned model does not reach the performance of single-site models, that would mean in practice it would be more beneficial to just train the same architecture on all the different sites separately.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The experiments are run on public databases, which makes them theoretically easy to reproduce. The authors will provide their code on GitHub.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    I appreciate the thoroughness of the analysis (different evaluation metrics, several state-of-the-art methods, quantitative and qualitative analyses, an ablation study). Both the results and presentation are convincing. For a journal version, I think it would be interesting to examine the impact of the order of the different sites. For example: could you measure whether some sites are more similar to one another than to other sites, and then check whether it makes sense to train in a particular order (e.g., the first and last sites are the furthest from one another)? This could possibly guide researchers of multi-center studies in choosing which databases to use first to train the model. I think it would be excellent if this approach could be extended to another anatomy to show generalisability.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I very much enjoyed the paper, as it addresses an interesting problem. The analysis and presentation are very thorough and meet the expected standard. I wish the authors would make their code open-source so it can be accessed easily. Overall, the novelty, the striking results, and the presentation lead me to the conclusion that this paper should be accepted.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper describes a method for continual learning of a segmentation model. Two importance measures, regarding shape and semantics, are used to penalize changes to model parameters that are regarded as important. The model is incrementally trained on six sites, and it is shown that the proposed method successfully mitigates catastrophic forgetting.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper introduces a novel method with a clear presentation and explanation of the parts of the framework.
    • The work tackles the important task of using a model at multiple sites without catastrophic forgetting effects.
    • The evaluation compares to multiple current state-of-the-art methods and presents an ablation study for all major parts of the framework, which shows that all parts support the final performance of the approach.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The method has many different modules, and the authors do, in general, a good job describing them. However, it is not easy to understand what the hyperparameters alpha, beta and lambda are enforcing. It is clear that they weight the loss terms, but their influence is unclear, and there is no explanation of how the authors came up with the values used.

    • There is no information about which site drives the average Dice score. Is the model performing equally on all sites, or is there one site on which the model performs better? Including site-specific performance metrics or a variance of the measurements would improve the results.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The method description is well written and matches the reproducibility checklist. The authors state that they will add a GitHub link after acceptance. The experiments are based on publicly available data, which further improves reproducibility.

    Considering the paper and the checklist, the work is reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Overall, the experiments are well justified considering the format of a conference paper. However, as noted before, some important validation steps are missing. What is the influence of the hyperparameters? What is the performance for individual sites?
    • A broader discussion of the differences in appearance due to site-specific settings is needed. Fig. 3 shows a wide gap in visual appearance between Site F and Sites A and D; more discussion about how this influences the framework would be a good extension.
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The method is well justified and reproducible. The strength of the paper outweigh the small weaknesses. There is a good discussion potential at the conference.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    All the reviewers are very happy. This is a very strong article with a clear methodological contribution and results that show the benefits of the proposed method. The reviewers ask some clarification questions about the model presentation as well as the experiments, and I suggest the authors take them into account.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

We sincerely thank all reviewers and the meta-reviewer for their positive and constructive comments. They acknowledged that our method is “novel and applicable” (R1&3) and “solves an interesting problem with very thorough analysis and presentation” (R2). Here, we respond to the main comments:

Clarification of model presentation and experiment settings (R1&3). We would like to apologize for the unclear variable notations in the “Methods” section and Fig. 1 for computing the shape-aware importance. They will be clearly defined in the camera-ready version. Moreover, we analyzed the experiment settings (e.g., the selection of hyperparameters) and found that the shape boundary plays a more important role than its topology counterpart for segmentation memory preservation, motivating us to assign \alpha a larger value than \beta. Due to the limited space, the current conference paper does not include these discussions, but they will be presented in a future journal version.

Novelty of the SpAI and USmAI formulations (R1). We would like to emphasize several conceptual novelties of the SpAI and USmAI formulations in Eq. 2 and Eq. 5, respectively. First, SpAI is computed based on the parameter sensitivity w.r.t. the shape-related model outputs, which helps the model avoid forgetting shape knowledge when training on subsequent sites. Second, USmAI uses an uncertainty-guided strategy for awareness of semantic reliability, improving the model’s robustness during the continual segmentation procedure. In contrast, Aljundi et al. [1] consider only plain semantics in a common classification task, which may aggravate model forgetting, especially of shape knowledge, in our segmentation task with its inevitable disturbance from semantic noise. To clarify our conceptual novelties over this common counterpart and its customized application, we will add a reference to Aljundi et al. [1] in the camera-ready version.
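
For context, the importance weights in [1], which SpAI and USmAI extend with shape-related and uncertainty-guided outputs, are defined as the magnitude of the gradient of the squared L2 norm of the model output w.r.t. each parameter. A minimal illustrative sketch for a toy elementwise-linear model (hypothetical names; the paper's Eqs. 2 and 5 use different, task-specific outputs):

```python
import numpy as np

def mas_importance(theta, x, eps=1e-5):
    """MAS-style importance [1]: |d/dtheta_i ||F(x)||^2|, estimated here
    by central finite differences for a toy model F(x) = theta * x
    (elementwise). Illustrative only; SpAI/USmAI replace the plain
    output norm with shape- and reliability-aware quantities."""
    def out_norm_sq(t):
        # Squared L2 norm of the model output for parameters t.
        return float(np.sum((t * x) ** 2))
    omega = np.zeros_like(theta)
    for i in range(len(theta)):
        t_plus = theta.copy()
        t_minus = theta.copy()
        t_plus[i] += eps
        t_minus[i] -= eps
        omega[i] = abs(out_norm_sq(t_plus) - out_norm_sq(t_minus)) / (2 * eps)
    return omega
```

Because the penalty is computed from unlabeled model outputs, such importance weights can be estimated without access to the previous sites' ground-truth labels.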

Site-specific performance (R2&3). We realize that site-specific performance would indeed contribute to 1) explaining the domain shift between the different sites in the sequential data stream, and 2) characterizing the model memory during the continual segmentation task. However, in the conference format, we did not have enough space to present and comprehensively analyze site-specific performance in detail. We will therefore investigate this in a journal version of our paper.

[1] Aljundi, R., Babiloni, F., Elhoseiny, M., Rohrbach, M., Tuytelaars, T.: Memory aware synapses: Learning what (not) to forget. In: Proceedings of the European Conference on Computer Vision. pp. 139–154 (2018)
