
Authors

Hao Zheng, Jun Han, Hongxiao Wang, Lin Yang, Zhuo Zhao, Chaoli Wang, Danny Z. Chen

Abstract

A large labeled dataset is a key to the success of supervised deep learning, but for medical image segmentation, it is highly challenging to obtain sufficient annotated images for model training. In many scenarios, unannotated images are abundant and easy to acquire. Self-supervised learning (SSL) has shown great potential in exploiting raw data information and representation learning. In this paper, we propose Hierarchical Self-Supervised Learning (HSSL), a new self-supervised framework that boosts medical image segmentation by making good use of unannotated data. Unlike the current literature on task-specific self-supervised pretraining followed by supervised fine-tuning, we utilize SSL to learn task-agnostic knowledge from heterogeneous data for various medical image segmentation tasks. Specifically, we first aggregate a dataset from several medical challenges, then pre-train the network in a self-supervised manner, and finally fine-tune on labeled data. We develop a new loss function by combining contrastive loss and classification loss and pretrain an encoder-decoder architecture for segmentation tasks. Our extensive experiments show that multi-domain joint pre-training benefits downstream segmentation tasks and outperforms single-domain pre-training significantly. Compared to learning from scratch, our new method yields better performance on various tasks (e.g., +0.69% to +18.60% in Dice scores with 5% of annotated data). With limited amounts of training data, our method can substantially bridge the performance gap w.r.t. denser annotations (e.g., 10% vs. 100% of annotated data).
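
For readers unfamiliar with the metric quoted in the abstract, the following is a minimal sketch of the standard Dice score for binary masks. It is not taken from the paper; the function name `dice_score` and the convention of returning 1.0 when both masks are empty are assumptions made for illustration.

```python
from typing import Sequence


def dice_score(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2 * |A ∩ B| / (|A| + |B|) for flattened binary masks (0/1 values)."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Assumed convention: two empty masks count as a perfect match.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# One overlapping pixel, two predicted pixels, one target pixel.
print(dice_score([1, 1, 0, 0], [1, 0, 0, 0]))
```

A reported gain such as "+18.60% in Dice" would then be a difference between two such scores, expressed in percentage points and averaged over a test set.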

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_59

SharedIt: https://rdcu.be/cyhMC

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper introduces Hierarchical Self-Supervised Learning (HSSL) for learning a pre-trained base model (similar to models pre-trained on ImageNet, but in a self-supervised fashion) from a combined medical image super-set. The authors argue that a segmentation network initialized with the pre-trained HSSL model and then fine-tuned can obtain relatively better performance. In general, I appreciate the authors' efforts to tackle a common problem in medical image segmentation, i.e., how to effectively train a model on a diverse super-set.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper is overall well written and easy to follow.
    • This paper tackles an important problem, i.e., how to obtain a good and general pre-trained model for medical image segmentation so that downstream tasks can benefit. The application domain drives the novelty of this work.
    • The proposed hierarchy scheme for the “images,” “tasks,” and “groups” categorization and the subsequent self-supervised learning are sound in principle. I appreciate the efforts that the authors devoted to this task.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • I have concerns about network optimization. Jointly optimizing all four losses can be very difficult.
    • DeepLesion, MULAN, and LENS (https://arxiv.org/pdf/2009.02577.pdf) might share similar ideas with the proposed HSSL. Apart from the differences in their application domains, the authors might want to discuss the technical differences.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    If the code and pre-trained model are provided, reproducing the proposed method should be easy. I have concerns about the model optimization: if the training code is not publicly available, reproduction could be very challenging.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • It would be great if the authors could demonstrate the loss convergence trends. I noticed that lambda_4 is 50; I would like the authors to provide some insights, e.g., the scale of each loss or the difficulty levels of the losses.
    • The font in Fig. 1 and Fig. 3 is too small; the authors might want to make some adjustments.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The application domain plus the proposed HSSL jointly drive the novelty of this work. However, I still have concerns listed in the major weaknesses.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper proposes a self-supervised learning method, called HSSL, for segmentation. HSSL first aggregates images from multiple datasets and then conducts self-supervised learning. During self-supervised learning, HSSL optimizes losses of hierarchical levels. The pretrained model is finally finetuned for segmentation. For evaluation, the authors have demonstrated their method on eight datasets from various domains and have shown that HSSL outperforms the existing self-supervised learning methods for three tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Writing:

    • The paper is well-written and easy to follow. Every step of the proposed method is clearly presented.
    • The proposed algorithm is well-motivated using Figure 1. Lack of labeled data is one of the critical issues in the biomedical domain.

    Method:

    • Hierarchical self-supervised learning is very interesting. While the previous self-supervised learning methods only use L_img, HSSL adopts the group and task information to train semantic contexts.
    • Ablation study has been thoroughly conducted. The efficacy of each loss term is well-analyzed.
    • Different numbers of annotations are considered in Table 2.

    Evaluation:

    • Two state-of-the-art self-supervised learning methods have been compared in Table 2. In the experimental results, HSSL outperforms MoCo and SimCLR, which have shown outstanding results for natural image classification.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Method:

    • For the group level, a more precise explanation of how the authors assign each dataset to a group would have been helpful.
    • A parameter search is missing (lambda_1, …, lambda_4); the values are chosen empirically.

    Evaluation:

    • Comparison to state-of-the-art 3D models is missing. As far as I know, 3D models perform better than 2D models for organ segmentation from CT or MRI.
    • The authors said that more benchmarking results for the rest of the tasks (Task 2, 4, 6-8) are available in the supplementary, but I wasn’t able to find them (comparison to MoCo and SimCLR). Could you provide the benchmarking results?
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good:

    • The authors have promised to release the codes upon acceptance.

    Bad:

    • The experiments have been done with single runs. Training with multiple seeds could result in different scores.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Method:

    • Is it possible to generalize your method to a 3D segmentation model? If it is successfully applied, it will be exciting.

    Experiments:

    • For future work, it could be interesting to see if it works for other data domains, such as histopathology.
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I have provided detailed comments above. The main issues in the paper are with the experiments: lack of comparison with 3D models, running experiments once (single seed), and missing comparisons for the rest of the tasks. However, I think the proposed algorithm is quite interesting and novel. I truly enjoyed reading the paper. Overall, I think it can be presented in the oral session, but with some elaborations. My initial rating is accept. I am open to raising or lowering my rating after the rebuttal.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    A new hierarchical self-supervised learning concept is introduced. This is a self-supervised framework that learns hierarchical and multi-scale semantic features from aggregated multi-domain medical image data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This proposal represents an outstanding solution to the problem of limited annotated data in medical imaging.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Actually this paper has no evident weaknesses.

  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is not evident from the paper. More information is needed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In order for the paper to have stronger impact, more tests should be done to study generalization of the proposal.

  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a paper that is complete in all aspects: theory, methodology, structure, results, clarity, etc. The proposal might become a new paradigm for the problem of segmentation with limited data.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Although the reviewers raise some concerns (optimizing all losses jointly can be difficult, a parameter search is missing, and similar ideas should be compared with HSSL), all reviewers appreciate the authors' efforts to tackle a common problem in medical image segmentation, i.e., how to effectively train a diverse pre-trained super-set model. They also find that introducing group and task information to train the model is interesting and that the work is well written and easy to follow.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

We thank the reviewers for their highly helpful feedback. We are really encouraged that they all found our motivation and ideas important, interesting, and having practical value, and acknowledged the merit of exploiting aggregated information from multi-domain data for SSL pretraining by our new hierarchical design, and that our single pretrained model shows effectiveness of HSSL and outperforms SOTA on multiple datasets.

Common Questions:

Q1: Weights in the overall loss function (i.e., λ1 to λ4). (R1) Loss term trends? Is joint optimization hard? (R2) No parameter search.

A1: Although there are 4 loss terms, we observed that they converge normally on the training and validation sets. Also, in Fig. 3(b), the t-SNE projection of the extracted features shows that the network indeed captures what we expect and forms clusters hierarchically. In our preliminary experiments, we noticed that the hierarchical losses have the same order of magnitude at the beginning of training and converge later. Thus, we use the same weights for them. As for lambda_4, we tried other weights (e.g., 1 and 100) and found that our model was not sensitive to it; hence we empirically chose 50 to stabilize the network training in the early stage.
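
The weighting described in A1 can be illustrated with a small sketch. This is not the authors' code: the term names (`img`, `task`, `group`, `fourth`), their mapping to λ1 through λ4, and the shared value of 1.0 for the three hierarchical weights are assumptions. Only the value 50 for lambda_4 and the statement that the hierarchical losses share one weight come from the rebuttal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    # Assumed: the three hierarchical terms share one weight (value 1.0 is hypothetical).
    img: float = 1.0
    task: float = 1.0
    group: float = 1.0
    # Stated in the rebuttal: lambda_4 = 50 (the term it scales is not named on this page).
    fourth: float = 50.0


def total_loss(l_img: float, l_task: float, l_group: float, l_fourth: float,
               w: LossWeights = LossWeights()) -> float:
    """Weighted sum of four scalar loss values."""
    return (w.img * l_img + w.task * l_task
            + w.group * l_group + w.fourth * l_fourth)


# Hypothetical loss values: three terms of similar magnitude, one much smaller.
print(total_loss(1.0, 1.0, 1.0, 0.01))
```

With these hypothetical values, a weight of 50 brings the fourth term (0.01 scaled to 0.5) closer to the magnitude of the other three, which is one plausible reading of why a larger weight could help early in training.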

Q2: Generalization. (R2) Is it possible to extend to 3D scenarios and other data domains (e.g., histopathology)? (R2 & R3) More tests?

A2: Yes, it is possible. Our method has no special design components and can be directly applied to 3D images or different modalities/targets. We tried multiple runs of downstream segmentation models with random seeds, and the results were relatively stable. We will do more systematic tests.

R1 1) Comparison with DeepLesion, MULAN, and LENS. Although these three known methods aggregate a large amount of data, they are different from our method in at least two aspects. First, they focus on lesion tagging/detection/segmentation from CT images, while we focus on segmentation and our target tasks are more diverse in terms of ROIs and modalities. Second, they focus on mining pseudo-labels/masks from radiological reports and/or weak human annotations, while our HSSL aims to pre-train a model in an unsupervised manner.

2) Small font in Fig. 1 & Fig. 3. Will enlarge.

R2 1) How is each dataset assigned to a group? As illustrated in Sect. 2.2, two major factors are considered: content and modality. Specifically, Tasks-1/7/8 are all heart MRI, Tasks-2/4/5 are all abdominal CT, Task-3 is prostate MRI, and Task-6 is knee MRI. We categorize them into these 4 groups accordingly.
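
The grouping described above can be written down as a lookup table. This is only a sketch of the stated assignment: the string labels, the ordering of groups, and the helper `group_index` are hypothetical and not from the paper.

```python
# Task-to-group assignment as stated in the rebuttal (content + modality).
# The label strings are hypothetical names chosen for readability.
TASK_TO_GROUP = {
    1: "heart_mri", 7: "heart_mri", 8: "heart_mri",
    2: "abdominal_ct", 4: "abdominal_ct", 5: "abdominal_ct",
    3: "prostate_mri",
    6: "knee_mri",
}

# Assumed ordering, used only to turn a group name into an integer class label.
GROUPS = ["heart_mri", "abdominal_ct", "prostate_mri", "knee_mri"]


def group_index(task_id: int) -> int:
    """Return the integer group label for a task id (1 to 8)."""
    return GROUPS.index(TASK_TO_GROUP[task_id])


for task_id in sorted(TASK_TO_GROUP):
    print(task_id, TASK_TO_GROUP[task_id], group_index(task_id))
```

Each image would then carry both a task label (one of eight) and a group label (one of four), which is the kind of label hierarchy the group-level and task-level terms rely on.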

2) Lack of comparison with SOTA 3D models. In this paper, we do not focus on designing a special pretext task by exploiting 3D image properties, but aim at a more general goal and cover broader multi-domain data. Hence, we mainly compare with general SOTA SSL methods in 2D scenarios. Our chosen baselines (Jigsaw [38], ROT [17], SimCLR [10], MoCo [19]) are well-known SOTA methods, and we think our comparison experiments with them are appropriate. In future work, we will extend our method to 3D scenarios and compare with 3D counterparts.

3) More benchmarking results for the remaining tasks (w.r.t. MoCo & SimCLR)? We have more results for the remaining tasks, but we did not try all annotation ratios (e.g., 5% or 10%). We observe that the trend is similar to that for the tasks shown in Table 2, thus demonstrating the effectiveness of HSSL in extracting features for downstream tasks. Besides, in Fig. 1 of the Supplementary Material, we show more benchmarking results for the remaining tasks (TFS vs. HSSL).


