
Authors

Xiaoman Zhang, Shixiang Feng, Yuhang Zhou, Ya Zhang, Yanfeng Wang

Abstract

Automatic and accurate tumor segmentation on medical images is in high demand to assist physicians with diagnosis and treatment. However, it is difficult to obtain the massive amounts of annotated training data required by deep-learning models, as the manual delineation process is often tedious and requires expertise. Although self-supervised learning (SSL) schemes have been widely adopted to address this problem, most SSL methods focus only on global structure information, ignoring the key distinguishing features of tumor regions: local intensity variation and large size distribution. In this paper, we propose Scale-Aware Restoration (SAR), an SSL method for 3D tumor segmentation. Specifically, a novel proxy task, i.e. scale discrimination, is formulated to pre-train the 3D neural network combined with the self-restoration task. Thus, the pre-trained model learns multi-level local representations through multi-scale inputs. Moreover, an adversarial learning module is further introduced to learn modality invariant representations from multiple unlabeled source datasets. We demonstrate the effectiveness of our methods on two downstream tasks: i) Brain tumor segmentation, ii) Pancreas tumor segmentation. Compared with the state-of-the-art 3D SSL methods, our proposed approach can significantly improve the segmentation accuracy. Besides, we analyze its advantages from multiple perspectives such as data efficiency, performance, and convergence speed.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87196-3_12

SharedIt: https://rdcu.be/cyl1C

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    In addition to the existing restoration-based self-supervised learning framework, this paper proposed to let the model predict the scale of inputs as well as discriminate the modality of inputs. The experimental results indicate the importance of the two components on brain tumor and pancreas organ/tumor segmentation tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The methodology is clearly described and illustrated.

    2. The motivation of scale-aware learning is fairly stated.

    3. The experiments utilize publicly available datasets.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The results are not sufficient to justify the effectiveness of the proposed two components. Of the two target tasks, one (brain tumor segmentation) suggests that the MIAL and SA components yield only a minor performance boost, and the other (pancreas segmentation) suggests that the MIAL and SA components lower the performance. Please see comments in #7 for details.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    It is easy to implement the idea based on the existing method description.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. Why adversarial learning, instead of direct imaging modality classification (CT vs. MRI)? Since the MIAL part will not be used for transfer learning, the adversarial training style seems overly complicated. The authors are expected to justify the necessity of adversarial learning, otherwise, the model parameters in the MIAL are wasted.

    2. Trivial solution for the scale classification? Predicting the scale of the input as large, medium, or small is not a difficult task given the content and appearance of the input. I am not sure whether the model could learn much about the scale. It would be interesting to evaluate the model’s three-way scale classification performance in the proxy task (on unseen scans). I expect the accuracy would be nearly 100%.

    3. The results are not sufficient to justify the effectiveness of the proposed two components. The proposed framework (Fig. 2) is built upon Models Genesis with two additional components, i.e., MIAL and SA. Therefore, to demonstrate their effectiveness, the fair comparison in Table 2 is with Genesis. There are negative results on the MSD dataset, where +MIAL and +SA get lower performance than Genesis. I think the incremental performance gain is due to the trivial solution of the scale classification.

    4. How does this performance compare with the challenge? Table 2 reports a Dice of 84.92% for BraTS 2018 and a Dice of 33.92% for the MSD challenge. As I understand it, the top entries of these competitions were based on a 3D U-Net learned from scratch (nnU-Net). Since the authors obtain a great performance boost in the local test, I suggest submitting the predictions to the official test set and reporting the official score. In this case, we can get a rough sense of how the proposed method compares with the state of the art in segmenting tumors.

    5. Why not stratify the size of the pancreas tumor? The authors report the performance of each method on the stratified tumor sizes for the BraTS dataset, which I found helpful for understanding the impact of scale-aware learning. However, the authors do not stratify the size of the pancreas tumor, and the scale-aware learning seems similar to Models Genesis.

    6. A better convergence seems to be an over-claim. Based on Fig. 3, I would say the learning curves are very similar among all methods. More experiments are needed to demonstrate this point.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The current results are not sufficient to justify the effectiveness of the proposed two components. The authors are encouraged to validate the methods on more diverse target tasks.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes a self-supervised learning method for 3D tumor segmentation that targets the key distinguishing features of tumor regions: local intensity variation and large size distribution. To this end, a novel proxy task, i.e. scale discrimination, is formulated to account for the large size distribution, and an adversarial learning module is further introduced to learn modality invariant representations for local intensity variation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well-motivated with good writing. They present two key challenges of existing approaches and propose methods to separately solve the problems.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The proposed framework seems a simple combination of existing approaches.
    2. Why can the adversarial learning module capture local intensity variation?
    3. More experiments should be specially designed to demonstrate how the proposed modules could distinguish key features of tumor regions: local intensity variation and large size distribution.
    4. It would be better to present an analysis of the statistical significance of the reported differences in performance between methods.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    If code is available, it seems we can reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors present two key challenges of existing self-supervised learning approaches and propose methods to separately solve the problems. My concerns:

    1. The proposed framework seems a simple combination of existing approaches.
    2. Why can the adversarial learning module capture local intensity variation?
    3. More experiments should be specially designed to demonstrate how the proposed modules could distinguish key features of tumor regions: local intensity variation and large size distribution.
    4. It would be better to present an analysis of the statistical significance of the reported differences in performance between methods.
    5. Other non-SSL methods should be presented for comparison to verify the effectiveness of the proposed methods.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is well motivated; however, the proposed framework seems a simple combination of existing approaches.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The paper presents a self-supervised learning based pre-training method specifically for downstream tumor segmentation tasks. The method improves the Models Genesis self-restoration framework in two aspects: it encourages the model to capture multi-scale representations via a scale-aware proxy task, and it introduces an adversarial module to learn modality-invariant representations.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • The paper improves the multi-scale modeling ability of the model for tumor segmentation using random scaling crop and scale loss;
    • The paper improves the modality invariant ability when combining four different datasets during pre-training.
    • Strong evaluations, thorough ablation studies and horizontal comparisons with SOTA methods are included in the results, which show the effectiveness of each module and the advantage of the proposed pre-training method.
    • The paper also explored the data efficiency and convergence speed.
    • The significant performance of two downstream tasks using SAR pre-training demonstrates the potential of SAR in general tumor segmentation tasks.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The base self-restoration framework of the paper is the same as Models Genesis. The random cropping with three scales in the Scale-Aware module is simply a specific form of Random Resized Cropping, which is widely used as a standard data augmentation technique in 2D natural image object detection and segmentation tasks. MIAL improves the modality invariance of the model using adversarial learning, which is also a popular method in domain adaptation and multi-modality fusion applications. Therefore, the method is more a combination of existing techniques and thus has limited novelty.
    • The paper has no ablation study that applies only random cropping at different scales without the scale loss. As noted above, random scaling and cropping is widely used as data augmentation for natural images and has been shown to improve model performance on multi-scale images. The reviewer would therefore like to know whether the performance boost of SAR comes from the random scaling crop alone or whether it benefits from the scale loss.
    • The paper uses three scales of 1/2, 1/4, 1/8, but without explaining why these scales were selected. What about using only two scales, or cropping at more than three scales? A different scale selection would probably influence the final performance. It would be better if the authors explained their scale selection in the paper.
    • In Table 1, the case number of BraTS is 760, which is more than the 285 total cases for the downstream task. The reviewer assumes that the authors consider all 4 MRI modalities in BraTS in pre-training, so 760 comes from 190x4. But this is confusing since the paper does not describe it. More description of the pre-training datasets is needed.
    • MIAL is used to discriminate two modalities: MRI and CT. But when considering the pre-training datasets, all brain scans are MRI while all abdomen scans are CT. The modalities are entangled and correlated with scan regions, thus it’s hard to tell whether the modality invariant feature from the model is affected by the scan regions.
    • The paper considers relative size (shape of the volume scans) when cropping the sub-volume, but in medical applications, voxel size, which decides the actual size of tumors/organs, could be different in different datasets. When combining multiple datasets, it’s recommended to consider the actual size of the patients and resize all the scans to have the same voxel size, thus different objects could be comparable.
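
    The multi-scale cropping discussed in the points above can be sketched as follows. This is only an illustration, not the authors' implementation: it assumes the three scales are per-axis fractions of the volume shape, and the names SCALES and sample_scaled_crop are hypothetical. The returned label is what a scale-classification loss would be trained against; the crop would presumably be resized to a fixed network input size afterwards.

```python
import random

# Assumed per-axis crop fractions, taken from the scales named in the review.
SCALES = (1 / 2, 1 / 4, 1 / 8)


def sample_scaled_crop(volume_shape, rng=random):
    """Pick a scale label and a random crop box at that scale.

    Returns (label, start, size): label indexes SCALES, and start/size
    are per-axis tuples in voxels.
    """
    label = rng.randrange(len(SCALES))
    fraction = SCALES[label]
    # Crop size is the chosen fraction of each axis, at least one voxel.
    size = tuple(max(1, int(dim * fraction)) for dim in volume_shape)
    # Choose a start index so the crop stays inside the volume.
    start = tuple(rng.randint(0, dim - s) for dim, s in zip(volume_shape, size))
    return label, start, size
```

    Dropping the label and keeping only the crop gives the plain random-scale augmentation that the requested ablation would compare against.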
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Detailed data preprocessing and splits are provided in the paper, which supports good reproducibility of the work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Since the Scale-Aware module consists of two parts, random cropping at different scales and a scale classification loss, it’s necessary to add an ablation study that uses only random cropping without the scale classification, to demonstrate the effectiveness of the scale classification loss.
    • Please explain why the three scales 1/2, 1/4, 1/8 are selected.
    • Please add some descriptions on how the pre-training datasets are collected. (760 cases in BraTS2018 is unclear and confusing with no further descriptions)
    • It’s recommended to resize all the scans to have the same voxel size when combining different medical datasets, so that all the objects (organs, tumors, body parts) could be in the same original scale.
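
    The voxel-size recommendation above amounts to the following arithmetic. This is a minimal sketch assuming a target spacing of 1 mm per axis (the review does not name one); resampled_shape is a hypothetical helper that only computes the new grid shape and does not interpolate intensities.

```python
def resampled_shape(shape, spacing, target_spacing=(1.0, 1.0, 1.0)):
    """Grid shape after resampling to target_spacing with the same physical extent.

    shape is in voxels; spacing and target_spacing are in the same
    physical unit (e.g. mm) per axis.
    """
    return tuple(
        max(1, int(round(n * s / t)))
        for n, s, t in zip(shape, spacing, target_spacing)
    )
```

    For example, a 512 x 512 x 100 scan with 0.8 x 0.8 x 2.5 spacing maps to a 410 x 410 x 250 grid at unit spacing, so a tumor keeps its physical size regardless of the source dataset.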
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Although the proposed method has limited novelty, the results demonstrate good performance and the potential of the SAR pre-training method to be applied to more general tumor segmentation tasks. Therefore, the paper is considered above borderline and has a chance to be accepted.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The reviewers all agreed that the paper is, in general, well motivated and well written. The reviewers have also identified some weaknesses: the proposed framework is a straightforward combination of existing approaches, and the use of the adversarial learning module should be further justified.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3




Author Feedback

We thank all reviewers for their constructive comments and include our responses below.

All reviewers:

  1. Clarify the novelty. i) We propose a novel proxy task on scale discrimination (as pointed out by R2), encouraging the model to capture scale-aware representations. We have demonstrated its effectiveness on general tumor segmentation across multiple datasets with wide size distributions. ii) To avoid performance degradation caused by domain gaps between different modalities, we further combine different datasets during pre-training and adopt an adversarial training module (MIAL).
  2. Necessity of adversarial learning. Intuitively, our adversarial learning plays the role of a learnable ‘normalization’, encouraging representations to be agnostic to modalities. It is designed for our considered downstream tasks, namely tumor segmentation, which mainly concerns local intensity variation. Table 2 also shows that scale classification and adversarial learning are complementary to each other; combining both gives the best performance.

Reviewer #1

Q1: Trivial solution for the scale classification. During training, we apply a series of transformations to the input that change its texture and intensity, making scale classification a non-trivial task. Evaluated on the unseen validation set, the accuracy of scale classification is only 90.83%.

Q2: Justify the effectiveness of the proposed two components. The only negative result is from pancreas tumor segmentation; we conjecture this is due to the intrinsic difficulty of this problem. In fact, none of the SSL methods has shown satisfactory results, and we treat this as our future work. Nevertheless, when both of the proposed components are adopted, consistent improvements can be observed on all tasks, showing their complementary nature.

Q3: Compared with the challenge. We report the official Dice scores for the BraTS challenge:

    Method  | Enhanced Tumor | Whole Tumor | Tumor Core
    Scratch | 76.12          | 90.23       | 82.79
    Genesis | 76.05          | 90.54       | 82.88
    SAR     | 79.28          | 90.58       | 83.70

Q4: Stratify the size of the pancreas tumor. For pancreas, the tumor size range is (303, 324028). We stratify the tumor Dice based on size:

    Method  | <2000 | <5000 | >5000
    Cases   | 95    | 102   | 84
    Scratch | 19.57 | 30.71 | 24.99
    Genesis | 25.26 | 38.69 | 32.75
    SAR     | 26.12 | 39.90 | 35.07

Q5: Demonstrate better convergence. We test the models at different epochs, and SAR always gets better results:

    Epoch | Scratch | Genesis | SAR
    10    | 42.32   | 47.08   | 54.35
    20    | 47.97   | 54.67   | 61.58
    50    | 59.90   | 71.45   | 74.61
    100   | 67.86   | 78.00   | 79.11
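
The size stratification reported in Q4 could be computed along these lines. This is a sketch under stated assumptions rather than the authors' evaluation code: tumor size is taken to be a voxel count, the columns are read as the bins [0, 2000), [2000, 5000) and 5000 or more, and the helper names dice, size_bin and mean_dice_by_bin are hypothetical.

```python
def dice(pred, target):
    """Dice coefficient between two equal-length binary masks (flat 0/1 sequences)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def size_bin(tumor_voxels, thresholds=(2000, 5000)):
    """Index of the size stratum: 0 for the smallest tumors, len(thresholds) for the largest."""
    for index, upper in enumerate(thresholds):
        if tumor_voxels < upper:
            return index
    return len(thresholds)


def mean_dice_by_bin(cases, thresholds=(2000, 5000)):
    """cases: iterable of (tumor_voxels, dice_score). Returns {bin_index: mean Dice}."""
    sums = {}
    for voxels, score in cases:
        b = size_bin(voxels, thresholds)
        total, count = sums.get(b, (0.0, 0))
        sums[b] = (total + score, count + 1)
    return {b: total / count for b, (total, count) in sums.items()}
```

Reporting the case count per bin alongside the mean, as the rebuttal table does, makes it clear how much data backs each stratum.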

Reviewer #2

Q1: Experiments distinguish key features. For local intensity variation, we can show this by visualizing the response of different models to tumor data. For large size distribution, we have stratified the results according to tumor size (BraTS in Table 2; for MSD, see R1 Q4).

Q2: Analysis of the statistical significance. We perform an independent two-sample t-test between SAR and each of the other methods. All comparisons show statistically significant results (p = 0.05) except for MSD tumor (Genesis vs. SAR).

Q3: Compare with non-SSL methods. We compare with the SOTA 3D supervised pre-trained models (I3D, NiftyNet, Med3D) on BraTS and get Dice scores of 80.83, 75.60, and 79.58, respectively. SAR gets a better result of 84.92.
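
For readers unfamiliar with the test named in Q2, the statistic behind an independent two-sample t-test can be sketched with the standard library alone. The rebuttal does not say which variant was used, so the equal-variance (Student) form here is an assumption, and pooled_t_statistic is a hypothetical helper. Turning the statistic into a p-value requires the t distribution, which a routine such as scipy.stats.ttest_ind provides directly.

```python
import math
import statistics


def pooled_t_statistic(a, b):
    """Student's two-sample t statistic (equal-variance form) and its degrees of freedom."""
    na, nb = len(a), len(b)
    df = na + nb - 2
    # Pooled sample variance, weighted by each group's degrees of freedom.
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / df
    standard_error = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error, df
```

The inputs would be per-case Dice scores of two methods; a larger absolute t at the given degrees of freedom corresponds to a smaller p-value.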

Reviewer #3

Q1: Ablation study of random scaling crop and scale loss. We experiment with random scaling crop as data augmentation when training from scratch and get 73.65 on BraTS, showing no obvious performance gain. SAR gets 84.92, showing the effectiveness of the scale loss for SSL.

Q2: Explanations for scale selection. For BraTS, the smallest tumor occupies nearly 1/8 of the whole volume (28, 26, 24), and the largest nearly 1/2 (96, 155, 68). We pick these scales to make sure the learnt representation is aware of the texture and intensity over the range of tumor sizes.

Q3: Dataset descriptions and resizing to the same voxel size. We have resampled all scans to the same voxel spacing in pre-training. We will add more details in our revision.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The reviewers have justified the novelty and the necessity of adversarial learning. The paper is sufficient for publication at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a self-supervised learning based pre-training method for tumor segmentation. The restoration-based self-supervised learning framework is interesting. The input scale prediction proposal is very relevant for segmenting tumors because their sizes vary from patient to patient. The results clearly show the good performance of the proposed framework. My recommendation is “accept”.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper clearly presents an elegant and effective idea that can be of interest to the MICCAI community. Despite initial shortcomings in demonstrating the added value of the solution and in justifying the novel aspects of the framework, the rebuttal is very helpful in justifying and further presenting the missing aspects of the work, making it relevant to the community.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7


