
Authors

Vishwesh Nath, Dong Yang, Ali Hatamizadeh, Anas A. Abidin, Andriy Myronenko, Holger R. Roth, Daguang Xu

Abstract

Deep learning models for medical image segmentation are primarily data-driven: models trained with more data achieve better performance and generalizability. However, training is computationally expensive because multiple hyper-parameters need to be tested to find the setting that yields the best performance. In this work, we focus on accelerating the estimation of hyper-parameters by proposing two novel methodologies: proxy data and proxy networks. Both can be used to estimate hyper-parameters more efficiently. We test the proposed techniques on the CT and MR imaging modalities using well-known public datasets, in both cases using one dataset to build the proxy data and another data source for external evaluation. For CT, the approach is tested on spleen segmentation with two datasets: the first, from the Medical Segmentation Decathlon (MSD), is used to construct the proxy data, and the second is used as an external validation dataset. Similarly, for MR, the approach is evaluated on prostate segmentation, where the first dataset is from MSD and the second is PROSTATEx. First, we show that, when testing on the external validation set, training on the smaller proxy data correlates more strongly with training on the full data than training on a randomly selected subset of the same size. Second, we show that a high correlation exists between proxy networks and the full network on validation Dice score. Third, we show that the proposed approach of utilizing a proxy network can speed up an AutoML framework for hyper-parameter search by 3.3x, and by 4.4x if proxy data and proxy network are utilized together.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_43

SharedIt: https://rdcu.be/cyl4D

Link to the code repository

N/A

Link to the dataset(s)

http://medicaldecathlon.com/

https://www.synapse.org/#!Synapse:syn3193805/challenge/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper describes experiments to evaluate the use of proxy networks and proxy datasets to speed up the hyperparameter optimisation process for deep learning based segmentation models. Proxy data refers to the use of a selected subset of the full training data, and a proxy network refers to a simplification of the full model. The experiments evaluate the impact of these two approaches as well as different techniques for selecting the proxy data.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is generally well written and easy to follow (but see comments below regarding presentation of results).

    As far as I know, the investigation of the use of proxy data and proxy networks in deep learning based segmentation is novel.

    The experiments are quite thorough.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I am not 100% convinced of the need for such techniques. The authors argue that they can reduce network training time from days to hours, which may well be true, but is a training time of days such a problem? If the network is to be deployed in a clinical setting, performance is the main criterion, so the longer training time is probably a small price to pay for slightly better accuracy.

    A lot of results are presented in the Results section and it feels quite condensed, occasionally losing clarity and focus.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    I don’t have any major criticisms of the paper in this regard.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Overall I think this paper makes a useful contribution to the MIC field by investigating more efficient hyperparameter optimisation schemes for segmentation models, and I would like to see it published. But I see it as a borderline paper for a conference as high-profile and prestigious as MICCAI. The comments below are aimed at improving the paper for publication (whether at MICCAI or elsewhere).

    1. Please see my comment above (in Main Weaknesses) regarding the need for using techniques for improving the efficiency of hyper-parameter search. Can the authors provide a stronger motivation for their work?

    2. As also noted above (in Main Weaknesses), I felt that quite a lot of results were presented in the Results section, and this caused a loss of clarity at times. It might help if the authors structured this section by linking the presented results more clearly to the questions listed in Section 4.2; e.g. I presume the results in Table 1 are intended to answer the first question, etc. Currently the subheadings in the Results section do not seem to relate directly to the questions outlined earlier.

    3. Also, in the Results, Fig. 4 C and D seem to suggest that using proxy data only is better at selecting hyperparameters than using both proxy data and proxy network, whereas Table 2 seems to suggest that (at least for the prostate data) the proxy data + proxy network approach has the best results. Can the authors comment on this apparent discrepancy?

    4. The authors mention in the Discussion that their proxy data approach comes with an overhead of “squared run-time” to compute the paired distance measures. Can they clarify if this is accounted for in the 4.4x speedup they claim in the previous sentence? If not, what is the actual speedup?

    Other minor comments/suggestions:
    • p4, Section 3, Proxy Network subsection: In the last sentence of this paragraph, you mention “decreasing” the number of U-net levels to 5, 4, 3, but strictly speaking 5 levels is not a decrease as this is the original number of levels? Can you rephrase? (See the sketch after this list for what reduced-depth proxy networks might look like.)
    • p4, Section 4.1, paragraph 1: Please define the abbreviation “HU”.
    • p4, Section 4.1: I found the last sentences of paragraphs 1 and 3 (i.e. about the use of data for training and validation) confusing at first. When I got to Section 4.2 this became clear. Maybe you could refer forward to Section 4.2?
    • p4, Section 4.1, paragraph 3: The text “… random patches … were selected with or without the label using a ratio” was not clear to me. What ratio? Can you rephrase?
    • p4, Section 4.1, paragraph 3: “For inference a patch size of … were used” -> “For inference a patch size of … was used”
    • p5, paragraph 2, line 3: “validations” -> “validation”
    • p5, paragraph 4, line 1: “To evaluate, how small of a …” -> “To evaluate how small a …”
    • p6, paragraph 3, line 3: “The test Dice scores reported in (listed in Tab. 1) …” -> “The test Dice scores reported in Tab. 1 …”
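
    For readers unfamiliar with how “U-net levels” translate into a concrete proxy network, here is a minimal sketch using MONAI's generic UNet class. The channel widths are illustrative assumptions, not values taken from the paper:

        # Hypothetical sketch: full vs. proxy U-Nets obtained by reducing encoder
        # depth. Channel widths are illustrative; the paper's exact configuration
        # may differ.
        from monai.networks.nets import UNet

        def make_unet(num_levels: int) -> UNet:
            channels = tuple(16 * 2 ** i for i in range(num_levels))  # e.g. (16, ..., 256)
            strides = (2,) * (num_levels - 1)  # one downsampling per level transition
            return UNet(spatial_dims=3, in_channels=1, out_channels=2,
                        channels=channels, strides=strides)

        full_net = make_unet(5)   # original depth
        proxy_4 = make_unet(4)    # shallower proxy networks
        proxy_3 = make_unet(3)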

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper is generally well-written and with some relatively minor changes could be of publication quality. The methods investigated are (as far as I know) novel.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    The contribution of the paper is the evaluation of an NCC/MI-based selection of subsamples of CT/MRI datasets for identifying suitable hyper-parameters.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is the application, to CT/MRI images, of the idea of identifying important samples to use when doing HPO. While the idea might not be novel, it seems it has not been applied to the particular use case considered in the paper.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    NOVELTY
    - While identifying more representative images to form a subset for HPO has not been done for this particular set of tasks, the idea is only of minimal novelty.
    - The reasoning behind using NCC/MI as a way of selecting which images to include in a training subset is not clear. Although there might be some merit to using them for MRI/CT scans, it is unclear why they are useful and whether the approach applies to other medical imaging modalities.
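
    To make the kind of criterion being discussed concrete, here is a minimal sketch of one plausible MI-based ranking of candidate volumes (an illustration of the general technique, not the authors' implementation; volumes are assumed resampled to a common shape):

        # Hypothetical sketch: rank volumes by mean pairwise mutual information (MI)
        # so the most "representative" ones can form a proxy training subset.
        import numpy as np

        def mutual_information(a: np.ndarray, b: np.ndarray, bins: int = 32) -> float:
            # Joint-histogram estimate of MI between two equally sized images.
            h, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
            p = h / h.sum()
            px, py = p.sum(axis=1), p.sum(axis=0)
            nz = p > 0
            return float((p[nz] * np.log(p[nz] / (px[:, None] * py[None, :])[nz])).sum())

        def rank_by_mean_similarity(volumes: list) -> np.ndarray:
            # O(n^2) pairwise pass -- the quadratic cost criticised below.
            n = len(volumes)
            sim = np.zeros((n, n))
            for i in range(n):
                for j in range(i + 1, n):
                    sim[i, j] = sim[j, i] = mutual_information(volumes[i], volumes[j])
            return np.argsort(sim.sum(axis=1))[::-1]  # most representative first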

    EXPERIMENTS
    - The experimental results seem to indicate that there is no statistically significant improvement from doing HPO on the data subset suggested by the authors compared to an equally large random subset.
    - The use of a proxy network to identify suitable HPs also seems strange. While similar networks will behave similarly across HPs on the same data, the experiment does not show that the HPs identified using such a “proxy” are substantially different from those found by a smaller HP search with fewer iterations and the same amount of computational resources. Further, the search space for, for example, the LR is quite small for optimizers with quite wide ranges of acceptable learning rates.
    - The quadratic computation needed to evaluate the importance of each sample, in order to decide which samples to include, is a major problem when the dataset is larger. The authors do acknowledge this. However, it means the technique is only feasible for smaller datasets, which are often small enough to do HPO on directly instead of relying on the suggested techniques.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The broad strokes of the methods are described, but some important descriptions are missing. There is also no code or appendix describing the methods in more depth. Reproducibility: low/poor.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The selection criteria for including data in the training subset are only mentioned briefly; a more in-depth description would be necessary to understand them fully.

    The choice of hyper-parameters searched for in the experiments is poor. The optimizers chosen are all known to be relatively robust to the learning rate; these are the optimizers one chooses when one does not want to spend a lot of time searching for hyper-parameters. A more interesting optimizer would have been vanilla SGD. Alternatively, the dropout rate or the early-stopping point could have been searched for.
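
    As a concrete illustration of the alternative suggested above, a minimal sketch of a small log-uniform random search over the learning rate for vanilla SGD (train_and_validate is a hypothetical stub, not the paper's pipeline):

        # Hypothetical sketch: small random search over the SGD learning rate.
        import math
        import random

        def sample_lr(low: float = 1e-4, high: float = 1e-1) -> float:
            return 10 ** random.uniform(math.log10(low), math.log10(high))

        def train_and_validate(lr: float) -> float:
            # Stub: in practice, build torch.optim.SGD(model.parameters(), lr=lr),
            # train briefly, and return the validation Dice score.
            return random.random()

        best_lr, best_dice = None, -1.0
        for _ in range(10):
            lr = sample_lr()
            dice = train_and_validate(lr)
            if dice > best_dice:
                best_lr, best_dice = lr, dice
        print(f"best lr={best_lr:.2e} (val Dice={best_dice:.3f})")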

  • Please state your overall opinion of the paper

    probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the use of a smaller network and a subset of the full dataset to identify suitable hyper-parameters might be novel for this particular task, I would consider the novelty minimal. Further, the experimental results contain flaws and only seem to support parts of the claims made, while certain details necessary for replicability are missing. Overall, the contributions are small and the execution needs to be improved.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The main strength of the article is the novelty of the investigation. Proxy networks and proxy data have not been used before for medical image segmentation, as noted by the authors as well as the reviewers. R1 is also quite happy with the thoroughness of the experimental section.

    The main weakness, raised by both reviewers and shared by this meta-reviewer, is the value of the proposed method. The improvements are not substantial, as noted mainly by R2, and with such small improvements the value of the proposed method is questionable, as also stated by R1. Reducing training time from days to hours would be extremely valuable if accuracy could be kept at the same level; however, the results in Table 1 indicate a substantial degradation.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    12




Author Feedback

Dear Reviewers & Meta-Reviewers,

First, we would like to thank the reviewers for acknowledging the importance of investigating proxy networks and proxy data, and for recognizing that this is a novel contribution for medical image segmentation. Second, we thank them for giving us the chance to submit a rebuttal, as our work was evaluated by only two reviewers.

To highlight the novelty of our work, we would like to draw attention to the fact that proxy methods are gaining traction in traditional machine learning research as well, and are likely to become more prevalent as dataset sizes increase, as suggested by [2] (ICLR 2020).

In that context, our work is the first to show how proxy datasets can be constructed for medical image segmentation, and the first to integrate proxy-based techniques with AutoML. Proxy-based methods can be utilized for estimating hyper-parameters, for neural architecture search, and so forth. We show that AutoML frameworks for estimating hyper-parameters can be sped up by utilizing proxy methods.

In the politest way possible, we would like to point out that the meta-reviewer misinterpreted Table 1 as the final set of results, which is not the case. The final results are presented in Table 2 and show the performance benefits of our proposed approach.

To clarify the need for Table 1: it was presented to highlight that our proposed data selection method for proxy dataset construction (based on mutual information) is the best strategy compared to the other strategies. We clearly mention in the Table 1 caption that the reported results use only a limited amount of data for both Spleen and Prostate: “Spleen Dice scores are reported with 23% usage of full dataset for training. Prostate Dice scores are reported with 31% usage of full dataset”. Table 2, in turn, was presented to show that the entire proposed proxy-based pipeline can be used to gain higher (Prostate) or equivalent (Spleen) Dice performance, together with a speed-up for hyper-parameter estimation with AutoML-based techniques.

Specifically, the results in Table 2 indicate improvements both in faster convergence and in higher accuracy on the prostate and spleen segmentation tasks. In terms of overall development time, our work yields 4.4x and 3.3x speed-ups for the Prostate and Spleen segmentation tasks, respectively. In terms of accuracy, our proposed method improves mean Dice score by at least 3% over other methods for Prostate, and outperforms hyper-parameters found with randomly selected data by 6% or more for Prostate.

We have conducted statistical significance tests (Student’s t-test) and assessed Pearson correlation coefficients, which show that the results are statistically significant (please refer to the caption of Figure 2). Assessing the significance of individual hyper-parameters would be too computationally expensive (requiring on the order of 3,000 GPU hours to obtain all repeats for even a single method).
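
As a concrete illustration of the two checks mentioned above, a minimal sketch using SciPy (the Dice arrays are stand-ins for repeated runs, not the paper's actual numbers):

    # Hypothetical sketch: significance and correlation checks on per-run Dice
    # scores. The numbers below are placeholders, not results from the paper.
    import numpy as np
    from scipy.stats import pearsonr, ttest_ind

    proxy_dice = np.array([0.81, 0.84, 0.79, 0.86, 0.83])  # e.g. proxy-pipeline runs
    full_dice = np.array([0.80, 0.85, 0.78, 0.87, 0.82])   # e.g. full-pipeline runs

    t_stat, t_p = ttest_ind(proxy_dice, full_dice)  # Student's t-test
    r, r_p = pearsonr(proxy_dice, full_dice)        # Pearson correlation coefficient
    print(f"t = {t_stat:.3f} (p = {t_p:.3f}); r = {r:.3f} (p = {r_p:.3f})")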

We agree that quadratic computation is not ideal. However, for medical image segmentation the computational problem would only arise once datasets exceed roughly 10,000 samples. Future work will aim to bring these methods down to runtimes of O(N log N) or faster. We would also like to note that our method can be parallelized, so this one-time computational cost may have less impact even on larger datasets.
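
Since the pairwise computation is embarrassingly parallel, a minimal sketch of what such parallelization might look like, using joblib (an assumption for illustration; any worker pool would do) with normalized cross-correlation as the pair metric:

    # Hypothetical sketch: parallelizing the one-time O(N^2) pairwise computation.
    from itertools import combinations
    import numpy as np
    from joblib import Parallel, delayed

    def ncc(a: np.ndarray, b: np.ndarray) -> float:
        # Normalized cross-correlation between two equally sized volumes.
        a = (a - a.mean()) / (a.std() + 1e-8)
        b = (b - b.mean()) / (b.std() + 1e-8)
        return float((a * b).mean())

    def pairwise_matrix(volumes: list, n_jobs: int = -1) -> np.ndarray:
        n = len(volumes)
        pairs = list(combinations(range(n), 2))
        vals = Parallel(n_jobs=n_jobs)(
            delayed(ncc)(volumes[i], volumes[j]) for i, j in pairs)
        sim = np.zeros((n, n))
        for (i, j), v in zip(pairs, vals):
            sim[i, j] = sim[j, i] = v
        return sim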

Regarding the choice of hyper-parameters, we would like to highlight that the AutoML framework searches a continuous range in [0, 1]; the grid of hyper-parameters was used only to establish that the proxy data selection method works well compared to the baselines. Additionally, we would be happy to add more details to the appendix upon acceptance of the paper.
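
To illustrate what searching a continuous [0, 1] range means in practice, a minimal sketch of mapping a unit-interval search variable to a log-scale learning rate (the bounds are illustrative assumptions; the framework's actual mapping is not specified above):

    # Hypothetical sketch: map an AutoML search variable u in [0, 1] to a
    # log-uniform learning rate. Bounds are illustrative assumptions.
    import math

    def unit_to_lr(u: float, low: float = 1e-5, high: float = 1e-1) -> float:
        assert 0.0 <= u <= 1.0
        return 10 ** (math.log10(low) + u * (math.log10(high) - math.log10(low)))

    print(unit_to_lr(0.0), unit_to_lr(0.5), unit_to_lr(1.0))  # approx. 1e-05, 1e-03, 1e-01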

[2] Coleman, C., et al.: Selection via Proxy: Efficient Data Selection for Deep Learning. ICLR, 2020.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors’ rebuttal addressed my misunderstanding of the results. Table 2 shows clearly that the proposed approach can improve segmentation accuracy on the prostate dataset.

    While proxy networks have been used before, the investigation for medical image segmentation is novel to the best of my knowledge. Furthermore, reducing training time may have some practical value in the community.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    12



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The interest of the technique presented in this work is evident to all reviewers. The authors’ clarifications regarding the question of performance greatly helped in understanding the context of the work and the intended meaning of the results tables. Given this additional explanation, the manuscript appears to present results of suitable quality for MICCAI acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors propose proxy data and proxy models to accelerate hyperparameter tuning prior to training a network. These proxies are simplified versions of the total dataset and the full model.

    While the reviews are mixed, I think the authors clarified some of the confusion raised by the reviewers concerning the results section. I agree that the proposed work is novel and interesting, and while its usefulness is not absolutely clear, I think the community would benefit from seeing novel ideas at the conference that can spark conversations and encourage new ideas in turn. That (to me) is the aim of conferences. For this reason I recommend acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7


