
# Authors

Yangwen Hu, Zhehao Zhong, Ruixuan Wang, Hongmei Liu, Zhijun Tan, Wei-Shi Zheng

# Abstract

Successful application of deep learning often depends on large amounts of training data. However, in practical medical image analysis, available training data are often limited, which frequently causes over-fitting during model training. In this paper, a novel data augmentation method is proposed to effectively alleviate the over-fitting issue, not in the input space but in the logit space. This is achieved by perturbing the logit vector of each training sample within a neighborhood of that vector in the logit space, where the size of the neighborhood is estimated automatically and adaptively for each training sample over the training stages. The augmentations in the logit space may implicitly represent various transformations or augmentations in the input space, and can therefore help train a more generalizable classifier. Extensive evaluations on three small medical image datasets and multiple classifier backbones consistently support the effectiveness of the proposed method.
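
As a rough illustration of the idea in the abstract, the sketch below perturbs one logit vector with Gaussian noise and averages the cross-entropy over the perturbed copies. The Gaussian sampler, the fixed `sigma`, the sample count `k`, and all function names are assumptions made for illustration; the paper estimates the neighborhood size adaptively for each sample, which is not reproduced here.

```python
import math
import random


def cross_entropy(logits, label):
    """Cross-entropy of one logit vector, computed via a stable log-sum-exp."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def logit_augmented_loss(logits, label, sigma, k, rng):
    """Average cross-entropy over k Gaussian-perturbed copies of the logits.

    sigma is a hypothetical fixed neighborhood size (assumption); the paper
    adapts it per training sample during training.
    """
    total = 0.0
    for _ in range(k):
        noisy = [z + rng.gauss(0.0, sigma) for z in logits]
        total += cross_entropy(noisy, label)
    return total / k


rng = random.Random(0)          # seeded for repeatability
logits = [2.0, 0.5, -1.0]       # illustrative logit vector for 3 classes
clean_loss = cross_entropy(logits, 0)
augmented_loss = logit_augmented_loss(logits, 0, sigma=0.5, k=10, rng=rng)
```

With `sigma=0` the augmented loss reduces to the ordinary cross-entropy, so the perturbation acts purely as an additional training signal around each sample's logit vector.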

SharedIt: https://rdcu.be/cyl6l

# Reviews

### Review #1

• Please describe the contribution of the paper

Data augmentation is one of the most important recipes for stable neural network training. In contrast to traditional approaches that augment in the input or feature space, the authors propose an augmentation technique in the logit space. Logit-space augmentation has two advantages: 1) like the input space, and unlike the feature space, the logit space is interpretable; 2) like the feature space, and unlike the input space, it does not require predefining the type of augmentation.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The method is methodologically simple and easy to implement with negligible overhead, yet the experimental results show consistent performance benefits. The supporting studies that explain the benefit of the proposed method, such as the uncertainty trend during training and the comparison of different samplers, are very helpful for understanding why the method works well in general.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

All the experiments were based on cross-validation, so the mean and standard deviation need to be added to the paper, at least in Tables 1 and 2. An explanation of the trend of the standard deviation also needs to be added if it is unexpectedly large.

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors checked ‘NO’ for the question about the description of results in terms of mean/std. Cross-validation mean/std needs to be added.

The authors checked ‘NO’ for the question about the release of the run script to reproduce the experimental results. The proposed method is very simple and easy to implement, and the authors checked ‘YES’ for the question about release of the implementation code, so I think releasing the run script would not be an additional burden. This would help follow-up researchers.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The mean and standard deviation are important information for estimating how the proposed method behaves in general experimental settings, especially in terms of stability and consistency.
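
The requested statistics are cheap to compute from per-fold results. A minimal sketch, assuming five-fold cross-validation; the accuracy values are hypothetical placeholders, not results from the paper:

```python
import statistics

# Hypothetical per-fold accuracies (placeholders, NOT taken from the paper).
fold_accuracies = [0.80, 0.82, 0.78, 0.81, 0.79]

mean_acc = statistics.mean(fold_accuracies)
# Sample standard deviation (n - 1 in the denominator) across folds.
std_acc = statistics.stdev(fold_accuracies)

summary = f"{mean_acc:.3f} +/- {std_acc:.3f}"
```

Reporting each table cell in this `mean +/- std` form would make the stability of each method across folds directly comparable.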

The trends in the experimental results need to be discussed in the paper. Here is one example:

Researchers generally expect ResNet-50 to outperform VGG16, but in Table 2:

• VGG16 is better than ResNet-50 in Basic but worse in Cutout in Xray6
• EfficientNet is better than ResNet-50 in Xray6 but worse in Skin40 & Skin8
• VGG16 is better than ResNet-50 in Xray6, worse in Skin40, and similar in Skin8

At least the authors’ subjective interpretation would be helpful for understanding these trends.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The method is very simple and easy to implement with negligible overhead. It showed consistent performance improvements through various experiments with different neural network architectures. Several ablation studies are very helpful to understand why the proposed method works well. Addition of information such as mean/std and analyses of the results is needed for the quality of this paper.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

4

• Reviewer confidence

Very confident

### Review #2

• Please describe the contribution of the paper

This paper proposes a novel data augmentation method in the logit space. To resolve the challenge of selecting the neighborhood size, the authors propose uncertainty estimation to adapt the neighborhood size.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
1. The idea of data augmentation in the logit space is quite novel and appealing.
2. Experimental results on three datasets with limited data size demonstrate the effectiveness of the logit-space augmentation.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. In Table 1, most of the other compared data augmentation methods show results very similar to the basic augmentation method, which raises concerns about the fairness of the comparison.
2. Since the proposed data augmentation operates in the logit space, it is quite different from previously proposed methods. It would be interesting to explore whether this logit-space augmentation is complementary to the compared ones, e.g., Mixup.
• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The description and mathematical formulation are clear. Releasing the code is suggested to facilitate reproducing the results of this paper.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The paper motivates the method for, and evaluates it on, datasets of limited size, but the idea should also apply to datasets with many samples. The authors are encouraged to evaluate the proposed method on large datasets.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper proposes a novel data augmentation strategy in the logit space. The current experiments demonstrate its superiority on datasets of limited size.

• What is the ranking of this paper in your review stack?

2

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

This paper proposes a novel data augmentation approach by perturbation in the logit space, resulting in a more generalizable classifier. Three small skin and chest X-ray datasets are used for validating the proposed augmentation method.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The paper is clearly written.
• Novel data augmentation through perturbation in the logit space.
• Substantial experiments with the detailed reporting on the choice and effect of hyper-parameters (random sampler, sampling number, uncertainty estimate).
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• Experiments are reported only on balanced datasets. It would be more convincing to see the performance of the proposed augmentation on imbalanced datasets.
• Statistical analysis needs to be reported. Since the authors performed cross-validation, standard deviations should also be reported for the models.
• Missing description of computing infrastructure and execution speed.
• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Results could be reproduced.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• It would be nice to see class-wise performance comparisons.
• “Note that the basic data augmentation was used in all the baselines and our method by default.” How does it impact the proposed method? An experiment would certainly help.
• It’s hard to establish the superiority of the adaptive $\sigma$; it has a marginal effect.
• In Table 2, more recent methods could be used for comparison.
• Could you vary K for different disease classes? It would be interesting to use different K for different classes, given an imbalanced dataset.
• “The fine-grained difference between different diseases in the medical image classification tasks might cause the failure of these augmentation techniques.” Why is this not an issue for the proposed method?
• It would be valuable if the authors included the limitations of the proposed method. When would it fail?

borderline reject (5)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Some implementation details (infrastructure, time, environment) as well as clinical/statistical significance are missing. Another issue is that the authors propose a data augmentation technique based on logit-space perturbation, but it is not clear why it still requires the basic augmentations.

• What is the ranking of this paper in your review stack?

3

• Number of papers in your stack

5

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All reviewers were in agreement on the novelty of the proposed method. The simplicity of the method, the ablation studies, the improvements in accuracy over other data augmentation techniques, and the clear writing were also seen as strengths. The main weakness was the lack of multiple-run experiments and standard deviations on accuracies. There was also some concern over why recent data augmentation techniques, e.g. Cutout and Mixup, did not outperform the basic augmentation. However, considering the novelty of the method, the strengths outweigh the weaknesses in my opinion. Furthermore, the concerns should hopefully be addressable by the authors.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1
