
Authors

Adrian Galdran, Gustavo Carneiro, Miguel A. González Ballester

Abstract

Highly imbalanced datasets are ubiquitous in medical image classification problems. In such problems, it is often the case that rare classes associated with less prevalent diseases are severely under-represented in labeled databases, typically resulting in poor performance of machine learning algorithms due to overfitting in the learning process. In this paper, we propose a novel mechanism for sampling training data based on the popular MixUp regularization technique, which we refer to as Balanced-MixUp. In short, Balanced-MixUp simultaneously performs regular (i.e., instance-based) and balanced (i.e., class-based) sampling of the training data. The resulting two sets of samples are then mixed up to create a more balanced training distribution from which a neural network can effectively learn without heavily under-fitting the minority classes. We experiment with a highly imbalanced dataset of retinal images (55K samples, 5 classes) and a long-tail dataset of gastro-intestinal video frames (10K images, 23 classes), using two CNNs of varying representation capabilities. Experimental results demonstrate that applying Balanced-MixUp outperforms other conventional sampling schemes and loss functions specifically designed to deal with imbalanced data. Code to reproduce our results is released at \url{github.com/…}.
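
For readers who want a concrete picture of the sampling scheme described in the abstract, the following is a minimal PyTorch-style sketch. It is an illustrative reconstruction, not the authors' released implementation: the helper names (make_loaders, balanced_mixup_batch), the use of WeightedRandomSampler, and the default hyperparameter values are assumptions made for the sake of the example.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, WeightedRandomSampler


def make_loaders(dataset, labels, batch_size=8):
    # Instance-based sampling: each image is drawn with equal probability,
    # so class frequencies follow the (imbalanced) training distribution.
    instance_loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)

    # Class-based sampling: weight each sample by the inverse frequency of
    # its class, so every class is drawn roughly equally often.
    counts = np.bincount(labels)
    weights = 1.0 / counts[labels]
    sampler = WeightedRandomSampler(weights, num_samples=len(labels))
    balanced_loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    return instance_loader, balanced_loader


def balanced_mixup_batch(x_inst, y_inst, x_bal, y_bal, alpha=0.2, n_classes=5):
    # Draw the mixing coefficient from Beta(alpha, 1); for small alpha the
    # mixed sample stays close to the instance-sampled image.
    lam = np.random.beta(alpha, 1.0)
    x = (1 - lam) * x_inst + lam * x_bal
    y_inst_1h = torch.nn.functional.one_hot(y_inst, n_classes).float()
    y_bal_1h = torch.nn.functional.one_hot(y_bal, n_classes).float()
    y = (1 - lam) * y_inst_1h + lam * y_bal_1h
    return x, y
```

In a training loop, one would iterate over the instance-based and class-balanced loaders in parallel (e.g., zipping the instance loader with a cycling balanced loader) and train on the mixed images with a soft-label cross-entropy loss.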

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_31

SharedIt: https://rdcu.be/cyl55

Link to the code repository

https://github.com/agaldran/balanced_mixup

Link to the dataset(s)

https://www.kaggle.com/c/diabetic-retinopathy-detection

https://osf.io/mh9sj/


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper addresses a very important problem in the medical domain, namely data skew. The authors propose a Balanced Mix-up sampling technique that combines regular instance-based sampling with class-based sampling, and they evaluate this sampling strategy on two different classification tasks.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The approach is good, and it is a rather smart technique built on a modification of existing methods.
    2. The implementation is also clear and well explained.
    3. While the paper does not offer a highly novel solution to this very important technical issue in the medical domain, the proposed solution appears to be effective.
    4. The authors compare against various other methods to show the effectiveness of their approach.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. In some places, sentences are not clear, e.g., in Section 2.3; these should be elaborated further.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors provide a good explanation of the method and are also willing to release their code in a repository. The paper seems to be reproducible.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. At some points, the authors should give more explanation, e.g., what is the significance of the selected values of alpha in Section 2.1, and how were the three values 0.1, 0.2 and 0.3 obtained?
    2. We see that the proposed method outperforms the existing sampling methods, but the performance is only slightly better than instance-based sampling. Is this difference significant in the presented classification scenarios, where sufficient test samples are available?
    3. The authors should also report performance measures without using any data augmentation method.
    4. Please follow a common format for references.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A good problem to attempt and a smart solution to the problem statement, but it lacks technical novelty.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    2

  • Reviewer confidence

    Somewhat confident



Review #2

  • Please describe the contribution of the paper

    This work proposes Balanced-MixUp to address imbalanced medical image classification. The main idea is to combine MixUp regularization and sampling strategies in a unified framework. Experimental results for imbalanced diabetic retinopathy grading and gastrointestinal classification demonstrate its effectiveness.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    1) This paper is well-written and easy to understand. 2) The proposed method is easy to implement. 3) Quantitative experiments well validate the core idea.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    1) The novelty is rather limited, since combining MixUp with a two-branch resampling strategy has already been proposed in BBN [1], and the proposed method is a special case of BBN (if the mixup strategy is conducted in the image space). Although the proposed method works for medical image classification, BBN should be discussed in the related work and included in the comparison experiments.

    Reference: [1] BBN: Bilateral-Branch Network with Cumulative Learning for Long-Tailed Visual Recognition, CVPR 2020.

    2) The performance improvement is limited compared with the baseline (instance-based sampling), even though no re-balancing strategy is used in the baseline. Moreover, the proposed method introduces a hyper-parameter (\alpha) to which the results appear somewhat sensitive. 3) In Tables 1 and 3, the results of MixUp alone should be listed, which would help quantify whether the performance improvement of the proposed method comes from regularization or from rebalancing.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    1) The proposed method is validated on two public datasets, i.e., the Eyepacs database and the Hyper-Kvasir dataset. 2) The authors have promised to release their source code if the paper is accepted. 3) The implementation details are sufficient to reproduce the results.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    1) The differences between the proposed method and BBN should be carefully discussed. 2) Considering that most of the comparison methods fail to outperform the baseline, finding out the reasons for this phenomenon would be meaningful. 3) An ablation analysis that removes the class-balanced resampling (i.e., uses only instance sampling and the mixup strategy) is necessary to quantify the contributions of regularization versus rebalancing.

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The novelty of the proposed method is rather limited, and it appears to be a special case of the previously proposed BBN (Zhou et al.).

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper presents a novel data augmentation scheme, Balanced-MixUp, to overcome imbalance-related poor performance in medical image classification problems. The authors build their technique upon an existing data augmentation method, mixup, which generates new training samples as convex combinations of two data samples and their corresponding labels. This paper proposes to sample the two data points using both instance-based and class-based sampling procedures, instead of the random sampling used in the original mixup algorithm. Balanced-MixUp was shown to improve performance in the majority of cases in both diabetic retinopathy grading and gastrointestinal image classification tasks across different datasets.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A novel formulation for data augmentation
    • Well-written paper
    • High motivation: the addressed problem is important and frequently occurring in medical imaging applications
    • Benchmarking with different methods on different tasks and on different datasets
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Lack of theoretical justification
    • Marginal improvement in most of the cases
    • Missing -important- baseline (comparison to mixup)
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors claim in the abstract that their code to reproduce the results will be made available. In addition, they also provide enough detail about the training parametrization in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This paper presents a very interesting approach to overcome imbalanced data problem and overall, I am positive about its acceptance. However, there are a few concerns that I want to bring up as follows:

    • The authors claim in Sec. 2.3 that combining instance-based and class-based sampling will “induce a more balanced distribution of training examples by creating synthetic data points around regions of space where minority classes provide less data density”. How do the authors support this? This may be my lack of understanding, but the claim is not obvious. Is there any theoretical justification for it?
    • Why did the authors deviate from the original mixup formulation and use Beta(\alpha, 1)? Is this an empirical finding?
    • There is an important missing experiment: the authors did not compare their method to the mixup technique, which should be the very first comparison shown in the paper. Since Balanced-MixUp is directly derived from mixup, the benefits of the new method should first be shown over mixup before other techniques. I am personally curious to see how the mixup method would perform on the same tasks.
    • It would be interesting to see what the newly generated samples look like, together with the corresponding mixed training samples and \lambda values.
    • The authors mention the higher performance of some published works ([11, 29]) on the Eyepacs set. Could this approach be combined with them to improve those works’ performance even further?
    • Are there any limitations of the method, or cases where it led to worse performance? It would be nice if the authors could comment on this.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe the paper addresses an important problem, proposes an interesting solution and shows improved performances. However, I believe the claims made could be supported better and a few additional experiments are needed (see my comments above).

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper addresses the issue of data imbalance by proposing a data augmentation scheme. The paper addresses a very important topic for the MICCAI audience and shows experiments on two independent datasets. However, the reviewers have raised several questions regarding theoretical justification, the comparison to the baseline, the clarity of the explanation relative to BBN, and novelty. Please respond to these in your rebuttal.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We thank the reviewers for finding our work interesting and of potential impact for the MICCAI community. Please find below our responses to their comments.

1) Reviewers 2 and 3 stress the need to include the performance of MixUp alone as a baseline in our experiments. We agree on the relevance of such a comparison, and we now report that analysis. Our findings indicate that MixUp regularization alone brings a performance improvement over standard instance-based sampling only in the endoscopic image analysis experiment with the MobileNet architecture. In this case, its performance is better than most of the other techniques, but MixUp still underperforms our proposed approach. In all other experiments (endoscopic with ResNeXt50 and retinal imaging with both architectures), MixUp alone leads to a decrease in performance. These results reinforce the conclusion that Balanced MixUp is indeed an effective extension of MixUp for the imbalanced classification scenario. Our tables have been updated accordingly.

2) Reviewer 2 finds some similarities between our technique and a CVPR 2020 paper (BBN), and suggests a discussion of the differences between the two methods. After careful analysis of the BBN technique, we agree that both approaches share some common patterns, but we respectfully disagree with R2’s comment that our technique is a special case of BBN. The main difference between BBN and our method lies in the point at which data points are combined: unlike our approach, which mixes data in the input image space, BBN mixes data in the feature space, which brings it closer to techniques like SMOTE or Manifold MixUp. Another relevant difference is that BBN uses a Reversed Sampler that draws minority examples much more frequently than our class-balanced sampling does, whereas our milder sampling helps prevent overfitting to the minority classes. Last, BBN has an extra layer of complexity due to its Cumulative Learning module, which requires knowing beforehand the number of epochs for which the model will be trained and introduces an extra hyperparameter in the “Adaptor layer”. In short, we believe that our Balanced MixUp is a more adequate technique for imbalanced medical data scenarios. We will add part of this discussion to our literature review.

3) Reviewers 1 and 3 mention lack of clarity in some of our explanations. In particular, both reviewers wonder why we selected Beta(\alpha, 1) with \alpha=0.1, 0.2, 0.3 as our mixing distribution. First, we apologize for a typo in eq. 3 which may have led to some confusion: \lambda and 1-\lambda should be swapped here. Second, the rationale behind our choice is that we intend to avoid the excessive sampling of minority class examples, which can easily lead to overfitting. By formulating our method like this, as \alpha tends to 0 we recover the standard instance-based sampling, whereas as \alpha increases we add more minority-class samples to the mix, which results in a hyperparameter that behaves more intuitively for the user.
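
As an illustration of this rationale (not part of the paper or rebuttal), the short snippet below samples \lambda from Beta(\alpha, 1) for the three values considered and shows that the mixing weight given to the class-balanced sample stays small on average, so minority examples are blended in gently rather than aggressively oversampled.

```python
import numpy as np

rng = np.random.default_rng(0)
for alpha in (0.1, 0.2, 0.3):
    lam = rng.beta(alpha, 1.0, size=100_000)
    # Beta(alpha, 1) has mean alpha / (alpha + 1), so for these alphas the
    # class-balanced sample typically receives well under a quarter of the weight.
    print(f"alpha={alpha}: mean lambda = {lam.mean():.3f}, "
          f"P(lambda > 0.5) ~ {(lam > 0.5).mean():.3f}")
```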

4) Reviewer 3 asks why our approach would “induce a more balanced distribution of training examples by creating synthetic data points around regions of space where minority classes provide less data density”. We admit a lack of theoretical analysis on this point, which is beyond the scope of this work and will be considered in the future. To compensate, we included Fig. 2 to give an intuition of why this is the case. In this figure, the right-hand subplot schematically shows the impact of Balanced MixUp on the examples the model observes, where convex combinations of minority and majority class examples result in a less sparse data space. This is to be compared with regular instance-based sampling (leftmost subplot) and conventional oversampling (center subplot), whose sampling patterns contribute nothing to populating the data manifold.

We would like to thank the AC for the constructive criticism, and we hope our response clarifies the most relevant doubts highlighted by the reviewers.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal seems to address most of the concerns of the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Overall, the reviews are positive and consistent. The authors’ response clarifies some details, especially regarding the comparison with the baseline and with BBN, which convinces me to recognize the novelty of the work. In general, I agree to accept this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The proposed solution is an improvement upon existing MixUp augmentation strategies. The presentation is solid and clear, and the answers to the reviewers’ points further clarify the originality of the work.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6


