
Authors

Yuhao Huang, Xin Yang, Yuxin Zou, Chaoyu Chen, Jian Wang, Haoran Dou, Nishant Ravikumar, Alejandro F. Frangi, Jianqiao Zhou, Dong Ni

Abstract

Nodule segmentation from breast ultrasound images is challenging yet essential for diagnosis. Weakly-supervised segmentation (WSS) can help reduce time-consuming and cumbersome manual annotation. Unlike existing weakly-supervised approaches, in this study we propose a novel and general WSS framework called Flip Learning, which requires only bounding-box annotations. Specifically, the target within the labeled box is gradually erased to flip the classification tag, and the erased region is finally taken as the segmentation result. Our contribution is three-fold. First, our proposed approach erases at the superpixel level using a Multi-agent Reinforcement Learning framework to exploit prior boundary knowledge and accelerate the learning process. Second, we design two rewards, a classification score reward and an intensity distribution reward, to avoid under- and over-segmentation, respectively. Third, we adopt a coarse-to-fine learning strategy to reduce residual errors and improve segmentation performance. Extensively validated on a large dataset, our proposed approach achieves competitive performance and shows great potential to narrow the gap between fully-supervised and weakly-supervised learning.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_47

SharedIt: https://rdcu.be/cyhMq

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors propose a novel and general Flip Learning framework for WSS based on bounding boxes (BBox). According to the authors, the contribution is three-fold. First, the erasing process via Multi-agent Reinforcement Learning (MARL) is based on superpixels, which captures prior boundary information and improves learning efficiency. Second, two rewards are designed to guide the agents accurately: the classification score reward (CSR) pushes the agents’ erasing toward label flipping, while the intensity distribution reward (IDR) limits over-segmentation. Third, a coarse-to-fine (C2F) strategy simplifies the agents’ learning, reducing residuals and improving segmentation performance. Validation experiments demonstrated that the proposed Flip Learning framework achieves high accuracy in the nodule segmentation task.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The problem is very relevant to the scientific community, and the experimental framework seems robust.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The dataset used in the experiments is not public, so reproducibility is not straightforward.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Private dataset, with no comment about making it freely and publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The authors should revise Table 1 and the comments about it. It is confusing to include the U-Net results in the same table and claim that their approach is better (there is a contradiction in Table 1).

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed method is not clearly explained, even though the experimental framework seems to be carefully designed. The article is not easy to follow and read.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper proposed a weakly supervised algorithm for image segmentation using classification and bounding box annotations. An image classifier was trained first. Then a segmentation agent was trained to flip the classifier’s prediction by filling a region with background. The segmentation agent was trained with reinforcement learning, where the action was to fill a superpixel and the reward was the flip of the prediction. The authors further improved performance by using two agents to fill superpixels simultaneously, penalizing intensity distribution change, and adding a stage with finer superpixels. In experiments on 2D US images, the method outperformed other weakly supervised methods and matched fully supervised methods.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The flip learning with reinforcement learning (RL) idea is innovative.

    • One of the common problems in RL is the definition of the reward, as an agent might learn to cheat. Using a trained classifier and the intensity distribution difference, the proposed rewards are meaningful and simple.

    The results are strong.

    • Without pixel-level annotations, the proposed method reached similar performance as a fully supervised segmentation network.
    • The proposed method largely outperformed other weakly supervised methods.

    The clinical value is great.

    • This method only requires class and bounding box annotations, instead of precise mask labels. It largely reduced annotation cost.
    • This method is not specific to a particular image type, thus potentially useful for other modalities.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    It is not clear if the evaluation is on a separate subset. This would impact the entire paper.

    • The trained image classifier should use the same training data as the segmentation agent.
    • The evaluation of the segmentation agent should be on a separate dataset containing no training data. Although reinforcement learning often trains and tests on the same environment, for segmentation the evaluation should not be on the training set. If the evaluation is on the training set, the reported results may not be credible.

    Segmentation on unseen data is less efficient compared to end-to-end methods.

    • First, it requires binary label and bounding box annotation.
    • Second, the prediction is performed via reinforcement learning, therefore the agent has to go through superpixels sequentially.

    The prediction is limited by the quality of superpixels.

    • Superpixels may have coarse boundaries, making the predicted mask unnatural.

    The proposed background calculation is limited, as mentioned in the conclusion.

    • Linearly mixing neighbor patches may not be meaningful.
  • Please rate the clarity and organization of this paper

    Satisfactory

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
    • The data will not be accessible.
    • Models were not well explained. The values of the hyper-parameters N and beta for the reinforcement learning terminal signals were not given. The architectures of the image classifier and the segmentation agent were not given.
    • The evaluation process was not well explained. It is unclear whether the authors evaluated on a separate test set, although the authors claimed in the checklist to provide the details of the train/validation/test splits. The architecture of the fully supervised U-Net was not given.
    • Code would be released according to the checklist.
  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    On page 4 (agents), it would be better to further explain the multi-agent training, for instance:

    • Did the two agents share the same network?
    • How were the superpixels assigned to the agents: randomly, or left and right?
    • How did the agents traverse the superpixels: randomly, or in a heuristic order? The order might affect the final predicted shape.

    On page 6, section 3, it would be clearer if the authors briefly stated what types of labels the other weakly supervised methods used.

    • If other methods did not use the bounding box annotations, then it is expected that these methods performed worse.
    • If other methods did use the bounding box annotations, then this makes the results stronger.

    Minor:

    • On page 3, the definitions of w_f and w_g were not explained; given a bounding box, how are these two values defined?
    • On page 6, section 3, please cite the PyTorch white paper: https://arxiv.org/abs/1912.01703.
    • On page 7, Table 1, the proposed method was not better than U-Net (the supervised baseline), although it was marked in blue. It would be better to clarify in the caption that it was the best among the weakly supervised methods and that it matched the U-Net.
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The proposed flip learning idea with multi-agent reinforcement learning is innovative and simple. However, important details such as dataset split are missing, reducing the credibility of the results significantly.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper proposes a novel and general Flip Learning framework for weakly-supervised segmentation based on BBox annotations. The proposed erasing process via Multi-agent Reinforcement Learning is based on superpixels, capturing prior boundary information and improving learning efficiency. Two rewards are designed to guide the agents accurately: the classification score reward pushes the agents’ erasing toward label flipping, and the intensity distribution reward limits over-segmentation.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper is well-written and easy to follow.
    2. The proposed method is novel and interesting.
    3. The experimental results seem to be convincing.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The whole algorithm is too complex to be entirely described within an 8-page paper, so it would be better to provide more details, for example by making the source code available after acceptance.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    No source code is provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Please see “the main weaknesses”.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Overall, it is a good paper and my rating is “accept”. My only concern is that the implementation details are not perfectly elaborated within an 8-page paper.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Two reviewers recommend acceptance of this paper and the third recommends rejection. While all reviewers agree on the novelty and interest of the technical contribution, they all express serious doubts about the experimental setup. In particular, reviewers are concerned because the dataset on which the experiments are conducted is not public, and hence the experimental framework and the results cannot be independently verified. The rebuttal should address these points thoroughly and should also clarify the description of the model (R1, R2).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3




Author Feedback

We provide explanations to address the main concerns of the reviewers and will improve the writing as suggested.

Q1. Dataset release. (R2, R3) We have discussed releasing the dataset with our cooperating hospitals. With their support, we have applied for IRB approval. After obtaining it, we will release the dataset as the first open-access breast ultrasound MICCAI Challenge.

Q2. Code release. (R3, R5) After acceptance, we will release the source code at (anonymized) https://github.com/miccai-1545/flip-learning. We have uploaded demos, testing software and images to this repository. We believe that this can help improve the reproducibility of the paper.

Q3. Dataset division. (R2, R3) We apologize for the missing description of the dataset split. For the experiments, we kept the dataset division consistent across the classification and segmentation tasks. The dataset was split into 1278, 100 and 345 images for training, validation and independent testing at the patient level, with no overlap. The classifier and the agents used the same training, validation and testing sets.

Q4. Models and the details of agents. (R2, R3, R5) We provide more details for a clearer explanation. First, the architecture of both the classifier and the agents is ResNet18. The two agents share the parameters of the convolution layers for knowledge sharing, and have independent fully connected layers for decision-making. Second, all superpixels are indexed from 1 to S by the OpenCV function, from top to bottom and left to right. Both agents start from the center superpixel with index S/2; one agent traverses the superpixels from S/2 to S, and the other traverses in reverse from S/2 to 1. The traversal order has limited impact on our system since the agents take the whole BBox region as the environment; the order we adopted is preferred for simplicity. Third, for the agents’ terminal signals, we set N and beta to 2 and 0.05, respectively.
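
The following is a minimal sketch (an editorial illustration, not the paper’s released code) of the agent design described above: a shared ResNet18 backbone with two independent fully connected heads, plus the center-outwards traversal order over S superpixels. The action-space size, the 0-based re-indexing and the superpixel count are illustrative assumptions.

    import torch.nn as nn
    from torchvision.models import resnet18

    class TwoAgentNet(nn.Module):
        def __init__(self, num_actions=2):            # action-space size is an assumption
            super().__init__()
            backbone = resnet18()
            backbone.fc = nn.Identity()                # shared convolutional layers
            self.backbone = backbone
            self.head_a = nn.Linear(512, num_actions)  # agent A: independent FC head
            self.head_b = nn.Linear(512, num_actions)  # agent B: independent FC head

        def forward(self, x):
            feat = self.backbone(x)                    # shared features for both agents
            return self.head_a(feat), self.head_b(feat)

    # Traversal order (0-based re-indexing of the 1..S scheme above): both agents
    # start at the center superpixel; one walks forward, the other backward.
    S = 200                                            # illustrative superpixel count
    order_a = list(range(S // 2, S))                   # center -> last superpixel
    order_b = list(range(S // 2, -1, -1))              # center -> first superpixel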

Q5. Comparison experiments / Table 1. (R2, R3) In Table 1, we compared both a fully-supervised method (U-Net, with the architecture given in [10]) and weakly-supervised segmentation (WSS) methods. The U-Net result is included as a reference to compare the two types of methods intuitively. We will change the caption of Table 1 to “The best results of WSS methods are shown in blue” to remove any confusion. Besides, all the other compared WSS methods use the same BBox annotations as labels as our method does, thus we consider the comparisons fair.

Q6. Model efficiency. (R3) We would like to clarify that our method only needs BBox annotations, which indicate nodule existence. By using an object detection model in future work, manual BBox annotation can be replaced by automatic localization to further improve efficiency. We agree that the RL-based method is not as efficient as end-to-end ones. In this regard, we introduced the superpixel and multi-agent strategies, which accelerate inference by more than ten times compared with pixel-level and single-agent approaches. We will focus on simplifying RL-based segmentation in the future.

Q7. Quality and assignment of superpixels. (R3) To generate high-quality superpixels, we tried several advanced methods, including SEEDS, SLIC, MSLIC, SLICO and LSC. For our task, SEEDS performed best. Besides, we also adopted a coarse-to-fine strategy to generate fine superpixels, making the predicted segmentation as accurate as possible. Approximately half of the superpixels were assigned to each agent based on the center index (see Q4 for details).
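
As a minimal sketch (assumed settings, not the authors’ configuration), SEEDS superpixels for a BBox crop can be generated with OpenCV’s ximgproc module (opencv-contrib); the superpixel counts for the coarse and fine stages below are illustrative only.

    import cv2

    def seeds_superpixels(crop_bgr, num_superpixels, num_levels=4, iterations=10):
        h, w, c = crop_bgr.shape
        seeds = cv2.ximgproc.createSuperpixelSEEDS(w, h, c, num_superpixels, num_levels)
        seeds.iterate(crop_bgr, iterations)        # refine superpixel boundaries
        return seeds.getLabels(), seeds.getNumberOfSuperpixels()

    # Coarse-to-fine (illustrative counts): a coarse pass, then a finer pass.
    # coarse_labels, s_coarse = seeds_superpixels(bbox_crop, num_superpixels=100)
    # fine_labels, s_fine = seeds_superpixels(bbox_crop, num_superpixels=400)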

Q8. Background calculation and parameter definition. (R3) The background calculation method we proposed is tractable, simple and efficient for our task. Given a BBox, both w_f and w_g are defined to be equal to the width of the BBox. In the future, we will adopt advanced methods (e.g., GANs) for better and more general background generation.
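
One possible reading of “linearly mixing neighbor patches” is sketched below (an illustration only, not the paper’s exact procedure): the BBox region of a grayscale image is filled by horizontally blending the patches of the same width immediately to its left and right, assuming the box lies at least one box-width from the image border.

    import numpy as np

    def mix_background(image, x0, y0, w, h):
        left = image[y0:y0 + h, x0 - w:x0].astype(np.float32)           # left neighbor patch
        right = image[y0:y0 + h, x0 + w:x0 + 2 * w].astype(np.float32)  # right neighbor patch
        alpha = np.linspace(0.0, 1.0, w)[None, :]                       # horizontal blend weights
        return ((1.0 - alpha) * left + alpha * right).astype(image.dtype)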




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal adequately addresses all concerns pointed out by the reviewers. The availability of public code and data significantly increases the value of this work for the MICCAI community. The final version should include the comments made by the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    8



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The authors have thoroughly responded to the reviewers’ comments. Specifically, they have clarified the data split used in the experiments, which was unclear and potentially conclusion-changing. In addition, the authors have made a considerable effort detailing how their method will be reproducible.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper has good novelty, acknowledged by all reviewers, but drew some complaints about its reproducibility. The statement in the rebuttal on publicly releasing all code, models and data should largely resolve the reproducibility issue.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    11


