
# Authors

Juan Wang, Bin Xia

# Abstract

This paper presents a weakly supervised image segmentation method that adopts tight bounding box annotations. It proposes generalized multiple instance learning (MIL) and smooth maximum approximation to integrate the bounding box tightness prior into the deep neural network in an end-to-end manner. In generalized MIL, positive bags are defined by parallel crossing lines with a set of different angles, and negative bags are defined as individual pixels outside of any bounding boxes. Two variants of smooth maximum approximation, i.e., the $\alpha$-softmax function and the $\alpha$-quasimax function, are exploited to overcome the numerical instability introduced by the maximum function in bag prediction. The proposed approach was evaluated on two public medical datasets using the Dice coefficient. The results demonstrate that it outperforms the state-of-the-art methods. The code is available at https://github.com/wangjuan313/wsis-boundingbox.
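As a rough illustration of the two smooth maximum approximations named in the abstract, the sketch below uses the commonly cited definitions: $\alpha$-softmax as an exponentially weighted average, and $\alpha$-quasimax as log-sum-exp minus $\log(n)/\alpha$. These formulas and the function names `alpha_softmax` and `alpha_quasimax` are assumptions made for illustration; the paper's exact formulation and the repository code may differ.

```python
import math


def alpha_softmax(values, alpha):
    """Exponentially weighted average of `values`; approaches max(values) as alpha grows."""
    m = max(values)
    # Subtracting the maximum before exponentiating avoids overflow.
    weights = [math.exp(alpha * (v - m)) for v in values]
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def alpha_quasimax(values, alpha):
    """Log-sum-exp scaled by 1/alpha, minus log(n)/alpha; never exceeds max(values)."""
    m = max(values)
    log_sum_exp = m + math.log(sum(math.exp(alpha * (v - m)) for v in values)) / alpha
    return log_sum_exp - math.log(len(values)) / alpha


# Hypothetical pixel probabilities along one positive bag (one crossing line).
bag = [0.1, 0.9, 0.3]
for alpha in (4, 6, 8):
    print(alpha, alpha_softmax(bag, alpha), alpha_quasimax(bag, alpha))
```

Both functions are differentiable in every element of the bag, so every pixel on a line receives a gradient, whereas a hard maximum passes the gradient to a single pixel only. Under these definitions both values stay at or below the true maximum and move toward it as $\alpha$ increases.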

SharedIt: https://rdcu.be/cyl20


# Reviews

### Review #1

• Please describe the contribution of the paper

This paper proposes a weakly supervised image segmentation method based on bounding box tightness priors. The ideas are novel, and the results are promising.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The idea of the weakly supervised method is novel. It regards the pixels in positive bags with the maximum output values as foreground and potentially creates a pseudo label for the network to learn.
• The results are excellent compared to the baseline methods.
• The method would contribute to medical image segmentation applications with only bounding box annotations.
• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• No qualitative results.
• The results without smooth maximum approximation are not presented.
• As only pixels with maximum values are regarded as positive in each bag, I’m concerned that the method is not suitable for instances with low-contrast boundaries or irregular shapes.
• Medical background is not reviewed.
• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The code is hidden, so reproducibility cannot be assessed.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

This paper proposes a weakly supervised image segmentation method based on bounding box tightness priors. The ideas are novel, and the results are promising. The paper is clearly written. Some comments are listed below:

1. It would be better if the authors show some qualitative results, such as the final segmentation masks projected on original images.
2. Adding results without smooth maximum approximation would be better.
3. The authors should review the medical background.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The methods are novel.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #2

• Please describe the contribution of the paper

The paper presents a weakly supervised image segmentation method that learns a deep network from tight bounding-box annotations. The main contribution of this work is a generalized MIL framework that uses oriented line segments to form positive and negative bags, a focal loss for the bags, and a smooth maximum approximation for each bag during learning. The method is tested on two benchmarks with comparisons to other box-based weakly-annotated methods.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
• The proposed generalized MIL framework is well-motivated and the idea of using the bags based on oriented lines is interesting and novel.

• The paper is well-written and easy to follow.

• The experimental results are strong on two benchmarks, outperforming the prior arts by large margins. It also provides a detailed ablative study to show the benefits of its two novel components.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
• The overall novelty is limited. The proposed strategy is an improved version of the MIL framework, and the adopted components are integrated from previous works.

• Selection of hyper-parameters. It is unclear how the hyper-parameters are chosen in this method. In particular, from Table 2, it seems the value of \alpha can affect the final performance significantly, e.g., on the ATLAS dataset. What \alpha do you choose in the final comparison, and by what means?

• Please rate the clarity and organization of this paper

Very Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper provides detailed model settings in training and test. That said, some settings of hyperparameters (see above) are missing. The submission does not provide code.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

See item 4 above for details.

accept (8)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

Interesting MIL formulation and strong performance.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Confident but not absolutely certain

### Review #3

• Please describe the contribution of the paper

The paper presents a method for weakly supervised segmentation based on tight bounding box annotations. Unlike previous approaches constraining each vertical or horizontal line in the box to contain at least one foreground pixel, the proposed method considers crossing lines at any angle. Based on the MIL approach of Hsu et al., this method uses two smooth functions (alpha-softmax and alpha-quasimax) to approximate the max probability over each line segment. An experimental validation is performed on two benchmark datasets for prostate segmentation and brain lesion segmentation.

• Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

1) Novel idea of applying an MIL-based loss on lines at different angles. Another contribution with respect to Hsu et al.’s approach, the approximation of the max operation with a smooth function, also leads to better results.

2) Large improvements compared to two recent baselines, DeepCut and the Global constraint method of Kervadec et al., for both test datasets.

• Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

1) Despite some novel elements, technical contributions remain limited. Using angled lines is a straightforward modification of the standard technique and a soft approximation of the max function is used in several existing architectures.

2) The experimental setting, where tight bounding boxes are provided for each 2D slice of a 3D volume is not realistic and challenging, in particular for oval-shaped organs like the prostate.

3) Although it leads to higher accuracy, the experiments do not really explain the advantage of using lines at specific angles. This is important since using lines that are neither horizontal nor vertical is one of the main contributions of the paper.

4) The writing can be improved.

• Please rate the clarity and organization of this paper

Good

• Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The main implementation details are given in the paper; however, the code is not provided by the authors.

• Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
• p.1: “has been made great progress” –> “has made great progress”

• p.1: “great interests have been made in” –> “there has been a great interest in”

• p.2: Explain clearly the advantage of having lines at different angles compared to having only horizontal and vertical lines.

• Section 2.1: as far as I understand, in the segmentation tasks of the experiments, each voxel is mapped to a single class label. Therefore, there is no overlap of regions and the network output should use a softmax instead of a sigmoid.

• p.3: “any crossing line in the bounding box has at least one pixel belonging to the object in the box”. Technically, the crossing line should also touch two opposite sides of the box.

• p.3: “any pixels” –> “any pixel” … “any bounding boxes” –> “any bounding box”

• Section 3.2: How did you select hyper-parameters? Did you use a validation set? Is the method sensitive to the choice of lambda, beta and gamma?

• Table 1: It is hard to understand why (-40,40,20) yields the best performance. Can you add some explanation of this phenomenon?

• Table 3: Can you add the results of Hsu et al. to this table?

• Conclusion: It would be important to point out some limitations of the current method and potential improvements for future work.

borderline accept (6)

• Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While there is some novelty in the proposed method and it outperforms recent approaches for the same task, improvement can be made in terms of clarifying the contributions, evaluating the method in a more realistic scenario (single 3D bounding box per volume) and proper proof-reading.

• What is the ranking of this paper in your review stack?

1

• Number of papers in your stack

5

• Reviewer confidence

Very confident

# Primary Meta-Review

• Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All reviewers recommend acceptance of this paper based on adequate technical novelty and convincing experimental results on two datasets. The final version should take into account reviewers’ comments, in particular enhancing overall clarity and presentation, as well as including additional experimental details.

• What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

# Author Feedback

Reviewer #2:

1. It would be better if the authors show some qualitative results, such as the final segmentation masks projected on original images. Response: We thank the reviewer for the comment. We have included qualitative results in Figure 2 in the revised paper.
2. The authors should review the medical background. Response: We do not review the medical background because the method developed in this paper is general and is not limited to a specific medical imaging problem. Moreover, it works for natural images as well.

Reviewer #3:

1. Selection of hyper-parameters. It is unclear how the hyper-parameters are chosen in this method. In particular, from Table 2, it seems the value of \alpha can affect the final performance significantly, e.g., on the ATLAS dataset. What \alpha do you choose in the final comparison, and by what means? Response: We thank the reviewer for pointing out this issue. Indeed, alpha can significantly affect the final performance. Therefore, we compared alpha=4,6,8 for smooth maximum approximation in the experiments and reported the results with the highest Dice coefficients. We have included this explanation in the revised paper.

Reviewer #4:

1. Section 2.1: as far as I understand, in the segmentation tasks of the experiments, each voxel is mapped to a single class label. Therefore, there is no overlap of regions and the network output should use a softmax instead of a sigmoid. Response: Indeed, there is no overlap of regions for the tasks in the experiments. However, overlapping regions are common in medical images. For example, in retinal fundus images, the optic cup overlaps with the optic disc. In MRI images of glioma, the necrotic/cystic tumor core and the enhancing tumor core overlap with the non-enhancing solid tumor core, which in turn overlaps with the edema. In fact, we have tested the proposed approach on optic cup and disc segmentation in retinal fundus images and obtained strong performance. On our private dataset, the proposed approach obtains Dice coefficients of 0.885 and 0.951 for optic cup and disc, respectively, when the alpha-softmax approximation is used, and 0.885 and 0.950 when the alpha-quasimax approximation is used. For comparison, the fully supervised method obtains Dice coefficients of 0.900 and 0.971 for optic cup and disc segmentation. We did not report these results in the paper due to the page limit. More importantly, for the two datasets in our experiments, only one class of object was considered for segmentation. In this case, the softmax and sigmoid outputs are equivalent.
2. Table 1: It is hard to understand why (-40,40,20) yields the best performance. Can you add some explanation of this phenomenon? Response: The best angle setting depends on the task under consideration. It is (-40,40,20) for the PROMISE12 dataset and (-60,60,30) for the ATLAS dataset. In particular, for a given object, the best setting depends on the shape of the object and the distribution of its salient parts. Conceptually, a setting that yields more positive samples has greater potential to achieve better performance. Note that two or more crossing lines might select the same pixel; in this case, only one sample is selected.
3. Conclusion: It would be important to point out some limitations of the current method and potential improvements for future work. Response: We thank the reviewer for the comment. We have included the limitations and future work in conclusion as follows: “However, there is still performance gap between the weakly supervised approach and the full supervision method. In the future, it would be interesting to study whether multi-scale predictions and adding auxiliary object detection task improve the image segmentation performance.”
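The single-class equivalence between softmax and sigmoid outputs mentioned in the first response above can be checked numerically. The sketch below assumes a two-way softmax over a foreground and a background logit and compares it with a sigmoid applied to the logit difference; the function names are illustrative and not taken from the authors' code.

```python
import math


def sigmoid(z):
    """Logistic function of a single logit."""
    return 1.0 / (1.0 + math.exp(-z))


def softmax_foreground(z_fg, z_bg):
    """Foreground probability from a two-way softmax over (foreground, background) logits."""
    m = max(z_fg, z_bg)  # shift by the larger logit for numerical stability
    e_fg, e_bg = math.exp(z_fg - m), math.exp(z_bg - m)
    return e_fg / (e_fg + e_bg)


# Hypothetical logit pairs: the two parameterizations give the same probability.
for z_fg, z_bg in [(2.0, -1.0), (0.3, 0.3), (-4.0, 1.5)]:
    print(softmax_foreground(z_fg, z_bg), sigmoid(z_fg - z_bg))
```

The two outputs coincide because $e^{a}/(e^{a}+e^{b}) = 1/(1+e^{-(a-b)})$. This shows that the two parameterizations can represent the same probabilities; it does not imply that two separately trained networks would produce identical predictions.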

The code is now available at https://github.com/wangjuan313/wsis-boundingbox

Reviewer #2:

1. Adding results without smooth maximum approximation would be better. Response: The results without smooth maximum approximation were shown in Table 1.

Reviewer #4:

1. p.2: Explain clearly the advantage of having lines at different angles compared to having only horizontal and vertical lines. Response: In the first paragraph of Section 2.3 in our original submission, we motivated the use of lines at different angles. To sum up, using lines at different angles yields more positive samples for consideration during training than using only horizontal and vertical lines, thus resulting in better segmentation performance.
2. p.3: “any crossing line in the bounding box has at least one pixel belonging to the object in the box”. Technically, the crossing line should also touch two opposite sides of the box. Response: The crossing line touches two opposite sides of the box; however, these two points might not belong to the object. In image segmentation, the pixels of interest are those belonging to the object, not the box.
3. Section 3.2: How did you select hyper-parameters? Did you use a validation set? Is the method sensitive to the choice of lambda, beta and gamma? Response: a) We explained in Section 3.2 of our original submission, “The parameters in the MIL loss (1) were set as lambda=10 based on experience, and those in the improved unary loss (5) were set as beta=0.25 and gamma=2 according to the focal loss [17]”. b) We explained in Section 3.1 of our original submission, “the dataset was divided into non-overlapping subsets, one with 40 patients for training and the other with 10 patients for validation” and “Same as the study in [5], the dataset was divided into two non-overlapping subsets, one with 203 images from 195 patients for training and the other with 26 images from 25 patients for validation”. Therefore, we reported the performance on the validation set. c) For the two datasets in the experiments, the method is insensitive to lambda, beta and gamma.
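To make the crossing-line bags discussed in the responses above more concrete, here is a minimal Python sketch that is not taken from the authors' repository. It assumes the angle is measured from the vertical direction, covers only lines running from the top edge to the bottom edge of the box, and discretizes each line by rounding to the nearest pixel column. The function name `crossing_line_bags` and this discretization are hypothetical; the paper's implementation may construct the bags differently.

```python
import math


def crossing_line_bags(height, width, angle_deg):
    """Pixel bags for parallel lines running from the top edge to the bottom edge of a box.

    Each bag is a list of (row, col) positions inside a height x width box.
    An angle of 0 degrees reproduces the plain vertical-line bags.
    """
    slope = math.tan(math.radians(angle_deg))  # column shift per row
    total_shift = slope * (height - 1)         # column shift between top and bottom rows
    bags = []
    for start_col in range(width):
        end_col = round(start_col + total_shift)
        # Keep only lines that touch both the top and the bottom edge of the box.
        if not 0 <= end_col <= width - 1:
            continue
        bags.append([(row, round(start_col + slope * row)) for row in range(height)])
    return bags


# Hypothetical 4 x 5 box: count the bags obtained at a few angles.
for angle in (-45, 0, 45):
    print(angle, len(crossing_line_bags(4, 5, angle)))
```

In this sketch, lines touching the left and right edges could be obtained by swapping the roles of rows and columns. Collecting bags over several angles gives more bags than the vertical and horizontal lines alone, which is the effect the response credits for the larger number of positive samples.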