
Authors

Lin Wang, Lie Ju, Donghao Zhang, Xin Wang, Wanji He, Yelin Huang, Zhiwen Yang, Xuan Yao, Xin Zhao, Xiufen Ye, Zongyuan Ge

Abstract

In medical image segmentation, it is difficult to mark ambiguous areas accurately with binary masks, especially when dealing with small lesions. Therefore, it is a challenge for radiologists to reach a consensus by using binary masks under the condition of multiple annotations. However, these uncertain areas may contain anatomical structures that are conducive to diagnosis. Uncertainty is introduced to study these situations. Nevertheless, the uncertainty is usually measured by the variances between predictions in a multiple trial way. It is not intuitive, and there is no exact correspondence in the image. Inspired by image matting, we introduce matting as a soft segmentation method and a new perspective to deal with and represent uncertain regions into medical scenes, namely medical matting. More specifically, because there is no available medical matting dataset, we first labeled two medical datasets with alpha matte. Secondly, the matting methods applied to the natural image are not suitable for the medical scene, so we propose a new architecture to generate binary masks and alpha matte in a row. Thirdly, the uncertainty map is introduced to highlight the ambiguous regions from the binary results and improve the matting performance. Evaluated on these datasets, the proposed model outperformed state-of-the-art matting algorithms by a large margin, and alpha matte is proved to be a more efficient labeling form than a binary mask.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87199-4_54

SharedIt: https://rdcu.be/cyl4O

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The authors present experiments using alpha mattes as a prediction target for segmentation problems. These mattes were generated from data that had been segmented by multiple observers using a pipeline which leveraged their disagreement to produce a trimap which was then provided to an existing matting method from the literature. The authors present empirical evidence that the resulting maps were preferred by clinicians in terms of their ability to faithfully represent the spatial extent of the underlying structure. The authors further propose a multitask approach to simultaneously predict the alpha matte as well as binary masks that were produced by applying thresholds to the alpha matte. They show that their multitask approach outperforms a single-task approach as well as existing matte prediction approaches. The authors promise to release the underlying data for their experiments once the review process is complete.
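The trimap step summarized above can be illustrated with a minimal sketch. It assumes the simplest possible rule: pixels on which all raters agree become known foreground or background, and any disagreement becomes the unknown band. The function name `trimap_from_annotations` and the 0/128/255 encoding are illustrative choices, not details taken from the paper, whose exact pipeline may differ.

```python
import numpy as np


def trimap_from_annotations(masks: np.ndarray) -> np.ndarray:
    """Build a trimap from a stack of binary masks of shape (n_raters, H, W).

    Assumed rule (not from the paper): unanimous foreground -> 255,
    unanimous background -> 0, any disagreement -> 128 (unknown).
    """
    masks = np.asarray(masks).astype(bool)
    all_foreground = masks.all(axis=0)
    all_background = ~masks.any(axis=0)

    trimap = np.full(masks.shape[1:], 128, dtype=np.uint8)
    trimap[all_foreground] = 255
    trimap[all_background] = 0
    return trimap


# Three hypothetical raters annotating a 2x3 image.
raters = np.array([
    [[0, 1, 1], [0, 1, 1]],
    [[0, 0, 1], [0, 1, 1]],
    [[0, 0, 1], [0, 0, 1]],
])
trimap = trimap_from_annotations(raters)
print(trimap)
```

In this toy case the middle column is the only region where raters disagree, so it is the only region a matting method would be asked to resolve.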
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

“Soft” segmentation labels like the authors’ “alpha mattes” have enormous potential in medical image segmentation. Resolution limitations and obscuring artifacts often make ROI boundaries literally impossible to delineate exactly. The traditional approach to “uncertain” labels is to have multiple annotators delineate the region independently and treat these as samples from the underlying distribution, which captures the uncertainty. This work attempts to represent the uncertainty directly, and use it as a prediction target. I have to agree with the authors here that this is a far more intuitive and elegant approach. I also like that the authors solicited the opinion of expert clinicians on this topic who also overwhelmingly preferred this representation to the multiple-annotation alternative. Another strength of this work is the empirical experiments which show the value of these derived alpha mattes as versatile objects for creative loss functions.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The alpha mattes being used in this work are ultimately derived from multiple annotations, which means they carry many of the limitations that multiple annotations do, namely that they have some bias toward “modes” of plausible region boundaries. For example, if 1000 raters are all 90% sure the boundary is at position A rather than position B, then they will all annotate position A, rather than producing the 90-10 split that one would hope for. When raters are asked to make binary decisions, you will lose information regardless of how many raters you have. I will concede that the authors mention that the clinicians were asked to tweak the derived alpha mattes, but no information was given about how often this tweaking was done or what tools were used. I imagine that manually providing voxel-wise confidence values is extremely tedious, so I can’t imagine it was done often or done carefully when attempted.
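The mode-bias argument above can be made concrete with a small worked example. The sketch below assumes an idealized situation in which every rater holds the same belief about a pixel and rounds it to a hard label; the helper name `mean_of_binary_votes` is hypothetical and exists only for this illustration.

```python
import numpy as np


def mean_of_binary_votes(belief_foreground: float, n_raters: int) -> float:
    """Average label when every rater rounds the same belief to 0 or 1.

    Idealized assumption: all raters share one belief and each reports
    only the more likely binary label.
    """
    votes = np.full(n_raters, belief_foreground >= 0.5, dtype=float)
    return float(votes.mean())


# 1000 raters who are each 90% sure the pixel belongs to the structure.
soft_from_votes = mean_of_binary_votes(0.9, 1000)
print(soft_from_votes)  # 1.0, not the 0.9 one would hope to recover
```

Adding more raters does not help here: the averaged label stays at 1.0 for any number of raters, which is the information loss the paragraph describes.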

Aside from this, I have only a few minor suggestions/points of confusion:
- At first I didn’t catch where the P_c function from equation 2 was defined – perhaps the “average score map” should be introduced formally
- It would be nice to mention the methods for creating the ground truth alpha matte within the methodology section
- It’s probably not a good idea to state in your conclusions that “new experiments will be carried out to prove its value in diagnosis” – of course you don’t know whether its value will be proved or not until the experiments have been run. Perhaps “…to test its value…” would be better
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors mention that they will be releasing the mattes that they generated. I didn’t see a statement about the pretrained models or code. It would be nice if they released those as well.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

I like how the authors involved clinicians in the evaluation process and valued their qualitative impression of the proposed representation. Presumably, clinicians have spent their careers building some kind of high-fidelity semantic representation of depicted structures in their minds when they read films. We should be doing our best to emulate and extend these rather than sticking to discrete segmentation masks just because they’re convenient.

I appreciate that there is a dearth of datasets and challenges with multiple annotations, but it would be nice to benchmark the proposed method on a challenge leaderboard.

Also, I appreciate that you are running up against space constraints, but it would be nice to discuss the limitations of this work. Certainly the soft labels have their advantages, but can we be sure that they faithfully represent the underlying uncertainty? How can we empirically evaluate the quality of labels like these?
Please state your overall opinion of the paper

strong accept (9)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The authors clearly outlined the problem they are addressing, how they are addressing it, and supported their claims with experimental data. The problem is highly important and relevant to the MICCAI community, and existing solutions are inadequate.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

The paper proposes using alpha matting for medical image segmentation to alleviate the ambiguity issues related to binary masks. To this end, a specific architecture and dedicated losses are proposed. The approach is evaluated on two public datasets, for which the ground truth alpha mattes were generated and made publicly available. The results show the superiority of the proposed approach over existing image matting techniques.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Important topic: By addressing the inherent ambiguity, the paper addresses an important problem in medical image segmentation. I like that the paper tries to bring a new perspective to the problem by approaching it with image matting. It is refreshing to see alternatives to the commonly used uncertainty estimation techniques.
- Feedback from clinicians: I appreciate the effort that the authors put into obtaining the feedback of clinical experts for the alpha mattes.
- Publicly available annotations: The authors generated alpha matte annotations for two medical image segmentation datasets and will make them publicly available.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Difficult to read: The writing is convoluted and contains many errors. This makes the paper difficult to read and, consequently, to understand. It also makes it hard to follow the motivation and details of the paper.
- Overly complicated method: The method includes many elements/parts that are not really motivated. I wonder whether it wouldn’t be possible to regress the alpha matte from the input image directly. And why is the probabilistic U-Net essential when the GT alpha matte can be sampled/thresholded (as done for the training of the probabilistic U-Net)? Besides the missing motivation, the different elements lack proper ablation. By omitting ablations of individual elements (e.g., probabilistic U-Net, gradient loss, uncertainty-weighted loss, channel-wise attention), the reader cannot identify which elements are essential and which might be omitted/replaced. Finally, the missing motivation and missing ablations lead to an unclear message; I do not know what the paper wants to tell or what I should learn from it.
- Missing experiments to back up the claims: The title claims that the paper is providing a new perspective on medical image segmentation with uncertainty by introducing matting. However, the proposed method is only compared to existing matting techniques. The paper lacks experiments that show the benefit of matting over using uncertainty/certainty maps (not binary masks) in the first place. Only the experiment in Fig. 5 provides some information about the usefulness of matting. However, the performance of prob. U-Net and prob. U-Net+Matting seem to be almost identical, suggesting no benefit.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- The reproducibility checklist was inappropriately filled in. For instance, the authors affirmed the options “The exact number of training and evaluation runs,” “Details on how baseline methods were implemented and tuned,” “A clear definition of the specific evaluation metrics and/or statistics used to report results,” “A description of results with central tendency (e.g. mean) & variation (e.g. error bars),” “A description of the memory footprint,” but the paper provides none of these.
- Although publishing code for the review is not mandatory when promising code release, it would be beneficial to provide code already during the review phase for clarifying questions regarding the implementation.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
To improve the paper in the future, I suggest the following:
- Revise and proofread the writing to improve readability and, consequently, understanding.
- Define a clear message: Currently, I am not sure whether the message is to show that the proposed approach is improving over the existing matting methods or if the paper should actually show the benefits of matting (as the title suggests). Defining a clear message and setting up experiments that support this message would greatly benefit the paper.
- Simplify the method: I suggest simplifying the method if the main message should be showing the benefits of using alpha mattes in medical image segmentation. A simple method that shows the benefit of your new perspective will have much more impact than a convoluted, overly complicated method.
- Weaken claims: The paper includes several claims that are not backed up by experiments or by references, e.g., “our model outperforms all the other methods in both datasets, which illustrates our method is more applicable to the medical scenarios,” “[…] which has a better ability to reveal tiny and ambiguous structures and has a big potential for diagnosis,” “and prove that alpha matte is a more powerful annotation method than the binary mask.” Please weaken such claims or back them up with experiments or references.
- Metrics: Please briefly explain the metrics used for comparison in Table 1. Also, it would be beneficial to explain the meaning of the specific/uncommon metrics Dice_u and Dice@0.5 and why they are used.
- Baselines: Briefly explaining the baselines and motivating their selection would be helpful, so that a reader does not need to look up each of the referenced papers. Also, clarify how you apply them to your data.
- Clarify motivation for alpha matte as uncertainty: It looks as though the generated alpha mattes very much resemble an image multiplied by a mask and are, thus, closely related to the actual image intensities. Due to this relation to the intensities, I wonder if it is valid to use mattes as a reference for segmentation ambiguity/uncertainty. Motivating the use of mattes as uncertainty/ambiguity in the introduction would be helpful.

Minor:
- Title: The title could be extended to “medical IMAGE segmentation” to improve clarity.
- Equation 7: I am not sure what two values “max” is comparing. Please clarify.
Please state your overall opinion of the paper

reject (3)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The poor writing, the unclear message, as well as the missing ablation and motivation of the overly complicated method are the major factors that led to my overall score.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

3
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

The paper tackles medical image segmentation from the perspective of image matting. The authors prepare a matting dataset by considering multiple annotators. The deviations of probabilistic U-Net predictions are used as a measure of uncertainty, creating a trimap for a secondary image matting network.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Tackling medical image segmentation as image matting, using uncertainty to form a trimap, is an interesting idea.
- The obtained results are qualitatively and quantitatively promising.
- The gathered manually labelled datasets for the study on uncertainty learning and matting in medical scenarios will be publicly available to the research community
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The method needs multiple segmentation labels per training sample. Given the high cost associated with the acquisition of medical labels, this bottleneck could limit the applicability of the proposed approach.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The provided details seem to be adequate and the method has good reproducibility. The authors mention the training code will be available upon acceptance.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- Adapting the methodology to imaging modalities with blurred edges, such as ultrasound data, could be an interesting investigation.
- The extension of the methodology for 3D image segmentation, and considerations for temporal data incorporation for video analysis, could also be an interesting future exploration.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The major factors for my recommendation, as detailed in the “strengths” section, are the obtained results and novelty of the application.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The submission considers performing medical image segmentation with uncertainty using alpha matting. The proposed strategy is interesting and looks promising. However, reviewers have concerns about the readability of the manuscript, the complicated method with some unmotivated components, the missing experiments, and the use of multiple segmentation labels. In the rebuttal letter, the authors are encouraged to address these concerns, and other questions from the reviewers if space allows.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Author Feedback

We sincerely thank all the reviewers for their time and efforts. We reorganized the comments and responded to them as follows:

Q: Motivation of using alpha matte for uncertainty measurement A: In medical segmentation tasks, we found that fuzzy regions are difficult to annotate precisely, which is a likely reason for the inconsistency between multiple annotations; i.e., the uncertainty is related to the structural information of the corresponding image.

Image matting can estimate the probability from the patterns of the foreground and background regions. Inspired by this, we introduce matting into medical scenes as a soft segmentation method and a new perspective for handling and representing uncertain regions, namely medical matting. In addition, our ablation study shows that annotating medical images with soft labels is informative, as it is equivalent to leveraging multiple binary masks for training, which provides a potential route to more efficient use of limited data. In our latest study, experiments show that a model trained with alpha mattes outperforms one trained with binary masks in 3D CT nodule staging on LIDC, suggesting potential applications in various diagnosis tasks.

Q: Network structure complexity A: As described in Section 3, our end-to-end network only consists of 2 major parts. The #MASK GENERATOR outputs various segmentation predictions under the target distribution. Such intermediate results are merged to create an UNCERTAINTY MAP, which assists the following #MATTING NETWORK like the trimap. Neither of these two sub-modules is dispensable.
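As a rough illustration of how several sampled predictions might be merged into an uncertainty map, the sketch below uses the per-pixel standard deviation across samples. This is an assumption: the feedback does not state the merging rule, and the function name `uncertainty_map` and the normalization to [0, 1] are illustrative only.

```python
import numpy as np


def uncertainty_map(samples: np.ndarray) -> np.ndarray:
    """Per-pixel disagreement across predictions of shape (n_samples, H, W).

    Assumed merging rule (not from the paper): population standard
    deviation across samples, rescaled to [0, 1]. For values in [0, 1]
    the standard deviation is at most 0.5, hence the division.
    """
    samples = np.asarray(samples, dtype=np.float64)
    return samples.std(axis=0) / 0.5


# Four hypothetical sampled masks for a 1x3 image.
samples = np.array([
    [[1, 1, 0]],
    [[1, 1, 0]],
    [[1, 0, 0]],
    [[1, 0, 0]],
])
u_map = uncertainty_map(samples)
print(u_map)
```

Pixels where all samples agree get zero uncertainty, while the middle pixel, on which the samples split evenly, gets the maximum value. A map of this kind could then play a trimap-like role by marking where the matting stage should focus.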

To simplify the model design and demonstrate the efficacy of the alpha matte idea, we only use a vanilla UNet architecture for prob-UNet rather than its more complex descendants.

Q: Alpha matte with a single model A: From the model design perspective, it may seem possible to regress the alpha matte directly. However, the consensus in general image matting is that regressing each pixel directly is challenging, because matting intertwines regression and classification. Hence the trimap mechanism is frequently used in both Laplacian-based and deep learning-based methods [1,5,6,8,11]. Our experiments also show that the uncertainty map, which is analogous to the trimap, achieves the best performance.

Q: Experiments: Ablation & Comparison & Metric A: As mentioned above, the two sub-models, #MASK GENERATOR and #MATTING NETWORK, must work together, so an ablation study on the individual sub-models cannot be performed. Ours is the first work to use matting to handle uncertainty, so no existing models allow a direct and comprehensive comparison. We therefore designed the comparative experiments along two lines. First, we compare against existing matting methods quantitatively and show that general matting methods cannot be transferred directly to medical images. Second, we demonstrate that the alpha matte can be a better choice than multiple binary labels for learning the uncertainty distribution, which provides a new direction for improving soft-boundary-based annotation of medical images.

We chose to use the adapted Dice metrics because Dice is a commonly used metric in segmentation, and the modified version can evaluate the similarity between the predictions and the targets at the distribution level and individual level, respectively.
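For readers unfamiliar with thresholded Dice scores, the following is a minimal sketch under a stated assumption: that Dice@0.5 means binarizing both the predicted and the target alpha matte at 0.5 before computing the usual Dice coefficient. The rebuttal does not spell out the definitions of Dice_u or Dice@0.5, so `dice_at_threshold` is a hypothetical helper rather than the authors' metric.

```python
import numpy as np


def dice_at_threshold(pred_alpha, target_alpha, threshold: float = 0.5) -> float:
    """Dice coefficient after binarizing both alpha mattes at `threshold`.

    Assumed reading of "Dice@0.5"; the paper's exact definition may differ.
    """
    pred = np.asarray(pred_alpha) >= threshold
    target = np.asarray(target_alpha) >= threshold

    intersection = np.logical_and(pred, target).sum()
    denominator = pred.sum() + target.sum()
    if denominator == 0:
        # Both masks are empty, so they agree perfectly.
        return 1.0
    return float(2.0 * intersection / denominator)


# Hypothetical 2x2 predicted and target alpha mattes.
pred_alpha = np.array([[0.9, 0.6], [0.2, 0.1]])
target_alpha = np.array([[1.0, 0.4], [0.7, 0.0]])
score = dice_at_threshold(pred_alpha, target_alpha)
print(score)  # 0.5
```

Here each binarized mask contains two pixels and they overlap on one, giving 2 * 1 / 4 = 0.5.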

Performance differences on matting metrics in recent literature are relatively small in absolute value. The work in [5] scales its values by 1000 to emphasize the differences between its method and others.

To further increase readability without changing the initial content, we have added a short overview of each model part and improved the structure of the Methodology section. Fig. 3 has also been modified to make the sub-modules easier to distinguish from each other.

At last, the authors would like to thank the reviewers and ACs again for their thoughtful comments.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors addressed most of the concerns raised by the reviewers in the rebuttal period. This submission could attract a lot of attention in the MICCAI community; therefore, it is recommended for acceptance.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

In this paper, the authors propose to bring image matting into the medical image segmentation task. It is a clear benefit to clinicians (verified in Fig. 2). The way the authors collaborate with clinicians should be encouraged. The release of the data will also benefit the community. These points were appreciated by almost all reviewers too. The weakness of this paper is the lack of discussion of the extra effort needed to generate such matting annotations, which makes it difficult to apply the approach to other tasks. The experiments section needs more work, especially Fig. 4 (image too small) and Fig. 5 (message unclear). The paper needs to take into account that most MICCAI readers are not familiar with image matting; the current version does little to enlighten them.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

12

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper proposed an interesting idea of using alpha matting to deal with uncertainty in medical image segmentation. The concerns raised by the reviewers mainly include writing of the paper, motivation of the method, and description of some evaluation metrics. The authors clarified these points well in the rebuttal. The small weaknesses do not obscure its strength. Thus, I recommend acceptance of this paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1


Medical Matting: A New Perspective on Medical Segmentation with Uncertainty