
Authors

Helena Williams, João Pedrosa, Laura Cattani, Susanne Housmans, Tom Vercauteren, Jan Deprest, Jan D’hooge

Abstract

Automatic medical image segmentation via convolutional neural networks (CNNs) has shown promising results. However, CNNs may not always be robust enough for clinical use. Sub-optimal segmentation would require clinicians to manually delineate the target object, causing frustration. To address this problem, a novel interactive CNN-based segmentation framework is proposed in this work. The aim is to represent the CNN segmentation contour as B-splines by utilising B-spline explicit active surfaces (BEAS). The interactive element of the framework allows the user to precisely edit the contour in real time, and utilising BEAS ensures the final contour is smooth and anatomically plausible. The framework was applied to the task of 2D segmentation of the levator hiatus from 2D ultrasound (US) images, and compared to the current clinical tools used in a pelvic floor disorder clinic (4DView, GE Healthcare; Zipf, Austria). Experimental results show that: 1) the proposed framework is more robust than current state-of-the-art CNNs; 2) the perceived workload, measured via the NASA-TLX index, was reduced by more than half with the proposed approach compared to current clinical tools; and 3) the proposed tool requires at least 13 seconds less user time than the clinical tools, a significant difference (p=0.001).

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_30

SharedIt: https://rdcu.be/cyhL8

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper presents an interactive 2D segmentation method specific to single simply-connected object segmentation in ultrasound. It does so in a three-step process, the first being automatic segmentation via a U-Net to get an initial pixel-wise representation of the object of interest. This is followed by fitting a 1D circular spline to the object to get a second, smoother representation. The final step is the user dragging on this spline to correct the segmentation.
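
    The three-step process described above might be sketched as follows. This is a minimal illustration only, not the authors' implementation: cnn_mask is a synthetic placeholder for the U-Net output, a circular moving average stands in for the circular B-spline fit, and a Gaussian falloff around the clicked angle stands in for dragging a spline control point; all names and parameter values are hypothetical.

        # Hedged sketch of the three-step workflow: U-Net mask -> smooth closed
        # contour in polar form -> local user correction. Everything is a stand-in.
        import numpy as np

        H, W, K = 256, 256, 64  # image size and number of contour samples (hypothetical)

        # Step 1 (placeholder): a noisy elliptical mask standing in for the U-Net prediction.
        yy, xx = np.mgrid[0:H, 0:W]
        cnn_mask = (((xx - 128) / 70) ** 2 + ((yy - 128) / 50) ** 2
                    + 0.05 * np.random.randn(H, W)) < 1.0

        # Step 2: express the mask boundary as a radius profile r(theta) around the
        # centroid and smooth it periodically (stand-in for the circular B-spline fit).
        cy, cx = np.argwhere(cnn_mask).mean(axis=0)
        ys, xs = np.nonzero(cnn_mask)
        ang = np.arctan2(ys - cy, xs - cx)
        rad = np.hypot(ys - cy, xs - cx)
        theta = np.linspace(-np.pi, np.pi, K, endpoint=False)
        bins = np.digitize(ang, np.linspace(-np.pi, np.pi, K + 1)) - 1
        r = np.array([rad[bins == k].max() if np.any(bins == k) else 0.0 for k in range(K)])
        kernel = np.ones(5) / 5.0
        r_smooth = np.convolve(np.r_[r[-2:], r, r[:2]], kernel, mode="valid")  # circular smoothing

        # Step 3: local user correction -- pull the contour toward a clicked point,
        # with a Gaussian falloff over neighbouring angles (stand-in for moving a knot).
        def user_drag(r_profile, theta_grid, click_theta, click_radius, sigma=0.3):
            d = np.angle(np.exp(1j * (theta_grid - click_theta)))  # wrapped angular distance
            w = np.exp(-0.5 * (d / sigma) ** 2)                    # local influence weight
            return (1 - w) * r_profile + w * click_radius

        r_edited = user_drag(r_smooth, theta, click_theta=0.5, click_radius=90.0)
        contour_xy = np.c_[cx + r_edited * np.cos(theta), cy + r_edited * np.sin(theta)]
        print(contour_xy.shape)  # (64, 2) closed contour, ready to overlay on the image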

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    One distinct strength of this paper is that it presents a system that is essentially ready for the clinic, with a validation performed by actual human users rather than simulations. This makes the work feel much better validated and shows that the presented technology is much closer to clinical integration than a large number of other interactive segmentation interfaces. In addition, the comparison against existing clinical methods (4DView Trace and 4DView Point) is a further strength.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The primary weakness of the paper is the lack of ablation studies within the method itself. It would be interesting to know how different variations on the presented method affect the results and the usability. This would also allow the reader to infer how much of the usability results from strong interface design rather than from any novelty in the underlying method. One alternative would be to implement the interaction method used by 4D View within the existing system to control for cosmetic interface differences.

    There also appears to be something of a mismatch between the scope of the methods being compared. The GE tools appear to be designed for volumetric image analysis rather than 2D. One assumes that the plane selection was done prior to the experiment for all methods, although this is not clearly stated.

    Lastly, the size of the experiment is relatively limited, with only two experts and evaluation at two time points. This makes more rigorous evaluation difficult, although the data given suggest that the within-method variability is significantly lower than the between-method difference, at least for this small cohort.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper appears to be relatively reproducible with the experimental setup described in enough detail that a reader would be able to reproduce something similar to the Beyond software and to construct similar experiments.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The primary way to improve this paper would be to introduce ablations that bridge the differences between the compared methods. The most obvious would be to combine the basic interface of Beyond with the interaction mechanisms of the 4D View software, in order to control for cosmetic differences that could nevertheless have a significant impact.

    In addition, metrics such as consistency (i.e., user agreement) do not appear to have been collected; these would likely strengthen the approach, assuming that the more heavily guided Beyond tool gives more consistent segmentations than the entirely manual Point and Trace tools. Ablation studies within the method would also be appreciated, as well as baseline results for the non-interactive version (i.e., CNN+B-spline without interaction) to provide a reference point for these other metrics.

    Additionally, the authors should extend the experiment with more participants. This could also be done with more intermediate-level participants if experts are difficult to recruit. It would also be beneficial to re-run the current two experts to gauge the level of intra-subject variability for each of the methods. This would allow a more rigorous analysis of the quantitative results.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper appears to be the start of a good experiment, although it currently appears to be lacking in scientific interest (the methods set forth, although possibly a novel combination, are not fundamentally novel) and in validation quality (the comparison is sparse, providing limited scientific information, and the number of participants is low, providing limited evidence of technical superiority). Overall, the work appears to be preliminary but shows a great degree of potential.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper presents a tool to segment 2D US images. A CNN produces an initial segmentation, which is further refined with a B-spline active contour method; the user can then interact with the contour, and this interaction improves the final result. The authors perform several experiments, taking different metrics into account.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The paper is well written and structured. The tool is clinically acceptable and reflects what a health professional expects of a tool to be used on a daily basis. The three steps of the tool demonstrate that not everything can be solved by a deep learning network alone. The videos provided were illustrative and convincingly demonstrated the interactivity of the tool and its ease of use.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    CNN hyper-parameters are not given, in contrast to the B-spline methodology, which is explained in detail.
    The validation of the proposed tool against the Point and Trace tools is not clear at the beginning; it only becomes clear once the results are presented.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Reproducibility is limited because the hyper-parameters of the network are not given and the data used is private. The code is also not provided.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This tool gives a preview of what a clinical tool needs in order to be used on a daily basis. The paper still needs more explanation of how the tool was constructed, in particular the network and active-contour details required to reproduce it. It proposes a very practical solution in the medical imaging area.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This is a support tool for clinicians. Medical usability was clearly a priority when the authors created the tool. The methodology is improved by user interaction, which is taken into account in the process.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This paper presents a workflow for interactive segmentation of 2D images. A CNN-based automatic segmentation is first computed and then adjusted by the user within a B-spline explicit active surface (BEAS) framework. This algorithm consists of representing the contour of the shape via a polar function and evolving it so that an energy is minimized. The energy takes into account the result of the CNN but also the image intensities and the interactions of the user (who drags the control points). The proposed approach is applied to levator hiatus segmentation from 2D images and compared to the current manual tools provided in the clinical ultrasound system.
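
    Schematically, the setup described above might be written as follows. This is a hedged sketch in the spirit of the BEAS literature; the weights and the user-interaction term are placeholders and do not reproduce the paper's exact equations.

        % Schematic sketch only -- alpha, beta, gamma and E_user are placeholders,
        % not the paper's exact formulation (Eq. (4) in the paper is not reproduced).
        \begin{align*}
          r(\theta) &= \sum_{k} c_k\, \beta^{d}\!\left(\tfrac{\theta}{h} - k\right)
            && \text{(explicit B-spline contour in polar form)} \\
          E(\{c_k\}) &= \alpha\, E_{\mathrm{img}}(r; I)
            + \beta\, E_{\mathrm{cnn}}(r; P)
            + \gamma\, E_{\mathrm{user}}\bigl(r; \{p_i\}\bigr)
            && \text{($I$: image, $P$: CNN map, $p_i$: user points)} \\
          c_k &\leftarrow c_k - \eta\, \frac{\partial E}{\partial c_k}
            && \text{(gradient descent on the spline coefficients)}
        \end{align*}

    Here E_img and E_cnn would be localized region terms (e.g., of Yezzi type) computed on the image and on the CNN output respectively, which matches the reading, given below, of the method as standard BEAS run on a two-channel input.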

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Being able to segment images efficiently is very important, yet few papers address the problem of interactive segmentation.

    • The experiment design is really thorough, with several criteria (effort, frustration, mental/physical/temporal demand, etc) graded by clinicians. Results were nicely reported (Figure 3, Table 1), with statistical tests showing a clear benefit from the proposed approach.

    • Unlike some other interactive segmentation approaches, this method allows the user to edit the segmentation exactly where they want (given enough interactions) and works in real-time, which leads to less frustration for the user.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The technical contributions compared to the original BEAS paper are very limited. The formulation leads one to believe that the interactive segmentation and the CNN segmentation are intertwined, but this is not the case. From my understanding, this method is basically equivalent to running a standard BEAS algorithm on a 2-channel input (the original image + a pre-computed CNN segmentation, each with its own Yezzi loss term).

    • The baselines (4DView Trace and Point) are very weak. There is no comparison to a standard curve-editing baseline (for instance, BEAS without any image-based loss term), let alone to another image-based interactive segmentation algorithm. Some recent algorithms are cited, but the authors discard them on the grounds that they require too much cognitive load and understanding. I do not find this hypothesis so straightforward that it requires no experiment: scribbles and clicks are quite intuitive interactions.

    • As acknowledged by the authors, there seems to be quite a number of parameters to tune.

    • Drawing contours in 2D images is not optimal, but still feasible in a reasonable amount of time. A 3D version of this algorithm would have had much more impact.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Although reproducibility is claimed, no code is released with the submission, nor is there any mention in the paper that it will be (note that it would have been possible to attach the code anonymously to the submission, as some other papers did).

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • While the localized Yezzi energy makes sense for the image-based term, I am not so sure we need a flexible image term for the CNN segmentation. Wouldn’t it make sense to set u_\theta to 1 and v_\theta to 0, for all \theta? (A possible reading of this suggestion is sketched just after this list of comments.)

    • I am not sure I understand why “Whitening and histogram normalization were applied to reduce the effects of noise”.

    • Apart from that, I have little to comment on: the method significantly relies on an existing framework, and the validation is particularly complete.
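
    One possible reading of the u_\theta / v_\theta suggestion above, assuming the CNN channel enters the energy through a localized region-fitting term (the paper's exact term may differ):

        % Hedged sketch: P is the CNN probability map; Omega_in(theta) and Omega_out(theta)
        % denote the local interior/exterior neighbourhoods at angle theta.
        E_{\mathrm{cnn}}(\theta) \;=\;
            \int_{\Omega_{\mathrm{in}}(\theta)} \bigl(P(y) - u_\theta\bigr)^2 \, dy
            \;+\;
            \int_{\Omega_{\mathrm{out}}(\theta)} \bigl(P(y) - v_\theta\bigr)^2 \, dy,
        \qquad u_\theta \equiv 1,\; v_\theta \equiv 0.

    With the means fixed in this way, the term reduces to penalising (P-1)^2 inside and P^2 outside the contour, i.e., the CNN map would be treated as an ideal binary foreground model rather than having its local means re-estimated.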

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am really on the fence with this paper. On the one hand, the paper is well written and the experimental setup is fine. On the other hand, the technical contribution seems very marginal to me, and the baselines are really weak and do not reflect the state of the art in interactive segmentation. Based on its ranking among my batch of papers to review, I am leaning towards rejection, but I recommend submitting to a workshop such as LABELS.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents a workflow for interactive segmentation of 2D ultrasound (US) images, consisting of a CNN for pixel-wise segmentation and interactive circular spline fitting for refinement. All reviewers agreed on the practical value of this methodology, a very important aspect for real deployment of MICCAI research in a clinical scenario. Weaknesses include the lack of ablation studies of the presented method, limited technical novelty, and an excessive number of hyper-parameters to tune (while how to tune them is not clearly documented). In the rebuttal, the authors are invited to clarify the hyper-parameter settings and to put the method into perspective in relation to other existing tools and curve-editing methods.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    6




Author Feedback

We thank the reviewers for their insight and supportive comments, here we address the main points.

Existing tools: “The baselines (4DView Trace and Point) are very weak.” (R3) The baselines in the paper are the clinical-standard programs used by the pelvic floor clinic at our institute to assess the hiatal area. Nonetheless, as requested by R3, we have now performed a comparison to UGIR, a state-of-the-art scribble-based approach published at MICCAI 2020 by Wang et al. (doi:10.1007/978-3-030-59719-1_28). On our clinical dataset, our approach outperforms UGIR in terms of time (UGIR 63±31 vs ours 17±12 seconds), workload (NASA-TLX score 76 vs 24) and clinical acceptability (27% vs 97%). This new experiment will be included in the revised paper.

*Ablation study: “The primary weakness of the paper is the lack of ablation studies within the method itself.” (R1) We noticed a typo in the paper: the clinical acceptability reported as 5% for the CNN alone in fact refers to CNN+BEAS post-processing; the clinical acceptability of the CNN alone was 2%. This has been amended in the paper and acts as an ablation study evaluated by clinical acceptability.

*Regarding comments on novelty: “The technical contributions compared to the original BEAS paper are very limited.” (R3) Our method is the first deep interactive segmentation method to allow high-precision, minimally interactive, boundary-based corrections. While our interactive adaptation method leverages an existing framework (BEAS), our novel combination of BEAS and CNN segmentation presents unique advantages with respect to the state of the art. Indeed, scribbles do not allow the user to impose precise boundaries, and extreme-point annotations can frustratingly be overridden by existing extreme-point algorithms. We also highlight the novelty of our US segmentation evaluation protocol, which uses experts to grade segmentations and evaluates the workload in a structured fashion (NASA-TLX).

*Hyper-parameters: “CNN hyper-parameters are not shown, contrary to the B-spline methodology” (R2) “As acknowledged, there seems to be quite a number of parameters to tune.” (R3) Standard network training hyper-parameters (batch size, data augmentation, learning rate) are reported in Section 1.4. We will include additional details regarding the U-Net architecture and hyper-parameters in the revised version. CNN parameters were tuned based on performance on the training dataset and were chosen following Bonmati et al. (doi:10.1117/1.JMI.5.2.021206). The BEAS post-processing and interactive-mechanism parameters are noted in Section 1.4 (i.e., 32 knots and neighbourhood sizes of 100 and 10, respectively). The value of the scale factor h was not noted; it will be added (h=1). The tuning of the BEAS parameters was based on Barbosa et al. (doi:10.1109/TIP.2011.2161484); these parameters were tuned on the training dataset and the resulting contours were assessed, which is now clarified in Section 1.4. The number of knots was chosen empirically; this limitation is now clarified in the limitations section and, as mentioned in the discussion, we plan to automate this in future work. The hyper-parameters defining equation (4) were identified by a grid search and evaluated by assessing performance; the qualities we looked for were the contour following user-defined points while retaining the CNN prior and the BEAS shape constraints. Similar results were obtained for different parameter settings. In future work, we aim to perform a study showing the performance of varied parameter combinations on several segmentation tasks. This is now clarified in the discussion.
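
As an illustration of the grid search mentioned above, a minimal sketch might look as follows; run_interactive_beas, the candidate values and the scoring are placeholders and do not reflect the actual evaluation protocol described in the rebuttal.

    # Hedged sketch of a grid search over the weights of equation (4).
    # run_interactive_beas and the dummy score are placeholders only.
    from itertools import product

    def run_interactive_beas(alpha, beta, gamma):
        """Placeholder: segment the training set with the given energy weights
        and return a scalar quality score (higher is better)."""
        return -(alpha - 1.0) ** 2 - (beta - 0.5) ** 2 - (gamma - 2.0) ** 2

    candidate_weights = [0.1, 0.5, 1.0, 2.0, 5.0]  # hypothetical grid of values

    best_score, best_weights = float("-inf"), None
    for alpha, beta, gamma in product(candidate_weights, repeat=3):
        score = run_interactive_beas(alpha, beta, gamma)
        if score > best_score:
            best_score, best_weights = score, (alpha, beta, gamma)

    print("best (alpha, beta, gamma):", best_weights, "score:", round(best_score, 3))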

*3D: “A 3D version of this algorithm would have had more impact.” (R3) We aim to expand the work into a 3D interactive segmentation tool, as stated in the discussion. However, we believe there is still a strong clinical need for a high-precision 2D interactive segmentation tool for medical image analysis, e.g., for ejection fraction and hiatal area measurement.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    Interactive editing of deep learning results is of high practical interest and deserves more discussion at MICCAI. The authors have also properly addressed the reviewers’ concerns in the rebuttal, including the ablation study and hyper-parameter tuning.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    7



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The practical value of this work has been acknowledged in the reviews. I found that the authors’ rebuttal responded reasonably to the concerns raised by the reviewers, on all aspects from technical novelty to hyper-parameter tuning, and hence I recommend acceptance of this paper.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    11



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents a CNN-based interactive 2D segmentation framework for ultrasound images, demonstrating real-time adjustments and requiring less user time and a lower workload than existing methods. The study is well motivated, of clinical interest, and validated in a clinically relevant setting with experienced human operators. The main concerns from the reviewers regarding limited technical contribution, weak baseline comparison, and missing details on hyper-parameter tuning have been addressed by the rebuttal to a great extent.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    3


