
Authors

Yukun Ding, Dewen Zeng, Mingqi Li, Hongwen Fei, Haiyun Yuan, Meiping Huang, Jian Zhuang, Yiyu Shi

Abstract

One of the foremost challenges of using Deep Neural Network-based methods for fully automated segmentation in clinics is the lack of a performance guarantee. In the foreseeable future, a feasible and promising approach is for radiologists to sign off on the machine’s segmentation results and make corrections where needed. As a result, the human effort for image segmentation that we try to minimize will be dominated by segmentation correction. While such effort can be reduced by advances in segmentation models, for ultrasound a novel direction can be explored: optimizing the data acquisition. We observe substantial variation in segmentation quality among repetitive scans of the same subject, even when they all have high visual quality. Based on this observation, we propose a framework to help sonographers obtain ultrasound videos that not only meet the existing quality standard but also yield better segmentation results. The promising results demonstrate the feasibility of optimizing data acquisition for efficient human-machine collaboration.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87193-2_44

SharedIt: https://rdcu.be/cyhMm

Link to the code repository

N/A

Link to the dataset(s)

https://github.com/dewenzeng/effort_prediction_mce


Reviews

Review #1

  • Please describe the contribution of the paper

    The paper proposes a correction-effort prediction model that enables the data acquisition process itself to be exploited: improved input data to the segmentation model results in better segmentations and thereby lower manual correction effort.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The paper does a good job at describing the key motivation behind improving the data acquisition process, so that the correction effort of a radiologist to correct segmentations is reduced.
    2. The paper describes the integration of the correction effort prediction model within the sonographer’s workflow.
    3. Through experiments, the authors validate their cost analysis and show that the failure rate (the percentage of frames needing segmentation correction) was reduced from 10.3% to 5.5%, while the number of re-scans needed only increased to about 1.4 (in the binary classification setting).
    4. The authors also quantify the time savings achieved by the effort prediction model: each re-scan takes 14s, with low-effort and high-effort corrections taking 19s and 45s respectively (binary classification setting); a rough cost check based on these figures is sketched after this list.
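
    A back-of-the-envelope check of these figures (a sketch under the quoted numbers; the 50/50 low-/high-effort correction mix, the frames-per-video count, and the reading of ~1.4 as the average number of scans per video are assumptions made here for illustration, not values from the paper):

        rescan = 14.0                     # s per re-scan (quoted above)
        low, high = 19.0, 45.0            # s per low-/high-effort correction (quoted)
        correct = 0.5 * low + 0.5 * high  # assumed 50/50 effort mix -> 32 s per frame
        frames = 100                      # assumed frames per acquired video
        baseline = frames * 0.103 * correct                 # 10.3% of frames corrected
        proposed = frames * 0.055 * correct + 0.4 * rescan  # 5.5% corrected + ~0.4 re-scans
        print(f"expected effort per video: {baseline:.1f} s -> {proposed:.1f} s")

    Under these assumptions the expected effort drops from about 330 s to about 182 s per video: the per-frame correction cost scales with the frame count, while the re-scan cost does not, which is why trading corrections for re-scans pays off.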
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. Perhaps the only weakness is the lack of information on the predicted effort required for an inexperienced sonographer/trainee. This would have been nice to see, so that an upper bound on the re-scan time could be established.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The methodology in the paper can be reproduced since it is a simulated experiment; however, this can only be done once the source code is released.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. It would have been nice to see the rationale for the choice of ResNet-18 as opposed to other networks. Additionally, it would also have been nice to see different effort prediction models being compared against each other. Perhaps this might be relegated to future work.
    2. While the concept of “time saved” through the absence of re-scans and the reduction in correction effort is certainly appealing in the radiology value chain, it may be specific to the individual segmentation task alone within the workflow. That is, it may not necessarily translate into time savings in downstream tasks such as report generation, communication with the primary care physician (referrer), patient engagement, etc. This is simply an observation about the scope of the benefits of the proposed approach.
  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The motivation for the effort prediction model is sound and addresses a valid bottleneck in the radiology/sonography workflow. The cost-benefit analysis that was theoretically proposed is validated by the experiments that were conducted, and the quantitative measurements regarding re-scan times, failure rates, and correction times spent by the sonographer/radiologist were useful.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This paper proposes a new approach for human-machine interaction in ultrasound imaging. Their application is the acquisition and segmentation of myocardial echocardiography images for perfusion analysis. The overall goal is to obtain accurate segmentations of the myocardium, but instead of optimizing and generalizing the segmentation model itself, they aim at optimizing the data acquisition for not only human but also machine vision. A novel framework/protocol for US acquisition is proposed, where scans with incorrect segmentations are repeated. These incorrect segmentations are detected by a correction effort prediction model, which estimates the effort needed for manual correction instead of detecting low performance in typical segmentation metrics such as the Dice score. The method is evaluated on a retrospective dataset with repetitive scans for 130 patients.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    • Novel approach for human-machine interaction in ultrasound imaging. Instead of focusing on the segmentation model, this work takes the characteristics of US acquisition into account to make the application of AI models more feasible in current clinical routine. A real-time error prediction model is used to give feedback during scanning, so that acquisition of data with poor segmentation performance can be repeated.
    • The error prediction doesn’t predict errors in the segmentation itself, but rather the cost it would take to manually correct the segmentations. This is a novel approach to this problem.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    • Details on training and implementation are missing. It is unclear which segmentation model was used and how the performance of this model affects the proposed approach.
    • A discussion of clinical feasibility and sonographer opinion is missing. How acceptable would it be for an expert operator to repeat the acquisition of a completely useful image (visually) just to “please” the AI algorithm?
    • Evaluated only on retrospective data. Only one dataset, without cross-validation.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors do not give details about the coding framework or computing infrastructure used in this work. Also, no information about the training process is provided, and hyperparameters are not reported. The dataset is private, but the authors plan to release it to the public in the future. However, important information about the data is missing, e.g., which US system was used, whether all patients are control subjects, whether the scans were performed by the same sonographer, etc.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    This work proposes an interesting framework for human-machine interaction in clinical US acquisition. The real-time and operator-dependent nature of US requires new approaches for clinical routine, which is addressed here. The two contributions (coupling US acquisition with segmentation success, and measuring segmentation performance by the effort needed for manual correction) are valid and interesting. My concerns are about the description of the methods and the data used for evaluation.

    First, the paper provides too few details about the implementation and training of the models; it is impossible to reproduce this work. It is unclear which segmentation model was used, what its original performance was, and on which data it was trained. Was it trained on the same dataset, from the same scanner? How was the error prediction module trained?

    Second, the method is only evaluated on retrospective data, although I do acknowledge the difficulty of a prospective study. Different error prediction models are compared (with and without attention), and I particularly like the results of the model using Dice for error prediction; this is an interesting finding. However, while this paper proposes a new approach for human-machine interaction, experiments with this interaction are not conducted. The authors should at least discuss possible pitfalls and sonographer opinions. Would it be difficult to convince a US expert of this way of scanning, i.e., to repeat visually accepted scans just to “please” the AI model? It would also be important to know how a reduced error rate influences the perfusion analysis (which is the final goal of the imaging).

    For future work, I would recommend extending the evaluation with a prospective study and involving the US operator as much as possible. A study with multiple sonographers would be very interesting. In addition, other datasets and tasks should be included. Also, would it be possible to connect the segmentation performance with the perfusion outcome in one model?

    Minor comments:
    • Introduce the DNN abbreviation in the Introduction instead of the abstract.
    • Fig. 1: Information is missing in the caption, and the figure is difficult to understand.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I find the approach described in this paper novel and interesting. However, I feel that the method needs to be evaluated more thoroughly, and more work is needed (including user studies) to verify that this is a feasible approach for clinical routine.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    The authors propose a novel idea and method for real-time data quality prediction, based on the insight that ultrasound data acquisition can be exploited to provide better data for DNN segmentation models, resulting in better segmentation and lower manual effort.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The clinical relevance and applicability are high. It is desirable to acquire better-quality data upstream, and especially with a real-time feedback module driven by deep learning.

    The experiment is well designed and validation was sufficient and convincing.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    I don’t see a main weakness.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors have indicated that they will make the dataset publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    As a future suggestion, it may be good to adopt, or compare against, existing quality-control-driven CNN methods for image segmentation with simultaneous quality prediction.

  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The approach will benefit clinical practice by helping to acquire better-quality data.

    The methods were well presented and validated.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors propose a model to predict correction effort in segmentation of echo images. All reviewers agree this is highly novel and has potential for real-life applicability. The paper is well written and clear.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

We thank all reviewers and the meta-reviewer for their time and recognition of this paper’s contributions. Given that all reviewers agree the work is novel and promising, we focus on addressing the concerns and questions.

Implementation details: Reviewer #2 raised concerns about implementation details such as the coding framework, computing infrastructure, training process, and hyperparameters. We use the PyTorch framework, standard model architectures, and the default hyperparameters of public implementations to train the networks, except for necessary modifications such as the number of input channels. Due to the space limitation, some detailed experimental settings were given in the supplementary material. In Fig. 2 of the supplementary material, we describe the dataset and how the segmentation model and effort prediction model are trained and validated in detail; the scan data is divided into three parts, and no validation or test data is leaked into any other stage. We did not test on more datasets because we could not find another dataset of this kind. In Fig. 3 of the supplementary material, we show the detailed workflow of the simulated experiment. The latency of the prediction model is measured on an Nvidia 1080 Ti GPU (given in Sec. 2.3). The segmentation model we used is a state-of-the-art video segmentation model, the Recurrent Network for Video Object Segmentation (cited in Sec. 3). Its DSC is 0.82, which is satisfactory given the ambiguous myocardial boundary in ultrasound images (mentioned in the supplementary material). We agree with Reviewer #2 that it is better to put all necessary implementation details in the main paper. We have carefully revised the paper following the comments: specifically, we moved Fig. 2 from the supplementary material to the main paper and added all necessary implementation details there. We also added the link to our dataset, which includes related information such as the scanner and sonographers.

Other questions and feedback: We chose ResNet-18 simply because it is one of the most popular and simple models with a “standard” implementation available in common libraries like PyTorch and TensorFlow. We could probably increase performance by using more sophisticated models; however, we want to highlight that the proposed idea and framework do not depend on a particularly powerful DNN model, and we thus did not focus on the model architecture. We appreciate the feedback and inspiration from the reviewers, such as a sonographer-in-the-loop experiment, downstream tasks (perfusion analysis, report generation, etc.), and simultaneous segmentation and quality prediction. We plan to include these in future work.
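
For illustration, a minimal sketch of the effort-prediction setup described above: a standard ResNet-18 in PyTorch with the input convolution modified for the number of input channels and the classification head replaced by correction-effort classes. The channel count, image size, and number of classes below are placeholder assumptions for illustration, not the configuration reported in the paper.

    import torch
    import torch.nn as nn
    from torchvision.models import resnet18

    def build_effort_predictor(in_channels=1, num_classes=2):
        # Standard ResNet-18, trained from scratch with default hyperparameters.
        model = resnet18(weights=None)
        # Necessary modification: adapt the first convolution to the
        # (assumed single-channel) ultrasound input.
        model.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7,
                                stride=2, padding=3, bias=False)
        # Replace the head with effort classes (e.g., low vs. high effort
        # in the binary classification setting).
        model.fc = nn.Linear(model.fc.in_features, num_classes)
        return model

    model = build_effort_predictor()
    logits = model(torch.randn(1, 1, 224, 224))  # one dummy frame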


