
Authors

Cheng Zhao, Richard Droste, Lior Drukker, Aris T. Papageorghiou, J. Alison Noble

Abstract

Automated ultrasound (US)-probe movement guidance is desirable to assist inexperienced human operators during obstetric US scanning. In this paper, we present a new visual-assisted probe movement technique using automated landmark retrieval for assistive obstetric US scanning. In a first step, a set of landmarks is constructed uniformly around a virtual 3D fetal model. Then, during obstetric scanning, a deep neural network (DNN) model locates the nearest landmark through descriptor search between the current observation and landmarks. The global position cues are visualised in real-time on a monitor to assist the human operator in probe movement. A Transformer-VLAD network is proposed to learn a global descriptor to represent each US image. This method abandons the need for deep parameter regression to enhance the generalization ability of the network. To avoid prohibitively expensive human annotation, anchor-positive-negative US image-pairs are automatically constructed through a KD-Tree search of 3D probe positions. This leads to an end-to-end network trained in a self-supervised way through contrastive learning.
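The retrieval step described in the abstract can be illustrated with a minimal sketch. It assumes each US image has already been mapped to a fixed-length global descriptor. The function names, the toy 3-dimensional descriptors, and the use of L2 distance on normalised vectors are illustrative assumptions, not the authors' implementation.

```python
import math

def l2_normalize(vec):
    """Scale a descriptor to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)

def nearest_landmark(query_desc, landmark_descs):
    """Return (index, distance) of the landmark descriptor closest to the query."""
    q = l2_normalize(query_desc)
    best_idx, best_dist = -1, float("inf")
    for idx, desc in enumerate(landmark_descs):
        d = math.dist(q, l2_normalize(desc))
        if d < best_dist:
            best_idx, best_dist = idx, d
    return best_idx, best_dist

# Hypothetical descriptors for three landmarks and one live frame.
landmarks = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query = [0.1, 0.9, 0.2]
idx, dist = nearest_landmark(query, landmarks)
print(f"nearest landmark: {idx} (distance {dist:.3f})")
```

In a guidance setting, the returned index would be mapped to the landmark's known position around the fetal model and shown to the operator.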

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87237-3_64

SharedIt: https://rdcu.be/cymbr

Link to the code repository

https://github.com/dachengxiaocheng/Visual-Assisted-Probe-Movement.git

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

The authors developed a method to direct a user of a 2D ultrasound probe to move it to a specific pose during obstetric imaging of the fetus.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The algorithm appears novel and the analysis is also appropriate; however, it is not clear why a 15mm distance was used as a metric.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Since you need to determine the pose of the acquired images, why not generate a 3D ultrasound image using the motion tracker and then examine the 3D image for the features required for evaluating the fetus? 3D ultrasound imaging in obstetrics is widely used and there are very many publications on this topic.

How accurately must you know the pose information using the motion tracker? Adding a motion tracker adds cost to the system. What motion tracker did you use?

If I understand correctly, the physician will need to acquire a large number of images for each patient, from which you will generate about 400 evenly distributed images as patients may be at different points during the pregnancy, fetal abnormalities may appear very differently, and the fetus may be in a different position at each examination. How long does that take? Does this aspect require special training and can the images be acquired freely with no specific method to scan the abdomen?

For performance evaluation, why did you use 15mm as the distance target for a correctly retrieved landmark? This distance appears quite large if specific abnormalities need to be viewed. Is one retrieved landmark sufficient for clinical use? The rotation around a retrieved landmark may not be correct.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The authors will provide the code
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

See comments in the “weaknesses” section.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The main issue is the utility of the method, with questions about why 3D ultrasound images are not used and issues around the motion tracking.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

2
Reviewer confidence

Not Confident

Review #2

Please describe the contribution of the paper

This paper proposed a triplet-learning framework based on transformer and NetVLAD for landmark retrieval in probe movement guidance. The topN recall is promising based on the proposed method.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
Strength:
1. A good application paper based on the integration of Transformer, NetVLAD and triplet learning for landmark retrieval.
2. The TopN recall is good compared to baselines.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
Weakness:
1. The novelty is limited;
2. The claims of this work are not fully convincing, especially for the self-supervised part. Please refer to the detailed comments.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The reproducibility of this work seems OK. According to the implementation part, this work follows the network architecture of transformer and NetVLAD. The hyperparameters are also presented in detail.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
This paper proposed a triplet framework for landmark retrieval. In terms of technical contribution, the novelty is limited since the proposed method is a direct combination of transformer, NetVLAD and triplet learning. There are also several issues of concern:
1. The use of NetVLAD does not seem essential. Why did the authors choose NetVLAD as the feature aggregation module? The only support is the good accuracy in Table 2. Do you think this is due to the special application in this work or is it a general observation? NetVLAD also has several extensions including TEN [1]. Since NetVLAD was not originally proposed in this paper, I suggest that more aggregation layers should be evaluated.
2. This work claims that it is a “Self-supervised Network Training” method. However, this is actually a way to construct the triplet (P,N,A) for model training based on the KD-Tree. ‘Self-supervised learning’ usually requires no supervision, which does not fully match the pipeline of this work.
[1] Zhang et al., ‘Deep TEN’, CVPR 2016
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The novelty of this method is limited and the experiment setup is not convincing.
What is the ranking of this paper in your review stack?

5
Number of papers in your stack

4
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper
1. Developing an algorithm that assists the US-probe movement by applying landmark retrieval.
2. Proposing a network that generates a generalized descriptor for automatic landmark retrieval.
3. Contrastive learning, a form of self-supervised learning, is used to avoid human annotations.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

A new approach providing visual-supported navigation unlike conventional methods that provide measured parameters only. Human annotation is minimized by using self-supervised learning, which is an effective method in the area of medical imaging where ground truth is hard to acquire. The network itself was designed by transforming conventional networks to accomplish the targeted specific tasks.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
1. The clinical importance of the proposed work is controversial. Untrained individuals can be assisted by the proposed system but many trained sonographers can accomplish what they want without such assistance.
2. The performance comparison with the CNN SOTA network seems insufficient. Other network types should be considered.
3. There is no evidence or explanation as to whether the VGG network pre-trained with ImageNet is effective for ultrasound images.
4. Five test cases were prepared, but only three results were shown for the 3rd trimester.
5. The reason for using the attention feature is not clear.
6. It is claimed to work in real-time, but no data on the execution time is provided.
7. There is no information on how the test set is obtained.
8. There are two typos (per-trained, SOAT).
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Authors say the results will be released as an open-source project to contribute to the US research community.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The approach is impressive and the level of understanding of the neural network seems very high. It would be good if more test case images were provided. Although detailed explanations of the network were provided, adding motivation or justification for choosing such a network will help readers.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

The clinical importance of the proposed work is controversial. Untrained individuals can be assisted by the proposed system but many trained sonographers can accomplish what they want without such assistance. The main academic contribution of this work is applying self-supervised learning in the new application of probe navigation. However, more rigorous explanation/justification/comparison of the proposed network should be given.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

All reviewers agreed this paper presented an interesting application and the proposed model is of certain interest. However, they also requested further clarification about the following issues. First, the proposed approach needs to be better motivated against 3D Ultrasound imaging widely used in obstetrics (Reviewer #1), and the concern about its clinical importance needs to be addressed (Reviewer #3). Second, more rigorous explanation or justification of the proposed network needs to be provided to facilitate understanding (Reviewer #2 and Reviewer #3). Third, please respond to Reviewer #2’s concern about the technical contribution. Fourth, please discuss the sensitivity of the proposed approach to the accuracy of pose information and explain why a 15mm distance was used as a metric in the evaluation (Reviewer #1).
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Author Feedback

We thank the reviewers for their valuable comments and respond to the main queries below.

Why not pose as a 3DUS problem (R1): Our motivation is to support uptake of the new generation of lower-cost 2D US probes outside traditional hospital settings, where users are not highly trained. Assistive guidance tools are potentially useful in that case. 3DUS is typically based in hospitals and performed by highly trained sonographers. This is why we pose the problem as a 2D US one. The proposed method could be adapted to the 3DUS case. The only required architecture modification is to replace the 2D CNN encoder with a 3D CNN one.

Clinical importance (R3): Simplifying US to be more accessible to non-expert operators is a recognized priority for wider deployment of US in clinical practice. This paper aims to address this challenge, investigating technical feasibility of a novel image-retrieval based solution to provide helpful visualization cues to guide an operator during 2D fetal US scanning. The solution is implemented using data from a realistic data-based US simulator as a proof-of-principle. Future work would address issues in adapting the solution to a clinical setting and assessing its usability and usefulness to guide operators in practice.

More rigorous network justification (R2, R3): Refer to Table 2 and Sec. 3.3, where SOTA baseline comparisons [2][4] are reported. The patch-style encoder [2] loses some local geometry cues compared with the CNN backbone we use. The Transformer improves performance compared with its absence [1], as its attention mechanism extracts co-contextual cues within the feature vector. For US image retrieval, VLAD aggregation, which captures more feature statistics, performs better than the max-pooling aggregation [4] commonly used on large-scale public benchmarks. A potential reason is dataset scale: the performance difference between max pooling and VLAD may decrease as the dataset grows. We observe the same behaviour in 3D point-cloud/lidar retrieval. Following R2, we report the performance of Trans-TEN as a supplement: r@1=82.5%, r@5=91.5%, r@10=94.0%. DeepTEN has a similar mechanism to NetVLAD, hence the similar performance.
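To make the contrast between max-pooling and VLAD-style aggregation concrete, the following is a minimal sketch under stated assumptions. It uses fixed cluster centres and a distance-based soft assignment controlled by a hypothetical `alpha` parameter, whereas NetVLAD learns both the centres and the assignment weights end to end. All names and the toy 2-dimensional features are illustrative.

```python
import math

def max_pool(local_feats):
    """Max-pooling aggregation: keep the per-dimension maximum (output length D)."""
    return [max(col) for col in zip(*local_feats)]

def soft_vlad(local_feats, centers, alpha=10.0):
    """VLAD-style aggregation: soft-assigned residuals to K centres (output length K*D)."""
    k, d = len(centers), len(centers[0])
    vlad = [[0.0] * d for _ in range(k)]
    for x in local_feats:
        # Soft assignment: softmax over negative scaled squared distances.
        logits = [-alpha * sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in centers]
        top = max(logits)
        weights = [math.exp(z - top) for z in logits]
        total = sum(weights)
        for j, c in enumerate(centers):
            w = weights[j] / total
            for t in range(d):
                vlad[j][t] += w * (x[t] - c[t])
    # Intra-normalise each cluster's residual, then flatten and L2-normalise.
    flat = []
    for row in vlad:
        norm = math.sqrt(sum(v * v for v in row))
        flat.extend(v / norm if norm > 0 else 0.0 for v in row)
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm if norm > 0 else 0.0 for v in flat]

# Hypothetical local features (3 locations, D=2) and K=2 cluster centres.
feats = [[0.0, 1.0], [1.0, 0.2], [0.9, 0.1]]
centers = [[0.0, 1.0], [1.0, 0.0]]
desc = soft_vlad(feats, centers)
print(len(max_pool(feats)), len(desc))
```

Max pooling keeps one value per feature dimension, while the VLAD-style descriptor keeps a residual vector per cluster, which is one way to read the "more statistics" argument in the rebuttal.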

Technical novelty is limited (R2): We respectfully disagree. We propose a new way to formulate US-probe movement guidance via automated landmark retrieval achieved by global descriptor learning and matching. Technical novelties: (1) the global descriptor is learned via contrastive learning using self-constructed A-P-N data-pairs by a probe position KD-Tree without human annotation; and (2) the proposed network is among the first to investigate the Transformer for image retrieval. [4] was published independently a little earlier (arXiv, not yet accepted, 10/02/2021; cf. the MICCAI deadline of 03/03/21). Our work focuses on US images, while [4] focuses on natural images. We will release the code as an open-source project on paper acceptance.

Why is it self-supervised learning (R2): Unsupervised learning learns without supervision while self-supervision learns under the supervision from data itself without human annotation. The global descriptor is learned via contrastive learning using automatically constructed data pairs according to the probe-position KD-Tree without human annotation. We name it self-supervised learning as the supervision comes from the data only.
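The automatic pair construction described above can be sketched as follows. This is a minimal illustration, not the authors' code: a brute-force distance search stands in for the KD-Tree, the radii `pos_radius` and `neg_radius` and the `margin` are hypothetical values, and a standard triplet margin loss is assumed as the contrastive objective.

```python
import math
import random

def build_triplets(positions, pos_radius=5.0, neg_radius=20.0, seed=0):
    """Form (anchor, positive, negative) index triplets from 3D probe positions.

    Positives lie within pos_radius of the anchor, negatives beyond neg_radius.
    A brute-force search is used here in place of a KD-Tree for brevity.
    """
    rng = random.Random(seed)
    triplets = []
    for a, p_a in enumerate(positions):
        positives = [i for i, p in enumerate(positions)
                     if i != a and math.dist(p_a, p) <= pos_radius]
        negatives = [i for i, p in enumerate(positions)
                     if math.dist(p_a, p) > neg_radius]
        if positives and negatives:
            triplets.append((a, rng.choice(positives), rng.choice(negatives)))
    return triplets

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Triplet margin loss on descriptors: pull positives in, push negatives away."""
    return max(0.0, math.dist(anchor, positive) - math.dist(anchor, negative) + margin)

# Hypothetical probe positions in millimetres: two nearby pairs, far apart.
positions = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0), (30.0, 0.0, 0.0), (32.0, 0.0, 0.0)]
triplets = build_triplets(positions)
print(triplets)
```

No human label enters this procedure: the only signal is the probe position attached to each image, which is the sense in which the rebuttal calls the training self-supervised.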

Why use a 15mm distance in evaluation; sensitivity of method to pose information (R1): The 15mm hyper-parameter is set empirically according to the number of landmarks and the 3D volume, i.e. the density of landmarks. It can be adjusted according to the specific clinical task. We have not used a motion tracker. The images come from a US simulator based on real clinical data (ScanTrainer), so the 3D relation between probe position and image is known. This 3D-to-2D image relation is only used in training. At test time, and hence in use, only a 2D image is needed (no motion tracker).
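For readers unfamiliar with the metric, the sketch below shows one plausible way to compute recall@N with a distance threshold. It assumes a query counts as correct when at least one of its top-N retrieved landmarks lies within `threshold_mm` of the query's ground-truth probe position. This criterion, the function name, and the toy data are assumptions, since the exact evaluation protocol is not spelled out on this page.

```python
import math

def recall_at_n(query_positions, ranked_landmark_ids, landmark_positions, n,
                threshold_mm=15.0):
    """Fraction of queries with a top-n retrieved landmark within threshold_mm."""
    hits = 0
    for q_pos, ranking in zip(query_positions, ranked_landmark_ids):
        if any(math.dist(q_pos, landmark_positions[i]) <= threshold_mm
               for i in ranking[:n]):
            hits += 1
    return hits / len(query_positions)

# Hypothetical landmark and query probe positions in millimetres.
landmark_positions = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
query_positions = [(2.0, 0.0, 0.0), (38.0, 0.0, 0.0)]
# Landmark indices ordered by descriptor similarity for each query (best first).
rankings = [[2, 0, 1], [0, 1, 2]]

for n in (1, 2, 3):
    print(n, recall_at_n(query_positions, rankings, landmark_positions, n))
```

A smaller threshold makes the metric stricter, which is why the rebuttal ties the 15mm value to the landmark density.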

Other minor points of clarification will be carefully addressed in the final paper.

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper addresses an interesting application that provides guidance for probe movement during obstetric ultrasound scanning, and formulates this problem as a landmark retrieval problem solved by the proposed Transformer-VLAD model. It contains several interesting seeds in the less studied application problem, the solution, and the SOTA network model, which could be of certain research interest to the MICCAI community. In the rebuttal, the authors answered the major questions from the reviewers, especially with a new comparison with Trans-TEN as requested by Reviewer #2. If it is accepted for publication, the authors should highlight their response on the motivation and the clinical importance, and incorporate the clarifications of other major concerns in their final paper.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

10

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper has mixed reviews, and the meta-review lists many questions that the authors needed to address in the rebuttal. Unfortunately, it seems that the authors did not answer the AC’s questions and focused on other technical questions. Given that AC’s questions remain unanswered, I believe the paper needs to go through another round of revision, which means that I recommend it to be rejected at this stage.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

18

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The paper receives conflicting comments: two borderline accepts and one probable reject (R2). The rebuttal has addressed most of R2’s concerns (self-supervised learning, limited novelty) and therefore I will downplay R2’s opinions. The rebuttal also clarified various subtle points raised by R1 and R3. Overall, I think that the paper addresses a clinically important problem with a sensible approach and satisfactory results.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

9


Visual-Assisted Probe Movement Guidance for Obstetric Ultrasound Scanning using Landmark Retrieval