Authors
Sophia Bano, Brian Dromey, Francisco Vasconcelos, Raffaele Napolitano, Anna L. David, Donald M. Peebles, Danail Stoyanov
Abstract
During pregnancy, ultrasound examination in the second trimester can assess fetal size according to standardized charts. To achieve a reproducible and accurate measurement, a sonographer needs to identify three standard 2D planes of the fetal anatomy (head, abdomen, femur) and manually mark the key anatomical landmarks on the image for accurate biometry and fetal weight estimation. This can be a time-consuming, operator-dependent task, especially for a trainee sonographer. Computer-assisted techniques can help automate the fetal biometry computation process. In this paper, we present a unified automated framework for estimating all measurements needed for fetal weight assessment. The proposed framework semantically segments the key fetal anatomies using state-of-the-art segmentation models, followed by region fitting and scale recovery for biometry estimation. We present an ablation study of segmentation algorithms to show their robustness through 4-fold cross-validation on a dataset of 349 ultrasound standard plane images from 42 pregnancies. Moreover, we show that the network with the best segmentation performance tends to be more accurate for biometry estimation. Furthermore, we demonstrate that the error between clinically measured and predicted fetal biometry is lower than the permissible error during routine clinical measurements.
Link to paper
DOI: https://doi.org/10.1007/978-3-030-87234-2_22
SharedIt: https://rdcu.be/cyl8i
Link to the code repository
N/A
Link to the dataset(s)
N/A
Reviews
Review #1
- Please describe the contribution of the paper
In this paper, the authors present a fully automatic method to measure fetal biometry from standardized ultrasound planes. The work is incremental in that it combines successful previous models of fetal anatomy segmentation, but the completeness of the overall framework and its good validation on hundreds of US images make it a valuable contribution to MICCAI.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- Great integration of many previously solved aspects of fetal ultrasound analysis into a common method.
- Clarity of the method: it is easy to follow what has been done.
- Powerful validation on many US images.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- Does not address the usability of the given US frames.
- Please rate the clarity and organization of this paper
Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
Neither the dataset nor the code is open, but almost all key parameters are provided in the manuscript.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
In general, I find this work an excellent continuation of previous fetal ultrasound analysis efforts. I don’t have any major concerns with this work. There are a couple of questions, though:
1) The authors write that all images were considered optimal and of sufficient diagnostic quality. If I remember correctly from exams on my own kid, the biggest challenge was not necessarily to manually click these measurements but to identify suitable planes in the first place. Given there is quite substantial prior art on this issue, I wonder if the authors have considered it or could at least make a comment in the discussion.
2) The explanations of which measurements are needed are somewhat repetitive, e.g. the list appears in both Sections 1 and 2. Maybe one of them could be removed and, instead, the reader could be provided with an exemplary formula showing how these measurements are used to assess fetal weight (see the example quoted after this list).
3) I wonder if the choice to use rectangle fitting on the femur is the best way to extract a length measurement. In Fig. 2, there is an example image where the rectangle diagonal will over-estimate the femur length. Have you tried other ways of extracting the length?
4) On the multi-label segmentation: Since there are already works on standard plane classification, it would have been plausible to train distinct segmentation networks for each of the three anatomies because there is zero overlap, i.e. there are no frames with more than one anatomy visible. In particular, a dedicated network could have improved the poor segmentation of the femur. Apart from the obvious performance advantages, are there any other reasons for your architecture choice?
5) I think there is no need to resample all images before data augmentation and cropping; this will just degrade the image quality a bit.
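For reference on point 2 above, one widely used estimated fetal weight (EFW) model combining the three measurements is the Hadlock et al. (1985) formula (HC, AC, FL in cm; EFW in grams). The coefficients below are quoted from memory purely as an illustration and should be checked against the original reference:

    log10(EFW) = 1.326 + 0.0107*HC + 0.0438*AC + 0.158*FL - 0.00326*AC*FL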
Minor remarks:
- Please perform a thorough spelling and grammar revision. A few examples: “The proposed framework semantically segment” in the abstract -> “segments”; “Recommended assessment and quality control metrics varying” in the intro -> “vary”; “Obstetricians” should be lower-case; “that the error … were minimal” in the conclusion -> “was”; etc.
- Contrary to what is stated in Section 2, the figure does not contain a), b), c) sub-figures but uses left/middle/right.
- Fig. 3 is cut off on the right side.
- Please state your overall opinion of the paper
accept (8)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
Great systems paper, well-done validation of the method, clarity of the approach.
- What is the ranking of this paper in your review stack?
1
- Number of papers in your stack
4
- Reviewer confidence
Confident but not absolutely certain
Review #2
- Please describe the contribution of the paper
The authors propose AutoFB, a framework based on U-Net and DeepLabv3 that segments the head, abdomen, and femur in fetal ultrasound images.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The combination of U-Net and DeepLabv3 neural networks to segment and measure the main fetal ultrasound metrics in the second gestational trimester.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The Hausdorff distance can be a good complementary metric to evaluate segmentation: the IoU is an area-based metric, whereas the Hausdorff distance is contour-based. I would recommend adding this metric.
- The authors processed a total of 346 acquisitions from 42 fetuses, so one or more images were acquired per fetus. How do you know there is not a high correlation between acquisitions of the same fetus? Could this affect the segmentation performance? Please justify this quantitatively. Perhaps a stratified CV would be better.
- It is not clear how the method classifies between the three planes.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
- It is necessary to give more detail about how the ellipse is fitted and about the network architectures.
- Section 3.2 mentions that GE VOLUSON equipment was used and that a “ruler markers template” was used to determine the image resolution (px/mm). This means the system is not generalizable.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- A space is missing in “three standard planes.While this task” -> “three standard planes. While this task”, and likewise in “decoupled from the encoder.We briefly”.
- Only define abbreviations once.
- It is not clear how the method classifies between the three planes.
- The bounding box measurement of femur length is only correct when the femur lies completely horizontal in the acquisition plane. What happens when the acquired femur is angulated?
- There is repeated information in Sections 3.2 and 2 (fetal biometry); I recommend merging both sections.
- The authors state: “In this paper, we propose to perform all the relevant measurements for fetal weight assessment within a unified automated system, which is our main contribution.” However, Section 3.2 mentions that GE VOLUSON equipment was used and that a “ruler markers template” was used to determine the image resolution (px/mm). This means the measurement system is not generalizable, contrary to the stated contribution. Can you justify this? What do you mean by “unified automated system”?
- Were the annotations made by obstetric experts?
- The authors processed a total of 346 acquisitions from 42 fetuses, so one or more images were acquired per fetus. How do you know there is not a high correlation between acquisitions of the same fetus? Could this affect the segmentation performance? Please justify this quantitatively. Perhaps a stratified CV would be better.
- The Hausdorff distance can be a good complementary metric to evaluate segmentation: the IoU is an area-based metric, whereas the Hausdorff distance is contour-based. I would recommend adding this metric (a minimal computation sketch follows this list).
- A discussion comparing the results with other works reported in the state of the art is necessary.
- It is necessary to give more detail about how the ellipse is fitted and about the network architectures.
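As a reference for the metric suggested above, here is a minimal sketch of how a symmetric Hausdorff distance between a predicted and a ground-truth binary mask could be computed. The helper names, toy masks, and pixel-spacing handling are illustrative assumptions, not part of the paper.

```python
"""Symmetric Hausdorff distance between two binary segmentation masks."""
import numpy as np
from scipy.ndimage import binary_erosion
from scipy.spatial.distance import directed_hausdorff

def boundary_points(mask: np.ndarray) -> np.ndarray:
    """Coordinates of the one-pixel-wide boundary of a binary mask."""
    boundary = mask & ~binary_erosion(mask)
    return np.argwhere(boundary)

def hausdorff_mm(pred: np.ndarray, gt: np.ndarray, px_per_mm: float = 1.0) -> float:
    """Symmetric Hausdorff distance (max of the two directed distances), in mm."""
    p = boundary_points(pred.astype(bool))
    g = boundary_points(gt.astype(bool))
    d_px = max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])
    return d_px / px_per_mm

# Toy example: two slightly offset discs standing in for prediction and ground truth.
yy, xx = np.mgrid[:128, :128]
pred = (yy - 64) ** 2 + (xx - 64) ** 2 < 30 ** 2
gt = (yy - 60) ** 2 + (xx - 64) ** 2 < 30 ** 2
print(f"Hausdorff distance: {hausdorff_mm(pred, gt):.1f} px")
```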
- Please state your overall opinion of the paper
borderline reject (5)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The segmentation of the femur, head, and abdomen in fetal ultrasound images is well studied. The results are competitive, but there is no clear methodological contribution.
- What is the ranking of this paper in your review stack?
2
- Number of papers in your stack
3
- Reviewer confidence
Very confident
Review #3
- Please describe the contribution of the paper
The paper presents a novel method to automatically estimate fetal biometry.
- Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
The paper is well-written and easy to follow. The results are also well presented. The paper has value in automating fetal biometry estimation.
- Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The novelty of the method in segmentation is very limited. In fact, the authors use a segmentation network, after which the fetal biometry parameters can be estimated easily.
- Please rate the clarity and organization of this paper
Very Good
- Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance
The code and data are not publicly available, but the details are explained well enough that the method can be implemented.
- Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
1) Why do the authors use simple block matching when the other parts are CNNs?
2) The authors use one network for segmentation of the three planes. Why not first classify the plane and then use a specialized network for segmentation of that plane?
3) The authors mention that they used weighted cross-entropy since the data is unbalanced. Why did the authors not use the Dice loss (it is also robust to unbalanced data)?
- Please state your overall opinion of the paper
borderline reject (5)
- Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
The paper has value in fetal biometry estimation, but I do not think it makes a groundbreaking contribution.
- What is the ranking of this paper in your review stack?
2
- Number of papers in your stack
3
- Reviewer confidence
Confident but not absolutely certain
Primary Meta-Review
- Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.
In this paper, the authors present a framework to measure fetal biometry from standardized ultrasound planes.
All reviewers acknowledged that: (i) the contribution of the paper is not an innovative method, but rather a system or framework with extensive validation, and (ii) the paper is very clear and well written.
Because of the limited technical novelty, this paper received mixed scores, so I would like to invite the authors for a rebuttal.
In particular, below are some points I suggest addressing in the rebuttal:
- How does this work make a significant contribution to the field?
- Can you please justify your architecture choice and clarify how the 3 planes are handled? This is an important point raised by all reviewers.
- The authors processed a total of 346 acquisitions from 42 fetuses. Are all images originating from the same fetus kept in the same fold?
In addition, if space permits, the authors can also address the additional points raised by the reviewers.
- What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).
3
Author Feedback
We thank the reviewers for their constructive feedback. All reviewers acknowledged the completeness, clarity and detailed experimentation validating the end-to-end framework. Below we reply to the specific comments:
MR - Contribution to the field: Our main contribution is a unified fetal assessment framework that performs fetal anatomy segmentation, including 3-plane classification through multi-class segmentation, and biometry estimation (pg 2, ln 27). To the best of our knowledge, such a holistic framework is lacking in the literature, and even if individual components have been experimented with before, this paper describes and validates their integrated use on a clinical task: estimating fetal biometry and weight. Existing methods focus only on segmenting/estimating biometry of a specific anatomy or on plane classification (pg 2, ln 10). We demonstrated robustness by experimenting on real clinical data and validated both the inferred segmentation and the estimated biometry. Moreover, AutoFB is of high clinical relevance, as it will enable automated biometry for monitoring fetal growth, a task currently affected by high inter-operator variability [5] due to the manual selection of key landmarks in the US planes.
Architecture choice & 3-plane classification: R3 suggests performing classification first and then segmentation, instead of our proposed multi-class segmentation framework, which jointly solves 3-plane detection and anatomy segmentation. While this alternative could work, we do not see any reason to opt for a more complicated system that involves different stages of network training, more parameters to tune, and likely more computation time. We demonstrate that our solution works reliably while being simple and intuitive to understand. We will make this clearer in the CR.
Data split into folds (R2): The acquired data from 42 fetuses (346 US images) was divided into 4 folds such that each fold contained approximately the same number of images and all US images originating from a single fetus were included in only one fold (Table 2; pg 6, ln 3). Hence there was no subject overlap across the 4 folds used for cross-validation. We will make this detail clearer in the CR.
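For illustration, a subject-disjoint 4-fold split of this kind could be reproduced with scikit-learn's GroupKFold; this is a minimal sketch with randomly generated placeholder IDs, not the authors' actual splitting code.

```python
# Subject-disjoint cross-validation split: all images from one fetus land in one fold.
import numpy as np
from sklearn.model_selection import GroupKFold

rng = np.random.default_rng(0)
n_images = 346
fetus_ids = rng.integers(0, 42, size=n_images)  # which fetus each image came from (toy)
X = np.arange(n_images)                         # stand-ins for the US images

for fold, (train_idx, test_idx) in enumerate(GroupKFold(n_splits=4).split(X, groups=fetus_ids)):
    shared = set(fetus_ids[train_idx]) & set(fetus_ids[test_idx])
    print(f"fold {fold}: {len(test_idx)} test images, shared fetuses: {len(shared)}")  # always 0
```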
R1 & R2 Bounding Box (BB) fitting for femur measurement: We fitted a horizontal BB (Fig. 2; pg 5, ln 14) to the femur predicted mask and considered its diagonal as the Femur Length (FL) estimate. A femur is not necessarily aligned to the horizontal/vertical axis, hence the use of BB diagonal as FL always holds. In a rare case where the femur is aligned with the horizontal/vertical axis, our modelling still holds as then the BB width/height and diagonal length become approximately equal.
R2 - Segmentation of femur, head & abdomen is well studied: While fetal anatomy segmentation has been studied and evaluated with segmentation-specific metrics (IoU, Dice), its use for calculating biometry, and its accuracy assessment with the relevant clinical metrics (distance and weight errors), is far from well studied in the literature.
Adding Hausdorff distance: This is a good suggestion. Although we evaluated using IoU, which is the most commonly used metric in image segmentation, the Hausdorff distance could evaluate the consistency of the contour. We will investigate this in future work.
A ruler markers template for px/mm calibration makes the system not generalisable: Please note that obtaining the US scale is always system-dependent, because it must be extracted either from (a) the visual interface of the US machine, or (b) the raw data, which involves access to a proprietary API. It is not clear which aspect R2 thinks could be generalised better than the current approach. We use (a) since we do not have access to (b). The same template matching approach is easy to deploy on systems other than the GE Voluson, since all medical-grade US machines have a similar ruler available.
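To illustrate the kind of scale recovery described above, here is a hedged sketch of ruler-tick template matching with OpenCV. The file names, the 10 mm tick spacing, and the match threshold are assumptions for illustration, not values from the paper.

```python
# Recover px/mm scale by locating ruler tick marks via template matching.
import cv2
import numpy as np

frame = cv2.imread("us_frame.png", cv2.IMREAD_GRAYSCALE)         # exported US frame (placeholder path)
template = cv2.imread("ruler_marker.png", cv2.IMREAD_GRAYSCALE)  # one tick-mark crop (placeholder path)

res = cv2.matchTemplate(frame, template, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(res > 0.8)  # rows/cols of all strong tick-mark matches

# Assuming a vertical ruler with ticks every 10 mm, the median pixel gap
# between consecutive detected ticks gives the scale.
tick_rows = np.unique(ys)
gaps = np.diff(tick_rows)
gaps = gaps[gaps > 5]  # discard near-duplicate detections of the same tick
px_per_mm = float(np.median(gaps)) / 10.0
print(f"scale ~ {px_per_mm:.2f} px/mm")
```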
R3 - Why was the Dice loss not used? During the ablation study (not reported), we experimented with different losses, including Dice, and found wCE to be the most effective.
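For reference on R3's question, here is a minimal PyTorch sketch of the soft multi-class Dice loss under discussion; it is not the authors' implementation, and the shapes and class count are illustrative.

```python
# Soft multi-class Dice loss: 1 minus the mean per-class Dice overlap.
import torch
import torch.nn.functional as F

def soft_dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """logits: (N, C, H, W) raw scores; target: (N, H, W) integer class labels."""
    num_classes = logits.shape[1]
    probs = torch.softmax(logits, dim=1)
    one_hot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    dims = (0, 2, 3)  # sum over batch and spatial dims, keep the class dim
    intersection = (probs * one_hot).sum(dims)
    cardinality = probs.sum(dims) + one_hot.sum(dims)
    dice = (2.0 * intersection + eps) / (cardinality + eps)
    return 1.0 - dice.mean()

logits = torch.randn(2, 4, 64, 64)        # e.g. background + head/abdomen/femur
target = torch.randint(0, 4, (2, 64, 64))
print(soft_dice_loss(logits, target).item())
```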
Post-rebuttal Meta-Reviews
Meta-review # 1 (Primary)
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The practical value of this work has been acknowledged in the reviews. Even though there is no technical novelty, this paper has value for the MICCAI community working on US images. The authors were able to show the impact of their paper in the rebuttal and responded reasonably to the concerns raised by the reviewers; hence, I recommend acceptance of this paper.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).
11
Meta-review #2
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
The authors present a fully automatic framework to measure fetal biometry from standardized ultrasound planes. I think this is solid work, with a contribution that lies in addressing a practical/clinical need; given the completeness of the overall framework and its good validation on a large dataset, it makes, in my opinion, a very interesting and valid scientific contribution to MICCAI. The primary meta-reviewer raised some concerns related to the handling of the 3 planes and the possible risk of having mixed planes from the same subject in training and testing. The authors clarified both points in a very satisfactory manner in their rebuttal. I do think, though, that the authors should, if possible, include Hausdorff distance metrics (percentile) as a complement to the overlap metrics, to provide a more solid assessment. Overall, I recommend the paper for acceptance.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).
5
Meta-review #3
- Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.
This paper, titled “AutoFB: Automating Fetal Biometry Estimation from Standard Ultrasound Planes”, was reviewed by 3 reviewers of varying experience/seniority, all of whom agreed that this submission was well written:
- R1: strengths - good integration, clarity, validation; weaknesses - usability, reproducibility.
- R2: strengths - method; weaknesses - evaluation metric, correlation between data, reproducibility.
- R3: strengths - writing quality, results, clinical significance; weaknesses - novelty.
The primary AC noted that the limited technical novelty was the main reason for the mixed scores. Reviewers also pointed out that reproducibility may be an issue, as this is a complex/complete system and no code will be released. This AC reminds the reviewers that, according to the reviewer guidelines (https://miccai2021.org/en/REVIEWER-GUIDELINES.html), a CAI-based paper should be evaluated differently from a MIC-based paper, where the requirement for novelty may differ. Based on my reading of the manuscript and the rebuttal, this AC judged the manuscript to fall under the categories of “1. Presentation of a device or technology that has potential clinical significance” and “2. Demonstration of clinical feasibility, even on a single subject/animal/phantom”.
As this paper is well written and describes a complete system with extensive validation, I recommend the decision to accept this manuscript. The authors are strongly encouraged to reconsider the issue of reproducibility to increase the paper's potential impact on the CAI community.
- After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.
Accept
- What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).
4