
Authors

Samuel Budd, Matthew Sinclair, Thomas Day, Athanasios Vlontzos, Jeremy Tan, Tianrui Liu, Jacqueline Matthew, Emily Skelton, John Simpson, Reza Razavi, Ben Glocker, Daniel Rueckert, Emma C. Robinson, Bernhard Kainz

Abstract

Fetal ultrasound screening during pregnancy plays a vital role in the early detection of fetal malformations which have potential long-term health impacts. The level of skill required to diagnose such malformations from live ultrasound during examination is high and resources for screening are often limited. We present an interpretable, atlas-learning segmentation method for automatic diagnosis of Hypo-plastic Left Heart Syndrome (HLHS) from a single ‘4 Chamber Heart’ view image. We propose to extend the recently introduced Image-and-Spatial Transformer Networks (Atlas-ISTN) into a framework that enables sensitising atlas generation to disease. In this framework we can jointly learn image segmentation, registration, atlas construction and disease prediction while providing a maximum level of clinical interpretability compared to direct image classification methods. As a result our segmentation allows diagnoses competitive with expert-derived manual diagnosis and yields an AUC-ROC of 0.978 (1043 cases for training, 260 for validation and 325 for testing).

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87234-2_20

SharedIt: https://rdcu.be/cyl8f

Link to the code repository

N/A

Link to the dataset(s)

N/A

Reviews

Review #1

Please describe the contribution of the paper

This paper proposes an algorithm that uses disease-sensitive atlas generation based on the Atlas-ISTN method ([21]) to segment 4-chamber ultrasound images of fetuses, calculate ratios based on segmentations, and classify images for the diagnosis of hypoplastic left heart syndrome (HLHS). Apart from the loss function that was adapted to generate disease-sensitive atlases within the Atlas-ISTN framework, the rest of the pipeline is a combination of simple methods (logistic regression and Gaussian process based on Laplace approximation). The proposed algorithm was trained and tested on private data of 1628 4-chamber ultrasound images that included 68 HLHS fetuses. Both train and test data were highly imbalanced. Segmentation results were evaluated against expert segmentations using the F1 score (Dice), and HLHS diagnosis was compared to expert detection performance using F1 score and AUC. The method was compared to an implementation of stochastic segmentation network (SSN) and UNet.
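To make the "ratios based on segmentations" step concrete, here is a minimal sketch of how per-class area ratios could be derived from a segmentation label map. The exact ratio definition used in the paper is not given on this page, so the normalisation by total labelled area, the function name `area_ratios` and the label values are all assumptions made for illustration.

```python
from collections import Counter


def area_ratios(label_map, background=0):
    """Return each non-background label's share of the total labelled area.

    Assumption: a ratio is (pixels of one class) / (pixels of all classes).
    The paper may define its ratios differently.
    """
    counts = Counter(v for row in label_map for v in row if v != background)
    total = sum(counts.values())
    if total == 0:
        return {}  # nothing segmented, so no ratios can be formed
    return {label: n / total for label, n in sorted(counts.items())}


# Toy 4x6 label map: 1 and 2 are hypothetical anatomical labels, 0 is background.
toy = [
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 1, 1, 2, 2, 0],
    [0, 0, 0, 2, 2, 0],
]
ratios = area_ratios(toy)
print(ratios)
```

A feature vector of this kind could then be passed to a simple classifier such as the logistic regression mentioned above.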
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The approach is built upon the recent and strong Atlas-ISTN framework. The application is important, and the study was designed well for the application. The results are encouraging showing performance that reached those of experts. The paper was clear with well-organized sections.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Technical novelty was relatively limited as the technique was built upon a modification of Atlas-ISTN with additional standard classification components; but this remained a minor weakness as the study was already well designed for the application.

The results were not discussed and interpreted appropriately. The message Figure 5 and Table 1 intended to communicate was not included explicitly and was left for the reader to decide. It seems these sections were prepared in a rush or the limited space did not allow a good explanation of the results.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

If the programs are not released, it may be hard to reproduce the methods.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

It seems limited space dramatically reduced the clarity of the discussion and interpretation of the results based on Figure 5 and Table 1. If this was the case, then Figure 1 can be very safely removed, opening some space to discuss the observations in Sections 3 and 4. Without clarification of findings in the rebuttal it will be hard to judge the merit of the work.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a relatively well-designed study to address an important problem; but the paper should be more carefully evaluated after authors clarify their interpretation and discussion of the results and findings. Currently it is not well described.
What is the ranking of this paper in your review stack?

2
Number of papers in your stack

6
Reviewer confidence

Very confident

Review #2

Please describe the contribution of the paper

The authors applied an existing method, called Atlas-ISTN, to fetal ultrasound images for cardiac segmentation and HLHS diagnosis. The method, which was evaluated on a relatively small dataset, achieved competitive performance and yielded an AUC-ROC of 0.978.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well-written, very organized, and has good visualization. The proposed application is novel and important in clinical practice.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The authors claimed that the proposed method is novel, which is true. However, this method has been proposed by another group: https://deepai.org/publication/atlas-istn-joint-segmentation-registration-and-atlas-construction-with-image-and-spatial-transformer-networks

So, the method should not be listed as a main contribution. The main contribution of the paper is the application of an existing novel method (Atlas-ISTNs) to the problem of fetal cardiac segmentation from ultrasound images. Please re-write the contribution section to reflect this fact. Right now, the contribution gives the impression that the paper proposes a novel algorithm.

“In this paper we introduce a novel method for the diagnosis of HLHS from US images using pathology-robust segmentation.” should be “In this paper, we apply Atlas-ISTNs to the problem of fetal cardiac segmentation from ultrasound images for the first time.”

The authors also need to better discuss the results. It is a well-known fact that better segmentation leads to better quantification and hence diagnosis. This is not a new finding.

Finally, the authors discussed (in the introduction) the challenges of detecting HLHS due to the high variations of the images, diverse acquisition scenarios, and various imaging artefacts. However, the authors did not discuss how they handled these challenges, or show how the proposed method performs under these conditions.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

It is hard to reproduce the paper based on the current description. I, however, think the authors did not provide technical details due to the page limits.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

The authors claimed that the proposed method is novel, which is true. However, this method has been proposed by another group: https://deepai.org/publication/atlas-istn-joint-segmentation-registration-and-atlas-construction-with-image-and-spatial-transformer-networks

So, the method should not be listed as a main contribution. The main contribution of the paper is the application of an existing novel method (Atlas-ISTNs) to the problem of fetal cardiac segmentation from ultrasound images. Please re-write the contribution section to reflect this fact. Right now, the contribution gives the impression that the paper proposes a novel algorithm.

“In this paper we introduce a novel method for the diagnosis of HLHS from US images using pathology-robust segmentation.” should be “In this paper, we apply Atlas-ISTNs to the problem of fetal cardiac segmentation from ultrasound images for the first time.”

The authors also need to better discuss the results. It is a well-known fact that better segmentation leads to better quantification and hence diagnosis. This is not a new finding.

Finally, the authors discussed (in the introduction) the challenges of detecting HLHS due to the high variations of the images, diverse acquisition scenarios, and various imaging artefacts. However, the authors did not discuss how they handled these challenges, or show how the proposed method performs under these conditions.
Please state your overall opinion of the paper

borderline accept (6)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

It is an application paper. It focuses on using an existing method (Atlas-ISTN) for the segmentation of anatomical areas in fetal ultrasound images and HLHS classification.
What is the ranking of this paper in your review stack?

5
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #3

Please describe the contribution of the paper

A method for the diagnosis of Hypo-plastic Left Heart Syndrome (HLHS) using an approach to jointly segment, register, build an atlas and predict disease.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper is well written and organized. The approaches used are explained clearly. Identifying HLHS is challenging for experts and an approach to help diagnose this condition is appreciated.
There are few approaches that address the detection of anomalous hearts in US images and aim to automate the detection of such conditions.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
The contribution of the paper is limited and consists of the inclusion of a prediction branch (3 fully connected layers) to classify Control vs. HLHS. As stated by the authors, the performance is equivalent to simple image classification approaches.

A recent paper, “Detection of Cardiac Structural Abnormalities in Fetal Ultrasound Videos Using Deep Learning” by Komatsu et al., proposes an approach to identify abnormal hearts in US videos and requires fewer manual/user interactions than the proposed approach. The proposed approach assumes rigid alignment of the images. The images have been manually aligned for the training and this could have a considerable impact on the predictions by the proposed approach.
1. The 4CH image is acquired by a sonographer and requires a significant amount of expertise. How are the images aligned? Is this an automated step? There are no details about the alignment and this is relevant for the results. The segmentation and prediction will have a significant impact depending on the results of the alignment.
Additional Comments:
1. “We report the confusion matrices for each method using the ground truth segmentations shown in Figure 5” Using the ground truth segmentations only? This sentence is misleading. The features (area ratios of each anatomical class) should be computed using the segmentations produced by the approach vs. segmentations produced by experts. Please clarify this.
Please rate the clarity and organization of this paper

Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The data is not publicly available and reproducing the results will be challenging.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

Detecting rare conditions during US screening is a challenging task that requires years of training and expertise. Although the approach presented here is not fully automated, it is a step in the right direction. I appreciate work that helps interpret the results produced by deep learning algorithms as it is often the case they are treated as black boxes. I encourage the authors to continue this work and possibly follow up with a fully automated framework to detect HLHS.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
1. The results presented here perform similarly to a simple image classification approach.
2. However, the choice of architecture, i.e., providing a segmentation for a given subject, could help interpret the results in a clinical setting.
3. The choice of architecture has been presented before and the contribution is limited.
4. Overall, the paper is well written and the results are interesting nevertheless.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

The paper is well written and organised, with a good experimental design and an extensive (though highly imbalanced) dataset, but the results need additional information and discussion.

All reviewers pointed out that the methodological contribution of this work is minor and that the paper rather uses pre-existing methods to carry out the study. R2 requests that the claim of a novel contribution be toned down. I would recommend stating more clearly your modifications with respect to this already published network which, if I understood properly, amount only to the addition of one loss (L_HLHS)?

A crucial point for this rebuttal is raised by R1 and R2 (and myself), who require further explanation of the results; without such clarification of the findings it is difficult to evaluate the value of this framework.

R2: Discuss/clarify how the authors deal with differences in image acquisition and with artifacts, and whether the method performs robustly under these conditions.

R3 cites additional work where less manual interaction was needed. Could the authors confirm whether manual interaction is needed for the registration and, if so, what impact this might have on the training? Please clarify the overall alignment process.

This meta-reviewer would also like to know how the high data imbalance was mitigated. It is unclear to me how NC and HLHS cases were distributed for the segmentation task. The authors say 1043 / 260 / 325 with equivalent class imbalance. Was any data augmentation applied? Maybe more for the HLHS cases? The authors should give more details of these DL methods and training.

From the confusion tables it looks like only 16 HLHS cases were evaluated? Maybe I am reading something wrong. Overall, the F1 score for HLHS seems quite low.

This brings us back to the first and crucial point for the authors: to clarify and discuss the findings.

I would like to remind authors that the purpose of this rebuttal is to provide clarification or to point out misunderstandings, and include new details that can better highlight the value of this work. I will not consider any promise of adding future experiments and results.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

2

Author Feedback

We thank the reviewers for their constructive comments and provide clarification on key points raised.

Novelty: Atlas-ISTN [21] is yet to be published in a peer-reviewed publication and is only available on arXiv. However, since we base our research on this preprint, we have thoroughly discussed the baseline approach and given credit, and we will expand on the extensions that we have added. The AC rightly summarizes this as the addition of a loss. While this may sound simple, it enables a novel method to set the state-of-the-art in HLHS detection from fetal cardiac ultrasound. Indeed, we show that with the introduction of a diagnostic classification branch to the Atlas-ISTN method, diagnostic performance of automated heart segmentations becomes on par with using expert manual segmentations. We hope that the AC would agree that this is a relevant novel contribution on a vastly different application than originally discussed in [21].

Discussion of results: F1 and AUC scores in Table 1 show that our ‘Area Ratios’ classification method achieves state-of-the-art performance for HLHS classification over previous classification methods. Classification performance of ‘area ratios’ extracted from automated segmentations is on par with those extracted from expert manual segmentations. The addition of a disease-conditioned branch to the Atlas-ISTN improves the downstream ‘area ratios’ classification task performance over both expert segmentations and previous segmentation methods.

Figure 5 shows the performance of the ‘area ratios’ classification using segmentations produced by experts and by each tested segmentation method. Subfigures (e-f) and (k-l) highlight the improved sensitivity (fewer false negatives) of Atlas-ISTNs with a disease conditioning branch over expert segmentations (a,g) and other segmentation methods (b-d, h-j). Our application is fetal screening, so sensitivity is the metric we most want to improve. Due to the low prevalence of HLHS, F1 scores for HLHS across all methods may seem low.
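The interplay between sensitivity and F1 at low prevalence can be illustrated with a short calculation. The sketch below uses a test set of 325 images with 16 HLHS cases, as stated elsewhere on this page, but the confusion-matrix counts themselves are hypothetical and are not taken from Figure 5.

```python
def _ratio(num, den):
    """Divide safely, returning NaN when the denominator is zero."""
    return num / den if den else float("nan")


def screening_metrics(tp, fn, fp, tn):
    """Compute standard binary metrics from confusion-matrix counts."""
    return {
        "sensitivity": _ratio(tp, tp + fn),   # recall on the HLHS class
        "specificity": _ratio(tn, tn + fp),
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts: 16 HLHS and 309 control images (325 in total).
m = screening_metrics(tp=14, fn=2, fp=10, tn=299)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, only 10 false positives among 309 controls are enough to pull precision, and therefore F1, well below the sensitivity, because the positive class is so small.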

Table 1 shows that our diagnostic branch (H) is competitive with previous image classification approaches. Furthermore, our method uses only a single 4CH image, as opposed to previous methods that use multiple heart view US images or video sequences. Our method provides greater interpretability by producing a segmentation (from which ‘area ratios’ classification is performed) and a disease specific atlas for free.

This more detailed discussion will be included in the camera-ready version by removing Figure 1 as suggested by R1.

Robustness: We rely on the rich variability present in our database that originated from 1628 patients and 3 different scanners in two centres. Indeed, a more detailed analysis of domain sensitivity is required before clinical translation can begin. During our experiments, no correlation between performance and scanner/centre/artefacts was observed. We will add this discussion to the paper.

Image alignment: We align images using ‘Procrustes’ alignment of ‘Apex-Base’ and ‘Spine-Sternum’ lines. We have these from our ground truth annotation exercise, thus we didn’t explore this step in detail. Automated methods like [bit.ly/3v2OA1J, bit.ly/3v2OA1J, bit.ly/3v2WbgH] could be used to establish this alignment and manual input is optional, depending on the operational constraints in a clinical setting. During fetal examination in our referral clinic, we could use ‘Apex-Base’ and ‘Spine-Sternum’ lines directly since these can be annotated as part of the standard of care (the angle between them is a disease indicator).
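As an illustration of landmark-based alignment, the following sketch fits a least-squares 2D similarity transform (scale, rotation, translation) between two sets of corresponding points, which is one common form of Procrustes alignment. The rebuttal does not say which Procrustes variant was used, so this choice, the function names and the landmark coordinates are assumptions.

```python
import cmath
import math


def fit_similarity(source, target):
    """Least-squares similarity transform w = a*z + b mapping source to target.

    Points are treated as complex numbers; `a` encodes scale and rotation,
    `b` encodes translation. Reflections are not modelled.
    """
    z = [complex(x, y) for x, y in source]
    w = [complex(x, y) for x, y in target]
    zm = sum(z) / len(z)
    wm = sum(w) / len(w)
    den = sum(abs(zi - zm) ** 2 for zi in z)
    if den == 0:
        raise ValueError("source points are all identical")
    num = sum((zi - zm).conjugate() * (wi - wm) for zi, wi in zip(z, w))
    a = num / den
    b = wm - a * zm
    return a, b


def apply_similarity(a, b, points):
    """Apply w = a*z + b to a list of (x, y) points."""
    out = []
    for x, y in points:
        p = a * complex(x, y) + b
        out.append((p.real, p.imag))
    return out


# Hypothetical reference end points: apex, base, spine, sternum.
reference = [(0.0, 1.0), (0.0, -1.0), (-1.0, 0.0), (1.0, 0.0)]
# Simulate an image whose landmarks are rotated 30 degrees, scaled x2 and shifted.
observed = apply_similarity(2 * cmath.exp(1j * math.pi / 6), 3 + 4j, reference)

a, b = fit_similarity(observed, reference)
aligned = apply_similarity(a, b, observed)
print(aligned)
```

In this toy case the observed landmarks are an exact similarity transform of the reference, so the fitted transform maps them back onto the reference up to floating-point error; with real annotations the fit would only be approximate.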

Data imbalance: The train, validation and test sets contain 42/1043, 10/260 and 16/325 HLHS images respectively. No data augmentation was performed. We will perform cross-validation to mitigate data imbalance and will integrate the above details into a hopefully camera-ready version of the paper.
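The degree of imbalance follows directly from the counts above. This short calculation uses only the numbers quoted in the rebuttal; the variable names are illustrative.

```python
# (HLHS images, total images) per split, as quoted in the rebuttal.
splits = {"train": (42, 1043), "validation": (10, 260), "test": (16, 325)}

# Fraction of HLHS images in each split.
prevalence = {name: hlhs / total for name, (hlhs, total) in splits.items()}

total_hlhs = sum(hlhs for hlhs, _ in splits.values())
total_images = sum(total for _, total in splits.values())

for name, p in prevalence.items():
    print(f"{name}: {p:.1%} HLHS")
print(f"overall: {total_hlhs}/{total_images} = {total_hlhs / total_images:.1%} HLHS")
```

The totals agree with the 68 HLHS cases among 1628 images mentioned in Review #1, and every split has an HLHS prevalence of roughly 4 to 5 percent.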

Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

The authors present a novel method based on disease-sensitive atlas generation (building on previous work in the Atlas-ISTN method [21]) to segment 4-chamber ultrasound images of fetuses, calculate ratios based on segmentations, and classify images for the diagnosis of hypoplastic left heart syndrome (HLHS). The paper is well written and organised, with a good experimental design and an extensive dataset (which is highly imbalanced, as the pathology is less represented). The methodological contribution of this work is minor, as it corresponds to the addition of one loss (L_HLHS) to the previous scheme [21]. However, the major contribution relates rather to the novel application domain. In the rebuttal the authors emphasize this aspect and better analyze the obtained results. Some limitations remain, such as the overall “manual” nature of the alignment and the class imbalance. I believe this work is solid, original and of interest for MICCAI attendees.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

1

Meta-review #2

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

As pointed out by the reviewers, the key strength of this paper is the impressive performance on a challenging clinical application with a sizable dataset. The rebuttal clarifies some issues raised in the reviews regarding novelty, results and requirement of manual interaction.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Accept
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

9

Meta-review #3

Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

This paper tackles cardiac segmentation and HLHS diagnosis using fetal ultrasound images, but the novelty is limited as mentioned by all reviewers.
After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

Reject
What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

19


Detecting Hypo-plastic Left Heart Syndrome in Fetal Ultrasound via Disease-specific Atlas Maps