
Authors

Patrick Carnahan, John Moore, Daniel Bainbridge, Mehdi Eskandari, Elvis C. S. Chen, Terry M. Peters

Abstract

Recently, developments have been made towards modelling patient-specific deformable mitral valves from transesophageal echocardiography (TEE). Thus far, a major limitation in the workflow has been the manual process of segmentation and model profile definition. Completing a manual segmentation from 3D TEE can take upwards of two hours, and existing automated segmentation approaches have limitations in both computation time and accuracy. Streamlining the process of segmenting the valve and generating a surface mold is important for the scalability and accuracy of patient-specific mitral valve modelling. We present DeepMitral, a fully automatic, deep learning based mitral valve segmentation approach that can quickly and accurately extract the geometry of the mitral valve directly from TEE volumes. We developed and tested our model on a data set comprising 48 diagnostic TEE volumes with corresponding segmentations from mitral valve intervention patients. Our proposed pipeline is based on the Residual UNet architecture with five layers. Evaluation of our proposed pipeline was assessed using manual segmentations performed by two clinicians as a gold-standard. The comparisons are made using the mean absolute surface distance (MASD) between the boundaries of the complete segmentations, as well as the 95% Hausdorff distances. DeepMitral achieves a MASD of 0.59±0.23mm and average 95% Hausdorff distance of 1.99±1.14mm. Additionally, we report a Dice score of 0.81. The resulting segmentations from our approach successfully replicate gold-standard segmentations with improved performance over existing state-of-the-art methods. DeepMitral improves the workflow of the mitral valve modelling process by reducing the time required for completing an accurate mitral valve segmentation, and providing more consistent results by removing user variability from the segmentation process.
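For readers who want to map the abstract onto the open-source tooling discussed in the reviews below, here is a minimal sketch of how a five-level residual 3D U-Net with sliding-window inference can be assembled in MONAI. The channel widths, strides, patch size, and two-class output are illustrative assumptions, not the authors' exact configuration (see the linked repository for that).

```python
import torch
from monai.networks.nets import UNet
from monai.inferers import sliding_window_inference

# Five-level residual 3D U-Net. Channel widths, strides, and patch size are
# illustrative guesses, not the configuration from the DeepMitral repository.
model = UNet(
    spatial_dims=3,
    in_channels=1,                      # single-channel TEE volume
    out_channels=2,                     # background / mitral valve leaflets
    channels=(16, 32, 64, 128, 256),    # one entry per resolution level
    strides=(2, 2, 2, 2),
    num_res_units=2,                    # residual blocks at each level
)

volume = torch.rand(1, 1, 160, 160, 128)   # placeholder (B, C, D, H, W) TEE patch
model.eval()
with torch.no_grad():
    logits = sliding_window_inference(volume, roi_size=(96, 96, 96),
                                      sw_batch_size=4, predictor=model)
    labels = logits.argmax(dim=1)          # voxel-wise segmentation
```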

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_44

SharedIt: https://rdcu.be/cyl6k

Link to the code repository

https://github.com/pcarnah/DeepMitral

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors of this paper propose DeepMV, which is a mitral valve segmentation approach based on a 3D Residual UNet architecture. The authors are motivated by the surgical planning workflow for mitral valve repair, which requires specialized training and patient-specific valve models enabling patient-specific valve repair. DeepMV was trained and tested on 36 and 8 transesophageal echo (TEE) volumes, respectively, leading to a mean surface distance error of 0.59 +- 0.23 mm, a 95% Hausdorff distance error of 1.99 +- 1.14 mm, and a Dice score of 81%. DeepMV has applications towards improving the surgical training and planning of mitral valve repair.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. The motivation for the approach is clear, and the explanation of prior literature is satisfactory.
    2. The approach is the first mitral valve segmentation method proposed for 3D TEE volumes.
    3. The baseline Residual UNet in the proposed approach was compared with other deep learning models (e.g., VNet), and subsequent training/testing was done with the best-performing baseline model.
    4. The authors have used the open-source MONAI framework for training/testing.
    5. Evaluation is satisfactory with comparisons against prior deep learning approaches.
    6. A good discussion of the failure cases of the approach has been provided along with future work directions.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The data quantity is low, but that is understandable since the approach was limited to diseased mitral valves in patients with mitral valve regurgitation.
    2. The paper mentions that current methods for mitral valve segmentation have a runtime in the range of 15 minutes to 3 hours, but the runtime for DeepMV was not provided.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The paper should be reproducible given that they have based their method on the open-source MONAI framework.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major.

    1. The network seems to be confusing the chordae tendineae and the leaflet, and that throws off the segmentation result. The authors propose to use an additional class for the chordae, but perhaps penalizing the network with a weight map would work better in this circumstance (see the weighted-loss sketch after this list).
    2. Additionally, since the goal of the approach is to segment the leaflets, it would be useful to provide uncertainty estimates for the segmentation (T. Nair, MICCAI, 2018 - https://arxiv.org/abs/1808.01200); a test-time Monte Carlo dropout sketch follows this list.
    3. As a significant bottleneck in the MV repair workflow is the time taken to segment the MV, it would be useful to contrast the runtime of the proposed approach against prior work. This will add some justification for the use of the proposed method.
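Regarding point 1, a minimal sketch of the suggested weight-map penalty is given below. It assumes a two-class (background/leaflet) setup and builds per-voxel weights from a distance transform of the reference label, so that errors near the leaflet surface, where the chordae insert, cost more. The weighting scheme and its parameters are illustrative, not part of the reviewed method.

```python
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def edge_weight_map(label, w0=4.0, sigma=3.0):
    """Weights that decay with distance from the labelled leaflet voxels,
    so mistakes near the leaflet surface / chordae insertion cost more.
    `label` is a binary numpy array; w0 and sigma are illustrative values."""
    dist = distance_transform_edt(label == 0)   # distance of each voxel to the leaflet
    return 1.0 + w0 * np.exp(-(dist ** 2) / (2.0 * sigma ** 2))

def weighted_ce_loss(logits, target, weight_map):
    """Per-voxel weighted cross-entropy.
    logits: (B, C, D, H, W) raw outputs, target: (B, D, H, W) integer labels,
    weight_map: (B, D, H, W) tensor built from edge_weight_map."""
    ce = F.cross_entropy(logits, target, reduction="none")   # (B, D, H, W)
    return (ce * weight_map).mean()
```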
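Regarding point 2, a rough test-time Monte Carlo dropout sketch in the spirit of Nair et al. (2018) follows. It assumes the network contains dropout layers and produces (B, C, D, H, W) logits; the sample count is an arbitrary choice.

```python
import torch

def mc_dropout_uncertainty(model, volume, n_samples=20):
    """Mean foreground probability and per-voxel variance from repeated
    stochastic forward passes with dropout kept active at test time."""
    model.eval()
    for m in model.modules():                      # re-enable dropout layers only
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout3d)):
            m.train()
    samples = []
    with torch.no_grad():
        for _ in range(n_samples):
            probs = torch.softmax(model(volume), dim=1)[:, 1:]   # foreground channel(s)
            samples.append(probs)
    samples = torch.stack(samples, dim=0)          # (n_samples, B, C-1, D, H, W)
    return samples.mean(dim=0), samples.var(dim=0)
```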
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    A mitral valve segmentation approach called DeepMV was proposed to enable the surgical planning workflow for mitral valve repair. DeepMV was trained and tested on limited TEE volumes, but the validation was sufficient and the results are close to the axial resolution of the ultrasound volumes used (0.5mm). The failure cases were accounted for and justified in the discussion section.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper presents an automatic method for segmenting mitral valves from 3D ultrasound images. The method combines an image preprocessing stage with a Residual U-Net model to achieve a novel capability: fully automatic rather than interactive segmentation, based on CNNs, that runs in seconds rather than minutes or hours, with an accuracy of 0.6 mm that establishes the state of the art. Furthermore, the dataset consists of 48 volumes of diseased valves rather than healthy valves, and labeling is performed interactively by trained users for training and by clinical experts for testing.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strengths of the paper include:

    1. Establishes a novel capability in automatically segmenting mitral valves from 3D TEE images, as well as a new state of the art in terms of accuracy and speed
    2. Works with diseased valves which is different from typical projects with healthy volunteers
    3. Authors promise to release the code that achieves these results as open source
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Main weaknesses of the paper include:

    1. The name “DeepMV” has already been claimed by “multi-view deep learning” work.
    2. Data acquisition is entirely from a single probe and institution, so there may be system-specific characteristics that the model could overfit to, such as the expected resolution of the valves.
    3. Does not directly address the challenges and variabilities that plague ultrasound imaging, such as inconsistent image quality, Doppler effects, noise, artifacts, sonographer skill, etc.
    4. Takes for granted that images will contain valid data, so unexpected results with invalid images or acquisitions are not considered. Also, for various reasons the probe may not pick up the valve at all; such situations should be handled as a routine matter rather than assuming the entirety of the valve will always be visible in an image.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Authors diligently and thoroughly describe the test setup, and promise to release code as open source so that others may benefit from the hyperparameters they found.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Overall, the present work is an interesting and potentially valuable contribution to the MICCAI community, because it creates the novel capability of automatic MV segmentation from 3D TEE very quickly and accurately. As the authors mention, even when results are imperfect they are still useful for streamlining manual workflows, where most of the bulk segmentation is correct and trained users need only perform minor edits. These edits can even be used for further training. An open-source release of the work will increase its value even further.

    Authors provide a clear presentation of the results, which is much appreciated. This effort helps to overcome the inherent difficulty in assessing performance when accuracy is not a single value but a measurement over a continuous space. Authors and collaborators have a firm understanding of the clinical background, so it would be instructive to the broader MICCAI community if more nuance were added to the results. For example, although scalar values such as MASD are reported, can authors comment on which parts of the valve are potentially more important to segment accurately than others? Perhaps there are situations in which a higher MASD is acceptable so long as critical areas are more accurate?

    Authors comment that valve thickness can be accounted for in the present framework. In this case typical valve thicknesses should be quoted so that readers can relate them to the image resolution.

    The results are impressive and promising enough to motivate continued work. In the future, it would be interesting to see how well the model transfers to different imaging probes and institutions. If the valve appears in the bottom third of the volume, where the acquisition resolution is lower (due to the acoustic physics of the probe), rather than the top third, where the resolution is higher, would the results still hold? Although the Cartesian resolution is quoted as 0.5 mm, this is post-scan conversion (the acquisition geometry is closer to radial/polar), so the true resolution towards the bottom of the volume will be lower; I am curious whether the MASD would remain around 0.6 mm.

    Because the authors chose to analyze a snapshot of volumes at a particular phase of the cardiac cycle, and presumably each acquisition contains multiple cycles, the experiment becomes vulnerable to cherry-picking a favorable snapshot. Besides cardiac cycle phase, were there any other criteria for selecting volumes for training/testing? This detail ought to be included in the results/discussion.

    Another issue the MICCAI reader would find interesting is whether there are any systematic or otherwise identifiable sources of error. Was the error more common in a certain part of the valve, or were there patterns of position/orientation offsets?

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am entering a recommendation of borderline accept based on the novel, valuable capabilities created, the state-of-the-art results achieved, and the potential for sharing of the work to benefit the global community. This positive assessment is tempered by the decision to reuse the name “DeepMV”, which has already been claimed within the computer vision community; this can cause confusion, and the original work deserves respect.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    6

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    This paper presents automatic volumetric segmentation of the mitral valve in 3D transesophageal echo. Several deep learning architectures are tested against an unpublished dataset, and pipeline code including preprocessing and a Residual UNet implementation is released.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The Dice scores and MASD reported are impressive given the difficult structure and limited data.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Validation comparison against the cited alternatives is lacking.

    -When “improvement in accuracy over the existing state of the art approaches” is claimed, there is no fair comparison against the same data. This makes the claim unverifiable. The further claim that an atlas method would not perform as well on diseased valves when trained on healthy ones is not relevant, given that you both train and test on diseased data.

    -Did you study inter-rater variability on your own dataset? You mention several times that this scheme is more consistent, yet no numbers are cited for this data.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work should be somewhat reproducible with the code release and its basis in the open-source MONAI framework. The dataset, the likes of which no one else has, would be at least as useful.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Small issues:

    -48 diagnostic TEE volumes with corresponding segmentations from mitral valve intervention patients – maybe 48 diagnostic TEE volumes from mitral valve intervention patients with corresponding segmentations?

    -accurate mitral valve segmentation, and providing more consistent – either “while providing” or “and provides…”

    -We collected a total of 48 volumes – are there 48 distinct patients? It’s unclear.

    -The final random sampling step … No data-augmentation was performed – These two sentences seem to contradict each other. If you’re randomly sampling at every epoch, isn’t that data augmentation?

    -batch sizes of 32, composed of 8 different volumes, with 4 random samples being taken from each volume – how is that not data augmentation?

    -the resulting segmentations will be more consistent than semi-automatic or manual approaches – you could test this claim with some test-time augmentation, using different volume crops.
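One possible way to probe the consistency claim with different volume crops, as suggested in the last point above, is sketched here: the same volume is segmented from several shifted crop origins and the overlapping regions are compared with Dice. The shift size, number of runs, ROI size, and two-class assumption are illustrative choices, not part of the reviewed pipeline.

```python
import torch
from monai.inferers import sliding_window_inference

def dice(a, b, eps=1e-6):
    """Dice overlap between two binary label tensors."""
    inter = (a * b).sum()
    return float((2 * inter + eps) / (a.sum() + b.sum() + eps))

def crop_consistency(model, volume, roi_size=(96, 96, 96), n_runs=4, shift=8):
    """Segment the full volume and several shifted crops of it, then measure
    Dice agreement on the overlapping region. Assumes binary labels and a
    `volume` of shape (B, C, D, H, W)."""
    model.eval()
    with torch.no_grad():
        base = sliding_window_inference(volume, roi_size, 4, model).argmax(dim=1)
        scores = []
        for i in range(1, n_runs):
            off = i * shift                               # crop origin offset in voxels
            crop = volume[..., off:, off:, off:]
            pred = sliding_window_inference(crop, roi_size, 4, model).argmax(dim=1)
            # compare against the matching region of the un-shifted prediction
            scores.append(dice(pred.float(), base[..., off:, off:, off:].float()))
    return scores
```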

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Though the pipeline software developed within an open-source framework could be valuable for researchers with the right data, the claim of “improved performance over existing state-of-the-art methods” is not tested.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Somewhat confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    This paper presents an automatic method for segmenting mitral valves from 3D ultrasound images. The method, called DeepMV, exhibited excellent segmentation results and showed promise for use in surgical training and planning of mitral valve repair. All reviewers find the work well motivated and of high quality. The major weakness of the work is the limited, single-centre dataset (which also does not take the noise patterns of echo images into consideration), which may limit its potential for real clinical deployment. The authors are invited to clarify the issues raised by the reviewers and to provide runtime statistics. Also, please modify the name of the proposed network, as “DeepMV” has already been claimed by previous work with methodological implications (“multi-view deep learning”).

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    5




Author Feedback

We thank the reviewers for their informative reviews. Thank you for bringing the overlap in naming to our attention; we will modify the name of our method accordingly.

In our work we discuss runtime comparisons between our method and prior literature; however, as noted, we did not report detailed runtime statistics for our own method. While the discussion section states that deep learning methods run on the order of seconds, compared to minutes or hours for prior work, we will add detail summarizing the runtime of our method on both GPU (roughly 2 seconds) and CPU (roughly 10 seconds).

We acknowledge that we lack a direct comparison between our method and prior work on the same dataset. While we would have preferred to compare our results directly against the methods described in prior work, there are no publicly available implementations of all of these methods, so we were unable to perform such a comparison. In the absence of a public baseline dataset, or of all methods being publicly available, we are limited in the depth of comparison that can be made between methods. Had we implemented these methods ourselves, we would still have been unable to confirm that our implementations were faithful to the originals and achieved the same performance. Given these circumstances, we must compare methods on the basis of their published results.

We acknowledge that a major limitation of our work is the composition of the dataset, which comes from a single centre and uses a single ultrasound probe. Additionally, the challenges and variabilities of ultrasound imaging were not directly addressed. We realize this may limit the clinical deployment potential of a model trained on our current dataset. Our motivation behind this work was to establish a baseline of performance for deep learning networks applied to mitral valve segmentation. As noted by R1, we provide insight into the performance of various network architectures for this application, which can help guide the research community in future approaches. Although the dataset is single-centre, we have demonstrated very strong segmentation results, which serve to establish a baseline in performance for future work. Additional study of the generalizability of this method would be beneficial to further evaluate performance and to identify sources of error due to imaging variability. However, we feel the contributions of our work stand as a novel demonstration of the applicability of deep learning techniques to mitral valve segmentation, as well as establishing baseline performance levels of various potential architectures for this application.

In our data curation, we select images in which we ensure that the data are valid and the entire mitral valve is included. The reasoning behind this decision is that the motivation for the segmentation problem is patient-specific valve modelling. In this application, only imaging that contains the entire valve can be used, as this is a requirement for accurate modelling, and our development of this segmentation work was guided by these requirements. If our method were used in practice for valve modelling, the same selection process would be applied, so this is not a major limitation of our work.

As a direction for future work, we discuss expanding the size of our dataset and performing a deeper analysis of segmentation performance across valve pathologies. Although we are unable to release our current data, as part of this future work we hope to create a publicly available dataset to facilitate further work on the mitral valve segmentation problem and to improve researchers' ability to benchmark the performance of their methods.




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents a deep learning method for segmenting mitral valves from 3D ultrasound images and showed promising results for use in surgical training and planning of mitral valve repair. Although the dataset issues remain, the authors have properly addressed the concern in the rebuttal. Please modify the name of the network as suggested by the reviewers.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper presents an automatic method for segmenting mitral valves from 3D ultrasound images, and showed excellent segmentation results. I think the authors have answered the main concerns of the reviewers well.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The rebuttal did not address the single-centre issue directly, but I agree that, from a technical point of view, this work is sound and solid and can serve as a first step towards a useful application. However, I disagree that the dataset size is one of the factors that “do not substantially detract from the value of the work”. In particular, with only 8 volumes as the test set, not reporting the variance of the results in Table 1 is not acceptable, and it leaves unconvincing the improvement that is important to support the main claim of the paper. For example, such a large improvement from 0.50 (Res-U-Net) to the proposed 0.74 is not consistent with similar segmentation tasks in the literature and could be caused by the large variance due to the small test set.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Reject

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    24


