
Authors

Ricardo A. Gonzales, Jérôme Lamy, Felicia Seemann, Einar Heiberg, John A. Onofrey, Dana C. Peters

Abstract

Tracking the tricuspid valve (TV) in magnetic resonance imaging (MRI) long-axis cine images has the potential to aid in the evaluation of right ventricular dysfunction, which is common in congenital heart disease and pulmonary hypertension. However, this annotation task remains difficult and time-demanding, as the TV moves rapidly and is barely distinguishable from the myocardium. This study presents TVnet, a novel dual-stage deep learning pipeline based on ResNet-50 and an automated linear image transformation, able to automatically derive the tricuspid annular plane systolic excursion (TAPSE). Stage 1 uses a trained network for coarse detection of the TV points, which are used by stage 2 to reorient the cine to a standardized size, cropping, resolution, and heart orientation and to accurately locate the TV points with another trained network. The model was trained and evaluated on 4,170 images from 140 patients with diverse cardiovascular pathologies. A baseline model without standardization achieved a Euclidean distance error of 4.0±3.1 mm and a clinical-metric agreement of ICC=0.87, whereas the standardized model reduced the error to 2.4±1.7 mm and improved the agreement to ICC=0.94, on par with the evaluated inter-observer variability of 2.9±2.9 mm and ICC=0.92. This novel dual-stage deep learning pipeline substantially improved the annotation accuracy compared to the baseline model, paving the way towards reliable right ventricular dysfunction assessment with MRI.
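
The two-stage design described above can be read as: coarse landmark regression, standardization of the image based on those coarse points, then refined regression on the standardized image. Below is a minimal sketch of that inference loop, assuming the two trained networks, the standardization routine, and the inverse mapping are supplied by the caller; all names are placeholders, and the paper's actual implementation may differ.

```python
import numpy as np
import torch

def track_tv_points(cine, stage1_net, stage2_net, standardize, to_original):
    """Hypothetical dual-stage inference over a long-axis cine sequence.

    cine        : (T, H, W) array of long-axis frames
    stage1_net  : trained network (e.g., a ResNet-50 regressor) -> coarse TV points
    stage2_net  : trained network -> refined TV points on the standardized image
    standardize : (frame, coarse_points) -> (standardized frame, transform)
    to_original : (points, transform) -> points mapped back to the original image grid
    """
    tracked = []
    for frame in cine:
        x = torch.from_numpy(frame[None, None]).float()
        coarse = stage1_net(x).detach().numpy().reshape(2, 2)    # stage 1: coarse detection
        std_frame, tform = standardize(frame, coarse)            # crop / resample / reorient
        xs = torch.from_numpy(std_frame[None, None]).float()
        refined = stage2_net(xs).detach().numpy().reshape(2, 2)  # stage 2: accurate localization
        tracked.append(to_original(refined, tform))              # back to original coordinates
    return np.stack(tracked)  # (T, 2 landmarks, 2 coordinates)
```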

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87231-1_55

SharedIt: https://rdcu.be/cyhWe

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    The authors present a new method to automatically track the tricuspid valve in MRI (in the 4-chamber view), allowing evaluation of right ventricular dysfunction.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main objective is to obtain results equivalent to those from echocardiography, which is the reference examination. Moreover, the authors present a fully automatic method that provides convincing results.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Although the study is very interesting, I have some remarks and questions:

    • The main drawback of this work is the absence of a comparison with the reference examination, i.e., echocardiography, and of any discussion comparing the two approaches.
    • I am not convinced by the first network. Why not use a single network with a linear transformation as a preprocessing step? I suspect it is the linear transformation that improves the results; in general, deep learning is not hampered by rotation.
    • The spatial resolution of the images is relatively high.
    • I am not convinced by the study of the additional stage's influence; consequently, the proposed method does not seem robust.
    • There is no link between the results and the diseases (as the dataset contains various diseases).
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Nothing in the text addresses reproducibility, except that the code is included in the Segment software. According to the checklist, the code is not available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Why did the authors not apply the same process to the mitral valve? Perhaps as future work?
    • The authors should discuss and compare their results with ultrasound (US); in particular, what is the impact of temporal resolution?
    • Clearly, the second image on the left in Figure 1 is distorted and not credible.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The study is interesting, but some discussion is missing (comparison with US, results according to the disease, etc.).

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    A pipeline to automatically label the tricuspid valve plane on long-axis (LAX) cine MRI sequences, of value for estimating clinical parameters of the RV such as TAPSE. The authors use a ResNet twice: in the first stage to roughly locate the tricuspid plane and reorient the images accordingly, and in the second stage to identify this plane more precisely. They demonstrate the value of the additional stage with a nice set of evaluations of tracking accuracy and comparisons to experts’ measurements.
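
    The review mentions trajectory smoothing at the end of the pipeline. The excerpt here does not specify the filter, so the sketch below uses a Savitzky-Golay filter from SciPy purely as one plausible choice; the window length and polynomial order are illustrative, not the paper's settings.

```python
import numpy as np
from scipy.signal import savgol_filter

def smooth_trajectory(points, window=7, polyorder=2):
    """Temporally smooth a tracked TV landmark trajectory.

    points : (T, 2) array of per-frame (x, y) positions of one annulus point.
    The Savitzky-Golay filter is an illustrative choice, not necessarily the
    smoothing used in the paper.
    """
    points = np.asarray(points, dtype=float)
    if window > len(points):
        window = len(points) - (1 - len(points) % 2)  # largest odd window that fits
    return savgol_filter(points, window_length=window,
                         polyorder=min(polyorder, window - 1), axis=0)
```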

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A sound processing pipeline for the targeted application
    • The potential limitations of the methodological contribution are compensated for by its applicative value and thorough assessment.
    • Writing is clear, with nice figures, and situates well the work vs. the clinical needs.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The methodological contribution is rather an engineering pipeline, which chains two existing networks with a linear reorientation of images in-between, and trajectory smoothing at the end.
    • Experiments assessing the additional stage influence may be revised, see below.
  • Please rate the clarity and organization of this paper

    Excellent

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Neither the data nor the code is publicly available. However, the description of the experiments and of the parameters used is clear. The authors also mention that their pipeline has been implemented in the freely available Segment software, but without further details or an explicit link to a demo.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major comments:

    • I appreciated the effort to evaluate the added value of the second step after the first one. However, I wonder whether testing several additional second steps is relevant, as suggested by the table. Was this expected? Would the scheme “2 only” be equivalent to the scheme “1”? If so, this should be recalled, or the authors should clarify that “2” actually means “linear reorientation + ResNet”.

    Other comments:

    • The acronym TVnet may be confusing, as it is already used, in particular for optical flow motion tracking.
    • The Abstract may briefly detail which type of network was used.
    • The Introduction situates the work well vs. the current limitations for the application, but I am missing a brief review of state-of-the-art methods for landmark tracking, in particular valve or plane tracking.
    • p.2: “two stage framework based on an automated linear transformation of the image”: the “based on… linear…” may be confusing. I would rephrase to say that the linear transformation comes in-between the two valve identification stages.
    • p.2: The authors use a nice database with a variety of RV diseases, but never discuss the estimated TAPSE and RV e’ with respect to these diseases, which could be of value.
    • Fig.1: the ES frame may be indicated on the bottom-right plot.
    • p.6: Could some orders of magnitude be given for the valve/cardiac dimensions and for the accuracy of state-of-the-art methods? This would help situate the performance of the authors’ method.
    • Table 1: is accuracy computed at each instant? (I’m not sure of it, as TAPSE and RV e’ are computed at a given instant).
    • The authors partially mention “uncertainties” and “residual” from the first stage. If relevant, could they briefly discuss if these could be exploited to better guide the second stage?
    • Fig.3: I appreciated the assessment in particular in this figure. Could the authors provide 1-2 animated views (e.g. a “good” and a “less good” tracking) as Supplementary Material?
    • p.8: I would replace “significantly” with “substantially”, as statistical significance was never quantified.
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The contribution is not the development of a new network or method, but rather a processing pipeline that chains existing methods in a sound way, which is still an innovative contribution for the proposed application, supported by detailed experiments.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    Summary: The authors propose a dual-stage deep learning method that incorporates a residual neural network (ResNet-50) and a linear transformation for automated, time-resolved tracking of the tricuspid valve (TV) in long-axis four-chamber (4CH) cine MRI images. The stage 1 network is dedicated to coarse detection of the TV points, used to homogenize the field of view (cropping) and orient the 4CH image to a standardized view across patients. Stage 2 is dedicated to accurate localization of the TV insertion points/landmarks in the standardized images from stage 1. The method was evaluated in a total of 120 patients with diverse heart disease etiologies; one hundred patients were used for training and 20 for testing. The authors evaluate the method’s performance against manual observers. They also evaluated inter-observer reproducibility and clinical metrics, including TAPSE and RV e’, showing overall promising results.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. Overall, a well-written manuscript with rigorous validation.
    2. The use of a dual-stage network pipeline to overcome the practical inhomogeneity in the RV 4CH field of view size and image orientation is appealing.
    3. Evaluation of the method is performed in a complex cohort with different heart disease etiologies.
    4. The evaluation of the additional Stage Influence adds an extra level of transparency on the method convergence.
    5. Results, including analysis speed, are presented both for GPU and CPU implementations.
    6. The method addresses a clinical need for robust TV tracking, which, while more challenging, is less explored in the literature compared to the mitral valve or the left heart in general.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. The choice of the training and test data should be better explained to avoid potential selection bias.
    2. Fixed training and test sets are used without cross-validation; hence, it is unclear whether overfitting occurred.
    3. Single-center study; hence, generalization beyond this center and these two observers is not addressed and remains unclear.
    4. The Conclusion/Discussion section does not address or interpret the reasons for the shortcoming of the proposed method relative to the agreement with manual observer 2, and the limitations are not presented.
    5. Some methodological details are missing (see detailed comments).
    6. Limited-size test set.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Satisfactory. Reproducibility and inter-observer variability are addressed.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Detailed Comments

    1. What is the linear transformation used in network 1? Is it rotation and translation only, or does it include scaling too? Please specify.
    2. Following from 1: if scaling was used, how was it defined? If scaling was not used, how were the different RV sizes among the different patient etiologies accounted for?
    3. Was the four-chamber cine MRI data retrospectively or prospectively gated? Was the same gating method (prospective or retrospective) used consistently for all patients? If not, this could introduce variability in the percentage of the cardiac cycle covered, reflected in the TV tracking, jitter, etc.
    4. How were the 100 training patients vs. the 20 test patients defined? How did you avoid selection bias?
    5. Following from 4: why was a fixed 100/20 training/test split used for validation instead of multi-fold cross-validation, to avoid bias from the training/test selection?
    6. The etiologies/pathology types in the test set and training set are not described (this is currently described only for the whole cohort). These should be described in the results to better understand the disease distributions between the test and training sets.
    7. Given that the stage 1 network is aimed at homogenizing the data: is data augmentation of added value here? Should this not already be handled by the stage 1 network? Would excluding data augmentation affect the performance?
    8. Observer 1’s annotation was used as the ground truth, but their MRI expertise is not described and should be addressed briefly, e.g., how many years of MRI experience, clinician or researcher. The same applies to observer 2.
    9. The accuracy of the manual inter-observer variability was almost two-fold better than that of the proposed method. However, the reason behind this shortcoming of the network is not addressed in the discussion. The challenges behind this shortcoming and how to address them should be briefly discussed.
    10. It should be noted in the limitations that this is only a single-center study with limited data for testing. Hence, future studies are needed to assess the critical point of generalization.
    11. When presenting the comparison of the clinical metrics TAPSE and RV e’ between the automated method and both observers, the authors indicate that the results are comparable. However, the authors do not provide the p-value of a paired statistical test to support this. In particular, RV e’ appears to differ, and statistical analysis is needed to assess the degree of statistical difference.
    12. It should be noted that a smaller jitter, i.e., a smoother tracking curve, does not necessarily correspond to a more accurate curve: an inaccurate curve with a systematic error can still be smooth (see the sketch after this list). Hence, the authors should be more conservative in their interpretation of the jitter in the conclusions section.

    Minor comments:

    13. The Conclusions section should be renamed to “Discussion and Conclusion”.
    14. The manuscript title and keywords should include the imaging modality, i.e., cine MRI.
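
    To illustrate comment 12 above (low jitter is not the same as high accuracy), here is a small self-contained sketch; the jitter and error definitions are illustrative, not necessarily those used in the paper.

```python
import numpy as np

def jitter(traj):
    """Mean frame-to-frame displacement of a (T, 2) trajectory (illustrative definition)."""
    return float(np.mean(np.linalg.norm(np.diff(traj, axis=0), axis=1)))

def mean_error(traj, reference):
    """Mean Euclidean distance between a trajectory and its reference."""
    return float(np.mean(np.linalg.norm(traj - reference, axis=1)))

rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 30)
reference = np.stack([np.cos(t), np.sin(t)], axis=1)
biased = reference + np.array([0.5, 0.0])                 # perfectly smooth, systematically offset
noisy = reference + 0.05 * rng.standard_normal((30, 2))   # accurate on average, but jittery

print(jitter(biased), mean_error(biased, reference))  # low jitter, large error
print(jitter(noisy), mean_error(noisy, reference))    # higher jitter, small error
```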
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
    1. The use of a dual-stage network pipeline to overcome the practical inhomogeneity in the RV 4CH field of view size and image orientation is appealing.
    2. Evaluation of the method is performed in a complex cohort with different heart disease etiologies.
    3. The evaluation of the additional Stage Influence adds an extra level of transparency on the method convergence.
    4. The method addresses a clinical need for robust TV tracking, which, while more challenging, is less explored in the literature compared to the mitral valve or the left heart in general.
  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The authors proposed a two-stage pipeline to automatically label the tricuspid valve plane on LAX cine MRI sequences. This is a challenging and interesting task in CMR analysis, and the work is thereby well motivated. The contribution, as pointed out by the reviewers, is more at the implementation level, but the application is sound, the results are well documented, and the manuscript is clearly written. The two-stage strategy is also found to be very practical and appealing in practical use.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

We would like to thank the Reviewers and the Meta-Reviewer for their careful evaluation of our work, their constructive suggestions and the provisional acceptance. We have now clarified and addressed the comments.

Comments on the ‘Abstract’ and ‘Introduction’ sections:

We have included the imaging modality (MRI) in both the title and keywords, specified the type of network in the abstract, and opted to use the word ‘substantially’ instead of ‘significantly’. As part of the literature review, we have highlighted the need for a fully automated valve plane tracking method and briefly mentioned the available semi-automated methods. We have also clarified the use of the automated image linear transformation in our proposed pipeline.

Comments on the ‘Methods’ section:

We have clarified the ECG-gating method and the MRI experience of the observers, and specified the image size, the systolic and diastolic periods, and the meaning of ‘stage 1’ (network 1) and ‘stage 2’ (linear transformation + network 2) in Figure 1. Within the linear transformation task, step (i) already mentioned the subtask of image interpolation to a standard spatial resolution (scaling), and the chosen range covers enough space for any tricuspid valve to be included. We have also clarified the value of data augmentation for both stages. To increase the test set, assess the generalization capability, and reduce the need to describe every set, we have expanded our whole cohort (number of images in parentheses) from 120 patients (3,570) to 140 patients (4,170) with similar conditions, increased the test set from 20 patients (600) to 28 patients (840), and performed 5-fold cross-validation.
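
For readers wondering what the linear transformation step involves, the rebuttal indicates reorientation plus interpolation to a standard spatial resolution (scaling) and a standard crop. Below is a minimal sketch of how such a similarity transform could be built from the two coarse stage-1 points, using scikit-image; the output size and pixel spacings are placeholder values, not the paper's settings.

```python
import numpy as np
from skimage.transform import SimilarityTransform, warp

def standardize_frame(frame, coarse_points, mm_per_px, out_size=256, target_mm_per_px=1.0):
    """Rotate, scale, translate, and crop a frame so the coarse TV points land
    in a standard position and orientation (illustrative parameters only).

    coarse_points : (2, 2) array [[x1, y1], [x2, y2]] from the stage-1 network.
    mm_per_px     : acquired in-plane pixel spacing of this cine.
    """
    p1, p2 = np.asarray(coarse_points, dtype=float)
    center = (p1 + p2) / 2.0
    d = p2 - p1
    angle = np.arctan2(d[1], d[0])         # orientation of the annulus line
    scale = mm_per_px / target_mm_per_px   # resample to a standard resolution

    # Compose: move the annulus midpoint to the origin, rotate/scale to the
    # standard orientation and resolution, then center it in the output crop.
    tform = (SimilarityTransform(translation=-center)
             + SimilarityTransform(rotation=-angle, scale=scale)
             + SimilarityTransform(translation=(out_size / 2.0, out_size / 2.0)))
    std = warp(frame, tform.inverse, output_shape=(out_size, out_size), preserve_range=True)
    return std, tform
```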

Comments on the ‘Experiments and Result’ section:

The accuracy in Table 1 was computed at the end of each stage and for three further iterations, but for clarity, the Euclidean distance error of each point (in mm) and the TAPSE error (in mm) are now computed in the same manner (instead of the ICC, which is already shown in Figure 3), to further extend the analysis. Supported by the increased training set (3,330 images), the increased test set (840 images), and the multi-fold cross-validation experiment, the reported accuracy is on par with expert human-level performance. The significance of the clinical-metric correlations can be found at the end of the section. For a clear visual representation of the results, we will provide animated views as supplementary material.
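
For concreteness, the per-point Euclidean distance error and the TAPSE error mentioned here could be computed roughly as in the sketch below, assuming TAPSE is taken as the displacement of the lateral tricuspid annulus point between end-diastole and end-systole (its standard definition); the variable names and frame indices are placeholders, not the paper's exact procedure.

```python
import numpy as np

def point_error_mm(pred, ref, mm_per_px):
    """Per-frame Euclidean distance (mm) between predicted and reference
    (T, 2) pixel trajectories of one TV point."""
    return np.linalg.norm(np.asarray(pred) - np.asarray(ref), axis=1) * mm_per_px

def tapse_mm(lateral_traj, ed_frame, es_frame, mm_per_px):
    """TAPSE as the displacement of the lateral TV annulus point between
    end-diastole and end-systole (frame indices assumed known)."""
    disp = np.asarray(lateral_traj)[es_frame] - np.asarray(lateral_traj)[ed_frame]
    return float(np.linalg.norm(disp)) * mm_per_px

# Example TAPSE error between automated and manual tracking (all inputs hypothetical):
# tapse_err = abs(tapse_mm(auto_lat, ed, es, dx) - tapse_mm(manual_lat, ed, es, dx))
```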

Comments on the ‘Discussion and Conclusion’ section:

We have renamed the section from ‘Conclusions’ to ‘Discussion and Conclusion’. As future work, we have included the assessment in a multi-vendor, multi-centre population and the application to other cardiac landmark annotation tasks.

Comments on the reproducibility:

All the source code for the tricuspid valve tracking method will be uploaded to the open-source version of the medical software Segment; this will enable researchers to see all details and provide opportunities to use and improve the work.


