
Authors

Joëlle Ackermann, Matthias Wieland, Armando Hoch, Reinhold Ganz, Jess G. Snedeker, Martin R. Oswald, Marc Pollefeys, Patrick O. Zingg, Hooman Esfandiari, Philipp Fürnstahl

Abstract

Computer-assisted orthopedic interventions require surgery planning based on patient-specific three-dimensional anatomical models. The state of the art has addressed the automation of this planning process either through mathematical optimization or supervised learning, the former requiring a handcrafted objective function and the latter sufficient training data. In this paper, we propose a completely model-free and automatic surgery planning approach for femoral osteotomies based on Deep Reinforcement Learning which is capable of generating clinical-grade solutions without needing patient data for training. One of our key contributions is that we solve the real-world task in a simulation environment tailored to orthopedic interventions based on an analytical representation of real patient data, in order to overcome convergence, noise, and dimensionality problems. An agent was trained on simulated anatomy based on Proximal Policy Optimization and inference was performed on real patient data. A qualitative evaluation with expert surgeons and a complementary quantitative analysis demonstrated that our approach was capable of generating clinical-grade planning solutions from unseen data of eleven patient cases. In eight cases, a direct comparison to clinical gold standard (GS) planning solutions was performed, showing our approach to perform equally well or better in 80 percent of the cases for surgeon 1 and 100 percent for surgeon 2.
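
To make the pipeline described in the abstract concrete, the following is a minimal sketch of a simulated planning environment trained with PPO. It is not the authors' code: the environment class, the state and action dimensions, and the toy reward are hypothetical placeholders standing in for the analytical anatomy representation and the clinical reward terms.

    # Minimal sketch (not the authors' implementation) of training a PPO agent
    # purely in a simulated planning environment, then using it for inference.
    import numpy as np
    import gymnasium as gym
    from gymnasium import spaces
    from stable_baselines3 import PPO


    class SimulatedOsteotomyEnv(gym.Env):
        """Toy stand-in for a simulation environment built on an analytical
        (e.g. ellipsoid-based) representation of the anatomy."""

        def __init__(self):
            super().__init__()
            # Hypothetical low-dimensional state of the analytical anatomy model.
            self.observation_space = spaces.Box(-1.0, 1.0, shape=(16,), dtype=np.float32)
            # Hypothetical continuous cut adjustments (position/orientation).
            self.action_space = spaces.Box(-1.0, 1.0, shape=(6,), dtype=np.float32)
            self.state = None

        def reset(self, seed=None, options=None):
            super().reset(seed=seed)
            # Each episode samples a new simulated anatomy instead of patient data.
            self.state = self.np_random.uniform(-1.0, 1.0, size=16).astype(np.float32)
            return self.state, {}

        def step(self, action):
            # Placeholder transition and reward; the real reward would combine the
            # clinical planning criteria (sphericity, no-go zones, step-off, ...).
            self.state = np.clip(self.state + 0.05 * np.resize(action, 16), -1.0, 1.0).astype(np.float32)
            reward = -float(np.linalg.norm(self.state))
            terminated = bool(np.linalg.norm(self.state) < 0.1)
            return self.state, reward, terminated, False, {}


    if __name__ == "__main__":
        model = PPO("MlpPolicy", SimulatedOsteotomyEnv(), verbose=0)
        model.learn(total_timesteps=10_000)  # training uses only simulated anatomy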

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_52

SharedIt: https://rdcu.be/cyhQ7

Link to the code repository

https://caspa.visualstudio.com/CARD%20public/_git/MICCAI21_DRL_FHRO

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    An automatic approach was proposed for surgical planning of femoral head reduction osteotomy (FHRO) based on deep reinforcement learning (DRL) for young patients with Legg-Calvé-Perthes (LCP) disease. This is probably the first report on FHRO planning using DRL. The method relies on an analytical representation of real patient data in a simulation environment tailored to FHRO.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    1. While Legg-Calvé-Perthes (LCP) is the most common orthopedic hip disorder in younger children, FHRO is rarely performed due to its underlying complexity.
    2. Clinically, planning a FHRO surgery is complex, even with modern computer-assisted 3D surgical planning methods.
    3. Not much FHRO data is available for training. The novelty of this work is therefore that surgical planning is based on an analytical representation of real patient data, which is probably more appropriate for this FHRO application than a typical deep-learning approach trained on large amounts of clearly defined ground truth.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    1. It is unclear how many patients' data were used to derive the analytical models. Reference #1 is blinded.
    2. Fig. 2 in the Results section shows a comparison between the DRL result and the gold standard. The osteotomized area in the gold standard (GS) appears considerably larger than the one planned by DRL, yet Table 1 marks it "B", indicating better than GS. It is unclear how this judgement was made by S1 and S2.
    3. It is unclear how to clinically identify (or distinguish) the necrotic bone from the "good" bone based only on the 3D model. It is also unclear whether the cross-sectional CT images are used in the proposed approach.
    4. It is unclear how to transfer the plan to the patient at the time of the surgery.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Acceptable

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    1. This is a very well-written article.
    2. I think the proposed DRL is appropriate for planning rare surgeries like FHRO.
    3. While the no-go zone, the shape of the reconstructed femoral head, and keeping the residual articular step-off to a minimum can be clearly assessed on 3D models, it is unclear how the wedge, which should comprise most of the necrotic bone while preserving the intact bone and cartilage, is determined based only on 3D models. This needs to be clarified.
    4. The evaluation criteria used by surgeons S1 and S2 to compare the planned outcomes against the gold standard need to be clarified. Simply stating better or worse is not enough.
  • Please state your overall opinion of the paper

    accept (8)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Its novelty in this specific FHRO clinical application.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain



Review #2

  • Please describe the contribution of the paper

    This work proposed a model-free and automatic surgery planning method for femoral osteotomies based on deep reinforcement learning.

    The main contribution of this study is that the authors solved the real-world task in a simulation environment designed based on an analytical representation of real patient data. In this simulation environment, deep reinforcement learning can be applied.

    The authors verified the proposed method with expert surgeons' grading. The results show that the proposed method performed equally well as or better than the gold standard planning solutions created by surgeons.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    (1) The paper is well structured, and the clinical background and aim are stated clearly.

    (2) This work is a good example of applying deep reinforcement learning (DRL) to computer-assisted intervention problems. Using a general learning method (such as DRL) to solve real-world problems is a growing trend (e.g., AlphaGo, AlphaZero). This paper presents a similar idea that will be interesting for the MICCAI community.

    (3) The authors provided a good evaluation of the proposed method. The results show that the proposed method performed equally well as or better than the gold standard planning solutions created by surgeons.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    (1) The authors simplify the orthopedic surgery planning problem using an analytical representation. It is unclear whether this representation is reasonable, especially for IZ1, IZ2, and NZ, because the pathological femoral head is irregular.

    (2) Although the idea is good, the authors introduce deep reinforcement learning only very briefly, so it may be difficult for readers to follow this study (their training code is not available).

    (3) In general, it is not easy to train deep reinforcement learning models. The authors did not give enough information to explain why their training works, which should be added. 1) How do the authors deal with the "reward hacking" problem, which often occurs in DRL? 2) The rewards range from -40 to 2, which suggests the agent does not often receive positive rewards. Why does the agent still learn well?

    (4) The authors combine four different goals; however, these goals have different units. Directly combining goal terms with different units is not reasonable.

    (5) The authors did not provide any training curves (such as cumulative reward or Q-value), which are important for showing how the agent learns the planning.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Since the code is not available and the implementation details were not provided, it is hard for readers to follow this work.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major: (1) Provide more information about why the agent can learn the planning well: A - How do the authors deal with the "reward hacking" problem, which often occurs in DRL? B - The rewards range from -40 to 2, which suggests the agent does not often receive positive rewards. Why does the agent still learn well?

    (2) The authors combine four goals directly, which is not reasonable because the goals have different units. Normalization should be performed before combination.

    (3) The authors should provide the cumulative rewards or Q-value curves of the training process, which will be interesting for the readers.

    (4) More visualization of the planning results will make the paper more convincing.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work proposed a model-free and automatic surgery planning method for femoral osteotomies based on deep reinforcement learning. It is a good example of applying deep reinforcement learning (DRL) to computer-assisted intervention problems, which will be interesting for the MICCAI community.

    The authors verified the proposed method with expert surgeons' grading. The results show that the proposed method performed equally well as or better than the gold standard planning solutions created by surgeons.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    3

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    In this study, the authors presented the first successful application of DRL to orthopedic surgery planning.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    A qualitative evaluation with expert surgeons and a complementary quantitative analysis demonstrated that the approach was capable of generating clinical-grade planning solutions from unseen data of eight patients.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The network was evaluated on unseen CT-reconstructed 3D models of 8 LCP patients. The sample size seems too small. How is the overfitting problem addressed?

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Difficult

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    In this paper, a completely model-free and automatic surgery planning approach based on deep reinforcement learning is proposed for femoral osteotomies, capable of generating clinical-grade solutions without needing patient data for training.

    The network was evaluated on unseen CT-reconstructed 3D models of 8 LCP patients. The sample size seems too small. How is the overfitting problem addressed?

    Mathematically, a reinforcement learning problem is described as a Markov Decision Process; please compare this formulation with other methods, such as class-incremental learning.

    The paper states that Proximal Policy Optimization (PPO) [25] performs best for this problem, defining the objective J as a weighted average between an adaptive KL penalty and a clipping mechanism. Why does it perform best?
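
    For reference, the two components this question refers to have standard forms in the PPO literature (generic notation, not the paper's exact equation; $r_t(\theta)$ is the probability ratio between the new and old policy and $\hat{A}_t$ the advantage estimate):

        L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\big[\min\big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}(r_t(\theta),\,1-\epsilon,\,1+\epsilon)\,\hat{A}_t\big)\big]

        L^{KLPEN}(\theta) = \hat{\mathbb{E}}_t\big[r_t(\theta)\,\hat{A}_t\big] - \beta\,\hat{\mathbb{E}}_t\big[\mathrm{KL}\big(\pi_{\theta_{\mathrm{old}}}(\cdot\mid s_t)\,\|\,\pi_\theta(\cdot\mid s_t)\big)\big]

    with $\epsilon$ the clipping parameter and $\beta$ the adaptive KL coefficient; the paper reportedly defines J as a weighted average of these two terms, so the empirical question of why this combination outperforms either variant alone remains for the authors to discuss.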

    How was the model validated?

  • Please state your overall opinion of the paper

    Probably reject (4)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    Concerns about the reproducibility of the paper.

  • What is the ranking of this paper in your review stack?

    3

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The paper presents a novel application of model-free deep reinforcement learning for surgical planning tailored to orthopaedic interventions, based on an analytical representation of real patient data in a simulation environment to overcome convergence, noise, and dimensionality problems. A qualitative evaluation with expert surgeons and a quantitative analysis were performed on unseen data of eight patients. The clinical need and complexity of the procedure are well motivated, and the paper is well written and well structured.

    The main criticisms of the paper concern the validation sample size, the analysis and discussion of the results, missing details in the methodology, and reproducibility. The following points should be addressed in the rebuttal:

    • Justification for number of samples used in the validation experiment
    • Clarification regarding discussion of results, including evaluation criteria for expert surgeons vs GS
    • Missing details in the methodology, including training of the DRL model (its limitations, and why it works), and some clinical details (identification of necrotic bone from good bone)
  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2




Author Feedback

We thank all reviewers for their constructive and supportive feedback, particularly acknowledging the novelty and creativity of our approach. We will highlight the major points suggested by the meta-review in the following.

Reproducibility (R2,R3): We will make the source code, pre-trained models, training curves and the analytical representation of all patient cases publicly available by including a link in the final paper.

Justification for number of test set samples (R1,R3): We agree that our test set of 8 patients appears small, but we consider it sufficient for a CAI paper (see reviewer guide). As described in Sec. 2.3, training was performed purely on simulated data without real patient data. This renders overfitting (R3) less critical and makes our method particularly suited for rare procedures such as FHRO. Note that even the largest radiological FHRO studies (e.g. [Dror Paley, Orth. Clin. NA 2011]) have study sizes of at most 20 patients, collected from multiple centers over the course of 5 years. We will add 3 more patient cases (paper + public repository) which were originally excluded because the patients did not undergo surgery and thus no GS was available. These cases have been evaluated by the surgeons and classified as 'acceptable'.

Clinical identification of the necrotic area (R1,R2): In our study, we followed the same procedure for cartilage identification that is established in our institution for computer-aided surgeries in general. The surgeons selected the necrotic bone area manually on the segmented 3D bone surface (recognized by bump-like irregularities) and used the original CT data as decision support. For the patient set, this information was then transferred into the analytical ellipsoid and Gaussian representation.

Clarification of surgeons’ evaluation criteria (R1,R2): Thanks for pointing this out. The criteria used by the surgeons were only mentioned in the introduction: spherical geometry, bringing the intact bone into the load bearing zone, intact no-go zones, minimal residual articular step-off and sufficiently large neck pillar. We will state in the results section that the evaluation was performed based on these criteria. We will add a small image of the DRL solution of each case to Tab. 1 and provide larger high-resolution images in the suppl. material.

Missing details about DRL method and clinical evaluation criteria (R2,R3): Our reward range of -40 to 2 represents the range over which the agent optimizes; a negative reward value does not necessarily represent a penalty. Therefore, the sign of the reward value does not reflect how well the agent learns (R2). Regarding the objective function, we agree with R2 that the aggregation of the goals requires caution. In our setting, pre-normalization was not performed, since the weights in Eq. (4) account for both the varying value ranges and the term importance. Regarding our agent choice, we found PPO to perform best in terms of learning progress, achieved rewards, and low standard deviation. We will add a short explanation of these details to the methodology. Concerning reward hacking, our reward function was designed based on well-established clinical criteria, and each component was tested individually by capping the remaining rewards. In addition, we will draw the reader's attention to the reward hacking problem in the discussion and refer to relevant literature.
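
To illustrate the weighted aggregation described above, a minimal sketch follows (hypothetical goal names, values, and weights, not the paper's Eq. (4)); the weights absorb both the different units and the relative importance of each term:

    # Sketch of a weighted-sum reward over heterogeneous planning goals.
    # Goal names, value ranges, and weights are hypothetical placeholders.
    def combined_reward(goals, weights):
        return sum(weights[name] * value for name, value in goals.items())

    reward = combined_reward(
        goals={"sphericity": 0.92, "step_off_mm": 1.4, "no_go_violation": 1.0, "neck_pillar_mm": 11.0},
        weights={"sphericity": 2.0, "step_off_mm": -1.0, "no_go_violation": -40.0, "neck_pillar_mm": 0.1},
    )  # the weights trade off units (mm vs. unitless scores) against clinical importance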

Simplification through analytical representation (R2): To show that our representation is reasonable, we measured the error between the ellipsoid fit and the original femoral head shape and found an average closest-point RMSE of 0.9 mm (a sentence will be added to the manuscript). We further evaluated the difference between the manually defined and analytically modeled necrotic zone for one example case, yielding a closest-point RMSE of 1.3 mm.
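
For clarity, the closest-point RMSE quoted above can be computed as in the following generic sketch (not the authors' evaluation code; it assumes points sampled on the fitted ellipsoid and the vertices of the original surface mesh are already available as arrays):

    # Closest-point RMSE between points on a fitted analytical surface and the
    # vertices of the original mesh.
    import numpy as np
    from scipy.spatial import cKDTree

    def closest_point_rmse(fitted_points, reference_vertices):
        # Distance from each fitted point to its nearest reference vertex.
        distances, _ = cKDTree(reference_vertices).query(fitted_points)
        return float(np.sqrt(np.mean(distances ** 2)))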




Post-rebuttal Meta-Reviews

Meta-review # 1 (Primary)

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    The paper presents the first use of DRL for femoral head reduction osteotomy (FHRO) planning, a clinically complex procedure to plan, so the study is well motivated. The rebuttal addresses the main critiques regarding sample size, the surgeons' evaluation criteria, details of the DRL model, and clinical details (identification of the necrotic area). Concerns regarding reproducibility will be addressed by making the code and pre-trained models publicly available upon acceptance.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2



Meta-review #2

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This paper, titled “A new Approach to Orthopedic Surgery Planning using Deep Reinforcement Learning and Simulation” was reviewed by 1 clinician and 2 engineering students and received somewhat polarizing reviews:

    R1: strengths - clinical relevance, novelty; weaknesses - clarity, clinical translation
    R2: strengths - clinical relevance, writing quality, evaluation; weaknesses - missing technical details
    R3: strengths - evaluation, clinical relevance; weaknesses - small dataset leading to questionable clinical significance

    All agreed that the writing quality is excellent and that the work is clinically relevant. The primary AC provided 3 points for the authors to focus on in their rebuttal. In it, the authors stated that the "source code, pre-trained models, …" will be made publicly available by including a link in the final paper if accepted, thus the issue of reproducibility is addressed. It should also be noted that, based on the Reviewer Guidelines (https://miccai2021.org/en/REVIEWER-GUIDELINES.html), this paper falls under the category of "2. Demonstration of clinical feasibility, even on a single subject/animal/phantom", thus this AC is satisfied with the rebuttal in this regard. For the rest of the rebuttal, the authors provided sufficient detail on how the remaining concerns may be addressed adequately in the revised manuscript.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    2



Meta-review #3

  • Please provide your assessment of the paper taking all information into account, including rebuttal. Highlight the key strengths and weaknesses of the paper, clarify how you reconciled contrasting review comments and scores, indicate if concerns were successfully addressed in the rebuttal, and provide a clear justification of your decision. If you disagree with some of the (meta)reviewer statements, you can indicate so in your meta-review. Please make sure that the authors, program chairs, and the public can understand the reason for your decision.

    This manuscript received two very good evaluations (revs #1 and #2) and a mildly negative one (#3), which highlights the value of the work despite some flaws and required clarifications. Most of the required clarifications were properly addressed by the authors in the rebuttal. The work is valuable and should be considered for publication at MICCAI.

  • After you have reviewed the rebuttal, please provide your final rating based on all reviews and the authors’ rebuttal.

    Accept

  • What is the rank of this paper among all your rebuttal papers? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    10


