
Authors

Tajwar Abrar Aleef, Ingrid T. Spadinger, Michael D. Peacock, Septimiu E. Salcudean, S. Sara Mahdavi

Abstract

Treatment planning in low-dose-rate prostate brachytherapy (LDR-PB) aims to produce an arrangement of implantable radioactive seeds that delivers a minimum prescribed dose to the prostate whilst minimizing toxicity to healthy tissues. There can be multiple seed arrangements that satisfy this dosimetric criterion, not all of which are deemed ‘acceptable’ for implant from a physician’s perspective. This leads to subjective plans, where the quality of treatment depends on the expertise of the planner. We propose a method that learns to generate consistent treatment plans from a large pool of successful clinical data (961 patients). Our model is based on conditional generative adversarial networks that use a novel loss function for penalizing the model on spatial constraints of the seeds. An optional optimizer based on a simulated annealing (SA) algorithm can be used to further fine-tune the plans if necessary (as determined by the treating physician). Performance analysis was conducted on 150 test cases, demonstrating results comparable to those of the manual plans. On average, the clinical target volume covered by 100% of the prescribed dose was 98.9% for our method compared to 99.4% for manual plans. Moreover, using our model, the planning time was significantly reduced to an average of 3 sec/plan (2.5 min/plan with the optional SA). In comparison, manual planning at our centre takes around 20 min/plan.
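The spatial-constraint loss referred to above is not detailed in the abstract. As a minimal sketch only, one way to penalize adjacent seeds on a binary seed grid is shown below; this is a hypothetical illustration, not the authors' exact L_adj formulation, and the (Z, Y, X) grid layout with the needle axis along X is an assumption.

```python
# Hypothetical sketch of an adjacency penalty on a (Z, Y, X) seed grid.
# 'pred' holds seed probabilities/indicators; this is NOT the paper's exact L_adj.
import numpy as np
from scipy.ndimage import convolve

def adjacency_penalty(pred: np.ndarray) -> float:
    """Sum over seed pairs that occupy directly neighbouring grid slots."""
    kernel = np.zeros((3, 3, 3))
    kernel[0, 1, 1] = kernel[2, 1, 1] = 1.0  # neighbours along Z
    kernel[1, 0, 1] = kernel[1, 2, 1] = 1.0  # neighbours along Y
    kernel[1, 1, 0] = kernel[1, 1, 2] = 1.0  # neighbours along X (assumed needle axis)
    neighbour_sum = convolve(pred, kernel, mode="constant")
    return float(np.sum(pred * neighbour_sum))  # large when adjacent slots are both occupied

# Toy example: two seeds in adjacent slots yield a non-zero penalty.
grid = np.zeros((5, 5, 5))
grid[2, 2, 2] = grid[2, 2, 3] = 1.0
print(adjacency_penalty(grid))  # 2.0 (each seed sees one occupied neighbour)
```

In a training setting this quantity would typically be expressed with the deep-learning framework's tensor operations so that it stays differentiable; the NumPy version above only illustrates what is being penalized.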

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_56

SharedIt: https://rdcu.be/cyhRb

Link to the code repository

https://github.com/tajwarabraraleef/TP-GAN

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    This paper proposes a generative adversarial network (GAN) to quickly create seed placement plans for low-dose-rate prostate brachytherapy treatment planning with the aim of reducing the subjectivity and variability associated with the planning procedure based on physician/centre preferences and experience. Compared to existing automatic planning approaches, this work proposes the incorporation of additional space constraints, restricting adjacent seed placements, into the loss function, and includes a method for further refining the plan in post-processing to account for additional clinical constraints with minimal time added. The paper compares the proposed approaches to three existing methods, as well as ground truth clinical plans, based on standard clinical parameters.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • Comparison to existing methods: The paper directly compares their approach to three existing methods on the same large dataset, evaluating all of these approaches on standard dosimetric metrics against ground-truth clinical plans.
    • Ablation study: The paper includes an ablation study to provide further insight into how the inclusion of various components (needle plans, data augmentation, adjacent seed restrictions) impacted performance.
    • Reproducibility: The paper includes a very clear explanation of the network and hyperparameters, as well as information about the set-up and training time, in addition to referring readers to an already established website for sharing the code.
    • Accessibility: Automation tools of this type are particularly helpful for centres with limited institutional experience, adding to the accessibility of the clinical procedure.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • Limited scope: This work assumes that a needle plan has already been created with the scope limited to the seed placement within these preplanned needles; however, the paper argues in its motivation that it aims to address some of the spatial constraint challenges, such as pelvic arch interference, which generally would dictate possible needle trajectories rather than seed placement.
    • Lack of dataset diversity/description: The paper also motivates the problem by describing the need to reduce the subjectivity and variability in planning associated with physician and institutional preference/experience; however, the dataset used only includes data from a single centre and the experience level of the physicians choosing the ground-truth plans to be delivered is not described. This raises the question of how the plans created would be received at other centres with different preferences.
    • Statistical significance: Although the paper says that statistical significance between methods was evaluated with paired t-tests, this does not seem to be the correct statistical approach for the type of analysis/data being evaluated, and these results are not clearly reported in the paper.
    • Limited novelty: The paper clearly acknowledges that many other studies have explored methods for automating this process, including the three existing methods that are used for comparison in the study (John et al. (2005), Nouranian et al. (2015), Anonymized). The specific application of this work offers some novelty through the incorporation of the GAN with a loss function accounting for spatial constraints but the approach itself offers limited novelty and incremental improvements on most metrics.
  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Overall, the reproducibility of this paper is good. The paper provides a clear description of the networks, data splits, and hyperparameters, in addition to the set-up and training time. The paper refers readers to an already established website for sharing the code, with the authors indicating that both training and evaluation code is shared. The dataset used for these experiments, however, does not have approval to be made publicly available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Lack of dataset diversity/description: I recommend working to better motivate the need for the automated tool, perhaps emphasizing the increased accessibility for centres with low institutional experience, rather than physician variability and preferences, given the limitation of a single institution dataset. A brief description of the experience level of the planners/physicians generating the ground-truths is also necessary to appropriately interpret the work.
    • Description of metrics/analyses: The description of the metrics that will be evaluated (first two sentences of the Methods) and statistical tests would be more appropriately placed in the Methods rather than the Results.
    • Table 1 formatting: Table 1 is slightly misleading at first glance, as it is common practice to bold the best value in a column (typically with statistical significance), whereas in the paper the values associated with the proposed methods are all bolded. Although this is stated in the caption, I would recommend an alternate formatting method to highlight the proposed methods (e.g. lightly shaded rows or thicker borders) and then bolding the values in each column that have been demonstrated to be statistically significantly better, providing a simple way to communicate this information (see next point).
    • Statistical significance: The paper evaluated the statistical significance between methods using paired t-tests; however, given the number of methods being evaluated, it would appear that an ANOVA with post-hoc analysis would be a more appropriate approach (see the sketch after this list). Further, the actual results of the statistical significance testing do not appear to be reported anywhere in the paper. They should be described in the text and/or added to Table 1.
    • Ablation study: For the ablation study, I cannot find what parameter the Dice coefficient is describing – is this the PTV V100%? Please clarify this in the text. In addition, other combinations of components should be added to the table (ex. the case where the needle plan and adjacent seed loss function are incorporated but without data augmentation) or the choice of specific scenarios should be justified.
    • Figure 2: Figure 2 is interesting and valuable for the reader. If possible, it would also be extremely interesting for the reader to see a case where the approach performed poorly.
    • Section headings: The Discussion & Conclusion mainly restates the objective and significance of the work with some conclusions, whereas much of the discussion and description of limitations is actually in the Results section. I recommend changing “Results” to “Results & Discussion” and changing “Discussion & Conclusion” to “Conclusions” alone.
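To make the statistical-significance suggestion above concrete, here is a minimal sketch of one possible repeated-measures analysis: a non-parametric analogue of ANOVA with post-hoc tests (a Friedman omnibus test followed by Bonferroni-corrected Wilcoxon signed-rank tests). The method names and V100 values below are synthetic placeholders, not results from the paper.

```python
# Sketch only: comparing several planning methods evaluated on the same patients.
# Data are synthetic placeholders; method names are illustrative.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
v100 = {m: 98.0 + rng.normal(0.0, 0.5, 150)          # hypothetical V100 (%) per patient
        for m in ["manual", "GAN", "GAN+SA", "SA_baseline"]}

# Omnibus test across all methods (non-parametric repeated-measures analogue of ANOVA).
chi2, p = stats.friedmanchisquare(*v100.values())
print(f"Friedman: chi2={chi2:.2f}, p={p:.3g}")

# Post-hoc: each automated method vs the manual plans, Bonferroni-corrected.
others = [m for m in v100 if m != "manual"]
for m in others:
    _, p_pair = stats.wilcoxon(v100["manual"], v100[m])
    print(f"manual vs {m}: corrected p={min(p_pair * len(others), 1.0):.3g}")
```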
  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    This work is of interest to the MICCAI community and is clinically useful, with comparison to a ground truth and other existing methods; however, the paper is hindered by disconnects between the stated motivation and the work performed, issues with the analyses, and its limited scope.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes to generate treatment plans for brachytherapy using a GAN. Given the planning target volume (more or less the segmented prostate) and the needle plan (number and position of needles), the system generates a plan (positions of the seeds inside the needles at discrete locations) that delivers a suitable dose (the same dose that was prescribed in all cases of the dataset). The method does not include any dose computation in the model. To deal with potential hotspots, an additional loss is introduced to penalize adjacent seeds. The results are post-processed, and simulated annealing may be used as a final stage. The model is evaluated on a large dataset.
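Since simulated annealing is mentioned as an optional final stage, the following is a generic SA refinement loop over a binary seed grid, given purely as a sketch of the technique: the cost function is a placeholder, and the paper's actual optimizer and its clinical dosimetric objectives are not reproduced here.

```python
# Generic simulated-annealing sketch over a 3D binary seed grid (Z, Y, X).
# The cost function is a placeholder, not the paper's dosimetric objective.
import math
import random
import numpy as np

def cost(seeds: np.ndarray) -> float:
    # Placeholder objective: count adjacent seeds along the last (needle) axis.
    return float(np.sum(seeds[..., :-1] * seeds[..., 1:]))

def refine(seeds: np.ndarray, iters: int = 5000, t0: float = 1.0) -> np.ndarray:
    current, cur_cost = seeds.copy(), cost(seeds)
    best, best_cost = current.copy(), cur_cost
    for i in range(iters):
        t = max(t0 * (1.0 - i / iters), 1e-9)            # linear cooling schedule
        cand = current.copy()
        z, y, x = (random.randrange(s) for s in cand.shape)
        cand[z, y, x] = 1 - cand[z, y, x]                # flip one candidate seed slot
        delta = cost(cand) - cur_cost
        if delta < 0 or random.random() < math.exp(-delta / t):
            current, cur_cost = cand, cur_cost + delta
            if cur_cost < best_cost:
                best, best_cost = current.copy(), cur_cost
    return best
```

A real refinement stage would of course preserve the required dose coverage and a sensible seed count; this loop only shows the accept/reject structure of SA.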

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
    • A GAN is developed to generate brachytherapy treatment plans
    • It provides results comparable to human-made plans, rapidly
    • It has been tested on a large number of cases.
  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
    • The model needs to be fed with the needle plan (number and position of the needles)
    • It does not integrate any dose computation (in the loss, for instance), and it is not clear how it could generalize to treatments with a different prescribed dose
    • Some (important) steps (needle plan generation and post-processing) are not described.
  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Data not available

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Abstract: OK

    Introduction:

    • Replace “planing” by “planning”
    • Regarding the literature, Bert, Visvikis, and colleagues have published very efficient methods in 2019 and 2021. You should probably check whether your criticisms apply to them and, if necessary, position your work with respect to these recent references: (1) Villa, M., Bert, J., Valeri, A., Schick, U., & Visvikis, D. (2021). Fast Monte Carlo-based Inverse Planning for Prostate Brachytherapy by Using Deep Learning. IEEE Transactions on Radiation and Plasma Medical Sciences. (2) Mountris, K. A., Visvikis, D., & Bert, J. (2019). DVH-based inverse planning using Monte Carlo dosimetry for LDR prostate brachytherapy. International Journal of Radiation Oncology, Biology, Physics, 103(2), 503-510.

    Methods:
    • The construction of the dataset is clear
    • A crucial input of the system is the needle plan; it would be important to explain in a few words how such a plan can be automatically generated.
    • The symmetry of plans in the dataset seems a very unusual property; could you comment on this?
    • An important issue is that your dose model is entirely implicit: this requires all plans in the dataset to have the same prescribed dose, which implies that the network cannot generate a plan for a slightly different prescribed dose. Could you comment on this?
    • The post-processing stage is poorly described: “it attempts to relocate the seeds”: how? “or remove them”: how is the dose prescription maintained when one (or more) seeds are removed? “a further uniformization stage is then used”: what is the method and its justification?

    Results:

    • V150 corresponds to over-treated regions, doesn’t it?
    • You mention a Dice coefficient; please explain what data it is computed on (the PTV and V100%?)
    • In radiation therapy, dose constraints must be fulfilled (on the prostate and on the organs at risk); it would be very interesting and useful to report the percentage of cases that do not fulfil the constraints (if any) and to discuss them.

    Discussion:

    • Since no US/MRI image is given to the network, the anatomical information is only conveyed by the PTV/CTV. Isn’t this a problem for organs at risk such as the urethra? (Or is the urethra explicitly included in the PTV?)
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper deals with an important problem and a significant evaluation has been performed. My concern is about the ability to generalize since the dataset contains patients having exactly the same prescribed dose.

  • What is the ranking of this paper in your review stack?

    2

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #3

  • Please describe the contribution of the paper

    The presented paper explores the use of deep learning for treatment planning of low-dose-rate brachytherapy. Using segmentation masks and the potential locations for needle insertion as input, a neural network is employed to create a treatment plan by finding brachytherapy seed positions that result in a desirable dose distribution. The proposed method is tested on a dataset of 961 patients and is shown to slightly outperform three other methods, two based on simulated annealing and one based on sparse dictionary learning, while also being considerably faster. When the generated treatment plans are further fine-tuned using simulated annealing, the generated plan quality approaches that of treatment plans manually designed by clinical experts.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    To my knowledge, this is the first study that aims to directly design brachytherapy treatment plans with the help of an encoder-decoder neural network architecture. In the process, the authors overcome several challenges posed by the specific application with the help of their domain knowledge. The proposed method is cleanly compared against the clinical standard and all introduced neural network components are benchmarked in an ablation study.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    While the paper is mostly clearly structured and written, there are a few ambiguities regarding the methodology and the conducted evaluation. Most notably, the authors only report dosimetry-based evaluation metrics for the comparison of their method against baseline algorithms, but not in the ablation study. Conversely, geometry-based metrics are only reported for the ablation study. I believe both types of metrics should be reported for either evaluation, as it is not apparent how the geometric and dosimetric evaluation metrics relate.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    In addition to a clear description of the neural network architecture in the manuscript, the authors have laudably indicated their intention to publish their code. Combining both information sources, a reader should be able to reproduce the authors’ experimental setup.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • Other groups have explored rule-based systems (e.g. Oud et al. (2020), “Fast and fully-automated multi-criterial treatment planning for adaptive HDR brachytherapy for locally advanced cervical cancer”, Radiotherapy and Oncology, 148, 143-150) and reinforcement learning (e.g. Shen et al. (2019), “Intelligent inverse treatment planning via deep reinforcement learning, a proof-of-principle study in high dose-rate brachytherapy for cervical cancer”. Physics in Medicine & Biology, 64(11), 115013) to facilitate fast, automated treatment planning in brachytherapy. The use of deep learning to generate treatment plans has also been explored for external beam radiotherapy. I suggest briefly mentioning these relevant trends in the introduction.

    • Please describe whether needle plans are also provided to the baseline methods (SA_Seattle, jSDL and SA_genN) and - if not - whether there would be any expected benefit in doing so.

    • In the ablation study, AUC and DSC were used as evaluation metrics. Were these metrics calculated using the ground-truth and GAN-derived binary seed matrices? Why were the dosimetry-based evaluation metrics from the benchmark comparison not also used here? Please clarify these ambiguities. Furthermore, the dosimetry-based evaluation should also be reported for the ablation study, as it is not apparent how the geometric and dosimetric evaluation metrics relate.

    • The authors should outline how exactly the needle and seed matrices were transformed to the same dimension as the planning CT.

    • In the abstract the authors write that “the clinical target volume covering 100% of the prescribed dose was 98.9%”. I believe this should say “the clinical target volume covered by 100% of the prescribed dose was 98.9%” (CTV V100%).

    • I am not sure what the authors mean with the term “prehistoric treatment plans”.

  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I believe the presented paper should be accepted as it constitutes the first study to explore the use of an encoder-decoder neural network architecture for brachytherapy treatment planning. While developing their method, the authors have shown much attention to domain-specific details that often get overlooked in such application-focused studies.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Confident but not absolutely certain




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    Overall enthusiasm from the reviewers, all recognizing the detailed experimental results, the clear description of the dataset used, and the very well detailed comparison to clinical plans determined by experts. They also emphasize the important clinical problem the GAN-like architecture is attempting to address, and recognize the original use of a GAN with an encoder-decoder neural network architecture for generating brachytherapy plans.

    Some weaknesses were noted. Reviewer #1 mentioned several issues that would need to be addressed in a revised version, such as the scope of the work, the description of the dataset, and adding statistical significance testing to the results. Other aspects include more consistent reporting of results (R#3) and concerns about generalization, since the dataset contains patients with exactly the same prescribed dose (R#2).

    The recommendation is therefore accept.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    4




Author Feedback

We thank all the reviewers for their time and detailed feedback. We have addressed the main concerns below.

*Why do the ablation study and the comparison study have different evaluation metrics? As we are also interested in capturing the centre’s planning preference, our ablation study aims to see how well we can replicate the manual plans. Hence, we prioritized geometry-based evaluation here, such as plan similarity in terms of seed location, number of seeds used, and adjacency of the seeds.

For the final comparison, dose coverage is of more importance. We no longer report similarity metrics, as multiple solutions exist and a manual plan is just one acceptable solution out of numerous valid ones. Hence, high similarity does not always mean a better plan, and vice versa. We also do not report adjacent seeds because the post-processing step removes them if they are present. We do report the number of seeds used, which can be compared to the manual plans to check whether, on average, more or fewer seeds are being used.
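For reference, dose-coverage metrics such as CTV V100% and V150% can be computed from a 3D dose grid and a binary target mask as in the minimal sketch below; the arrays and the 144 Gy prescription used here are illustrative assumptions, and the paper's dosimetry calculation itself is not reproduced.

```python
# Sketch: Vx coverage metrics from a 3D dose grid and a binary target mask.
# Dose values, mask, and the 144 Gy prescription are illustrative only.
import numpy as np

def v_coverage(dose: np.ndarray, target: np.ndarray, rx: float, level: float) -> float:
    """Percentage of the target volume receiving at least `level` times the prescription."""
    target = target.astype(bool)
    return 100.0 * np.count_nonzero(dose[target] >= level * rx) / np.count_nonzero(target)

rng = np.random.default_rng(1)
dose = rng.uniform(100.0, 250.0, size=(64, 64, 64))   # toy dose grid in Gy
ctv = np.zeros(dose.shape, dtype=bool)
ctv[16:48, 16:48, 16:48] = True                        # toy target mask

print("V100%:", v_coverage(dose, ctv, rx=144.0, level=1.0))   # coverage by the prescription dose
print("V150%:", v_coverage(dose, ctv, rx=144.0, level=1.5))   # hot-spot (over-dosed) fraction
```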

*Why are not all ablation settings reported? We do not report the other combinations because the features we introduced were meant to be added iteratively on top of one another. For instance, if the needle plan is not included, it does not matter whether we use the Ladj loss or augmentation; the network will not converge. Similarly, without augmentation, the model predicts too many seeds for Ladj to have a significant effect.

*What is the experience level of the physicians who created the ground truth? How would the created plans be received at other centres with different preferences? We have used high-quality plans from successful LDR-PB patients treated at our centre. These plans were created by our centre’s expert Medical Physicists and then approved/revised further by our Radiation Oncologist and Treating Physician.

As data from our centre is used to train the model, the same planning style might not be preferred in other centres. If that’s the case, then they just have to retrain the model using their dataset.

*Why does the dataset contain a single prescribed dose? Can the model generalize when a different dose is prescribed? The standard guideline for the monotherapy LDR prescription dose using 125I sources is 140-160 Gy. Our centre uses a prescribed dose of 144 Gy, and therefore our dataset contains a single prescribed dose. If other centres have a different standard for dose prescription, they would have to retrain the model on their own dataset for it to work.

*As no explicit contour of organs at risk (OAR) is provided, how does the model learn to avoid OARs? Our centre does not contour the OARs separately, as it is very challenging to do so from ultrasound images. However, expert planners at our centre implicitly consider the location of the OARs during planning. The approximate location of the urethra can be inferred from the PTV by following surrogate models available in the literature, and the approximate position of the rectum can also be estimated from the PTV using simple geometry. As all plans were made with the OARs in mind, our model essentially learns to avoid them. This is demonstrated by our results in Table 1.
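As a purely illustrative assumption (not necessarily one of the surrogate models the authors refer to), one very simple geometric surrogate for the urethra path is the per-slice centroid of the PTV mask:

```python
# Illustrative assumption only: urethra path approximated by per-slice PTV centroids.
import numpy as np

def centroid_path(ptv: np.ndarray):
    """Return (slice index, centroid row, centroid column) for each slice containing PTV."""
    path = []
    for z in range(ptv.shape[0]):
        rows, cols = np.nonzero(ptv[z])
        if rows.size:
            path.append((z, float(rows.mean()), float(cols.mean())))
    return path
```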

*The symmetry of plans in the dataset seems a very unusual property; could you comment on this? Our centre positions the template centrally with respect to the prostate and therefore creates symmetric plans. This is done because, during the implant procedure, it is easier to replicate the symmetric positioning of the prostate. If the initial imaging is done in an asymmetric way (and an asymmetric plan is created), physicians have to recreate that same asymmetric situation at the time of implant, which is more difficult.

*What was the Dice coefficient reported in the ablation study calculated on? As mentioned in the article, all plans in our dataset are represented as binary matrices. Dice was used to compare the similarity between the seed locations in the predicted and real plans.
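A minimal sketch of this Dice computation on two binary seed matrices (predicted vs. clinical plan); array shapes and contents are illustrative:

```python
# Dice similarity between two binary seed matrices (predicted vs. clinical plan).
import numpy as np

def dice(pred: np.ndarray, ref: np.ndarray) -> float:
    pred, ref = pred.astype(bool), ref.astype(bool)
    intersection = np.count_nonzero(pred & ref)
    total = np.count_nonzero(pred) + np.count_nonzero(ref)
    return 2.0 * intersection / total if total else 1.0

# Identical plans give 1.0; plans with no common seed slots give 0.0.
```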


