
Authors

Anirban Dutta, Anil Kamat, Basiel Makled, Jack Norfleet, Xavier Intes, Suvranu De

Abstract

Functional brain connectivity using functional near-infrared spectroscopy (fNIRS) during a pattern cutting (PC) task was investigated in physical and virtual simulators. Fourteen right-handed novice medical students were recruited and divided into separate cohorts for physical (N=8) and virtual (N=6) PC training. Functional brain connectivity measures were based on wavelet coherence (WCOH) of task-related oxygenated hemoglobin (HBO2) changes from baseline at the left and right prefrontal cortex (LPFC, RPFC), left and right primary motor cortex (LPMC, RPMC), and supplementary motor area (SMA). HBO2 changes within the neurovascular frequency band (0.01-0.07 Hz) from long-separation channels were used to compute average inter-regional WCOH metrics during the PC task. The coefficients of variation (CoV) of the WCOH metrics and the PC performance metrics were compared. WCOH metrics from short-separation fNIRS time series were compared separately. The partial eta squared effect size (with Bonferroni correction) between the physical and virtual simulator cohorts was highest for LPMC-RPMC connectivity. The percent change in the magnitude-squared WCOH metric also differed statistically (p<0.05) for LPMC-RPMC connectivity between the physical and virtual simulator cohorts, whereas the percent change in WCOH metrics from extracerebral sources did not differ at the 5% significance level. Higher CoV of both the LPMC-RPMC magnitude-squared WCOH metric and the PC performance metrics was found in the physical than in the virtual simulator. We conclude that interhemispheric connectivity of the primary motor cortex is the distinguishing functional brain connectivity feature between the physical and virtual simulator cohorts. The brain-behavior relationship based on the CoV of the LPMC-RPMC magnitude-squared WCOH metric and the FLS PC performance metric provided novel insights into the neuroergonomics of physical and virtual simulators that are crucial for validating virtual reality technology.
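
The two core quantities in the abstract can be made concrete with a short sketch. The following is a minimal illustration, not the authors' pipeline: classical Welch coherence from SciPy is used as a simple stand-in for the Morlet wavelet coherence (WCOH) used in the paper, and the sampling rate, segment length, and data are assumed synthetic placeholders.

```python
# Minimal illustration (not the authors' code): band-averaged coherence
# between two HBO2 time series, and the coefficient of variation (CoV)
# across a cohort. Welch magnitude-squared coherence stands in for the
# Morlet wavelet coherence (WCOH) used in the paper.
import numpy as np
from scipy.signal import coherence

FS = 10.0            # assumed fNIRS sampling rate in Hz (not stated here)
BAND = (0.01, 0.07)  # neurovascular frequency band from the abstract

def band_coherence(hbo2_a, hbo2_b, fs=FS, band=BAND):
    """Average magnitude-squared coherence of two HBO2 channels within `band`."""
    f, cxy = coherence(hbo2_a, hbo2_b, fs=fs, nperseg=int(fs * 200))
    mask = (f >= band[0]) & (f <= band[1])
    return cxy[mask].mean()

def coefficient_of_variation(values):
    """CoV = sample std / mean, relating connectivity and performance variability."""
    values = np.asarray(values, dtype=float)
    return values.std(ddof=1) / values.mean()

# Synthetic example: CoV of an inter-regional connectivity metric for a cohort.
rng = np.random.default_rng(0)
wcoh = []
for _ in range(8):  # e.g., the physical-simulator cohort (N=8)
    lpmc = rng.standard_normal(6000)               # 10 min at 10 Hz
    rpmc = lpmc + 0.5 * rng.standard_normal(6000)  # correlated channel
    wcoh.append(band_coherence(lpmc, rpmc))
print("cohort CoV of LPMC-RPMC connectivity:", coefficient_of_variation(wcoh))
```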

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87202-1_61

SharedIt: https://rdcu.be/cyhRg

Link to the code repository

N/A

Link to the dataset(s)

N/A


Reviews

Review #1

  • Please describe the contribution of the paper

    A pilot study in the field of surgical neuroergonomics studying potential differences in brain response due to different training. The task chosen is gauze cutting with endoscopic scissors.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The topic under investigation is fascinating and much needed, and its potential impact on surgical training is large.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Many experimental design, data processing, and analytical decisions are questionable. The draft appears a bit rushed; the introduction, discussion, and nomological validity are shallow, and the conclusions are virtually missing.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    Good. Except for minor details, it is easy to conceptualize the replication of the experiment and the reproducibility of the results. Either I missed it or it is not stated where or how the data will be made available.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    OVERALL COMMENTS

    The experimental design includes decisions that need some explanation: why was a between-subjects design preferred over a within-subjects design (with randomized training)? How was the group allocation randomized? Was any kind of training provided before execution? It seems that the task was only executed once; is this correct? Was this cross-sectional? If not, how many sessions? The analytical strategy appears almost circumstantial (the systemic information was not regressed out but instead studied separately; wavelet coherence analysis has not been validated for this purpose, in general and even less so in fNIRS; there is no clear hemodynamic process that I am aware of that resembles the Morlet mother wavelet; the decision to average defeats the purpose of a wavelet analysis; etc.).

    SUGGESTIONS TO IMPROVE THE DRAFT

    • Improve the background on the existing literature on neuropsychological transfer of knowledge. There is already literature evidencing that the brain will indeed learn differently; according to this wealth of literature, physical training (closer to the real task) was expected to result in higher ad hoc performance, but virtual training is more likely to provide generalizable (more transferable) skills due to its higher degree of abstraction.
    • The literature review on surgical neuroergonomics is also shallow, even if we restrict ourselves to fNIRS.
    • Report the subjects' previous training or skill levels, incl. familiarity with virtual environments, etc.
    • Report details of the virtual training conditions and environment; degree of immersiveness, naturalistic or other user interface, how the virtual scoring of the cut is actually comparable to the physical scoring of the cut, etc
    • Provide prior power analysis.
    • Where were the channels positioned exactly? Please show map and variability of positioning.
    • It is mentioned that BrainAtlas was used to estimate channel sensitivity, but it is not clear how registration was achieved.
    • If you were going to average, what was the original point of using wavelet analysis? Shouldn’t a simpler classical coherence analysis suffice?
    • The statistics seem to have been carried out with a predefined blind pipeline rather than as a conscious effort to delimit/quantify uncertainty, which is the real purpose of statistics. For example: what correction was used for the many multiple comparisons carried out? The Shapiro-Wilk p-value (or that of any other test, for the sake of it!) is taken at face value without actually looking at the data distribution through a histogram, Q-Q plot, etc. to decide how to proceed; confidence intervals, statistical power, and interaction terms are never questioned; and shouldn't ANCOVA have been used to ensure that potential differences at the start were accounted for? (A sketch of these points follows this list.) In general, the use of statistics is disappointing from a mathematical point of view.
    • The discussion is centered around whether some feature from the data analysis shows a statistical difference. As aforementioned, this is utterly circumstantial. By mere associative modelling there will ALWAYS be one (or more) statistic that exhibits a numerical difference! In fact, it is mathematically impossible not to. The statistical validity of establishing such a relation is unrelated to whether a true underpinning relation actually exists; let me emphasize that I am not asking for a causal analysis, I am just stating a mathematical fact. So the numerical exercise, without proper validation of the data analysis path on synthetic data, is meaningless. I cannot find in the discussion any justification for skipping validation of the data analysis pipeline, nor a nomological-validity statement on how the presented findings align with or contradict existing knowledge. The question is therefore not whether such a score exists (which seems to be the only thing the authors are seeking) but whether it conveys any suitable domain interpretation (to be precise, I am referring to the mathematical term "interpretation": a solution to a problem that reconciles a model and the observations, bounded by the uncertainty of the measurement system), which here is absent.
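
    To make the reviewer's statistical points above concrete, here is an illustrative sketch (not taken from the paper; all data and p-values are synthetic placeholders): inspect the distribution with a histogram and Q-Q plot rather than trusting the Shapiro-Wilk p-value alone, then adjust the family of p-values for multiple comparisons.

```python
# Illustrative sketch only (not from the paper): look at the data before
# trusting a normality test, then correct a family of p-values. The data
# and p-values below are synthetic placeholders.
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
from scipy import stats
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(1)
metric = rng.standard_normal(14)  # stand-in for one WCOH metric (N=14 subjects)

# 1) Inspect the distribution instead of taking Shapiro-Wilk at face value.
print("Shapiro-Wilk:", stats.shapiro(metric))
fig, (ax_hist, ax_qq) = plt.subplots(1, 2, figsize=(8, 3))
ax_hist.hist(metric, bins=7)
sm.qqplot(metric, line="45", fit=True, ax=ax_qq)
plt.show()

# 2) Correct the whole family of tests (e.g., one p-value per region pair).
pvals = [0.003, 0.021, 0.048, 0.30, 0.52]  # hypothetical per-pair p-values
reject, p_adj, _, _ = multipletests(pvals, alpha=0.05, method="holm")
print("Holm-adjusted p-values:", p_adj, "reject:", reject)
```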

    Minor:

    • Add space before every “(“
    • Ensure you add your standard deviation bars to Fig 3.
    • Provide cohort demographics
    • Suggestion: Provide exemplary fNIRS timecourse.
    • Follow closely the recent best practice recommendations by the SfNIRS board [Yucel et al (2021) Neurophotonics 8(1):012101-1]
    • Please avoid subjective appreciations such as “carefully mounted” or “made sure that…” and explicitly report the objective efforts carried out.
    • How was self-paced response dealt with?
    • Report the recruiting strategy as well as the inclusion/exclusion/elimination criteria.
    • How were saturation-related issues (e.g., apparent non-recordings, mirroring, etc.) dealt with?
    • Please provide a reference for the chosen pathlength factors. Also, it is mentioned that the values refer to "partial" pathlength factors and not the DPF (differential pathlength factor). Partial pathlength factors (PPF) are used when reconstruction models have several layers; what were these other layers, and what were the considered thicknesses? (See the standard formulation sketched after this list.) Many years ago, Strangman et al. published a layered reconstruction with partial pathlengths, but this is not implemented in HOMER as far as I am aware. Can the authors confirm?
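
    For context on the DPF vs. partial pathlength distinction raised above (standard fNIRS background, not taken from the paper): the differential pathlength factor enters the homogeneous modified Beer-Lambert law, whereas partial pathlengths enter its layered generalization.

```latex
% Homogeneous modified Beer-Lambert law: one DPF scales the
% source-detector separation d.
\Delta \mathrm{OD}(\lambda) =
  \left[ \varepsilon_{\mathrm{HbO_2}}(\lambda)\,\Delta[\mathrm{HbO_2}]
       + \varepsilon_{\mathrm{HHb}}(\lambda)\,\Delta[\mathrm{HHb}] \right]
  \, d \, \mathrm{DPF}(\lambda)

% Layered generalization: each layer i contributes its own partial
% pathlength L_i(\lambda) and its own concentration changes.
\Delta \mathrm{OD}(\lambda) =
  \sum_i \left[ \varepsilon_{\mathrm{HbO_2}}(\lambda)\,\Delta[\mathrm{HbO_2}]_i
              + \varepsilon_{\mathrm{HHb}}(\lambda)\,\Delta[\mathrm{HHb}]_i \right]
  \, L_i(\lambda)
```
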
  • Please state your overall opinion of the paper

    Probably accept (7)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The necessity for this type of study is critical, and yet there are not that many. So we have to make every effort in this direction count.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    5

  • Reviewer confidence

    Very confident



Review #2

  • Please describe the contribution of the paper

    This paper proposes using near-infrared spectroscopy to investigate the differences in learning associated with training in a physical surgical simulation environment versus training in a virtual surgical simulation environment. Identifying differences in learning associated with different simulation environments is an important area of work.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    This work addresses an important issue and provides an innovative way to assess the impact on learning of different training environments for surgery by examining brain activations during the learning process.

    The collected dataset seems to be rich, with fNIRS data on trainees performing simulation-based training for surgery.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    Clarity is lacking in the statistical testing.

    The paper does not provide sufficient analysis on the meaning of the results.

  • Please rate the clarity and organization of this paper

    Very Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The work leverages existing open-source frameworks AtlasViewer to measure cortical regions and HOMER3 to correct for motion.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    Major comments: I would expect changes in brain activation during surgical training to be significantly affected by skill level of the trainees. How can we know that participants were roughly all at the same skill level? Were all medical students from the same year of training? Do they have equal performance based on outcomes in the simulation environment (i.e. subjective assessment on the simulator)? Can this be accounted for in statistical testing?

    I believe this paper requires more clarity on how the statistical tests were performed. Were multiple ANOVAs performed? If so, there should be a correction for multiple tests, with some explanation. If not, there should be some explanation of how a single ANOVA and post hoc testing were performed. At present, it is unclear to me whether the observed significant difference is due to multiple tests.

    The primary result of this paper seems to be that the WCOH metric for LPMC-RPMC differed for trainees on the physical versus virtual simulators. What does this tell us about how the learning process differs on the two types of simulators? Are the differences observed meaningful in practice?

    Minor comments: How long did participants work with the system prior to measurement after baseline? How might this affect the results?

    Fig 1 (A) and (C) each contain 10 measures. Fig 2 (B) and (C) contain 8 measures and 9 measures, respectively. Why are there different numbers of measures used?

  • Please state your overall opinion of the paper

    borderline reject (5)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    While the idea behind this paper is innovative, it is unclear to me what the results mean and how factors such as skill may affect them.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    1

  • Reviewer confidence

    Confident but not absolutely certain



Review #3

  • Please describe the contribution of the paper

    This work investigates functional brain connectivity and compares changes thereof between two simulated training environments for an FLS task, namely pattern cutting. Two training platforms are compared: a conventional box trainer commonly used for FLS training modules and a virtual reality simulator.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    Great work is presented in this paper. Comparing physical vs. virtual training modules for FLS has been done before, but using functional brain connectivity as a criterion for comparison is original. This was explored before using fMRI, whereas the work in this paper uses fNIRS, which is more portable, less costly, and less encumbering to the user. Furthermore, although the sample size (number of study participants) is not very large, the experiment and equipment are very well constructed and the statistical analysis is scientific and sound.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The sample size for the two groups (8 and 6) is not large enough to draw statistical conclusions.

  • Please rate the clarity and organization of this paper

    Good

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The authors include sufficient details of the methods for reproducibility purposes. The equipment and technical details used are well described and referenced.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
    • The paper needs a proofread. There are several instances of syntactical, formatting, and grammatical errors.
    • A conclusion is needed for the paper. Authors should cut down on other parts of the paper and insert a conclusion that summarizes the work and its implications and reiterates the claims.
    • The authors should include visuals of the two training platforms, physical and virtual.
  • Please state your overall opinion of the paper

    strong accept (9)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    I am extensively versed in the literature on physical and virtual training platforms and approaches for surgical training, and on studies comparing them. The work presented here is well aligned with conventional approaches and experiments. It also adds a criterion for comparing the two platforms that is not very well addressed, i.e., monitoring functional brain activity during task completion.

  • What is the ranking of this paper in your review stack?

    1

  • Number of papers in your stack

    4

  • Reviewer confidence

    Very confident



Review #4

  • Please describe the contribution of the paper

    This paper studies the differences in functional brain connectivity between two cohorts performing a pattern cutting task on the FLS and VBLaST skill trainers. Brain connectivity was assessed using wavelet coherence (WCOH) and wavelet phase coherence (WPCO) metrics and compared to a baseline computed from a resting state.

  • Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

    The main strength of the paper is the use of a low-cost system to assess whether differences exist between the FLS box and the VBLaST skill trainer.

  • Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

    The main weaknesses of the paper are the absence of comparison with other systems and the lack of clarity. Secondly, the comparison between performance scores assessed with two different methods creates a bias that was not discussed.

  • Please rate the clarity and organization of this paper

    Poor

  • Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

    The relevant information needed to reproduce the work is provided in the paper.

  • Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

    The paper is difficult to understand due to the numerous acronyms used. Moreover, the first occurrence of fMRI is not defined (in the introduction). Two different acronyms are used for the coefficient of variation: CV in the abstract and CoV in Section 3.3.

    The performance scores were assessed with different methods on the FLS and VBLaST trainers, which creates a bias in the comparison. It would be more accurate to use the same assessment method, or at least to discuss this bias. In Fig. S2 (supplementary material), what is the meaning of each color?

    The results show statistical differences in LPMC-RPMC functional connectivity between the FLS box and VBLaST on the WCOH metric. This point is not discussed enough, especially because the main objective was to demonstrate the absence of differences between the two types of trainer boxes.

  • Please state your overall opinion of the paper

    borderline accept (6)

  • Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

    The paper presents interesting results; however, its clarity is poor and it lacks discussion.

  • What is the ranking of this paper in your review stack?

    4

  • Number of papers in your stack

    7

  • Reviewer confidence

    Somewhat confident




Primary Meta-Review

  • Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

    The study presents a novel portable neuroimaging approach for monitoring brain activation profiles to distinguish between bimanual training on a physical versus virtual surgical simulator, using fNIRS signals during a pattern cutting task. The study is performed on 14 novice medical students, demonstrating the feasibility of online monitoring of the brain activation during surgical training. The topic is of clinical interest, with high potential impact in the field of surgical training, and the study is well-motivated.

    Criticism of the paper is related to the clarity of the experimental design, data processing, and better discussion and analysis of the results. Feedback from the reviewers regarding details in the experimental design and statistical testing, clarification on the meaning of the results (statistical differences on the WCOH metric for LPMC-RPMC between FLSbox and VBLaST), addition of other relevant existing literature, and improvements to figures and text/proofreading should be incorporated in the final submission.

  • What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

    1




Author Feedback

We have provided a point-by-point response to the reviewers' comments in the Supplementary Materials. A few salient ones are provided below:

4. Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work. "Many experimental design, data processing, and analytical decisions are questionable. The draft appears a bit rushed; the introduction, discussion, and nomological validity are shallow, and the conclusions are virtually missing."

  • Reference for the experimental design, highlighted in the revised draft: Nemani A, Yücel MA, Kruger U, Gee DW, Cooper C, Schwaitzberg SD, De S, Intes X. Assessing bimanual motor skills with optical neuroimaging. Science Advances, 3 Oct 2018: eaat3807.

We have revised our draft with a one-way multivariate analysis of variance (one-way MANOVA) in SPSS version 27 (IBM) to improve the data processing and analytical decision making. We have also improved the introduction and discussion sections in the revised draft; a snippet is below. "Many prior works [3–7] have assessed surgical training using fNIRS, mainly to compare skill levels; however, we could not find prior works on portable neuroimaging comparing surgical training in physical versus virtual simulators. This is crucial since the surgical training field has seen a rapid emergence of new virtual training technologies, which are gradually replacing physical training simulators. Virtual simulators have been shown to improve the acquisition of skills [8], although the neural correlates of skill acquisition in physical versus virtual simulators are unknown."
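
For readers without SPSS, a one-way MANOVA of this kind can be sketched in Python with statsmodels. The data frame below is synthetic and the column names are hypothetical stand-ins for the WCOH metrics; this illustrates the test itself, not the authors' analysis.

```python
# Hedged sketch: one-way MANOVA comparing WCOH metrics between the two
# cohorts. Data and column names are synthetic placeholders.
import numpy as np
import pandas as pd
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "group": ["physical"] * 8 + ["virtual"] * 6,  # cohort sizes from the study
    "lpmc_rpmc": rng.standard_normal(14),         # placeholder WCOH metric
    "lpfc_rpfc": rng.standard_normal(14),         # placeholder WCOH metric
})

# Do the connectivity metrics jointly differ between cohorts?
fit = MANOVA.from_formula("lpmc_rpmc + lpfc_rpfc ~ group", data=df)
print(fit.mv_test())  # Wilks' lambda, Pillai's trace, etc.
```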

Nomological validity is now addressed in the Discussion section based on our related works; a snippet is below. "Here, nomological validity can be derived from prior work [19] showing that cortical functional MRI variability in the parietal cortex explained movement extent variability. In our study, novice subjects performed more variable movements in the physical simulator than in the virtual simulator, which was related to higher variability in the functional connectivity metrics in the physical simulator. Our results also align well with prior work using whole-brain imaging [1] that demonstrated the necessity of modulating interhemispheric inhibition (primary motor cortices) for bimanual coordination."

"The analytical strategy appears almost circumstantial (the systemic information was not regressed out but instead studied separately; wavelet coherence analysis has not been validated for this purpose, in general and even less so in fNIRS; there is no clear hemodynamic process that I am aware of that resembles the Morlet mother wavelet; the decision to average defeats the purpose of a wavelet analysis; etc.)."

  • References are provided for the Morlet wavelet and the wavelet decomposition; one is listed here: Duan L, Zhao Z, Lin Y, Wu X, Luo Y, Xu P. Wavelet-based method for removing global physiological noise in functional near-infrared spectroscopy. Biomed Opt Express. 2018;9(8):3805-3820. The wavelet approach allowed mean and variance (CoV) calculations. The crucial brain-behavior correspondence was found based on the CoV [19], i.e., the relative variability in interhemispheric LPMC-RPMC functional connectivity is postulated as the "neural correlate" of the relative variability in performance.

  • Report details of the virtual training conditions and environment; degree of immersiveness, naturalistic or other user interface, how the virtual scoring of the cut is actually comparable to the physical scoring of the cut, etc

  • We have added references from prior works; one reference: Sankaranarayanan G, Lin H, Arikatla VS, Mulcare M, Zhang L, Derevianko A, Lim R, Fobert D, Cao C, Schwaitzberg SD, Jones DB, De S. Preliminary face and construct validation study of a virtual basic laparoscopic skill trainer. J Laparoendosc Adv Surg Tech A. 2010.


