
Authors

Carlos A. Loza, Laura L. Colgin

Abstract

We propose a generative model for single-channel EEG that incorporates the constraints experts actively enforce during visual scoring. The framework takes the form of a dynamic Bayesian network with depth in both the latent variables and the observation likelihoods: while the hidden variables control the durations, state transitions, and robustness, the observation architectures parameterize Normal-Gamma distributions. The resulting model allows for time series segmentation into local, reoccurring dynamical regimes by exploiting probabilistic models and deep learning. Unlike typical detectors, our model takes the raw data (up to resampling) without pre-processing (e.g., filtering, windowing, thresholding) or post-processing (e.g., event merging). This not only makes the model appealing to real-time applications, but it also yields interpretable hyperparameters that are analogous to known clinical criteria. We derive algorithms for exact, tractable inference as a special case of Generalized Expectation Maximization via dynamic programming and backpropagation. We validate the model on three public datasets and provide support that more complex models are able to surpass state-of-the-art detectors while being transparent, auditable, and generalizable.

Link to paper

DOI: https://doi.org/10.1007/978-3-030-87240-3_53

SharedIt: https://rdcu.be/cyl6t

Link to the code repository

https://github.com/carlosloza/DNDBN

Link to the dataset(s)

https://doi.org/10.5281/zenodo.2650142

Reviews

Review #1

Please describe the contribution of the paper

The authors have proposed a deep neural dynamic Bayesian network to simulate the visual EEG scoring of experts. The model has been validated on a small publicly available dataset and has been compared with some other methods in the literature.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The method is novel and very interesting.
- The topic of paper is clinically significant.
- The evaluation metrics are sufficient.
- The data collection, processing, and division methods are explained properly.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- No statistical analysis has been carried out. Moreover, only the average has been reported in Table 1.
- There is no discussion about the limitations of the study.
- The proposed method has so many hyper-parameters. This might have negative impact on its practical value.
- The number of cases is very small.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Based on the text and the checklist, the paper provides sufficient details about the models, datasets, and evaluation.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

1- Please add a proper statistical analysis. 2- Please add a measure of spread to Table 1. 3- Please discuss the limitations of the work, especially the large number of hyper-parameters and the small number of subjects.
Please state your overall opinion of the paper

accept (8)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This is a very good paper with a strong technical innovation.
What is the ranking of this paper in your review stack?

1
Number of papers in your stack

5
Reviewer confidence

Confident but not absolutely certain

Review #2

Please describe the contribution of the paper

The paper describes the application of a hidden semi-Markov model combined with a neural network for sleep spindle detection. The likelihood distribution parameters are generated by the neural network while the sleep sequence is modeled by the HSMM.
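
To make the hidden semi-Markov structure mentioned above concrete, the following is a minimal toy sketch of sampling from a two-regime explicit-duration model. It is not the authors' implementation: the function name `sample_hsmm`, the uniform duration distributions, and the fixed Gaussian emissions are all assumptions made for illustration, and the neural network that generates the likelihood parameters in the paper is deliberately left out.

```python
import random

def sample_hsmm(n_samples, durations, means, stds, seed=0):
    """Sample labels and observations from a toy two-regime explicit-duration HSMM.

    durations[k] lists the allowed segment lengths (in samples) of regime k,
    drawn uniformly here (an assumption). Emissions are Gaussian with a
    regime-specific mean and standard deviation. With two regimes and no
    self-transitions, the regimes simply alternate.
    """
    rng = random.Random(seed)
    labels, obs = [], []
    state = 0
    while len(labels) < n_samples:
        d = rng.choice(durations[state])       # explicit duration draw
        for _ in range(d):
            labels.append(state)
            obs.append(rng.gauss(means[state], stds[state]))
        state = 1 - state                       # forced switch to the other regime
    return labels[:n_samples], obs[:n_samples]

# Hypothetical settings: regime 0 is long background, regime 1 is a short event.
labels, obs = sample_hsmm(
    200,
    durations={0: range(20, 60), 1: range(5, 15)},
    means=[0.0, 0.0],
    stds=[1.0, 3.0],
)
```

The explicit duration draw is what separates a semi-Markov model from a plain HMM, whose segment lengths are implicitly geometric.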
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

The paper presents an interesting approach to sleep spindle detection using a combination of graphical modeling and neural networks.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

The model description is dense and difficult to follow. Details regarding the specific GEM updates and neural network training are omitted.

The results section would benefit from the addition of subsections to make the experiments clearer. In addition, some models seem to be left out of some analyses.
Please rate the clarity and organization of this paper

Satisfactory
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The methods are complicated and not completely described. Without code the methods would be difficult to reproduce. The dataset appears to be public.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

While the overview of the HSMM graphical model seems complete enough for someone with background in graphical modeling to understand, details regarding the inclusion of the neural network in the training process are omitted. It is unclear if the network is trained as part of the GEM algorithm or separately from the GEM procedure.

Some discussion of integration of DNNs with graphical models should probably be included in the introduction. For example, HMM-DNN systems have been in use in speech recognition for some time (Deep Neural Networks for Acoustic Modeling in Speech Recognition). In these methods the scores from the neural network are used in the HMM by Bayes rule rescoring or other post hoc approaches. Instead, the authors opt to use the neural network to predict parameters of the likelihood distributions, which is an interesting approach. However, more direct comparison to alternatives should probably be discussed in greater detail.
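
For readers unfamiliar with the distinction drawn here, the two ways of coupling a network to the model can be written side by side. The notation is generic and assumed for illustration (observation x_t, hidden state s, network g with weights phi); it is not taken from the paper under review.

```latex
% Hybrid HMM-DNN: the network posterior is turned into a scaled likelihood via Bayes' rule
p(x_t \mid s) = \frac{p(s \mid x_t)\, p(x_t)}{p(s)} \;\propto\; \frac{p(s \mid x_t)}{p(s)}

% Alternative noted above: the network outputs the parameters of the likelihood itself
p(x_t \mid s = k) = f\big(x_t ;\, \theta_k\big), \qquad \theta_k = g_\phi(\cdot)
```

In the first form, p(x_t) does not depend on the state and can be dropped during decoding. In the second, the network is part of the likelihood, so its weights enter the model's training objective directly.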

The results are presented in an undivided section with no sub-sections. This makes the results section difficult to read. Some restructuring would improve clarity.

Table 2 is not very informative, as NLL is not an interpretable metric and cannot be applied to other models.

Similarly, Table 3 seems to show very low accuracies which calls the efficacy of the proposed method into question.

The Wendt, Martin, and Parekh baselines are not discussed. In addition, in the text they are referred to by citation number whereas in Table 1 they are referred to by author, making it harder to understand which results correspond to which approach. These models also seem to be excluded from the results on page 8.

The HMM model is not included in table 1.

Figure 3 is very difficult to interpret. It is unclear if figure 3B shows learned duration distributions or distributions computed from the dataset. Furthermore, it is not clear why histogram values go above 1, as the distribution appears to be discretized.
Please state your overall opinion of the paper

probably reject (4)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

While the approach seems novel and interesting, the methods are not clearly described and some details appear to be omitted. The results section is not well organized. The paper would greatly benefit from restructuring to make details clearer and including missing details regarding training.
What is the ranking of this paper in your review stack?

5
Number of papers in your stack

5
Reviewer confidence

Very confident

Review #3

Please describe the contribution of the paper

This paper proposes a deep Neural Dynamic Bayesian Network to detect sleep spindles using EEG Data. This model can be applied without preprocessing of data and thus makes the model appealing to real-time applications and provides interpretable hyperparameters that correspond to known clinical criteria.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.

- It covers theoretical foundations for hyperparameter settings.
- Robustness against artifacts.
- Evaluation on three datasets.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.

Why is Parekh’s model showing high MCC in some subjects? It would help to investigate this and compare your model's behavior in those instances.
Please rate the clarity and organization of this paper

Very Good
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

Public dataset is used.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html

A very detailed theoretical work is proposed. You could also evaluate this network with different pre-processing steps and describe the conditions under which your model can fail and a little pre-processing is required. Some places contain typos and lack clarity, such as “non–linear generalized t likelihoods”.
Please state your overall opinion of the paper

Probably accept (7)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?

This paper seems a significant theoretical work.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

5
Reviewer confidence

Not Confident

Review #4

Please describe the contribution of the paper

The paper presents a probabilistic framework for single-channel EEG in an application of sleep spindles. The evaluation is performed on three public datasets and the presented model is able to surpass the previous detectors.
Please list the main strengths of the paper; you should write about a novel formulation, an original way to use data, demonstration of clinical feasibility, a novel application, a particularly strong evaluation, or anything else that is a strong aspect of this work. Please provide details, for instance, if a method is novel, explain what aspect is novel and why this is interesting.
- The paper presents a novel deep generative model, Deep Neural Dynamic Bayesian Network (DNDBN), for sleep spindles detection. The proposed generative model considers Normal-Gamma distribution for parametrizing the observation and allows time-series segmentation into different regimes.
- The presented method is validated on three datasets and achieves interesting results. The experimental evaluation demonstrates the benefit of the proposed method on different fronts, including better detection and uncertainty quantification.
Please list the main weaknesses of the paper. Please provide details, for instance, if you think a method is not novel, explain why and provide a reference to prior work.
- The paper is poorly written and is below par for a MICCAI submission. From motivating the method to describing the results, the paper is in general difficult to understand.
- The provided experiments are interesting but somewhat insufficient to give the full merits of how the proposed method is initially motivated. For instance, the paper is argued to be “transparent”, “audible,” and “generalizable”, but experimentation to support them all seems lacking.
Please rate the clarity and organization of this paper

Poor
Please comment on the reproducibility of the paper. Note, that authors have filled out a reproducibility checklist upon submission. Please be aware that authors are not required to meet all criteria on the checklist - for instance, providing code and data is a plus, but not a requirement for acceptance

The paper is not good on the reproducibility aspect.
Please provide detailed and constructive comments for the authors. Please also refer to our Reviewer’s guide on what makes a good review: https://miccai2021.org/en/REVIEWER-GUIDELINES.html
- The clarity of the paper has been an issue for me. I encourage authors to improve the presentation of the paper. It is clear that there are benefits of the proposed framework but how and why authors reach such a method is largely lacking. This has become even more critical as the related work section is missing. For example, even a simple thing like why y_n is posed as the Normal is not discussed. I think some of the technical parts can be relegated to the supplementary portion, and the main text should be clear and sharp to motivate readers toward the method.
- The argument for not using the variational method has been associated with “amortization gap” and “posterior collapse”. Have the authors tried the variational method and faced these challenges? Otherwise, there are a lot of works within the medical imaging community where the variational method is successfully employed. These challenges seem more specific to certain problems and applications.
- Why are the previous methods against which the current method is compared neither described nor discussed? It is difficult to understand what aspect of the current method helps to achieve the presented results.
- Fig 1 is not clear. The diagram for the deep network is poorly drawn.
Please state your overall opinion of the paper

borderline reject (5)
Please justify your recommendation. What were the major factors that led you to your overall score for this paper?
- Poor paper quality.
- Novel generative model to tackle an interesting domain.
What is the ranking of this paper in your review stack?

3
Number of papers in your stack

6
Reviewer confidence

Confident but not absolutely certain

Primary Meta-Review

Please provide your assessment of this work, taking into account all reviews. Summarize the key strengths and weaknesses of the paper and justify your recommendation. In case you deviate from the reviewers’ recommendations, explain in detail the reasons why. In case of an invitation for rebuttal, clarify which points are important to address in the rebuttal.

While two of the reviewers criticized the methods presentation and overall paper clarity, I agree with the other two reviewers that this work is novel and interesting. Therefore, I think it is a good addition to MICCAI. With that said, the authors should carefully edit their paper to fix the noted clarity issues prior to submitting the final version.
What is the ranking of this paper in your stack? Use a number between 1 (best paper in your stack) and n (worst paper in your stack of n papers).

3

Author Feedback

We thank the reviewers for their comments and feedback. Next, we will address some of their questions, concerns, or overall comments:

REVIEWER 1:

Number of hyperparameters: As in many other deep learning architectures, the number of hyperparameters in our observation model is moderately large. However, learning is regularized via early stopping and several parallel runs. On the other hand, the latent model hyperparameters are fairly interpretable (number of dynamical regimes, two in this application, and maximum sleep spindle duration). The other model parameters (A, pi, lambda) are learned from data via maximum likelihood.

Number of cases is small: The second set of results (DREAMS Subjects and Patients datasets) provides a much larger n than the first set of results.

Statistical analysis and discussion: To be added in final version

REVIEWER 2:

Reproducibility: Code will be shared via GitHub

Restructuring of methods and results section: To be added in final version

Table 2 is not very informative: It is true that NLL is model-dependent. However, we believe Table 2 is important for two reasons: it compares expert scorings based on our generative model (e.g., expert2’s criterion is closer to the generative story told by the proposed model), and it highlights the fact that the unsupervised settings do not deviate too much from the supervised counterparts (i.e., our model initialization is principled)

Table 3 seems to show very low accuracies: Table 3 refers to sleep spindle densities, not accuracies. The fact that stage N2 has larger densities is another validation for the model.

HMM model was not added to Table 1 due to space constraints and very low MCC (close to chance levels)

Figure 3B comment: y axis is probability density, not probability of an event in a sample space
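
A small sketch of why a density-normalized histogram can exceed 1, as stated above. The duration values and the helper `density_histogram` are hypothetical and chosen only for illustration; they are not data from the paper.

```python
from collections import Counter

def density_histogram(samples, bin_width):
    """Return {bin_left_edge: density}; densities times bin_width sum to 1."""
    counts = Counter(int(s // bin_width) for s in samples)
    n = len(samples)
    return {k * bin_width: c / (n * bin_width) for k, c in sorted(counts.items())}

# Hypothetical event durations in seconds (illustrative values only).
durations = [0.52, 0.55, 0.58, 0.61, 0.63, 0.66, 0.71, 0.74, 0.82, 0.95]
bin_width = 0.1  # seconds

hist = density_histogram(durations, bin_width)
total_mass = sum(d * bin_width for d in hist.values())  # integrates to 1
max_density = max(hist.values())                        # can be larger than 1
```

Because each bar's height is divided by the bin width, narrow bins (here 0.1 s) give bar heights above 1 even though the total area remains 1.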

REVIEWER 3:

Low MCC in some subjects: This is common in sleep spindles detectors (either classic or probabilistic). We believe it has to do with the inherent inter/intra-subject variability of EEG on top of the lack of consensus amongst experts. It is worth revisiting in future work.
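
As a reference point for the MCC values discussed above, the Matthews correlation coefficient can be computed from confusion-matrix counts as in the following sketch. The function name and the convention of returning 0 when the denominator vanishes are assumptions for illustration, not details taken from the paper.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    Returns a value in [-1, 1]: 1 is perfect agreement, 0 is chance level,
    and -1 is total disagreement. When any marginal count is zero the
    coefficient is undefined; 0.0 is returned here by convention (an assumption).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom

perfect = matthews_corrcoef(tp=5, fp=0, tn=5, fn=0)      # all predictions correct
chance = matthews_corrcoef(tp=25, fp=25, tn=25, fn=25)   # predictions unrelated to labels
```

MCC uses all four cells of the confusion matrix, so it is less affected than accuracy by class imbalance, which is common when the event of interest is rare.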

REVIEWER 4:

Comments about the paper being transparent and “audible”: Our contribution is definitely more transparent than classic black-box deep learning frameworks. It might not be as transparent as classic, threshold-based sleep spindles detectors, but we believe our proposed model is a good middle ground. We think the reviewer meant “auditable”, not “audible”

Reproducibility: The rest of the reviewers agree our models/methods/data are public and reproducible

Methods not discussed: Some sleep spindles detectors were not fully described due to space constraints.

META-REVIEWER:

The final version of the manuscript will include some restructuring


Deep Neural Dynamic Bayesian Networks applied to EEG sleep spindles modeling