Introduction

I here provide an algorithm to automatically detect NREM periods given a sequence of sleep stages. I developed this algorithm at the laboratory to facilitate a tedious process. Finding its use of formal language theory interesting, I decided to make it public. The problem at hand is as follows.

Given sleep stages $s_1, \ldots, s_n$, where $s_i$ the sleep stage of the $i$th epoch in a night of sleep, we wish to detect $(a)$ which of these epochs correspond to a NREM period and, if possible, $(b)$ how many such periods exist. A bit more formally, given such sequence, we wish to produce a sequence of $k$ vectors $\varrho_1, \ldots, \varrho_k$, such that if $\varrho_i = (\varrho_i^1, \ldots, \varrho_i^z)$, then $\varrho_{i}^{j}$ is the $j$th epoch in the $i$th NREM period.

Such output would readily satisfy points $(a)$ and $(b)$, and thus provide a complete characterization of the NREM periods in a night of sleep.

Of course, whatever semantic such algorithm could have depends entirely on that of NREM period. In other words, a satisfactory definition of NREM period must be provided. We follow one that is standard in sleep science, provided by Feinberg & Floyed and also discussed extensively in the work of Dijk. Namely, we say a sequence of sleep stages corresponds to a NREM period if and only if:

The sequence contains subsequences of NREM stages not interrupted by more than $5$ minutes of other stages (including wakefulness).
The total amount of NREM sleep in the sequence is $15$ minutes or more.
The sequence is terminated by either $a.$ 5 minutes or more of REM or $b.$ $5$ minutes or more of wakefulness.

As the reader may see, the sequence of NREM stages is not required to be continuous. In other words, between any two stages in a NREM period there may be stages which do not correspond to NREM sleep. These should be ignored unless they last for five minutes or more-i.e. unless they qualify as terminating sequences, as specified in the items above.

A special consideration is that, for the first and last NREM periods in a night of sleep, the restriction that terminating sequences last $\geq 5$ minutes is not imposed - i.e. a single REM or wake stage is considered to terminate the period. This small caveat is not considered in the formal algorithm presented below, so as to keep it general, but is trivial to make effective in a practical implementation - as it actually is in the implementation I refer to at the end of this entry.

The algorithm

We define a sleep stage as an element of the alphabet $\Sigma = \{ 1, \ldots, 6 \}$. We understand $1, \ldots, 4$ as the usual stages, $5$ as REM and $6$ as wakefulness. Let $n, m \in \mathbb{N}$ be parameters in our algorithm, with $n$ specifying the number of epochs which make up for $15$ minutes and $m$ the number which makes up for $5$ minutes.

Let $\mathcal{L} = [(5+6)^* (2+3+4)^* ]^* \subseteq \Sigma^*$. Define $\mathcal{L}'\subseteq \mathcal{L}$ as the set of words containing at least $n$ symbols in ${ 2, 3, 4 }$ and s.t. no other symbols occur repeatedly $m$ or more times. Trivially, $\mathcal{L} = \mathcal{L}' \cup \overline{\mathcal{L}'}$; i.e. $\mathcal{L}'$ induces a trivial partition of $\mathcal{L}$.

Then $\alpha = s_1 \ldots s_r$, with $s_i \in \Sigma$, can be subjected to the following decomposition:

$$\alpha = \psi_1 \phi_1 \psi_2 \phi_2 \ldots \psi_k \phi_k \psi_{k+1}$$

where $\phi_i = \varphi_i (5^m5^* + 6^m6^*)$ and $\varphi_i \in \mathcal{L}$. Such a decomposition readily provides the number of NREM periods in the EEG (i.e. $k$), insofar as the $i$th NREM period is by definition a sub-word of $\phi_i$.

The previous decomposition of $\alpha$ inspires the following effective procedure, which allows for the detection of each $\phi_i$.

In $s$, replace the occurrences of $1$ with $0$ and the occurrences of $2, 3, 4$ with $1$.
Let $\omega_1, \ldots, \omega_q$ be the sub-words of $s$ which consist of repeated symbols.
For each $\omega_1, \ldots, \omega_q$, if $0$ or $1$ occur in $\omega$ do nothing; if $|\omega| \geq m$ do nothing; otherwise, replace $\omega$ with $0^{|\omega|}$.
Output $\omega := \omega_1 \ldots \omega_q$

An example of each step of this procedure with $m = 3$ over the word $123331166555$ is

$$ 123331166555 \to 011110066555 \to 0111100 66555 \to 011110000555 $$

The outputted word, $011110000555$, reveals crucial information about NREM sleep. The $1$s correspond to periods of NREM stages (only one in this simple example), the $0$s to stages which should be ignored, and sequences of $5$ or $6$ (of length $\geq m$) to sequences which mark the end of a NREM period. It is trivial to extract from this output the indexes of all occurrences of $1$, each of which corresponds to an epoch within the NREM sleep.

The effective procedure is easily implemented using a regular expression which corresponds to the decomposition given above. The algorithm is $O(r)$, since its implementation iterates repeatedly over the characters of $\alpha$.

A specific implementation of this algorithm, in the Julia language, is provided in my EEG Toolkit package for EEG analysis. Given a vector (sequence) of sleep stages, it outputs a vector of vectors result, s.t. result[i] contains the epochs which corresopnd to the $i$th NREM period.