EEGToolkit.jl
Computational EEG analysis with emphasis in sleep neuroscience.
Developed at the Laboratory for the Study of Sleep Slow-wave activity
The Gods of the earth and sea
Sought thro' Nature to find this Tree,
But their search was all in vain:
There grows one in the Human Brain.— William Blake
This package has three aims:
- Simplicity
- Transparency
- Efficiency
Simplicity means that a person with little programming background should be able to use it, at least with the documentation at hand. Transparency means that any methodology implemented by the package should be accessible enough so as to be reported in a scientific paper. Efficiency means that large EEGs (e.g. sleep EEGs) should be processed and analyzed in minutes or less.
Transparency affects primarily power spectral analysis (PSA). Most packages don't report how PSA is done. Proprietary software is typically even more obscure. Combined with the fact that PSA is not standardized and may be computed in several many ways, this makes it very difficult to compare and rest results.
This package is free software—free as in freedom. You are free to use the code as you wish and for any purpose. You are free to study the code and change it to make it do what you wish. You are free to redistribute copies of this package to help others. You are free to distribute copies of any modified version of this package.
Proprietary software hinders the liberty of its users. In science, it obscures the scientific process and makes replication and collaboration difficult. If you are a scientist, use free software whenever possible.
Package Features
- Loading and processing EEG data
- EEG visualization
- Sleep stage handling
- NREM Period detection
- Power spectral analysis
- Spindle detection algorithms
Time series
EEGToolkit.TimeSeries
— TypeA struct representing time series data.
Fields
x::Vector{<:AbstractFloat}
: Time series data.fs::Integer
: Sampling rate.secs::AbstractFloat
: Duration in seconds.mins::AbstractFloat
: Duration in minutes.hours::AbstractFloat
: Duration in hours.epoch_length::Integer
: Length in seconds understood to comprise an epoch (defaults to 30).subepoch_length::Integer
: Length in seconds understood to comprise an epoch (defaults to 30).
EEGToolkit.segment
— Functionsegment(v::Vector{T}, L::Int; overlap::Union{<:AbstractFloat,Integer}=0, symmetric=false) where {T}
Splits a vector v
into segments of length L
with an overlap overlap
expressed as a fraction of L. The overlap
defaults to 0
(no overlap). Returns a vector $v$ of vectors - i.e. Vector{Vector{T}}
- with $\vec{v_i}$ the $i$th segment in the split.
The function always attempts to capture the whole vector, even if the final split is not of length L. For example,
> x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 0]
> segment(x, 5)
2-element Vector{Vector{Int64}}:
[1, 2, 3, 4, 5]
[6, 7, 8, 9, 0]
> segment(x, 7)
2-element Vector{Vector{Int64}}:
[1, 2, 3, 4, 5, 6, 7]
[8, 9, 0]
Set symmetric=true
to ensure that, if this occurs, the last split is dropped.
> segment(x, 3; symmetric=true)
3-element Vector{Vector{Int64}}:
[1, 2, 3]
[4, 5, 6]
[7, 8, 9]
If L
is equal to the segment length, segment
raises a warning and returns a vector with only the original vector: [v]
. The return value ensures type-safety but the warning is raised because splitting a vector over its length is potentially a programming mistake.
segment(ts::TimeSeries, L::Int; kargs...)
Wrapper to segment the vector ts.x
in the time series ts
.
EEGToolkit.epoch
— Functionepoch(signal::TimeSeries, n::Integer)
Returns a vector [x₁, …, xₖ]
with all values xᵢ
corresponding to the n
th epoch in the signal.
epoch(signal::TimeSeries, n::Integer, m::Integer)
Returns a vector [x₁, …, xₖ]
with all indexes corresponding to epochs n, n+1, …, m
of the EEG. The default sampling rate is used to compute the indexes.
EEGToolkit.gen_time_domain
— Functiongen_time_domain(fs::Integer, s::Union{Integer, AbstractFloat}, e::Union{Integer, AbstractFloat})
Generates a vector of Time objects representing the time instances $t_1, \ldots, t_n$ in a signal with with a given sampling rate $f_s$ from second $s$ to second $e$. For instance, $f_s = 500$, $s = 10, e = 11$ would map to $10.002, 10.004, \ldots, 10.998, 11$.
julia> gen_time_domain2(500, 10, 11)
500-element Vector{Time}:
00:00:10.002
00:00:10.004
00:00:10.006
00:00:10.008
00:00:10.01
00:00:10.012
⋮
00:00:10.992
00:00:10.994
00:00:10.996
00:00:10.998
00:00:11
julia> gen_time_domain2(500, 600, 6000)
2700000-element Vector{Time}:
00:10:00.002
00:10:00.004
⋮
01:39:59.998
01:40:00
gen_time_domain(signal::TimeSeries, s::Union{AbstractFloat,Integer}, e::Union{AbstractFloat,Integer})
Generates a vector of Time objects representing the time instances $t_1, \ldots, t_n$ in a TimeSeries
signal from epoch $s$ to epoch $e$.
gen_time_domain(signal::TimeSeries, s::Union{AbstractFloat,Integer}, e::Union{AbstractFloat,Integer})
Generates a vector of Time objects representing the time instances $t_1, \ldots, t_n$ in a TimeSeries
signal, starting at 00:00:init
where init
is $\frac{1}{f_s}$.
EEGToolkit.plot_ts
— Functionplot_ts(ts::TimeSeries, s::Integer, e::Integer; norm=false, ylab="Amplitude (uV)")
Plots TimeSeries
from epoch s
to epoch e
. The series many be normalized.
plot_ts(ts::TimeSeries, s::Integer; kargs...)
Plots TimeSeries
at epoch s
.
plot_ts(ts::TimeSeries; norm=false, ylab="Amplitude (uV)")
Plots TimeSeries
. The series may be normalized.
EEGToolkit.seconds_to_time
— Functionseconds_to_time(seconds::Union{AbstractFloat, Integer})
Helper function: maps a time in seconds to a Time object.
EEG
EEGToolkit.EEG
— TypeA struct for the EEG data type.
Fields
signals::Dict{String, TimeSeries}
: A dictionary mapping signal labels (strings) to arrays of floating-point values.staging::Vector{String}
: A vector of stage labels corresponding to each epoch.id::String
: An identifier for the EEG.
Constructors
EEG(file::String; epoch_length::Integer=30, staging::Vector{String}=[""], id::String="") -> EEG
: Instantiates an EEG
from an EDF file (file
).
Example
staging_vector = CSV.read("path/to/stage_data/eeg_staging.csv") # A vector with a stage per each epoch in the EEG
eeg_data = EEG("path/to/edf_data/data.edf"; staging=staging_vector)
EEGToolkit.filter_by_stage
— Functionfunction filter_by_stage(eeg::EEG, channel::String, stages::Vector)
Returns all portions of an EEG channel in a given stage of the staging vector.
EEGToolkit.remove_channel!
— Functionremove_channel!(eeg::EEG, channel::String)
Removes a channel from the EEG.
remove_channel!(eeg::EEG, channel::String)
Removes a list of channels from the EEG.
EEGToolkit.plot_eeg
— Functionplot_eeg(eeg::EEG, s::Integer, e::Integer; channels::Vector{String}=[""], spacing::AbstractFloat=1.5)
Plots EEG channels from epoch s
to epoch e
. Specific channels may be selected with the channels
karg. The spacing
argument is an added factor in the normalization of the EEG signals - the vertical distance between each signal in the plot grows proportionally to spacing
.
EEGToolkit.artifact_reject
— Functionartifact_reject(signal::TimeSeries, anom_matrix::Matrix)
An anomaly matrix $A \in \mathbb{N}^{n \times 2}$ is a matrix holds epoch-subepoch pairs that contain artifacts in a TimeSeries
. Each row of $(n, m)$ of $A$ denotes that the $n$th epoch contained an artifact within sub-epoch $m$.
This function takes a TimeSeries
signal and an anomaly matrix $A$. It removes from the signal the epoch-subepoch pairs containing artifacts. In particular, this function returns a Vector{Vector{T}}
with T<:AbstractFloat
. Thus, if result
holds the return value of this function, result[i]
contains what is left from the i
th epoch after removing its contaminated sub-epochs. It is possible that result[i]
is empty.
artifact_reject(signal::TimeSeries, anom_vector::Vector{Integer})
An anomaly vector $\vec{x} \in \mathbb{N}^{n}$ is a sorted vector whose values are those epochs in an TimeSeries
that contain anomalies or artifacts. This function segments the TimeSeries and filters out all epochs containing artifacts.
NREM Period detection
NREM period definition
Following Feinberg & Floyed and Dijk, a NREM period is a sequence of epochs satisfying the following conditions:
- It starts with stages 2, 3 or 4.
- It contains at least 15 minutes of stages 2, 3 or 4 in total.
- It ends with 5 or more minutes of REM, or with 5 or more minutes of wakefulness.
Epochs in the sequence are allowed to contain occurrences of REM sleep or wakefulness in between, as long as the duration of this occurrences is less than 5 minutes. But the epochs corresponding to these occurrences will not be part of the NREM period. For example, in a stage sequence of the form
... - 10m of stage two - 1m of REM - 5m of stage three - 5m of REM - ...
the NREM period consists of the first 10 minutes of stage 2 and the 5 minutes of stage 3, ignoring the 1 minute of REM in-between them.
Importantly, the restriction that ending REM periods must last at least 5 minutes is not imposed when detecting the first and the last NREM period in a night of sleep.
NREM detection algorithm
Let $n$ be the number of epochs corresponding to $15$ minutes and $m$ the number of epochs corresponding to $5$ minutes. (In 30 second epochs, $n = 30, m = 10$).
The algorithm assumes that the staging
field of an EEG
has been set to a vector $\vec{s}$ that contains only the strings $1, \ldots, 6, ?$ (with $5$ marking REM, $6$ wakefulness, $?$ unknown/unstaged).
The algorithm works by mapping $\vec{s}$ to $\alpha = s_1 \ldots s_q$ a word over the language generated by $\Sigma = \{1, \ldots, 6, ?\}$.
Observe that the language $[(5+6)^*(2+3+4)^*]^*$ is partitioned into $U$ and $U’$, where $U$ is the set of words containing at least $n$ symbols $2, 3, 4$ where neither $5$ nor $6$ occur consecutively $m$ times. Then $\alpha$ can be decomposed into
\[\alpha = \psi_1 \phi_1 \psi_2 \phi_2 \ldots \psi_k \phi_k \psi_{k+1}\]
where $\phi_i = \varphi_i (5^m5^* + 6^m6^*)$ and $\varphi_i \in U$. Such a decomposition readily provides the number of NREM periods in the EEG (i.e. $k$). Furthermore, the epochs which comprise these periods are easily inferable from the decomposition.
EEGToolkit.nrem
— Functionnrem(staging::Vector, n::Integer=30, m::Integer=10)
Finds the k
underlying NREM periods in a staging vector. Returns a vector of vectors V
s.t. the i
th vector in V
contains the epochs which comprise the i
th NREM period. Thus, the length of V
is k
the number of NREM periods.
The staging
field of the EEG must have been set to a vector containing only the symbols 1, …, 6, ?
where 5
denotes REM, 6
denotes wakefulness, and ?
denotes unscored/unstaged.
nrem(eeg::EEG, n::Integer=30, m::Integer=10)
Finds the k
underlying NREM periods in the staging vector of an EEG.
Spindle detection
This package implements two spindle detection algorithms discussed in O'Reilly and Nielsen (2015). We give a brief overview of them here but refer to their original publications for further detail.
EEGToolkit.sigma_index
— Functionfunction sigma_index(x::Vector{<:AbstractFloat}, fs::Integer)
The $\sigma$-index algorithm (Huupponen et al., 2007) find abnormally high amplitude values in the spindle frequency band. Per each 1 second window of the EEG, it computes
- the maximum amplitude in the spindle frequency band, which we call $S_{max}$
- the average amplitude in the low alpha and theta frequencies, which we call
$\alpha_{mean}, \theta_{mean}$
- the maximum alpha amplitude $\alpha_{max}$
The $\sigma$-index of each window is defined to be zero if $\alpha_max > S_{max}$, and otherwise
\[f(S_{max}, \alpha_{mean}, \phi_{mean}) = \frac{2S_{max}}{ \alpha_{mean} + \theta_{mean} } \]
Higher values are indicative of a higher spindle probability. The rejection threshold recommended in the original paper is $\lambda = 4.5$.
EEGToolkit.relative_spindle_power
— Functionrelative_spindle_power(x::Vector{<:AbstractFloat}, fs::Integer)
The Relative Spindle Power (RSP) algorithm (Devuyst et al., 2011) also detects abnormal values along the spindle frequency band. For every 1 second window, the amplitude spectrum $S(t)$ is computed, and the RSP is defined as
\[RSP(t) = \frac{\int_{11}^{16} S(t, f) df}{\int_{0.5}^{40} S(t, f) df}\]
This definition is more intelligible than the that of the $\sigma$-index, insofar as it represents the ratio of the total power in the spindle band with respect to the total power in the $[0.5, 40]$ frequency range. It is evident by definition that $0 \leq RSP \leq 1$. Higher values are indicative of a higher spindle probability (it should be clear that $RSP$ is not a probability itself). The rejection threshold recommended in the original paper is $\lambda = 0.22$.
Power spectral analysis
EEGToolkit.AmplitudeSpectrum
— TypeStructure for amplitude spectrum estimations. Estimations are by default one sided, with frequencies ranging from [0, fₛ/2].
The formula used is
\[\frac{2|H(f)|}{\sum_i w_i}\]
with $w_i$ a Hanning window.
Fields
freq::Vector{<:AbstractFloat}
: Frequency range of the spectrumspectrum::Vector{<:AbstractFloat}
: Estimated spectral amplitudeformula::String
: A string representation of the formula used for the estimation.
Constructors
AmplitudeSpectrum(x::Vector{<:AbstractFloat}, sampling_rate::Integer, pad::Integer)
: Computes a direct PSD over a signal x
with a given sampling_rate
.
EEGToolkit.PSD
— TypeStructure for PSD estimations. Estimations are by default one sided, with frequencies ranging from [0, fₛ/2].
The default formula is
\[\frac{2|H(f)|^2}{\zeta \sum_i w_i^2}\]
with $w_i$ a Hanning window and $\zeta$ a normalization factor which defaults to $1$.
Barlett or Welch's mehtod can be used, where the formula becomes
\[\frac{1}{M \varphi} \sum_i^M \left[ \frac{2|H_i(f)|^2}{ \sum_i w_i^2} \right]\]
where $w_1, \ldots, w_n$ a Hanning window, $M$ the number of segments, $H_i(f)$ the FFT of the $i$th segment of the signal, and $\varphi$ a normalization factor defaulting to 2 * seg_length
.
Fields
freq::Vector{<:AbstractFloat}
: Frequency range of the spectrumspectrum::Vector{<:AbstractFloat}
: Estimated spectral density in dB.method::String
: Estimation method usedformula::String
: A string representation of the formula used for the estimation.
Constructors
PSD(x::Vector{<:AbstractFloat}, fs::Integer; pad::Integer=0, norm_factor=1, dB=false)
: Computes a direct PSD over a signalx
with sampling ratefs
. The signal may be padded to an optional lengthpad
. An optional normalization factornorm_factor
may be used. SetdB
to true to transform the spectrum to decibels.PSD(x::Vector, fs::Int, seg_length::Int; overlap::Union{ <:AbstractFloat, Integer }=0.5, normalization::Union{ <:AbstractFloat, Integer } = -1, pad::Integer=0, dB=false)
: Splits the signalx
into segments of lengthseg_length
with anoverlap
in [0, 1) (defaults to 0.5). The overlap is understood to be a fraction of the segment length. PSD is estimated within and averaged across all segments. The estimation is normalized with anormalization
that defaults to $2Mk'$, whereM
is the number of segments andk'
is the number of samples in each segment (i.e.seg_length
). Settingoverlap
to zero equates to using Barlett's method. Settingoverlap
greater than zero equates to using Welch's method.PSD(ts::TimeSeries; kargs...)
: Wrapper to apply the first constructor to a TimeSeries signal.PSD(ts::TimeSeries, seg_length::Integer; kargs...)
: Wrapper to apply the second constructor (Welch or Barlett's method) to a TimeSeries signal.
EEGToolkit.plot_psd
— Functionplot_psd(psd::PSD; freq_lim=30.0)
Plot a PSD with x-axis being frequency and y-axis being estimated power spectrum.
EEGToolkit.Spectrogram
— TypeA spectrogram is a matrix $S^{M \times F}$ where $M$ is the number of windows in the windowing of a signal and $F$ is the length of the spectrum vector in any given window (i.e. the frequency resolution). It is useful to observe spectral changes in time or to compute the spectrum of time-regions of interest (e.g. only NREM periods in a sleep EEG). The information available in direct PSD can be inferred from the spectrogram with ease.
For instance, let $f_1, f_2, \ldots, f_k$ be a strictly increasing sequence of frequencies. Assume these frequencies correspond to the column indexes $c_1, c_2, \ldots, c_k$ of $S$. Then the mean power in the frequency range $[f_1, f_k]$ is
\[\frac{1}{M} \sum_{i=1}^{M}\left[\frac{1}{c_k - c_1}\sum_{j=c_1}^{c_k} S_{ij}\right] = \frac{1}{M\big(c_k - c_1\big)}\sum_{i=1}^{M}\sum_{j=c_1}^{c_k} S_{ij}\]
In this package, mean power in a frequency range is computed with the mean_band_power
function.
Fields
time::Vector
: Time domainfreq::Vector{<:AbstractFloat}
: Frequency domainspectrums::Matrix{<:AbstractFloat}
: Power spectrum. Rows are time and columns are frequency; the value inspectrums[row, freq]
is the power at time windowrow
for frequencyfreq
.segment_length::Integer
: Length of each segment in time.
Constructors
Spectrogram(segs::Vector{Vector{T}}, psd_function::Function; dB = false) where {T<:AbstractFloat}
: Given a sequence of windows $w_1, \ldots, w_k$ contained in thesegs
argument, computes the PSD within each window using a custompsd_function
.Spectrogram(signal::Vector{<:AbstractFloat}, window_length::Integer, psd_function::Function; overlap::Union{AbstractFloat, Integer}=0, dB=false)
: Splits a signal into (potentially overlapping) segments of lengthwindow_length
and computes theSpectrogram
over this windowing using the first constructor. A custompsd_function
is used within each window. Symmetry is enforced over the split signal, meaning that if the last segment is of length not equal to the rest, it is dropped. Thus, all windows are of equal length.function Spectrogram(ts::TimeSeries, window_length::Integer, psd_function::Function; kargs...)
: Wrapper constructor for aTimeSeries
object.
EEGToolkit.plot_spectrogram
— Functionplot_spectrogram(spec::Spectrogram, freq_lim::AbstractFloat=30.0, type::Int=1, color=:nipy_spectral)
Plots a spectogram spec
either in 2d (type = 1
) or 3d (type = 2
). An optional frequency limit (freq_lim
) may be set (defaults to 30Hz). The color palette color
may be set; defaults to nipy_spectral
.
EEGToolkit.freq_band
— Functionfreq_band(spec::Union{PSD,AmplitudeSpectrum}, lower::AbstractFloat, upper::AbstractFloat)
Given a PSD
or AmplitudeSpectrum
, returns a Vector{<:AbstractFloat}
with the powers within the frequency band [lower, upper]
.
freq_band(spec::Spectrogram, lower::AbstractFloat, upper::AbstractFloat, window::Integer)
Given a Spectrogram
, returns a Vector{<:AbstractFloat}
with the powers within a frequency band [lower, upper]
of a specific window (row of the spectrogram).
freq_band(spec::Spectrogram, lower::AbstractFloat, upper::AbstractFloat)
Given a Spectrogram
, returns a Matrix{<:AbstractFloat}
with the powers within a frequency band [lower, upper] across all time windows.
EEGToolkit.mean_band_power
— Functionfreq_band(spec::Spectrogram, lower::AbstractFloat, upper::AbstractFloat, window::Integer)
Given a `, returns the mean power in a given frequency band
[lower, upper]`. This function effectively computes
\[\frac{1}{M\big(c_k - c_1\big)}\sum_{i=1}^{M}\sum_{j=c_1}^{c_k} S_{ij}\]
freq_band(spec::Spectrogram, lower::AbstractFloat, upper::AbstractFloat, window::Integer)
Given a PSD
, returns the mean power in a given frequency band [lower, upper]
.
Helpers
EEGToolkit.next_power_of_two
— Functionnext_power_of_two(n::Int)
Given an integer $n$, finds the least $m = 2^k$ s.t. $m \geq n$.
EEGToolkit.zero_pad
— Functionzero_pad(v::Vector{T}, desired_length::Integer) where {T<:AbstractFloat}
Zero-pads a numeric vector v
to a desired_length
Examples
NREM delta power
This is an example script for computing the mean $\delta$ (delta) power in each of the NREM periods of a sleep EEG. We will use the C3 channel.
# First, import the package
using EEGToolkit
# Assuming we have the stage data in a .csv and we have some function
# to read CSVs (e.g. from the CSV package)
staging = some_function_to_read_csv("my_staging_data.csv")
# We read an EEG that has channels C3-A2 and F3-A1. We assume the CSV had a
# column called STAGES with the stages of each epoch.
eeg = EEG(edf_file, staging.STAGES)
# We extract the TimeSeries object corresponding to C3-A2
signal = eeg.signals["C3-A2"]
# Detect the NREM periods
nrems = nrem(eeg)
# Split the C3 signal into 30-second windows (not-overlapping).
epochs = segment(signal, signal.fs * 30)
# PSD function to be used within each window in the spectrograms
psd = x -> PSD(x, signal.fs, signal.fs * 5)
mean_delta_powers = []
for nrem_period in nrems
# Extract the portion of the signal corresponding to this NREM period
# This is a vector of vectors [vector_1, ..., vector_k], with the ith
# vector being the ith epoch in this NREM period.
nrem_epochs = epochs[nrem_period]
# Compute spectrogram with each window being an epoch of this nrem period.
spec = Spectrogram(nrem_epochs, nrem_signal.fs*30, psd)
# Compute mean power in delta band (0.5 to 3.9 Hz) from the spectrogram.
δ = mean_band_power(spec, 0.5, 3.9)
# Store the result in the mean_delta_powers list.
push!(mean_delta_powers, δ)
end
# Now the ith element in `mean_delta_powers` is the mean delta power
# of the ith NREM period.