Julia Reference
DatasetManager.TrialConditions — Type
TrialConditions(conditions, labels; [required, types, defaults, subject_fmt])
Define the names of experimental conditions
(aka factors) and the possible labels
within each condition. Conditions are determined from the absolute path of potential sources.
subject
is a reserved condition name for the unique identifier (ID) of individual subjects/participants in the dataset. If :subject
is not explicitly included in conditions
, it will be inserted at the beginning of conditions
. The format of the subject
identifier can be specified in labels
or using the keyword argument subject_fmt
.
Arguments
- conditions is a collection of condition names (e.g. (:medication, :dose)) in the order they must appear in the file paths of trial sources.
- labels must have a key-value pair for each condition. The value(s) for each key define the acceptable labels for each condition. Levels may be defined using a:
  - String
  - Regex
  - old => transf [=> new], where old may be a Regex or one/multiple String(s), where transf may be a String, Function, or a SubstitutionString (only if old is a Regex), and where new is a Regex
  - Array of any combination of the preceding.
  Entries in labels which are not included in conditions will be ignored.
Keyword arguments
- required=conditions: The conditions which every trial is required to have.
- types=Dict(conditions .=> String): The types that each condition should be parsed as.
- defaults=Dict{Symbol,Any}(): Default conditions to set when a given condition is not matched. Defaults can be given for required conditions. If a condition is not required, has no default, and is not matched, it will not be included as a condition for a source.
- subject_fmt=r"Subject (?<subject>\d+)?": The Regex pattern used to match the trial's subject ID. If :subject is present in labels, that definition will take precedence.
Examples
julia> labels = Dict(
:subject => r"(?<=Patient )\d+",
:group => ["Placebo" => "Control", "Group A", "Group B"],
:posture => r"(sit|stand)"i => lowercase,
:cue => r"cue[-_](fast|slow)" => s"\1 cue" => r"(fast|slow) cue");
julia> conds = TrialConditions((:subject,:group,:posture,:cue), labels;
types=Dict(:subject => Int));
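The label forms above can be explored with Base Julia alone. The following is a minimal sketch, not DatasetManager's implementation: the file path is hypothetical, and the matching and transformation steps are applied by hand to show what each kind of label definition would extract from a path.

```julia
# Base-only illustration; DatasetManager is not loaded and the path is hypothetical.
path = "/data/Patient 12/Group A/SIT/cue-fast/trial.c3d"

# :subject => r"(?<=Patient )\d+" : the lookbehind keeps only the digits,
# which types=Dict(:subject => Int) would then parse as an integer.
subject_id = parse(Int, match(r"(?<=Patient )\d+", path).match)

# :posture => r"(sit|stand)"i => lowercase : case-insensitive match, then a Function.
posture = lowercase(match(r"(sit|stand)"i, path).match)

# :cue => r"cue[-_](fast|slow)" => s"\1 cue" : a SubstitutionString rewrites the match.
raw_cue = match(r"cue[-_](fast|slow)", path).match
cue = replace(raw_cue, r"cue[-_](fast|slow)" => s"\1 cue")
```

With this path, subject_id is 12, posture is "sit", and cue is "fast cue".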
DatasetManager.DataSubset — Type
DataSubset(name, source::Union{Function,<:AbstractSource}, dir, pattern; [dependent=false])
Describes a subset of source
data files found within a folder dir
which match pattern
(using glob syntax). The name
of the DataSubset will be used in findtrials
as the source name in a Trial.
Some sources described by a DataSubset may not be relevant as standalone/independent Trials (e.g. maximal voluntary contraction "trials", when collecting EMG data, are typically only relevant to movement trials for that specific subject/session of a data collection, but are not useful on their own). Dependent sources (i.e. dependent=true) will not create new trials in findtrials and will only be added to pre-existing trials when the required conditions and a "condition" with the same name as the DataSubset's name exist. The matched "condition" will be used in findtrials! as the source name in corresponding Trials.
If source
is a function, it must accept a file path and return a Source.
See also: Source
, TrialConditions
, findtrials
, findtrials!
Examples
julia> DataSubset("events", Source{Events}, "/path/to/subset", "Subject [0-9]*/events/*.tsv")
DataSubset("events", Source{Events}, "/path/to/subset", "Subject [0-9]*/events/*.tsv")
DataSubsets for dependent sources
julia> labels = Dict(
:subject => r"(?<=Patient )\d+",
:session => r"(?<=Session )\d+",
:mvic => r"mvic_[rl](bic|tric)", # Defines possible MVIC "trial" names
);
julia> # Only :subject and :session are required conditions (for matching existing trials)
julia> conds = TrialConditions((:subject,:session,:mvic), labels; required=(:subject,:session,));
julia> # Note the DataSubset name matches the "condition" name in `labels`
julia> subsets = [
DataSubset("mvic", Source{C3DFile}, c3dpath, "Subject [0-9]*/Session [0-2]/*.c3d"; dependent=true)
];
julia> findtrials!(trials, subsets, conds)
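The rule for dependent sources can be modelled in a few lines. This is a simplified sketch under stated assumptions, not the package's code: trials are plain NamedTuples, the file path is hypothetical, and the conditions in found stand for whatever would have been parsed from that path.

```julia
# Simplified stand-in for trials: required conditions plus named sources.
trials = [
    (conds = Dict(:subject => "1", :session => "1"), sources = Dict{String,String}()),
    (conds = Dict(:subject => "2", :session => "1"), sources = Dict{String,String}()),
]

# Conditions assumed to have been parsed from a hypothetical dependent file.
file = "/data/Subject 1/Session 1/mvic_rbic.c3d"
found = Dict(:subject => "1", :session => "1", :mvic => "mvic_rbic")

required = (:subject, :session)
subset_name = :mvic   # same name as the DataSubset

# A dependent file never creates a trial. It is attached only to existing trials
# that agree on every required condition, keyed by the matched "condition" label.
if haskey(found, subset_name)
    for t in trials
        if all(c -> t.conds[c] == found[c], required)
            t.sources[found[subset_name]] = file
        end
    end
end
```

Only the first trial gains a source, stored under the name "mvic_rbic"; the number of trials is unchanged.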
DatasetManager.Trial — Type
Trial{ID}(subject::ID, name::String, [conditions::Dict{Symbol}, sources::Dict{String}])
Characterizes a single instance of data collected from a specific subject
. The Trial has a name
, and may have one or more conditions
which describe experimental conditions and/or subject specific characteristics which are relevant to subsequent analyses. A Trial may have one or more complementary sources
of data (e.g. simultaneous recordings from separate equipment stored in separate files, supplementary data for a primary data source, etc).
Examples
julia> trial1 = Trial(1, "baseline", Dict(:group => "control", :session => 2))
Trial{Int64}
Subject: 1
Name: baseline
Conditions:
:group => "control"
:session => 2
No sources
DatasetManager.subject — Function
subject(trial::Trial{ID}) -> ID
subject(seg::Union{Segment,SegmentResult}) -> ID
Get the subject identifier of a Trial
, Segment
, or SegmentResult
.
DatasetManager.conditions — Function
conditions(trial::Trial{ID}) -> Dict{Symbol,Any}
conditions(seg::Union{Segment,SegmentResult}) -> Dict{Symbol}
Get the conditions of a Trial
, Segment
, or SegmentResult
.
DatasetManager.sources — Function
sources(trial::Trial{ID}) -> Dict{String,AbstractSource}
Get the sources for trial.
DatasetManager.hassubject — Function
hassubject(trial, sub) -> Bool
Test if the subject ID for trial is equal to sub.
hassubject(sub) -> Bool
Create a function that tests if the subject ID of a trial is equal to sub
, i.e. a function equivalent to t -> hassubject(t, sub)
.
DatasetManager.hassource — Function
hassource(trial, src::String) -> Bool
hassource(trial, srctype::S) where {S<:AbstractSource} -> Bool
hassource(trial, src::Regex) -> Bool
Check if trial
has a source with key or type matching src
.
Examples
julia> trial1 = Trial(1, "baseline", Dict(), Dict("model" => Source{Nothing}()));
julia> hassource(trial1, "model")
true
julia> hassource(trial1, Source{Nothing})
true
julia> hassource(trial1, r"test*")
false
hassource(src) -> Bool
Create a function that tests if a trial has the source src
, i.e. a function equivalent to t -> hassource(t, src)
.
Examples
julia> trial1 = Trial(1, "baseline", Dict(), Dict("model" => Source{Nothing}()));
julia> trial2 = Trial(2, "baseline");
julia> filter(hassource("model"), [trial1, trial2])
1-element Vector{Trial{Int64}}:
Trial(1, "baseline", 0 conditions, 1 source)
DatasetManager.hascondition — Function
hascondition(trial, (condition [=> value])...) -> Bool
Test if trial
has condition
, or that condition
matches value
. Specifying value
is optional. Multiple conditions and/or condition pairs can be given which all must be true to match. value
can be a single level, multiple acceptable levels, or a predicate function.
Examples
julia> trial = Trial(1, "baseline", Dict(:group => "control", :session => 2));
julia> hascondition(trial, :group)
true
julia> hascondition(trial, :group => "A")
false
julia> hascondition(trial, :group => ["control", "A"])
true
julia> hascondition(trial, :group => "A", :session => 1)
false
julia> hascondition(trial, :group => ["control", "A"], :session => >=(2))
true
hascondition((condition => value)...) -> Bool
Create a function that tests if a trial has the given condition
(s)/value
(s), i.e. a function equivalent to t -> hascondition(t, conditions...)
.
Examples
julia> trial1 = Trial(1, "baseline", Dict(:group => "control", :session => 2));
julia> trial2 = Trial(2, "baseline", Dict(:group => "A", :session => 1));
julia> filter(hascondition(:group => "A"), [trial1, trial2])
1-element Vector{Trial{Int64}}:
Trial(2, "baseline", 2 conditions, 0 sources)
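The three kinds of value accepted by hascondition (a single level, several acceptable levels, or a predicate function) map naturally onto multiple dispatch. The sketch below is an assumption about how such matching could be written, not the package's implementation; matches and check are hypothetical names, and a plain Dict stands in for a trial's conditions.

```julia
# Hypothetical helper (not DatasetManager's code): test one condition value.
matches(actual, wanted::Function) = wanted(actual)            # predicate
matches(actual, wanted::AbstractVector) = actual in wanted    # several accepted levels
matches(actual, wanted) = actual == wanted                    # single level

conds = Dict(:group => "control", :session => 2)

# Every pair must hold for the overall test to pass.
check(conds, tests...) =
    all(p -> haskey(conds, p.first) && matches(conds[p.first], p.second), tests)

a = check(conds, :group => ["control", "A"], :session => >=(2))  # true
b = check(conds, :group => "A", :session => 1)                   # false
```

The calls mirror the hascondition examples above and give the same answers.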
DatasetManager.getsource — Function
getsource(trial, name::String) -> Source
getsource(trial, src::S) where {S<:AbstractSource} -> Source
getsource(trial, name::String => src::Type{<:AbstractSource}) -> Source
getsource(trial, pattern::Regex) -> Vector{Source}
Return a source from trial
with the requested name
or src
. When both name
and src
are given as a pair, a source with name
will be searched for first, and if not found, a source of type src
will be searched for. When src
(as an <:AbstractSource
) is given, only a source of type src
may be present, otherwise an error will be thrown.
If a Regex pattern
is given, multiple sources may be returned.
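The name-then-type lookup order for the pair form can be sketched as follows. This is an illustration only: sources are modelled as a Dict of tuples tagged with a type symbol, lookup is a hypothetical function, and the rule that the type fallback requires exactly one matching source is an assumption based on the description above.

```julia
# Hypothetical stand-ins: sources stored by name, each tagged with a type symbol.
sources = Dict(
    "mainc3d" => (:C3DFile, "/path/a.c3d"),
    "events"  => (:Events, "/path/a.tsv"),
)

function lookup(sources, name, kind)
    # 1. Prefer a source stored under the requested name.
    haskey(sources, name) && return sources[name]
    # 2. Otherwise fall back to the sources of the requested type.
    hits = [src for src in values(sources) if first(src) == kind]
    # Assumed rule: the fallback is only unambiguous with exactly one hit.
    length(hits) == 1 ||
        error("expected exactly one source of type $kind, found $(length(hits))")
    return hits[1]
end

by_name = lookup(sources, "events", :C3DFile)    # found by name; type is not consulted
by_type = lookup(sources, "missing", :C3DFile)   # name absent; found by type
```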
DatasetManager.findtrials — Function
findtrials(subsets, conditions; <keyword arguments>) -> Vector{Trial}
Find all the trials matching conditions
which can be found in subsets
.
Keyword arguments:
- ignorefiles::Union{Nothing, Vector{String}}=nothing: A list of files, given as absolute paths, that are in any of the subsets folders and are to be ignored.
- debug=false: Show files that did not match (all) the required conditions.
- verbose=false: Show files that did match all required conditions when debug=true.
- maxlogs=50: Maximum number of files per subset to show when debugging.
See also: Trial
, findtrials!
, DataSubset
, TrialConditions
DatasetManager.findtrials! — Function
findtrials!(trials, subsets, conditions; <keyword arguments>) -> Vector{Trial}
Find more trials and/or additional sources for existing trials.
For DataSubsets in subsets
which are dependent, candidate source files must have the required conditions and have a "condition" matching the DataSubset name.
See also: findtrials
, Trial
, DataSubset
, TrialConditions
DatasetManager.summarize — Function
summarize([io,] trials; [verbosity=5, ignoreconditions])
Summarize a vector of Trials.
Examples
julia> summarize(trials)
Subjects:
└ 10: "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
Trials:
├ 40 trials
└ Trials per subject:
└ 4: 10 subjects (100%)
Conditions:
├ Observed levels:
│ ├ stim => ["placebo", "stim"]
│ └ session => [1, 2]
└ Unique level combinations observed: 4 (full factorial)
stim │ session │ # trials
─────────┼─────────┼──────────
placebo │ 1 │ 10
stim │ 1 │ 10
placebo │ 2 │ 10
stim │ 2 │ 10
Sources:
└ "events" => Source{GaitEvents}, 40 trials (100%)
DatasetManager.analyzedataset — Function
analyzedataset(f, trials, Type{<:AbstractSource}; [enable_progress, show_errors, threaded]) -> Vector{SegmentResult}
Map function f
over all trials
(multi-threaded by default) and return the SegmentResult
. If f
errors for a given trial, the SegmentResult
for that trial will be empty (no results), and the error will be rethrown along with the trial which caused the error.
Keyword arguments
- enable_progress=true: Enable the progress meter.
- show_errors=true: Show trials and their errors.
- threaded=true: Map f over trials using multiple threads.
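The general pattern of a threaded map that tolerates per-trial failures can be sketched with Base threading. This is not analyzedataset itself: trials are plain integers, f is a hypothetical analysis function, and failures are recorded in a vector rather than reported the way the package does.

```julia
# Hypothetical stand-ins: a "trial" is an Int and f returns a Dict of results.
f(trial) = trial == 3 ? error("bad trial") : Dict(:avg => trial / 2)

trials = 1:5
results = Vector{Dict{Symbol,Float64}}(undef, length(trials))
failed = fill(false, length(trials))

# Each iteration writes only to its own slot, so no locking is needed.
Threads.@threads for i in eachindex(trials)
    results[i] = try
        f(trials[i])
    catch
        failed[i] = true
        Dict{Symbol,Float64}()   # empty result keeps positions aligned with trials
    end
end
```

Preallocating one slot per trial keeps the output in the same order as the input regardless of how iterations are scheduled across threads.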
DatasetManager.stack — Function
stack(rs::Vector{SegmentResult}, conds; [variables])
Compile the results into a stacked, long form DataFrame.
DatasetManager.write_results — Function
write_results(filename, df, conditions; [variables, format, archive])
Write the results in df
(generated by DatasetManager.stack
) to file at filename
. format
may be :wide
or :long
. If archive == true and filename
already exists, it will be moved to "$(filename).bak"
before writing the new results to filename
.
conditions
and variables
specify which conditions or variables, respectively, should be included in the output.
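The archive step described above amounts to moving any existing file aside before writing. A minimal sketch using only Base file functions follows; archive_then_write is a hypothetical name, plain strings stand in for the results, and overwriting an older ".bak" file (force=true) is an assumption rather than documented behavior.

```julia
# Hypothetical helper showing the archive step only; not part of DatasetManager.
function archive_then_write(filename, contents; archive=true)
    if archive && isfile(filename)
        # Keep the previous results next to the new file.
        # Assumption: an older backup, if any, is overwritten.
        mv(filename, filename * ".bak"; force=true)
    end
    write(filename, contents)
    return filename
end

dir = mktempdir()
out = joinpath(dir, "results.csv")
archive_then_write(out, "old")   # nothing to archive yet
archive_then_write(out, "new")   # "old" is moved to results.csv.bak first
```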
DatasetManager.export_trials — Function
export_trials([f,] trials, dir[, sources])
Export (copy) sources
in trials
to dir
. When left unspecified, sources
is set to all unique sources found in trials
. Optionally can be given a function f
, which must accept 2 arguments (a trial and a src which is of eltype(sources)
), to control the names of the exported data. The default behavior exports all sources to dir
with no subdirectories, using the naming schema "$trial.subject_$srcname_basename(sourcepath)" (pseudo-code).
Examples
julia> export_trials(trials, pwd()) do trial, source
"$(subject(trial))_$(conditions(trial)[:group]).$(srcext(source))"
end
Sources
DatasetManager.AbstractSource — Type
AbstractSource
The abstract supertype for custom sources. Implement a subtype of AbstractSource
if Source{S}
is not sufficient for your source requirements (e.g. your source has additional information besides the path, such as encoding parameters, compression indicator, etc, that needs to be associated with each instance).
Extended help
AbstractSource interface requirements
All subtypes of AbstractSource must:
- have a path field or extend the sourcepath function.
- have at least these two constructor methods:
  - empty constructor
  - single argument constructor accepting a string of an absolute path
AbstractSource subtypes should:
- have a readsource method
AbstractSource subtypes may implement these additional methods to improve user experience and/or enable additional functionality:
- readsegment
- generatesource (if enabling requiresource! generation)
- dependencies (if defining a generatesource method)
- srcext
- srcname_default
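The required and recommended parts of the interface could look like the sketch below. To keep it runnable without the package, AbstractSource, sourcepath, and readsource are local stand-ins defined here; with DatasetManager loaded, one would subtype DatasetManager.AbstractSource and extend its functions instead. CompressedCSV and its gzip field are hypothetical.

```julia
# Local stand-ins so the sketch runs on its own (assumption: the real default
# sourcepath reads the `path` field, as the requirements above suggest).
abstract type AbstractSource end
sourcepath(src::AbstractSource) = src.path

struct CompressedCSV <: AbstractSource
    path::String
    gzip::Bool      # extra per-instance information beyond the path
end

# Required constructors: an empty one and one taking an absolute path.
CompressedCSV() = CompressedCSV(tempname())
CompressedCSV(path::AbstractString) = CompressedCSV(path, endswith(path, ".gz"))

# Recommended: a readsource method (here it simply returns the raw bytes).
readsource(src::CompressedCSV; kwargs...) = read(sourcepath(src))

src = CompressedCSV("/data/trial1.csv.gz")
```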
DatasetManager.Source — Type
Source{S}([path])
A standard/default source, where S
can be an existing type or a singleton type defined for dispatch purposes. When path
is omitted, a unique temporary file path is used.
Examples
julia> Source{Missing}()
Source{Missing}("/tmp/jl_ruMeMy")
julia> struct RandomDev; end
julia> Source{RandomDev}("/dev/urandom")
Source{RandomDev}("/dev/urandom")
DatasetManager.readsource — Function
readsource(src::S; kwargs...) where {S <: AbstractSource}
Read the source data from file.
DatasetManager.sourcepath — Function
sourcepath(src) -> String
Return the absolute path to the src
.
DatasetManager.requiresource! — Function
requiresource!(trial, name::String; [force, deps, kwargs...]) -> nothing
requiresource!(trial, name::Regex; <keyword arguments>) -> nothing
requiresource!(trial, src::S; <keyword arguments>) where {S<:AbstractSource} -> nothing
requiresource!(trial, ::Type{S}; <keyword arguments>) where {S<:AbstractSource} -> nothing
requiresource!(trial, name::String => src; <keyword arguments>) -> nothing
Require src
to exist for trial
. Generate src
if it does not exist, or throw an error if src
is not present and cannot be generated. When src
is a Regex
, deps
is constrained to UnknownDeps
, regardless of the keyword argument. Unused keyword arguments will be passed on to the generatesource
method for src
.
Keyword arguments
- force=false: Force generating src, even if it already exists.
- deps=dependencies(src): deps can be manually specified as a name => src::AbstractSource pair per dependency if there exist multiple sources that would match a dependent source type.
- kwargs... are passed on to the generatesource function for src.
Examples
julia> dependencies(Source{Events})
(Source{C3DFile},)
julia> requiresource!(trial, Source{Events})
julia> requiresource!(trial, Source{Events}; force=true, deps=("mainc3d" => Source{C3DFile}))
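The decision flow described above (use the source if present, generate it if possible, otherwise fail) can be sketched independently of the package. Everything here is a stand-in: a trial's sources are a Dict of name to path, require! is a hypothetical function, and generate plays the role that generatesource and its dependencies play in DatasetManager.

```julia
# Hypothetical stand-in: `generate` is a zero-argument function, or `nothing`
# when the source cannot be generated.
function require!(sources::Dict{String,String}, name; force=false, generate=nothing)
    if haskey(sources, name) && !force
        return nothing                      # already present, nothing to do
    end
    generate === nothing &&
        error("source \"$name\" is missing and cannot be generated")
    sources[name] = generate()              # create, or regenerate when force=true
    return nothing
end

s = Dict("mainc3d" => "/path/a.c3d")
require!(s, "events"; generate = () -> "/path/a.tsv")              # generated
require!(s, "events"; generate = () -> "/path/b.tsv")              # kept as-is
first_path = s["events"]
require!(s, "events"; force=true, generate = () -> "/path/b.tsv")  # regenerated
```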
DatasetManager.generatesource — Function
generatesource(trial, src::Union{S,Type{S}}, deps; kwargs...) where S <: AbstractSource -> newsrc::typeof(src)
Generate source src
using dependent sources deps
from trial
. Returns a source of the same type as src
, but is not required to be exactly equal to src
(i.e. a different sourcepath(newsrc)
is acceptable). Keyword arguments named force
and deps
should not be used in generatesource
methods since they are used in requiresource!
and will not be passed on to the generatesource
method.
DatasetManager.dependencies — Function
dependencies(src::Union{S,Type{S}}) where {S<:AbstractSource} -> Tuple{<:AbstractSource}
Get the sources that src
depends on to be generated by generatesource
. Defaults to UnknownDeps
, which prevents automatically generating src
.
Extended help
In some cases, it may be preferable to leave the dependencies for a custom source undefined if e.g. the source is a container of some form that may contain data generated by, and depending on, an unknown number/type of sources.
DatasetManager.srcext — Function
srcext(src::Type{S}) where {S<:AbstractSource} -> String
srcext(src::S) where {S<:AbstractSource} -> String
Return the file extension for src
. If src
is an AbstractSource
subtype, the default extension for that subtype will be returned; if src
is an AbstractSource
instance, the returned extension will be the actual extension for that src
, regardless of whether it is the same as the default extension for that src
type.
Extended help
When extending this function for a custom source, only define a method for the source type, e.g. srcext(::Type{MySource})
. The period (.) should be included in the extension as the first letter.
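The split between a per-type default and a per-instance actual extension can be sketched as below. AbstractSource and srcext are local stand-ins so the example runs without the package, EventsFile is hypothetical, and falling back to the type default when a path has no extension is an assumption.

```julia
abstract type AbstractSource end   # local stand-in for DatasetManager.AbstractSource

struct EventsFile <: AbstractSource
    path::String
end

# Extend only the type method; the leading period is part of the extension.
srcext(::Type{EventsFile}) = ".tsv"

# Assumed generic method for instances: report the file's actual extension,
# or the type default when the path has none.
function srcext(src::AbstractSource)
    ext = last(splitext(src.path))
    return isempty(ext) ? srcext(typeof(src)) : ext
end

default_ext = srcext(EventsFile)                       # type default
actual_ext = srcext(EventsFile("/path/events.csv"))    # differs from the default
```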
DatasetManager.srcname_default — Function
srcname_default(src::Union{S,Type{S}}) where {S<:AbstractSource} -> String
Get the default name for a source of type S.
Segments
DatasetManager.Segment — Type
Segment(trial, src::Union{S,Type{S},String}; [start, finish, conditions])
Describes a portion of a source in trial
from time start
to finish
with segment specific conditions
, if applicable.
If src
is a String or AbstractSource, it must refer to a source that exists in trial
. start
and finish
are used in readsegment
to trim time from the beginning and/or end of the data read from src
. start
and finish
default to the beginning and end, respectively, of the source/trial. start
must be before finish
, but they are otherwise only validated during readsegment
.
Any conditions present in trial
will be merged into the conditions for the segment. Note: if src
is a <:AbstractSource
instance, it will not be added to the trial's sources.
Example:
julia> t = Trial(1, "intervention", Dict(:group => "control"), Dict("events" => Source{Events}("/path/to/file")));
julia> seg = Segment(t, "events")
Segment{Source{Events},Int64}
Trial: Trial(1, "intervention", 1 conditions, 1 source)
Source: Source{Events}("/path/to/file")
Time: beginning to end
Conditions: (same as parent trial)
:group => "control"
julia> seg = Segment(t, Source{Events}("/new/events/file"); start=0.0, finish=10.0, conditions=Dict(:stimulus => "sham"))
Segment{Source{Events},Int64}
Trial: Trial(1, "intervention", 1 conditions, 1 source)
Source: Source{Events}("/new/events/file")
Time: 0.0 to 10.0
Conditions:
:stimulus => "sham"
:group => "control"
DatasetManager.readsegment — Function
readsegment(seg::Segment{S}; warn=true, kwargs...) where S <: AbstractSource
If defined for S
, returns the portion of seg.source
from seg.start
to seg.finish
. Otherwise, equivalent to readsource
(i.e. no trimming of time-series occurs). Warns by default if the main method is called.
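What a source-specific readsegment method might do with start and finish can be sketched on plain vectors. This is an illustration under assumptions, not the package's behavior: the data are synthetic, trim is a hypothetical function, both bounds are treated as inclusive, and nothing stands for "beginning" or "end".

```julia
# Hypothetical data: sample times in seconds and one signal value per sample.
times = collect(0.0:0.5:5.0)
signal = times .^ 2

# Assumed trimming rule: keep samples with start <= t <= finish.
function trim(times, signal; start=nothing, finish=nothing)
    lo = start === nothing ? first(times) : start
    hi = finish === nothing ? last(times) : finish
    lo < hi || throw(ArgumentError("start must be before finish"))
    keep = lo .<= times .<= hi
    return times[keep], signal[keep]
end

t, s = trim(times, signal; start=1.0, finish=2.0)
```

Omitting both keywords returns the full series, which corresponds to the fallback case where no trimming occurs.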
DatasetManager.SegmentResult — Type
SegmentResult(segment::Segment, results::Dict{Symbol})
Contains the results of any analyses performed on the trial segment in a Dict
.
Example:
segresult = SegmentResult(seg, Dict(:avg => 3.5))
DatasetManager.trial — Function
trial(seg::Union{Segment,SegmentResult}) -> Trial
Return the parent Trial
for the given Segment
or SegmentResult
DatasetManager.segment — Function
Get the segment of a SegmentResult.
DatasetManager.source — Function
source(seg::Union{Segment,SegmentResult}) -> AbstractSource
Return the source for the parent Trial
of the given Segment
or SegmentResult
DatasetManager.results — Function
Get the results of a SegmentResult.
DatasetManager.resultsvariables — Function
resultsvariables(sr::Union{SegmentResult,Vector{SegmentResult}})
Get the unique variables for SegmentResults.