Introduction
These documents describe the libraries and applications in the event-data-model repository.
These include the following:
- fact-models: The fact-models directory contains tools for defining data models as Dhall types, as well as Rust and Haskell libraries with tools for working with these data models in other applications.
- dhall-codelist: A Dhall package containing types and utilities for working with codelists.
See TODO: link to docs.
Motivation
To understand the data models, it may help to have an intuition of how they fit into the event data theory. Though the theory may have applications in many scientific domains, it originated in epidemiological study design. Time is an essential element in all epidemiological study designs. Often a study defines an index: the time from which a study subject is observed for outcomes of interest. An event contains two pieces of information: what occurred and when it occurred. For example, to determine whether a subject should be flagged as having a covariate of diabetes diagnosis (the feature), we need to know (a) that the subject was diagnosed with diabetes and (b) that the diagnosis occurred before the study’s index time.
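The diabetes covariate check can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions: `Time`, `What`, `SimpleEvent`, and `hasPriorDiabetes` are hypothetical names that do not come from the repository, time is simplified to an integer, and "before the index" is read as strictly before.

```haskell
-- Hypothetical stand-in types; none of these names come from the repository.
type Time = Int -- e.g. days since an arbitrary origin (an assumption)

data What = DiabetesDiagnosis | OtherOccurrence
  deriving (Eq, Show)

-- An event in the informal sense used above: what occurred and when.
data SimpleEvent = SimpleEvent
  { whatOccurred :: What
  , whenOccurred :: Time
  } deriving (Eq, Show)

-- True when some diabetes diagnosis occurred strictly before the index time.
hasPriorDiabetes :: Time -> [SimpleEvent] -> Bool
hasPriorDiabetes index = any isPriorDiagnosis
  where
    isPriorDiagnosis e =
      whatOccurred e == DiabetesDiagnosis && whenOccurred e < index

-- Made-up events for a single subject.
subjectEvents :: [SimpleEvent]
subjectEvents =
  [ SimpleEvent OtherOccurrence 3
  , SimpleEvent DiabetesDiagnosis 10
  ]
```

With these made-up events, `hasPriorDiabetes 20 subjectEvents` flags the subject, while an index of 10 or earlier does not, because the diagnosis must precede the index.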
Many schemas for health data standardize different types of what occurred into a relational database. You can think of each model in fact-models as a database schema. The event data theory works for any model.
In essence, an event is simply an interval of time paired with a context, where the context contains information about what occurred in the interval. Specifically, a context carries facts about what occurred and concepts (or tags) based on the facts that tell us what the event means, and it may also carry information about the origin of the event in a source dataset. The models defined in fact-models can be used within an event’s context.
Key Definitions
- Fact: A Fact is metadata attached to an event. Typically a Fact will allow for distinguishing one event from another for the purpose of classification and analysis. The only limitation is that it must be expressible as a Dhall type. For example, a Fact may contain other Facts. See how to define a new Fact for details.
- Model: A Model is a placeholder for some choice of one among a selection of Facts. It is implemented as a sum type in Dhall, with a different Fact for each 'variant', meaning possible value, of the Model. For example, a ClaimsModel can denote Demographic, or Death, or any of the other Facts listed among its possibilities. A Model should not contain other Models. In that sense a Model is at the top level of a collection of Facts, defining all possible Facts that might be used as metadata for events in some particular application.
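To make the sum-type idea concrete, here is a rough Haskell analogue. The real Facts and Models are defined as Dhall types in fact-models; everything below (`ExampleModel`, the `birthYear` field, the constructor names, `isDeath`) is hypothetical and only mirrors the shape described above.

```haskell
-- Hypothetical Fact types; the real ones are Dhall types in fact-models.
newtype Demographic = Demographic { birthYear :: Maybe Int }
  deriving (Eq, Show)

data Death = Death
  deriving (Eq, Show)

-- A Model as a sum type: one variant per Fact, and no nested Models.
data ExampleModel
  = DemographicFact Demographic
  | DeathFact Death
  deriving (Eq, Show)

-- Distinguish events by pattern matching on the Model's variant.
isDeath :: ExampleModel -> Bool
isDeath (DeathFact _) = True
isDeath _             = False
```

Because the Model lists every Fact it can hold, code that consumes it can pattern match on the variants, and the compiler can warn about any it leaves unhandled.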
External Definitions
The following are defined outside the event-data-model collection of packages, in asclepias. Since they are frequently relevant to this documentation, their definitions are provided here as well.
- Event: An Event is a Context with an associated time interval. Concretely, an Event is a wrapper around the interval-algebra package’s PairedInterval type:

    newtype Event t m a = MkEvent (PairedInterval (Context t m) a)
- Context: A Context contains up to three types of information:
  - A TagSet (required)
  - Facts about the event (required)
  - Metadata on the source of the event (optional)

The Context type is shown below.
    data Context t m = MkContext
      { -- | the 'TagSet' of a @Context@
        getTagSet :: TagSet t        -- (1)
        -- | the facts of a @Context@
      , getFacts  :: m               -- (2)
        -- | the 'Source' of a @Context@
      , getSource :: Maybe Source    -- (3)
      }

(1) a TagSet;
(2) Facts about the event, whose shape and possible values are determined by the Model type m;
(3) optionally, information on the source of the event data.
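As a rough illustration of how these pieces fit together, the sketch below builds a single event from simplified stand-ins. It does not use asclepias or interval-algebra: the `TagSet`, `Source`, and `PairedInterval` types here are deliberately minimal assumptions and differ from the real ones, and `ExampleFacts`, `exampleEvent`, and `getContext` are hypothetical names introduced only for this example.

```haskell
-- Simplified stand-ins so the sketch compiles on its own.
-- The real TagSet, Source, and PairedInterval types differ.
newtype TagSet t = TagSet [t] deriving (Eq, Show)
newtype Source = Source String deriving (Eq, Show)

data Context t m = MkContext
  { getTagSet :: TagSet t
  , getFacts  :: m
  , getSource :: Maybe Source
  } deriving (Eq, Show)

-- Stand-in for PairedInterval: some data paired with a (begin, end) pair.
data PairedInterval b a = PairedInterval b (a, a)

newtype Event t m a = MkEvent (PairedInterval (Context t m) a)

-- Hypothetical facts type playing the role of the Model parameter m.
newtype ExampleFacts = Diagnosis String deriving (Eq, Show)

-- An event over a made-up interval, tagged "diabetes", with no source.
exampleEvent :: Event String ExampleFacts Int
exampleEvent = MkEvent (PairedInterval ctx (10, 11))
  where
    ctx = MkContext
      { getTagSet = TagSet ["diabetes"]
      , getFacts  = Diagnosis "hypothetical-code"
      , getSource = Nothing
      }

-- Hypothetical helper: unwrap an Event to reach its Context.
getContext :: Event t m a -> Context t m
getContext (MkEvent (PairedInterval c _)) = c
```

The type parameters line up with the definitions above: `t` is the tag type, `m` is the facts (Model) type, and `a` is the type of the interval's endpoints.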