Event Data Theory
event-data-model
project.
Definitions
Event
An Event
is a Context
with an associated time interval.
Concretely, an Event
is a wrapper around the interval-algebra
package’s
PairedInterval
type:
newtype Event t m a = MkEvent (PairedInterval (Context t m) a)
Context
A Context
contains up to three types of information:
- A TagSet (required)
- Facts about the event (required)
- Metadata on the source of the event (optional)
The Context type is defined as follows.
data Context t m = MkContext
  { -- | the 'TagSet' of a @Context@
    getTagSet :: TagSet t      -- (1)
    -- | the facts of a @Context@.
  , getFacts  :: m             -- (2)
    -- | the 'Source' of @Context@
  , getSource :: Maybe Source  -- (3)
  }
(1) a TagSet;
(2) Facts about the event, whose shape and possible values are determined by the Model type m;
(3) optionally, information on the source of the event data.
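As a rough illustration of how the three fields fit together, the sketch below builds a Context value. It does not use the library itself: TagSet and Source are simplified stand-ins written for this example (a set of tags and a one-field record), and the Enrollment model, its planName field, the tag names, and the tagCount helper are all hypothetical.

```haskell
import qualified Data.Set as Set

-- Stand-in for the library's 'TagSet' (assumption): a set of unique tags.
newtype TagSet t = MkTagSet (Set.Set t)
  deriving (Eq, Show)

-- Stand-in for the library's 'Source' (assumption), reduced to one field.
newtype Source = MkSource { database :: String }
  deriving (Eq, Show)

-- Same shape as the record shown above.
data Context t m = MkContext
  { getTagSet :: TagSet t
  , getFacts  :: m
  , getSource :: Maybe Source
  }
  deriving (Eq, Show)

-- Hypothetical model with a single kind of fact.
newtype Enrollment = Enrollment { planName :: String }
  deriving (Eq, Show)

-- A context with tags and facts but no source information.
-- The duplicated tag collapses because the tags are held in a set.
exampleContext :: Context String Enrollment
exampleContext = MkContext
  { getTagSet = MkTagSet (Set.fromList ["enrolled", "enrolled", "commercial_plan"])
  , getFacts  = Enrollment "hypothetical-plan"
  , getSource = Nothing
  }

-- Hypothetical helper: the number of distinct tags in a context.
tagCount :: Context t m -> Int
tagCount ctx = case getTagSet ctx of
  MkTagSet tags -> Set.size tags
```

Here getSource is Nothing, which reflects that source metadata is the only optional part of a Context.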
Tags
A TagSet
is a collection of labels ("tags") that summarize the information in the data Model
, as given by the Fact
s of that model attached to a particular event. Each Event
has an associated TagSet
, which can contain multiple unique tags.
A Fact
is a collection of metadata about an Event
, which serves to classify the event for the purpose of building study cohorts and analyses. A Model
for a given study defines which particular Fact
s are allowed to be used in that study.
Fact
s can contain a lot of information. For example, a Fact
to denote a medical insurance claim could contain procedure codes, cost information and provider information.
The tags in a TagSet
synthesize the Fact
s associated with an Event
to provide a kind of short-hand, useful when programming for or reasoning about the groups of Event
s relevant to a study.
For example, "diabetes_treatment" and "in_hospital" are possible tags that could be used to summarize a Fact
with medical claim information. A tag might synthesize information across multiple Fact
s in a single Model
by, say, combining claims information with an ICD9
code to label an Event
as "diabetes_diagnosis".
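The idea of summarizing facts as tags can be sketched in a few lines of Haskell. This is only an illustration of the concept, not how the project defines tags. The ClaimFacts type, its fields, and the claimTags function are hypothetical, and the rule used here (ICD-9 codes beginning with 250 denote diabetes mellitus) is deliberately simplistic.

```haskell
import Data.List (isPrefixOf)
import qualified Data.Set as Set

-- Hypothetical facts for a medical claim; not the project's actual model.
data ClaimFacts = ClaimFacts
  { icd9Code  :: String  -- diagnosis code, e.g. "250.00"
  , inpatient :: Bool    -- whether the claim is for an inpatient stay
  }

-- Summarize a claim's facts as a set of tags.
-- Each comprehension yields its tag only when the guard holds.
claimTags :: ClaimFacts -> Set.Set String
claimTags facts = Set.fromList (concat
  [ ["diabetes_diagnosis" | "250" `isPrefixOf` icd9Code facts]
  , ["in_hospital" | inpatient facts]
  ])
```

Under these assumptions, an inpatient claim with code "250.00" would carry both tags, while an outpatient claim with an unrelated code would carry none.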
At present, tags are created by manually specifying how the Fact
s of a Model
should be summarized, typically using the helper functions in the notionate
package in a project-specific concepts.dhall
file.
Facts and Model
See the event-data-model documentation for a definition of Facts and Model. Note that the Fact and Model defined there are types in the Dhall programming language. A programmer using asclepias must manually define, in Haskell code, a type m for use in Context t m that corresponds to the Model for a given project.
Source
The Source
type stores information on the origins of the data.
This allows for increased traceability of the data from source to final analysis.
The Source
type has the following definition:
---
data Source = MkSource
  { column   :: Maybe T.Text
  , file     :: Maybe T.Text
  , row      :: Maybe Integer
  , table    :: T.Text
  , database :: T.Text
  }
  deriving (Eq, Show, Generic)
---
column
is the column name of the source data.
file
is the path of the source data.
row
is the row number of the source data.
table
is the name of the source data table.
database
is the name of the data source, for example "Optum" or "Medicaid".
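A Source value might be populated as in the sketch below. The record definition is repeated so the example is self-contained. The field values (the column and table names and the row number) and the describeSource helper are hypothetical; only the database name "Medicaid" comes from the text above.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import qualified Data.Text as T
import GHC.Generics (Generic)

-- The Source record as defined above.
data Source = MkSource
  { column   :: Maybe T.Text
  , file     :: Maybe T.Text
  , row      :: Maybe Integer
  , table    :: T.Text
  , database :: T.Text
  }
  deriving (Eq, Show, Generic)

-- A source that came from a database table rather than a file,
-- so 'file' is left as Nothing. All values are illustrative.
exampleSource :: Source
exampleSource = MkSource
  { column   = Just (T.pack "icd9_code")
  , file     = Nothing
  , row      = Just 42
  , table    = T.pack "claims"
  , database = T.pack "Medicaid"
  }

-- Hypothetical helper: a one-line provenance label for logs or reports.
describeSource :: Source -> String
describeSource s =
  T.unpack (database s) ++ "." ++ T.unpack (table s)
    ++ maybe "" (\r -> " row " ++ show r) (row s)
```

The optional fields are wrapped in Maybe, so a source can be recorded even when only the table and database are known.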
Example
Here is an example of a project-specific Event type:
data SillySchema
  = A Int
  | B Text
  | C
  | D
  deriving (Show, Eq, Generic, Data)

instance FromJSON SillySchema where
  parseJSON = genericParseJSON
    (defaultOptions
      { sumEncoding = TaggedObject { tagFieldName      = "domain"
                                   , contentsFieldName = "facts"
                                   }
      }
    )

type SillyEvent1 a = Event Text SillySchema a
The SillyEvent1 type is a project-specific synonym for an Event where
- the tags in the TagSet are of type Text,
- SillySchema is the Model, a Haskell sum type, with a different Fact given by each of its constructors,
- and the time interval is defined over any valid type a, such as Int.
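To make the shape of a SillyEvent1 Int concrete, the sketch below builds one by hand. It does not depend on the library: PairedInterval is a stand-in written for this example (the real type comes from interval-algebra), Context is simplified to hold a plain set of tags with no source, String replaces Text to avoid extra dependencies, and the sillyEvent and eventFacts helpers are hypothetical.

```haskell
import qualified Data.Set as Set

-- Stand-in for interval-algebra's PairedInterval (assumption):
-- some data of type b paired with an interval given by its two endpoints.
data PairedInterval b a = MkPairedInterval (a, a) b
  deriving (Eq, Show)

-- Simplified Context (assumption): a set of tags plus the facts, no source.
data Context t m = MkContext
  { getTagSet :: Set.Set t
  , getFacts  :: m
  }
  deriving (Eq, Show)

-- Same wrapper shape as the Event definition at the top of this page.
newtype Event t m a = MkEvent (PairedInterval (Context t m) a)
  deriving (Eq, Show)

-- The model from the example above, with String in place of Text.
data SillySchema = A Int | B String | C | D
  deriving (Eq, Show)

type SillyEvent1 a = Event String SillySchema a

-- Hypothetical helper: build an event from endpoints, tags, and one fact.
sillyEvent :: (a, a) -> [String] -> SillySchema -> SillyEvent1 a
sillyEvent bounds tags fact =
  MkEvent (MkPairedInterval bounds (MkContext (Set.fromList tags) fact))

-- An event spanning 0 to 10 on an Int timeline, carrying the fact A 5.
exampleEvent :: SillyEvent1 Int
exampleEvent = sillyEvent (0, 10) ["a_tag"] (A 5)

-- Hypothetical helper: pull the facts back out of an event.
eventFacts :: Event t m a -> m
eventFacts (MkEvent (MkPairedInterval _ ctx)) = getFacts ctx
```

Changing the Int in SillyEvent1 Int to another ordered type would change the timeline without touching the model or the tags, which is the flexibility the type parameters are meant to provide.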
This structure provides a high degree of flexibility in defining new structures for study-specific cohort definitions and analysis.
See the
event-data-model
documentation for details.