Event Data Theory

See also the event-data-model project.

Definitions

Event

An Event is a Context with an associated time interval.

Concretely, an Event is a wrapper around the interval-algebra package’s PairedInterval type:

newtype Event t m a = MkEvent (PairedInterval (Context t m) a)

Context

A Context contains up to three types of information:

  1. A TagSet (required)

  2. Facts about the event (required)

  3. Metadata on the source of the event (optional)

The Context type is defined as follows.

data Context t m = MkContext
  { -- | the 'TagSet' of a @Context@
    getTagSet :: TagSet t      -- (1)
    -- | the facts of a @Context@
  , getFacts  :: m             -- (2)
    -- | the 'Source' of a @Context@
  , getSource :: Maybe Source  -- (3)
  }

(1) a TagSet;
(2) Facts about the event, whose shape and possible values are determined by the Model type m;
(3) optionally, information on the source of the event data.
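To make the relationship between Event, Context, and the interval concrete, here is a minimal sketch that builds one value of each. It does not use the real libraries: TagSet, Source, and PairedInterval below are simplified stand-ins written for this illustration, and ClaimFacts, exampleContext, and exampleEvent are hypothetical names.

```haskell
import qualified Data.Set as Set

-- Simplified stand-in for the library's TagSet: a set of unique tags.
newtype TagSet t = MkTagSet (Set.Set t)
  deriving (Eq, Show)

-- Simplified stand-in for Source, reduced to a single field.
newtype Source = MkSource { database :: String }
  deriving (Eq, Show)

-- Mirrors the Context definition above.
data Context t m = MkContext
  { getTagSet :: TagSet t
  , getFacts  :: m
  , getSource :: Maybe Source
  }
  deriving (Eq, Show)

-- Simplified stand-in for interval-algebra's PairedInterval: some data
-- paired with a (begin, end) interval. Not the package's real definition.
data PairedInterval b a = MkPairedInterval b (a, a)
  deriving (Eq, Show)

newtype Event t m a = MkEvent (PairedInterval (Context t m) a)
  deriving (Eq, Show)

-- Hypothetical Model with a single kind of fact.
newtype ClaimFacts = MkClaimFacts { procedureCode :: String }
  deriving (Eq, Show)

-- A Context with two tags, one fact, and no source information.
-- The repeated "in_hospital" tag is stored only once.
exampleContext :: Context String ClaimFacts
exampleContext = MkContext
  { getTagSet = MkTagSet
      (Set.fromList ["in_hospital", "diabetes_treatment", "in_hospital"])
  , getFacts  = MkClaimFacts { procedureCode = "P123" }
  , getSource = Nothing
  }

-- An Event: the Context above paired with the interval (3, 10).
exampleEvent :: Event String ClaimFacts Int
exampleEvent = MkEvent (MkPairedInterval exampleContext (3, 10))
```

Loading this in GHCi and evaluating exampleEvent shows the Context nested inside the Event alongside its interval.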

Tags

A TagSet is a collection of labels ("tags") that summarize the information in the data Model, as given by the Facts of that model attached to a particular event. Each Event has an associated TagSet, which can contain multiple unique tags.

A Fact is a collection of metadata about an Event, which serves to classify the Event for the purpose of building study cohorts and analyses. A Model for a given study defines which particular Facts are allowed to be used in that study.

Facts can contain a lot of information. For example, a Fact denoting a medical insurance claim could contain procedure codes, cost information, and provider information.

The tags in a TagSet synthesize the Facts associated with an Event to provide a kind of shorthand, useful when programming for or reasoning about the groups of Events relevant to a study.

For example, "diabetes_treatment" and "in_hospital" are possible tags that could be used to summarize a Fact with medical claim information. A tag might synthesize information across multiple Facts in a single Model by, say, combining claims information with an ICD9 code to label an Event as "diabetes_diagnosis".

At present, tags are created by manually specifying how the Facts of a Model should be summarized, typically using the helper functions in the notionate package in a project-specific concepts.dhall file.
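The idea of summarizing Facts into tags can be sketched in Haskell as a function from a Model value to a set of tags. This is only an illustration of the concept: in practice the rules are written in Dhall, as described above, and the StudyModel type, the tagsFor function, and the tagging rules below are all hypothetical.

```haskell
import Data.List (isPrefixOf)
import qualified Data.Set as Set

-- Hypothetical Model: each constructor is one kind of Fact.
data StudyModel
  = Claim String Bool  -- procedure code, inpatient flag
  | Diagnosis String   -- ICD-9 code
  deriving (Eq, Show)

-- Hand-written, illustrative tagging rules:
--   * an inpatient claim is tagged "in_hospital";
--   * a diagnosis whose ICD-9 code starts with "250" (diabetes mellitus)
--     is tagged "diabetes_diagnosis";
--   * anything else gets no tags.
tagsFor :: StudyModel -> Set.Set String
tagsFor (Claim _ isInpatient) =
  Set.fromList ["in_hospital" | isInpatient]
tagsFor (Diagnosis code)
  | "250" `isPrefixOf` code = Set.singleton "diabetes_diagnosis"
  | otherwise               = Set.empty
```

Downstream code can then select Events by tag without inspecting the full Facts each time.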

Facts and Model

See the event-data-model documentation for a definition of Facts and Model. Note that the Fact and Model defined there are types in the Dhall programming language. The programmer using asclepias must manually define, in Haskell code, a type m within the Context t m that corresponds to the Model for a given project.

Source

The Source type stores information on the origins of the data. This allows for increased traceability of the data from source to final analysis.

The Source type has the following definition:

data Source = MkSource
  { column   :: Maybe T.Text
  , file     :: Maybe T.Text
  , row      :: Maybe Integer
  , table    :: T.Text
  , database :: T.Text
  }
  deriving (Eq, Show, Generic)

column is the column name of the source data, file is the path of the source data, row is the row number of the source data, and table is the name of the source data table. database is the name of the data source, for example "Optum" or "Medicaid". As the types show, column, file, and row are optional, while table and database are required.
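As a small illustration, a Source value might be built like this. The type is copied from the definition above; exampleSource and its field values (the column and table names, the row number) are made up for the example.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Text as T
import GHC.Generics (Generic)

data Source = MkSource
  { column   :: Maybe T.Text
  , file     :: Maybe T.Text
  , row      :: Maybe Integer
  , table    :: T.Text
  , database :: T.Text
  }
  deriving (Eq, Show, Generic)

-- Hypothetical values: the record came from a database table, so there
-- is no file path, but the column and row are known.
exampleSource :: Source
exampleSource = MkSource
  { column   = Just "dx_code"
  , file     = Nothing
  , row      = Just 42
  , table    = "claims"
  , database = "Medicaid"
  }
```

Fields that are unknown or not applicable are simply set to Nothing.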

Example

Here is an example of a project-specific Event type:

data SillySchema =
    A Int
  | B Text
  | C
  | D
  deriving (Show, Eq, Generic, Data)

instance FromJSON SillySchema where
  parseJSON = genericParseJSON
    (defaultOptions
      { sumEncoding = TaggedObject { tagFieldName      = "domain"
                                   , contentsFieldName = "facts"
                                   }
      }
    )

type SillyEvent1 a = Event Text SillySchema a

The SillyEvent1 type is a project-specific synonym for an Event in which the tags of the TagSet are Text, SillySchema is the Model (a Haskell sum type, with a different Fact given by each of its constructors), and the interval type is any valid type a, such as Int.
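The FromJSON instance above tells aeson to read the constructor name from a "domain" field and the constructor's contents from a "facts" field. The following sketch decodes just the SillySchema part of a record, assuming the aeson package is available. decodeFacts and exampleA are hypothetical helper names, the JSON input is made up, and the Data instance is omitted to keep the sketch short.

```haskell
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE OverloadedStrings #-}

import Data.Aeson
import qualified Data.ByteString.Lazy as BL
import Data.Text (Text)
import GHC.Generics (Generic)

data SillySchema
  = A Int
  | B Text
  | C
  | D
  deriving (Show, Eq, Generic)

-- Same options as the instance above: the constructor tag lives in
-- "domain" and the constructor contents live in "facts".
instance FromJSON SillySchema where
  parseJSON = genericParseJSON
    (defaultOptions
      { sumEncoding = TaggedObject { tagFieldName      = "domain"
                                   , contentsFieldName = "facts"
                                   }
      }
    )

-- Hypothetical helper: decode one SillySchema value from JSON bytes,
-- returning a parse error message on failure.
decodeFacts :: BL.ByteString -> Either String SillySchema
decodeFacts = eitherDecode

-- Made-up input selecting the A constructor with contents 1.
exampleA :: Either String SillySchema
exampleA = decodeFacts "{\"domain\":\"A\",\"facts\":1}"
```

An input whose "domain" names no constructor of SillySchema is rejected with a Left value rather than silently accepted.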

This structure provides a high degree of flexibility in defining new structures for study-specific cohort definitions and analysis.

See the event-data-model documentation for details.