Introduction

These documents describe the libraries and applications in the event-data-model repository.

The repository includes the following:

fact-models

The fact-models directory contains tools for defining data models as Dhall types, along with Rust and Haskell libraries for working with these data models in other applications.

dhall-codelist

A Dhall package containing types and utilities for working with codelists.

See TODO: link to docs.

Motivation

To understand the data models, it helps to have an intuition for how they fit into the event data theory. Although the theory may apply in many scientific domains, it originated in epidemiological study design. Time is an essential element in all epidemiological study designs. Often a study defines an index: the time from which a study subject is observed for outcomes of interest. An event contains two pieces of information: what occurred and when it occurred. For example, to determine whether a subject should be flagged as having a covariate of diabetes diagnosis (the feature), we need to know (a) that the subject was diagnosed with diabetes and (b) that the diagnosis occurred before the study’s index time.

Many schemas for health data standardize the different types of "what occurred" into a relational database. You can think of each model in fact-models as a database schema. The event data theory works for any model. In essence, an event is simply an interval of time paired with a context, where the context contains information about what occurred in the interval. Specifically, a context carries facts about what occurred and concepts (or tags) based on the facts that tell us what the event means, and it may also carry information about the origin of the event in a source dataset. The models defined in fact-models can be used within an event’s context.
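The diabetes covariate example above can be sketched in a few lines of Haskell. This is only an illustration under stated assumptions: the Interval, Context and Event types here are simplified stand-ins (not the asclepias or interval-algebra types), the tag name "diabetes_dx" is hypothetical, and "before the index" is taken to mean that the event ends strictly before the index time.

```haskell
-- Illustrative stand-ins only; these are NOT the asclepias types.

-- A time interval, here in days relative to an arbitrary origin.
data Interval = Interval { begin :: Int, end :: Int }
  deriving (Show, Eq)

-- "What occurred": reduced here to a list of tags.
newtype Context = Context { contextTags :: [String] }
  deriving (Show, Eq)

-- An event pairs "when" (an interval) with "what" (a context).
data Event = Event { eventInterval :: Interval, eventContext :: Context }
  deriving (Show, Eq)

-- True when the event carries the tag and ends strictly before the index.
-- (Assumption: "before the index" means end < index.)
occursBeforeIndex :: String -> Int -> Event -> Bool
occursBeforeIndex tag index ev =
  tag `elem` contextTags (eventContext ev) && end (eventInterval ev) < index

-- The covariate flag: does any event in the subject's history qualify?
hasDiabetesCovariate :: Int -> [Event] -> Bool
hasDiabetesCovariate index = any (occursBeforeIndex "diabetes_dx" index)

-- A hypothetical subject history with one diagnosis and one hospitalization.
subjectHistory :: [Event]
subjectHistory =
  [ Event (Interval 10 10) (Context ["diabetes_dx"])
  , Event (Interval 40 45) (Context ["hospitalization"])
  ]
```

With an index of day 30 the diagnosis on day 10 qualifies, so the flag is set; with an index of day 5 it does not.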

Key Definitions

Fact

A Fact is metadata attached to an event. Typically a Fact allows one event to be distinguished from another for the purpose of classification and analysis. The only limitation is that a Fact must be expressible as a Dhall type. For example, a Fact may contain other Facts. See how to define a new Fact for details.
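As a rough illustration of one Fact containing another, here is a sketch written as Haskell records rather than Dhall types. The Code and Diagnosis types and all of their field names are hypothetical and are not taken from fact-models.

```haskell
-- Hypothetical facts; names are illustrative and not from fact-models.

-- A small fact: a code drawn from some coding system.
data Code = Code
  { codeSystem :: String
  , codeValue  :: String
  } deriving (Show, Eq)

-- A larger fact that contains another fact ('Code') as one of its fields.
data Diagnosis = Diagnosis
  { diagnosisCode :: Code
  , isPrimary     :: Bool
  } deriving (Show, Eq)

-- An example value: a primary diagnosis with ICD-10 code E11.
exampleDiagnosis :: Diagnosis
exampleDiagnosis =
  Diagnosis { diagnosisCode = Code "ICD-10" "E11", isPrimary = True }
```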

Model

A Model is a placeholder for a choice of one among a selection of Facts. It is implemented as a sum type in Dhall, with a different Fact for each variant (that is, each possible value) of the Model. For example, a ClaimsModel can denote Demographic, Death, or any of the other Facts listed among its possibilities. A Model should not contain other Models. In that sense a Model sits at the top level of a collection of Facts, defining all the Facts that might be used as metadata for events in a particular application.
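The sum-type shape of a Model can be sketched in Haskell, whose algebraic data types play the same role as a Dhall union. This is a minimal sketch, not the actual ClaimsModel: only two variants are shown, and the constructor names and fact fields are invented for illustration.

```haskell
-- Hypothetical facts; the fields are invented for illustration.
newtype Demographic = Demographic { birthYear :: Int }
  deriving (Show, Eq)

newtype Death = Death { causeRecorded :: Bool }
  deriving (Show, Eq)

-- The model: one fact per variant, and no nested models.
data ClaimsModel
  = DemographicFact Demographic
  | DeathFact Death
  deriving (Show, Eq)

-- Consumers pattern-match on the variant to recover the underlying fact.
describe :: ClaimsModel -> String
describe (DemographicFact d) = "demographic, born " ++ show (birthYear d)
describe (DeathFact _)       = "death"
```

Because the model is a closed sum type, a function such as describe must handle every variant, and the compiler can warn when a newly added Fact is left unhandled.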

External Definitions

The following are defined outside the event-data-model collection of packages, in asclepias. Since they are frequently relevant to this documentation, their definitions are provided here as well.

Event

An Event is a Context with an associated time interval.

Concretely, an Event is a wrapper around the interval-algebra package’s PairedInterval type:

newtype Event t m a = MkEvent (PairedInterval (Context t m) a)

Context

A Context contains up to three types of information:

  1. A TagSet (required)

  2. Facts about the event (required)

  3. Metadata on the source of the event (optional)

The Haskell definition of the Context type is shown below.

data Context t m = MkContext
  { -- | the 'TagSet' of a @Context@
    getTagSet :: TagSet t -- (1)
    -- | the facts of a @Context@.
  , getFacts  :: m -- (2)
    -- | the 'Source' of @Context@
  , getSource :: Maybe Source -- (3)
  }
1 a TagSet;
2 Facts about the event whose shape and possible values are determined by the Model type m;
3 optionally, information on the source of the event data.
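To show how these pieces fit together, here is a self-contained sketch that builds an Event from a Context. So that it compiles without asclepias or interval-algebra, TagSet, Source and PairedInterval are replaced by simplified stand-ins, and ToyModel, exampleEvent and getContext are hypothetical names introduced for this illustration; the real types are richer than shown.

```haskell
-- Simplified stand-ins; the real TagSet, Source and PairedInterval differ.
newtype TagSet t = TagSet [t] deriving (Show, Eq)
newtype Source = Source String deriving (Show, Eq)

-- Same field names as the Context definition in this section.
data Context t m = MkContext
  { getTagSet :: TagSet t
  , getFacts  :: m
  , getSource :: Maybe Source
  } deriving (Show, Eq)

-- Stand-in for PairedInterval: a payload b paired with endpoints of type a.
data PairedInterval b a = PairedInterval b (a, a) deriving (Show, Eq)

newtype Event t m a = MkEvent (PairedInterval (Context t m) a)
  deriving (Show, Eq)

-- Hypothetical one-variant model used as the facts type m.
data ToyModel = Enrollment deriving (Show, Eq)

-- An event spanning days 0 to 30, tagged "enrolled", with no source metadata.
exampleEvent :: Event String ToyModel Int
exampleEvent =
  MkEvent (PairedInterval (MkContext (TagSet ["enrolled"]) Enrollment Nothing) (0, 30))

-- Unwrap the event to reach its context (hypothetical helper).
getContext :: Event t m a -> Context t m
getContext (MkEvent (PairedInterval ctx _)) = ctx
```

Note how the type parameters line up: t is the tag type, m is the Model that fixes the shape of the facts, and a is the type of the interval endpoints.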