Event Data Theory

The terms "theory" and "model" are borrowed from the notion of a Lawvere theory. [1]

Definitions

Event

An Event is a Context (what happened) with an associated time interval (when it happened). Concretely, an Event is a wrapper around the interval-algebra package’s PairedInterval type:

newtype Event t m a = MkEvent ( PairedInterval (Context t m) a )
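The sketch below shows how the wrapper separates "what" from "when". It uses simplified, hypothetical stand-ins for interval-algebra's Interval and PairedInterval and a trimmed-down Context (tags and facts only). The real library types are richer and should be built with the library's own constructors. The helper names eventContext and eventInterval are also hypothetical.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-in for interval-algebra's Interval: a begin and an end.
data Interval a = Interval { begin :: a, end :: a }
  deriving (Eq, Show)

-- Hypothetical stand-in for PairedInterval: a payload paired with an interval.
data PairedInterval b a = PairedInterval b (Interval a)
  deriving (Eq, Show)

-- Simplified Context: tags and facts only (no Source field).
data Context t m = MkContext { getTagSet :: Set.Set t, getFacts :: m }
  deriving (Eq, Show)

-- The Event wrapper, shaped as in the text.
newtype Event t m a = MkEvent (PairedInterval (Context t m) a)
  deriving (Eq, Show)

-- "What happened": unwrap the context (hypothetical helper).
eventContext :: Event t m a -> Context t m
eventContext (MkEvent (PairedInterval ctx _)) = ctx

-- "When it happened": unwrap the interval (hypothetical helper).
eventInterval :: Event t m a -> Interval a
eventInterval (MkEvent (PairedInterval _ i)) = i

-- Example: an event tagged "diabetes diagnosis" with an Int fact,
-- spanning time 10 to time 11.
diagnosis :: Event String Int Int
diagnosis =
  MkEvent
    (PairedInterval
      (MkContext (Set.fromList ["diabetes diagnosis"]) 250)
      (Interval 10 11))
```

Because Event is a newtype, the wrapper has no runtime cost. It still gives event-specific functions their own type, distinct from an arbitrary PairedInterval.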

Context

A Context contains up to three types of information:

  1. A tag set (required)

  2. Facts about the event (required)

  3. Metadata on the source of the event (optional)

A tag is a label that gives meaning to an event of interest, and a tag set is a collection of such labels. For example, "diabetes diagnosis", "birthday", and "in hospital" are all possible tags, which together might define a study's tag set.

The Context type is defined as follows.

data Context t m = MkContext
  { -- | the 'TagSet' of a @Context@
    getTagSet :: TagSet t      -- (1)
    -- | the facts of a @Context@
  , getFacts  :: m             -- (2)
    -- | the 'Source' of a @Context@
  , getSource :: Maybe Source  -- (3)
  }

(1) a TagSet, that is, a set of tags (labels) that can be used to identify events in a collection;
(2) facts about the event, whose shape and possible values are determined by the schema type m;
(3) optional data about the provenance of the event, stored in a Source object.
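A main use of the tag set is selecting events from a collection. The following is a minimal sketch under stated assumptions: TagSet is modeled as a Data.Set, Source as a wrapped String, and the helpers hasTag and filterByTags are hypothetical, not part of the library's API.

```haskell
import qualified Data.Set as Set

-- Hypothetical stand-ins: the library's TagSet and Source types differ.
type TagSet t = Set.Set t
newtype Source = Source String
  deriving (Eq, Show)

-- Same field layout as the Context type in the text.
data Context t m = MkContext
  { getTagSet :: TagSet t
  , getFacts  :: m
  , getSource :: Maybe Source
  } deriving (Eq, Show)

-- Hypothetical helper: does the context carry the given tag?
hasTag :: Ord t => t -> Context t m -> Bool
hasTag tag = Set.member tag . getTagSet

-- Hypothetical helper: keep the contexts that carry at least one wanted tag.
filterByTags :: Ord t => [t] -> [Context t m] -> [Context t m]
filterByTags wanted = filter (\c -> any (`hasTag` c) wanted)
```

Tag queries depend only on the tag type t, so they work unchanged for any facts schema m.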

Facts

Facts are the data of interest for a particular event. The schema of the facts is not fixed by the library; it is supplied as the type parameter m of Context (and therefore of Event).
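Because the facts schema is just a type parameter, one Context type can carry entirely different kinds of facts. The sketch below shows this under stated assumptions. The Context is simplified (no Source), LabValue and Diagnosis are hypothetical schemas, and the derived Functor instance is an illustration rather than a claim about the library.

```haskell
{-# LANGUAGE DeriveFunctor #-}
import qualified Data.Set as Set

-- Simplified Context (no Source). Since the facts type m is the last
-- parameter, GHC can derive a Functor that maps over the facts.
data Context t m = MkContext { getTagSet :: Set.Set t, getFacts :: m }
  deriving (Eq, Show, Functor)

-- Two hypothetical fact schemas plugged into the same Context type.
newtype LabValue = LabValue Double
  deriving (Eq, Show)

data Diagnosis = Diagnosis { code :: String, primary :: Bool }
  deriving (Eq, Show)

labContext :: Context String LabValue
labContext = MkContext (Set.fromList ["hba1c"]) (LabValue 7.2)

dxContext :: Context String Diagnosis
dxContext = MkContext (Set.fromList ["diabetes diagnosis"]) (Diagnosis "E11" True)

-- Changing the schema: fmap rewrites the facts and leaves the tags untouched.
toCode :: Context String Diagnosis -> Context String String
toCode = fmap code
```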

Source

Event Model

Supplying specific types for the parameters t and m of Context (and hence of Event) creates a new event model.

An example of an event model is below.

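{-# LANGUAGE DeriveDataTypeable #-}
{-# LANGUAGE DeriveGeneric #-}

import Data.Aeson
import Data.Data (Data)
import Data.Text (Text)
import GHC.Generics (Generic)

data SillySchema =
    A Int
  | B Text
  | C
  | D
  deriving (Show, Eq, Generic, Data)

-- Decode objects whose "domain" field names the constructor and whose
-- "facts" field holds its payload, e.g. {"domain": "A", "facts": 1}.
instance FromJSON SillySchema where
  parseJSON = genericParseJSON
    (defaultOptions
      { sumEncoding = TaggedObject { tagFieldName      = "domain"
                                   , contentsFieldName = "facts"
                                   }
      }
    )

-- Event is the type defined in the Event section above.
type SillyEvent1 a = Event Text SillySchema a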

The SillyEvent1 type is a synonym for an Event whose tags are of type Text, whose facts have the shape SillySchema, and whose intervals have endpoints of any valid type a.
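To show how such a model is used, here is a self-contained sketch that builds a few SillyEvent1 values and aggregates facts from one domain. It makes several assumptions. String replaces Text so that only base and containers are needed. The Interval, PairedInterval, and Context types are simplified hypothetical stand-ins. The helpers mkSilly, getFactsOf, and totalA are illustrative names, not library functions.

```haskell
import qualified Data.Set as Set

-- Hypothetical, simplified stand-ins for the library types.
data Interval a = Interval a a
  deriving (Eq, Show)
data PairedInterval b a = PairedInterval b (Interval a)
  deriving (Eq, Show)
data Context t m = MkContext { getTagSet :: Set.Set t, getFacts :: m }
  deriving (Eq, Show)
newtype Event t m a = MkEvent (PairedInterval (Context t m) a)
  deriving (Eq, Show)

-- The example schema, with String in place of Text.
data SillySchema = A Int | B String | C | D
  deriving (Eq, Show)

type SillyEvent1 a = Event String SillySchema a

-- Hypothetical smart constructor: tags, facts, begin, end.
mkSilly :: [String] -> SillySchema -> a -> a -> SillyEvent1 a
mkSilly tags f b e =
  MkEvent (PairedInterval (MkContext (Set.fromList tags) f) (Interval b e))

-- Extract the facts of any event.
getFactsOf :: Event t m a -> m
getFactsOf (MkEvent (PairedInterval ctx _)) = getFacts ctx

-- Sum the Int payloads of A-domain events; the pattern A n skips other domains.
totalA :: [SillyEvent1 a] -> Int
totalA evs = sum [n | A n <- map getFactsOf evs]
```

Because the facts are an ordinary sum type, pattern matching on constructors is enough to select a domain.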

The type parameter m gives a high degree of flexibility in defining new event models. It represents the schema, or shape, of an event’s data and can be nearly any type built from sum and product types. Often m is a sum type of "domains", where each constructor (domain) carries the group of facts relevant to that domain. The schema of NoviSci’s standard EDM is organized around this idea.
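A domain-organized schema might look like the sketch below. It is loosely patterned on the idea described above, but it is not NoviSci's actual EDM schema. Every domain, record, and field name here is hypothetical. Each constructor is a domain (the sum), and each domain carries a record of related facts (the product).

```haskell
-- Hypothetical per-domain fact records (product types).
data DemographicFacts = DemographicFacts { attribute :: String, value :: String }
  deriving (Eq, Show)

data DiagnosisFacts = DiagnosisFacts { dxCode :: String, isPrimary :: Maybe Bool }
  deriving (Eq, Show)

newtype EnrollmentFacts = EnrollmentFacts { plan :: String }
  deriving (Eq, Show)

-- Hypothetical schema: a sum type with one constructor per domain.
data Domain
  = Demographics DemographicFacts
  | Diagnosis DiagnosisFacts
  | Enrollment EnrollmentFacts
  deriving (Eq, Show)

-- Name of the domain a fact belongs to.
domainName :: Domain -> String
domainName (Demographics _) = "Demographics"
domainName (Diagnosis _)    = "Diagnosis"
domainName (Enrollment _)   = "Enrollment"

-- Collect diagnosis codes, ignoring facts from other domains.
diagnosisCodes :: [Domain] -> [String]
diagnosisCodes ds = [dxCode f | Diagnosis f <- ds]
```

With this layout, adding a new domain means adding one constructor and its record. The compiler's incomplete-pattern warnings then point to functions such as domainName that need updating.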


1. We use the terms informally to give the sense that a model of events is an instance of the theory. We have not checked that the event data theory actually is a universal algebra.