Event Data Theory
The event-data-theory package of asclepias provides the types and functions
for defining models of event data.
The terms "theory" and "model" are borrowed from the notion of a
Lawvere theory.
[1]
| A basic understanding of how to read Haskell’s types will be useful in reading this document. Online resources such as Real World Haskell are excellent for learning. |
What is an event
Abstractly, we define an event as an object which contains information about
when something happened and
what happened.
Concretely, an Event is a wrapper around the interval-algebra package’s
PairedInterval type:
newtype Event c m a = MkEvent ( PairedInterval (Context c m) a )
The what part of the pair is a Context c m,
which is described further below.
The when part is an Interval a.
The Interval type is described further in
the interval-algebra documentation,
which also covers how to use intervals.
Since an Event is an instance of the
Intervallic typeclass,
almost anything you can do with Interval types
you can also do with Event types.
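As a rough illustration of the when/what pairing, the sketch below uses simplified stand-in types rather than the actual interval-algebra or event-data-theory definitions. The names SimpleInterval, SimplePaired, SimpleEvent, whenOf, whatOf, eventDuration, and exampleEvent are hypothetical and exist only for this example.

```haskell
-- Simplified stand-ins; these are NOT the interval-algebra or
-- event-data-theory definitions.

-- | A span with a begin and an end.
data SimpleInterval a = SimpleInterval
  { begin :: a
  , end   :: a
  } deriving (Eq, Show)

-- | An interval paired with some payload @b@.
data SimplePaired b a = SimplePaired
  { pairedInterval :: SimpleInterval a
  , pairedData     :: b
  } deriving (Eq, Show)

-- | An event wraps the pairing; @ctx@ plays the role of the context.
newtype SimpleEvent ctx a = MkSimpleEvent (SimplePaired ctx a)
  deriving (Eq, Show)

-- | The "when" part of an event.
whenOf :: SimpleEvent ctx a -> SimpleInterval a
whenOf (MkSimpleEvent p) = pairedInterval p

-- | The "what" part of an event.
whatOf :: SimpleEvent ctx a -> ctx
whatOf (MkSimpleEvent p) = pairedData p

-- | An interval operation applied to an event by way of 'whenOf'.
eventDuration :: Num a => SimpleEvent ctx a -> a
eventDuration e = end i - begin i
  where i = whenOf e

-- | A sample event spanning 3 to 10 with a plain String as its "what".
exampleEvent :: SimpleEvent String Int
exampleEvent =
  MkSimpleEvent (SimplePaired (SimpleInterval 3 10) "hospital stay")
```

In the real package, the Intervallic typeclass plays the role that whenOf plays here: it gives interval functions access to the interval inside an event.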
Event Contexts
An event’s Context contains three types of information:
data Context c m = MkContext
  { -- | the 'Concepts' of a @Context@
    getConcepts :: Concepts c  (1)
    -- | the facts of a @Context@.
  , getFacts    :: m  (2)
    -- | the 'Source' of @Context@
  , getSource   :: Maybe Source  (3)
  }
| 1 | a set of Concepts, or tags,
which can be used to identify events in a collection; |
| 2 | facts about the event whose shape and possible values
are determined by the schema type m; |
| 3 | (optionally) data about the provenance of the event in a Source object. |
Filling in and making the type parameters m and c concrete
is what creates a new event model.
The Concepts type c will generally be
an enumerated set of tag variants
(such as data MyProjectTags = Diabetes | BirthDay | InHospital | ...)
or simply Text.
Enumerated tags are preferred over Text
as users then have some type safety around concepts.
One cannot misspell a concept or use an undefined concept, for example.
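The type-safety argument can be sketched with a toy example. Everything below is illustrative: the Tagged record, withConcept, records, and diabetesLabels are hypothetical names, and a plain list stands in for the package's Concepts set.

```haskell
-- Hypothetical tag type; constructor names follow the example in the text.
data MyProjectTags = Diabetes | BirthDay | InHospital
  deriving (Eq, Ord, Show, Enum, Bounded)

-- | A toy tagged record standing in for an event with concepts.
data Tagged c = Tagged
  { tags  :: [c]
  , label :: String
  } deriving (Eq, Show)

-- | Keep only the records carrying the given concept.
withConcept :: Eq c => c -> [Tagged c] -> [Tagged c]
withConcept c = filter (elem c . tags)

records :: [Tagged MyProjectTags]
records =
  [ Tagged [Diabetes] "diagnosis"
  , Tagged [InHospital, Diabetes] "admission"
  , Tagged [BirthDay] "birth"
  ]

-- | Labels of all records tagged with 'Diabetes'.
diabetesLabels :: [String]
diabetesLabels = map label (withConcept Diabetes records)

-- With a String- or Text-based concept type, a typo such as "Diabtes"
-- compiles and silently matches nothing. With the enumerated type,
-- @withConcept Diabtes records@ is rejected by the compiler because
-- no such constructor exists.
```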
The type parameter m provides
a high degree of flexibility in defining new event models.
The m type represents the schema, or shape,
of an event’s data and
can be a nearly arbitrary type
composed of sum and product types.
Often, the m type will be a sum type of "domains"
where each domain groups the facts relevant to one kind of event.
The schema of NoviSci’s standard
EDM
is organized around this idea.
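A schema built as a sum of domains might look like the following sketch. The domains and field names (DiagnosisFacts, LabFacts, MySchema, and so on) are invented for illustration and are not taken from NoviSci's EDM.

```haskell
-- Hypothetical schema for illustration only.

-- | Facts recorded for a diagnosis (a product type).
data DiagnosisFacts = DiagnosisFacts
  { code     :: String
  , codebook :: String
  } deriving (Eq, Show)

-- | Facts recorded for a lab result (a product type).
data LabFacts = LabFacts
  { labName  :: String
  , labValue :: Double
  } deriving (Eq, Show)

-- | The schema: a sum type with one constructor per domain.
data MySchema
  = Diagnosis DiagnosisFacts
  | Lab LabFacts
  | Death               -- a domain that carries no facts
  deriving (Eq, Show)

-- | Name of the domain a schema value belongs to.
domainName :: MySchema -> String
domainName (Diagnosis _) = "Diagnosis"
domainName (Lab _)       = "Lab"
domainName Death         = "Death"

-- | Extract lab values, skipping values from other domains.
labValues :: [MySchema] -> [Double]
labValues facts = [ labValue l | Lab l <- facts ]
```

Because each domain is a constructor, pattern matching selects the facts of interest, and the compiler can warn when a function forgets to handle a domain.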
Marshalling event data
Events are generally produced by some process outside of asclepias
that extracts and transforms a data source into a sequence of events.
NoviSci’s standard
EDM
represents event data
(plus additional information sometimes used in other applications)
as newline-delimited JSON,
where
each line
in a file is a JSON array that is a valid EventLine.
The event-data-theory EventLine type corresponds to this JSON array
and is used as the primary way of marshalling data into an Event.
The EventDataTheory.EventLines module provides several utilities
for decoding events from eventlines.
The parseEventLinesL function, for example,
converts a ByteString of newline-delimited JSON
into a pair of [String] (containing any parse error messages)
and [(SubjectID, Event c m a)],
a list of Subject ID/event pairs.
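The shape of that result, errors on one side and decoded values on the other, can be sketched with a toy line format. This is not the package's implementation: parseLine and parseLines are hypothetical, the SubjectID synonym is a local stand-in, and a whitespace-separated line stands in for a JSON event line.

```haskell
import Data.Either (partitionEithers)
import Text.Read (readMaybe)

-- | Local stand-in for a subject identifier.
type SubjectID = String

-- | Toy line format: "<subject> <begin> <end>".
-- A stand-in for decoding one JSON event line.
parseLine :: String -> Either String (SubjectID, (Int, Int))
parseLine l = case words l of
  [sid, b, e] -> case (readMaybe b, readMaybe e) of
    (Just b', Just e') -> Right (sid, (b', e'))
    _                  -> Left ("bad interval in: " ++ l)
  _ -> Left ("wrong number of fields in: " ++ l)

-- | Parse every line, collecting error messages and successes
-- separately, so that one bad line does not discard the rest.
parseLines :: String -> ([String], [(SubjectID, (Int, Int))])
parseLines = partitionEithers . map parseLine . lines
```

Returning both lists lets the caller decide whether parse errors should be fatal or merely logged.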
Defining new models
New event models are defined by providing concrete types for Event c m a
(especially m and c),
as in this example from the package’s test suite:
data SillySchema =
    A Int
  | B Text
  | C
  | D
  deriving (Show, Eq, Generic, Data)
instance FromJSON SillySchema where
  parseJSON = genericParseJSON
    (defaultOptions
      { sumEncoding = TaggedObject { tagFieldName      = "domain"
                                   , contentsFieldName = "facts"
                                   }
      }
    )
type SillyEvent1 a = Event Text SillySchema a
The SillyEvent1 type is a synonym for an Event where
the concepts are Text,
the facts are of shape SillySchema,
and the Interval type is any valid type a.
Typeclasses for component types
The schema (m) type for an Event must be an instance of
the Eq, Show, Generic, and FromJSON typeclasses.
The
DeriveGeneric
language extension makes deriving the Generic instance trivial,
as in the code above.
At this time, users do need to provide the FromJSON instance,
and the boilerplate in the example above should work in most cases.
The concept (c) type for an Event must be an instance of
the Eq, Show, Typeable, and FromJSON typeclasses.
Modern GHC versions provide the Typeable instance automatically,
so in most cases simply deriving (Eq, Show, Generic)
and a stock FromJSON instance
is sufficient for the concept type.
Testing models
The event-data-theory package provides a few utilities for testing
a new model.
These can be found in the EventDataTheory.Test module,
which is not included in the main set of exported modules.
The eventDecodeTests and eventDecodeFailTests functions, for example, test for
successful parsing and expected parse failures (respectively)
of an EventLine
into the corresponding Event c m a.
These functions take a directory path as an argument.
Each file ending in .jsonl in that directory should contain
a single EventLine as JSON
to be tested.
See the test directory and EventDataTheory.TheoryTest module
in this package for examples.
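The directory-based pattern can be sketched in pure code, with a list of (file name, contents) pairs standing in for a directory. This is only an illustration of the idea and not how EventDataTheory.Test is implemented; toyDecode, jsonlOnly, shouldDecode, and shouldFail are hypothetical names.

```haskell
import Data.Either (isLeft, isRight)
import Data.List (isSuffixOf)
import Text.Read (readMaybe)

-- | A stand-in decoder: succeeds when the contents read as an Int.
toyDecode :: String -> Either String Int
toyDecode s = maybe (Left ("cannot decode: " ++ s)) Right (readMaybe s)

-- | Keep only the (name, contents) pairs whose name ends in .jsonl.
jsonlOnly :: [(FilePath, String)] -> [(FilePath, String)]
jsonlOnly = filter ((".jsonl" `isSuffixOf`) . fst)

-- | Files expected to decode that did not. Empty means the test passes.
shouldDecode :: (String -> Either String e) -> [(FilePath, String)] -> [FilePath]
shouldDecode dec files = [ f | (f, c) <- jsonlOnly files, isLeft (dec c) ]

-- | Files expected to fail that decoded anyway. Empty means the test passes.
shouldFail :: (String -> Either String e) -> [(FilePath, String)] -> [FilePath]
shouldFail dec files = [ f | (f, c) <- jsonlOnly files, isRight (dec c) ]
```

Keeping "good" and "bad" inputs in separate directories means each file documents one expected behavior, and the offending file name is available when a test fails.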