Event Data Theory
The event-data-theory
package of asclepias
provides the types and functions
for defining models of event data.
The terms "theory" and "model" are borrowed from the notion of a
Lawvere theory.
[1]
Having a basic understanding of how to read Haskell’s types will be useful in reading this document. Online resources such as Real World Haskell are excellent for learning.
What is an event
Abstractly, we define an event as an object that contains information about
when something happened and
what happened.
Concretely, an Event
is a wrapper around the interval-algebra
package’s
PairedInterval
type:
newtype Event c m a = MkEvent ( PairedInterval (Context c m) a )
The what part of the pair is a Context c m, which is described further below.
The when part is an Interval a.
The Interval
type is described further in
the interval-algebra
documentation.
You can find more information about intervals
and how to use them there.
Since an Event
is an instance of the
Intervallic
typeclass,
almost anything you can do with Interval
types
you can also do with Event
types.
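To make the wrapper structure concrete, here is a minimal sketch built on simplified stand-in types. The Interval, PairedInterval, Context, and Intervallic definitions below are assumptions written for illustration, not the real interval-algebra or event-data-theory definitions; only the shape of the Event newtype follows the declaration above. The toy before relation shows how a function written against an Intervallic-style class applies to events as well as to bare intervals.

```haskell
-- Simplified stand-ins for illustration only; these are NOT the real
-- interval-algebra or event-data-theory definitions.

-- | Stand-in for @Interval a@: a begin point and an end point.
data Interval a = Interval { begin :: a, end :: a }
  deriving (Eq, Show)

-- | Stand-in for @PairedInterval b a@: some data paired with an interval.
data PairedInterval b a = PairedInterval
  { pairData     :: b
  , pairInterval :: Interval a
  } deriving (Eq, Show)

-- | Reduced stand-in for @Context c m@ (concepts as a plain list, no source).
data Context c m = MkContext
  { getConcepts :: [c]
  , getFacts    :: m
  } deriving (Eq, Show)

-- | Same shape as the newtype declaration in the text.
newtype Event c m a = MkEvent (PairedInterval (Context c m) a)
  deriving (Eq, Show)

-- | Toy Intervallic-style class: anything that holds an interval.
class Intervallic i where
  getInterval :: i a -> Interval a

instance Intervallic Interval where
  getInterval = id

instance Intervallic (PairedInterval b) where
  getInterval = pairInterval

instance Intervallic (Event c m) where
  getInterval (MkEvent p) = getInterval p

-- | The "what" part of an event.
getContext :: Event c m a -> Context c m
getContext (MkEvent p) = pairData p

-- | True when the first interval ends strictly before the second begins.
-- Accepts any mix of Intervallic values, including events.
before :: (Ord a, Intervallic i0, Intervallic i1) => i0 a -> i1 a -> Bool
before x y = end (getInterval x) < begin (getInterval y)
```

Because before is written against the class rather than a concrete type, the same function compares two events, two intervals, or one of each.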
Event Contexts
An event’s Context
contains three types of information:
data Context c m = MkContext
  { -- | the 'Concepts' of a @Context@
    getConcepts :: Concepts c    -- (1)
    -- | the facts of a @Context@.
  , getFacts    :: m             -- (2)
    -- | the 'Source' of @Context@
  , getSource   :: Maybe Source  -- (3)
  }
(1) a set of Concepts, or tags, which can be used to identify events in a collection;
(2) facts about the event, whose shape and possible values are determined by the schema type m;
(3) (optionally) data about the provenance of the event in a Source object.
Filling in the type parameters c and m with concrete types
is what creates a new event model.
The Concepts type c will generally be
an enumerated set of tag variants
(such as data MyProjectTags = Diabetes | BirthDay | InHospital | ...)
or simply Text.
Enumerated tags are preferred over Text
because they give users some type safety around concepts:
one cannot misspell a concept or use an undefined concept, for example.
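The type-safety argument can be illustrated with a small sketch. TagSet, toTagSet, and hasTag are hypothetical names introduced here as a stand-in for a set of concepts; they are not the package's API. With the enumerated type, a misspelled constructor is rejected by the compiler, whereas a misspelled string compiles and silently fails to match.

```haskell
import qualified Data.Set as Set

-- | Hypothetical enumerated concept type for a project.
data MyProjectTags = Diabetes | BirthDay | InHospital
  deriving (Eq, Ord, Show, Enum, Bounded)

-- | Hypothetical stand-in for a set of concepts (not the package's type).
newtype TagSet c = MkTagSet (Set.Set c)
  deriving (Eq, Show)

-- | Build a tag set from a list, dropping duplicates.
toTagSet :: Ord c => [c] -> TagSet c
toTagSet = MkTagSet . Set.fromList

-- | Check whether a tag is present in a tag set.
hasTag :: Ord c => c -> TagSet c -> Bool
hasTag t (MkTagSet s) = Set.member t s

-- | With an enumerated type, every valid tag can be listed exhaustively.
allTags :: [MyProjectTags]
allTags = [minBound .. maxBound]

-- | A typo in a string tag still type checks; it just never matches.
-- The equivalent typo in a constructor name (e.g. Diabets) would not compile.
typoLookup :: Bool
typoLookup = hasTag "Diabets" (toTagSet ["Diabetes", "InHospital"])
```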
The type parameter m provides
a high degree of flexibility in defining new event models.
The m type represents the schema, or shape,
of an event’s data and
can be a nearly arbitrary type
composed of sum and product types.
Often, the m type will be a sum type of "domains",
where each variant groups the facts relevant to one domain.
The schema of NoviSci’s standard
EDM
is organized around this idea.
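As an illustration of a schema built as a sum type of domains, consider the sketch below. The domain names and fields (Demographics, Diagnosis, Enrollment, birthYear, diagnosisCode) are invented for this example and are not taken from the NoviSci EDM.

```haskell
-- | Hypothetical facts for a demographics domain (a product type).
data DemographicsFacts = DemographicsFacts
  { birthYear :: Maybe Int
  } deriving (Eq, Show)

-- | Hypothetical facts for a diagnosis domain.
data DiagnosisFacts = DiagnosisFacts
  { diagnosisCode :: String
  } deriving (Eq, Show)

-- | A schema as a sum type: one constructor per domain.
data MySchema
  = Demographics DemographicsFacts
  | Diagnosis DiagnosisFacts
  | Enrollment -- a domain that carries no facts
  deriving (Eq, Show)

-- | Extract the diagnosis code when the facts come from the Diagnosis domain.
getDiagnosisCode :: MySchema -> Maybe String
getDiagnosisCode (Diagnosis facts) = Just (diagnosisCode facts)
getDiagnosisCode _                 = Nothing
```

Pattern matching on the constructor is what lets downstream code handle each domain's facts separately.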
Marshalling event data
Events are generally produced by some process outside of asclepias
that extracts and transforms a data source into a sequence of events.
NoviSci’s standard
EDM
represents event data
(plus additional information sometimes used in other applications)
as a JSON array,
with each line in a file holding one valid EventLine.
The event-data-theory
EventLine
type corresponds to this JSON array
and is used as the primary way of marshalling data into an Event
.
The EventDataTheory.EventLines
module provides several utilities
for decoding events from eventlines.
The parseEventLinesL
function, for example,
converts a ByteString
of newline-delimited JSON
into a pair of [String]
(containing any parse error messages)
and [(SubjectID, Event c m a)],
a list of subject ID/event pairs.
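The return shape described here, a list of error messages alongside a list of successfully parsed values, can be sketched without any JSON machinery. The following is not the implementation of parseEventLinesL; parseToyLine and parseToyLines are hypothetical functions that parse a made-up "subject value" line format, purely to show how per-line results can be split into errors and successes.

```haskell
import Data.Either (partitionEithers)

-- | Stand-in subject identifier (an assumption for this sketch).
type SubjectID = String

-- | Parse one toy line of the form "<subject> <integer>".
-- Returns an error message on the Left when the line is malformed.
parseToyLine :: String -> Either String (SubjectID, Int)
parseToyLine l = case words l of
  [sid, n] | [(v, "")] <- reads n -> Right (sid, v)
  _                               -> Left ("could not parse line: " ++ l)

-- | Parse newline-delimited input, collecting errors and successes separately.
-- One bad line does not prevent the remaining lines from being parsed.
parseToyLines :: String -> ([String], [(SubjectID, Int)])
parseToyLines = partitionEithers . map parseToyLine . lines
```

Returning both lists lets the caller decide whether any parse error should be fatal or merely reported.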
Defining new models
New event models are defined by providing concrete types for Event c m a
(especially c and m),
as in this example from the package’s test suite:
data SillySchema =
    A Int
  | B Text
  | C
  | D
  deriving (Show, Eq, Generic, Data)

instance FromJSON SillySchema where
  parseJSON = genericParseJSON
    (defaultOptions
      { sumEncoding = TaggedObject { tagFieldName      = "domain"
                                   , contentsFieldName = "facts"
                                   }
      }
    )

type SillyEvent1 a = Event Text SillySchema a
The SillyEvent1
type is a synonym for an Event
where
the concepts are Text,
the facts are of shape SillySchema,
and the Interval
type is any valid type a.
Typeclasses for component types
The schema (m) type for an Event
must be an instance of the
Eq, Show, Generic, and FromJSON
typeclasses.
The
DeriveGeneric
language extension makes deriving the Generic
instance trivial,
as in the code above.
At this time, users do need to provide the FromJSON
instance,
and the boilerplate in the example above should work in most cases.
The concept (c) type for an Event
must be an instance of the
Eq, Show, Typeable, and FromJSON
typeclasses.
Recent versions of GHC provide a Typeable
instance for every type automatically,
so in most cases simply deriving (Eq, Show, Generic)
and a stock FromJSON
instance
is sufficient for the concept type.
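As a quick check of the deriving clause described above, the sketch below uses only the base library. It assumes a recent GHC, where a Typeable instance is available without being listed in the deriving clause. MyProjectTags and conceptTypeName are hypothetical names, and the FromJSON instance is left out because it requires the aeson package.

```haskell
{-# LANGUAGE DeriveGeneric #-}

import Data.Typeable (typeOf)
import GHC.Generics (Generic)

-- | Hypothetical concept type with the derived instances named in the text.
data MyProjectTags = Diabetes | BirthDay | InHospital
  deriving (Eq, Show, Generic)

-- | The name of the concept type, obtained through its Typeable instance.
-- This compiles only because MyProjectTags is Typeable.
conceptTypeName :: String
conceptTypeName = show (typeOf Diabetes)
```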
Testing models
The event-data-theory
package provides a few utilities for testing
a new model.
These can be found in the EventDataTheory.Test
module,
which is not included in the main set of exported modules.
The eventDecodeTests
and eventDecodeFailTests
functions, for example, test for
successful parsing and expected parse failures (respectively)
of EventLine c m a
into the corresponding Event c m a.
These functions take a directory path as an argument.
Each file ending .jsonl
in that directory should contain
a single EventLine
as JSON
to be tested.
See the test
directory and EventDataTheory.TheoryTest
module
in this package for examples.