This document contains specifications for the event data model (EDM), a data format used by applications in NoviSci’s data pipelines.
Caution: This document is under development.
EDM motivation and basics
To understand the EDM schema, it helps to have an intuition for its rationale. Though the model may apply to many scientific domains, the EDM is rooted in epidemiological study design. Time is an essential element in all epidemiological study designs. A study often defines an index: the time from which a study subject is observed for outcomes of interest. Given the primacy of time in epidemiology, all data in the EDM have an associated time. The EDM focuses on two pieces of information: what occurred and when it occurred. For example, to determine whether a subject should be flagged as having a diabetes diagnosis covariate (the feature), we need to know (a) that the subject was diagnosed with diabetes and (b) that the diagnosis occurred before the study’s index time.
Many schemas for health data store different kinds of occurrences (diagnoses, procedures, lab results, and so on) in different tables. The EDM organizes the same information into a single structure: an event. In essence, an event is an interval of time paired with a context, where the context holds information about what occurred during the interval. Specifically, a context carries facts about what occurred, concepts (or tags) derived from the facts that tell us what the event means, and optionally information about the event's origin in a source dataset. The "shape" of the facts is determined by a domain; that is, a domain determines which facts an event may contain.
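The event structure described above can be sketched in Rust. This is a minimal sketch under stated assumptions. Time points are plain integers (for example, days since some epoch). Only a stand-in diagnosis domain is modeled. All type, field, and function names (Interval, Context, Event, has_concept_before) are hypothetical and are not the EDM's JSON field names. The helper answers the diabetes question from the previous paragraph: it checks whether any event tagged with a given concept begins strictly before the index time.

```rust
// Hypothetical sketch of an EDM-style event; names are illustrative, not normative.
// Assumption: time is an integer (e.g. days since an epoch).

#[derive(Debug, Clone, PartialEq)]
pub struct Interval {
    pub begin: i64,
    pub end: i64,
}

// Assumption: a stand-in for the domains specified later in this document.
#[derive(Debug, Clone, PartialEq)]
pub enum Domain {
    Diagnosis { code: String },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Context {
    /// Concepts (tags) derived from the facts, saying what the event means.
    pub concepts: Vec<String>,
    /// The facts; the domain determines which facts are present.
    pub facts: Domain,
    /// Optional information about the event's origin in a source dataset.
    pub source: Option<String>,
}

#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub interval: Interval,
    pub context: Context,
}

/// True if any event tagged with `concept` begins strictly before `index`.
/// Using the interval's begin (rather than its end) is an assumption.
pub fn has_concept_before(events: &[Event], concept: &str, index: i64) -> bool {
    events
        .iter()
        .any(|e| e.interval.begin < index && e.context.concepts.iter().any(|c| c == concept))
}
```

Whether "before the index" should compare the interval's begin or its end, and whether the comparison is strict, is a study-design decision. The EDM itself does not fix it.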
This document specifies facts and domains as well as a JSON format for transferring events between applications.
Examples of applications and/or libraries that use the EDM include:
Note: Applications may internally represent events in a different structure than described herein.
Properties of Facts
- May contain both product types and sum types.
- Must not be recursive.
Tip: While there is no limit on the nesting of types, a fact’s information content is meant to be small.
As an example, the following is a valid fact type (written in Dhall):
< foo : <this | that | other >
| bar : { it : Text, has : < requirements : { are : Text } | norequirements > }
| baz : Natural
>
The bar variant in this example has multiple levels of nesting. Nesting beyond one or two levels is discouraged. The example above is a top-level sum-type fact; a top-level product-type fact might look like:
{ foo : <this | that | other >
, bar : { it : Text, has : < requirements : { are : Text } | norequirements > }
, baz : Natural
}
The difference between the two is that a value of the sum type contains exactly one of foo, bar, or baz, while a value of the product type contains all three.
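Dhall types are meant to map onto algebraic data types. Following that principle, the two example fact types above might be rendered in Rust as an enum (the sum type) and a struct (the product type). This is a sketch. The Rust type names are hypothetical, and Dhall's Natural is approximated here by u64 even though Natural is unbounded.

```rust
// Hypothetical Rust rendering of the two example fact types above.
// Rust type names are illustrative; Dhall's unbounded Natural is approximated by u64.

/// < this | that | other >
#[derive(Debug, Clone, PartialEq)]
pub enum Foo {
    This,
    That,
    Other,
}

/// < requirements : { are : Text } | norequirements >
#[derive(Debug, Clone, PartialEq)]
pub enum Has {
    Requirements { are: String },
    NoRequirements,
}

/// { it : Text, has : ... }
#[derive(Debug, Clone, PartialEq)]
pub struct Bar {
    pub it: String,
    pub has: Has,
}

/// Sum-type fact: a value holds exactly one of foo, bar, or baz.
#[derive(Debug, Clone, PartialEq)]
pub enum SumFact {
    Foo(Foo),
    Bar(Bar),
    Baz(u64),
}

/// Product-type fact: a value holds all three of foo, bar, and baz.
#[derive(Debug, Clone, PartialEq)]
pub struct ProductFact {
    pub foo: Foo,
    pub bar: Bar,
    pub baz: u64,
}

/// Names the single variant carried by a sum-type value.
pub fn variant_name(fact: &SumFact) -> &'static str {
    match fact {
        SumFact::Foo(_) => "foo",
        SumFact::Bar(_) => "bar",
        SumFact::Baz(_) => "baz",
    }
}
```

Because the compiler forces every match on SumFact to handle all three variants, adding a variant to a sum-type fact is a visible, breaking change for consumers.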
Properties of Domains
Domains collect related facts into a shared group, represented as a product type (i.e., a record of key/value pairs) whose values are facts.
Organizing principles
The specifications are organized around the following principles:
Dhall types are the source of shapes
Dhall types are designed to have an idiomatic representation in statically typed languages with algebraic data types. For example, the code fact’s type is:
{ code : Text, codebook : Optional (../enum/codebook.dhall).Type }
Rust:
enum Codebook {
    ICD9,
    ICD10,
    // etc.
}

struct Code {
    code: String,
    codebook: Option<Codebook>,
}
Haskell:
import Data.Text (Text)

data Codebook = ICD9 | ICD10 -- etc.

data Code = Code
  { code :: Text
  , codebook :: Maybe Codebook
  }
The type does not specify which strings are valid for the code slot.
JSON Schema is maintained as a good-faith representation
A JSON Schema representation is kept "alongside" the Dhall. Considering the code fact again, code.dhall contains:
let fact = ../utils/fact.dhall
in { Type = { code : Text, codebook : Optional (../enum/codebook.dhall).Type } -- (1)
   , jsonSchema = -- (2)
       fact.makeJSONSchema
         (Some [ "code" ])
         ( toMap
             { code = fact.property.stringPattern "^[A-Za-z0-9]+\$"
             , codebook =
                 fact.property.enum (../enum/codebook.dhall).jsonSchema
             }
         )
   }
(1) The code type.
(2) The fact.makeJSONSchema function returns the JSON Schema that corresponds to the Type, so that running dhall-to-json <<< '(./schema/fact/code.dhall).jsonSchema' returns:
{
"additionalProperties": false,
"properties": {
"code": {
"pattern": "^[A-Za-z0-9]+$",
"type": "string"
},
"codebook": {
"enum": [ ... ],
"type": "string"
}
},
"required": [
"code"
],
"type": "object"
}
Note: At this time, no Dhall-to-JSON Schema application or library exists, so the Dhall Type and jsonSchema are kept in sync manually. Humans make errors, so if you find a difference between the Dhall Type and the JSON Schema, please open an issue.
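The Dhall type only says Text, so the constraint on the code slot lives in the JSON Schema pattern ^[A-Za-z0-9]+$. A Rust consumer that wants the same guarantee in its types could use a smart constructor. The following is a minimal sketch. Code::new and CodeError are hypothetical names, and the character check is written by hand to match the pattern without a regex dependency.

```rust
// A minimal sketch, assuming a Rust consumer wants the JSON Schema's "code" pattern
// (^[A-Za-z0-9]+$) enforced when a Code is built. Code::new and CodeError are hypothetical.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Codebook {
    ICD9,
    ICD10,
    // etc.
}

#[derive(Debug, Clone, PartialEq)]
pub struct Code {
    code: String, // private: outside this module, only Code::new can construct a Code
    codebook: Option<Codebook>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum CodeError {
    Empty,
    InvalidChar(char),
}

impl Code {
    /// Accepts one or more ASCII letters or digits, matching ^[A-Za-z0-9]+$.
    pub fn new(code: &str, codebook: Option<Codebook>) -> Result<Code, CodeError> {
        if code.is_empty() {
            return Err(CodeError::Empty);
        }
        if let Some(bad) = code.chars().find(|c| !c.is_ascii_alphanumeric()) {
            return Err(CodeError::InvalidChar(bad));
        }
        Ok(Code { code: code.to_string(), codebook })
    }

    pub fn code(&self) -> &str {
        &self.code
    }

    pub fn codebook(&self) -> Option<Codebook> {
        self.codebook
    }
}
```

Under this pattern, a dotted ICD-10 code such as E11.9 is rejected (Code::new returns InvalidChar('.')). Codes would therefore have to be stored without the dot, as in E119.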
Versioning
EDM versioning loosely follows semantic versioning with MAJOR.MINOR version numbers, where you increment the:
- MAJOR version when you make incompatible API changes;
- MINOR version when you add functionality in a backward-compatible manner.
In particular, adding new facts or domains typically requires only a MINOR update, while changing the shape of existing facts or domains should trigger a MAJOR update. Unlike Semantic Versioning 2.0.0, the EDM has no PATCH version.
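One way a consumer might act on these rules is to check whether it can read data labeled with a given EDM version. This is a minimal sketch under an assumption: because MINOR releases only add facts or domains, a reader built against MAJOR.y can read data labeled MAJOR.x whenever x <= y, and any MAJOR difference is incompatible. EdmVersion and can_read are hypothetical names, not part of any EDM tooling.

```rust
// Hypothetical sketch of an EDM version check. Assumption: a reader built
// against MAJOR.y can read data labeled MAJOR.x whenever x <= y, because MINOR
// releases only add facts or domains; any MAJOR difference is incompatible.

use std::str::FromStr;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct EdmVersion {
    pub major: u32,
    pub minor: u32,
}

impl FromStr for EdmVersion {
    type Err = String;

    /// Parses "MAJOR.MINOR". A PATCH component is rejected, since the EDM has none.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let parts: Vec<&str> = s.split('.').collect();
        if parts.len() != 2 {
            return Err(format!("expected MAJOR.MINOR, got {:?}", s));
        }
        let major = parts[0].parse::<u32>().map_err(|e| e.to_string())?;
        let minor = parts[1].parse::<u32>().map_err(|e| e.to_string())?;
        Ok(EdmVersion { major, minor })
    }
}

/// Can a reader at version `reader` consume data written at version `data`?
pub fn can_read(reader: EdmVersion, data: EdmVersion) -> bool {
    reader.major == data.major && data.minor <= reader.minor
}
```

The reverse direction (an older reader given newer data) is treated as incompatible here. The JSON Schemas use additionalProperties: false, so a fact added in a later MINOR version would fail validation against an older schema.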
Schema
Facts
Cost
The cost fact is used to capture cost information about an event.
Dhall:
{ allowed : Optional < Double | Text >
, category : Optional Text
, charge : Optional < Double | Text >
, cost : Optional < Double | Text >
, description : Optional Text
, transaction : Optional Text
}
JSON Schema:
additionalProperties: false
properties:
  allowed:
    oneOf:
      - type: string
      - type: number
      - type: "null"
  category:
    type: string
  charge:
    oneOf:
      - type: string
      - type: number
      - type: "null"
  cost:
    oneOf:
      - type: string
      - type: number
      - type: "null"
  description:
    type: string
  transaction:
    type: string
type: object
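In Rust, the cost fact's < Double | Text > union could become a two-variant enum, and each Optional field an Option. This sketch is illustrative. Amount, Cost, and as_f64 are hypothetical names. Parsing a Text amount as a number is an assumption about how a consumer might treat text values, not something the EDM specifies.

```rust
// Hypothetical Rust rendering of the cost fact; names are illustrative.
// `< Double | Text >` becomes a two-variant enum and Dhall `Optional` becomes Option,
// where None stands for a missing or null JSON value.

#[derive(Debug, Clone, PartialEq)]
pub enum Amount {
    Double(f64),
    Text(String),
}

impl Amount {
    /// Numeric value of the amount. Parsing the Text variant as a number is an
    /// assumption about consumer behavior; the EDM does not define it.
    pub fn as_f64(&self) -> Option<f64> {
        match self {
            Amount::Double(x) => Some(*x),
            Amount::Text(t) => t.trim().parse::<f64>().ok(),
        }
    }
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Cost {
    pub allowed: Option<Amount>,
    pub category: Option<String>,
    pub charge: Option<Amount>,
    pub cost: Option<Amount>,
    pub description: Option<String>,
    pub transaction: Option<String>,
}
```

Deriving Default makes it convenient to build a Cost with only the fields a source dataset actually provides. The remaining fields stay None.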
Claim
The claim fact is used to capture claim information about an event. This would most commonly be used in insurance claim data.
Dhall:
{ id : Text
, index : Optional Integer
, procedure : Optional Text
, type : Optional Text
}
JSON Schema:
additionalProperties: false
properties:
  id:
    type: string
  index:
    oneOf:
      - type: integer
      - type: "null"
  procedure:
    oneOf:
      - type: string
      - type: "null"
  type:
    oneOf:
      - type: string
      - type: "null"
required:
  - id
type: object
Plan
The plan fact describes the health care plan under which an event occurred. The fields are based on fields commonly found in insurance claim data.
Dhall:
{ benefit : < List : List Text | Singleton : Text >
, exchange :
    < Group | IndFederal | IndState | Medicaid | Medicare | None | ThirdParty | Unknown >
, group_id : Optional Text
, plan_id : Optional Text
, subscriber_id : Optional Text
, subscriber_relationship : Optional Text
}
JSON Schema:
additionalProperties: false
properties:
  benefit:
    oneOf:
      - type: string
      - items:
          type: string
        type: array
  exchange:
    enum:
      - Group
      - IndFederal
      - IndState
      - Medicaid
      - Medicare
      - None
      - ThirdParty
      - Unknown
    type: string
  group_id:
    type: string
  plan_id:
    type: string
  subscriber_id:
    type: string
  subscriber_relationship:
    type: string
required:
  - exchange
type: object
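The benefit field lets a producer supply either a single benefit or a list. A consumer might normalize both cases and map the exchange strings to an enum, along the lines of this sketch. Benefit, Exchange, as_slice, and from_json_str are hypothetical names. The Rust variant Exchange::None is the EDM's "None" exchange and is unrelated to Option::None.

```rust
// Hypothetical Rust rendering of the plan fact's `benefit` and `exchange` fields.

/// < List : List Text | Singleton : Text >
#[derive(Debug, Clone, PartialEq)]
pub enum Benefit {
    List(Vec<String>),
    Singleton(String),
}

impl Benefit {
    /// Normalizes both variants to a slice so callers need not match on the shape.
    pub fn as_slice(&self) -> &[String] {
        match self {
            Benefit::List(items) => items.as_slice(),
            Benefit::Singleton(item) => std::slice::from_ref(item),
        }
    }
}

/// < Group | IndFederal | IndState | Medicaid | Medicare | None | ThirdParty | Unknown >
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Exchange {
    Group,
    IndFederal,
    IndState,
    Medicaid,
    Medicare,
    None, // the EDM's "None" exchange; distinct from Option::None
    ThirdParty,
    Unknown,
}

impl Exchange {
    /// Maps a JSON Schema enum string to a variant; matching is case-sensitive.
    pub fn from_json_str(s: &str) -> Option<Exchange> {
        match s {
            "Group" => Some(Exchange::Group),
            "IndFederal" => Some(Exchange::IndFederal),
            "IndState" => Some(Exchange::IndState),
            "Medicaid" => Some(Exchange::Medicaid),
            "Medicare" => Some(Exchange::Medicare),
            "None" => Some(Exchange::None),
            "ThirdParty" => Some(Exchange::ThirdParty),
            "Unknown" => Some(Exchange::Unknown),
            _ => None,
        }
    }
}
```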
labValue
The labValue fact holds either a numeric lab value or a text representation of a lab value, each with optional units.
Note: The code identifying the kind of lab would be put in a domain’s code fact.
Dhall:
< labNumber : { number : Double, units : Optional Text }
| labText : { text : Text, units : Optional Text }
>
JSON Schema:
additionalProperties: false
oneOf:
  - required:
      - text
  - required:
      - number
properties:
  number:
    type: number
  text:
    type: string
  units:
    type: string
type: object
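The JSON Schema enforces the sum type with oneOf. An object that carries both text and number, or neither, fails validation. A Rust consumer assembling a value from optional JSON fields might apply the same rule, as in this sketch. from_fields and units are hypothetical helpers.

```rust
// Hypothetical sketch mirroring the labValue JSON Schema's oneOf rule in Rust.
// LabValue::from_fields and LabValue::units are illustrative helpers.

#[derive(Debug, Clone, PartialEq)]
pub enum LabValue {
    LabNumber { number: f64, units: Option<String> },
    LabText { text: String, units: Option<String> },
}

impl LabValue {
    /// Exactly one of `number` or `text` must be present, as with the schema's oneOf.
    pub fn from_fields(
        number: Option<f64>,
        text: Option<String>,
        units: Option<String>,
    ) -> Result<LabValue, &'static str> {
        match (number, text) {
            (Some(number), None) => Ok(LabValue::LabNumber { number, units }),
            (None, Some(text)) => Ok(LabValue::LabText { text, units }),
            (Some(_), Some(_)) => Err("both number and text are present"),
            (None, None) => Err("neither number nor text is present"),
        }
    }

    /// Units are optional in both variants.
    pub fn units(&self) -> Option<&str> {
        match self {
            LabValue::LabNumber { units, .. } | LabValue::LabText { units, .. } => units.as_deref(),
        }
    }
}
```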
Demo
The demo fact captures demographic information.
Dhall:
{ field :
    < BirthDate
    | BirthYear
    | County
    | CountyFIPS
    | Ethnicity
    | Gender
    | GeoAdiNatRank
    | GeoAdiStateRank
    | GeoPctAmIndian
    | GeoPctAsian
    | GeoPctBlack
    | GeoPctHispanic
    | GeoPctMutli
    | GeoPctOther
    | GeoPctWhite
    | GeoType
    | Race
    | RaceCodes
    | Region
    | State
    | UrbanRural
    | Zipcode
    >
, info : < List : List (Optional Text) | Singleton : Optional Text >
}
JSON Schema:
additionalProperties: false
properties:
  field:
    enum:
      - BirthDate
      - BirthYear
      - County
      - CountyFIPS
      - Ethnicity
      - Gender
      - GeoAdiNatRank
      - GeoAdiStateRank
      - GeoPctAmIndian
      - GeoPctAsian
      - GeoPctBlack
      - GeoPctHispanic
      - GeoPctMutli
      - GeoPctOther
      - GeoPctWhite
      - GeoType
      - Race
      - RaceCodes
      - Region
      - State
      - UrbanRural
      - Zipcode
    type: string
  info:
    oneOf:
      - oneOf:
          - type: "null"
          - type: string
      - items:
          oneOf:
            - type: "null"
            - type: string
        type: array
required:
  - field
  - info
type: object
Code
The code fact captures an event’s code, such as an ICD-9 or ICD-10 diagnosis code or an NDC medication code.
Dhall:
{ code : Text
, codebook :
    < CDT
    | CPT
    | HCPCS
    | ICD10
    | ICD10PC
    | ICD9
    | ICD9PC
    | LOINC
    | NABSP
    | NDC
    | NDC9
    | UB92
    | US_STATE
    | medicaid_cat
    >
}
JSON Schema:
additionalProperties: false
properties:
  code:
    pattern: "^[A-Za-z0-9]+$"
    type: string
  codebook:
    enum:
      - CDT
      - CPT
      - HCPCS
      - ICD10
      - ICD10PC
      - ICD9
      - ICD9PC
      - LOINC
      - NABSP
      - NDC
      - NDC9
      - UB92
      - US_STATE
      - medicaid_cat
    type: string
required:
  - code
  - codebook
type: object
Provider
The provider fact identifies an event’s health care provider. The fields are based on the National Provider Identifier (NPI) schema.
- provider_id: NPI number (required)
- provider_type: individual or organization
- taxonomy: a CMS taxonomy code
Dhall:
{ provider_id : Text
, provider_type : Optional Text
, taxonomy : Optional Text
}
JSON Schema:
additionalProperties: false
properties:
  provider_id:
    type: string
  provider_type:
    oneOf:
      - type: string
      - type: "null"
  taxonomy:
    oneOf:
      - type: string
      - type: "null"
required:
  - provider_id
type: object
Location
The location fact records the setting in which an event occurred: inpatient or outpatient.
Dhall:
< Inpatient | Outpatient >
JSON Schema:
additionalProperties: false
properties:
  location:
    enum:
      - Inpatient
      - Outpatient
    type: string
type: object
Fill
The fill fact may be used to provide prescription fill information for medications.
Note: The code identifying the kind of medication would be put in a domain’s code fact.
Dhall:
{ days_supply : Optional Integer
, quantity : Optional Double
, strength : Optional Text
}
JSON Schema:
additionalProperties: false
properties:
  days_supply:
    oneOf:
      - type: integer
      - type: "null"
  quantity:
    oneOf:
      - type: number
      - type: "null"
  strength:
    oneOf:
      - type: string
      - type: "null"
type: object
Domains
Lab
The facts of the lab domain.
Dhall:
{ code :
    { code : Text
    , codebook :
        < CDT | CPT | HCPCS | ICD10 | ICD10PC | ICD9 | ICD9PC | LOINC | NABSP | NDC | NDC9 | UB92 | US_STATE | medicaid_cat >
    }
, cost :
    Optional
      { allowed : Optional < Double | Text >
      , category : Optional Text
      , charge : Optional < Double | Text >
      , cost : Optional < Double | Text >
      , description : Optional Text
      , transaction : Optional Text
      }
, location : Optional < Inpatient | Outpatient >
, value :
    < labNumber : { number : Double, units : Optional Text }
    | labText : { text : Text, units : Optional Text }
    >
}
JSON Schema:
additionalProperties: false
properties:
  code:
    additionalProperties: false
    properties:
      code:
        pattern: "^[A-Za-z0-9]+$"
        type: string
      codebook:
        enum:
          - CDT
          - CPT
          - HCPCS
          - ICD10
          - ICD10PC
          - ICD9
          - ICD9PC
          - LOINC
          - NABSP
          - NDC
          - NDC9
          - UB92
          - US_STATE
          - medicaid_cat
        type: string
    required:
      - code
      - codebook
    type: object
  cost:
    additionalProperties: false
    properties:
      allowed:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      category:
        type: string
      charge:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      cost:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      description:
        type: string
      transaction:
        type: string
    type: object
  location:
    additionalProperties: false
    properties:
      location:
        enum:
          - Inpatient
          - Outpatient
        type: string
    type: object
  value:
    additionalProperties: false
    oneOf:
      - required:
          - text
      - required:
          - number
    properties:
      number:
        type: number
      text:
        type: string
      units:
        type: string
    type: object
required:
  - code
  - value
type: object
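To make the product-type structure of a domain concrete, the lab domain might be rendered in Rust as a struct whose fields are facts. Required facts become plain fields, and Optional facts become Option fields. This is a sketch under simplifying assumptions. Code, Cost, and Location are hypothetical stand-ins (for example, codebook is plain text here and charge is numeric only), and numeric_result is an illustrative helper, not part of the EDM.

```rust
// Hypothetical sketch: the lab domain as a product type of facts.
// Required facts are plain fields; Dhall `Optional` facts become `Option`.
// The fact types are simplified stand-ins, not full translations.

#[derive(Debug, Clone, PartialEq)]
pub struct Code {
    pub code: String,
    pub codebook: String, // simplified: the real codebook is an enum (CDT, CPT, ..., LOINC, ...)
}

#[derive(Debug, Clone, PartialEq, Default)]
pub struct Cost {
    pub charge: Option<f64>, // simplified: the real field is Optional < Double | Text >
}

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Location {
    Inpatient,
    Outpatient,
}

#[derive(Debug, Clone, PartialEq)]
pub enum LabValue {
    LabNumber { number: f64, units: Option<String> },
    LabText { text: String, units: Option<String> },
}

#[derive(Debug, Clone, PartialEq)]
pub struct Lab {
    pub code: Code,                 // required
    pub cost: Option<Cost>,         // optional
    pub location: Option<Location>, // optional
    pub value: LabValue,            // required
}

/// Returns the numeric result of a lab event, if it has one.
pub fn numeric_result(lab: &Lab) -> Option<f64> {
    match &lab.value {
        LabValue::LabNumber { number, .. } => Some(*number),
        LabValue::LabText { .. } => None,
    }
}
```

With this layout, a Lab without a code or value cannot be constructed, which mirrors the domain's required list.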
Diagnosis
The facts of the diagnosis domain.
Dhall:
{ claim :
    Optional
      { id : Text
      , index : Optional Integer
      , procedure : Optional Text
      , type : Optional Text
      }
, code :
    { code : Text
    , codebook :
        < CDT | CPT | HCPCS | ICD10 | ICD10PC | ICD9 | ICD9PC | LOINC | NABSP | NDC | NDC9 | UB92 | US_STATE | medicaid_cat >
    }
, cost :
    Optional
      { allowed : Optional < Double | Text >
      , category : Optional Text
      , charge : Optional < Double | Text >
      , cost : Optional < Double | Text >
      , description : Optional Text
      , transaction : Optional Text
      }
, location : Optional < Inpatient | Outpatient >
, provider :
    Optional
      { provider_id : Text
      , provider_type : Optional Text
      , taxonomy : Optional Text
      }
}
JSON Schema:
additionalProperties: false
properties:
  claim:
    additionalProperties: false
    properties:
      id:
        type: string
      index:
        oneOf:
          - type: integer
          - type: "null"
      procedure:
        oneOf:
          - type: string
          - type: "null"
      type:
        oneOf:
          - type: string
          - type: "null"
    required:
      - id
    type: object
  code:
    additionalProperties: false
    properties:
      code:
        pattern: "^[A-Za-z0-9]+$"
        type: string
      codebook:
        enum:
          - CDT
          - CPT
          - HCPCS
          - ICD10
          - ICD10PC
          - ICD9
          - ICD9PC
          - LOINC
          - NABSP
          - NDC
          - NDC9
          - UB92
          - US_STATE
          - medicaid_cat
        type: string
    required:
      - code
      - codebook
    type: object
  cost:
    additionalProperties: false
    properties:
      allowed:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      category:
        type: string
      charge:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      cost:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      description:
        type: string
      transaction:
        type: string
    type: object
  location:
    additionalProperties: false
    properties:
      location:
        enum:
          - Inpatient
          - Outpatient
        type: string
    type: object
  provider:
    additionalProperties: false
    properties:
      provider_id:
        type: string
      provider_type:
        oneOf:
          - type: string
          - type: "null"
      taxonomy:
        oneOf:
          - type: string
          - type: "null"
    required:
      - provider_id
    type: object
required:
  - code
  - location
type: object
Eligibility
The facts of the eligibility domain.
Dhall:
{ plan :
    { benefit : < List : List Text | Singleton : Text >
    , exchange :
        < Group | IndFederal | IndState | Medicaid | Medicare | None | ThirdParty | Unknown >
    , group_id : Optional Text
    , plan_id : Optional Text
    , subscriber_id : Optional Text
    , subscriber_relationship : Optional Text
    }
}
JSON Schema:
additionalProperties: false
properties:
  plan:
    additionalProperties: false
    properties:
      benefit:
        oneOf:
          - type: string
          - items:
              type: string
            type: array
      exchange:
        enum:
          - Group
          - IndFederal
          - IndState
          - Medicaid
          - Medicare
          - None
          - ThirdParty
          - Unknown
        type: string
      group_id:
        type: string
      plan_id:
        type: string
      subscriber_id:
        type: string
      subscriber_relationship:
        type: string
    required:
      - exchange
    type: object
required:
  - plan
type: object
Enrollment
The facts of the enrollment domain.
Dhall:
{ plan :
    { benefit : < List : List Text | Singleton : Text >
    , exchange :
        < Group | IndFederal | IndState | Medicaid | Medicare | None | ThirdParty | Unknown >
    , group_id : Optional Text
    , plan_id : Optional Text
    , subscriber_id : Optional Text
    , subscriber_relationship : Optional Text
    }
}
JSON Schema:
additionalProperties: false
properties:
  plan:
    additionalProperties: false
    properties:
      benefit:
        oneOf:
          - type: string
          - items:
              type: string
            type: array
      exchange:
        enum:
          - Group
          - IndFederal
          - IndState
          - Medicaid
          - Medicare
          - None
          - ThirdParty
          - Unknown
        type: string
      group_id:
        type: string
      plan_id:
        type: string
      subscriber_id:
        type: string
      subscriber_relationship:
        type: string
    required:
      - exchange
    type: object
required:
  - plan
type: object
Claim
The facts of the claim domain.
Dhall:
{ claim :
    { id : Text
    , index : Optional Integer
    , procedure : Optional Text
    , type : Optional Text
    }
, cost :
    Optional
      { allowed : Optional < Double | Text >
      , category : Optional Text
      , charge : Optional < Double | Text >
      , cost : Optional < Double | Text >
      , description : Optional Text
      , transaction : Optional Text
      }
, location : Optional < Inpatient | Outpatient >
, provider :
    Optional
      { provider_id : Text
      , provider_type : Optional Text
      , taxonomy : Optional Text
      }
}
JSON Schema:
additionalProperties: false
properties:
  claim:
    additionalProperties: false
    properties:
      id:
        type: string
      index:
        oneOf:
          - type: integer
          - type: "null"
      procedure:
        oneOf:
          - type: string
          - type: "null"
      type:
        oneOf:
          - type: string
          - type: "null"
    required:
      - id
    type: object
  cost:
    additionalProperties: false
    properties:
      allowed:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      category:
        type: string
      charge:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      cost:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      description:
        type: string
      transaction:
        type: string
    type: object
  location:
    additionalProperties: false
    properties:
      location:
        enum:
          - Inpatient
          - Outpatient
        type: string
    type: object
  provider:
    additionalProperties: false
    properties:
      provider_id:
        type: string
      provider_type:
        oneOf:
          - type: string
          - type: "null"
      taxonomy:
        oneOf:
          - type: string
          - type: "null"
    required:
      - provider_id
    type: object
required:
  - claim
type: object
Medication
The facts of the medication domain.
Dhall:
{ claim :
    Optional
      { id : Text
      , index : Optional Integer
      , procedure : Optional Text
      , type : Optional Text
      }
, code :
    { code : Text
    , codebook :
        < CDT | CPT | HCPCS | ICD10 | ICD10PC | ICD9 | ICD9PC | LOINC | NABSP | NDC | NDC9 | UB92 | US_STATE | medicaid_cat >
    }
, cost :
    Optional
      { allowed : Optional < Double | Text >
      , category : Optional Text
      , charge : Optional < Double | Text >
      , cost : Optional < Double | Text >
      , description : Optional Text
      , transaction : Optional Text
      }
, fill :
    Optional
      { days_supply : Optional Integer
      , quantity : Optional Double
      , strength : Optional Text
      }
, location : Optional < Inpatient | Outpatient >
, provider :
    Optional
      { provider_id : Text
      , provider_type : Optional Text
      , taxonomy : Optional Text
      }
}
JSON Schema:
additionalProperties: false
properties:
  claim:
    additionalProperties: false
    properties:
      id:
        type: string
      index:
        oneOf:
          - type: integer
          - type: "null"
      procedure:
        oneOf:
          - type: string
          - type: "null"
      type:
        oneOf:
          - type: string
          - type: "null"
    required:
      - id
    type: object
  code:
    additionalProperties: false
    properties:
      code:
        pattern: "^[A-Za-z0-9]+$"
        type: string
      codebook:
        enum:
          - CDT
          - CPT
          - HCPCS
          - ICD10
          - ICD10PC
          - ICD9
          - ICD9PC
          - LOINC
          - NABSP
          - NDC
          - NDC9
          - UB92
          - US_STATE
          - medicaid_cat
        type: string
    required:
      - code
      - codebook
    type: object
  cost:
    additionalProperties: false
    properties:
      allowed:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      category:
        type: string
      charge:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      cost:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      description:
        type: string
      transaction:
        type: string
    type: object
  fill:
    additionalProperties: false
    properties:
      days_supply:
        oneOf:
          - type: integer
          - type: "null"
      quantity:
        oneOf:
          - type: number
          - type: "null"
      strength:
        oneOf:
          - type: string
          - type: "null"
    type: object
  location:
    additionalProperties: false
    properties:
      location:
        enum:
          - Inpatient
          - Outpatient
        type: string
    type: object
  provider:
    additionalProperties: false
    properties:
      provider_id:
        type: string
      provider_type:
        oneOf:
          - type: string
          - type: "null"
      taxonomy:
        oneOf:
          - type: string
          - type: "null"
    required:
      - provider_id
    type: object
required:
  - code
  - location
type: object
Demographics
The facts of the demographics domain.
Dhall:
{ demo :
    { field :
        < BirthDate
        | BirthYear
        | County
        | CountyFIPS
        | Ethnicity
        | Gender
        | GeoAdiNatRank
        | GeoAdiStateRank
        | GeoPctAmIndian
        | GeoPctAsian
        | GeoPctBlack
        | GeoPctHispanic
        | GeoPctMutli
        | GeoPctOther
        | GeoPctWhite
        | GeoType
        | Race
        | RaceCodes
        | Region
        | State
        | UrbanRural
        | Zipcode
        >
    , info : < List : List (Optional Text) | Singleton : Optional Text >
    }
}
JSON Schema:
additionalProperties: false
properties:
  demo:
    additionalProperties: false
    properties:
      field:
        enum:
          - BirthDate
          - BirthYear
          - County
          - CountyFIPS
          - Ethnicity
          - Gender
          - GeoAdiNatRank
          - GeoAdiStateRank
          - GeoPctAmIndian
          - GeoPctAsian
          - GeoPctBlack
          - GeoPctHispanic
          - GeoPctMutli
          - GeoPctOther
          - GeoPctWhite
          - GeoType
          - Race
          - RaceCodes
          - Region
          - State
          - UrbanRural
          - Zipcode
        type: string
      info:
        oneOf:
          - oneOf:
              - type: "null"
              - type: string
          - items:
              oneOf:
                - type: "null"
                - type: string
            type: array
    required:
      - field
      - info
    type: object
required:
  - demo
type: object
Procedure
The facts of the procedure domain.
Dhall:
{ claim :
    Optional
      { id : Text
      , index : Optional Integer
      , procedure : Optional Text
      , type : Optional Text
      }
, code :
    { code : Text
    , codebook :
        < CDT | CPT | HCPCS | ICD10 | ICD10PC | ICD9 | ICD9PC | LOINC | NABSP | NDC | NDC9 | UB92 | US_STATE | medicaid_cat >
    }
, cost :
    Optional
      { allowed : Optional < Double | Text >
      , category : Optional Text
      , charge : Optional < Double | Text >
      , cost : Optional < Double | Text >
      , description : Optional Text
      , transaction : Optional Text
      }
, location : Optional < Inpatient | Outpatient >
, provider :
    Optional
      { provider_id : Text
      , provider_type : Optional Text
      , taxonomy : Optional Text
      }
}
JSON Schema:
additionalProperties: false
properties:
  claim:
    additionalProperties: false
    properties:
      id:
        type: string
      index:
        oneOf:
          - type: integer
          - type: "null"
      procedure:
        oneOf:
          - type: string
          - type: "null"
      type:
        oneOf:
          - type: string
          - type: "null"
    required:
      - id
    type: object
  code:
    additionalProperties: false
    properties:
      code:
        pattern: "^[A-Za-z0-9]+$"
        type: string
      codebook:
        enum:
          - CDT
          - CPT
          - HCPCS
          - ICD10
          - ICD10PC
          - ICD9
          - ICD9PC
          - LOINC
          - NABSP
          - NDC
          - NDC9
          - UB92
          - US_STATE
          - medicaid_cat
        type: string
    required:
      - code
      - codebook
    type: object
  cost:
    additionalProperties: false
    properties:
      allowed:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      category:
        type: string
      charge:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      cost:
        oneOf:
          - type: string
          - type: number
          - type: "null"
      description:
        type: string
      transaction:
        type: string
    type: object
  location:
    additionalProperties: false
    properties:
      location:
        enum:
          - Inpatient
          - Outpatient
        type: string
    type: object
  provider:
    additionalProperties: false
    properties:
      provider_id:
        type: string
      provider_type:
        oneOf:
          - type: string
          - type: "null"
      taxonomy:
        oneOf:
          - type: string
          - type: "null"
    required:
      - provider_id
    type: object
required:
  - code
  - location
type: object
Building and publishing the JSON schema
The schema folder contains all the schema fragments used to build the full event schema. The build process uses the package's build_edm script to bundle the schema fragments into a single, self-contained JSON schema.
The generated schemas are identified with the current version (coming from package.json) and are assumed to be published under the following path:
https://docs.novisci.com/schema/event-data-model/[version]/[schema].json
The following command copies the built schemas under dist to the S3 bucket containing the files for docs.novisci.com. The command assumes the current user has permission to write to the bucket and to invalidate the CloudFront cache.
npm run publish