The cohort_file is a data.frame with
one record per id. The purpose of this file is to provide the
nsSank package with overall id-level information. Required
variables include a record id and an index variable. Optional variables
include baseline variables that can be used to filter the Sankeys,
censor dates, and absorbing state dates. “Absorbing states” are states
that ids enter and cannot transition out of (death being a
primary example). All ids in the event file
should have a corresponding id-level record in the cohort
file.
#> # A tibble: 10,000 × 9
#> patient_id index_date censor_date discontinue_date death_date sex prior_mi
#> <int> <date> <date> <date> <date> <fct> <fct>
#> 1 1 2010-06-25 2012-03-08 2010-07-18 NA Male No Prio…
#> 2 2 2010-06-07 2011-08-22 NA 2010-09-25 Female No Prio…
#> 3 3 2010-06-13 2013-02-19 2011-12-15 NA Female Prior MI
#> 4 4 2010-06-08 2011-06-18 2010-08-01 NA Male No Prio…
#> 5 5 2010-06-18 2011-12-26 2013-02-26 NA Female No Prio…
#> 6 6 2010-06-19 2010-11-02 2010-12-29 NA Male No Prio…
#> 7 7 2010-06-30 2010-08-30 NA NA Female No Prio…
#> 8 8 2010-06-13 2012-02-27 2011-09-19 NA Male Prior MI
#> 9 9 2010-06-20 2011-06-10 NA 2010-07-26 Male No Prio…
#> 10 10 2010-06-10 2011-01-29 2011-01-16 NA Male No Prio…
#> # … with 9,990 more rows, and 2 more variables: age <int>, age_cat <fct>
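The two structural requirements described above (one record per id, and every event id present in the cohort) can be checked before any data are handed to the package. The following is a minimal base-R sketch; the miniature `cohort`, the `event_ids` vector, and the check names are hypothetical and exist only for illustration.

```r
# Hypothetical miniature cohort; column names follow the printed example.
cohort <- data.frame(
  patient_id  = 1:3,
  index_date  = as.Date(c("2010-06-25", "2010-06-07", "2010-06-13")),
  censor_date = as.Date(c("2012-03-08", "2011-08-22", "2013-02-19")),
  death_date  = as.Date(c(NA, "2010-09-25", NA)),
  sex         = factor(c("Male", "Female", "Female"))
)

# Hypothetical ids as they might appear in an event file.
event_ids <- c(1, 1, 3)

# One record per id: no identifier may be duplicated in the cohort.
one_per_id <- anyDuplicated(cohort$patient_id) == 0

# Every id in the events must have a cohort record
# (the reverse is not required).
events_covered <- all(event_ids %in% cohort$patient_id)
```

Both flags should be `TRUE` before proceeding; patient 2 has no events here, which the text above allows.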
The event_file is a long data.frame with
one record per id, state, and start/stop days. The purpose of this file
is to provide event data for each id, so that
nsSank can determine which state(s) each id is in at each
time point.
<id_var>: some sort of (typically patient)
record identifier. Not all identifiers in the cohort_file
need to be present in the event_file. If an identifier in
the cohort file has no records in the event_file, that
patient will be considered to be in the empty state until the end of follow-up
or censoring.
start: the start date or day of the
event/state.
end: the end date or day of the
event/state.
state: the event of interest corresponding to the
start/end times. Typically a treatment (e.g., PCSK9i,
Ezetimibe), but it can take other forms.
#> # A tibble: 4 × 5
#> patient_id start_date end_date is_a is_b
#> <dbl> <date> <date> <lgl> <lgl>
#> 1 1 2010-01-01 2010-01-31 TRUE FALSE
#> 2 1 2010-02-01 2010-03-31 TRUE TRUE
#> 3 2 2010-01-01 2010-01-31 FALSE TRUE
#> 4 2 2010-03-01 2010-04-30 FALSE TRUE
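For experimentation, a table with the same shape as the one printed above can be built by hand. This sketch uses a plain `data.frame` rather than a tibble; the object name `cdf` is chosen to match the conversion call that appears later in this document, and whether the package accepts a plain `data.frame` is an assumption.

```r
# Wide "tagged" layout: one row per patient and interval,
# with one logical column per state of interest.
cdf <- data.frame(
  patient_id = c(1, 1, 2, 2),
  start_date = as.Date(c("2010-01-01", "2010-02-01", "2010-01-01", "2010-03-01")),
  end_date   = as.Date(c("2010-01-31", "2010-03-31", "2010-01-31", "2010-04-30")),
  is_a       = c(TRUE, TRUE, FALSE, FALSE),
  is_b       = c(FALSE, TRUE, TRUE, TRUE)
)
```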
convert_tagged_cdf is a helper function to convert
tagged CDFs to the events format needed for nsSank. The
tagged CDF that is passed through should only have tags for the states
of interest. A cohort is passed through in order to use the index date
to create relative time for each of the states.
By default, rmv_prefix = "is_", so the tag columns is_a and is_b become the states a and b.
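To make the reshaping concrete, the wide-to-long step can be approximated in base R. This is only a sketch of the idea, not the package's implementation: `tag_cols` and `long` are hypothetical names, and the index-date (relative time) handling mentioned above is left out.

```r
# Example tagged table, rebuilt by hand from the printed output.
cdf <- data.frame(
  patient_id = c(1, 1, 2, 2),
  start_date = as.Date(c("2010-01-01", "2010-02-01", "2010-01-01", "2010-03-01")),
  end_date   = as.Date(c("2010-01-31", "2010-03-31", "2010-01-31", "2010-04-30")),
  is_a       = c(TRUE, TRUE, FALSE, FALSE),
  is_b       = c(FALSE, TRUE, TRUE, TRUE)
)

rmv_prefix <- "is_"
# Columns whose names start with the prefix are treated as state tags.
tag_cols <- grep(paste0("^", rmv_prefix), names(cdf), value = TRUE)

# Emit one long row for every (interval, tag) pair where the tag is TRUE.
long <- do.call(rbind, lapply(tag_cols, function(tag) {
  keep <- cdf[[tag]]
  data.frame(
    patient_id = cdf$patient_id[keep],
    start      = cdf$start_date[keep],
    end        = cdf$end_date[keep],
    state      = rep(sub(paste0("^", rmv_prefix), "", tag), sum(keep)),
    stringsAsFactors = FALSE
  )
}))

# Sort so the rows line up with the printed events table.
long <- long[order(long$patient_id, long$start, long$state), ]
rownames(long) <- NULL
```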
events <- nsSank::convert_tagged_cdf(cdf)
events
#> # A tibble: 5 × 4
#> patient_id start end state
#> <dbl> <date> <date> <chr>
#> 1 1 2010-01-01 2010-01-31 a
#> 2 1 2010-02-01 2010-03-31 a
#> 3 1 2010-02-01 2010-03-31 b
#> 4 2 2010-01-01 2010-01-31 b
#> 5 2 2010-03-01 2010-04-30 b
stockpile_events is useful for reducing the size of the
data when there are many overlapping events of the same type. If events of the same
type overlap or touch (gap = 1L), they are
combined and the start and end days are updated accordingly.
gap sets the allowable gap of time between events of the
same type, but it is not applied universally (if a universal
allowable discontinuation time is wanted, it can be added to
end or end_date prior to stockpiling).
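The merging rule can be sketched in base R as follows. This is an illustration under a stated assumption, not the package's code: two events of the same type for the same patient are joined when the next start is at most `gap` days after the latest end seen so far. The names `merge_group` and `stockpile_sketch` are hypothetical.

```r
events <- data.frame(
  patient_id = c(1, 1, 1, 2, 2),
  start = as.Date(c("2010-01-01", "2010-02-01", "2010-02-01", "2010-01-01", "2010-03-01")),
  end   = as.Date(c("2010-01-31", "2010-03-31", "2010-03-31", "2010-01-31", "2010-04-30")),
  state = c("a", "a", "b", "b", "b"),
  stringsAsFactors = FALSE
)

# Merge the events of one patient/state group.
merge_group <- function(g, gap = 1L) {
  g <- g[order(g$start), ]
  s <- as.numeric(g$start)
  e <- as.numeric(g$end)
  # Latest end date seen before each row (copes with nested events).
  prev_end <- c(-Inf, cummax(e)[-nrow(g)])
  # A new spell begins when the distance to the previous spell exceeds gap.
  spell <- cumsum(s - prev_end > gap)
  data.frame(
    patient_id = g$patient_id[1],
    state      = g$state[1],
    start      = as.Date(as.numeric(tapply(s, spell, min)), origin = "1970-01-01"),
    end        = as.Date(as.numeric(tapply(e, spell, max)), origin = "1970-01-01"),
    stringsAsFactors = FALSE
  )
}

stockpile_sketch <- function(events, gap = 1L) {
  groups <- split(events, list(events$patient_id, events$state), drop = TRUE)
  out <- do.call(rbind, lapply(groups, merge_group, gap = gap))
  out <- out[order(out$patient_id, out$state, out$start), ]
  rownames(out) <- NULL
  out
}

default_gap <- stockpile_sketch(events)             # touching events joined
wide_gap    <- stockpile_sketch(events, gap = 30L)  # 29-day break also bridged
```

With these example events, the two `a` events for patient 1 touch (one day apart) and are joined under the default, while the two `b` events for patient 2 are 29 days apart and are only joined once `gap` is at least 29.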
With the default gap applied, the example events reduce to:
nsSank::stockpile_events(events)
#> # A tibble: 4 × 4
#> # Groups: patient_id, state [3]
#> patient_id state start end
#> <dbl> <chr> <date> <date>
#> 1 1 a 2010-01-01 2010-03-31
#> 2 1 b 2010-02-01 2010-03-31
#> 3 2 b 2010-01-01 2010-01-31
#> 4 2 b 2010-03-01 2010-04-30
The gap can also be changed, for example to gap = 30L:
nsSank::stockpile_events(events, gap = 30L)
#> # A tibble: 3 × 4
#> # Groups: patient_id, state [3]
#> patient_id state start end
#> <dbl> <chr> <date> <date>
#> 1 1 a 2010-01-01 2010-03-31
#> 2 1 b 2010-02-01 2010-03-31
#> 3 2 b 2010-01-01 2010-04-30
The sunburst requires ordered events, regardless of time. The
ansible function converts data in an event-time format
(e.g., the CDF, where each row uniquely identifies a single event with
start and stop times) to an interval format, where, within a patient,
time is separated into mutually exclusive time intervals that capture
all events that happened in that interval.
ansible requires event data converted from the CDF. By
default, ansible applies stockpile_events to
the data, which it needs in order to function properly. If the data are already
stockpiled, set stockpile = FALSE to skip this
step.
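The interval format can be illustrated with a small base-R sketch. It is not the package's implementation: `split_patient` is a hypothetical helper that cuts one patient's timeline at every start date and at the day after every end date, then records which states cover each resulting piece.

```r
events <- data.frame(
  patient_id = c(1, 1, 1, 2, 2),
  start = as.Date(c("2010-01-01", "2010-02-01", "2010-02-01", "2010-01-01", "2010-03-01")),
  end   = as.Date(c("2010-01-31", "2010-03-31", "2010-03-31", "2010-01-31", "2010-04-30")),
  state = c("a", "a", "b", "b", "b"),
  stringsAsFactors = FALSE
)

split_patient <- function(ev) {
  # Cut points: every start, and the day after every end.
  cuts <- sort(unique(c(ev$start, ev$end + 1)))
  lo <- cuts[-length(cuts)]
  hi <- cuts[-1] - 1
  # States whose event fully covers each elementary interval.
  states <- lapply(seq_along(lo), function(i) {
    sort(unique(ev$state[ev$start <= lo[i] & ev$end >= hi[i]]))
  })
  # Drop stretches of time in which no state is active.
  keep <- lengths(states) > 0
  out <- data.frame(patient_id = ev$patient_id[1], start = lo[keep], end = hi[keep])
  out$state <- states[keep]  # list column: one character vector per interval
  out
}

# One interval table per patient, kept as a named list for simplicity.
by_patient <- lapply(split(events, events$patient_id), split_patient)
```

For patient 2, the stretch between the two `b` events has no active state and is dropped, so only the two `b` intervals remain.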
Below, we convert the example data from above to the
ansible format. By default, gap = 1L is
applied.
data <- nsSank::ansible(events)
data
#> # A tibble: 4 × 4
#> patient_id start end state
#> <dbl> <date> <date> <list>
#> 1 1 2010-01-01 2010-01-31 <chr [1]>
#> 2 1 2010-02-01 2010-03-31 <chr [2]>
#> 3 2 2010-01-01 2010-01-31 <chr [1]>
#> 4 2 2010-03-01 2010-04-30 <chr [1]>
Time is split into the mutually exclusive intervals in which events happen. For example, from
2010-02-01 to 2010-03-31, the A and
B states overlapped for patient 1.
Converting data to this format allows us to (1) filter once on a single time point to get all events associated with that time, and (2) uniquely order the pattern of events.
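As a rough illustration of point (1), the interval-format table can be filtered at a single date to recover every state active at that moment. The table below is rebuilt by hand, with the contents of the list column inferred from the example events; `at`, `active`, and `pattern` are hypothetical names.

```r
intervals <- data.frame(
  patient_id = c(1, 1, 2, 2),
  start = as.Date(c("2010-01-01", "2010-02-01", "2010-01-01", "2010-03-01")),
  end   = as.Date(c("2010-01-31", "2010-03-31", "2010-01-31", "2010-04-30"))
)
# List column: the states active during each interval.
intervals$state <- list("a", c("a", "b"), "b", "b")

# A single filter on one date returns at most one row per patient,
# because the intervals within a patient do not overlap.
at <- as.Date("2010-02-15")
active <- intervals[intervals$start <= at & intervals$end >= at, ]

# Collapse each row's states into one label describing the pattern.
pattern <- vapply(active$state, paste, character(1), collapse = "+")
```

On 2010-02-15 only patient 1 is in any state, and the single matching row carries both `a` and `b`.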