The cohort_file is a data.frame with one record per id. The purpose of
this file is to provide the nsSank package with overall id-level
information. Required variables include a record id and an index
variable. Optional variables include baseline variables that can be used
to filter the Sankeys, censor dates, and absorbing state dates.
“Absorbing states” are states that ids enter and cannot transition out
of (death being a primary example). All ids in the event_file should
have a corresponding id-level record in the cohort_file.
#> # A tibble: 10,000 × 9
#> patient_id index_date censor_date discontinue_date death_date sex prior_mi
#> <int> <date> <date> <date> <date> <fct> <fct>
#> 1 1 2010-06-16 2010-07-23 2010-10-19 NA Female No Prio…
#> 2 2 2010-06-07 2011-08-07 2011-03-04 NA Male No Prio…
#> 3 3 2010-06-02 2011-01-28 2011-08-21 NA Female No Prio…
#> 4 4 2010-06-04 2010-07-02 2010-07-24 NA Female No Prio…
#> 5 5 2010-06-27 2011-07-11 NA NA Male No Prio…
#> 6 6 2010-06-10 2011-06-30 2011-03-08 NA Female No Prio…
#> 7 7 2010-06-06 2011-04-13 NA NA Female No Prio…
#> 8 8 2010-06-15 2011-10-23 NA 2011-01-07 Female Prior MI
#> 9 9 2010-06-29 2011-04-07 2012-03-26 NA Female No Prio…
#> 10 10 2010-06-25 2011-06-01 NA NA Male No Prio…
#> # … with 9,990 more rows, and 2 more variables: age <int>, age_cat <fct>
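The requirements above can be checked before the cohort is handed to the package. The following is a minimal sketch in base R, not part of nsSank: the small cohort data frame is hypothetical and only reuses column names from the printed example.

```r
# Hypothetical miniature cohort; column names mirror the printed example.
cohort <- data.frame(
  patient_id  = 1:4,
  index_date  = as.Date(c("2010-06-16", "2010-06-07", "2010-06-02", "2010-06-15")),
  censor_date = as.Date(c("2010-07-23", "2011-08-07", "2011-01-28", "2011-10-23")),
  death_date  = as.Date(c(NA, NA, NA, "2011-01-07"))
)

# Required variables: exactly one record per id, and no missing id or index.
stopifnot(
  anyDuplicated(cohort$patient_id) == 0,
  !anyNA(cohort$patient_id),
  !anyNA(cohort$index_date)
)

# Absorbing-state dates are optional, so NA simply means "never entered".
has_death <- !is.na(cohort$death_date)
cohort$patient_id[has_death]
```

Failing early on duplicated ids is a design choice of this sketch; how nsSank itself reacts to a malformed cohort is not described here.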
The event_file is a long data.frame with one record per id, state, and
start/stop days. The purpose of this file is to provide event data for
each id, so that nsSank can determine what state(s) each id is in at
each time point.
<id_var>: a record identifier (typically a patient identifier). Not all
identifiers in the cohort_file need to be present in the event_file. If
an identifier in the cohort_file has no records in the event_file, that
patient is considered to be in the empty state until the end of
follow-up or censoring.
start: the start date or day of the event/state.
end: the end date or day of the event/state.
state: the event of interest corresponding to the start/end times.
Typically a treatment (e.g., PCSK9i, Ezetimibe), but can take on other
forms.
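As an illustration of the layout described above, here is a minimal sketch in base R. The data are hypothetical, and the checks are a suggestion rather than something nsSank is documented to run.

```r
# Hypothetical cohort ids; id 3 has no events, which is allowed.
cohort_ids <- c(1, 2, 3)

# Hypothetical events in the long id / start / end / state layout.
events <- data.frame(
  patient_id = c(1, 1, 2),
  start = as.Date(c("2010-01-01", "2010-02-01", "2010-01-01")),
  end   = as.Date(c("2010-01-31", "2010-03-31", "2010-01-31")),
  state = c("PCSK9i", "Ezetimibe", "PCSK9i")
)

# Every id in the events should have a cohort record.
unknown_ids <- setdiff(events$patient_id, cohort_ids)

# Cohort ids without events stay in the empty state.
no_event_ids <- setdiff(cohort_ids, events$patient_id)

# Assumption of this sketch: an event should not end before it starts.
stopifnot(length(unknown_ids) == 0, all(events$start <= events$end))
no_event_ids
```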
#> # A tibble: 4 × 5
#> patient_id start_date end_date is_a is_b
#> <dbl> <date> <date> <lgl> <lgl>
#> 1 1 2010-01-01 2010-01-31 TRUE FALSE
#> 2 1 2010-02-01 2010-03-31 TRUE TRUE
#> 3 2 2010-01-01 2010-01-31 FALSE TRUE
#> 4 2 2010-03-01 2010-04-30 FALSE TRUE
convert_tagged_cdf is a helper function to convert tagged CDFs to the
events format needed for nsSank. The tagged CDF that is passed through
should only have tags for the states of interest. A cohort is passed
through in order to use the index date to create relative time for each
of the states. By default, rmv_prefix = "is_", so in this example the
tags is_a and is_b become the states a and b.
events <- nsSank::convert_tagged_cdf(cdf)
events
#> # A tibble: 5 × 4
#> patient_id start end state
#> <dbl> <date> <date> <chr>
#> 1 1 2010-01-01 2010-01-31 a
#> 2 1 2010-02-01 2010-03-31 a
#> 3 1 2010-02-01 2010-03-31 b
#> 4 2 2010-01-01 2010-01-31 b
#> 5 2 2010-03-01 2010-04-30 b
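To make the reshaping concrete, the sketch below reproduces the wide-to-long step in base R for the example tagged CDF. It is an illustration only and not the implementation of convert_tagged_cdf; it ignores the cohort and relative-time handling, and the prefix stripping is an assumption based on the printed output.

```r
# The example tagged CDF, rebuilt by hand.
cdf <- data.frame(
  patient_id = c(1, 1, 2, 2),
  start_date = as.Date(c("2010-01-01", "2010-02-01", "2010-01-01", "2010-03-01")),
  end_date   = as.Date(c("2010-01-31", "2010-03-31", "2010-01-31", "2010-04-30")),
  is_a = c(TRUE, TRUE, FALSE, FALSE),
  is_b = c(FALSE, TRUE, TRUE, TRUE)
)

# Tag columns are identified by the "is_" prefix.
tag_cols <- grep("^is_", names(cdf), value = TRUE)

# One long block per tag: keep rows where the tag is TRUE, name the state
# after the tag with the prefix removed.
long <- do.call(rbind, lapply(tag_cols, function(tag) {
  rows <- cdf[cdf[[tag]], , drop = FALSE]
  data.frame(
    patient_id = rows$patient_id,
    start = rows$start_date,
    end   = rows$end_date,
    state = rep(sub("^is_", "", tag), nrow(rows))
  )
}))

# Order like the printed example: by patient, then start, then state.
long <- long[order(long$patient_id, long$start, long$state), ]
rownames(long) <- NULL
long
```

A row tagged with both is_a and is_b contributes two long records, which is why four CDF rows become five event rows.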
stockpile_events is useful for reducing the size of the data if there
are many overlapping events of the same type. If the same type of event
overlaps or touches (gap = 1L), the events are combined and the start
and end days are updated accordingly. gap identifies the allowable gap
of time between events of the same type, but is not applied universally
(if a universal allowable discontinuation time is wanted, it can be
added to end or end_date prior to stockpiling).
With the default gap applied, the example events reduce to:
nsSank::stockpile_events(events)
#> # A tibble: 4 × 4
#> # Groups: patient_id, state [3]
#> patient_id state start end
#> <dbl> <chr> <date> <date>
#> 1 1 a 2010-01-01 2010-03-31
#> 2 1 b 2010-02-01 2010-03-31
#> 3 2 b 2010-01-01 2010-01-31
#> 4 2 b 2010-03-01 2010-04-30
The gap can also be changed, for example to gap = 30L:
nsSank::stockpile_events(events, gap = 30L)
#> # A tibble: 3 × 4
#> # Groups: patient_id, state [3]
#> patient_id state start end
#> <dbl> <chr> <date> <date>
#> 1 1 a 2010-01-01 2010-03-31
#> 2 1 b 2010-02-01 2010-03-31
#> 3 2 b 2010-01-01 2010-04-30
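The merging rule can be sketched in base R as follows. This is not the nsSank implementation: merge_gaps is a hypothetical helper, and it assumes that two events of the same state are combined when the next start is at most gap days after the latest end seen so far, which is consistent with both outputs above.

```r
events <- data.frame(
  patient_id = c(1, 1, 1, 2, 2),
  start = as.Date(c("2010-01-01", "2010-02-01", "2010-02-01", "2010-01-01", "2010-03-01")),
  end   = as.Date(c("2010-01-31", "2010-03-31", "2010-03-31", "2010-01-31", "2010-04-30")),
  state = c("a", "a", "b", "b", "b")
)

# Hypothetical helper: merge same-state events per patient within `gap` days.
merge_gaps <- function(ev, gap = 1L) {
  ev <- ev[order(ev$patient_id, ev$state, ev$start), ]
  groups <- split(ev, list(ev$patient_id, ev$state), drop = TRUE)
  merged <- lapply(groups, function(g) {
    s <- as.numeric(g$start)
    e <- as.numeric(g$end)
    latest_end <- cummax(e)  # latest end among this row and all earlier rows
    # A new spell starts when the gap to the previous latest end exceeds `gap`.
    new_spell <- c(TRUE, s[-1] - latest_end[-length(e)] > gap)
    spell <- cumsum(new_spell)
    data.frame(
      patient_id = g$patient_id[1],
      state = g$state[1],
      start = as.Date(unname(vapply(split(s, spell), min, numeric(1))), origin = "1970-01-01"),
      end   = as.Date(unname(vapply(split(e, spell), max, numeric(1))), origin = "1970-01-01")
    )
  })
  out <- do.call(rbind, merged)
  out <- out[order(out$patient_id, out$state, out$start), ]
  rownames(out) <- NULL
  out
}

merged_1  <- merge_gaps(events, gap = 1L)   # 4 rows: patient 2 keeps two b spells
merged_30 <- merge_gaps(events, gap = 30L)  # 3 rows: the 29-day gap is bridged
```

Patient 2's two b events are 29 days apart (2010-01-31 to 2010-03-01), so they stay separate under gap = 1L and merge under gap = 30L.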
The sunburst requires ordered events, regardless of time. The ansible
function converts data in an event-time format (e.g., the CDF, where
each row uniquely identifies a single event with start and stop times)
to an interval format, where, within a patient, time is separated into
mutually exclusive time intervals that capture all events that happened
in that interval.
ansible requires event data converted from the CDF. By default, ansible
applies stockpile_events to the data so that it can function properly.
If the data are already stockpiled, set stockpile = FALSE to skip this
step.
Below, we convert the example data from above to the ansible format. By
default, gap = 1L is applied.
data <- nsSank::ansible(events)
data
#> # A tibble: 4 × 4
#> patient_id start end state
#> <dbl> <date> <date> <list>
#> 1 1 2010-01-01 2010-01-31 <chr [1]>
#> 2 1 2010-02-01 2010-03-31 <chr [2]>
#> 3 2 2010-01-01 2010-01-31 <chr [1]>
#> 4 2 2010-03-01 2010-04-30 <chr [1]>
Time is split into the mutually exclusive intervals in which events
happen. From 2010-02-01 to 2010-03-31, the a and b states overlapped for
patient 1.
Converting data to this format allows us to (1) filter once on a single time point to get all events associated with that time, and (2) uniquely order the pattern of events.
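The interval format and the single-time filter can be sketched in base R as follows. This is an illustration under stated assumptions, not the ansible implementation: split_intervals is a hypothetical helper, it stores the states of each interval as one "+"-separated string rather than the list column shown above, and it does not stockpile first.

```r
events <- data.frame(
  patient_id = c(1, 1, 1, 2, 2),
  start = as.Date(c("2010-01-01", "2010-02-01", "2010-02-01", "2010-01-01", "2010-03-01")),
  end   = as.Date(c("2010-01-31", "2010-03-31", "2010-03-31", "2010-01-31", "2010-04-30")),
  state = c("a", "a", "b", "b", "b")
)

# Hypothetical helper: cut each patient's time line at every start and at
# every day after an end, then record which states cover each piece.
split_intervals <- function(ev) {
  per_patient <- lapply(split(ev, ev$patient_id), function(p) {
    s <- as.numeric(p$start)
    e <- as.numeric(p$end)
    cuts <- sort(unique(c(s, e + 1)))
    lo <- cuts[-length(cuts)]
    hi <- cuts[-1] - 1
    pattern <- vapply(seq_along(lo), function(i) {
      covering <- p$state[s <= lo[i] & e >= hi[i]]
      paste(sort(unique(as.character(covering))), collapse = "+")
    }, character(1))
    keep <- pattern != ""  # drop pieces in which no event is active
    data.frame(
      patient_id = rep(p$patient_id[1], sum(keep)),
      start = as.Date(lo[keep], origin = "1970-01-01"),
      end   = as.Date(hi[keep], origin = "1970-01-01"),
      pattern = pattern[keep]
    )
  })
  out <- do.call(rbind, per_patient)
  rownames(out) <- NULL
  out
}

intervals <- split_intervals(events)

# (1) One filter on a single date returns every state active on that date.
at <- as.Date("2010-02-15")
active <- intervals[intervals$start <= at & intervals$end >= at, ]
active
```

Because the states inside each interval are sorted before being pasted together, the same combination always yields the same pattern string, which is what makes ordering patterns unambiguous.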