sankey_transition_counts generates a tibble with 4 columns: from, to, transition, and count.
The from column indicates the state a patient was in at the start of a transition period,
to indicates the state they were in at the end of that period (which may be the same state),
transition labels the period (e.g., "0 to 45"), and count gives the number of patients
with that specific from-to combination in that period.
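
The shape of this output can be illustrated with a small base R sketch. It is not the package's implementation. The toy data frame `states`, its column names, and the stage days are assumptions made up for the example. It only shows how a from/to/transition/count table follows from each patient's state at consecutive stages.

```r
# Toy data (hypothetical): one row per patient, one column per stage,
# where each stage is a number of days from the index date.
states <- data.frame(
  patient_id = 1:5,
  day_0  = c("A", "A", "B", "None", "A"),
  day_45 = c("A", "B", "B", "A", "A"),
  day_90 = c("B", "B", "B", "A", "None"),
  stringsAsFactors = FALSE
)

stage_days <- c(0, 45, 90)
stage_cols <- paste0("day_", stage_days)

# For each pair of consecutive stages, count patients per from-to combination.
pieces <- lapply(seq_len(length(stage_cols) - 1), function(i) {
  pairs <- data.frame(
    from  = states[[stage_cols[i]]],
    to    = states[[stage_cols[i + 1]]],
    count = 1L,
    stringsAsFactors = FALSE
  )
  counts <- aggregate(count ~ from + to, data = pairs, FUN = sum)
  # Label the period in the same style as the documented example ("0 to 45").
  counts$transition <- paste(stage_days[i], "to", stage_days[i + 1])
  counts[, c("from", "to", "transition", "count")]
})

transition_counts <- do.call(rbind, pieces)
print(transition_counts)
```

In this toy example every patient contributes one row's worth of count to each period, so the counts within a period sum to the number of patients.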
Usage
sankey_transition_counts(
events,
cohort,
id_var = "patient_id",
index_var = "index_date",
stages = NULL,
stage_labels = NULL,
weight = FALSE,
censor_vars = NULL,
absorbing_vars = NULL,
none = "None",
gofl_formula = NULL,
collapse_states = NULL,
collapse_levels = NULL,
select_event = "combine",
force = FALSE,
from_tv_meds = FALSE,
med_names_list = NULL,
tx_name = "tx_name",
tx_start = "tx_start",
tx_stop = "tx_stop",
on_med_tx_end = FALSE,
med_levels = NULL
)
Arguments
- events
A long-format tibble with patient id, state (i.e., treatment/medication), and start/stop columns. Its purpose is to provide event data for each id so that nsSank can determine what state(s) each id is in at each time point.
- cohort
A wide-format tibble with one-record-per-id that has at minimum 2 columns: a patient id column and a patient index date column. See "Date Overview" vignette for more details.
- id_var
A string specifying the column name in events holding patient IDs (default value: "patient_id").
- index_var
A string specifying the column name in cohort holding patients' index dates (default value: "index_date").
- stages
A numeric vector specifying time points (days from index date) at which to evaluate medication status. If NULL (default value), time points are automatically determined from treatment start/stop dates in the data. Example: c(0, 45, 90) checks medication status at baseline, day 45, and day 90.
- stage_labels
A character vector with one label per stage. These labels are used to relabel each numeric stage in the stages argument (e.g., stage_labels = c("Baseline", "Day 45", "Day 90")).
- weight
Logical. When TRUE, applies inverse probability of censoring weights (IPCW) to account for patients censored before the final stage. Rather than raw observed counts, all outputs reflect IPCW-adjusted estimates that better represent the full cohort. Requires censor_vars to be specified. Default value is FALSE.
- censor_vars
A named character vector. Variable names in cohort that indicate censoring dates. Names are used as state labels (e.g., censor_vars = c("Censored" = "censor_date")). If unnamed, all censoring states are grouped under "Censored".
- absorbing_vars
A named character vector. Variables in cohort that indicate the date of absorbing states. Names are used as state labels (e.g., absorbing_vars = c("Death" = "death_date")).
- none
A string. Label for the "empty" (no event) state. Default is "None".
- gofl_formula
A formula. Stratifies the Sankey by grouping variables using gofl (e.g., gofl_formula = ~ sex * age_cat). Default is NULL.
- collapse_states
A named list or named vector. Controls how patients who are on multiple treatments simultaneously are represented. As a named list (e.g., collapse_states = list("Both Treatments" = c("A", "B"))), you explicitly label co-occurring treatment combinations with a custom name. As a named vector (e.g., c("A" = "a", "B" = "b")), you assign a priority order: patients on multiple treatments are assigned whichever state appears first in the vector. Default value is NULL.
- collapse_levels
A named list. Use this when a treatment in your events data appears at multiple intensity levels (e.g., low, moderate, high dose) and you want to treat all levels as a single state. For example, collapse_levels = list(statin = c("low_intensity_statin", "moderate_intensity_statin", "high_intensity_statin")) combines all statin intensities into one statin state. Default value is NULL.
- select_event
A string. How overlapping events are selected at each time point. "combine" (default) returns all events; "first" takes the earliest; "last" takes the most recent.
- force
Logical. Force transition calculations even if they exceed size guidelines. Default is FALSE.
- from_tv_meds
Logical. Indicates whether events is already in time-varying format (output of create_time_varying_data). If FALSE (default), raw events data is expected.
- med_names_list
A character vector specifying which medications from the tx_name column to include in the analysis. Any medication not in this vector will be ignored.
- tx_name
A string specifying the column name in events holding treatment names (default value: "tx_name").
- tx_start
A string specifying the column name in events with treatment start dates (default value: "tx_start").
- tx_stop
A string specifying the column name in events with treatment stop dates (default value: "tx_stop").
- on_med_tx_end
A logical value indicating whether a patient is considered on treatment on their tx_stop date. If TRUE, tx_stop is the last day ON treatment (inclusive); if FALSE, tx_stop is the first day OFF treatment (exclusive). Default value is FALSE.
- med_levels
A named list of character vectors specifying levels for leveled medications or treatments (e.g., med_levels = list(statin = c("low_intensity_statin", "moderate_intensity_statin", "high_intensity_statin"))). Any medication name listed here (e.g., statin) must be present in med_names_list. Medications not listed here but present in med_names_list are treated as binary (on/off).
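
To make the stages, on_med_tx_end, and priority-style collapse_states descriptions concrete, here is a minimal base R sketch. It assumes inputs shaped like the documented defaults (patient_id, index_date, tx_name, tx_start, tx_stop). The helper `state_at()` and the toy data are hypothetical and are not part of the package. The sketch only mirrors the documented semantics and may differ from how the package resolves states internally.

```r
# Hypothetical cohort: one record per patient with an index date.
cohort <- data.frame(
  patient_id = c(1, 2),
  index_date = as.Date(c("2020-01-01", "2020-01-01"))
)

# Hypothetical long-format events: one row per treatment episode.
events <- data.frame(
  patient_id = c(1, 1, 2),
  tx_name    = c("A", "B", "A"),
  tx_start   = as.Date(c("2020-01-01", "2020-02-01", "2020-01-10")),
  tx_stop    = as.Date(c("2020-02-15", "2020-04-30", "2020-03-31")),
  stringsAsFactors = FALSE
)

# Illustrative helper (an assumption, not a package function): the state of
# each patient `day` days after their index date.
state_at <- function(day, on_med_tx_end = FALSE,
                     priority = c("A", "B"), none = "None") {
  ev <- merge(events, cohort, by = "patient_id")
  when <- ev$index_date + day
  # TRUE: tx_stop is the last day on treatment (inclusive).
  # FALSE: tx_stop is the first day off treatment (exclusive).
  active <- if (on_med_tx_end) {
    ev$tx_start <= when & when <= ev$tx_stop
  } else {
    ev$tx_start <= when & when < ev$tx_stop
  }
  vapply(cohort$patient_id, function(id) {
    on <- ev$tx_name[active & ev$patient_id == id]
    # Priority rule: the first treatment in `priority` that is active wins.
    hit <- priority[priority %in% on]
    if (length(hit) > 0) hit[1] else none
  }, character(1))
}

# Day 45 falls on 2020-02-15, which is patient 1's tx_stop for treatment A.
state_at(0)                          # "A" "None"
state_at(45)                         # "B" "A"
state_at(45, on_med_tx_end = TRUE)   # "A" "A"
state_at(90)                         # "B" "None"

# With the package loaded, the corresponding call would presumably be:
# sankey_transition_counts(events, cohort, stages = c(0, 45, 90))
```

The day 45 case shows why on_med_tx_end matters: when a stage lands exactly on a stop date, the inclusive and exclusive readings put the patient in different states.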