plot_sankey uses events and cohort data to generate a Sankey diagram in R, allowing users to
explore treatment pathways without formatting data for the Target RWE platform.
Usage
plot_sankey(
  events,
  cohort,
  id_var = "patient_id",
  index_var = "index_date",
  stages = NULL,
  stage_labels = NULL,
  weight = FALSE,
  censor_vars = NULL,
  absorbing_vars = NULL,
  none = "None",
  gofl_formula = NULL,
  collapse_states = NULL,
  collapse_levels = NULL,
  select_event = "combine",
  force = FALSE,
  from_tv_meds = FALSE,
  med_names_list = NULL,
  tx_name = "tx_name",
  tx_start = "tx_start",
  tx_stop = "tx_stop",
  on_med_tx_end = FALSE,
  med_levels = NULL
)
Arguments
- events
A long-format tibble with patient ID, state (i.e., treatment/medication), and start/stop columns. It supplies event data for each ID to the Sankey package, so that nsSank can determine which state(s) each ID is in at each time point.
- cohort
A wide-format tibble with one record per ID and at least two columns: a patient ID column and a patient index date column. See the "Date Overview" vignette for more details.
- id_var
A string specifying the column name in events holding patient IDs (default value: "patient_id").
- index_var
A string specifying the column name in cohort holding patients' index dates (default value: "index_date").
- stages
A numeric vector specifying time points (days from index date) at which to evaluate medication status. If NULL (default value), time points are automatically determined from treatment start/stop dates in the data. Example: c(0, 45, 90) checks medication status at baseline, day 45, and day 90.
- stage_labels
A character vector with one label per stage. These labels replace the numeric values given in the stages argument (e.g., stage_labels = c("Baseline", "Day 45", "Day 90")).
- weight
Logical. When TRUE, applies inverse probability of censoring weights (IPCW) to account for patients censored before the final stage. All outputs then reflect IPCW-adjusted estimates, which better represent the full cohort, rather than raw observed counts. Requires censor_vars to be specified. Default value is FALSE.
- censor_vars
A named character vector. Variable names in cohort that indicate censoring dates. Names are used as state labels (e.g., censor_vars = c("Censored" = "censor_date")). If unnamed, all censoring states are grouped under "Censored".
- absorbing_vars
A named character vector. Variables in cohort that indicate the date of absorbing states. Names are used as state labels (e.g., absorbing_vars = c("Death" = "death_date")).
- none
A string. Label for the "empty" (no event) state. Default is "None".
- gofl_formula
A formula. Stratifies the Sankey diagram by grouping variables using gofl (e.g., gofl_formula = ~ sex * age_cat). Default is NULL.
- collapse_states
A named list or named vector. Controls how patients who are on multiple treatments simultaneously are represented. As a named list (e.g., collapse_states = list("Both Treatments" = c("A", "B"))), it explicitly labels co-occurring treatment combinations with a custom name. As a named vector (e.g., c("A" = "a", "B" = "b")), it assigns a priority order: patients on multiple treatments are assigned whichever state appears first in the vector. Default value is NULL.
- collapse_levels
A named list. Use this when a treatment in your events data appears at multiple intensity levels (e.g., low, moderate, high dose) and you want to treat all levels as a single state. For example, collapse_levels = list(statin = c("low_intensity_statin", "moderate_intensity_statin", "high_intensity_statin")) combines all statin intensities into one statin state. Default value is NULL.
- select_event
A string. How overlapping events are selected at each time point. "combine" (default) returns all events; "first" takes the earliest; "last" takes the most recent.
- force
Logical. Forces transition calculations even if they exceed size guidelines. Default is FALSE.
- from_tv_meds
Logical. Indicates whether events is already in time-varying format (the output of create_time_varying_data). If FALSE (default), raw events data is expected.
- med_names_list
A character vector specifying which medications from the tx_name column to include in the analysis. Any medication not in this vector will be ignored.
- tx_name
A string specifying the column name in events holding treatment names (default value: "tx_name").
- tx_start
A string specifying the column name in events with treatment start dates (default value: "tx_start").
- tx_stop
A string specifying the column name in events with treatment stop dates (default value: "tx_stop").
- on_med_tx_end
A logical value indicating whether a patient is considered on treatment on their tx_stop date. If TRUE, tx_stop is the last day ON treatment (inclusive); if FALSE, tx_stop is the first day OFF treatment (exclusive). (Default value: FALSE)
- med_levels
A named list of character vectors specifying levels for leveled medications or treatments (e.g., med_levels = list(statin = c("low_intensity_statin", "moderate_intensity_statin", "high_intensity_statin"))). Any medication name listed here (e.g., statin) must be present in med_names_list. Medications not listed here but present in med_names_list are treated as binary (on/off).
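
To make the expected input shapes concrete, the sketch below builds a small events table and a matching cohort table using the default column names from the Usage block, then calls plot_sankey only if the function is available in the session. The toy patients, dates, and the censor_date and death_date columns are invented for illustration, and the combination of arguments in the call is an assumption about typical use, not a documented recipe.

```r
# Hypothetical toy inputs; column names follow the documented defaults.
events <- data.frame(
  patient_id = c(1, 1, 2, 3),
  tx_name    = c("A", "B", "A", "B"),
  tx_start   = as.Date(c("2020-01-01", "2020-03-01", "2020-01-15", "2020-02-01")),
  tx_stop    = as.Date(c("2020-02-15", "2020-06-01", "2020-05-01", "2020-03-15")),
  stringsAsFactors = FALSE
)

# One record per patient: ID, index date, plus illustrative (assumed)
# censoring and absorbing-state date columns.
cohort <- data.frame(
  patient_id  = c(1, 2, 3),
  index_date  = as.Date(c("2020-01-01", "2020-01-15", "2020-02-01")),
  censor_date = as.Date(c(NA, "2020-04-01", NA)),
  death_date  = as.Date(c(NA, NA, "2020-04-20"))
)

# The arguments are documented as tibbles; convert when tibble is installed.
if (requireNamespace("tibble", quietly = TRUE)) {
  events <- tibble::as_tibble(events)
  cohort <- tibble::as_tibble(cohort)
}

# Guarded call: runs only when plot_sankey() is available in the session.
if (exists("plot_sankey", mode = "function")) {
  plot_sankey(
    events,
    cohort,
    stages         = c(0, 45, 90),
    stage_labels   = c("Baseline", "Day 45", "Day 90"),
    censor_vars    = c("Censored" = "censor_date"),
    absorbing_vars = c("Death" = "death_date"),
    med_names_list = c("A", "B")
  )
}
```

Every ID in events should also appear in cohort, since the index date in cohort anchors the stage time points.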
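
The interaction between stages and on_med_tx_end is easiest to see on a single treatment interval. The base R sketch below is not the package's implementation; on_treatment is a hypothetical helper that only mirrors the documented inclusive/exclusive rule for tx_stop, and it assumes the start date itself counts as on treatment.

```r
# Hypothetical helper (not part of the package): is a treatment interval
# active `day` days after the index date?
on_treatment <- function(index_date, tx_start, tx_stop, day,
                         on_med_tx_end = FALSE) {
  eval_date <- index_date + day
  if (on_med_tx_end) {
    # tx_stop is the last day ON treatment (inclusive)
    tx_start <= eval_date & eval_date <= tx_stop
  } else {
    # tx_stop is the first day OFF treatment (exclusive)
    tx_start <= eval_date & eval_date < tx_stop
  }
}

index_date <- as.Date("2020-01-01")
tx_start   <- as.Date("2020-01-01")
tx_stop    <- as.Date("2020-02-15")  # exactly 45 days after the index date

stages <- c(0, 45, 90)

status_exclusive <- sapply(stages, function(d) {
  on_treatment(index_date, tx_start, tx_stop, d, on_med_tx_end = FALSE)
})
status_inclusive <- sapply(stages, function(d) {
  on_treatment(index_date, tx_start, tx_stop, d, on_med_tx_end = TRUE)
})

status_exclusive  # TRUE FALSE FALSE
status_inclusive  # TRUE  TRUE FALSE
```

The two settings differ only at day 45, the stage that falls exactly on tx_stop.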
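
The effect of collapse_levels can be pictured as a simple relabeling of state names. The sketch below reuses the statin example from the argument description; apply_collapse_levels is a hypothetical helper written for illustration, "ezetimibe" is an invented extra state, and how plot_sankey performs the collapsing internally is not documented here.

```r
collapse_levels <- list(
  statin = c("low_intensity_statin",
             "moderate_intensity_statin",
             "high_intensity_statin")
)

# Hypothetical helper (not part of the package): replace each listed level
# with the name of the list element it belongs to; leave other states as-is.
apply_collapse_levels <- function(states, collapse_levels) {
  # Build a lookup: names are the original levels, values the collapsed state.
  lookup <- unlist(lapply(names(collapse_levels), function(nm) {
    setNames(rep(nm, length(collapse_levels[[nm]])), collapse_levels[[nm]])
  }))
  hit <- states %in% names(lookup)
  states[hit] <- unname(lookup[states[hit]])
  states
}

states    <- c("low_intensity_statin", "ezetimibe", "high_intensity_statin")
collapsed <- apply_collapse_levels(states, collapse_levels)
collapsed  # "statin" "ezetimibe" "statin"
```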