To facilitate partition-level data processing, the updated sankey workflow lets the subject-level calculations run in a separate step from the sankey creation. The only difference from the old data formats of the cohort and events files is that no relative time needs to be calculated: as long as dates are supplied, the package can figure it out.
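To make "relative time" concrete, here is a toy sketch of the day offsets you no longer have to compute yourself. It uses small hypothetical data frames (cohort_toy, events_toy) with the column names used elsewhere in this vignette (patient_id, index_date) plus assumed start_date/end_date columns. It is not how nsSank computes them internally.

```r
# Toy illustration with hypothetical data, not nsSank internals:
# relative time = days between an event date and the patient's index date.
cohort_toy <- data.frame(
  patient_id = c(1, 2),
  index_date = as.Date(c("2020-01-01", "2020-03-15"))
)
events_toy <- data.frame(
  patient_id = c(1, 1, 2),
  state      = c("a", "b", "a"),
  start_date = as.Date(c("2020-01-10", "2020-04-01", "2020-03-15")),
  end_date   = as.Date(c("2020-02-10", "2020-06-01", "2020-05-01"))
)

# Attach each patient's index date to their events
evts <- merge(events_toy, cohort_toy, by = "patient_id")

# Date subtraction gives a difftime in days; as.numeric() drops the units
evts$start_day <- as.numeric(evts$start_date - evts$index_date)
evts$end_day   <- as.numeric(evts$end_date - evts$index_date)

evts[, c("patient_id", "state", "start_day", "end_day")]
```

Supplying dates directly means these offsets are derived for you, so they can no longer drift out of sync with the index date.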
First, the id-level data are created using sankey_id_events or sankey_id_ansible. The only difference is the second argument: sankey_id_events takes a data.frame in long format with a start date and end date associated with each event, while sankey_id_ansible requires the second argument to be a data.frame in ansible format (the same format required by sunbursts).
If you aren’t making a sunburst, you might as well use sankey_id_events so you don’t have to deal with the overhead of converting all the data, which can take a while.
The idea is that the subject-level sankey data can be created on smaller chunks of the data while the cohorts are being built, with the necessary summary done once at the end.
sankey_id_events(cohort,
evts,
stages,
id_var = "patient_id",
index_var = "index_date",
states = NULL,
select_event = "combine")
sankey_id_ansible(cohort,
data,
stages,
id_var = "patient_id",
index_var = "index_date",
states = NULL,
select_event = "combine")
This is to help facilitate running on partitions at the id level first, before the summarizing step in sankey_list_maker.
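The partition idea works because each patient's id-level result depends only on that patient's own events. The toy sketch below illustrates this with a hypothetical stand-in, toy_id_level(), which is not an nsSank function. It computes the active state at each stage per chunk of patients, stacks the chunks, and only then summarizes. The sketch does not show how nsSank's own list outputs are combined across chunks.

```r
# Hypothetical stand-in for an id-level step (not part of nsSank):
# for each patient and stage (days from index), label the active state(s).
toy_id_level <- function(evts, stages) {
  do.call(rbind, lapply(stages, function(s) {
    ids <- unique(evts$patient_id)
    active <- vapply(ids, function(id) {
      e <- evts[evts$patient_id == id & evts$start_day <= s & evts$end_day >= s, ]
      if (nrow(e) == 0) "none" else paste(sort(unique(e$state)), collapse = " & ")
    }, character(1))
    data.frame(patient_id = ids, stage = s, state = active)
  }))
}

# Toy events with day offsets already relative to each index date
evts <- data.frame(
  patient_id = c(1, 1, 2, 3, 4),
  state      = c("a", "b", "a", "c", "b"),
  start_day  = c(0, 60, 10, 100, 0),
  end_day    = c(120, 200, 80, 190, 30)
)
stages <- c(0, 90, 180)

# Whole data at once
full <- toy_id_level(evts, stages)

# Same work split into two partitions of patients, then stacked
chunks <- split(evts, evts$patient_id %% 2)
partitioned <- do.call(rbind, lapply(chunks, toy_id_level, stages = stages))

# The summary step only happens once, at the end
tab_full <- table(stage = full$stage, state = full$state)
tab_part <- table(stage = partitioned$stage, state = partitioned$state)
identical(tab_full, tab_part)
```

Because the summary is a simple count over stacked id-level rows, the order in which partitions finish does not matter.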
sunburst_id_data requires a cohort, an ansible-style data.frame, and which “states” are wanted. It returns a list of the resulting data.frame and the states, so it can be fed directly into sunburst_maker.
Below is an example of the format of the event data, where each row has a start/end date and a corresponding state:
library(magrittr)  # provides the %>% pipe
events <- nsSank::convert_tagged_cdf(cdf)
events %>%
head() %>%
rmarkdown::paged_table()
Below is an example of the format of the cohort data: an id-level variable, an index date variable, a censor date variable, death_date, and any potential stratification variables.
cohort %>%
head() %>%
rmarkdown::paged_table()
Both sankey_id_events and sankey_id_ansible return a list that contains the id-level data, the identified states, and the identified stages (timepoints). In the id-level data frame, states and combinations of states are represented as base-2 integers.
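Base-2 encoding lets a single integer stand for any combination of states: each state gets its own bit, and a combination is the sum of its bits. The sketch below assumes state i (in the order given to states) gets bit 2^(i - 1). That bit assignment, along with the helper names encode_states() and decode_states(), is a hypothetical illustration rather than nsSank's documented internals.

```r
# Assumed encoding: state i (1-based, in the order of `states`) gets bit 2^(i - 1)
states <- c("a", "b", "c")

encode_states <- function(active, states) {
  # Sum the bit values of the active states; no active state gives 0
  sum(2^(match(active, states) - 1))
}

decode_states <- function(code, states) {
  # bitwAnd() checks, for every state, whether its bit is set in `code`
  bits <- 2^(seq_along(states) - 1)
  active <- states[bitwAnd(as.integer(code), as.integer(bits)) > 0]
  if (length(active) == 0) "none" else paste(active, collapse = " & ")
}

encode_states(c("a", "c"), states)  # 1 + 4 = 5
decode_states(5, states)            # "a & c"
decode_states(0, states)            # "none" (hypothetical empty-state label)
```

With three states there are 2^3 = 8 possible codes (0 to 7), one for each subset of states.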
sankey_id_data <- nsSank::sankey_id_events(cohort, events, states = c("a", "b", "c"), stages = c(0, 90, 180))
You can use the select_event argument to indicate whether the function should choose the first state, choose the last state, or combine overlapping events. The default is "combine". With select_event = "last", there aren’t any overlapping events in the result, because the most recent prior event is chosen instead of combining them.
choose_last <- nsSank::sankey_id_events(cohort, events, states = c("a", "b", "c"), stages = c(0, 90, 180), select_event = "last")
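To illustrate the three options on overlapping events, here is a toy sketch for a single patient at a single stage. The helper pick_state() is hypothetical. It also assumes that "first" and "last" are decided by event start date, which is an assumption based on the "most recent prior event" wording above, not a confirmed description of sankey_id_events.

```r
# Hypothetical: three events for one patient that all overlap day 90
overlap <- data.frame(
  state     = c("b", "a", "c"),
  start_day = c(30, 10, 75),
  end_day   = c(120, 100, 95)
)

# Assumed semantics: order by start date, then combine / take first / take last
pick_state <- function(overlap, select_event = c("combine", "first", "last")) {
  select_event <- match.arg(select_event)
  ord <- overlap[order(overlap$start_day), ]  # earliest start first
  switch(select_event,
    combine = paste(sort(unique(ord$state)), collapse = " & "),
    first   = ord$state[1],
    last    = ord$state[nrow(ord)]
  )
}

pick_state(overlap)           # "a & b & c"
pick_state(overlap, "first")  # "a"
pick_state(overlap, "last")   # "c"
```

Combining keeps every overlapping state but multiplies the number of possible combinations. Choosing one event keeps the sankey to one state per patient per stage.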
Once you have the id-level data, you can summarize it and create your sankey list for a PHR. This returns a list of length one containing the options needed for the sankey.
sankey_list <- nsSank::sankey_list_maker(sankey_id_data, cohort)
You can use the collapse_states argument to specify how you want overlapping states presented, or to change the order or labels of the states. The none argument additionally labels the empty state.
collapse_list <- list(
"Treatment A" = list("a"),
"Treatment B" = list("b"),
"Treatment C" = list("c"),
"Multiple" = list("a & b", "a & c", "b & c", "a & b & c")
)
sankey_list_collapse <- nsSank::sankey_list_maker(sankey_id_data, cohort, collapse_states = collapse_list, none = "No Treatment")
sankey_list_collapse[[1]]$states
#> [1] "Treatment A" "Treatment B" "Treatment C" "Multiple" "No Treatment"
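Conceptually, a collapse list is a lookup from raw state labels to display labels. Its names also give the display order, with the none label at the end, which matches the printed states above. The sketch below flattens the same collapse_list into a named character vector and applies it. The helper relabel() and the raw label "none" for the empty state are hypothetical, and this is not necessarily how sankey_list_maker does it internally.

```r
collapse_list <- list(
  "Treatment A" = list("a"),
  "Treatment B" = list("b"),
  "Treatment C" = list("c"),
  "Multiple"    = list("a & b", "a & c", "b & c", "a & b & c")
)

# Flatten into a named lookup: raw state label -> display label
lookup <- unlist(lapply(names(collapse_list), function(label) {
  raw <- unlist(collapse_list[[label]])
  setNames(rep(label, length(raw)), raw)
}))

# Hypothetical helper: map raw labels, giving the empty state the `none` label
relabel <- function(raw_states, lookup, none = "No Treatment") {
  out <- unname(lookup[raw_states])
  out[raw_states == "none"] <- none
  out
}

relabel(c("a", "b & c", "none"), lookup)
# "Treatment A" "Multiple" "No Treatment"

# Display order: the collapse_list names, then the none label
c(names(collapse_list), "No Treatment")
```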
If you are using version 0.2.4 or later, you can specify collapse_states as a named vector. This assigns a hierarchy to the combined states and orders them accordingly:
collapse_vec <- c(
"Treatment B" = "b",
"Treatment A" = "a",
"Treatment C" = "c"
)
sankey_list_collapse <- nsSank::sankey_list_maker(sankey_id_data, cohort, collapse_states = collapse_vec, none = "No Treatment")
sankey_list_collapse[[1]]$states
#> [1] "Treatment B" "Treatment A" "Treatment C" "No Treatment"
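The output above has no "Multiple" state, which suggests that each combination is assigned to its highest-ranked member in the named vector. The sketch below shows that rule. The helper collapse_by_hierarchy() and the raw label "none" are hypothetical, and the "highest-ranked member wins" rule is an assumption inferred from this output, not confirmed package behavior.

```r
collapse_vec <- c(
  "Treatment B" = "b",
  "Treatment A" = "a",
  "Treatment C" = "c"
)

# Assumed rule: a combination such as "a & b" takes the label of the member
# that appears earliest in collapse_vec (its position is its priority)
collapse_by_hierarchy <- function(raw_state, collapse_vec, none = "No Treatment") {
  members <- strsplit(raw_state, " & ", fixed = TRUE)[[1]]
  hit <- which(collapse_vec %in% members)
  if (length(hit) == 0) none else names(collapse_vec)[hit[1]]
}

collapsed <- vapply(c("a", "a & b", "a & c", "none"), collapse_by_hierarchy,
                    character(1), collapse_vec = collapse_vec, USE.NAMES = FALSE)
collapsed
# "Treatment A" "Treatment B" "Treatment A" "No Treatment"
```

Under this rule the vector's order does two jobs: it sets the display order and decides which treatment "wins" when states overlap.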
You can also use the stage_labels argument to specify how you want the time points labelled on the sankey.
sankey_list_collapse_stages <- nsSank::sankey_list_maker(sankey_id_data,
cohort,
collapse_states = collapse_list,
stage_labels = c("Index", "90 Days", "180 Days"))
sankey_list_collapse[[1]]$stages         # without stage_labels
#> [1] "0" "90" "180"
sankey_list_collapse_stages[[1]]$stages  # with stage_labels
If you want to add filtering to the PHR sankeys, you can use the gofl_formula argument to specify which variables to filter on and how. For example, the cohort has two categorical variables we can use: prior_mi and age_cat.
sankey_list_filtered <- nsSank::sankey_list_maker(sankey_id_data,
cohort,
collapse_states = collapse_list,
stage_labels = c("Index", "90 Days", "180 Days"),
gofl_formula = ~ prior_mi*age_cat)
That results in a list of lists. The first list, var_info, contains descriptive information for the levels of the grouping variables. The second list, dt, contains the sankey data for each filter combination. prior_mi has 2 levels and age_cat has 3 levels. Each variable can also be left unfiltered (shown as NA below), so we should expect (2 + 1) × (3 + 1) = 12 different filter combinations.
purrr::map(sankey_list_filtered$dt, ~.x[1:2])
#> [[1]]
#> [[1]]$prior_mi
#> [1] NA
#>
#> [[1]]$age_cat
#> [1] NA
#>
#>
#> [[2]]
#> [[2]]$prior_mi
#> [1] "No Prior MI"
#>
#> [[2]]$age_cat
#> [1] NA
#>
#>
#> [[3]]
#> [[3]]$prior_mi
#> [1] "Prior MI"
#>
#> [[3]]$age_cat
#> [1] NA
#>
#>
#> [[4]]
#> [[4]]$prior_mi
#> [1] NA
#>
#> [[4]]$age_cat
#> [1] "(0,40]"
#>
#>
#> [[5]]
#> [[5]]$prior_mi
#> [1] NA
#>
#> [[5]]$age_cat
#> [1] "(40,75]"
#>
#>
#> [[6]]
#> [[6]]$prior_mi
#> [1] NA
#>
#> [[6]]$age_cat
#> [1] "(75,90]"
#>
#>
#> [[7]]
#> [[7]]$prior_mi
#> [1] "No Prior MI"
#>
#> [[7]]$age_cat
#> [1] "(0,40]"
#>
#>
#> [[8]]
#> [[8]]$prior_mi
#> [1] "No Prior MI"
#>
#> [[8]]$age_cat
#> [1] "(40,75]"
#>
#>
#> [[9]]
#> [[9]]$prior_mi
#> [1] "No Prior MI"
#>
#> [[9]]$age_cat
#> [1] "(75,90]"
#>
#>
#> [[10]]
#> [[10]]$prior_mi
#> [1] "Prior MI"
#>
#> [[10]]$age_cat
#> [1] "(0,40]"
#>
#>
#> [[11]]
#> [[11]]$prior_mi
#> [1] "Prior MI"
#>
#> [[11]]$age_cat
#> [1] "(40,75]"
#>
#>
#> [[12]]
#> [[12]]$prior_mi
#> [1] "Prior MI"
#>
#> [[12]]$age_cat
#> [1] "(75,90]"
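The count of 12 can be checked by enumerating the options for each variable, with NA standing for "no filter on this variable". This is only a counting sketch using base R's expand.grid(). Its row order differs from the order of the list printed above, and it does not claim to be how sankey_list_maker builds the combinations.

```r
# Options per variable; NA means "do not filter on this variable"
prior_mi_opts <- c(NA, "No Prior MI", "Prior MI")
age_cat_opts  <- c(NA, "(0,40]", "(40,75]", "(75,90]")

combos <- expand.grid(prior_mi = prior_mi_opts,
                      age_cat  = age_cat_opts,
                      stringsAsFactors = FALSE)

nrow(combos)                                                # (2 + 1) * (3 + 1) = 12
sum(is.na(combos$prior_mi) & is.na(combos$age_cat))         # 1 unfiltered view
sum(!is.na(combos$prior_mi) & !is.na(combos$age_cat))       # 2 * 3 = 6 fully filtered
```

The remaining 5 rows filter on exactly one variable (2 for prior_mi and 3 for age_cat), which accounts for the rest of the 12 entries in dt.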