To facilitate partition-level data processing, the updated sankey
workflow allows the subject-level calculations to be done in a separate
step from the sankey creation. The only difference from the old formats
of the `cohort` and `events` files is that relative time no longer needs
to be calculated: as long as dates are supplied, the package can work it
out.
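
To illustrate the calculation that no longer has to be done by hand, here is a minimal base R sketch of deriving relative time from dates. The toy data and the column names (`start_date`, `end_date`, `rel_start`, `rel_end`) are assumptions for illustration; this is not the package's internal code.

```r
# Toy cohort and events; all names and values here are illustrative.
cohort <- data.frame(
  patient_id = c(1, 2),
  index_date = as.Date(c("2020-01-01", "2020-03-15"))
)
events <- data.frame(
  patient_id = c(1, 1, 2),
  state      = c("a", "b", "a"),
  start_date = as.Date(c("2020-01-01", "2020-02-10", "2020-04-14")),
  end_date   = as.Date(c("2020-01-31", "2020-03-10", "2020-05-14")),
  stringsAsFactors = FALSE
)

# Attach each subject's index date to their events, then express the
# event window in days relative to index.
rel <- merge(events, cohort, by = "patient_id")
rel$rel_start <- as.integer(rel$start_date - rel$index_date)
rel$rel_end   <- as.integer(rel$end_date - rel$index_date)
rel[, c("patient_id", "state", "rel_start", "rel_end")]
```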

First, the id-level data are created using `sankey_id_events` or
`sankey_id_ansible`. The only difference is the second argument:
`sankey_id_events` takes a `data.frame` in long format with a start date
and an end date associated with each event, whereas `sankey_id_ansible`
requires that the second argument be a `data.frame` in ansible format
(the same format required by sunbursts).

If you aren’t making a sunburst, you might as well use
`sankey_id_events` so you don’t have to deal with the overhead of
converting all the data, which can take a while.

The idea is that, while the cohorts are being created, the sankey subject-level data can be built on smaller chunks of the data, leaving only the necessary summary for the end.
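
The chunk-then-summarize pattern described above can be sketched in base R as follows. `id_level_step` is a hypothetical stand-in for the id-level function and only counts events per subject; how partition results from the package itself should be recombined is not covered here.

```r
# Hypothetical stand-in for the id-level step; it just counts events
# per subject so that the example is self-contained.
id_level_step <- function(cohort_chunk, events) {
  ev <- events[events$patient_id %in% cohort_chunk$patient_id, , drop = FALSE]
  counts <- table(factor(ev$patient_id, levels = cohort_chunk$patient_id))
  data.frame(patient_id = cohort_chunk$patient_id,
             n_events   = as.integer(counts))
}

cohort <- data.frame(patient_id = 1:6)
events <- data.frame(patient_id = c(1, 1, 2, 4, 4, 4, 6))

# Assign subjects to partitions of at most `chunk_size` rows and
# process each partition independently.
chunk_size <- 2
partition  <- ceiling(seq_len(nrow(cohort)) / chunk_size)
chunks     <- split(cohort, partition)
id_chunks  <- lapply(chunks, id_level_step, events = events)

# One summarizing step at the end, over the recombined id-level data.
id_data <- do.call(rbind, id_chunks)
sum(id_data$n_events)
```

Because each partition only needs its own subjects' events, the per-chunk step could also be run in parallel or on separate workers before the final summary.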

```
sankey_id_events(cohort,
                 evts,
                 stages,
                 id_var = "patient_id",
                 index_var = "index_date",
                 states = NULL,
                 select_event = "combine")

sankey_id_ansible(cohort,
                  data,
                  stages,
                  id_var = "patient_id",
                  index_var = "index_date",
                  states = NULL,
                  select_event = "combine")
```

This makes it possible to run the id-level step on partitions first,
before the summarizing step in `sankey_list_maker`. `sunburst_id_data`
requires a cohort, an `ansible`-style `data.frame`, and the “states”
that are wanted. It returns a list containing the resulting `data.frame`
and the states, so it can be fed directly into `sunburst_maker`.

Below is an example of the format of the event data, where each row has a start date and an end date with a corresponding state:

```
events <- nsSank::convert_tagged_cdf(cdf)
events %>%
  head() %>%
  rmarkdown::paged_table()
```

Below is an example of the format of the cohort data. It contains an
id-level variable, an index date variable, a censor date variable,
`death_date`, and any potential stratification variables.

```
cohort %>%
  head() %>%
  rmarkdown::paged_table()
```

Both `sankey_id_events` and `sankey_id_ansible` return a list that
contains the id-level data, the identified states, and the identified
stages (timepoints). In the id-level data frame, states and combinations
of states are represented as base-2 integers.

`sankey_id_data <- nsSank::sankey_id_events(cohort, events, states = c("a", "b", "c"), stages = c(0, 90, 180))`
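
One way to read "base-2 integers" is as a bit mask in which each state owns one bit and a combination is the sum of its states' bits. The sketch below assumes that reading; the bit assignment (`a` = 1, `b` = 2, `c` = 4) and the helper names `encode_states` and `decode_states` are hypothetical and may not match what the package actually stores.

```r
states <- c("a", "b", "c")

# Assumed bit per state: a = 1, b = 2, c = 4.
bits <- setNames(as.integer(2^(seq_along(states) - 1)), states)

# Encode a set of states as a single integer.
encode_states <- function(x) sum(bits[x])

# Decode an integer back into the states whose bit is set.
decode_states <- function(code) {
  is_set <- bitwAnd(rep(as.integer(code), length(bits)), bits) > 0
  names(bits)[is_set]
}

encode_states(c("a", "c"))  # 5
decode_states(5L)           # "a" "c"
decode_states(0L)           # character(0), i.e. no state
```

Under this scheme every combination of states maps to a unique integer, which makes overlapping states cheap to store and compare.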

You can indicate whether you want the function to choose the first state, choose the last state, or combine overlapping events with the `select_event` argument. The default is to combine. In the call below, there are no overlapping events in the result, since the most recent prior event is chosen instead of combining.

`choose_last <- nsSank::sankey_id_events(cohort, events, states = c("a", "b", "c"), stages = c(0, 90, 180), select_event = "last")`
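
To make the three options concrete, here is a toy base R sketch. It assumes one plausible reading of the behavior: among the events whose window covers a stage day, "first" keeps the earliest-starting event, "last" keeps the latest-starting one, and "combine" keeps all of them. The helper `state_at` and the data are hypothetical, and the package's exact rules may differ.

```r
# Toy events for one subject, in days relative to index (illustrative).
ev <- data.frame(
  state = c("a", "b", "c"),
  start = c(0, 60, 200),
  end   = c(100, 120, 260),
  stringsAsFactors = FALSE
)

# Hypothetical helper: resolve the state at a given stage day.
state_at <- function(ev, day, select_event = c("combine", "first", "last")) {
  select_event <- match.arg(select_event)
  active <- ev[ev$start <= day & day <= ev$end, , drop = FALSE]
  if (nrow(active) == 0) return(character(0))
  switch(select_event,
    combine = paste(sort(active$state), collapse = " & "),
    first   = active$state[which.min(active$start)],
    last    = active$state[which.max(active$start)]
  )
}

state_at(ev, 90, "combine")  # "a & b"
state_at(ev, 90, "first")    # "a"
state_at(ev, 90, "last")     # "b"
```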

Once you have the id-level data, you can summarize and create your sankey list for a PHR. This returns a list of length one, with options needed for the sankey.

`sankey_list <- nsSank::sankey_list_maker(sankey_id_data, cohort)`

You can use the `collapse_states` argument to specify how you want
overlapping states presented (or for other purposes, such as changing
the order or the labels of the states). The `none` argument additionally
labels the empty state.

```
collapse_list <- list(
  "Treatment A" = list("a"),
  "Treatment B" = list("b"),
  "Treatment C" = list("c"),
  "Multiple" = list("a & b", "a & c", "b & c", "a & b & c")
)
sankey_list_collapse <- nsSank::sankey_list_maker(sankey_id_data,
                                                  cohort,
                                                  collapse_states = collapse_list,
                                                  none = "No Treatment")
sankey_list_collapse[[1]]$states
#> [1] "Treatment A" "Treatment B" "Treatment C" "Multiple" "No Treatment"
```
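
Conceptually, a list like `collapse_list` maps each raw state combination to the label shown on the sankey. The base R sketch below illustrates that mapping only; the `lookup` vector and the use of `NA` for the empty state are assumptions made for the example, not a description of how `sankey_list_maker` works internally.

```r
collapse_list <- list(
  "Treatment A" = list("a"),
  "Treatment B" = list("b"),
  "Treatment C" = list("c"),
  "Multiple"    = list("a & b", "a & c", "b & c", "a & b & c")
)

# Invert the list into a raw-label -> display-label lookup vector.
lookup <- setNames(
  rep(names(collapse_list), lengths(collapse_list)),
  unlist(collapse_list, use.names = FALSE)
)

# Raw states for four subjects; NA stands in for "no state" here.
raw   <- c("a", "a & c", "b", NA)
shown <- unname(lookup[raw])
shown[is.na(raw)] <- "No Treatment"  # mirrors the `none` label
shown
```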

If using version `0.2.4` or later, you can specify `collapse_states` as
a named vector: this will assign a hierarchy to the combined states and
order them accordingly:

```
collapse_vec <- c(
  "Treatment B" = "b",
  "Treatment A" = "a",
  "Treatment C" = "c"
)
sankey_list_collapse <- nsSank::sankey_list_maker(sankey_id_data,
                                                  cohort,
                                                  collapse_states = collapse_vec,
                                                  none = "No Treatment")
sankey_list_collapse[[1]]$states
#> [1] "Treatment B" "Treatment A" "Treatment C" "No Treatment"
```

You can also use the `stage_labels` argument to specify how you want the
time points labelled on the sankey.

```
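sankey_list_collapse_stages <- nsSank::sankey_list_maker(sankey_id_data,
                                                         cohort,
                                                         collapse_states = collapse_list,
                                                         stage_labels = c("Index", "90 Days", "180 Days"))
# Note: the line below inspects `sankey_list_collapse`, which was built
# without `stage_labels`, so the default labels are printed. Inspect
# `sankey_list_collapse_stages` to see the custom labels.
sankey_list_collapse[[1]]$stages
#> [1] "0" "90" "180"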
```

If you want to add filtering to the PHR Sankeys, you can use the
`gofl_formula` argument to specify which variables you want to filter on
and how.

For example, in the `cohort` we have two categorical variables we can
use: `prior_mi` and `age_cat`.

```
sankey_list_filtered <- nsSank::sankey_list_maker(sankey_id_data,
                                                  cohort,
                                                  collapse_states = collapse_list,
                                                  stage_labels = c("Index", "90 Days", "180 Days"),
                                                  gofl_formula = ~ prior_mi*age_cat)
```

That results in a list of lists. The first list, `var_info`, contains
descriptive information for the levels of the grouping variables.

The second list, `dt`, contains a list of sankey data for each level.

`prior_mi` has 2 levels and `age_cat` has 3 levels. Each variable can
also be left unfiltered (shown as `NA` below), so we should expect
(2 + 1) × (3 + 1) = 12 different filter combinations.

```
purrr::map(sankey_list_filtered$dt, ~.x[1:2])
#> [[1]]
#> [[1]]$prior_mi
#> [1] NA
#>
#> [[1]]$age_cat
#> [1] NA
#>
#>
#> [[2]]
#> [[2]]$prior_mi
#> [1] "No Prior MI"
#>
#> [[2]]$age_cat
#> [1] NA
#>
#>
#> [[3]]
#> [[3]]$prior_mi
#> [1] "Prior MI"
#>
#> [[3]]$age_cat
#> [1] NA
#>
#>
#> [[4]]
#> [[4]]$prior_mi
#> [1] NA
#>
#> [[4]]$age_cat
#> [1] "(0,40]"
#>
#>
#> [[5]]
#> [[5]]$prior_mi
#> [1] NA
#>
#> [[5]]$age_cat
#> [1] "(40,75]"
#>
#>
#> [[6]]
#> [[6]]$prior_mi
#> [1] NA
#>
#> [[6]]$age_cat
#> [1] "(75,90]"
#>
#>
#> [[7]]
#> [[7]]$prior_mi
#> [1] "No Prior MI"
#>
#> [[7]]$age_cat
#> [1] "(0,40]"
#>
#>
#> [[8]]
#> [[8]]$prior_mi
#> [1] "No Prior MI"
#>
#> [[8]]$age_cat
#> [1] "(40,75]"
#>
#>
#> [[9]]
#> [[9]]$prior_mi
#> [1] "No Prior MI"
#>
#> [[9]]$age_cat
#> [1] "(75,90]"
#>
#>
#> [[10]]
#> [[10]]$prior_mi
#> [1] "Prior MI"
#>
#> [[10]]$age_cat
#> [1] "(0,40]"
#>
#>
#> [[11]]
#> [[11]]$prior_mi
#> [1] "Prior MI"
#>
#> [[11]]$age_cat
#> [1] "(40,75]"
#>
#>
#> [[12]]
#> [[12]]$prior_mi
#> [1] "Prior MI"
#>
#> [[12]]$age_cat
#> [1] "(75,90]"
```
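
The count of 12 can be reproduced in base R by crossing each variable's levels with an extra "not filtered" option. The level names are taken from the output above; using `NA` for "not filtered" mirrors that output, and the row order from `expand.grid` differs from the order of the list above.

```r
# Levels as they appear in the output above; NA means "not filtered".
prior_mi <- c(NA, "No Prior MI", "Prior MI")
age_cat  <- c(NA, "(0,40]", "(40,75]", "(75,90]")

combos <- expand.grid(prior_mi = prior_mi, age_cat = age_cat,
                      stringsAsFactors = FALSE)
nrow(combos)  # 12
```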