Estimating transition probabilities • nsSank

Motivation

The NoviSci Sankeys present all possible flows members of a population may take through a set of states at multiple stages. In many cases, we use this application to depict patient treatment transitions over time, where the “states” are the set of possible treatments, and “stages” are the set of time points at which we cross-sectionally evaluate the proportion of each state in the patient population.

For example, if there are three time points \(t = \{1, 2, 3\}\) and two treatment states \(a_t\in\{ a_1, a_2\}\), all possible flows can be enumerated as the following:

\(t=1\)	\(t=2\)	\(t=3\)
\(a_1\)	\(a_1\)	\(a_1\)
\(a_1\)	\(a_2\)	\(a_1\)
\(a_1\)	\(a_1\)	\(a_2\)
\(a_1\)	\(a_2\)	\(a_2\)
\(a_2\)	\(a_1\)	\(a_1\)
\(a_2\)	\(a_2\)	\(a_1\)
\(a_2\)	\(a_1\)	\(a_2\)
\(a_2\)	\(a_2\)	\(a_2\)

When all state information is available for all stages for the entire population, the flow sizes (the number of members on a particular treatment transition path) can be calculated directly by counting the number of members in each possible flow. In R, this is can be as simple as group_by(stage, state) %>% tally.

However, oftentimes we are missing at least some member data from censoring due to disenrollment from a health plan, consent withdrawal, or other reasons causing loss of follow-up. Instead of discarding these members from the population used in calculating the flows, we want to use all the information provided up until the point they are censored.

To do this, we make three key assumptions. Let \(A_t\) be the state at stage \(t\), \(t \in \{1, 2, 3, ..., m\}\), with possible values \(a_t\in \{ a_1, a_2,...\}\). First, we make a Markov assumption that given the state at time \(t-1\), \(A_{t-1}\), the state at \(t\), \(A_t\),is conditionally independent of the states at all previous time points. Using this assumption, we can factor the probability of a given treatment flow as

\[\begin{align} P(\bar{A}_m=\bar{a}_m)&=P(A_1=a_1)\prod_{t=2}^{m}P(A_t=a_t|\bar{A}_{t-1}=\bar{a}_{t-1}) \\ \\ & =P(A_1=a_1)\prod_{t=2}^{m}{P(A_t=a_t|A_{t-1}=a_{t-1})}. \end{align}\]

where \(\bar{A}_t\) is the state history through \(t\), \(\{A_1,A_2,…,A_t\}\). This allows us to focus on estimating the probabilities of each transition between states at two successive stages.

Second, we make a similar Markov assumption related to censoring. Specifically, we assume that conditional on \(A_{t-1}\) and being uncensored at \(t-1\), the probability of censoring on or after \(t\) is independent of all past states. This means that \(P(\Delta_t=1|\bar{A}_{t-1}=\bar{a}_{t-1},\Delta_{t-1}=1)=P(\Delta_t=1|A_{t-1}=a_{t-1},\Delta_{t-1}=1)\).

Finally, we assume that, conditional on \(A_t\) and being uncensored at \(t\), censoring on or after \(t+1\) is independent of all states after \(t\). Formally, this assumption can be stated as \(\Delta_{t+1} \perp \!\!\! \perp A_{t+1},A_{t+2},...|A_{t},\Delta_t=1\). To account for members being censored between stages \(t\) and \(t+1\), uncensored members are assigned weights calculated from the inverse probability of being uncensored at stage \(t+1\) given the state at stage \(t\).

Transition probabilities

We estimate the probability of being on treatment \(a\) at the first time point as: \[ \hat{P}(A_1=a_1) = \frac{1}{N}\sum_{i=1}^N\frac{\Delta_{1i}I(A_{1i}=a_1)}{\hat{P}(\Delta_1=1)}\]

where \(\Delta_{1}=I(C>1)\) is an indicator that the member was uncensored at \(t=1\) , \(C\) is the time of censoring, and \(\hat{P}(\Delta_1=1)\) is the probability of being uncensored at the first stage. In most cases it is reasonable to expect that no censoring occurs prior to the first time point and this estimator simplifies to \(\hat{P}(A_1=a_1) = \frac{1}{N}\sum_{i=1}^NI(A_{1i}=a_1)\).

We estimate the remaining conditional probabilities of being in a state given the history of states as follows: \[\begin{align} \hat{P}(A_t=a_t|\bar{A}_{t-1}=\bar{a}_{t-1}) &= \hat{P}(A_t=a_t|A_{t-1}=a_{t-1}) \\ \\ &= \frac{\hat{P}(A_t=a_t,A_{t-1}=a_{t-1})}{\hat{P}(A_{t-1}=a_{t-1})} \\ \\ &= \frac{\frac{1}{N}\sum^N_{i=1}\Delta_{ti}I(A_{ti}=a)I(A_{(t-1)i}=a_{t-1})/\hat{P}(\Delta_t=1|\bar{A}_{t-1}=\bar{a}_{t-1})}{\frac{1}{N}\sum^N_{i=1}\Delta_{(t-1)i}I(A_{(t-1)i}=a_{t-1})/\hat{P}(\Delta_{t-1}=1|\bar{A}_{t-2}=\bar{a}_{t-2})} \end{align}\]

where the first equality holds by our Markov assumption for states, the second equality holds by the definition of conditional probability, and third equality holds by the definition of our inverse probability of censoring weighted estimator.

Next, we note the following:

\[\begin{align} P(\Delta_t=1|\bar{A}_{t-1}) &= P(\Delta_t=1,\Delta_{t-1}=1,...,\Delta_{1}=1|\bar{A}_{t-1}=\bar{a}_{t-1}) \\ \\ &= \prod_{k=1}^{t}{P(\Delta_k=1|\bar{A}_{k-1}=\bar{a}_{k-1},\Delta_{k-1}=1)} \\ \\ &= \prod_{k=1}^{t}{P(\Delta_k=1|A_{k-1}=a_{k-1},\Delta_{k-1}=1)} \end{align}\]

where the first equality holds by the definition of \(\Delta\), the second equality holds by the censoring independence assumption, and the last equality holds by the Markov assumption for censoring. The estimator thus reduces to:

\[ \hat{P}(A_t=a_t|\bar{A}_{t-1}=\bar{a}_{t-1}) = \frac{\frac{1}{N}\sum^N_{i=1}\Delta_{ti}I(A_{ti}=a)I(A_{(t-1)i}=a_{t-1})}{\frac{1}{N}\sum^N_{i=1}\Delta_{(t-1)i}I(A_{(t-1)i}=a_{t-1})}\times\frac{1}{\hat{P}(\Delta_t=1|A_{t-1}=a_{1-1},\Delta_{t-1}=1)} \\ \\ \]

The probability \(\hat{P}(\Delta_t=1|A_{t-1}=a_{1-1},\Delta_{t-1}=1)\) is estimated using a simple logistic model, and since the state at the previous stage is the only covariate it is equivalent to estimating the proportion. That is:

\[\hat{P}(\Delta_t=1|A_{t-1}=a_{t-1},\Delta_{t-1}=1) = \frac{\sum^N_{i=1}\Delta_{ti}I(A_{(t-1)i}=a_{t-1})}{\sum^N_{i=1}\Delta_{(t-1)i}I(A_{(t-1)i}=a_{t-1})}\]

Substituting this in simplifies the conditional probability estimator to:

\[\hat{P}(A_t=a_t|A_{t-1}=a_{t-1}) =\frac{\sum^N_{i=1}\Delta_{ti}I(A_{ti}=a)I(A_{(t-1)i}=a_{t-1})}{\sum^N_{i=1}\Delta_{ti}I(A_{(t-1)i}=a_{t-1})}\]

In this way, we can estimate the weighted transition probabilities between two successive states independently, and then multiply the estimated probabilities together to find the probability of all of the flows.

\[ \hat{P}(\bar{A}_t=\bar{a}_t) = \frac{1}{N}\sum_{i=1}^N\frac{\Delta_{1i}I(A_{1i}=a_1)}{\hat{P}(\Delta_1=1)}\prod^m_{t=2} \frac{\sum^N_{i=1}\Delta_{ti}I(A_{ti}=a)I(A_{(t-1)i}=a_{t-1})}{\sum^N_{i=1}\Delta_{ti}I(A_{(t-1)i}=a_{t-1})}\]

nsSank calculations

As an example, consider a Sankey where the only states \(a_t\) are A, B, and None and we are interested in 3 time points: 0, 180, 365 \((t=1,2,3)\)

The first step in the Sankeys returns a patient-level data.frame with a column for each time point and the state for each patient at each time point. Also included is the censor time and death time (if applicable).

#> # A tibble: 10,000 × 11
#>    patient_id time_0 time_180 time_365 index_date death_date censor_date
#>         <int>  <int>    <int>    <int> <date>          <dbl>       <dbl>
#>  1          1      2        4        4 2010-06-10         NA         125
#>  2          2      2        2        4 2010-06-16         NA         310
#>  3          3      2        2        4 2010-06-03         NA         296
#>  4          4      1        4        4 2010-06-22         NA          13
#>  5          5      2        4        4 2010-06-08         NA           6
#>  6          6      2        2        2 2010-06-07         NA         378
#>  7          7      0        4        4 2010-06-28         NA          79
#>  8          8      2        2        2 2010-06-14         NA        1409
#>  9          9      2        4        4 2010-06-15        206          30
#> 10         10      2        4        4 2010-06-18         NA          24
#> # … with 9,990 more rows, and 4 more variables: absorbing <dbl>,
#> #   absorbing_label <chr>, censor <dbl>, censor_label <chr>

Then, at each stage, we filter the results to the non-censored members (censor_day \(> t\)) and calculate the probability of each state at \(t\) conditional on the state at \(t-1\).

Below is the structure \(list(P(A_1), P(A_2|A_1), P(A_3|A_2))\) calculated for all states:

#> $time_0
#> # A tibble: 3 × 2
#>   time_0 trans_prob
#>    <int>      <dbl>
#> 1      0      0.181
#> 2      1      0.137
#> 3      2      0.682
#> 
#> $time_180
#> # A tibble: 12 × 3
#> # Groups:   time_0 [3]
#>    time_0 time_180 trans_prob
#>     <int>    <int>      <dbl>
#>  1      0        0      0.176
#>  2      0        1      0.145
#>  3      0        2      0.539
#>  4      0        3      0.140
#>  5      1        0      0.190
#>  6      1        1      0.147
#>  7      1        2      0.524
#>  8      1        3      0.138
#>  9      2        0      0.151
#> 10      2        1      0.112
#> 11      2        2      0.607
#> 12      2        3      0.130
#> 
#> $time_365
#> # A tibble: 7 × 3
#> # Groups:   time_180 [4]
#>   time_180 time_365 trans_prob
#>      <int>    <int>      <dbl>
#> 1        0        2     0.931 
#> 2        0        3     0.0688
#> 3        1        2     0.941 
#> 4        1        3     0.0587
#> 5        2        2     0.911 
#> 6        2        3     0.0889
#> 7        3        3     1

Finally, we combine the above data.frames by joining on the matching treatment at each stage, and thereby generating the probabilities of all of the treatment pathways. The size of each flow is calculated by multiplying the probability of the flow by the overall N (in this case 10000).

#> Warning in dplyr::full_join(prev, nxt, by = c(by_name)): Each row in `x` is expected to match at most 1 row in `y`.
#> Each row in `x` is expected to match at most 1 row in `y`.
#> ℹ Row 1 of `x` matches multiple rows.
#> ℹ If multiple matches are expected, set `multiple = "all"` to silence this
#>   warning.
#> # A tibble: 21 × 5
#>    time_0 time_180 time_365 trans_prob  size
#>     <int>    <int>    <int>      <dbl> <dbl>
#>  1      0        0        2    0.0296  296. 
#>  2      0        0        3    0.00219  21.9
#>  3      0        1        2    0.0246  246. 
#>  4      0        1        3    0.00153  15.3
#>  5      0        2        2    0.0886  886. 
#>  6      0        2        3    0.00865  86.5
#>  7      0        3        3    0.0254  254. 
#>  8      1        0        2    0.0243  243. 
#>  9      1        0        3    0.00179  17.9
#> 10      1        1        2    0.0190  190. 
#> # … with 11 more rows

sum(flows$size)
#> [1] 10000
sum(flows$trans_prob)
#> [1] 1