estimator_check.Rmd
To assess the validity of the estimators in the
causalRisk
package, we compare them to a variety of
widely-used estimators from standard R packages. Note that because most
of the functionality in the causalRisk
package is not
implemented in existing packages, we can only assess the validity of
some of the functions.
causalRisk
agrees with a Kaplan-Meier
estimator with a single source of right censoring and no treatment.
Using the Kaplan-Meier function in the survival
pacakge,
we estimate the cumulative risk of mortality by 20 months:
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
standard = survfit(Surv(Ymin,D)~1, data = test_data)
100*round(1-summary(standard, times=c(20))$surv, 5)
## [1] 45.886
Using the causalRisk
package, we should get an estimated
cumulative risk with similar reported precision.
models = specify_models(identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(example1, models,
label = c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)
causalRisk
agrees with a Kaplan-Meier
estimator with a single source of right censoring and a 2-level
treatment.
Using the Kaplan-Meier function in the survival
pacakge,
we estimate the cumulative risk at 20 months as:
standard = survfit(Surv(Ymin,D) ~ Statin, data = test_data)
100*round(1 - summary(standard, times = c(20))$surv, 5)
## [1] 34.879 49.293
Using causalRisk
we should get the same estimated
cumulative risk within reported precision:
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(example1, models,
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round =3)
causalRisk
agrees with a Kaplan-Meier
estimator with a single source of right censoring and a 2-level
treatment and maximum follow-up set.
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(example1, models, tau = 21,
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)
causalRisk
agrees with a Kaplan-Meier
estimator with a single source of right censoring and a 3-level
treatment.
Using the Kaplan-Meier function in the survival
pacakge,
we estimate the cumulative risk at 20 months:
standard = survfit(Surv(Ymin, D) ~ StatinPotency, data = test_data)
100*round(1 - summary(standard, times = c(20))$surv, 5)
## [1] 34.879 49.670 48.927
Using causalRisk
we should get the same estimated
cumulative risk within reported precision.
models = specify_models(identify_treatment(StatinPotency),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj3 = estimate_ipwrisk(example1, models,
label = c("Kaplan-Meier check"))
make_table2(obj3, risk_time = 20, risk_round = 3)
causalRisk
agrees with an
inverse-probability of treatment weighted Kaplan-Meier estimator with a
2-level treatment.
We assess the performance of causalRisk
and an IP
weighted Kaplan-Meier function in the survival
package in a
setting where they should both give consistent estimates (a null effect)
in a simulated example with confounding but no non-informative
censoring.
test_data = sim_func1(n=100000, eta1 = 0, eta3 = 0, seed = 123)
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
x = model.matrix(~DxRisk, data = test_data)
psout = glmnet::glmnet(x, test_data$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2,1]
iptw = (test_data$Statin=="Treat") / ps + (test_data$Statin=="Control") / (1-ps)
for(i in 1:2)
iptw =
ifelse(test_data$Statin == levels(test_data$Statin)[i],
iptw * length(iptw) /
sum(iptw[test_data$Statin == test_data$Statin[i]]), iptw)
standard = survfit(Surv(Ymin, D) ~ 1 + strata(Statin), data = test_data,
type="kaplan-meier", weight = iptw)
100 * round(1 - summary(standard, times = c(20))$surv, 5)
## [1] 58.229 58.358
Using causalRisk
we should get a result that is very
close and also at the null.
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(test_data, models,
times = c(20),
wt_type = "n",
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)
causalRisk
with a 3-level treatment and
confounding.
Using the same simulated data, we assess the performance of
causalRisk
when treatment has three levels.
Using causalRisk
we should get a result that is also at
the null.
models = specify_models(identify_treatment(StatinPotency, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(test_data, models,
times=c(20),
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)
causalRisk
agrees with a Kaplan-Meier
estimator with two sources of right censoring and a 2-level
treatment.
Using the Kaplan-Meier function in the survival
pacakge,
we estimate the cumulative risk at 20 months:
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
test_data$Discon,
na.rm = TRUE)
test_data$D = ifelse(test_data$Ymin == test_data$Death, 1, 0)
test_data$D = ifelse(is.na(test_data$D),0,test_data$D)
standard = survfit(Surv(Ymin,D)~Statin, data = test_data)
100*round(1-summary(standard, times=c(20))$surv, 5)
## [1] 35.076 47.469
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_censoring(Discon),
identify_outcome(Death))
obj5 = estimate_ipwrisk(example1, models,
label=c("Kaplan-Meier check"))
make_table2(obj5, risk_time = 20, risk_round = 3)
causalRisk
agrees with a Kaplan-Meier
estimator with two sources of right censoring and a 2-level treatment
when some event times are not observed.
Here the expected behavior is for subjects with all missing outcomes
and censoring times to be censored at \(\tau\). Using the Kaplan-Meier function in
the survival
pacakge, we estimate the cumulative risk at 20
months:
tau = 10
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$Discon,
tau,
na.rm = TRUE)
test_data$D = ifelse(test_data$Ymin == test_data$Death, 1, 0)
test_data$D = ifelse(is.na(test_data$D),0,test_data$D)
standard = survfit(Surv(Ymin,D)~Statin, data = test_data)
100*round(1-summary(standard, times=c(tau))$surv, 5)
## [1] 7.686 15.587
Using causalRisk
we should get the same estimated
cumulative risk within reported precision.
models = specify_models(identify_treatment(Statin),
identify_censoring(Discon),
identify_outcome(Death))
obj5 = estimate_ipwrisk(example1, models, tau = tau,
label=c("Kaplan-Meier check"))
make_table2(obj5, risk_time = 10, risk_round = 3)
causalRisk
agrees with a Kaplan-Meier
estimator with two sources of right censoring and a 2-level treatment
when some event times are not observed and \(\tau\) is not specified
Here the expected behavior is for \(\tau\) to be set to the maximum observed
failure or censoring times and subjects with all missing outcomes and
censoring times to be censored at \(\tau\). Using the Kaplan-Meier function in
the survival
pacakge, we estimate the cumulative risk at 20
months:
tau = max(test_data$Death,
test_data$Discon,
na.rm = TRUE)
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$Discon,
tau,
na.rm = TRUE)
test_data$D = ifelse(test_data$Ymin == test_data$Death, 1, 0)
test_data$D = ifelse(is.na(test_data$D),0,test_data$D)
standard = survfit(Surv(Ymin,D)~Statin, data = test_data)
100*round(1-summary(standard, times=c(10))$surv, 5)
## [1] 7.686 15.587
Using causalRisk
we should get the same estimated
cumulative risk within reported precision.
models = specify_models(identify_treatment(Statin),
identify_censoring(Discon),
identify_outcome(Death))
obj5 = estimate_ipwrisk(example1, models,
label=c("Kaplan-Meier check"))
make_table2(obj5, risk_time = 10, risk_round = 3)
causalRisk
agrees with a Kaplan-Meier
estimator when there are tied event times.
We take the example data and round the death and censoring times to
one decimal place to create many tied event and censoring times. We then
use the Kaplan-Meier function in the survival
pacakge to
estimate the cumulative risk at 20 months:
test_data = example1
test_data$Death = round(test_data$Death,1)
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
standard = survfit(Surv(Ymin, D) ~ Statin, data = test_data)
100*round(1-summary(standard, times = c(20))$surv, 5)
## [1] 35.006 49.361
Using causalRisk
we estimate the same quantity and
should get the same estimate to the reported precision.
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj6 = estimate_ipwrisk(test_data, models,
label=c("Kaplan-Meier check"))
make_table2(obj6, risk_time = 20, risk_round =3)
causalRisk
agrees with a Kaplan-Meier
estimator when there are tied event and censoring times.
We take the example data and round the death times to one decimal
place to create many tied event times. We then use the Kaplan-Meier
function in the survival
pacakge to estimate the cumulative
risk at 20 months:
temp_data3 = example1
temp_data3$Death = round(temp_data3$Death, 1)
temp_data3$EndofEnrollment = round(temp_data3$EndofEnrollment, 1)
temp_data3$Ymin = pmin(temp_data3$Death,
temp_data3$EndofEnrollment,
na.rm = TRUE)
temp_data3$D = ifelse(!is.na(temp_data3$Death), 1, 0)
standard = survfit(Surv(Ymin,D)~Statin, data = temp_data3, type = "fh2")
100*round(1-summary(standard, times=c(20))$surv, 5)
## [1] 34.901 49.294
Using causalRisk
we get estimates that deviate only
slightly:
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj6 = estimate_ipwrisk(temp_data3, models,
label=c("Kaplan-Meier check"))
make_table2(obj6, risk_time = 20, risk_round =3)
causalRisk
correctly handles formulae for
composites censoring variables
The expected behavior is for causalRisk to take the formula associated with the first censoring event that is specified. Therefore the following analyses should return the same results.
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment, ~DxRisk),
identify_censoring(Discon, ~Frailty),
identify_outcome(Death))
obj1 = estimate_ipwrisk(example1, models,
label=c("1"))
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment, ~DxRisk),
identify_censoring(Discon),
identify_outcome(Death))
obj2 = estimate_ipwrisk(example1, models,
label=c("2"))
make_table2(obj1, obj2, risk_time = 20, risk_round =3)
causalRisk
agrees with a
non-parameteric cumulative risk estimator in the presence of a competing
event.
Using the non-parametric cumulative incidence function estimator in
the cmprsk
package, we estimate the cumulative risk of
CVD-related mortality at 20 months:
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
test_data$D = ifelse(test_data$CvdDeath == 1 & test_data$D == 1, 2, test_data$D)
cr_out = cuminc(test_data$Ymin, test_data$D, cencode=0)
100*round(cmprsk::timepoints(cr_out, 20)$est[2], 5)
## [1] 22.684
Using causalRisk
we should get the same estimated
cumulative risk to the reported precision.
models = specify_models(
identify_censoring(EndofEnrollment),
identify_competing_risk(CvdDeath, event_value = 1),
identify_outcome(Death))
cr_obj = estimate_ipwrisk(example1, models,
label=c("Aalen-Johansen check"))
make_table2(cr_obj, risk_time = 20, risk_round =3)
Using the non-parametric cumulative incidence function estimator in
the cmprsk
package, we estimate the cumulative risk of
CVD-related mortality at 20 months in each treatment group in the
example data:
cr_out = cuminc(test_data$Ymin, test_data$D, test_data$Statin, cencode = 0)
100*round(cmprsk::timepoints(cr_out, 20)$est[c(3,4)], 5)
## [1] 17.661 24.223
Using causalRisk
we should get the same estimated
cumulative risk within reported precision.
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_competing_risk(CvdDeath, event_value = 1),
identify_outcome(Death))
cr_obj2 = estimate_ipwrisk(example1, models,
label=c("Aalen-Johansen check"))
make_table2(cr_obj2, risk_time = 20, risk_round =3)
causalRisk
agrees with a Kaplan-Meier
estimator with a single source of right censoring and a 2-level
treatment and a subgroup specified.
Using the Kaplan-Meier function in the survival
pacakge,
we estimate the cumulative risk at 20 months as:
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
standard = survfit(Surv(Ymin, D) ~ Statin, data = test_data, subset = DxRisk<1)
100*round(1-summary(standard, times=c(20))$surv, 5)
## [1] 36.673 49.817
Using causalRisk
we should get the same estimated
cumulative risk within reported precision:
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(example1, models,
label=c("Kaplan-Meier check"),
subgroup = DxRisk < 1)
make_table2(obj, risk_time = 20, risk_round =3)
causalRisk
agrees with an
inverse-probability of treatment weighted Kaplan-Meier estimator with a
2-level treatment w/ trimming at the first percentile.
As before, we make censoring non-informative and the treatment
weights are normalized in causalRisk
:
test_data = sim_func1(n=100000, eta1 = 0, eta3 = 0, seed = 123)
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
x = model.matrix(~DxRisk, data = test_data)
psout = glmnet::glmnet(x, test_data$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2,1]
keeprecs = ifelse(ps > quantile(ps, 0.01) & ps < quantile(ps, 0.99), TRUE, FALSE)
ps = ps[keeprecs]
test_data_trim = test_data[keeprecs,]
table(test_data_trim$Statin)
##
## Control Treat
## 48978 49022
# re-estimate PS
x = model.matrix(~DxRisk, data = test_data_trim)
psout = glmnet::glmnet(x, test_data_trim$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2, 1]
iptw = (test_data_trim$Statin=="Treat") / ps + (test_data_trim$Statin=="Control") / (1-ps)
standard = survfit(Surv(Ymin, D) ~ Statin,
data = test_data_trim, weight = iptw)
100*round(1-summary(standard, times = c(20))$surv, 5)
## [1] 58.214 58.241
Using causalRisk
we should get a result that has the
same number of patients and the estimators are statistically
similar.
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(test_data, models,
wt_type = "n",
trim = 0.01,
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)
causalRisk
agrees with an
inverse-probability of treatment weighted Kaplan-Meier estimator with a
2-level treatment w/ trimming at the first percentile w/ in a
subgroup
As before, we make censoring non-informative and the treatment
weights are normalized in causalRisk
:
test_data = sim_func1(n = 100000, eta1 = 0, eta3 = 0)
test_data = test_data %>%
filter(DxRisk < 1)
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
x = model.matrix(~DxRisk, data = test_data)
psout = glmnet::glmnet(x, test_data$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2,1]
keeprecs = ifelse(ps > quantile(ps, 0.01) & ps < quantile(ps, 0.99), TRUE, FALSE)
test_data_trim = test_data %>%
filter(keeprecs)
table(test_data_trim$Statin)
##
## Control Treat
## 34973 47461
# re-estimate PS
x = model.matrix(~DxRisk, data = test_data_trim)
psout = glmnet::glmnet(x, test_data_trim$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2,1]
iptw = (test_data_trim$Statin=="Treat") / ps + (test_data_trim$Statin=="Control") / (1-ps)
standard = survfit(Surv(Ymin, D) ~ Statin,
subset = DxRisk < 1,
data = test_data_trim,
weight = iptw)
100*round(1-summary(standard, times=c(20))$surv, 5)
## [1] 59.667 59.857
Using causalRisk
we should get a result with the same
number of patients and results in an estimate that is close to the KM
and at the null
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(test_data, models,
wt_type = "n",
trim = 0.01,
subgroup = DxRisk < 1,
label = c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)
causalRisk
with longitudinal data with
time-varying predictors of censoring
Here we use the data set example1_timevar
, which
involves longitudinal data with a time-varying predictor of censoring.
This data set is the same as example1
except that the data
are discretized into two intervals and DxRisk
is made to be
time-varying, by adding a random variate to DxRisk
that has
a small positive expectation. Adjusting for DxRisk
in the
propensity score model and Frailty
in the censoring model
should result in the same estimate as estimate_ipwrisk
applied to example1
:
tv_models = specify_models(identify_interval(start_name = Start,
end_name = Stop),
identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~Frailty),
identify_outcome(Death),
identify_subject(id))
adj_tv = estimate_ipwrisk(example1_timevar,
tv_models,
tau = 100,
label = c("Adjusted-TV"))
make_table2(adj_tv, risk_time = 20, risk_round = 3)
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~Frailty),
identify_outcome(Death),
identify_subject(id))
adj = estimate_ipwrisk(example1,
models,
tau = 100,
label=c("Adjusted-baseline"))
make_table2(adj, risk_time = 20, risk_round = 3)
If we now adjust for DxRisk
in the censoring model in
both the time-varying and time-invariant estimator, we expect to that we
should get two estimates that are unbiased, but numerically different
from the previous estimators as well as each other as the inclusion of
both the time-varying and time-invariant DxRisk
will
introduce some additional variability in the estimated censoring
weights.
tv_models = specify_models(identify_interval(start_name = Start,
end_name = Stop),
identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~Frailty + DxRisk),
identify_outcome(Death),
identify_subject(id))
adj_tv = estimate_ipwrisk(example1_timevar,
tv_models,
tau = 100,
label=c("Adjusted-TV"))
make_table2(adj_tv, risk_time = 20, risk_round =3)
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~Frailty + DxRisk),
identify_outcome(Death),
identify_subject(id))
adj = estimate_ipwrisk(example1,
models,
tau = 100,
label=c("Adjusted-baseline"))
make_table2(adj, risk_time = 20, risk_round = 3)
causalRisk
with data structured
longitudinally agrees with equivalent data structured
non-longitudinally
Here we use the longitudinal data set example1_timevar
,
which is equivalent to example1
.
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, tau = 20, label=c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, tau = 20, label=c("Time-varying"))
make_table2(results1, results2, risk_time = 20)
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, tau = 20, label=c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, tau = 20,label=c("Time-varying"))
make_table2(results1, results2, risk_time = 20)
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_censoring(EndofEnrollment, ~Frailty),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, tau = 20, label = c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment, ~Frailty),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, tau = 20, label = c("Time-varying"))
make_table2(results1, results2, risk_time = 20)
causalRisk
with data structured
longitudinally agrees with equivalent data structured non-longitudinally
when there are two sources or right censoring.
Here we use the longitudinal data set example1_timevar
,
which is equivalent to example1
.
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_censoring(Discon),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, tau = 20, label=c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_censoring(Discon),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, tau = 20, label=c("Time-varying"))
make_table2(results1, results2, risk_time = 20)
Here again we compare results from the longitudinal data set
example1_timevar
, which is equivalent to
example1
.
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, subgroup = Frailty < 0,
tau = 20, label=c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, subgroup = Frailty < 0,
tau = 20, label=c("Time-varying"))
make_table2(results1, results2, risk_time = 20)
When supplied by a user observation weights should be normalized to sum to \(n\). Therefore, if any constant is used as an observation weight, the weighted analysis should be identical to an unweighted analysis.
temp = example1
temp$obs_weight = rep(10,nrow(temp))
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
unweighted = estimate_ipwrisk(temp,
models,
labels = c("Unweighted"))
weighted = estimate_ipwrisk(temp,
models,
labels = c("Weighted"),
wt = obs_weight)
make_table2(unweighted, weighted, risk_time = 20)
The same should be true within a subgroup:
temp = example1
temp$obs_weight = rep(10,nrow(temp))
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
unweighted = estimate_ipwrisk(temp,
models,
subgroup = Frailty < 0,
labels = c("Unweighted"))
weighted = estimate_ipwrisk(temp,
models,
subgroup = Frailty < 0,
labels = c("Weighted"),
wt = obs_weight)
make_table2(unweighted, weighted, risk_time = 20)
Also, if random uniform weights are supplied as observation weights, the weighted analysis should result in an estimator that has the same expectation and variance.
temp = example1
temp$obs_weight = runif(10000)
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
unweighted = estimate_ipwrisk(temp,
models,
labels = c("Unweighted"))
weighted = estimate_ipwrisk(temp,
models,
labels = c("Weighted"),
wt = obs_weight)
make_table2(unweighted, weighted, risk_time = 20)
When applied to longitudinal data, weights should be normalized to sum to \(n\), not the number of rows in a longitudinal dataset.
temp = example1
temp$obs_weight = runif(10000)
temp_timevar = temp %>% dplyr::select(id, obs_weight) %>%
dplyr::right_join(example1_timevar)
## Joining with `by = join_by(id)`
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
weighted = estimate_ipwrisk(temp,
models,
tau = 30,
labels = c("Weighted, time-fixed"),
wt = obs_weight)
models = specify_models(identify_subject(id),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
weighted_long = estimate_ipwrisk(temp_timevar,
models,
tau = 30,
labels = c("Weighted, time-varying"),
wt = obs_weight)
make_table2(weighted, weighted_long, risk_time = 20)
Though there are very few existing implementations of augmented
inverse probability weighted (AIPW) estimators to compare to, the
authors of the reference we used to develop our efficient AIPW estimator
have an R package, riskRegression
, that we can benchmark
against. Here, we provide several comparisons between our results and
the riskRegression
package.
To begin, we compare the results between causalRisk
and
riskRegression
when null models are used for all of the
nuisance models. To improve runtime, we will use a sample of the
example1
dataset included in causalRisk
for
comparison. As can be seen below, the results are extremely similar
between the two packages.
# Sample Data
set.seed(7368)
dat <- example1 %>% sample_frac(.1)
# Conduct analysis with causalRisk
models_null = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
fit_aipw_null = estimate_aipwrisk(dat,
models_null,
fully_efficient = TRUE,
times = 24,
tau_censor = TRUE,
labels = c("Null models"))
make_table2(fit_aipw_null)
# Conduct analysis with riskRegression
dat2 <- dat %>%
mutate(t = pmin(Death, EndofEnrollment, 24, na.rm = TRUE),
outcome = ifelse(is.na(Death), 0, t == Death),
censor = ifelse(is.na(EndofEnrollment), 0, t == EndofEnrollment))
censmod <- survival::coxph(Surv(t, outcome==0) ~ strata(Statin), data = dat2, x = TRUE)
outmod <- survival::coxph(Surv(t, outcome) ~ strata(Statin), data = dat2, x = TRUE)
psmod <- glm(Statin ~ 1, family = binomial(link = "logit"), data =dat2)
result <- ate(event = outmod,
treatment = psmod,
censor = censmod,
estimator = "AIPTW",
data = dat2,
times = 24,
cause = 1,
verbose = FALSE)
result$diffRisk %>% kable()
estimator | time | A | estimate.A | B | estimate.B | estimate | se | lower | upper | p.value |
---|---|---|---|---|---|---|---|---|---|---|
AIPTW | 24 | Control | 0.4285414 | Treat | 0.555398 | 0.1268566 | 0.0563232 | 0.0164652 | 0.237248 | 0.0243034 |
Unfortunately, it is impossible to specify the models for
riskRegression
the same way as in causalRisk
.
In causalRisk
, the censoring and outcome models are fit
seperately in each treatment group. In riskRegression
,
treatment is included as a covariate in the models. The closest we can
get is to stratify the models by treatment for
riskRegression
. With that, the results are still very
similar to causalRisk
.
# Conduct analysis with causalRisk
models_adj = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~DxRisk),
identify_outcome(Death, ~DxRisk))
fit_aipw_adj = estimate_aipwrisk(dat,
models_adj,
fully_efficient = TRUE,
times = 24,
tau_censor = TRUE,
labels = c("Adjusted models"))
make_table2(fit_aipw_adj)
# Conduct analysis with riskRegression
censmod <- survival::coxph(Surv(t, outcome==0) ~ DxRisk + strata(Statin), data = dat2, x = TRUE)
outmod <- survival::coxph(Surv(t, outcome) ~ DxRisk + strata(Statin), data = dat2, x = TRUE)
psmod <- glm(Statin ~ DxRisk, family = binomial(link = "logit"), data =dat2)
result2 <- ate(event = outmod,
treatment = psmod,
censor = censmod,
estimator = "AIPTW",
data = dat2,
times = 24,
cause = 1,
verbose = FALSE)
result2$diffRisk %>% kable()
estimator | time | A | estimate.A | B | estimate.B | estimate | se | lower | upper | p.value |
---|---|---|---|---|---|---|---|---|---|---|
AIPTW | 24 | Control | 0.529929 | Treat | 0.5426873 | 0.0127583 | 0.0837316 | -0.1513527 | 0.1768693 | 0.8788941 |
Again, there is a difference in how riskRegression
and
causalRisk
handle competing risks.
riskRegression
uses a multistate Cox model, while
causalRisk
uses a Fine and Gray model. Nonetheless, the
results remain similar.
# Conduct analysis with causalRisk
models_cr = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~DxRisk),
identify_outcome(Death, ~DxRisk),
identify_competing_risk(CvdDeath))
fit_aipw_cr = estimate_aipwrisk(dat,
models_cr,
fully_efficient = TRUE,
times = 24,
tau_censor = TRUE,
labels = c("Adjusted models"))
make_table2(fit_aipw_cr)
# Setup data
dat3 <- dat2 %>%
mutate(outcome = ifelse(outcome == 0, 0,
ifelse(CvdDeath == 0, 1, 2)),
outcome = factor(outcome))
# Conduct analysis with riskRegression
censmod <- survival::coxph(Surv(t, outcome == 0) ~ DxRisk + strata(Statin), data = dat3, x = TRUE, y = TRUE)
outmod <- riskRegression::CSC(Hist(t, outcome) ~ DxRisk + strata(Statin), data = dat3)
psmod <- glm(Statin ~ DxRisk, family = binomial(link = "logit"), data =dat3)
result3 <- ate(event = outmod,
treatment = psmod,
censor = censmod,
estimator = "AIPTW",
data = dat3,
times = 24,
cause = 2,
verbose = FALSE)
result3$diffRisk %>% kable()
estimator | time | A | estimate.A | B | estimate.B | estimate | se | lower | upper | p.value |
---|---|---|---|---|---|---|---|---|---|---|
AIPTW | 24 | Control | 0.4098683 | Treat | 0.2475148 | -0.1623535 | 0.0905999 | -0.3399261 | 0.0152191 | 0.0731358 |
make_table1
applied to the same models fit to
example1
and example1_timevar
should result in
identical tables.
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
time_fixed = estimate_ipwrisk(example1,
models,
tau = 30,
labels = c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
time_varying = estimate_ipwrisk(example1_timevar,
models,
tau = 30,
labels = c("Time-varying"))
make_table1(time_fixed, DxRisk, Sex, smd = TRUE)
Characteristic1 | Control n=10000.00 |
Treat n=10000.00 |
SMD |
---|---|---|---|
DxRisk, Mean (SD) | 0.74 (0.85) | -0.35 (0.88) | 1.26 |
Sex | 0.02 | ||
Female | 4939 49.39 | 4835 48.35 | |
Male | 5061 50.61 | 5165 51.65 | |
1 All values are N (%) unless otherwise specified |
make_table1(time_varying, DxRisk, Sex, smd = TRUE)
Characteristic1 | Control n=10000.00 |
Treat n=10000.00 |
SMD |
---|---|---|---|
DxRisk, Mean (SD) | 0.74 (0.85) | -0.35 (0.88) | 1.26 |
Sex | 0.02 | ||
Female | 4939 49.39 | 4835 48.35 | |
Male | 5061 50.61 | 5165 51.65 | |
1 All values are N (%) unless otherwise specified |
models = specify_models(identify_subject(id),
identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
time_fixed = estimate_ipwrisk(example1,
models,
tau = 30,
labels = c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_treatment(Statin, ~DxRisk),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
time_varying = estimate_ipwrisk(example1_timevar,
models,
tau = 30,
labels = c("Time-varying"))
make_table1(time_fixed, DxRisk, Sex, smd = TRUE)
Characteristic1 | Control n=9953.53 |
Treat n=9908.35 |
SMD |
---|---|---|---|
DxRisk, Mean (SD) | 0.02 (0.97) | -0.03 (0.97) | 0.05 |
Sex | 0 | ||
Female | 4815 48.37 | 4769 48.13 | |
Male | 5139 51.63 | 5139 51.87 | |
1 All values are N (%) unless otherwise specified |
make_table1(time_varying, DxRisk, Sex, smd = TRUE)
Characteristic1 | Control n=9953.53 |
Treat n=9908.35 |
SMD |
---|---|---|---|
DxRisk, Mean (SD) | 0.02 (0.97) | -0.03 (0.97) | 0.05 |
Sex | 0 | ||
Female | 4815 48.37 | 4769 48.13 | |
Male | 5139 51.63 | 5139 51.87 | |
1 All values are N (%) unless otherwise specified |