estimator_check.RmdTo assess the validity of the estimators in the
causalRisk package, we compare them to a variety of
widely-used estimators from standard R packages. Note that because most
of the functionality in the causalRisk package is not
implemented in existing packages, we can only assess the validity of
some of the functions.
causalRisk agrees with a Kaplan-Meier
estimator with a single source of right censoring and no treatment.
Using the Kaplan-Meier function in the survival pacakge,
we estimate the cumulative risk of mortality by 20 months:
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
standard = survfit(Surv(Ymin,D)~1, data = test_data)
100*round(1-summary(standard, times=c(20))$surv, 5)## [1] 45.886
Using the causalRisk package, we should get an estimated
cumulative risk with similar reported precision.
models = specify_models(identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(example1, models,
label = c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)causalRisk agrees with a Kaplan-Meier
estimator with a single source of right censoring and a 2-level
treatment.
Using the Kaplan-Meier function in the survival pacakge,
we estimate the cumulative risk at 20 months as:
standard = survfit(Surv(Ymin,D) ~ Statin, data = test_data)
100*round(1 - summary(standard, times = c(20))$surv, 5)## [1] 34.879 49.293
Using causalRisk we should get the same estimated
cumulative risk within reported precision:
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(example1, models,
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round =3)causalRisk agrees with a Kaplan-Meier
estimator with a single source of right censoring and a 2-level
treatment and maximum follow-up set.
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(example1, models, tau = 21,
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)causalRisk agrees with a Kaplan-Meier
estimator with a single source of right censoring and a 3-level
treatment.
Using the Kaplan-Meier function in the survival pacakge,
we estimate the cumulative risk at 20 months:
standard = survfit(Surv(Ymin, D) ~ StatinPotency, data = test_data)
100*round(1 - summary(standard, times = c(20))$surv, 5)## [1] 34.879 49.670 48.927
Using causalRisk we should get the same estimated
cumulative risk within reported precision.
models = specify_models(identify_treatment(StatinPotency),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj3 = estimate_ipwrisk(example1, models,
label = c("Kaplan-Meier check"))
make_table2(obj3, risk_time = 20, risk_round = 3)causalRisk agrees with an
inverse-probability of treatment weighted Kaplan-Meier estimator with a
2-level treatment.
We assess the performance of causalRisk and an IP
weighted Kaplan-Meier function in the survival package in a
setting where they should both give consistent estimates (a null effect)
in a simulated example with confounding but no non-informative
censoring.
test_data = sim_func1(n=100000, eta1 = 0, eta3 = 0, seed = 123)
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
x = model.matrix(~DxRisk, data = test_data)
psout = glmnet::glmnet(x, test_data$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2,1]
iptw = (test_data$Statin=="Treat") / ps + (test_data$Statin=="Control") / (1-ps)
for(i in 1:2)
iptw =
ifelse(test_data$Statin == levels(test_data$Statin)[i],
iptw * length(iptw) /
sum(iptw[test_data$Statin == test_data$Statin[i]]), iptw)
standard = survfit(Surv(Ymin, D) ~ 1 + strata(Statin), data = test_data,
type="kaplan-meier", weight = iptw)
100 * round(1 - summary(standard, times = c(20))$surv, 5)## [1] 58.229 58.358
Using causalRisk we should get a result that is very
close and also at the null.
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(test_data, models,
times = c(20),
wt_type = "n",
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)causalRisk with a 3-level treatment and
confounding.
Using the same simulated data, we assess the performance of
causalRisk when treatment has three levels.
Using causalRisk we should get a result that is also at
the null.
models = specify_models(identify_treatment(StatinPotency, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(test_data, models,
times=c(20),
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)causalRisk agrees with a Kaplan-Meier
estimator with two sources of right censoring and a 2-level
treatment.
Using the Kaplan-Meier function in the survival pacakge,
we estimate the cumulative risk at 20 months:
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
test_data$Discon,
na.rm = TRUE)
test_data$D = ifelse(test_data$Ymin == test_data$Death, 1, 0)
test_data$D = ifelse(is.na(test_data$D),0,test_data$D)
standard = survfit(Surv(Ymin,D)~Statin, data = test_data)
100*round(1-summary(standard, times=c(20))$surv, 5)## [1] 35.076 47.469
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_censoring(Discon),
identify_outcome(Death))
obj5 = estimate_ipwrisk(example1, models,
label=c("Kaplan-Meier check"))
make_table2(obj5, risk_time = 20, risk_round = 3)causalRisk agrees with a Kaplan-Meier
estimator with two sources of right censoring and a 2-level treatment
when some event times are not observed.
Here the expected behavior is for subjects with all missing outcomes
and censoring times to be censored at \(\tau\). Using the Kaplan-Meier function in
the survival pacakge, we estimate the cumulative risk at 20
months:
tau = 10
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$Discon,
tau,
na.rm = TRUE)
test_data$D = ifelse(test_data$Ymin == test_data$Death, 1, 0)
test_data$D = ifelse(is.na(test_data$D),0,test_data$D)
standard = survfit(Surv(Ymin,D)~Statin, data = test_data)
100*round(1-summary(standard, times=c(tau))$surv, 5)## [1] 7.686 15.587
Using causalRisk we should get the same estimated
cumulative risk within reported precision.
models = specify_models(identify_treatment(Statin),
identify_censoring(Discon),
identify_outcome(Death))
obj5 = estimate_ipwrisk(example1, models, tau = tau,
label=c("Kaplan-Meier check"))
make_table2(obj5, risk_time = 10, risk_round = 3)causalRisk agrees with a Kaplan-Meier
estimator with two sources of right censoring and a 2-level treatment
when some event times are not observed and \(\tau\) is not specified
Here the expected behavior is for \(\tau\) to be set to the maximum observed
failure or censoring times and subjects with all missing outcomes and
censoring times to be censored at \(\tau\). Using the Kaplan-Meier function in
the survival pacakge, we estimate the cumulative risk at 20
months:
tau = max(test_data$Death,
test_data$Discon,
na.rm = TRUE)
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$Discon,
tau,
na.rm = TRUE)
test_data$D = ifelse(test_data$Ymin == test_data$Death, 1, 0)
test_data$D = ifelse(is.na(test_data$D),0,test_data$D)
standard = survfit(Surv(Ymin,D)~Statin, data = test_data)
100*round(1-summary(standard, times=c(10))$surv, 5)## [1] 7.686 15.587
Using causalRisk we should get the same estimated
cumulative risk within reported precision.
models = specify_models(identify_treatment(Statin),
identify_censoring(Discon),
identify_outcome(Death))
obj5 = estimate_ipwrisk(example1, models,
label=c("Kaplan-Meier check"))
make_table2(obj5, risk_time = 10, risk_round = 3)causalRisk agrees with a Kaplan-Meier
estimator when there are tied event times.
We take the example data and round the death and censoring times to
one decimal place to create many tied event and censoring times. We then
use the Kaplan-Meier function in the survival pacakge to
estimate the cumulative risk at 20 months:
test_data = example1
test_data$Death = round(test_data$Death,1)
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
standard = survfit(Surv(Ymin, D) ~ Statin, data = test_data)
100*round(1-summary(standard, times = c(20))$surv, 5)## [1] 35.006 49.361
Using causalRisk we estimate the same quantity and
should get the same estimate to the reported precision.
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj6 = estimate_ipwrisk(test_data, models,
label=c("Kaplan-Meier check"))
make_table2(obj6, risk_time = 20, risk_round =3)causalRisk agrees with a Kaplan-Meier
estimator when there are tied event and censoring times.
We take the example data and round the death times to one decimal
place to create many tied event times. We then use the Kaplan-Meier
function in the survival pacakge to estimate the cumulative
risk at 20 months:
temp_data3 = example1
temp_data3$Death = round(temp_data3$Death, 1)
temp_data3$EndofEnrollment = round(temp_data3$EndofEnrollment, 1)
temp_data3$Ymin = pmin(temp_data3$Death,
temp_data3$EndofEnrollment,
na.rm = TRUE)
temp_data3$D = ifelse(!is.na(temp_data3$Death), 1, 0)
standard = survfit(Surv(Ymin,D)~Statin, data = temp_data3, type = "fh2")
100*round(1-summary(standard, times=c(20))$surv, 5)## [1] 34.901 49.294
Using causalRisk we get estimates that deviate only
slightly:
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj6 = estimate_ipwrisk(temp_data3, models,
label=c("Kaplan-Meier check"))
make_table2(obj6, risk_time = 20, risk_round =3)causalRisk correctly handles formulae for
composites censoring variables
The expected behavior is for causalRisk to take the formula associated with the first censoring event that is specified. Therefore the following analyses should return the same results.
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment, ~DxRisk),
identify_censoring(Discon, ~Frailty),
identify_outcome(Death))
obj1 = estimate_ipwrisk(example1, models,
label=c("1"))
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment, ~DxRisk),
identify_censoring(Discon),
identify_outcome(Death))
obj2 = estimate_ipwrisk(example1, models,
label=c("2"))
make_table2(obj1, obj2, risk_time = 20, risk_round =3)causalRisk agrees with a
non-parameteric cumulative risk estimator in the presence of a competing
event.
Using the non-parametric cumulative incidence function estimator in
the cmprsk package, we estimate the cumulative risk of
CVD-related mortality at 20 months:
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
test_data$D = ifelse(test_data$CvdDeath == 1 & test_data$D == 1, 2, test_data$D)
cr_out = cuminc(test_data$Ymin, test_data$D, cencode=0)
100*round(cmprsk::timepoints(cr_out, 20)$est[2], 5)## [1] 22.684
Using causalRisk we should get the same estimated
cumulative risk to the reported precision.
models = specify_models(
identify_censoring(EndofEnrollment),
identify_competing_risk(CvdDeath, event_value = 1),
identify_outcome(Death))
cr_obj = estimate_ipwrisk(example1, models,
label=c("Aalen-Johansen check"))
make_table2(cr_obj, risk_time = 20, risk_round =3)Using the non-parametric cumulative incidence function estimator in
the cmprsk package, we estimate the cumulative risk of
CVD-related mortality at 20 months in each treatment group in the
example data:
cr_out = cuminc(test_data$Ymin, test_data$D, test_data$Statin, cencode = 0)
100*round(cmprsk::timepoints(cr_out, 20)$est[c(3,4)], 5)## [1] 17.661 24.223
Using causalRisk we should get the same estimated
cumulative risk within reported precision.
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_competing_risk(CvdDeath, event_value = 1),
identify_outcome(Death))
cr_obj2 = estimate_ipwrisk(example1, models,
label=c("Aalen-Johansen check"))
make_table2(cr_obj2, risk_time = 20, risk_round =3)causalRisk agrees with a Kaplan-Meier
estimator with a single source of right censoring and a 2-level
treatment and a subgroup specified.
Using the Kaplan-Meier function in the survival pacakge,
we estimate the cumulative risk at 20 months as:
test_data = example1
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
standard = survfit(Surv(Ymin, D) ~ Statin, data = test_data, subset = DxRisk<1)
100*round(1-summary(standard, times=c(20))$surv, 5)## [1] 36.673 49.817
Using causalRisk we should get the same estimated
cumulative risk within reported precision:
models = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(example1, models,
label=c("Kaplan-Meier check"),
subgroup = DxRisk < 1)
make_table2(obj, risk_time = 20, risk_round =3)causalRisk agrees with an
inverse-probability of treatment weighted Kaplan-Meier estimator with a
2-level treatment w/ trimming at the first percentile.
As before, we make censoring non-informative and the treatment
weights are normalized in causalRisk:
test_data = sim_func1(n=100000, eta1 = 0, eta3 = 0, seed = 123)
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
x = model.matrix(~DxRisk, data = test_data)
psout = glmnet::glmnet(x, test_data$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2,1]
keeprecs = ifelse(ps > quantile(ps, 0.01) & ps < quantile(ps, 0.99), TRUE, FALSE)
ps = ps[keeprecs]
test_data_trim = test_data[keeprecs,]
table(test_data_trim$Statin)##
## Control Treat
## 48978 49022
# re-estimate PS
x = model.matrix(~DxRisk, data = test_data_trim)
psout = glmnet::glmnet(x, test_data_trim$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2, 1]
iptw = (test_data_trim$Statin=="Treat") / ps + (test_data_trim$Statin=="Control") / (1-ps)
standard = survfit(Surv(Ymin, D) ~ Statin,
data = test_data_trim, weight = iptw)
100*round(1-summary(standard, times = c(20))$surv, 5)## [1] 58.214 58.241
Using causalRisk we should get a result that has the
same number of patients and the estimators are statistically
similar.
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(test_data, models,
wt_type = "n",
trim = 0.01,
label=c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)causalRisk agrees with an
inverse-probability of treatment weighted Kaplan-Meier estimator with a
2-level treatment w/ trimming at the first percentile w/ in a
subgroup
As before, we make censoring non-informative and the treatment
weights are normalized in causalRisk:
test_data = sim_func1(n = 100000, eta1 = 0, eta3 = 0)
test_data = test_data %>%
filter(DxRisk < 1)
test_data$Ymin = pmin(test_data$Death,
test_data$EndofEnrollment,
na.rm = TRUE)
test_data$D = ifelse(!is.na(test_data$Death), 1, 0)
x = model.matrix(~DxRisk, data = test_data)
psout = glmnet::glmnet(x, test_data$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2,1]
keeprecs = ifelse(ps > quantile(ps, 0.01) & ps < quantile(ps, 0.99), TRUE, FALSE)
test_data_trim = test_data %>%
filter(keeprecs)
table(test_data_trim$Statin)##
## Control Treat
## 34973 47461
# re-estimate PS
x = model.matrix(~DxRisk, data = test_data_trim)
psout = glmnet::glmnet(x, test_data_trim$Statin,
family="multinomial",
type.multinomial="grouped", lambda = 0)
ps = predict(psout, newx = x, type = "response", s = 0)[, 2,1]
iptw = (test_data_trim$Statin=="Treat") / ps + (test_data_trim$Statin=="Control") / (1-ps)
standard = survfit(Surv(Ymin, D) ~ Statin,
subset = DxRisk < 1,
data = test_data_trim,
weight = iptw)
100*round(1-summary(standard, times=c(20))$surv, 5)## [1] 59.667 59.857
Using causalRisk we should get a result with the same
number of patients and results in an estimate that is close to the KM
and at the null
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
obj = estimate_ipwrisk(test_data, models,
wt_type = "n",
trim = 0.01,
subgroup = DxRisk < 1,
label = c("Kaplan-Meier check"))
make_table2(obj, risk_time = 20, risk_round = 3)causalRisk with longitudinal data with
time-varying predictors of censoring
Here we use the data set example1_timevar, which
involves longitudinal data with a time-varying predictor of censoring.
This data set is the same as example1 except that the data
are discretized into two intervals and DxRisk is made to be
time-varying, by adding a random variate to DxRisk that has
a small positive expectation. Adjusting for DxRisk in the
propensity score model and Frailty in the censoring model
should result in the same estimate as estimate_ipwrisk
applied to example1:
tv_models = specify_models(identify_interval(start_name = Start,
end_name = Stop),
identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~Frailty),
identify_outcome(Death),
identify_subject(id))
adj_tv = estimate_ipwrisk(example1_timevar,
tv_models,
tau = 100,
label = c("Adjusted-TV"))
make_table2(adj_tv, risk_time = 20, risk_round = 3)
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~Frailty),
identify_outcome(Death),
identify_subject(id))
adj = estimate_ipwrisk(example1,
models,
tau = 100,
label=c("Adjusted-baseline"))
make_table2(adj, risk_time = 20, risk_round = 3)If we now adjust for DxRisk in the censoring model in
both the time-varying and time-invariant estimator, we expect to that we
should get two estimates that are unbiased, but numerically different
from the previous estimators as well as each other as the inclusion of
both the time-varying and time-invariant DxRisk will
introduce some additional variability in the estimated censoring
weights.
tv_models = specify_models(identify_interval(start_name = Start,
end_name = Stop),
identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~Frailty + DxRisk),
identify_outcome(Death),
identify_subject(id))
adj_tv = estimate_ipwrisk(example1_timevar,
tv_models,
tau = 100,
label=c("Adjusted-TV"))
make_table2(adj_tv, risk_time = 20, risk_round =3)
models = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~Frailty + DxRisk),
identify_outcome(Death),
identify_subject(id))
adj = estimate_ipwrisk(example1,
models,
tau = 100,
label=c("Adjusted-baseline"))
make_table2(adj, risk_time = 20, risk_round = 3)causalRisk with data structured
longitudinally agrees with equivalent data structured
non-longitudinally
Here we use the longitudinal data set example1_timevar,
which is equivalent to example1.
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, tau = 20, label=c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, tau = 20, label=c("Time-varying"))
make_table2(results1, results2, risk_time = 20)
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, tau = 20, label=c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, tau = 20,label=c("Time-varying"))
make_table2(results1, results2, risk_time = 20)
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_censoring(EndofEnrollment, ~Frailty),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, tau = 20, label = c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment, ~Frailty),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, tau = 20, label = c("Time-varying"))
make_table2(results1, results2, risk_time = 20)causalRisk with data structured
longitudinally agrees with equivalent data structured non-longitudinally
when there are two sources or right censoring.
Here we use the longitudinal data set example1_timevar,
which is equivalent to example1.
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_censoring(Discon),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, tau = 20, label=c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_censoring(Discon),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, tau = 20, label=c("Time-varying"))
make_table2(results1, results2, risk_time = 20)Here again we compare results from the longitudinal data set
example1_timevar, which is equivalent to
example1.
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results1 = estimate_ipwrisk(example1, models, subgroup = Frailty < 0,
tau = 20, label=c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
results2 = estimate_ipwrisk(example1_timevar, models, subgroup = Frailty < 0,
tau = 20, label=c("Time-varying"))
make_table2(results1, results2, risk_time = 20)When supplied by a user observation weights should be normalized to sum to \(n\). Therefore, if any constant is used as an observation weight, the weighted analysis should be identical to an unweighted analysis.
temp = example1
temp$obs_weight = rep(10,nrow(temp))
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
unweighted = estimate_ipwrisk(temp,
models,
labels = c("Unweighted"))
weighted = estimate_ipwrisk(temp,
models,
labels = c("Weighted"),
wt = obs_weight)
make_table2(unweighted, weighted, risk_time = 20)The same should be true within a subgroup:
temp = example1
temp$obs_weight = rep(10,nrow(temp))
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
unweighted = estimate_ipwrisk(temp,
models,
subgroup = Frailty < 0,
labels = c("Unweighted"))
weighted = estimate_ipwrisk(temp,
models,
subgroup = Frailty < 0,
labels = c("Weighted"),
wt = obs_weight)
make_table2(unweighted, weighted, risk_time = 20)Also, if random uniform weights are supplied as observation weights, the weighted analysis should result in an estimator that has the same expectation and variance.
temp = example1
temp$obs_weight = runif(10000)
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
unweighted = estimate_ipwrisk(temp,
models,
labels = c("Unweighted"))
weighted = estimate_ipwrisk(temp,
models,
labels = c("Weighted"),
wt = obs_weight)
make_table2(unweighted, weighted, risk_time = 20)When applied to longitudinal data, weights should be normalized to sum to \(n\), not the number of rows in a longitudinal dataset.
temp = example1
temp$obs_weight = runif(10000)
temp_timevar = temp %>% dplyr::select(id, obs_weight) %>%
dplyr::right_join(example1_timevar)## Joining with `by = join_by(id)`
models = specify_models(identify_subject(id),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
weighted = estimate_ipwrisk(temp,
models,
tau = 30,
labels = c("Weighted, time-fixed"),
wt = obs_weight)
models = specify_models(identify_subject(id),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
weighted_long = estimate_ipwrisk(temp_timevar,
models,
tau = 30,
labels = c("Weighted, time-varying"),
wt = obs_weight)
make_table2(weighted, weighted_long, risk_time = 20)Though there are very few existing implementations of augmented
inverse probability weighted (AIPW) estimators to compare to, the
authors of the reference we used to develop our efficient AIPW estimator
have an R package, riskRegression, that we can benchmark
against. Here, we provide several comparisons between our results and
the riskRegression package.
To begin, we compare the results between causalRisk and
riskRegression when null models are used for all of the
nuisance models. To improve runtime, we will use a sample of the
example1 dataset included in causalRisk for
comparison. As can be seen below, the results are extremely similar
between the two packages.
# Sample Data
set.seed(7368)
dat <- example1 %>% sample_frac(.1)
# Conduct analysis with causalRisk
models_null = specify_models(identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
fit_aipw_null = estimate_aipwrisk(dat,
models_null,
fully_efficient = TRUE,
times = 24,
tau_censor = TRUE,
labels = c("Null models"))
make_table2(fit_aipw_null)
# Conduct analysis with riskRegression
dat2 <- dat %>%
mutate(t = pmin(Death, EndofEnrollment, 24, na.rm = TRUE),
outcome = ifelse(is.na(Death), 0, t == Death),
censor = ifelse(is.na(EndofEnrollment), 0, t == EndofEnrollment))
censmod <- survival::coxph(Surv(t, outcome==0) ~ strata(Statin), data = dat2, x = TRUE)
outmod <- survival::coxph(Surv(t, outcome) ~ strata(Statin), data = dat2, x = TRUE)
psmod <- glm(Statin ~ 1, family = binomial(link = "logit"), data =dat2)
result <- ate(event = outmod,
treatment = psmod,
censor = censmod,
estimator = "AIPTW",
data = dat2,
times = 24,
cause = 1,
verbose = FALSE)
result$diffRisk %>% kable()| estimator | time | A | estimate.A | B | estimate.B | estimate | se | lower | upper | p.value |
|---|---|---|---|---|---|---|---|---|---|---|
| AIPTW | 24 | Control | 0.4285414 | Treat | 0.555398 | 0.1268566 | 0.0563232 | 0.0164652 | 0.237248 | 0.0243034 |
Unfortunately, it is impossible to specify the models for
riskRegression the same way as in causalRisk.
In causalRisk, the censoring and outcome models are fit
seperately in each treatment group. In riskRegression,
treatment is included as a covariate in the models. The closest we can
get is to stratify the models by treatment for
riskRegression. With that, the results are still very
similar to causalRisk.
# Conduct analysis with causalRisk
models_adj = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~DxRisk),
identify_outcome(Death, ~DxRisk))
fit_aipw_adj = estimate_aipwrisk(dat,
models_adj,
fully_efficient = TRUE,
times = 24,
tau_censor = TRUE,
labels = c("Adjusted models"))
make_table2(fit_aipw_adj)
# Conduct analysis with riskRegression
censmod <- survival::coxph(Surv(t, outcome==0) ~ DxRisk + strata(Statin), data = dat2, x = TRUE)
outmod <- survival::coxph(Surv(t, outcome) ~ DxRisk + strata(Statin), data = dat2, x = TRUE)
psmod <- glm(Statin ~ DxRisk, family = binomial(link = "logit"), data =dat2)
result2 <- ate(event = outmod,
treatment = psmod,
censor = censmod,
estimator = "AIPTW",
data = dat2,
times = 24,
cause = 1,
verbose = FALSE)
result2$diffRisk %>% kable()| estimator | time | A | estimate.A | B | estimate.B | estimate | se | lower | upper | p.value |
|---|---|---|---|---|---|---|---|---|---|---|
| AIPTW | 24 | Control | 0.529929 | Treat | 0.5426873 | 0.0127583 | 0.0837316 | -0.1513527 | 0.1768693 | 0.8788941 |
Again, there is a difference in how riskRegression and
causalRisk handle competing risks.
riskRegression uses a multistate Cox model, while
causalRisk uses a Fine and Gray model. Nonetheless, the
results remain similar.
# Conduct analysis with causalRisk
models_cr = specify_models(identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment, ~DxRisk),
identify_outcome(Death, ~DxRisk),
identify_competing_risk(CvdDeath))
fit_aipw_cr = estimate_aipwrisk(dat,
models_cr,
fully_efficient = TRUE,
times = 24,
tau_censor = TRUE,
labels = c("Adjusted models"))
make_table2(fit_aipw_cr)
# Setup data
dat3 <- dat2 %>%
mutate(outcome = ifelse(outcome == 0, 0,
ifelse(CvdDeath == 0, 1, 2)),
outcome = factor(outcome))
# Conduct analysis with riskRegression
censmod <- survival::coxph(Surv(t, outcome == 0) ~ DxRisk + strata(Statin), data = dat3, x = TRUE, y = TRUE)
outmod <- riskRegression::CSC(Hist(t, outcome) ~ DxRisk + strata(Statin), data = dat3)
psmod <- glm(Statin ~ DxRisk, family = binomial(link = "logit"), data =dat3)
result3 <- ate(event = outmod,
treatment = psmod,
censor = censmod,
estimator = "AIPTW",
data = dat3,
times = 24,
cause = 2,
verbose = FALSE)
result3$diffRisk %>% kable()| estimator | time | A | estimate.A | B | estimate.B | estimate | se | lower | upper | p.value |
|---|---|---|---|---|---|---|---|---|---|---|
| AIPTW | 24 | Control | 0.4098683 | Treat | 0.2475148 | -0.1623535 | 0.0905999 | -0.3399261 | 0.0152191 | 0.0731358 |
make_table1 applied to the same models fit to
example1 and example1_timevar should result in
identical tables.
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
time_fixed = estimate_ipwrisk(example1,
models,
tau = 30,
labels = c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_treatment(Statin),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
time_varying = estimate_ipwrisk(example1_timevar,
models,
tau = 30,
labels = c("Time-varying"))
make_table1(time_fixed, DxRisk, Sex, smd = TRUE)| Characteristic1 | Control n=10000.00 |
Treat n=10000.00 |
SMD |
|---|---|---|---|
| DxRisk, Mean (SD) | 0.74 (0.85) | -0.35 (0.88) | 1.26010726497173 |
| Sex | 0.0209159130770583 | ||
| Female | 4939 49.39 | 4835 48.35 | |
| Male | 5061 50.61 | 5165 51.65 | |
| 1 All values are N (%) unless otherwise specified | |||
make_table1(time_varying, DxRisk, Sex, smd = TRUE)| Characteristic1 | Control n=10000.00 |
Treat n=10000.00 |
SMD |
|---|---|---|---|
| DxRisk, Mean (SD) | 0.74 (0.85) | -0.35 (0.88) | 1.26010726497173 |
| Sex | 0.0209159130770583 | ||
| Female | 4939 49.39 | 4835 48.35 | |
| Male | 5061 50.61 | 5165 51.65 | |
| 1 All values are N (%) unless otherwise specified | |||
models = specify_models(identify_subject(id),
identify_treatment(Statin, ~DxRisk),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
time_fixed = estimate_ipwrisk(example1,
models,
tau = 30,
labels = c("Time-fixed"))
models = specify_models(identify_subject(id),
identify_treatment(Statin, ~DxRisk),
identify_interval(Start, Stop),
identify_censoring(EndofEnrollment),
identify_outcome(Death))
time_varying = estimate_ipwrisk(example1_timevar,
models,
tau = 30,
labels = c("Time-varying"))
make_table1(time_fixed, DxRisk, Sex, smd = TRUE)| Characteristic1 | Control n=9953.53 |
Treat n=9908.35 |
SMD |
|---|---|---|---|
| DxRisk, Mean (SD) | 0.02 (0.97) | -0.03 (0.97) | 0.0586710589322605 |
| Sex | 0.00471104495627606 | ||
| Female | 4815 48.37 | 4769 48.13 | |
| Male | 5139 51.63 | 5139 51.87 | |
| 1 All values are N (%) unless otherwise specified | |||
make_table1(time_varying, DxRisk, Sex, smd = TRUE)| Characteristic1 | Control n=9953.53 |
Treat n=9908.35 |
SMD |
|---|---|---|---|
| DxRisk, Mean (SD) | 0.02 (0.97) | -0.03 (0.97) | 0.0586710589322605 |
| Sex | 0.00471104495627606 | ||
| Female | 4815 48.37 | 4769 48.13 | |
| Male | 5139 51.63 | 5139 51.87 | |
| 1 All values are N (%) unless otherwise specified | |||