The smd package provides the smd method to compute standardized mean differences between two groups for continuous values (numeric and integer data types) and categorical values (factor, character, and logical). The method also works on matrix, list, and data.frame data types by applying smd() over the columns of the matrix or data.frame and each item of the list. The package is based on Yang and Dalton (2012).
The smd function computes the standardized mean difference for each level \(k\) of a grouping variable compared to a reference level \(r\):
\[ d_k = \sqrt{(\bar{x}_r - \bar{x}_{k})^{\intercal}S_{rk}^{-1}(\bar{x}_r - \bar{x}_{k})} \]
where \(\bar{x}_{\cdot}\) and \(S_{rk}\) are the sample mean and covariances for reference group \(r\) and group \(k\), respectively. In the case that \(x\) is categorical, \(\bar{x}\) is the vector of proportions of each category level within a group, and \(S_{rk}\) is the multinomial covariance matrix.
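As a rough illustration of the formula above, the following base R sketch computes \(d_k\) for one numeric and one categorical variable. It assumes that \(S_{rk}\) is the simple average of the two groups' covariance matrices, which the text here does not state explicitly. For the categorical case it drops the first category level so that the multinomial covariance matrix can be inverted. The helper names (smd_numeric_sketch, multinom_cov, smd_categorical_sketch) are hypothetical and are not part of the smd package.

```r
# Hypothetical helpers, not the smd package's implementation.
# Assumption: S_rk = (S_r + S_k) / 2.

smd_numeric_sketch <- function(x, g, ref, k) {
  xr <- x[g == ref]
  xk <- x[g == k]
  s_rk <- (var(xr) + var(xk)) / 2
  # With a single variable the quadratic form reduces to a scalar ratio
  sqrt((mean(xr) - mean(xk))^2 / s_rk)
}

# Multinomial covariance matrix for a vector of proportions p
multinom_cov <- function(p) diag(p, nrow = length(p)) - tcrossprod(p)

smd_categorical_sketch <- function(x, g, ref, k) {
  x <- factor(x)
  # Proportion of each level within each group, dropping the first level
  # (fails if the averaged covariance matrix is singular)
  pr <- as.numeric(prop.table(table(x[g == ref])))[-1]
  pk <- as.numeric(prop.table(table(x[g == k])))[-1]
  s_rk <- (multinom_cov(pr) + multinom_cov(pk)) / 2
  d <- pr - pk
  sqrt(drop(t(d) %*% solve(s_rk) %*% d))
}

g <- rep(c("A", "B"), each = 5)

x_num <- c(1, 2, 3, 4, 5, 3, 4, 5, 6, 7)
d_num <- smd_numeric_sketch(x_num, g, ref = "A", k = "B")

x_cat <- c("a", "a", "a", "b", "b", "a", "b", "b", "b", "b")
d_cat <- smd_categorical_sketch(x_cat, g, ref = "A", k = "B")
```

Because the sketch follows the formula as written, it always returns a non-negative value, whereas the estimates the package prints for numeric variables carry a sign.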
Standard errors are computed using the formula described in Hedges and Olkin (1985):
\[ \sqrt{ \frac{n_r + n_k}{n_rn_k} + \frac{d_k^2}{2(n_r + n_k)} } \]
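The formula translates directly into R. In the sketch below the helper name smd_se_sketch is hypothetical. The estimates and group sizes plugged in are taken from the two-group and three-group examples in this document, so the results should agree with the std.error values reported there up to rounding.

```r
# Standard error of a standardized mean difference d between a reference
# group of size n_r and a comparison group of size n_k
# (hypothetical helper, not a function exported by the smd package)
smd_se_sketch <- function(d, n_r, n_k) {
  sqrt((n_r + n_k) / (n_r * n_k) + d^2 / (2 * (n_r + n_k)))
}

se_two_groups   <- smd_se_sketch(d = 0.03413269, n_r = 45, n_k = 45)
se_three_groups <- smd_se_sketch(d = -0.25169577, n_r = 30, n_k = 30)
```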
library(smd)
set.seed(123)
xn <- rnorm(90)
gg2 <- rep(LETTERS[1:2], each = 45)
gg3 <- rep(LETTERS[1:3], each = 30)
smd(x = xn, g = gg2)
#> term estimate
#> 1 B 0.03413269
smd(x = xn, g = gg3)
#> term estimate
#> 1 B -0.25169577
#> 2 C -0.07846864
smd(x = xn, g = gg2, std.error = TRUE)
#> term estimate std.error
#> 1 B 0.03413269 0.2108339
smd(x = xn, g = gg3, std.error = TRUE)
#> term estimate std.error
#> 1 B -0.25169577 0.2592192
#> 2 C -0.07846864 0.2582982
xl <- as.logical(rbinom(90, 1, prob = 0.5))
smd(x = xl, g = gg2)
#> term estimate
#> 1 B 0
# xi, xc, and xf are not defined in the code shown here; they must exist before this line runs
df <- data.frame(xn, xi, xc, xf, xl)
smd(x = df, g = gg3)
#> variable term estimate
#> 1 xn B -0.25169577
#> 2 xn C -0.07846864
#> 3 xi B 0.30325301
#> 4 xi C 0.36089675
#> 5 xc B 1.50232594
#> 6 xc C 2.23606798
#> 7 xf B 1.50232594
#> 8 xf C 2.23606798
#> 9 xl B -0.06765101
#> 10 xl C -0.20203051
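The data.frame result above stacks one block of rows per column. A minimal base R sketch of that column-wise pattern follows, using a simplified numeric-only stand-in for smd(). The helper names are hypothetical, and the averaged-variance denominator, the choice of the first level as reference, and the sign convention are all assumptions rather than the package's code.

```r
# Hypothetical numeric-only stand-in for smd(): one row per non-reference
# level of g, each compared against the first level of g.
smd_one_sketch <- function(x, g) {
  g <- factor(g)
  ref <- levels(g)[1]
  others <- levels(g)[-1]
  est <- vapply(others, function(k) {
    xr <- x[g == ref]
    xk <- x[g == k]
    # Assumed form: signed difference over the root of the averaged variances
    (mean(xr) - mean(xk)) / sqrt((var(xr) + var(xk)) / 2)
  }, numeric(1))
  data.frame(term = others, estimate = unname(est))
}

# Apply the stand-in to every column and stack the per-column results
smd_columns_sketch <- function(df, g) {
  pieces <- lapply(names(df), function(nm) {
    cbind(variable = nm, smd_one_sketch(df[[nm]], g))
  })
  do.call(rbind, pieces)
}

toy <- data.frame(a = c(1, 2, 3, 4, 5, 6), b = c(2, 1, 4, 3, 9, 8))
g3 <- rep(c("A", "B", "C"), each = 2)
res <- smd_columns_sketch(toy, g3)
```

Here res has one row per combination of column and non-reference group, in the same variable, term, estimate layout as the output above.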
smd with dplyr
library(dplyr, verbose = FALSE)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df$g <- gg2
df %>%
  summarize_at(
    .vars = vars(dplyr::matches("^x")),
    .funs = list(smd = ~ smd(., g = g)$estimate))
#> xn_smd xi_smd xc_smd xf_smd xl_smd
#> 1 0.03413269 0.1687339 0.1946887 0.1946887 0