stype (pronounced stipe) is an R package for statistical
data types. It depends heavily upon the vctrs package
to:
The stype package provides classes that enforce
(run-time) safety for types common to many statistical analyses such as
v_binary, v_continuous, v_count,
and v_nominal. For example, binary data can be represented
in R in at least three ways: a logical, a
factor with two levels, or a numeric using
just 0 and 1. Which representation should one
use? The latter two do not guarantee that certain binary operations are
closed in a mathematical sense; e.g., c(0, 1, 0, 1) + 1:4
returns c(1, 3, 3, 5). Such behavior is not possible with
v_binary. Similarly, count data can be represented by an
integer in R but without the restriction of
being non-negative. The v_count constructor enforces
positivity.
Each instance of stype objects contain 2 attributes that
users may find useful: context and
data_summary. A context can be used to specify
project-specific metadata. It is an S4 object containing
slots such as short_label, long_label,
description, security_type, tags,
and purpose. A purpose, for example, can be
used to define a variable’s role in a study design such as “outcome”,
“identifier”, “covariate”, or “exposure”. This kind of contextual
information is invaluable in data pipelines.
A stype vector also contains a data_summary
object, which is automatically generated and contain summary
statistics about the data. All objects contain the following
statistics:
n: number of observationshas_missing: an indicator of whether the variable has
missing datan_nonmissing: the number of nonmissingn_missing: the number of missingproportion_missing: the proportion missingis_constant: an indicator of whether all the values are
the sameEach type has additional summary statistics relevant to its data. For
example, v_continuous contains the mean, standard
deviation, min, max, and various quantiles. The
data_summary is updated whenever a variable is subset or
two vectors of the type are combined.
The package also prints certain attributes, for example:
> stype::v_binary(c(TRUE, FALSE, TRUE))
<binary[3]>
[1] 1 0 1
Proportion = 0.667
> stype::v_binary(c(TRUE, FALSE, TRUE, NA))
<binary[4]>
[1] 1 0 1 NA
Proportion = 0.667; Missing = 1.000