The Language

gofl is built around three algebraic operators:

  • + (sum)
  • : (product)
  • * (sum of the sum and the product, so that a*b means a + b + a:b)

By design, these operators mean the same thing as they do in R model formulas. Internally, gofl rewrites the formula's AST so that +, : and * are replaced by the generic operators %s%, %p% and %ssp%, respectively.
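For comparison, base R's own formula machinery expands * into the main effects plus their interaction, which is the relation %ssp% encodes. The snippet uses only stats::terms() and does not involve gofl:

```r
# Expand a model formula with base R and list its terms.
tt <- stats::terms(~ a * b + c:d + e)
labels <- attr(tt, "term.labels")
labels  # contains "a", "b" and "a:b" from a * b, plus "c:d" and "e"
```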

Additionally, gofl has two functions, tag and .zoom, which have specific use cases.

The Algebra

The work of gofl is done by defining how sum (%s%) and product (%p%) work on different types.

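Permutation matrices (here, identity matrices) encode the levels of a variable. The internal helper as_tmatrix builds one from a variable name and a vector of levels: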

X <- gofl:::as_tmatrix("var1", c("level1", "level2", "level3"))
Y <- gofl:::as_tmatrix("var2", LETTERS[1:5])

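Each row corresponds to a group (one level of the variable), and each column is named by joining the variable and the level with four underscores: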

X@mat
#> 3 x 3 diagonal matrix of class "ddiMatrix"
#>      var1____level1 var1____level2 var1____level3
#> [1,]              1              .              .
#> [2,]              .              1              .
#> [3,]              .              .              1
Y@mat
#> 5 x 5 diagonal matrix of class "ddiMatrix"
#>      var2____A var2____B var2____C var2____D var2____E
#> [1,]         1         .         .         .         .
#> [2,]         .         1         .         .         .
#> [3,]         .         .         1         .         .
#> [4,]         .         .         .         1         .
#> [5,]         .         .         .         .         1
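To make the layout concrete, here is a base-R sketch of such a level matrix. level_matrix() is a hypothetical helper, not part of gofl: it returns a plain dense matrix rather than a tagged object, and it assumes the var____level column naming seen in the output above.

```r
# Hypothetical stand-in for gofl:::as_tmatrix(): one row and one column per level.
level_matrix <- function(var, levels) {
  m <- diag(length(levels))
  colnames(m) <- paste(var, levels, sep = "____")
  m
}

x <- level_matrix("var1", c("level1", "level2", "level3"))
y <- level_matrix("var2", LETTERS[1:5])
x
```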

The sum of two such matrices is their direct sum, i.e. the block-diagonal matrix:

gofl:::`%s%`(X@mat, Y@mat)
#> 8 x 8 sparse Matrix of class "dtCMatrix"
#>      var1____level1 var1____level2 var1____level3 var2____A var2____B var2____C
#> [1,]              1              .              .         .         .         .
#> [2,]              .              1              .         .         .         .
#> [3,]              .              .              1         .         .         .
#> [4,]              .              .              .         1         .         .
#> [5,]              .              .              .         .         1         .
#> [6,]              .              .              .         .         .         1
#> [7,]              .              .              .         .         .         .
#> [8,]              .              .              .         .         .         .
#>      var2____D var2____E
#> [1,]         .         .
#> [2,]         .         .
#> [3,]         .         .
#> [4,]         .         .
#> [5,]         .         .
#> [6,]         .         .
#> [7,]         1         .
#> [8,]         .         1
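The direct sum can be sketched in base R as follows. direct_sum() is a hypothetical helper that works on dense matrices; gofl's own method operates on Matrix objects, and the Matrix package's bdiag() performs the same block-diagonal construction for sparse matrices.

```r
# Direct sum (block-diagonal matrix) of two dense matrices.
direct_sum <- function(a, b) {
  out <- matrix(0, nrow(a) + nrow(b), ncol(a) + ncol(b),
                dimnames = list(NULL, c(colnames(a), colnames(b))))
  out[seq_len(nrow(a)), seq_len(ncol(a))] <- a                     # top-left block
  out[nrow(a) + seq_len(nrow(b)), ncol(a) + seq_len(ncol(b))] <- b # bottom-right block
  out
}

x <- diag(3); colnames(x) <- paste0("var1____level", 1:3)
y <- diag(5); colnames(y) <- paste0("var2____", LETTERS[1:5])
s <- direct_sum(x, y)
dim(s)  # 8 8
```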

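The product of two matrices is the Cartesian product of their rows: each of the 3 rows of X is paired with each of the 5 rows of Y, giving 15 rows, and the columns of X and Y are placed side by side: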

gofl:::`%p%`(X@mat, Y@mat)
#> 15 x 8 sparse Matrix of class "dgCMatrix"
#>       var1____level1 var1____level2 var1____level3 var2____A var2____B
#>  [1,]              1              .              .         1         .
#>  [2,]              1              .              .         .         1
#>  [3,]              1              .              .         .         .
#>  [4,]              1              .              .         .         .
#>  [5,]              1              .              .         .         .
#>  [6,]              .              1              .         1         .
#>  [7,]              .              1              .         .         1
#>  [8,]              .              1              .         .         .
#>  [9,]              .              1              .         .         .
#> [10,]              .              1              .         .         .
#> [11,]              .              .              1         1         .
#> [12,]              .              .              1         .         1
#> [13,]              .              .              1         .         .
#> [14,]              .              .              1         .         .
#> [15,]              .              .              1         .         .
#>       var2____C var2____D var2____E
#>  [1,]         .         .         .
#>  [2,]         .         .         .
#>  [3,]         1         .         .
#>  [4,]         .         1         .
#>  [5,]         .         .         1
#>  [6,]         .         .         .
#>  [7,]         .         .         .
#>  [8,]         1         .         .
#>  [9,]         .         1         .
#> [10,]         .         .         1
#> [11,]         .         .         .
#> [12,]         .         .         .
#> [13,]         1         .         .
#> [14,]         .         1         .
#> [15,]         .         .         1
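A base-R sketch of the row-wise Cartesian product, assuming dense matrices and the same row order as the output above (rows of the first argument vary slowest). row_product() is a hypothetical name:

```r
# Pair every row of a with every row of b, then put the columns side by side.
row_product <- function(a, b) {
  ia <- rep(seq_len(nrow(a)), each = nrow(b))   # row index into a (varies slowest)
  ib <- rep(seq_len(nrow(b)), times = nrow(a))  # row index into b (varies fastest)
  cbind(a[ia, , drop = FALSE], b[ib, , drop = FALSE])
}

x <- diag(3); colnames(x) <- paste0("var1____level", 1:3)
y <- diag(5); colnames(y) <- paste0("var2____", LETTERS[1:5])
p <- row_product(x, y)
dim(p)  # 15 8
```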

The sum of the sum and product performs both operations and stacks the results, matching columns by name: the first 8 rows are the direct sum and the remaining 15 rows are the Cartesian product.

gofl:::`%ssp%`(X@mat, Y@mat)
#> 23 x 8 sparse Matrix of class "dgCMatrix"
#>       var1____level1 var1____level2 var1____level3 var2____A var2____B
#>  [1,]              1              .              .         .         .
#>  [2,]              .              1              .         .         .
#>  [3,]              .              .              1         .         .
#>  [4,]              .              .              .         1         .
#>  [5,]              .              .              .         .         1
#>  [6,]              .              .              .         .         .
#>  [7,]              .              .              .         .         .
#>  [8,]              .              .              .         .         .
#>  [9,]              1              .              .         1         .
#> [10,]              1              .              .         .         1
#> [11,]              1              .              .         .         .
#> [12,]              1              .              .         .         .
#> [13,]              1              .              .         .         .
#> [14,]              .              1              .         1         .
#> [15,]              .              1              .         .         1
#> [16,]              .              1              .         .         .
#> [17,]              .              1              .         .         .
#> [18,]              .              1              .         .         .
#> [19,]              .              .              1         1         .
#> [20,]              .              .              1         .         1
#> [21,]              .              .              1         .         .
#> [22,]              .              .              1         .         .
#> [23,]              .              .              1         .         .
#>       var2____C var2____D var2____E
#>  [1,]         .         .         .
#>  [2,]         .         .         .
#>  [3,]         .         .         .
#>  [4,]         .         .         .
#>  [5,]         .         .         .
#>  [6,]         1         .         .
#>  [7,]         .         1         .
#>  [8,]         .         .         1
#>  [9,]         .         .         .
#> [10,]         .         .         .
#> [11,]         1         .         .
#> [12,]         .         1         .
#> [13,]         .         .         1
#> [14,]         .         .         .
#> [15,]         .         .         .
#> [16,]         1         .         .
#> [17,]         .         1         .
#> [18,]         .         .         1
#> [19,]         .         .         .
#> [20,]         .         .         .
#> [21,]         1         .         .
#> [22,]         .         1         .
#> [23,]         .         .         1
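Combining the two previous ideas gives a sketch of the sum of the sum and product. The helpers are hypothetical and repeated so the block runs on its own; the column alignment by name assumes unique column names, and in this example the columns already line up, so the reindexing does not change anything.

```r
# Direct sum: block-diagonal matrix.
direct_sum <- function(a, b) {
  out <- matrix(0, nrow(a) + nrow(b), ncol(a) + ncol(b),
                dimnames = list(NULL, c(colnames(a), colnames(b))))
  out[seq_len(nrow(a)), seq_len(ncol(a))] <- a
  out[nrow(a) + seq_len(nrow(b)), ncol(a) + seq_len(ncol(b))] <- b
  out
}

# Cartesian product of rows (rows of a vary slowest).
row_product <- function(a, b) {
  ia <- rep(seq_len(nrow(a)), each = nrow(b))
  ib <- rep(seq_len(nrow(b)), times = nrow(a))
  cbind(a[ia, , drop = FALSE], b[ib, , drop = FALSE])
}

# Sum of the sum and the product: reorder the product's columns to match the
# direct sum's column names, then bind the rows.
ssp <- function(a, b) {
  s <- direct_sum(a, b)
  p <- row_product(a, b)
  rbind(s, p[, colnames(s), drop = FALSE])
}

x <- diag(3); colnames(x) <- paste0("var1____level", 1:3)
y <- diag(5); colnames(y) <- paste0("var2____", LETTERS[1:5])
r <- ssp(x, y)
dim(r)  # 23 8
```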

The operators are defined analogously for lists. The sum of two lists is their concatenation:

gofl:::`%s%`(list(1, 2), list(2, 3, 5))
#> [[1]]
#> [1] 1
#> 
#> [[2]]
#> [1] 2
#> 
#> [[3]]
#> [1] 2
#> 
#> [[4]]
#> [1] 3
#> 
#> [[5]]
#> [1] 5

The product of two lists is the Cartesian product, with each pair of elements combined into a single vector:

gofl:::`%p%`(list(1, 2), list(2, 3, 5))
#> [[1]]
#> [1] 1 2
#> 
#> [[2]]
#> [1] 1 3
#> 
#> [[3]]
#> [1] 1 5
#> 
#> [[4]]
#> [1] 2 2
#> 
#> [[5]]
#> [1] 2 3
#> 
#> [[6]]
#> [1] 2 5
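Both list operations can be sketched in a few lines of base R. list_sum() and list_product() are hypothetical names, written to match the outputs above rather than taken from gofl's source:

```r
# Sum of two lists: concatenation.
list_sum <- function(x, y) c(x, y)

# Product of two lists: pair every element of x with every element of y
# (elements of x vary slowest) and join each pair with c().
list_product <- function(x, y) {
  unlist(lapply(x, function(a) lapply(y, function(b) c(a, b))),
         recursive = FALSE)
}

s <- list_sum(list(1, 2), list(2, 3, 5))
p <- list_product(list(1, 2), list(2, 3, 5))
```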

The same operators also work for integers, where they reduce to x + y, x * y, and x + y + x * y:

gofl:::`%s%`(2L, 3L)
#> [1] 5
gofl:::`%p%`(2L, 3L)
#> [1] 6
gofl:::`%ssp%`(2L, 3L)
#> [1] 11
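The integer results are the row counts of the matrix operations: X has 3 rows and Y has 5, so X %s% Y has 3 + 5 = 8 rows, X %p% Y has 3 × 5 = 15, and X %ssp% Y has 8 + 15 = 23, matching the outputs above. A base-R sketch with hypothetical function names:

```r
int_sum <- function(x, y) x + y
int_product <- function(x, y) x * y
# The sum of the sum and the product.
int_ssp <- function(x, y) int_sum(int_sum(x, y), int_product(x, y))

int_ssp(2L, 3L)  # 11, i.e. 2 + 3 + 6
int_ssp(3L, 5L)  # 23, the number of rows of X %ssp% Y
```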

gofl operates on an S4 class called tagged, which holds the matrix (slot mat) and a list of tags with one element per row (slot tags):

str(X)
#> Formal class 'tagged' [package "gofl"] with 2 slots
#>   ..@ mat :Formal class 'ddiMatrix' [package "Matrix"] with 4 slots
#>   .. .. ..@ diag    : chr "N"
#>   .. .. ..@ Dim     : int [1:2] 3 3
#>   .. .. ..@ Dimnames:List of 2
#>   .. .. .. ..$ : NULL
#>   .. .. .. ..$ : chr [1:3] "var1____level1" "var1____level2" "var1____level3"
#>   .. .. ..@ x       : num [1:3] 1 1 1
#>   ..@ tags:List of 3
#>   .. ..$ : NULL
#>   .. ..$ : NULL
#>   .. ..$ : NULL
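A simplified, hypothetical version of this class helps show the invariant. It uses a base matrix instead of a Matrix object, and its validity check (one tag per row) is an assumption inferred from the outputs in this section, not gofl's actual definition:

```r
library(methods)

# Hypothetical, simplified version of gofl's "tagged" class.
setClass("tagged_sketch",
         slots = c(mat = "matrix", tags = "list"),
         validity = function(object) {
           if (length(object@tags) != nrow(object@mat)) {
             "tags must have one element per row of mat"
           } else {
             TRUE
           }
         })

m <- diag(3)
colnames(m) <- paste0("var1____level", 1:3)
# One NULL tag per row, as in str(X) above.
x <- new("tagged_sketch", mat = m, tags = vector("list", nrow(m)))
```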

The operators are then applied to the matrix and the tags together:

gofl:::`%s%`(X, Y)
#> An object of class "tagged"
#> Slot "mat":
#> 8 x 8 sparse Matrix of class "dtCMatrix"
#>      var1____level1 var1____level2 var1____level3 var2____A var2____B var2____C
#> [1,]              1              .              .         .         .         .
#> [2,]              .              1              .         .         .         .
#> [3,]              .              .              1         .         .         .
#> [4,]              .              .              .         1         .         .
#> [5,]              .              .              .         .         1         .
#> [6,]              .              .              .         .         .         1
#> [7,]              .              .              .         .         .         .
#> [8,]              .              .              .         .         .         .
#>      var2____D var2____E
#> [1,]         .         .
#> [2,]         .         .
#> [3,]         .         .
#> [4,]         .         .
#> [5,]         .         .
#> [6,]         .         .
#> [7,]         1         .
#> [8,]         .         1
#> 
#> Slot "tags":
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL
#> 
#> [[4]]
#> NULL
#> 
#> [[5]]
#> NULL
#> 
#> [[6]]
#> NULL
#> 
#> [[7]]
#> NULL
#> 
#> [[8]]
#> NULL
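Applying the sum to tagged objects means combining the matrices and the tag lists in lockstep: the matrices form a block-diagonal matrix and the tags are concatenated, so there is still one tag per row. The sketch below is self-contained and hypothetical (tagged_sketch, tagged_sum and make_tagged are not gofl names), but it produces an 8 x 8 matrix and eight NULL tags, as in the output above:

```r
library(methods)

setClass("tagged_sketch", slots = c(mat = "matrix", tags = "list"))

# Sum of two tagged objects: block-diagonal matrix plus concatenated tags.
tagged_sum <- function(x, y) {
  a <- x@mat
  b <- y@mat
  mat <- matrix(0, nrow(a) + nrow(b), ncol(a) + ncol(b),
                dimnames = list(NULL, c(colnames(a), colnames(b))))
  mat[seq_len(nrow(a)), seq_len(ncol(a))] <- a
  mat[nrow(a) + seq_len(nrow(b)), ncol(a) + seq_len(ncol(b))] <- b
  new("tagged_sketch", mat = mat, tags = c(x@tags, y@tags))
}

# Build a level matrix with one NULL tag per row.
make_tagged <- function(var, levels) {
  m <- diag(length(levels))
  colnames(m) <- paste(var, levels, sep = "____")
  new("tagged_sketch", mat = m, tags = vector("list", length(levels)))
}

s <- tagged_sum(make_tagged("var1", paste0("level", 1:3)),
                make_tagged("var2", LETTERS[1:5]))
```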

Hijacking the AST

The traverse_expr function walks the AST of a formula and replaces + with %s%, : with %p%, and * with %ssp%.

ff <- ~ a*b + c:d + e
nf <- gofl:::traverse_expr(ff, f = identity)
# The print is hard to read because the generic functions themselves are inlined into the call
nf
#> ~(new("standardGeneric", .Data = function (x, y) 
#> standardGeneric("%s%"), generic = "%s%", package = "gofl", group = list(), 
#>     valueClass = character(0), signature = c("x", "y"), default = NULL, 
#>     skeleton = (function (x, y) 
#>     stop(gettextf("invalid call in method dispatch to '%s' (no default method)", 
#>         "%s%"), domain = NA))(x, y)))((new("standardGeneric", 
#>     .Data = function (x, y) 
#>     standardGeneric("%s%"), generic = "%s%", package = "gofl", 
#>     group = list(), valueClass = character(0), signature = c("x", 
#>     "y"), default = NULL, skeleton = (function (x, y) 
#>     stop(gettextf("invalid call in method dispatch to '%s' (no default method)", 
#>         "%s%"), domain = NA))(x, y)))((new("standardGeneric", 
#>     .Data = function (x, y) 
#>     standardGeneric("%ssp%"), generic = "%ssp%", package = "gofl", 
#>     group = list(), valueClass = character(0), signature = c("x", 
#>     "y"), default = NULL, skeleton = (function (x, y) 
#>     stop(gettextf("invalid call in method dispatch to '%s' (no default method)", 
#>         "%ssp%"), domain = NA))(x, y)))(a, b), (new("standardGeneric", 
#>     .Data = function (x, y) 
#>     standardGeneric("%p%"), generic = "%p%", package = "gofl", 
#>     group = list(), valueClass = character(0), signature = c("x", 
#>     "y"), default = NULL, skeleton = (function (x, y) 
#>     stop(gettextf("invalid call in method dispatch to '%s' (no default method)", 
#>         "%p%"), domain = NA))(x, y)))(c, d)), e)
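The rewrite itself is a recursive walk over calls. The base-R sketch below is an assumption, not traverse_expr's implementation: judging by the printed output, traverse_expr splices the generic function objects into the call, whereas this version substitutes operator symbols. That keeps the result readable, but %s%, %p% and %ssp% must then be in scope when the expression is evaluated. rewrite_ops() and the integer operators are hypothetical stand-ins.

```r
# Map formula operators to gofl-style operator names.
op_map <- c("+" = "%s%", ":" = "%p%", "*" = "%ssp%")

# Recursively replace operator symbols in the function position of each call.
rewrite_ops <- function(expr) {
  if (is.call(expr)) {
    fn <- expr[[1]]
    if (is.name(fn) && as.character(fn) %in% names(op_map)) {
      expr[[1]] <- as.name(op_map[[as.character(fn)]])
    }
    for (i in seq_along(expr)[-1]) {
      expr[[i]] <- rewrite_ops(expr[[i]])
    }
  }
  expr
}

# Illustrative integer versions of the operators.
`%s%` <- function(x, y) x + y
`%p%` <- function(x, y) x * y
`%ssp%` <- function(x, y) x + y + x * y

ff <- ~ a*b + c:d + e
rhs <- rewrite_ops(ff[[2]])  # element 2 of a one-sided formula is its right-hand side
eval(rhs, list(a = 1L, b = 3L, c = 4L, d = 2L, e = 7L))  # 22
```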

The rewritten expression can now be evaluated against data; nf[[2]] extracts the right-hand side of the formula. Normally the data would be tagged objects, but integers are used here for illustration:

rlang::eval_tidy(nf[[2]], data = list(a = 1L, b = 3L, c = 4L, d = 2L, e = 7L))
#> [1] 22

The traverse_expr function can also take a function f that is applied to each leaf of the AST. Here gofl:::replace_by_size replaces each variable by the number of rows of its matrix, so the result, 22, is the number of rows the combined matrix would have.

rlang::eval_tidy(
  expr = gofl:::traverse_expr(ff, f = gofl:::replace_by_size)[[2]],
  data = list(
    a = matrix(nrow = 1),
    b = matrix(nrow = 3),
    c = matrix(nrow = 4),
    d = matrix(nrow = 2),
    e = matrix(nrow = 7)
  )
)
#> [1] 22
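Extending the earlier sketch, a leaf function can wrap each variable in a call to nrow(), so evaluating the rewritten expression computes a row count. This is a hypothetical illustration of the idea behind replace_by_size, not its implementation; rewrite_ops() and size_of_leaf() are invented names.

```r
op_map <- c("+" = "%s%", ":" = "%p%", "*" = "%ssp%")

# Rewrite operators as before, and pass every leaf symbol through f.
rewrite_ops <- function(expr, f = identity) {
  if (is.name(expr)) return(f(expr))
  if (is.call(expr)) {
    fn <- expr[[1]]
    if (is.name(fn) && as.character(fn) %in% names(op_map)) {
      expr[[1]] <- as.name(op_map[[as.character(fn)]])
    }
    # Start at 2 so the function position is never passed to f.
    for (i in seq_along(expr)[-1]) expr[[i]] <- rewrite_ops(expr[[i]], f)
  }
  expr
}

# Leaf function: replace a variable by a call that returns its row count.
size_of_leaf <- function(sym) call("nrow", sym)

`%s%` <- function(x, y) x + y
`%p%` <- function(x, y) x * y
`%ssp%` <- function(x, y) x + y + x * y

rhs <- rewrite_ops((~ a*b + c:d + e)[[2]], f = size_of_leaf)
leaf_data <- list(a = matrix(nrow = 1), b = matrix(nrow = 3), c = matrix(nrow = 4),
                  d = matrix(nrow = 2), e = matrix(nrow = 7))
eval(rhs, leaf_data)  # 22
```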

Returning to X and Y, the same mechanism works on tagged objects: evaluating var1 + var2 reproduces the result of gofl:::`%s%`(X, Y) shown earlier.

ff <- ~ var1 + var2
rlang::eval_tidy(
  expr = gofl:::traverse_expr(ff, identity)[[2]], 
  data = list(var1 = X, var2 = Y))
#> An object of class "tagged"
#> Slot "mat":
#> 8 x 8 sparse Matrix of class "dtCMatrix"
#>      var1____level1 var1____level2 var1____level3 var2____A var2____B var2____C
#> [1,]              1              .              .         .         .         .
#> [2,]              .              1              .         .         .         .
#> [3,]              .              .              1         .         .         .
#> [4,]              .              .              .         1         .         .
#> [5,]              .              .              .         .         1         .
#> [6,]              .              .              .         .         .         1
#> [7,]              .              .              .         .         .         .
#> [8,]              .              .              .         .         .         .
#>      var2____D var2____E
#> [1,]         .         .
#> [2,]         .         .
#> [3,]         .         .
#> [4,]         .         .
#> [5,]         .         .
#> [6,]         .         .
#> [7,]         1         .
#> [8,]         .         1
#> 
#> Slot "tags":
#> [[1]]
#> NULL
#> 
#> [[2]]
#> NULL
#> 
#> [[3]]
#> NULL
#> 
#> [[4]]
#> NULL
#> 
#> [[5]]
#> NULL
#> 
#> [[6]]
#> NULL
#> 
#> [[7]]
#> NULL
#> 
#> [[8]]
#> NULL