Design of the Features module

A Feature is a type parametrized by two types, name and d:

newtype (KnownSymbol name) => Feature name d =
  MkFeature  ( FeatureData d )

The type d here stands for "data", which then parametrizes the FeatureData type. The FeatureData type is wrapper around an Either:

newtype FeatureData d = MkFeatureData {
    getFeatureData :: Either MissingReason d  -- ^ Unwrap FeatureData.
  }

Type of d can be almost anything and need not be a scalar. All the following are valid types for d:

  • Int

  • Text

  • (Int, Maybe Text)

  • [Double]

The name type is a bit special: it does not appear on the right-hand side of the =. In type-theory parlance, name is a phantom type. So, a Feature type constructor takes two arguments (name and d), but its value constructor (MkFeature) takes a single value of type FeatureData d.

Values of the FeatureData type contain the data we’re ultimately interested in analyzing or passing along to downstream applications. However, a FeatureData value does not simply contain data of type d. The type allows for the possibility of missingness, failures, or errors via the Either type. A content of a FeatureData d, then, is either a Left MissingReason or a Right d.

The use of Either has important implications when defining features, as we will see. Now that we know the internals of a Feature, how do we create them? There are two ways to create features:

  1. a pure lifting of data into a Feature or

  2. writing a Definition: a function that defines a Feature based on other Feature s.

The first method is a way to get data directly into a Feature. The following function takes a list of Events and makes a Feature of them:

allEvents :: [Event Day] -> Feature "allEvents" [Event Day]
allEvents = pure

The pure lifting is generally used to lift a subject’s input data into a Feature, so that other features can be defined from a subject’s data. Feature`s are derived from other `Feature by the Definition type Specifically, Definition is a type containing a function that maps Feature inputs to a Feature output. define (or defineA) constructs the Definition. For example:

myDef :: Definition (Feature "a" Int -> Feature "b" Bool)
myDef = define (\x -> if x > 0 then True else False)

x is type Int not Feature "a" Int and the return type is Bool not Feature "b" Bool. The define function and Definition type do the magic of lifting these types to the Feature level. To see this more clearly, see myDef2 below:

intToBool :: Int -> Bool
intToBool x = if x > 0 then True else False)

myDef2 :: Definition (Feature "a" Int -> Feature "b" Bool)
myDef2 = define intToBoo

myDef2 is equivalent to myDef.

The define function, then, let’s us focus on the logic of our Feature without needing to worry about handling the error cases. If we were to write a function with signature Feature "a" Int → Feature "b" Bool directly, it would look something like:

myFeat :: Feature "a" Int -> Feature "b" Bool
myFeat (MkFeature (MkFeatureData (Left r))) = MkFeature (MkFeatureData (Left r))
myFeat (MkFeature (MkFeatureData (Right x))) = MkFeature (MkFeatureData (Right $ intToBool x))

One would need to pattern match all the possible types of inputs, which gets more complicated as the number of inputs increases.

As an aside, since Feature are Functors, one could instead write:

myFeat :: Feature "a" Int -> Feature "b" Bool
myFeat = fmap intToBool

This would require understanding how Functors and similar structures are used. The define and defineA functions provide a common interface to these structures without needing to understand the details.

Evaluating Definitions

To evaluate a Definition, we use the eval function. Consider the following example. The input data is a list of Int`s. If the list is empty (`null), this is considered an error in feat1. If the list has more than 3 elements, then in feat2, the sum is computed; otherwise 0 is returned.

featInts :: [Int] -> Feature "someInts" [Int]
featInts = pure

feat1 :: Definition (Feature "someInts" [Int] -> Feature "hasMoreThan3" Bool)
feat1 = defineA
  (\ints -> if null ints then makeFeature (missingBecause $ Other "no data")
           else makeFeature $ featureDataR (length ints > 3))

feat2 :: Definition (
      Feature "hasMoreThan3" Bool
  -> Feature "someInts" [Int]
  -> Feature "sum" Int)
feat2 = define (\b ints -> if b then sum ints else 0)

ex0 = featInts []
ex0a = eval feat1 ex0 -- MkFeature (MkFeatureData (Left (Other "no data")))
ex0b = eval feat2 (ex0a, ex0) -- MkFeature (MkFeatureData (Left (Other "no data")))

ex1 = featInts [3, 8]
ex1a = eval feat1 ex1 -- MkFeature (MkFeatureData (Right False))
ex1b = eval feat2 (ex1a, ex1) -- MkFeature (MkFeatureData (Right 0))

ex2 = featInts [1..4]
ex2a = eval feat1 ex2 -- MkFeature (MkFeatureData (Right True))
ex2b = eval feat2 (ex2a, ex2) -- MkFeature (MkFeatureData (Right 10))

Note the value of ex0b. It is a Left because the value of ex0a is a Left; in other words, errors propogate along Feature. If a given Feature dependency is a Left then that Feature will also be Left. A Feature’s internal `Either structure is a way to prevent downstream dependencies from needing to be computed, which increases performance.

Type Safety of Features

In describing the Feature type, the utility of having the name as a type may not have been clear. To clarify, consider the following example:

x :: Feature "someInt" Natural
x = pure 39

y :: Feature "age" Natural
y = pure 43

f :: Definition (Feature "age" Natural -> Feature "isOld" Bool)
f = define (>= 39)

fail = eval f x
pass = eval f y

In the example, fail does not compile because "someInt" is not "age", even though both the data type are Natural.