`formula.formulae`

Formula objects

A formula is essentially a sympy expression for the mean, of the form:

mean = sum([Beta(e)*e for e in expr])

That is, a linear combination of sympy expressions, each multiplied by its own “Beta”. The elements of expr can be instances of Term (for a linear regression formula, they would all be instances of Term). In general, though, the expressions may involve other parameters (i.e. sympy.Symbol instances) that are not Terms.

The design matrix is made up of columns that are the derivatives of mean with respect to everything that is not a Term, evaluated at a recarray that has field names given by [str(t) for t in self.terms].
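As a rough illustration of this derivative rule, the sketch below uses plain sympy symbols rather than the library's own Term and Beta classes. The names x1, x3, b0, b1, b2 and theta are illustrative assumptions and are not part of the formula API.

```python
import sympy

# Hypothetical stand-ins: x1 and x3 play the role of Terms,
# b0..b2 play the role of the Beta coefficients.
x1, x3 = sympy.symbols('x1 x3')
b0, b1, b2 = sympy.symbols('b0 b1 b2')

exprs = [x1, x3, x1 * x3]
betas = [b0, b1, b2]

# mean = sum of each expression multiplied by its own coefficient
mean = sum(b * e for b, e in zip(betas, exprs))

# Differentiating with respect to each coefficient recovers the expression
# that would become the corresponding design-matrix column.
columns = [sympy.diff(mean, b) for b in betas]

# A non-linear case: theta is an extra parameter that is not a Term.
theta = sympy.Symbol('theta')
nonlinear_mean = b0 * sympy.exp(-theta * x1)
d_b0 = sympy.diff(nonlinear_mean, b0)        # still depends on theta
d_theta = sympy.diff(nonlinear_mean, theta)  # depends on b0 and theta
```

In the linear case the derivatives are just the original expressions. In the non-linear case they still contain parameters, which is presumably why design() accepts a separate param argument.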

For those familiar with R’s formula syntax, if we wanted a design matrix like the following:

> s.table = read.table("http://www-stat.stanford.edu/~jtaylo/courses/stats191/data/supervisor.table", header=T)
> d = model.matrix(lm(Y ~ X1*X3, s.table))
> d
   (Intercept) X1 X3 X1:X3
1            1 51 39  1989
2            1 64 54  3456
3            1 70 69  4830
4            1 63 47  2961
5            1 78 66  5148
6            1 55 44  2420
7            1 67 56  3752
8            1 75 55  4125
9            1 82 67  5494
10           1 61 47  2867
11           1 53 58  3074
12           1 60 39  2340
13           1 62 42  2604
14           1 83 45  3735
15           1 77 72  5544
16           1 90 72  6480
17           1 85 69  5865
18           1 60 75  4500
19           1 70 57  3990
20           1 58 54  3132
21           1 40 34  1360
22           1 61 62  3782
23           1 66 50  3300
24           1 37 58  2146
25           1 54 48  2592
26           1 77 63  4851
27           1 75 74  5550
28           1 57 45  2565
29           1 85 71  6035
30           1 82 59  4838
attr(,"assign")
[1] 0 1 2 3
>

With the Formula, it looks like this:

First read the same data as above:

>>> from os.path import dirname, join as pjoin
>>> import numpy as np
>>> import formula
>>> fname = pjoin(dirname(formula.__file__), 'data', 'supervisor.table')
>>> r = np.recfromtxt(fname, names=True)

Define the formula:

>>> from formula import terms, Formula
>>> X1, X3 = terms(('X1', 'X3'))
>>> f = Formula([X1, X3, X1*X3, 1])
>>> f.mean
_b0*X1 + _b1*X3 + _b2*X1*X3 + _b3

The 1 is the “intercept” term. Unlike R, which adds an intercept by default, Formula only includes one when it is given explicitly.

>>> f.design(r)
array([(51.0, 39.0, 1989.0, 1.0), (64.0, 54.0, 3456.0, 1.0),
       (70.0, 69.0, 4830.0, 1.0), (63.0, 47.0, 2961.0, 1.0),
       (78.0, 66.0, 5148.0, 1.0), (55.0, 44.0, 2420.0, 1.0),
       (67.0, 56.0, 3752.0, 1.0), (75.0, 55.0, 4125.0, 1.0),
       (82.0, 67.0, 5494.0, 1.0), (61.0, 47.0, 2867.0, 1.0),
       (53.0, 58.0, 3074.0, 1.0), (60.0, 39.0, 2340.0, 1.0),
       (62.0, 42.0, 2604.0, 1.0), (83.0, 45.0, 3735.0, 1.0),
       (77.0, 72.0, 5544.0, 1.0), (90.0, 72.0, 6480.0, 1.0),
       (85.0, 69.0, 5865.0, 1.0), (60.0, 75.0, 4500.0, 1.0),
       (70.0, 57.0, 3990.0, 1.0), (58.0, 54.0, 3132.0, 1.0),
       (40.0, 34.0, 1360.0, 1.0), (61.0, 62.0, 3782.0, 1.0),
       (66.0, 50.0, 3300.0, 1.0), (37.0, 58.0, 2146.0, 1.0),
       (54.0, 48.0, 2592.0, 1.0), (77.0, 63.0, 4851.0, 1.0),
       (75.0, 74.0, 5550.0, 1.0), (57.0, 45.0, 2565.0, 1.0),
       (85.0, 71.0, 6035.0, 1.0), (82.0, 59.0, 4838.0, 1.0)], 
      dtype=[('X1', '<f8'), ('X3', '<f8'), ('X1*X3', '<f8'), ('1', '<f8')])
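To make the layout of this result concrete, the following sketch rebuilds the same four columns by hand with numpy alone, using only the first three rows of the data. It does not call the formula package, and it is not meant to reflect how Formula.design is implemented.

```python
import numpy as np

# First three rows of the supervisor data (X1 and X3 fields only).
data = np.array([(51.0, 39.0), (64.0, 54.0), (70.0, 69.0)],
                dtype=[('X1', '<f8'), ('X3', '<f8')])

# Same field names and order as the design matrix shown above.
design_dtype = [('X1', '<f8'), ('X3', '<f8'), ('X1*X3', '<f8'), ('1', '<f8')]
design = np.zeros(data.shape, dtype=design_dtype)

design['X1'] = data['X1']
design['X3'] = data['X3']
design['X1*X3'] = data['X1'] * data['X3']  # interaction column
design['1'] = 1.0                          # intercept column
```

Each field of the structured array is one column of the design matrix, named after the string form of the expression it holds.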

class formula.formulae.Beta

A dummy symbol tied to a Term.

Methods

`apart`
`args_cnc`
`as_base_exp`
`as_coeff_Mul`
`as_coeff_add`
`as_coeff_exponent`
`as_coeff_factors`
`as_coeff_mul`
`as_coeff_terms`
`as_coefficient`
`as_dummy`
`as_expr`
`as_independent`
`as_leading_term`
`as_numer_denom`
`as_ordered_factors`
`as_ordered_terms`
`as_poly`
`as_powers_dict`
`as_real_imag`
`as_terms`
`atoms`
`cancel`
`class_key`
`coeff`
`collect`
`combsimp`
`compare`
`compare_pretty`
`compute_leading_term`
`conjugate`
`could_extract_minus_sign`
`count`
`count_ops`
`diff`
`doit`
`dummy_eq`
`evalf`
`expand`
`extract_additively`
`extract_multiplicatively`
`factor`
`find`
`fromiter`
`getO`
`getn`
`has`
`integrate`
`invert`
`is_hypergeometric`
`is_polynomial`
`is_rational_function`
`iter_basic_args`
`leadterm`
`limit`
`lseries`
`match`
`matches`
`n`
`normal`
`nseries`
`nsimplify`
`powsimp`
`radsimp`
`ratsimp`
`refine`
`removeO`
`replace`
`rewrite`
`separate`
`series`
`simplify`
`sort_key`
`subs`
`together`
`trigsimp`

class formula.formulae.Formula(seq, char='b')

A Formula is a model for a mean in a regression model.

It is often given by a sequence of sympy expressions, with the mean model being the sum of each term multiplied by a linear regression coefficient.

The expressions may depend on additional Symbol instances, giving a non-linear regression model.

Methods

`delete_terms`
`design`
`subs`

coefs: Coefficients in the linear regression formula.

delete_terms(other)

design(input, param=None, return_float=False, contrasts=None)

Construct the design matrix, and optional contrast matrices.

Parameters :

input : np.recarray

Recarray including fields needed to compute the Terms in getparams(self.design_expr).

param : None or np.recarray

Recarray including fields that are not Terms in getparams(self.design_expr)

return_float : bool, optional

If True, return a np.float array rather than a np.recarray

contrasts : None or dict, optional

Contrasts. The items in this dictionary should be (str, Formula) pairs, where a contrast matrix is constructed for each Formula by evaluating its design at the same parameters as self.design. If contrasts is not None, return_float is set to True.
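The difference between the two return types can be sketched with numpy alone. The conversion below is one plausible way to turn a structured array into a plain float array; it is an assumption for illustration and not necessarily what design() does internally when return_float is True.

```python
import numpy as np

# A small structured array with the same fields as a design matrix.
rec = np.array([(51.0, 39.0, 1989.0, 1.0), (64.0, 54.0, 3456.0, 1.0)],
               dtype=[('X1', '<f8'), ('X3', '<f8'),
                      ('X1*X3', '<f8'), ('1', '<f8')])

# Stack the named fields side by side into an ordinary 2-D float array,
# one column per field, in field order.
as_float = np.column_stack([rec[name] for name in rec.dtype.names])
```

The structured form keeps the column names, while the plain float form is what most linear algebra routines expect.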

dtype: The dtype of the design matrix of the Formula.

mean: Expression for the mean, expressed as a linear combination of terms, each with dummy variables in front.

params: The parameters in the Formula.

subs(old, new)

Perform a sympy substitution on all terms in the Formula

Returns a new instance of the same class

Parameters :

old : sympy.Basic

The expression to be changed

new : sympy.Basic

The value to change it to.

Returns :

newf : Formula

Examples

>>> import sympy
>>> from formula import terms, Formula
>>> s, t = terms('s, t')
>>> f, g = [sympy.Function(l) for l in 'fg']
>>> form = Formula([f(t), g(s)])
>>> newform = form.subs(g, sympy.Function('h'))
>>> newform.terms
array([f(t), h(s)], dtype=object)
>>> form.terms
array([f(t), g(s)], dtype=object)

terms: Terms in the linear regression formula.

unique: Return a Formula(np.unique(self.terms))

formula.formulae.is_beta(obj): Is obj a Beta?

formula.formulae.is_formula(obj): Is obj a Formula?
