formula.formulae

Formula objects

A formula is essentially a sympy expression for a mean of the form:

mean = sum([Beta(e)*e for e in expr])

That is, the mean is a linear combination of sympy expressions, each multiplied by its own “Beta” coefficient. The elements of expr can be instances of Term; for a linear regression formula, they would all be instances of Term. In general, however, the expressions may also involve other parameters (i.e. sympy.Symbol instances) that are not Terms.

The design matrix is made up of columns that are the derivatives of mean with respect to everything that is not a Term, evaluated at a recarray that has field names given by [str(t) for t in self.terms].
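The derivative rule above can be illustrated with plain sympy. This is a minimal sketch, not the package's implementation: ordinary sympy symbols stand in for Term and Beta instances, and the names betas, exprs and columns are chosen here for illustration only.

```python
import sympy

# Plain sympy symbols standing in for the Terms X1 and X3 (assumption:
# formula.Term is not used here, only its symbolic role).
X1, X3 = sympy.symbols("X1 X3")

# One coefficient symbol per expression: _b0, _b1, _b2, _b3.
betas = sympy.symbols("_b0:4")
exprs = [X1, X3, X1 * X3, sympy.Integer(1)]

# mean = sum of coefficient * expression, as in the definition above.
mean = sum(b * e for b, e in zip(betas, exprs))

# Differentiating with respect to each coefficient recovers the
# expression that defines the corresponding design column.
columns = [sympy.diff(mean, b) for b in betas]

# Evaluate the column expressions at a single observation.
one_row = [c.subs({X1: 2, X3: 3}) for c in columns]
print(columns)
print(one_row)
```

For a purely linear formula the derivatives do not involve the coefficients at all, so the design matrix depends only on the data.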

For those familiar with R’s formula syntax, if we wanted a design matrix like the following:

> s.table = read.table("http://www-stat.stanford.edu/~jtaylo/courses/stats191/data/supervisor.table", header=T)
> d = model.matrix(lm(Y ~ X1*X3, s.table))
> d
   (Intercept) X1 X3 X1:X3
1            1 51 39  1989
2            1 64 54  3456
3            1 70 69  4830
4            1 63 47  2961
5            1 78 66  5148
6            1 55 44  2420
7            1 67 56  3752
8            1 75 55  4125
9            1 82 67  5494
10           1 61 47  2867
11           1 53 58  3074
12           1 60 39  2340
13           1 62 42  2604
14           1 83 45  3735
15           1 77 72  5544
16           1 90 72  6480
17           1 85 69  5865
18           1 60 75  4500
19           1 70 57  3990
20           1 58 54  3132
21           1 40 34  1360
22           1 61 62  3782
23           1 66 50  3300
24           1 37 58  2146
25           1 54 48  2592
26           1 77 63  4851
27           1 75 74  5550
28           1 57 45  2565
29           1 85 71  6035
30           1 82 59  4838
attr(,"assign")
[1] 0 1 2 3
>

With the Formula, it looks like this:

First read the same data as above:

>>> from os.path import dirname, join as pjoin
>>> import numpy as np
>>> import formula
>>> fname = pjoin(dirname(formula.__file__), 'data', 'supervisor.table')
>>> r = np.recfromtxt(fname, names=True)

Define the formula:

>>> from formula import terms, Formula
>>> X1, X3 = terms(('X1', 'X3'))
>>> f = Formula([X1, X3, X1*X3, 1])
>>> f.mean
_b0*X1 + _b1*X3 + _b2*X1*X3 + _b3

The 1 is the “intercept” term; I have explicitly not used R’s default of adding it to every formula.

>>> f.design(r)
array([(51.0, 39.0, 1989.0, 1.0), (64.0, 54.0, 3456.0, 1.0),
       (70.0, 69.0, 4830.0, 1.0), (63.0, 47.0, 2961.0, 1.0),
       (78.0, 66.0, 5148.0, 1.0), (55.0, 44.0, 2420.0, 1.0),
       (67.0, 56.0, 3752.0, 1.0), (75.0, 55.0, 4125.0, 1.0),
       (82.0, 67.0, 5494.0, 1.0), (61.0, 47.0, 2867.0, 1.0),
       (53.0, 58.0, 3074.0, 1.0), (60.0, 39.0, 2340.0, 1.0),
       (62.0, 42.0, 2604.0, 1.0), (83.0, 45.0, 3735.0, 1.0),
       (77.0, 72.0, 5544.0, 1.0), (90.0, 72.0, 6480.0, 1.0),
       (85.0, 69.0, 5865.0, 1.0), (60.0, 75.0, 4500.0, 1.0),
       (70.0, 57.0, 3990.0, 1.0), (58.0, 54.0, 3132.0, 1.0),
       (40.0, 34.0, 1360.0, 1.0), (61.0, 62.0, 3782.0, 1.0),
       (66.0, 50.0, 3300.0, 1.0), (37.0, 58.0, 2146.0, 1.0),
       (54.0, 48.0, 2592.0, 1.0), (77.0, 63.0, 4851.0, 1.0),
       (75.0, 74.0, 5550.0, 1.0), (57.0, 45.0, 2565.0, 1.0),
       (85.0, 71.0, 6035.0, 1.0), (82.0, 59.0, 4838.0, 1.0)], 
      dtype=[('X1', '<f8'), ('X3', '<f8'), ('X1*X3', '<f8'), ('1', '<f8')])
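
As a cross-check on the output above, the same four columns can be rebuilt by hand. The sketch below uses only plain Python and the first three (X1, X3) pairs from the table; design_row is a hypothetical helper written for this illustration and is not part of the formula package.

```python
# First three (X1, X3) observations copied from the supervisor table.
observations = [(51.0, 39.0), (64.0, 54.0), (70.0, 69.0)]


def design_row(x1, x3):
    """Return the columns X1, X3, X1*X3 and the intercept for one observation."""
    return (x1, x3, x1 * x3, 1.0)


design = [design_row(x1, x3) for x1, x3 in observations]
for row in design:
    print(row)
```

The column order follows the order of the expressions passed to Formula, which is why the intercept appears last here but first in the R output.
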
class formula.formulae.Beta

A dummy symbol tied to a Term instance, term.

Methods

apart
args_cnc
as_base_exp
as_coeff_Mul
as_coeff_add
as_coeff_exponent
as_coeff_factors
as_coeff_mul
as_coeff_terms
as_coefficient
as_dummy
as_expr
as_independent
as_leading_term
as_numer_denom
as_ordered_factors
as_ordered_terms
as_poly
as_powers_dict
as_real_imag
as_terms
atoms
cancel
class_key
coeff
collect
combsimp
compare
compare_pretty
compute_leading_term
conjugate
could_extract_minus_sign
count
count_ops
diff
doit
dummy_eq
evalf
expand
extract_additively
extract_multiplicatively
factor
find
fromiter
getO
getn
has
integrate
invert
is_hypergeometric
is_polynomial
is_rational_function
iter_basic_args
leadterm
limit
lseries
match
matches
n
normal
nseries
nsimplify
powsimp
radsimp
ratsimp
refine
removeO
replace
rewrite
separate
series
simplify
sort_key
subs
together
trigsimp
class formula.formulae.Formula(seq, char='b')

A Formula is a model for a mean in a regression model.

It is often given by a sequence of sympy expressions, with the mean model being the sum of each term multiplied by a linear regression coefficient.

The expressions may depend on additional Symbol instances, giving a non-linear regression model.
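
To see why additional symbols make the model non-linear, consider an exponential decay mean. The following is a sketch in plain sympy under stated assumptions: t plays the role of a Term, theta is an extra parameter that is not a Term, and _b0 is the linear coefficient. None of these names come from the package itself.

```python
import sympy

t = sympy.Symbol("t")          # stands in for a Term (a data column)
theta = sympy.Symbol("theta")  # extra parameter that is not a Term
b0 = sympy.Symbol("_b0")       # linear coefficient

# A mean that is linear in _b0 but non-linear in theta.
mean = b0 * sympy.exp(-theta * t)

# One design column per parameter: the derivative of the mean
# with respect to that parameter.
design_columns = [sympy.diff(mean, p) for p in (b0, theta)]
print(design_columns)
```

Both derivatives still contain theta, and the second also contains _b0, so evaluating these columns requires parameter values as well as data. This is consistent with design accepting a param recarray alongside input.
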

Methods

delete_terms
design
subs
coefs

Coefficients in the linear regression formula.

delete_terms(other)
design(input, param=None, return_float=False, contrasts=None)

Construct the design matrix, and optional contrast matrices.

Parameters :

input : np.recarray

Recarray including fields needed to compute the Terms in getparams(self.design_expr).

param : None or np.recarray

Recarray including fields that are not Terms in getparams(self.design_expr)

return_float : bool, optional

If True, return a plain floating-point array rather than a np.recarray.

contrasts : None or dict, optional

Contrasts. The items in this dictionary should be (str, Formula) pairs, where a contrast matrix is constructed for each Formula by evaluating its design at the same parameters as self.design. If contrasts is not None, return_float is set to True.

dtype

The dtype of the design matrix of the Formula.

mean

Expression for the mean, expressed as a linear combination of terms, each multiplied by its own dummy variable.

params

The parameters in the Formula.

subs(old, new)

Perform a sympy substitution on all terms in the Formula.

Returns a new instance of the same class.

Parameters :

old : sympy.Basic

The expression to be changed

new : sympy.Basic

The value to change it to.

Returns :

newf : Formula

Examples

>>> import sympy
>>> from formula import terms, Formula
>>> s, t = terms('s, t')
>>> f, g = [sympy.Function(l) for l in 'fg']
>>> form = Formula([f(t),g(s)])
>>> newform = form.subs(g, sympy.Function('h'))
>>> newform.terms
array([f(t), h(s)], dtype=object)
>>> form.terms
array([f(t), g(s)], dtype=object)
terms

Terms in the linear regression formula.

unique

Return a new Formula containing only the unique terms, i.e. Formula(np.unique(self.terms)).

formula.formulae.is_beta(obj)

Is obj a Beta?

formula.formulae.is_formula(obj)

Is obj a Formula?