formula.parts

Parts of formulae: Terms, Factors, etc

class formula.parts.Factor(name, levels, char='b', coding='indicator', reference=None)

A qualitative variable in a regression model

A Factor is similar to R’s factor. The levels of the Factor can be either strings or ints.

Methods

fromcol
get_term
drop_reference

The drop_reference formula: a binary column for each level of the factor except self.reference.

formula
static fromcol(col, name)

Create a Factor from a column array.

Parameters :

col : ndarray

an array with ndim==1

name : str

name of the Factor

Returns :

factor : Factor

Examples

>>> data = np.array([(3,'a'),(4,'a'),(5,'b'),(3,'b')], np.dtype([('x', float), ('y', 'S1')]))
>>> f1 = Factor.fromcol(data['y'], 'y')
>>> f2 = Factor.fromcol(data['x'], 'x')
>>> d = f1.formula.design(data)
>>> print d.dtype.descr
[('y_a', '<f8'), ('y_b', '<f8')]
>>> d = f2.formula.design(data)
>>> print d.dtype.descr
[('x_3', '<f8'), ('x_4', '<f8'), ('x_5', '<f8')]
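
The example above suggests that fromcol takes the distinct values of the column as the levels and names each indicator column name_level. The following is a minimal pure-Python sketch of that idea, not the library's implementation; the helper names levels_from_column and indicator_names are hypothetical, and the sorted ordering is an assumption based on the output shown above.

```python
def levels_from_column(col):
    """Return the distinct values of a 1-D column (sorted order is an assumption)."""
    return sorted(set(col))


def indicator_names(name, levels):
    """Name one indicator column per level, e.g. 'y_a', 'y_b'."""
    return ["%s_%s" % (name, level) for level in levels]


# Hypothetical column mirroring the 'y' field in the example above.
col = ["a", "a", "b", "b"]
levels = levels_from_column(col)
names = indicator_names("y", levels)
print(levels)
print(names)
```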
get_term(level)

Retrieve a term of the Factor...

indicator

The indicator formula: a binary column for each level of the factor.
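
To make the difference between the indicator and drop_reference codings concrete, here is a small sketch in plain Python. It is an illustration under assumptions rather than the library's code: the functions indicator_columns and drop_reference_columns are hypothetical, and the reference level is passed explicitly because the default choice of reference is not documented here.

```python
def indicator_columns(values, levels):
    """One 0/1 column per level: 1.0 where the observation equals that level."""
    return {
        level: [1.0 if v == level else 0.0 for v in values]
        for level in levels
    }


def drop_reference_columns(values, levels, reference):
    """Same as indicator coding, but without the column for the reference level."""
    cols = indicator_columns(values, levels)
    del cols[reference]
    return cols


values = ["a", "a", "b", "c"]
levels = ["a", "b", "c"]
full = indicator_columns(values, levels)
reduced = drop_reference_columns(values, levels, reference="a")
print(sorted(full))
print(sorted(reduced))
```

In the full indicator coding, the columns sum to one in every row, so they are collinear with an intercept column; dropping the reference level is the usual way to avoid that.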

class formula.parts.FactorTerm

Boolean Term derived from a Factor.

Its properties are the same as a Term except that its product with itself is itself.
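
The reason a FactorTerm's product with itself can be treated as itself is that its design column only holds zeros and ones. A quick numerical check of that property, using a hypothetical indicator column and plain Python lists rather than the library's classes:

```python
def elementwise_product(a, b):
    """Multiply two equal-length columns entry by entry."""
    return [x * y for x, y in zip(a, b)]


# Hypothetical 0/1 indicator column for one level of a factor.
indicator = [1, 0, 1, 1, 0]
squared = elementwise_product(indicator, indicator)
print(squared == indicator)
```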

Methods

apart
args_cnc
as_base_exp
as_coeff_Mul
as_coeff_add
as_coeff_exponent
as_coeff_factors
as_coeff_mul
as_coeff_terms
as_coefficient
as_dummy
as_expr
as_independent
as_leading_term
as_numer_denom
as_ordered_factors
as_ordered_terms
as_poly
as_powers_dict
as_real_imag
as_terms
atoms
cancel
class_key
coeff
collect
combsimp
compare
compare_pretty
compute_leading_term
conjugate
could_extract_minus_sign
count
count_ops
diff
doit
dummy_eq
evalf
expand
extract_additively
extract_multiplicatively
factor
find
fromiter
getO
getn
has
integrate
invert
is_hypergeometric
is_polynomial
is_rational_function
iter_basic_args
leadterm
limit
lseries
match
matches
n
normal
nseries
nsimplify
powsimp
radsimp
ratsimp
refine
removeO
replace
rewrite
separate
series
simplify
sort_key
subs
together
trigsimp
class formula.parts.Term

A sympy.Symbol type to represent a term in a regression model

Terms can be added to other sympy expressions with the single convention that a term plus itself returns itself.

It is meant to emulate something on the right hand side of a formula in R. In particular, its name can be the name of a field in a recarray used to create a design matrix.

>>> t = Term('x')
>>> xval = np.array([(3,),(4,),(5,)], np.dtype([('x', float)]))
>>> f = t.formula
>>> d = f.design(xval)
>>> print d.dtype.descr
[('x', '<f8')]
>>> f.design(xval, return_float=True)
array([ 3.,  4.,  5.])

Methods

apart
args_cnc
as_base_exp
as_coeff_Mul
as_coeff_add
as_coeff_exponent
as_coeff_factors
as_coeff_mul
as_coeff_terms
as_coefficient
as_dummy
as_expr
as_independent
as_leading_term
as_numer_denom
as_ordered_factors
as_ordered_terms
as_poly
as_powers_dict
as_real_imag
as_terms
atoms
cancel
class_key
coeff
collect
combsimp
compare
compare_pretty
compute_leading_term
conjugate
could_extract_minus_sign
count
count_ops
diff
doit
dummy_eq
evalf
expand
extract_additively
extract_multiplicatively
factor
find
fromiter
getO
getn
has
integrate
invert
is_hypergeometric
is_polynomial
is_rational_function
iter_basic_args
leadterm
limit
lseries
match
matches
n
normal
nseries
nsimplify
powsimp
radsimp
ratsimp
refine
removeO
replace
rewrite
separate
series
simplify
sort_key
subs
together
trigsimp
formula

Return a Formula with only terms=[self].

formula.parts.fromrec(recarr)

Create Terms and Factors from a structured array

Fields of type object or string are assumed to be Factors; all other fields are Terms.

Parameters :

recarr : ndarray

array with composite dtype

Returns :

facterms : dict

dict keyed by the field names of recarr's dtype; each value is a Factor if the field is of object or string type, and a Term otherwise.

Examples

>>> arr = np.array([(100,'blue'), (0, 'red')], dtype=
...                [('awesomeness','i'), ('shirt','S7')])
>>> teams = fromrec(arr)
>>> is_term(teams['awesomeness'])
True
>>> is_factor(teams['shirt'])
True
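
The dispatch rule described for fromrec (string-like fields become Factors, everything else becomes a Term) can be sketched without NumPy as follows. This is only an illustration of the rule: classify_fields is a hypothetical helper, records are plain dicts instead of a structured array, and the "object" dtype case has no direct counterpart here.

```python
def classify_fields(records):
    """Map each field name to 'Factor' or 'Term' from the first record's value type."""
    first = records[0]
    kinds = {}
    for name, value in first.items():
        # String-like values are treated as qualitative; all others as numeric terms.
        kinds[name] = "Factor" if isinstance(value, (str, bytes)) else "Term"
    return kinds


# Plain-dict stand-in for the structured array used in the example above.
records = [
    {"awesomeness": 100, "shirt": "blue"},
    {"awesomeness": 0, "shirt": "red"},
]
kinds = classify_fields(records)
print(kinds)
```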
formula.parts.getparams(expression)

Return the parameters of an expression that are not Term instances but are instances of sympy.Symbol.

Examples

>>> from formula import terms, Formula
>>> x, y, z = terms('x, y, z')
>>> f = Formula([x,y,z])
>>> getparams(f)
[]
>>> f.mean
_b0*x + _b1*y + _b2*z
>>> getparams(f.mean)
[_b0, _b1, _b2]
>>>
>>> th = sympy.Symbol('theta')
>>> f.mean*sympy.exp(th)
(_b0*x + _b1*y + _b2*z)*exp(theta)
>>> getparams(f.mean*sympy.exp(th))
[theta, _b0, _b1, _b2]
formula.parts.getterms(expression)

Return all instances of Term in an expression.

Examples

>>> from formula import terms, Formula
>>> x, y, z = terms('x, y, z')
>>> f = Formula([x,y,z])
>>> getterms(f)
[x, y, z]
>>> getterms(f.mean)
[x, y, z]
formula.parts.is_factor(obj)

Is obj a Factor?

formula.parts.is_factor_term(obj)

Is obj a FactorTerm?

formula.parts.is_term(obj)

Is obj a Term?

formula.parts.stratify(factor, variable)

Create a new variable, stratified by the levels of a Factor.

Parameters :

factor : Factor

variable : str or sympy expression

a string, or a simple sympy expression whose string representation consists only of lower- or upper-case letters, i.e. it can be interpreted as a name

Returns :

formula : Formula

Formula whose mean has one parameter named _variable%d, for each level in factor.levels

Examples

>>> f = Factor('a', ['x','y'])
>>> sf = stratify(f, 'theta')
>>> sf.mean
_theta0*a_x + _theta1*a_y
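
The naming scheme in the example above (one parameter _variable%d per level, multiplied by that level's indicator column) can be reproduced with plain string formatting. The sketch below only builds the textual form of the mean; stratified_mean is a hypothetical helper and does not create sympy objects or a Formula.

```python
def stratified_mean(factor_name, levels, variable):
    """Build the string form of a mean with one parameter per factor level."""
    pieces = []
    for i, level in enumerate(levels):
        param = "_%s%d" % (variable, i)           # e.g. _theta0
        column = "%s_%s" % (factor_name, level)   # e.g. a_x
        pieces.append("%s*%s" % (param, column))
    return " + ".join(pieces)


mean = stratified_mean("a", ["x", "y"], "theta")
print(mean)  # _theta0*a_x + _theta1*a_y
```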