`formula.parts`¶

Parts of formulae: Terms, Factors, etc.

class formula.parts.Factor(name, levels, char='b', coding='indicator', reference=None)¶

A qualitative variable in a regression model

A Factor is similar to R’s factor. The levels of the Factor can be either strings or ints.

Methods

`fromcol`
`get_term`

drop_reference¶: The drop_reference formula: a binary column for each level of the factor except self.reference.

formula¶

static fromcol(col, name)¶

Create a Factor from a column array.

Parameters :

col : ndarray

an array with ndim==1

name : str

name of the Factor

Returns :

factor : Factor

Examples

>>> data = np.array([(3,'a'),(4,'a'),(5,'b'),(3,'b')], np.dtype([('x', float), ('y', 'S1')]))
>>> f1 = Factor.fromcol(data['y'], 'y')
>>> f2 = Factor.fromcol(data['x'], 'x')
>>> d = f1.formula.design(data)
>>> print(d.dtype.descr)
[('y_a', '<f8'), ('y_b', '<f8')]
>>> d = f2.formula.design(data)
>>> print(d.dtype.descr)
[('x_3', '<f8'), ('x_4', '<f8'), ('x_5', '<f8')]

get_term(level)¶: Retrieve a term of the Factor...

indicator¶: The indicator formula: a binary column for each level of the factor.
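The difference between the indicator and drop_reference codings can be illustrated without the library. The sketch below is plain Python, not the package's implementation: the helpers `indicator_columns` and `drop_reference_columns` are hypothetical, and the `name_level` column naming simply mirrors the `y_a`, `y_b` output shown in the `fromcol` example.

```python
def indicator_columns(name, levels, values):
    """One 0/1 column per level (hypothetical helper, for illustration only)."""
    return {
        "%s_%s" % (name, level): [1.0 if v == level else 0.0 for v in values]
        for level in levels
    }


def drop_reference_columns(name, levels, values, reference):
    """Same as indicator_columns, but without the reference level's column."""
    kept = [level for level in levels if level != reference]
    return indicator_columns(name, kept, values)


values = ["a", "a", "b", "b"]
full = indicator_columns("y", ["a", "b"], values)
reduced = drop_reference_columns("y", ["a", "b"], values, reference="a")

print(sorted(full))     # column names of the indicator coding
print(sorted(reduced))  # column names once the reference level is dropped
```

Dropping one level is the usual way to keep the factor's columns from being collinear with an intercept column.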

class formula.parts.FactorTerm¶

Boolean Term derived from a Factor.

Its properties are the same as a Term except that its product with itself is itself.
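The self-product rule reflects the fact that a 0/1 indicator column is unchanged by elementwise multiplication with itself. A small plain-Python check of that arithmetic (an illustration only; it does not use `FactorTerm`):

```python
# A 0/1 indicator column, such as the one a factor level would produce.
column = [1.0, 0.0, 1.0, 1.0, 0.0]

# Elementwise product of the column with itself.
squared = [v * v for v in column]

# Because 0*0 == 0 and 1*1 == 1, the product equals the original column.
print(squared == column)
```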

Methods

`apart`
`args_cnc`
`as_base_exp`
`as_coeff_Mul`
`as_coeff_add`
`as_coeff_exponent`
`as_coeff_factors`
`as_coeff_mul`
`as_coeff_terms`
`as_coefficient`
`as_dummy`
`as_expr`
`as_independent`
`as_leading_term`
`as_numer_denom`
`as_ordered_factors`
`as_ordered_terms`
`as_poly`
`as_powers_dict`
`as_real_imag`
`as_terms`
`atoms`
`cancel`
`class_key`
`coeff`
`collect`
`combsimp`
`compare`
`compare_pretty`
`compute_leading_term`
`conjugate`
`could_extract_minus_sign`
`count`
`count_ops`
`diff`
`doit`
`dummy_eq`
`evalf`
`expand`
`extract_additively`
`extract_multiplicatively`
`factor`
`find`
`fromiter`
`getO`
`getn`
`has`
`integrate`
`invert`
`is_hypergeometric`
`is_polynomial`
`is_rational_function`
`iter_basic_args`
`leadterm`
`limit`
`lseries`
`match`
`matches`
`n`
`normal`
`nseries`
`nsimplify`
`powsimp`
`radsimp`
`ratsimp`
`refine`
`removeO`
`replace`
`rewrite`
`separate`
`series`
`simplify`
`sort_key`
`subs`
`together`
`trigsimp`

class formula.parts.Term¶

A sympy.Symbol type to represent a term in a regression model

Terms can be added to other sympy expressions with the single convention that a term plus itself returns itself.

It is meant to emulate something on the right hand side of a formula in R. In particular, its name can be the name of a field in a recarray used to create a design matrix.

>>> t = Term('x')
>>> xval = np.array([(3,),(4,),(5,)], np.dtype([('x', float)]))
>>> f = t.formula
>>> d = f.design(xval)
>>> print(d.dtype.descr)
[('x', '<f8')]
>>> f.design(xval, return_float=True)
array([ 3.,  4.,  5.])
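The convention that a term plus itself returns itself behaves like set union. The toy class below is a hypothetical stand-in that models only this convention; the real `Term` is a sympy symbol, and `ToyTerms` is not part of the package.

```python
class ToyTerms:
    """Hypothetical stand-in: a collection of term names where addition is union."""

    def __init__(self, *names):
        self.names = frozenset(names)

    def __add__(self, other):
        # Union discards duplicates, so x + x contains x only once.
        return ToyTerms(*(self.names | other.names))

    def __eq__(self, other):
        return isinstance(other, ToyTerms) and self.names == other.names

    def __hash__(self):
        return hash(self.names)

    def __repr__(self):
        return " + ".join(sorted(self.names))


x = ToyTerms("x")
z = ToyTerms("z")

print(x + x)      # a single x rather than 2*x
print(x + z + x)  # the repeated x is absorbed
```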

Methods

`apart`
`args_cnc`
`as_base_exp`
`as_coeff_Mul`
`as_coeff_add`
`as_coeff_exponent`
`as_coeff_factors`
`as_coeff_mul`
`as_coeff_terms`
`as_coefficient`
`as_dummy`
`as_expr`
`as_independent`
`as_leading_term`
`as_numer_denom`
`as_ordered_factors`
`as_ordered_terms`
`as_poly`
`as_powers_dict`
`as_real_imag`
`as_terms`
`atoms`
`cancel`
`class_key`
`coeff`
`collect`
`combsimp`
`compare`
`compare_pretty`
`compute_leading_term`
`conjugate`
`could_extract_minus_sign`
`count`
`count_ops`
`diff`
`doit`
`dummy_eq`
`evalf`
`expand`
`extract_additively`
`extract_multiplicatively`
`factor`
`find`
`fromiter`
`getO`
`getn`
`has`
`integrate`
`invert`
`is_hypergeometric`
`is_polynomial`
`is_rational_function`
`iter_basic_args`
`leadterm`
`limit`
`lseries`
`match`
`matches`
`n`
`normal`
`nseries`
`nsimplify`
`powsimp`
`radsimp`
`ratsimp`
`refine`
`removeO`
`replace`
`rewrite`
`separate`
`series`
`simplify`
`sort_key`
`subs`
`together`
`trigsimp`

formula¶: Return a Formula with only terms=[self].

formula.parts.fromrec(recarr)¶

Create Terms and Factors from a structured array

We assume fields of type object and string are Factors, all others are Terms.

Parameters :

recarr : ndarray

array with composite dtype

Returns :

facterms : dict

dict whose keys are the recarr dtype field names and whose values are a Factor (if the field was of object or string type) or a Term otherwise.

Examples

>>> arr = np.array([(100,'blue'), (0, 'red')], dtype=
...                [('awesomeness','i'), ('shirt','S7')])
>>> teams = fromrec(arr)
>>> is_term(teams['awesomeness'])
True
>>> is_factor(teams['shirt'])
True
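The classification rule can be sketched in plain Python. This is an illustration under simplifying assumptions: the hypothetical `classify_fields` helper inspects Python values rather than NumPy dtypes, handles only the string case (not object fields), and returns the labels as strings instead of constructing Factor or Term objects.

```python
def classify_fields(records, field_names):
    """Label each field 'Factor' or 'Term' (hypothetical helper, illustration only).

    String-valued fields are treated as Factors and everything else as Terms.
    """
    labels = {}
    for index, name in enumerate(field_names):
        column = [record[index] for record in records]
        is_string = all(isinstance(value, (str, bytes)) for value in column)
        labels[name] = "Factor" if is_string else "Term"
    return labels


records = [(100, "blue"), (0, "red")]
labels = classify_fields(records, ["awesomeness", "shirt"])
print(labels)
```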

formula.parts.getparams(expression)¶

Return the parameters of an expression that are not Term instances but are instances of sympy.Symbol.

Examples

>>> from formula import terms, Formula
>>> x, y, z = terms('x, y, z')
>>> f = Formula([x,y,z])
>>> getparams(f)
[]
>>> f.mean
_b0*x + _b1*y + _b2*z
>>> getparams(f.mean)
[_b0, _b1, _b2]
>>>
>>> th = sympy.Symbol('theta')
>>> f.mean*sympy.exp(th)
(_b0*x + _b1*y + _b2*z)*exp(theta)
>>> getparams(f.mean*sympy.exp(th))
[theta, _b0, _b1, _b2]

formula.parts.getterms(expression)¶

Return all instances of Term in an expression.

Examples

>>> from formula import terms, Formula
>>> x, y, z = terms('x, y, z')
>>> f = Formula([x,y,z])
>>> getterms(f)
[x, y, z]
>>> getterms(f.mean)
[x, y, z]
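`getterms` and `getparams` split the symbols of an expression by type. The toy classes below are hypothetical stand-ins (they are neither sympy nor `formula` classes) and only illustrate partitioning with `isinstance`; the ordering of the real functions' results is not modelled.

```python
class ToySymbol:
    """Hypothetical stand-in for a generic symbol such as a parameter."""

    def __init__(self, name):
        self.name = name


class ToyTerm(ToySymbol):
    """Hypothetical stand-in for a Term: a symbol that is also a model term."""


def split_symbols(symbols):
    """Return (terms, params): ToyTerm instances and all other symbols."""
    terms = [s.name for s in symbols if isinstance(s, ToyTerm)]
    params = [s.name for s in symbols if not isinstance(s, ToyTerm)]
    return terms, params


symbols = [ToySymbol("_b0"), ToyTerm("x"), ToySymbol("_b1"), ToyTerm("y")]
terms, params = split_symbols(symbols)
print(terms)
print(params)
```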

formula.parts.is_factor(obj)¶: Is obj a Factor?

formula.parts.is_factor_term(obj)¶: Is obj a FactorTerm?

formula.parts.is_term(obj)¶: Is obj a Term?

formula.parts.stratify(factor, variable)¶

Create a new variable, stratified by the levels of a Factor.

Parameters :

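factor : Factor

the Factor whose levels define the strata

variable : str or simple sympy expression

a string, or a simple sympy expression whose string representation consists only of lower or upper case letters, i.e. it can be interpreted as a name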

Returns :

formula : Formula

Formula whose mean has one parameter named _variable%d, for each level in factor.levels

Examples

>>> f = Factor('a', ['x','y'])
>>> sf = stratify(f, 'theta')
>>> sf.mean
_theta0*a_x + _theta1*a_y
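The naming scheme in this example can be reproduced with string formatting. The sketch below only builds the text of the mean; `stratified_mean_text` is a hypothetical helper, and the `factor_level` column naming is inferred from the `a_x`, `a_y` output above.

```python
def stratified_mean_text(factor_name, levels, variable):
    """Build the textual form of a stratified mean (hypothetical helper)."""
    pieces = [
        "_%s%d*%s_%s" % (variable, index, factor_name, level)
        for index, level in enumerate(levels)
    ]
    return " + ".join(pieces)


text = stratified_mean_text("a", ["x", "y"], "theta")
print(text)
```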
