Parts of formulae: Terms, Factors, etc
A qualitative variable in a regression model
A Factor is similar to R’s factor. The levels of the Factor can be either strings or ints.
Methods
| fromcol | |
| get_term | |
The drop_reference formula: a binary column for each level of the factor except self.reference.
Create a Factor from a column array.
| Parameters : | col : ndarray; name : str |
|---|---|
| Returns : | factor : Factor |
Examples
>>> data = np.array([(3,'a'),(4,'a'),(5,'b'),(3,'b')], np.dtype([('x', float), ('y', 'S1')]))
>>> f1 = Factor.fromcol(data['y'], 'y')
>>> f2 = Factor.fromcol(data['x'], 'x')
>>> d = f1.formula.design(data)
>>> print(d.dtype.descr)
[('y_a', '<f8'), ('y_b', '<f8')]
>>> d = f2.formula.design(data)
>>> print(d.dtype.descr)
[('x_3', '<f8'), ('x_4', '<f8'), ('x_5', '<f8')]
Retrieve a term of the Factor...
The indicator formula: a binary column for each level of the factor.
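The difference between the indicator coding and the drop_reference coding can be sketched in plain Python. This is only an illustration of the two codings, not the library's implementation: the helper names `indicator_columns` and `drop_reference_columns` are hypothetical, and sorting the levels is an assumption. The column names follow the `name_level` pattern seen in the `fromcol` example.

```python
def indicator_columns(values, name):
    """Return one 0/1 column per distinct level, keyed as '<name>_<level>'."""
    levels = sorted(set(values))  # assumption: levels are taken in sorted order
    return {
        "%s_%s" % (name, level): [1.0 if v == level else 0.0 for v in values]
        for level in levels
    }


def drop_reference_columns(values, name, reference):
    """Same as the indicator coding, minus the column for the reference level."""
    columns = indicator_columns(values, name)
    del columns["%s_%s" % (name, reference)]
    return columns


y = ["a", "a", "b", "b"]
full = indicator_columns(y, "y")
reduced = drop_reference_columns(y, "y", reference="a")

print(sorted(full))     # ['y_a', 'y_b']
print(sorted(reduced))  # ['y_b']
```

Dropping one level is the usual way to keep the design matrix full rank when an intercept is also present, since the indicator columns of all levels sum to a column of ones.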
Boolean Term derived from a Factor.
Its properties are the same as a Term except that its product with itself is itself.
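The rule that the product of such a term with itself is itself mirrors what happens to the 0/1 column it stands for. A small numeric check, using a made-up column of levels rather than anything from the library:

```python
# Hypothetical column of factor levels and the 0/1 indicator for level 'a'.
y = ["a", "a", "b", "b"]
is_a = [1 if v == "a" else 0 for v in y]


def product(u, v):
    """Elementwise product of two equally long columns."""
    return [p * q for p, q in zip(u, v)]


# Squaring a 0/1 column leaves it unchanged, because 0*0 == 0 and 1*1 == 1.
squared = product(is_a, is_a)
print(squared == is_a)  # True
```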
Methods
| apart | |
| args_cnc | |
| as_base_exp | |
| as_coeff_Mul | |
| as_coeff_add | |
| as_coeff_exponent | |
| as_coeff_factors | |
| as_coeff_mul | |
| as_coeff_terms | |
| as_coefficient | |
| as_dummy | |
| as_expr | |
| as_independent | |
| as_leading_term | |
| as_numer_denom | |
| as_ordered_factors | |
| as_ordered_terms | |
| as_poly | |
| as_powers_dict | |
| as_real_imag | |
| as_terms | |
| atoms | |
| cancel | |
| class_key | |
| coeff | |
| collect | |
| combsimp | |
| compare | |
| compare_pretty | |
| compute_leading_term | |
| conjugate | |
| could_extract_minus_sign | |
| count | |
| count_ops | |
| diff | |
| doit | |
| dummy_eq | |
| evalf | |
| expand | |
| extract_additively | |
| extract_multiplicatively | |
| factor | |
| find | |
| fromiter | |
| getO | |
| getn | |
| has | |
| integrate | |
| invert | |
| is_hypergeometric | |
| is_polynomial | |
| is_rational_function | |
| iter_basic_args | |
| leadterm | |
| limit | |
| lseries | |
| match | |
| matches | |
| n | |
| normal | |
| nseries | |
| nsimplify | |
| powsimp | |
| radsimp | |
| ratsimp | |
| refine | |
| removeO | |
| replace | |
| rewrite | |
| separate | |
| series | |
| simplify | |
| sort_key | |
| subs | |
| together | |
| trigsimp | |
A sympy.Symbol type to represent a term in a regression model
Terms can be added to other sympy expressions with the single convention that a term plus itself returns itself.
It is meant to emulate something on the right hand side of a formula in R. In particular, its name can be the name of a field in a recarray used to create a design matrix.
>>> t = Term('x')
>>> xval = np.array([(3,),(4,),(5,)], np.dtype([('x', float)]))
>>> f = t.formula
>>> d = f.design(xval)
>>> print(d.dtype.descr)
[('x', '<f8')]
>>> f.design(xval, return_float=True)
array([ 3., 4., 5.])
Methods
| apart | |
| args_cnc | |
| as_base_exp | |
| as_coeff_Mul | |
| as_coeff_add | |
| as_coeff_exponent | |
| as_coeff_factors | |
| as_coeff_mul | |
| as_coeff_terms | |
| as_coefficient | |
| as_dummy | |
| as_expr | |
| as_independent | |
| as_leading_term | |
| as_numer_denom | |
| as_ordered_factors | |
| as_ordered_terms | |
| as_poly | |
| as_powers_dict | |
| as_real_imag | |
| as_terms | |
| atoms | |
| cancel | |
| class_key | |
| coeff | |
| collect | |
| combsimp | |
| compare | |
| compare_pretty | |
| compute_leading_term | |
| conjugate | |
| could_extract_minus_sign | |
| count | |
| count_ops | |
| diff | |
| doit | |
| dummy_eq | |
| evalf | |
| expand | |
| extract_additively | |
| extract_multiplicatively | |
| factor | |
| find | |
| fromiter | |
| getO | |
| getn | |
| has | |
| integrate | |
| invert | |
| is_hypergeometric | |
| is_polynomial | |
| is_rational_function | |
| iter_basic_args | |
| leadterm | |
| limit | |
| lseries | |
| match | |
| matches | |
| n | |
| normal | |
| nseries | |
| nsimplify | |
| powsimp | |
| radsimp | |
| ratsimp | |
| refine | |
| removeO | |
| replace | |
| rewrite | |
| separate | |
| series | |
| simplify | |
| sort_key | |
| subs | |
| together | |
| trigsimp | |
Return a Formula with only terms=[self].
Create Terms and Factors from a structured array
We assume fields of type object and string are Factors, all others are Terms.
| Parameters : | recarr : ndarray |
|---|---|
| Returns : | facterms : dict |
Examples
>>> arr = np.array([(100,'blue'), (0, 'red')], dtype=
... [('awesomeness','i'), ('shirt','S7')])
>>> teams = fromrec(arr)
>>> is_term(teams['awesomeness'])
True
>>> is_factor(teams['shirt'])
True
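The classification rule stated above (object and string fields become Factors, everything else becomes Terms) can be sketched without the library by inspecting NumPy-style type strings such as those returned by `dtype.descr`. The function name `classify_fields` is hypothetical, and treating the unicode kind `'U'` as a string field is an assumption of this sketch, not something the documentation states.

```python
def classify_fields(descr):
    """Map each field name to 'factor' or 'term' from (name, type string) pairs."""
    roles = {}
    for name, typestr in descr:
        # Strip the optional byte-order prefix to reach the kind character,
        # e.g. '<f8' -> 'f', '|S7' -> 'S', '|O' -> 'O'.
        kind = typestr.lstrip("<>|=")[0]
        # Assumption: 'U' (unicode) counts as a string kind alongside 'S'.
        roles[name] = "factor" if kind in ("O", "S", "U") else "term"
    return roles


descr = [("awesomeness", "<i4"), ("shirt", "|S7")]
roles = classify_fields(descr)
print(roles)  # {'awesomeness': 'term', 'shirt': 'factor'}
```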
Return the parameters of an expression that are not Term instances but are instances of sympy.Symbol.
Examples
>>> import sympy
>>> from formula import terms, Formula
>>> x, y, z = terms('x, y, z')
>>> f = Formula([x,y,z])
>>> getparams(f)
[]
>>> f.mean
_b0*x + _b1*y + _b2*z
>>> getparams(f.mean)
[_b0, _b1, _b2]
>>>
>>> th = sympy.Symbol('theta')
>>> f.mean*sympy.exp(th)
(_b0*x + _b1*y + _b2*z)*exp(theta)
>>> getparams(f.mean*sympy.exp(th))
[theta, _b0, _b1, _b2]
Return all instances of Term in an expression.
Examples
>>> from formula import terms, Formula
>>> x, y, z = terms('x, y, z')
>>> f = Formula([x,y,z])
>>> getterms(f)
[x, y, z]
>>> getterms(f.mean)
[x, y, z]
Is obj a Factor?
Is obj a FactorTerm?
Is obj a Term?
Create a new variable, stratified by the levels of a Factor.
| Parameters : | variable : str or a simple sympy expression whose string representation |
|---|---|
| Returns : | formula : Formula |
Examples
>>> f = Factor('a', ['x','y'])
>>> sf = stratify(f, 'theta')
>>> sf.mean
_theta0*a_x + _theta1*a_y
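To see what a mean such as `_theta0*a_x + _theta1*a_y` evaluates to, it may help to plug in numbers. The sketch below assumes `a_x` and `a_y` are the 0/1 indicators of the two levels; the parameter values and the helper `stratified_mean` are made up for illustration and are not part of the library.

```python
levels = ["x", "y"]            # levels of the factor 'a', as in the example
theta = {"x": 0.5, "y": 2.0}   # hypothetical values for _theta0 and _theta1


def stratified_mean(level):
    """Evaluate sum_i theta_i * a_i, where a_i is the 0/1 indicator of level i."""
    return sum(theta[l] * (1.0 if level == l else 0.0) for l in levels)


column = ["x", "y", "y", "x"]
means = [stratified_mean(v) for v in column]
print(means)  # [0.5, 2.0, 2.0, 0.5]
```

Each row picks up exactly the parameter belonging to its own level, which is the sense in which the variable is stratified by the factor.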