Parts of formulae: Terms, Factors, etc
A qualitative variable in a regression model
A Factor is similar to R’s factor. The levels of the Factor can be either strings or ints.
Methods
fromcol, get_term
The drop_reference formula: a binary column for each level of the factor except self.reference.
Create a Factor from a column array.
Parameters:
    col : ndarray
    name : str
Returns:
    factor : Factor
Examples
>>> data = np.array([(3,'a'),(4,'a'),(5,'b'),(3,'b')], np.dtype([('x', float), ('y', 'S1')]))
>>> f1 = Factor.fromcol(data['y'], 'y')
>>> f2 = Factor.fromcol(data['x'], 'x')
>>> d = f1.formula.design(data)
>>> print(d.dtype.descr)
[('y_a', '<f8'), ('y_b', '<f8')]
>>> d = f2.formula.design(data)
>>> print(d.dtype.descr)
[('x_3', '<f8'), ('x_4', '<f8'), ('x_5', '<f8')]
Retrieve a term of the Factor...
The indicator formula: a binary column for each level of the factor.
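To make the difference between the indicator and drop_reference codings concrete, here is a minimal pure-Python sketch that does not use this package. The helper names `indicator_columns` and `drop_reference_columns` are hypothetical, and the `name_level` column naming simply mirrors the `y_a`, `y_b` fields shown in the fromcol example.

```python
def indicator_columns(name, values, levels):
    """One 0/1 column per level: 1.0 where the value equals that level."""
    return {
        "%s_%s" % (name, level): [1.0 if v == level else 0.0 for v in values]
        for level in levels
    }


def drop_reference_columns(name, values, levels, reference):
    """Same coding, but without the column for the reference level."""
    kept = [level for level in levels if level != reference]
    return indicator_columns(name, values, kept)


# Hypothetical data: four observations of a two-level factor 'y'.
values = ["a", "a", "b", "b"]
full = indicator_columns("y", values, ["a", "b"])
reduced = drop_reference_columns("y", values, ["a", "b"], reference="a")
print(full)
print(reduced)
```

Dropping one level is the usual way to avoid exact collinearity between the indicator columns and an intercept column, since the full set of indicators sums to a column of ones.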
Boolean Term derived from a Factor.
Its properties are the same as a Term except that its product with itself is itself.
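The idempotent product has a simple numerical reading: a FactorTerm stands for a 0/1 column, and both 0*0 = 0 and 1*1 = 1. A quick check in plain Python, using made-up data rather than this package:

```python
# Hypothetical 0/1 indicator column for one level of a factor.
a_x = [1, 0, 1, 1, 0]

# Elementwise product of the column with itself.
a_x_squared = [v * v for v in a_x]

# The squared column is identical to the original column.
print(a_x_squared == a_x)
```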
Methods
apart, args_cnc, as_base_exp, as_coeff_Mul, as_coeff_add, as_coeff_exponent, as_coeff_factors, as_coeff_mul, as_coeff_terms, as_coefficient, as_dummy, as_expr, as_independent, as_leading_term, as_numer_denom, as_ordered_factors, as_ordered_terms, as_poly, as_powers_dict, as_real_imag, as_terms, atoms, cancel, class_key, coeff, collect, combsimp, compare, compare_pretty, compute_leading_term, conjugate, could_extract_minus_sign, count, count_ops, diff, doit, dummy_eq, evalf, expand, extract_additively, extract_multiplicatively, factor, find, fromiter, getO, getn, has, integrate, invert, is_hypergeometric, is_polynomial, is_rational_function, iter_basic_args, leadterm, limit, lseries, match, matches, n, normal, nseries, nsimplify, powsimp, radsimp, ratsimp, refine, removeO, replace, rewrite, separate, series, simplify, sort_key, subs, together, trigsimp
A sympy.Symbol type to represent a term in a regression model
Terms can be added to other sympy expressions with the single convention that a term plus itself returns itself.
It is meant to emulate something on the right hand side of a formula in R. In particular, its name can be the name of a field in a recarray used to create a design matrix.
>>> t = Term('x')
>>> xval = np.array([(3,),(4,),(5,)], np.dtype([('x', float)]))
>>> f = t.formula
>>> d = f.design(xval)
>>> print(d.dtype.descr)
[('x', '<f8')]
>>> f.design(xval, return_float=True)
array([ 3., 4., 5.])
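One way to read the "term plus itself returns itself" convention is that a sum of terms behaves like a collection of design-matrix columns rather than an arithmetic sum, much as `y ~ x + x` in R describes the same model as `y ~ x`. The sketch below only illustrates that reading in plain Python; `add_terms` is a hypothetical helper and is not part of this package or of sympy.

```python
def add_terms(*names):
    """Combine term names, keeping only the first occurrence of each."""
    combined = []
    for name in names:
        if name not in combined:
            combined.append(name)
    return combined


same = add_terms("x", "x")       # a term plus itself
different = add_terms("x", "y")  # two distinct terms
print(same)
print(different)
```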
Methods
apart, args_cnc, as_base_exp, as_coeff_Mul, as_coeff_add, as_coeff_exponent, as_coeff_factors, as_coeff_mul, as_coeff_terms, as_coefficient, as_dummy, as_expr, as_independent, as_leading_term, as_numer_denom, as_ordered_factors, as_ordered_terms, as_poly, as_powers_dict, as_real_imag, as_terms, atoms, cancel, class_key, coeff, collect, combsimp, compare, compare_pretty, compute_leading_term, conjugate, could_extract_minus_sign, count, count_ops, diff, doit, dummy_eq, evalf, expand, extract_additively, extract_multiplicatively, factor, find, fromiter, getO, getn, has, integrate, invert, is_hypergeometric, is_polynomial, is_rational_function, iter_basic_args, leadterm, limit, lseries, match, matches, n, normal, nseries, nsimplify, powsimp, radsimp, ratsimp, refine, removeO, replace, rewrite, separate, series, simplify, sort_key, subs, together, trigsimp
Return a Formula with only terms=[self].
Create Terms and Factors from structured array
We assume fields of type object and string are Factors, all others are Terms.
Parameters:
    recarr : ndarray
Returns:
    facterms : dict
Examples
>>> arr = np.array([(100,'blue'), (0, 'red')], dtype=
... [('awesomeness','i'), ('shirt','S7')])
>>> teams = fromrec(arr)
>>> is_term(teams['awesomeness'])
True
>>> is_factor(teams['shirt'])
True
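The classification rule can be sketched in plain Python on a single record. This is only an illustration under simplifying assumptions: `classify_fields` is a hypothetical helper, it looks at Python values rather than at the field types of a structured array, and it covers only the string case of the rule (not object-typed fields).

```python
def classify_fields(record):
    """Label a field 'factor' if its value is a string, otherwise 'term'."""
    return {
        name: "factor" if isinstance(value, (str, bytes)) else "term"
        for name, value in record.items()
    }


# Hypothetical record with one numeric and one string field.
row = {"awesomeness": 100, "shirt": "blue"}
kinds = classify_fields(row)
print(kinds)
```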
Return the parameters of an expression that are not Term instances but are instances of sympy.Symbol.
Examples
>>> from formula import terms, Formula
>>> x, y, z = terms('x, y, z')
>>> f = Formula([x,y,z])
>>> getparams(f)
[]
>>> f.mean
_b0*x + _b1*y + _b2*z
>>> getparams(f.mean)
[_b0, _b1, _b2]
>>>
>>> th = sympy.Symbol('theta')
>>> f.mean*sympy.exp(th)
(_b0*x + _b1*y + _b2*z)*exp(theta)
>>> getparams(f.mean*sympy.exp(th))
[theta, _b0, _b1, _b2]
Return all instances of Term in an expression.
Examples
>>> from formula import terms, Formula
>>> x, y, z = terms('x, y, z')
>>> f = Formula([x,y,z])
>>> getterms(f)
[x, y, z]
>>> getterms(f.mean)
[x, y, z]
Is obj a Factor?
Is obj a FactorTerm?
Is obj a Term?
Create a new variable, stratified by the levels of a Factor.
Parameters:
    variable : str or a simple sympy expression whose string representation
Returns:
    formula : Formula
Examples
>>> f = Factor('a', ['x','y'])
>>> sf = stratify(f, 'theta')
>>> sf.mean
_theta0*a_x + _theta1*a_y
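The shape of the stratified mean, one coefficient per factor level multiplying that level's indicator, can be reproduced as a string in plain Python. The helper `stratified_mean` is hypothetical and only mimics the naming pattern visible in the output above; it does not call this package.

```python
def stratified_mean(factor_name, levels, variable):
    """Build the string form of a mean with one coefficient per level."""
    pieces = [
        "_%s%d*%s_%s" % (variable, i, factor_name, level)
        for i, level in enumerate(levels)
    ]
    return " + ".join(pieces)


# Same names as in the example: factor 'a' with levels 'x' and 'y'.
mean = stratified_mean("a", ["x", "y"], "theta")
print(mean)
```

Because exactly one indicator is nonzero for each observation, such a mean evaluates to the coefficient of that observation's level.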