formula.utils

Utility routines

class formula.utils.Bomber(msg)

Raise exception for any attribute access
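A minimal sketch of how such a placeholder object can be written, assuming the class stores msg and raises from `__getattr__`. The name `BomberSketch` and the choice of `RuntimeError` are illustrative assumptions, not the library's actual implementation.

```python
class BomberSketch:
    """Hypothetical stand-in: raise on access to any attribute not set explicitly."""

    def __init__(self, msg):
        # Stored normally, so reading self.msg does not trigger __getattr__.
        self.msg = msg

    def __getattr__(self, name):
        # Called only when normal lookup fails, i.e. for every other attribute.
        # RuntimeError is an assumed exception type.
        raise RuntimeError(self.msg)


placeholder = BomberSketch("optional dependency is not available")
try:
    placeholder.some_function
except RuntimeError as err:
    caught = str(err)
```

An object like this can stand in for a module that failed to import, so the error surfaces only when the missing functionality is actually used.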

formula.utils.contrast_from_cols_or_rows(L, D, pseudo=None)

Construct a contrast matrix from a design matrix D

(possibly with its pseudo inverse already computed) and a matrix L that either specifies something in the column space of D or the row space of D.

Parameters :

L : ndarray

Matrix used to try and construct a contrast.

D : ndarray

Design matrix used to create the contrast.

pseudo : None or array-like, optional

If not None, gives pseudo-inverse of D. Allows you to pass this if it is already calculated.

Returns :

C : ndarray

Matrix with C.shape[1] == D.shape[1] representing an estimable contrast.

Notes

From an n x p design matrix D and a matrix L, tries to determine a p x q contrast matrix C which determines a contrast of full rank, i.e. the q x n matrix

dot(transpose(C), pinv(D))

is full rank.

L must satisfy either L.shape[0] == n or L.shape[1] == p.

If L.shape[0] == n, then L is thought of as representing columns in the column space of D.

If L.shape[1] == p, then L is thought of as what is known as a contrast matrix. In this case, this function returns an estimable contrast corresponding to dot(D, L.T).

This always produces a meaningful contrast, though not always one with the intended properties, because q is always non-zero unless L is identically 0. That is, it produces a contrast that spans the column space of L (after projection onto the column space of D).
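The two shape cases described above can be sketched with plain NumPy. This is only an illustration of how L is interpreted relative to D; the helper name `interpret_L` is hypothetical and the code does not reproduce the function's full computation of C.

```python
import numpy as np


def interpret_L(L, D):
    """Hypothetical helper: classify L and return vectors in the column space of D."""
    L = np.asarray(L, dtype=float)
    D = np.asarray(D, dtype=float)
    if L.ndim != 2:
        raise ValueError("L must be 2-dimensional")
    n, p = D.shape
    projector = D @ np.linalg.pinv(D)  # n x n projection onto the column space of D
    if L.shape[0] == n:
        # L holds columns; project them onto the column space of D.
        return "columns", projector @ L
    if L.shape[1] == p:
        # L is a contrast matrix; dot(D, L.T) already lies in the column space of D.
        return "rows", D @ L.T
    raise ValueError("L must satisfy L.shape[0] == n or L.shape[1] == p")


D = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])  # n = 4, p = 2
kind_rows, cols_from_rows = interpret_L(np.array([[1.0, -1.0]]), D)
kind_cols, cols_from_cols = interpret_L(D[:, :1], D)
```

Note that when n == p both conditions can hold; this sketch checks the column case first, which is an assumption about precedence.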

formula.utils.factor_codings(*factor_monomials)

Find which factors to code with indicator or contrast variables

Determine which factors to code with indicator variables (using len(factor.levels) columns of 0s and 1s) or contrast coding (using len(factor.levels)-1 columns). The elements of the sequence should be tuples of strings. Further, the factors are assumed to be in graded order, that is, [len(f) for f in factor_monomials] is assumed to be non-decreasing.

Notes

Even though the elements of factor_monomials are assumed to be in graded order, the final result depends on the ordering of the strings of the factors within each of the tuples.
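The graded-order assumption can be checked, or established, before calling the function. The sketch below uses only the standard library; `in_graded_order` is a hypothetical helper name, not part of formula.utils.

```python
def in_graded_order(factor_monomials):
    """Return True if the tuple lengths are non-decreasing."""
    lengths = [len(f) for f in factor_monomials]
    return all(a <= b for a, b in zip(lengths, lengths[1:]))


monomials = [('a', 'b', 'c'), ('b',), ('b', 'c'), ('a',)]

# sorted() is stable, so monomials of equal length keep their relative order.
graded = sorted(monomials, key=len)
```

Because the result depends on the order of factors of the same degree, a stable sort by length preserves whatever ordering the caller chose within each degree.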

Examples

>>> factor_codings(('b',), ('a',), ('b', 'c'), ('a','b','c'))
{('b', 'c'): [('b', 'indicator'), ('c', 'contrast')], ('a',): [('a', 'contrast')], ('b',): [('b', 'indicator')], ('a', 'b', 'c'): [('a', 'contrast'), ('b', 'indicator'), ('c', 'indicator')]}
>>> factor_codings(('a',), ('b',), ('b', 'c'), ('a','b','c'))
{('b', 'c'): [('b', 'indicator'), ('c', 'contrast')], ('a',): [('a', 'indicator')], ('b',): [('b', 'contrast')], ('a', 'b', 'c'): [('a', 'contrast'), ('b', 'indicator'), ('c', 'indicator')]}

Here is a version with debug strings to see what is happening:

>>> factor_codings(('a',), ('b', 'c'), ('a','b','c')) 
Adding a from ('a',) as indicator because we have not seen any factors yet.
Adding b from ('b', 'c') as indicator because set([('c',), ()]) is not a subset of set([(), ('a',)])
Adding c from ('b', 'c') as indicator because set([(), ('b',)]) is not a subset of set([(), ('a',)])
Adding a from ('a', 'b', 'c') as contrast because set([('c',), ('b', 'c'), (), ('b',)]) is a subset of set([('b', 'c'), (), ('c',), ('b',), ('a',)])
Adding b from ('a', 'b', 'c') as indicator because set([('c',), (), ('a', 'c'), ('a',)]) is not a subset of set([('b', 'c'), (), ('c',), ('b',), ('a',)])
Adding c from ('a', 'b', 'c') as indicator because set([('a', 'b'), (), ('b',), ('a',)]) is not a subset of set([('b', 'c'), (), ('c',), ('b',), ('a',)])
{('b', 'c'): [('b', 'indicator'), ('c', 'indicator')], ('a',): [('a', 'indicator')], ('a', 'b', 'c'): [('a', 'contrast'), ('b', 'indicator'), ('c', 'indicator')]}

formula.utils.fullrank(X, r=None)

Return a matrix whose column span is the same as that of X, computed using an SVD.

If the rank of X is known, it can be specified by r; no check is made to ensure that this really is the rank of X.
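One way to obtain such a matrix is to keep the leading left singular vectors of X. The following is a sketch under that assumption, not the library's code; `fullrank_sketch` is a hypothetical name, and the default tolerance is borrowed from the description of rank on this page.

```python
import numpy as np


def fullrank_sketch(X, r=None):
    """Return r orthonormal columns spanning the column space of X."""
    X = np.asarray(X, dtype=float)
    # Singular values come back sorted in descending order.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    if r is None:
        # Assumed default: count singular values above S.max() * eps.
        tol = s.max() * np.finfo(s.dtype).eps
        r = int(np.sum(s > tol))
    # As in the text above, a caller-supplied r is not checked.
    return U[:, :r]


# Third column is the sum of the first two, so the rank is 2.
X = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 2.0],
              [2.0, 0.0, 2.0]])
B = fullrank_sketch(X, r=2)
```

Here B has orthonormal columns, so `B @ B.T` projects onto the column space of X and leaves X unchanged.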

formula.utils.is_string_like(obj)

Return True if obj looks like a string

formula.utils.iterable(obj)

Return True if obj is iterable

formula.utils.make_dummy(name)

Make dummy variable of given name

Parameters :

name : str

name of dummy variable

Returns :

dum : Dummy instance

Notes

The interface to sympy's Dummy changed between sympy 0.6.7 and 0.7.0.

formula.utils.rank(M, tol=None)

Return matrix rank of array using SVD method

Rank of the array is the number of SVD singular values of the array that are greater than tol.

Parameters :

M : array_like

array of <=2 dimensions

tol : {None, float}

threshold below which SVD values are considered zero. If tol is None, and S is an array with singular values for M, and eps is the epsilon value for datatype of S, then tol is set to S.max() * eps.

formula.utils.rec_append_fields(rec, names, arrs, dtypes=None)

Return a new record array with field names populated with data from arrays in arrs. If appending a single field, then names, arrs and dtypes do not have to be lists. They can just be the values themselves.
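NumPy provides a function of the same name in numpy.lib.recfunctions with a matching argument order. The sketch below uses that NumPy function to illustrate the single-field case; whether formula.utils behaves identically in every detail is an assumption.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# A small record array with two fields.
rec = np.array([(1, 2.0), (3, 4.0)],
               dtype=[("x", int), ("y", float)]).view(np.recarray)

# Appending one field: the name and the data are passed directly, not as lists.
out = rfn.rec_append_fields(rec, "z", np.array([10, 20]))

new_names = out.dtype.names
```

The input array is left untouched; the appended field is available on the returned array as `out.z`.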

formula.utils.simplicial_complex(*simplices)

Take a list of simplices and compute the simplicial complex generated by these simplices, returning the faces of the complex grouped by size, its maximal simplices, and the set of all its simplices.

>>> faces, maximal, all = simplicial_complex([('a','b','c'), ('c','d'), ('e','f'), ('e',)])
>>> faces
{1: set([('a',), ('c',), ('b',), ('e',), ('d',), ('f',)]), 2: set([('b', 'c'), ('a', 'b'), ('e', 'f'), ('c', 'd'), ('a', 'c')]), 3: set([('a', 'b', 'c')])}
>>> maximal
[('a', 'b', 'c'), ('e', 'f'), ('c', 'd')]
>>> all
set([('b', 'c'), ('a',), ('c',), ('c', 'd'), ('e', 'f'), ('a', 'c'), ('d',), ('b',), ('f',), ('a', 'b'), ('e',), ('a', 'b', 'c')])
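The idea behind the example above can be sketched with itertools: every non-empty subset of a simplex is a face, and a face is maximal if no other face strictly contains it. The helper name `complex_sketch` is hypothetical, and its return value (two items rather than three) differs from the function documented here.

```python
from itertools import combinations


def complex_sketch(simplices):
    """Return (all faces, sorted maximal faces) of the generated complex."""
    all_faces = set()
    for simplex in simplices:
        vertices = tuple(sorted(simplex))
        # Every non-empty subset of the vertices is a face of the complex.
        for k in range(1, len(vertices) + 1):
            all_faces.update(combinations(vertices, k))
    # A face is maximal if it is not a proper subset of any other face.
    maximal = sorted(
        f for f in all_faces
        if not any(set(f) < set(g) for g in all_faces)
    )
    return all_faces, maximal


all_faces, maximal = complex_sketch(
    [('a', 'b', 'c'), ('c', 'd'), ('e', 'f'), ('e',)]
)
```

For this input the sketch finds the same 12 faces and the same three maximal simplices as the example, with the maximal simplices listed in sorted order.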