Patterns in static

Apophenia

apop_regression.c File Reference

Typedefs

typedef const char * ccp
 

Functions

void apop_estimate_parameter_tests (apop_model *est)
 
gsl_vector * apop_vector_unique_elements (const gsl_vector *v)
 
apop_data * apop_text_unique_elements (const apop_data *d, size_t col)

apop_data * apop_data_to_dummies (apop_data *d, int col, char type, int keep_first, char append, char remove)

apop_data * apop_data_to_factors (apop_data *data, char intype, int incol, int outcol)

apop_data * apop_data_get_factor_names (apop_data *data, int col, char type)

apop_data * apop_text_to_factors (apop_data *d, size_t textcol, int datacol)

apop_data * apop_estimate_coefficient_of_determination (apop_model *m)
 

Detailed Description

Generally, if a function assumes something is Normally distributed, it's here.

Function Documentation

apop_data* apop_data_get_factor_names (apop_data *data, int col, char type)

Factor names are stored in an auxiliary table with a name like "<categories for your_var>". Producing this name is annoying (and prevents us from eventually making it human-language independent), so use this function to get the list of factor names.

Parameters
data: The data set. (No default; must not be NULL.)
col: The column in the main data set whose name I'll use to look up the factor name list. Use -1 for the vector. (default=0)
type: If you are referring to a text column, use 't'. (default='d')
Returns
A pointer to the page in the data set with the given factor names.
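To show what this function saves you from, here is a sketch of building the page name by hand. It assumes the name is exactly "<categories for " plus the column heading plus ">", as described above; factor_page_name is a hypothetical helper, not part of Apophenia, and in real code you should call apop_data_get_factor_names instead.

```c
#include <stdio.h>

/* Hypothetical helper (not part of Apophenia): writes the page name
   "<categories for COLUMN>" into out. Like snprintf, it returns the
   length the full name needs, so a result >= size means truncation. */
int factor_page_name(char *out, size_t size, const char *column_name) {
    return snprintf(out, size, "<categories for %s>", column_name);
}
```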

apop_data* apop_data_to_dummies (apop_data *d, int col, char type, int keep_first, char append, char remove)

A utility to make a matrix of dummy variables. You give me a single column that lists the category of each item, and I'll produce a matrix with a single one in each row, in the column corresponding to that row's category.

After that, you have to decide what to do with the new matrix and the original data column.

  • You can manually join the dummy data set with your main data, e.g.:
    apop_data *dummies = apop_data_to_dummies(main_regression_vars, .col=8, .type='t');
    apop_data_stack(main_regression_vars, dummies, 'c', .inplace='y');
  • The .remove='y' option specifies that I should use apop_data_rm_columns to remove the column used to generate the dummies. Implemented only for type=='d'.
  • By specifying .append='y' or .append='e', I will run the above two lines for you. Your apop_data pointer will not change, but its matrix element will be reallocated (via apop_data_stack).
  • By specifying .append='i', I will place the matrix of dummies in place, immediately after the data column you had specified. You will probably use this with .remove='y' to replace the single column with the new set of dummy columns. Bear in mind that if there are two or more dummy columns (which there probably are if you are bothering to use this function), the numbers of all columns after the inserted block will change.
  • If .append='i' and you asked for a text column, I will append the dummies to the end of the table, which is equivalent to .append='e'.
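The column renumbering under .append='i' is simple arithmetic. This sketch assumes col is a matrix column (not the vector) and that the dummies start right after it, or at its old position if .remove='y' deleted it; column_after_insert is a hypothetical helper, not an Apophenia function.

```c
/* Hypothetical helper (not part of Apophenia): the new index of original
   matrix column j after .append='i' inserts n_dummies columns right after
   column col. If removed is nonzero (.remove='y'), column col itself is
   gone and the function returns -1 for it. Assumes col >= 0. */
int column_after_insert(int j, int col, int n_dummies, int removed) {
    if (j < col)
        return j;                          /* earlier columns do not move */
    if (j == col)
        return removed ? -1 : j;           /* the column that was dummified */
    return j + n_dummies - (removed ? 1 : 0); /* later columns shift right */
}
```

For example, with col=5 and three dummy columns, original column 6 becomes column 9, or column 8 if the source column was removed.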
Parameters
d: The data set with the column to be dummified. (No default.)
col: The column number to be transformed; -1 means the vector. (default = 0)
type: 'd' means a data column, 't' a text column. (default = 't')
keep_first: If zero, return a matrix in which a row of category k has its one in column k-1. That is, the zeroth category is dropped (its rows are all zeros), the first category has its entry in column zero, et cetera. If you don't know why this is useful, then this is what you need. If you know what you're doing and need something special, set this to one and the zeroth category won't be dropped. (default = 0)
append: If 'e' or 'y', append the dummy grid to the end of the original data matrix. If 'i', insert it in place, immediately after the original data column. (default = 'n')
remove: If 'y', remove the original data or text column. (default = 'n')
Returns
An apop_data set whose matrix element is the one-zero matrix of dummies. If you used .append, then this is your main data set, with the dummies in its matrix. Also, I add a page named "<categories for your_var>" giving a reference table of names and column numbers (where your_var is the appropriate column heading).
Exceptions
out->error=='a': allocation error
out->error=='d': dimension error
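The keep_first rule can be sketched in plain C. This is not Apophenia's implementation: it assumes the categories are already coded as integers 0 through n_categories-1, and it returns a bare row-major array instead of an apop_data set. make_dummies is a hypothetical name.

```c
#include <stdlib.h>

/* Sketch of dummy coding (not Apophenia's implementation).
   categories[i] is the category of row i, coded 0..n_categories-1.
   Returns a calloc'd row-major matrix with n_rows rows; the column count
   is stored in *n_cols_out. With keep_first == 0, category 0 is dropped
   (its rows stay all zero) and category k gets its one in column k-1.
   With keep_first != 0, category k gets its one in column k.
   Returns NULL on allocation failure; the caller frees the result. */
double *make_dummies(const int *categories, size_t n_rows,
                     int n_categories, int keep_first, int *n_cols_out) {
    int n_cols = keep_first ? n_categories : n_categories - 1;
    if (n_cols < 0)
        n_cols = 0;
    /* allocate at least one element so calloc never sees a zero size */
    size_t cells = n_rows * (size_t)n_cols;
    double *m = calloc(cells ? cells : 1, sizeof *m);
    if (!m)
        return NULL;
    for (size_t i = 0; i < n_rows; i++) {
        int c = keep_first ? categories[i] : categories[i] - 1;
        if (c >= 0 && c < n_cols)   /* skips the dropped category and bad codes */
            m[i * (size_t)n_cols + (size_t)c] = 1.0;
    }
    *n_cols_out = n_cols;
    return m;
}
```

Dropping category zero avoids a set of dummy columns that always sums to one, which would be collinear with a regression's constant term.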

apop_data* apop_data_to_factors (apop_data *data, char intype, int incol, int outcol)

Convert a column of text or numbers into a column of numeric factors, which you can use for a multinomial probit/logit, for example.

If you don't run this on your data first, apop_probit and apop_logit default to running it on the vector (or, if there is no vector, the zeroth column of the matrix) of the input apop_data set, because those models need a list of the unique values of the dependent variable.

Parameters
data: The data set to be modified in place. (No default. If NULL, returns NULL and a warning.)
intype: If 't', then incol refers to text; otherwise ('d' is a good choice) it refers to the vector or matrix. (default = 't')
incol: The column that will be converted (in the text if intype=='t', otherwise in the data); -1 is the vector. (default = 0)
outcol: The column in the data set where the numeric factors will be written; -1 means the vector. (default = 0)

For example:

apop_data *d = apop_query_to_mixed_data("mmt", "select 1, year, color from data");
apop_data_to_factors(d);

Notice that the query pulled a column of ones for the sake of saving room for the factors. With the defaults, the function reads column zero of the text (color) and writes the factors to column zero of the matrix, overwriting the placeholder ones.

Another example:

apop_data *d = apop_query_to_data("select type, year from data");
apop_data_to_factors(d, .intype='d', .incol=0, .outcol=0);

Here, the type column is converted to sequential integer factors and those factors overwrite the original data. Since a reference table is added as a second page of the apop_data set, you can recover the original values as needed.
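Here is a plain-C sketch of turning a numeric column into sequential integer factors. It assumes each distinct value is numbered by its position in the sorted list of distinct values (so the smallest value becomes factor 0); treat that ordering as an assumption rather than a documented guarantee. to_factors is hypothetical and works on a bare double array, not an apop_data set.

```c
#include <stdlib.h>

/* Sketch of numeric factor coding (not Apophenia's code). Assumption:
   each distinct value is numbered by its position among the sorted
   distinct values, so the smallest value becomes factor 0. NaNs are not
   handled. */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Writes one factor per element of in[0..n-1] into factors_out (which may
   be the same array as in) and returns the number of distinct values, or
   -1 if memory runs out. */
int to_factors(const double *in, size_t n, double *factors_out) {
    double *levels = malloc((n ? n : 1) * sizeof *levels);
    if (!levels)
        return -1;
    for (size_t i = 0; i < n; i++)
        levels[i] = in[i];
    qsort(levels, n, sizeof *levels, cmp_double);
    size_t k = 0;                       /* compact to distinct values */
    for (size_t i = 0; i < n; i++)
        if (k == 0 || levels[i] != levels[k - 1])
            levels[k++] = levels[i];
    for (size_t i = 0; i < n; i++) {    /* look up each value's level */
        const double *hit = bsearch(&in[i], levels, k, sizeof *levels, cmp_double);
        factors_out[i] = (double)(hit - levels);
    }
    free(levels);
    return (int)k;
}
```

Because each input element is read before its slot is written, in and factors_out may be the same array, mirroring the in-place overwrite described above. The sorted levels array plays the role of the reference table.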

Returns
A table of the factors used in the code. This is an apop_data set with only one column of text. Also, I add a page named "<categories for your_var>" giving a reference table of names and column numbers (where your_var is the appropriate column heading); use apop_data_get_factor_names to retrieve that table.
Exceptions
out->error=='a': allocation error.
out->error=='d': dimension error.
  • If the vector or matrix you wanted to write to is NULL, I will allocate it for you.
  • This function uses the Designated initializers syntax for inputs.

void apop_estimate_parameter_tests (apop_model *est)

For many, it is a knee-jerk reaction to a parameter estimation to test whether each individual parameter differs from zero. This function does that.

Parameters
est: The apop_model, which includes pre-calculated parameter estimates, var-covar matrix, and the original data set.

Returns nothing. At the end of the routine, est->info->more includes a set of t-test values: p-value, confidence (= 1 - p-value), t statistic, standard deviation, one-tailed p-value, and one-tailed confidence.
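The quantities listed above can be written out as formulas. This is a sketch of the standard t test of H0: β_i = 0, not a transcription of the library's code; in particular, the degrees of freedom df (for OLS, typically the number of observations minus the number of parameters) and the one-tailed convention are assumptions. Here \hat\beta_i is the i-th parameter estimate, \Sigma_{ii} is the i-th diagonal element of the var-covar matrix, and T_{df} is a t-distributed variable.

```latex
\begin{aligned}
s_i &= \sqrt{\Sigma_{ii}}, & t_i &= \frac{\hat\beta_i}{s_i},\\
p_i &= 2\,P\!\left(T_{df} > |t_i|\right), & \text{confidence}_i &= 1 - p_i,\\
p_i^{\text{1-tail}} &= P\!\left(T_{df} > |t_i|\right), & \text{1-tail confidence}_i &= 1 - p_i^{\text{1-tail}}.
\end{aligned}
```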


apop_data* apop_text_to_factors (apop_data *d, size_t textcol, int datacol)

Deprecated. Use apop_data_to_factors.

Convert a column of text in the text portion of an apop_data set into a column of numeric elements, which you can use for a multinomial probit, for example.

Parameters
d: The data set to be modified in place.
textcol: The column in the text that will be converted.
datacol: The column in the data set where the numeric factors will be written (-1 means the vector, which I will allocate for you if it is NULL).

For example:

apop_data *d = apop_query_to_mixed_data("mmt", "select 1, year, color from data");
apop_text_to_factors(d, 0, 0);

Notice that the query pulled a column of ones for the sake of saving room for the factors.

Returns
A table of the factors used in the code. This is an apop_data set with only one column of text. Also, the more element is a reference table of names and column numbers.
Exceptions
out->error=='d': dimension error.

apop_data* apop_text_unique_elements (const apop_data *d, size_t col)

Give me a column of text, and I'll give you a sorted list of the unique elements. This is basically running "select distinct * from datacolumn", but without the aid of the database.

Parameters
d: An apop_data set with a text component.
col: The text column you want me to use.
Returns
An apop_data set with a single sorted column of text, where each unique text input appears once.
See also
apop_vector_unique_elements
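The idea behind apop_text_unique_elements can be sketched with qsort and strcmp. This sketch works on a plain array of string pointers rather than an apop_data text column, sorts in byte order as strcmp defines it (an assumption about the library's collation), and unique_strings is a hypothetical name.

```c
#include <stdlib.h>
#include <string.h>

/* Sketch (not Apophenia's code): sorts an array of string pointers in
   strcmp order, moves the distinct strings to the front, and returns how
   many there are. Only the pointers move; no strings are copied or freed. */
static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

size_t unique_strings(const char **items, size_t n) {
    if (n == 0)
        return 0;
    qsort(items, n, sizeof *items, cmp_str);
    size_t k = 1;                       /* items[0] is always kept */
    for (size_t i = 1; i < n; i++)
        if (strcmp(items[i], items[k - 1]) != 0)
            items[k++] = items[i];
    return k;
}
```

Sorting first means duplicates end up adjacent, so one linear pass comparing each string with the last kept one is enough.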

gsl_vector* apop_vector_unique_elements (const gsl_vector *v)

Give me a vector of numbers, and I'll give you a sorted list of the unique elements. This is basically running "select distinct datacol from data order by datacol", but without the aid of the database.

Parameters
v: A vector of items.
Returns
a sorted vector of the distinct elements that appear in the input.
  • NaNs appear at the end of the sort order.
See also
apop_text_unique_elements
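A plain-C sketch of the same behavior for a bare double array (not a gsl_vector, and not Apophenia's code): sort with a comparator that puts NaNs last, then keep each value that differs from its predecessor. Collapsing all NaNs into a single trailing NaN is this sketch's assumption; the text above only says NaNs sort to the end. unique_doubles is a hypothetical name.

```c
#include <math.h>
#include <stdlib.h>

/* Sketch (not Apophenia's code): sort order with every NaN after every
   number; NaNs compare equal to one another. */
static int cmp_nan_last(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    if (isnan(x))
        return isnan(y) ? 0 : 1;
    if (isnan(y))
        return -1;
    return (x > y) - (x < y);
}

/* Sorts v[0..n-1] in place, moves the distinct values to the front, and
   returns their count. All NaNs collapse into one trailing NaN. */
size_t unique_doubles(double *v, size_t n) {
    if (n == 0)
        return 0;
    qsort(v, n, sizeof *v, cmp_nan_last);
    size_t k = 1;
    for (size_t i = 1; i < n; i++)
        if (cmp_nan_last(&v[i], &v[k - 1]) != 0) /* differs from last kept value */
            v[k++] = v[i];
    return k;
}
```

The comparator gives NaN a fixed place in the order; a plain `<` comparison would make qsort's behavior undefined, because NaN compares false against everything.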

Autogenerated by doxygen on Wed Oct 15 2014 (Debian 0.999b+ds3-2).