Patterns in static

Apophenia

apop_tests.c File Reference

Functions

apop_data * apop_t_test (gsl_vector *a, gsl_vector *b)

apop_data * apop_paired_t_test (gsl_vector *a, gsl_vector *b)

apop_data * apop_f_test (apop_model *est, apop_data *contrast)

apop_data * apop_test_anova_independence (apop_data *d)

apop_data * apop_anova (char *table, char *data, char *grouping1, char *grouping2)
 
double apop_test (double statistic, char *distribution, double p1, double p2, char tail)
 

Function Documentation

apop_data* apop_anova ( char *  table,
char *  data,
char *  grouping1,
char *  grouping2 
)

This function produces a traditional one- or two-way ANOVA table. It works from data in an SQL table, using queries of the form select data from table group by grouping1, grouping2.

Parameters
table: The table to be queried. Anything that can go in an SQL from clause is OK, so this can be a plain table name or a temp table specification like (select ... ), with parens.
data: The name of the column holding the count or other such data.
grouping1: The name of the first column by which to group data.
grouping2: If this is NULL, then the function will return a one-way ANOVA. Otherwise, the name of the second column by which to group data in a two-way ANOVA.
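
As a rough illustration of the query form described above, the sketch below assembles such a query string in plain C. The helper name build_anova_query is hypothetical and is not part of Apophenia; the queries the library actually issues may differ. It only shows how a NULL grouping2 switches between the one-way and two-way forms.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>   /* strcmp, handy when checking the result */

/* Hypothetical helper, not part of Apophenia. Writes a query of the form
   "select data from table group by grouping1[, grouping2]" into buf.
   Returns 0 on success, -1 if the query does not fit in buf. */
int build_anova_query(char *buf, size_t len, const char *table,
                      const char *data, const char *grouping1,
                      const char *grouping2)
{
    assert(buf && len > 0 && table && data && grouping1);
    int n;
    if (grouping2)   /* two-way: group by both columns */
        n = snprintf(buf, len, "select %s from %s group by %s, %s",
                     data, table, grouping1, grouping2);
    else             /* one-way: grouping2 is NULL */
        n = snprintf(buf, len, "select %s from %s group by %s",
                     data, table, grouping1);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Because the table argument is pasted in verbatim, a parenthesized subquery such as "(select * from t)" works in the same place as a plain table name.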

apop_data* apop_f_test ( apop_model *  est,
apop_data *  contrast 
)

Runs an F-test specified by ${\bf Q}$ and ${\bf c}$. Your best bet is to see the chapter on hypothesis testing in Modeling with Data, p 309. It will tell you that:

\[{N-K\over q} {({\bf Q}'\hat\beta - {\bf c})' [{\bf Q}' ({\bf X}'{\bf X})^{-1} {\bf Q}]^{-1} ({\bf Q}' \hat\beta - {\bf c}) \over {\bf u}' {\bf u} } \sim F_{q,N-K},\]

and that's what this function is based on.
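
To make the formula concrete, here is a minimal sketch for the special case of a single hypothesis ($q=1$), where ${\bf Q}$ is a vector and ${\bf Q}'({\bf X}'{\bf X})^{-1}{\bf Q}$ is a scalar, so no matrix inversion is needed beyond $({\bf X}'{\bf X})^{-1}$ itself. The function name and argument layout are illustrative assumptions, not Apophenia API, and this is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Sketch only: F statistic for a single linear hypothesis Q'beta = c (q = 1).
   xtx_inv is (X'X)^{-1} stored row-major as a k-by-k array, uu is u'u (the
   sum of squared residuals), and n is the number of observations.
   All names here are illustrative, not Apophenia API. */
double f_stat_one_contrast(size_t n, size_t k, const double *xtx_inv,
                           const double *beta, const double *Q,
                           double c, double uu)
{
    assert(n > k && uu > 0);
    double gap = -c;      /* accumulates Q'beta - c */
    double quad = 0;      /* accumulates Q'(X'X)^{-1}Q */
    for (size_t i = 0; i < k; i++) {
        gap += Q[i] * beta[i];
        for (size_t j = 0; j < k; j++)
            quad += Q[i] * xtx_inv[i * k + j] * Q[j];
    }
    assert(quad > 0);
    /* (N-K)/q * (Q'b - c)' [Q'(X'X)^{-1}Q]^{-1} (Q'b - c) / u'u, with q = 1 */
    return (double)(n - k) * gap * gap / (quad * uu);
}
```

The result would then be compared against the $F_{1,N-K}$ distribution, for example with an upper-tailed lookup.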

Parameters
est: An apop_model that you have already calculated. (No default)
contrast: The matrix ${\bf Q}$ and the vector ${\bf c}$, where each row represents a hypothesis. (Defaults: if the matrix is NULL, it is set to the identity matrix with the top row missing. If the vector is NULL, it is set to a zero vector of length equal to the height of the contrast matrix. Thus, if the entire apop_data set is NULL or omitted, we are testing the hypothesis that all but $\beta_1$ are zero.)
Returns
An apop_data set with a few variants on the confidence with which we can reject the joint hypothesis.
Todo:
There should be a way to get OLS and GLS to store $(X'X)^{-1}$. In fact, if you did GLS, this is invalid, because you need $(X'\Sigma X)^{-1}$, and I didn't ask for $\Sigma$.
  • There are two approaches to an $F$-test: the ANOVA approach, which is typically built around the claim that all effects but the mean are zero; and the more general regression form, which allows for any set of linear claims about the data. If you send a NULL contrast set, I will generate the set of linear contrasts that are equivalent to the ANOVA-type approach. Readers of Modeling with Data, note that there's a bug in the book, which claims that the traditional ANOVA approach also checks that the coefficient for the constant term is zero; this is not the custom and doesn't produce the equivalence presented in that and other textbooks.
Exceptions
out->error='a': Allocation error.
out->error='d': Dimension-matching error.
out->error='i': Matrix inversion error.
out->error='m': GSL math error.

double apop_test ( double  statistic,
char *  distribution,
double  p1,
double  p2,
char  tail 
)

This is a convenience function to do the lookup of a given statistic along a given distribution. You give me a statistic, its (hypothesized) distribution, and whether to use the upper tail, lower tail, or both. I will return the probability of a Type I error given the model (in statistician jargon, the $p$-value). [Type I error: rejecting the null hypothesis when it is true.]

For example,

apop_test(1.3);

will return the density of the standard Normal distribution that is more than 1.3 from zero. If this function returns a small value, we can be confident that the statistic is significant. Or,

apop_test(1.3, "t", 10, .tail='u');

will give the appropriate odds for an upper-tailed test using the $t$-distribution with 10 degrees of freedom (e.g., a $t$-test of the null hypothesis that the statistic is less than or equal to zero).

Several more distributions are supported; see below.

  • For a two-tailed test (the default), this returns the density outside the range. I'll only do this for symmetric distributions.
  • For an upper-tail test ('u'), this returns the density above the cutoff.
  • For a lower-tail test ('l'), this returns the density below the cutoff.
Parameters
statistic: The scalar value to be tested.
distribution: The name of the distribution; see below.
p1: The first parameter for the distribution; see below.
p2: The second parameter for the distribution; see below.
tail: 'u' = upper tail; 'l' = lower tail; anything else = two-tailed. (default = two-tailed)
Returns
The probability of a Type I error given the model (the $p$-value).

Here is a list of distributions you can use, and their parameters.

"normal" or "gaussian"

  • p1=mu, p2=sigma
  • default (0, 1)

"lognormal"

  • p1=mu, p2=sigma
  • default (0, 1)
  • Remember, mu and sigma refer to the underlying Normal, that is, the distribution of the log of the variable
  • One-tailed tests only

"uniform"

  • p1=lower edge, p2=upper edge
  • default (0, 1)
  • two-tailed tests are run relative to the center, (p1+p2)/2.
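
One way to read "relative to the center" is sketched below: the two-tailed value is the mass of the Uniform that lies farther from the midpoint (p1+p2)/2 than the statistic does. This reading is an assumption made for illustration, the function name is hypothetical, and Apophenia's own code may treat edge cases differently.

```c
#include <assert.h>

/* Sketch under an assumption: "two-tailed relative to the center" is read
   here as the mass of Uniform[lo, hi] lying farther from the midpoint than
   the statistic does. Apophenia's own code may differ in edge cases. */
double uniform_two_tailed(double x, double lo, double hi)
{
    assert(hi > lo);
    double center = (lo + hi) / 2.0;
    double dist = x > center ? x - center : center - x;
    double inside = 2.0 * dist / (hi - lo);   /* mass within dist of center */
    return inside >= 1.0 ? 0.0 : 1.0 - inside;
}
```

Under this reading, a statistic at the center gives 1, and a statistic at or beyond either edge gives 0.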

"t"

  • p1=df
  • no default

"chi squared", "chi", "chisq"

  • p1=df
  • no default
  • One-tailed tests only; default='u' ( $p$-value for typical cases)

"f"

  • p1=df1, p2=df2
  • no default
  • One-tailed tests only

Autogenerated by doxygen on Wed Oct 15 2014 (Debian 0.999b+ds3-2).