Patterns in static

Apophenia

The GSL's histograms and Apophenia's PMFs.

Functions

apop_model * apop_model_to_pmf (apop_model *model, apop_data *binspec, long int draws, int bin_count, gsl_rng *rng)

apop_data * apop_histograms_test_goodness_of_fit (apop_model *observed, apop_model *expected)

apop_data * apop_test_kolmogorov (apop_model *m1, apop_model *m2)

Detailed Description

Function Documentation

apop_data* apop_histograms_test_goodness_of_fit (apop_model *observed, apop_model *expected)

Test the goodness-of-fit between two apop_pmf models.

If you send two histograms, I assume that the histograms are synced: for PMFs, you've used apop_data_to_bins to generate two histograms using the same binspec, or you've used apop_data_pmf_compress to guarantee that each observation value appears exactly once in each data set.

In any case, I assume that all values in the observed set appear in the expected set with nonzero weight; otherwise this will return a $\chi^2$ statistic of GSL_POSINF, indicating that it is impossible for the observed data to have been drawn from the expected distribution.

  • If an observation row has weight zero, I skip it. If apop_opts.verbose >= 1, I will show a warning.
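
The rules above can be illustrated with a minimal sketch in plain C. This is not Apophenia's implementation: the function name chi_squared_stat is hypothetical, and it assumes that the observed histogram holds counts and the expected histogram holds probabilities for the same bins in the same order. It follows the two rules stated above (skip zero-weight observations; return infinity when an observed value has zero expected weight). Note that skipping empty observed bins differs from the textbook Pearson sum, in which an empty bin still contributes its expected count.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Chi-squared statistic for two synced histograms (hypothetical helper).
   observed[i]: count in bin i.  expected[i]: probability of bin i.
   Both arrays describe the same bins in the same order. */
double chi_squared_stat(const double *observed, const double *expected, size_t bins){
    assert(observed && expected);
    double n = 0;                        /* total observed count */
    for (size_t i = 0; i < bins; i++) n += observed[i];

    double stat = 0;
    for (size_t i = 0; i < bins; i++){
        if (observed[i] == 0) continue;          /* zero-weight row: skip it */
        if (expected[i] == 0) return INFINITY;   /* observed value is impossible under expected */
        double e = n * expected[i];              /* expected count for this bin */
        double diff = observed[i] - e;
        stat += diff * diff / e;
    }
    return stat;
}
```

A perfectly matching pair of histograms gives a statistic of zero, and any observed mass in a bin with zero expected weight gives infinity, which corresponds to the GSL_POSINF case described above.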

apop_model* apop_model_to_pmf (apop_model *model, apop_data *binspec, long int draws, int bin_count, gsl_rng *rng)

Make random draws from an apop_model, and bin them using a binspec in the style of apop_data_to_bins. If you have a data set that used the same binspec, you now have synced histograms, which you can plot or sensibly test hypotheses about.

The output is normalized to integrate to one.

Parameters
binspec: A description of the bins in which to place the draws; see apop_data_to_bins. (default: as in apop_data_to_bins.)
model: The model to be drawn from. Because this function works via random draws, the model needs to have a draw method. (No default)
draws: The number of random draws to make. (arbitrary default = 10,000)
bin_count: If no binspec is given, the number of bins to use. (default: as per apop_data_to_bins, $\sqrt{N}$)
rng: The gsl_rng used to make random draws. (default: an RNG from apop_rng_get_thread)
Returns
An apop_pmf model.
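
To make the draw-then-bin idea concrete, here is a minimal sketch in plain C. It is not how Apophenia handles a binspec: the function name bin_draws is hypothetical, and it assumes equal-width bins over a caller-supplied range [lo, hi]. The draws are passed in as an array so that any source of random numbers can be used. The resulting weights are normalized to sum to one, matching the normalization described above.

```c
#include <assert.h>
#include <stddef.h>

/* Place draws into bin_count equal-width bins over [lo, hi] and normalize
   the bin weights so they sum to one (hypothetical helper).
   Draws outside [lo, hi] are ignored. Returns the number of draws kept. */
size_t bin_draws(const double *draws, size_t n, double lo, double hi,
                 double *weights, size_t bin_count){
    assert(hi > lo && bin_count > 0);
    for (size_t b = 0; b < bin_count; b++) weights[b] = 0;

    double width = (hi - lo) / bin_count;
    size_t kept = 0;
    for (size_t i = 0; i < n; i++){
        if (draws[i] < lo || draws[i] > hi) continue;
        size_t b = (size_t)((draws[i] - lo) / width);
        if (b >= bin_count) b = bin_count - 1;   /* a draw equal to hi goes in the last bin */
        weights[b] += 1;
        kept++;
    }
    /* Normalize counts to weights that sum to one. */
    for (size_t b = 0; kept > 0 && b < bin_count; b++) weights[b] /= kept;
    return kept;
}
```

Two data sets binned with the same lo, hi, and bin_count share the same bins, which is the sense in which histograms built from one binspec are synced.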

apop_data* apop_test_kolmogorov (apop_model *m1, apop_model *m2)

Run the Kolmogorov-Smirnov test to determine whether two distributions are identical.

Parameters
m1: A sorted PMF model, i.e., a model estimated via something like
    apop_model *m1 = apop_estimate(apop_data_sort(input_data), apop_pmf);
m2: Another apop_model. If it is a PMF, then I will use a two-sample test, which is different from the one-sample test used if this is not a PMF.
Returns
An apop_data set including the $p$-value from the Kolmogorov-Smirnov test that the two distributions are equal.
Exceptions
out->error='m': Model error: m1 is not an apop_pmf. I verify this by checking whether m1->cdf == apop_pmf->cdf.
  • If you are using an apop_pmf model, the data set(s) must be sorted before you call this. See apop_data_sort and the discussion of CDFs in the apop_pmf documentation. If you don't do this, the test will almost certainly reject the null hypothesis that m1 and m2 are identical.
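
To show why sorting matters, here is a minimal sketch in plain C of the two-sample Kolmogorov-Smirnov statistic, the largest gap between two empirical CDFs. It is an illustration only, not Apophenia's code: the function name ks_two_sample_stat is hypothetical, and it computes only the statistic, not the p-value. The merge-style walk is valid only when both arrays are in ascending order, which is the same requirement placed on the PMF data above.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Two-sample K-S statistic (hypothetical helper): the largest absolute
   difference between the empirical CDFs of a and b.
   Both arrays must already be sorted in ascending order. */
double ks_two_sample_stat(const double *a, size_t na, const double *b, size_t nb){
    assert(na > 0 && nb > 0);
    size_t i = 0, j = 0;
    double d = 0;
    while (i < na && j < nb){
        /* Next point at which either empirical CDF jumps. */
        double x = (a[i] <= b[j]) ? a[i] : b[j];
        while (i < na && a[i] <= x) i++;   /* step past every value <= x, ties included */
        while (j < nb && b[j] <= x) j++;
        double gap = fabs((double)i / na - (double)j / nb);
        if (gap > d) d = gap;
    }
    /* Once one sample is exhausted its CDF is 1, so the gap can only shrink. */
    return d;
}
```

Two identical samples give a statistic of 0, and two samples with no overlap give 1. If the inputs are not sorted, the walk no longer traces the empirical CDFs, so the result is not a valid K-S statistic.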

Here is an example, which tests whether a set of draws from a Normal(0, 1) matches a sequence of Normal distributions with increasing mean.

#include <apop.h>
#include <assert.h>
#include <stdio.h>

//This program finds the p-value of a K-S test between
//500 draws from a N(0, 1) and a N(x, 1), where x grows from 0 in steps of 0.2.

//Draw size observations from m1 and estimate a sorted PMF from them.
apop_model * model_to_pmfs(apop_model *m1, int size){
    apop_data *outd1 = apop_model_draws(m1, size);
    return apop_estimate(apop_data_sort(outd1), apop_pmf);
}

//Print the results table unless compiled with Testing defined.
#ifndef Testing
#define cprintf(...) printf(__VA_ARGS__)
#else
#define cprintf(...)
#endif

int main(){
    //The baseline model, a N(0, 1), and a sorted PMF of 500 draws from it.
    apop_model *n1 = apop_model_set_parameters(apop_normal, 0, 1);
    apop_model *pmf1 = model_to_pmfs(n1, 5e2);
    apop_data *ktest;

    //first, there should be zero divergence between a PMF and itself:
    apop_model *pmf2 = apop_model_copy(pmf1);
    ktest = apop_test_kolmogorov(pmf1, pmf2);
    double pval = apop_data_get(ktest, .rowname="p value, 2 tail");
    assert(pval > .999);

    //as the mean m drifts, the pval for a comparison
    //between a N(0, 1) and N(m, 1) gets smaller.
    cprintf("mean\tpval\n");
    double prior_pval = 18; //any starting value above 1 works
    for(double i=0; i<= .6; i+=0.2){
        //one-sample test: the PMF against a parameterized N(i, 1)
        apop_model *n11 = apop_model_set_parameters(apop_normal, i, 1);
        ktest = apop_test_kolmogorov(pmf1, n11);
        double pval = apop_data_get(ktest, .rowname="p value, 2 tail");
        assert(pval < prior_pval);
        cprintf("%g\t%g\n", i, pval);
        prior_pval = pval;
    }
}

Autogenerated by doxygen on Wed Oct 15 2014 (Debian 0.999b+ds3-2).