SQLite lets users define new functions for use in queries, and Apophenia uses this facility to define a few common functions.
select ran() from table
will produce a new random number between zero and one for every row of the input table, using gsl_rng_uniform
.
- The SQL standard includes the
count(x)
and avg(x)
aggregators, but statisticians are usually interested in higher moments as well—at least the variance. Therefore, SQL queries using the Apophenia library may include any of these moments:
select count(x), stddev(x), avg(x), var(x), variance(x), skew(x), kurt(x), kurtosis(x),
std(x), stddev_samp(x), stddev_pop(x), var_samp(x), var_pop(x)
from table
group by whatever
var
and variance
; kurt
and kurtosis
do the same thing. Choose the one that sounds better to you. var
, var_samp
, stddev
and stddev_samp
give sample variance/standard deviation; variance
, var_pop
std
and stddev_pop
give population standard deviation. The plethora of variants are for mySQL compatibility.
- The var/skew/kurtosis functions calculate sample moments, so if you want the population moment, multiply the result by (n-1)/n .
- Also provided: wrapper functions for standard math library functions—
sqrt(x)
, pow(x,y)
, exp(x)
, log(x)
, and trig functions. They call the standard math library function of the same name to calculate
,
,
,
,
,
, et cetera.
- The
ran()
function calls gsl_rng_uniform
to produce a uniform draw between zero and one. It keeps its own gsl_rng
, which is intialized on first call using the value of apop_ots.rng_seed
(which is then incremented, so the next function to use it will get a different seed).
select sqrt(x), pow(x,0.5), exp(x), log(x),
sin(x), cos(x), tan(x), asin(x), acos(x), atan(x)
from table
Here is a test script using many of the above.
#define Diff(L, R) assert(fabs((L)-(R)<1e-4));
#define getrow(rowname) apop_data_get(row, .colname=#rowname)
Diff(gsl_pow_2(getrow(root)), getrow(rr))
Diff(getrow(ln), getrow(l10)*log(10))
Diff(getrow(rr), getrow(rragain))
Diff(getrow(one), 1)
return 0;
}
int main(){
"insert into a values(1); "
"insert into a values(1); "
"create table randoms as "
"select ran() as rr "
"from a,a,a,a,a,a,a,a,a,a,a,a,a;");
"select rr, sqrt(rr) as root, "
"log(rr) as ln, log10(rr) as l10, "
"exp(log(rr)) as rragain, "
"pow(sin(rr),2)+pow(cos(rr),2) as one "
"from randoms");
Diff(
apop_var(rrow)*8191./8192., 1/12. );
sqrt(1/12.)*8192./8191);
}
Here is some more realistic sample code:
int main(){
for (int i=0; i< 1e5; i++)
"select a, pow(a, 2) as sq, pow(a, 0.5) as sqrt "
"from atab");
double matrix_pop_stddev = sqrt(
apop_data_get(cov)*(d->matrix->size1/(d->matrix->size1-1.)));
assert(fabs(db_pop_stddev - matrix_pop_stddev) < 1e-4);
double actual_stddev = sqrt(2*gsl_pow_3(.5)/3);
assert(fabs(db_pop_stddev - actual_stddev) < 1e-3);
float actual_sq_mean = 1./3;
assert(fabs(sq_mean - actual_sq_mean) < 1e-3);
float actual_sqrt_mean = 2./3;
assert(fabs(sqrt_mean - actual_sqrt_mean) < 1e-3);
}