Patterns in static

Apophenia

apop_conversions.c File Reference

Data Structures

struct  line_parse_t
 
struct  apop_char_info
 

Macros

#define DbType   apop_opts.db_engine=='m' ? "text" : "character"
 
#define DbType2   apop_opts.db_engine=='m' ? "double" : "numeric"
 
#define Textrealloc(str, len)
 

Functions

void xprintf (char **q, char *format,...)
 
gsl_vector * apop_array_to_vector (double *in, int size)
 
gsl_matrix * apop_vector_to_matrix (const gsl_vector *in, char row_col)
 
apop_data * apop_db_to_crosstab (char *tabname, char *r1, char *r2, char *datacol)
 
void apop_crosstab_to_db (apop_data *in, char *tabname, char *row_col_name, char *col_col_name, char *data_col_name)
 
apop_data * apop_data_rank_compress (apop_data *in)
 
apop_data * apop_data_rank_expand (apop_data *in)
 
gsl_vector * apop_vector_copy (const gsl_vector *in)
 
gsl_matrix * apop_matrix_copy (const gsl_matrix *in)
 
apop_data * apop_text_to_data (char const *text_file, int has_row_names, int has_col_names, int const *field_ends, char const *delimiters)
 
void apop_data_unpack (const gsl_vector *in, apop_data *d, char use_info_pages)
 
gsl_vector * apop_data_pack (const apop_data *in, gsl_vector *out, char all_pages, char use_info_pages)
 
apop_data * apop_data_fill_base (apop_data *in, double ap[])
 
gsl_vector * apop_vector_fill_base (gsl_vector *in, double ap[])
 
apop_data * apop_text_fill_base (apop_data *data, char *text[])
 
char * prep_string_for_sqlite (int prepped_statements, char const *astring)
 
int apop_use_sqlite_prepared_statements (size_t col_ct)
 
int apop_prepare_prepared_statements (char const *tabname, size_t col_ct, sqlite3_stmt **statement)
 
char * cut_at_dot (char const *infile)
 
int apop_text_to_db (char const *text_file, char *tabname, int has_row_names, int has_col_names, char **field_names, int const *field_ends, apop_data *field_params, char *table_params, char const *delimiters, char if_table_exists)
 

Variables

char * apop_nul_string
 
sqlite3 * db
 

Detailed Description

The various functions to convert from one format to another.

Macro Definition Documentation

#define Textrealloc (   str,
  len 
)
Value:
(str) = \
(str) != apop_nul_string \
? realloc((str), (len)) \
: (((len) > 0) ? malloc(len) : apop_nul_string);

Function Documentation

void apop_crosstab_to_db ( apop_data *  in,
char *  tabname,
char *  row_col_name,
char *  col_col_name,
char *  data_col_name 
)

See apop_db_to_crosstab for the storyline; this is the complement, which takes a crosstab and writes its values to the database.

For example, I would take

      c0  c1
r0     2   3
r1     0   4

and do the following writes to the database:

insert into your_table values ('r0', 'c0', 2);
insert into your_table values ('r0', 'c1', 3);
insert into your_table values ('r1', 'c0', 0);
insert into your_table values ('r1', 'c1', 4);
  • If your data set does not have names (or not enough names), I will use the scheme above, filling in names of the form r0, r1, ... c0, c1, .... Text columns get their own numbering system, t0, t1, ..., which is a little more robust than continuing the column count from the matrix.
  • I handle only the matrix and text.
apop_data* apop_data_rank_compress ( apop_data *  in)

One often finds data where the column indicates the value of the data point. There may be two columns, and a mark in the first indicates a miss while a mark in the second is a hit. Or say that we have the following list of observations:

2 3 3 2 1 1 2 1 1 2 1 1

Then we could write this as:

0 1 2 3
----------
0 6 4 2

because there are six 1s observed, four 2s observed, and two 3s observed. We call this rank format, because 1 (or zero) is typically the most common, 2 is second most common, et cetera.

This function takes in a list of observations, and aggregates them into a single row in rank format.

  • You may be interested in apop_data_to_factors to convert real numbers or text into a matrix of categories.
  • The number of bins is simply the largest number found. So if there are bins {0, 1, 2} and your data set happens to consist of 0 0 1 1 0, then I won't know to generate results with three bins where the last bin has probability zero.
/* A round trip: generate Zipf-distributed draws, summarize them to a single list of
rankings, then expand the rankings to a list of single entries. The sorted list at the end
of this should be identical to the (sorted) original list. */
#include <apop.h>
#include <assert.h>

int main(){
    gsl_rng *r = apop_rng_alloc(2342);
    int i, length = 1e4;
    //A Zipf model with a fixed parameter; the exponent 3.2 is an assumed, illustrative value.
    apop_model *a_zipf = apop_model_set_parameters(apop_zipf, 3.2);
    apop_data *draws = apop_data_alloc(length);
    for (i=0; i< length; i++)
        apop_draw(apop_data_ptr(draws, i, -1), r, a_zipf);

    apop_data *by_rankings = apop_data_rank_compress(draws);
    //The first row of the matrix is suitable for plotting.
    //apop_data_show(by_rankings);
    assert(apop_matrix_sum(by_rankings->matrix) == length);

    //Expand back to one entry per observation, then compare the sorted lists.
    apop_data *re_expanded = apop_data_rank_expand(by_rankings);
    gsl_sort_vector(draws->vector);
    gsl_sort_vector(re_expanded->vector);
    assert(apop_vector_distance(draws->vector, re_expanded->vector) < 1e-5);
}
apop_data* apop_data_rank_expand ( apop_data *  in)

The complement to this is apop_data_rank_compress; see that function's documentation for the story and an example.

This function takes in a data set where the zeroth column includes the count(s) of times that zero was observed, the first gives the count(s) of times that one was observed, et cetera. It outputs a data set whose vector element includes a list that has exactly the given frequency of zeros, ones, et cetera.

apop_data* apop_db_to_crosstab ( char *  tabname,
char *  r1,
char *  r2,
char *  datacol 
)

Give the name of a table in the database and the names of three of its columns: the x-dimension, the y-dimension, and the data. The output is a 2D matrix with rows indexed by r1 and columns indexed by r2.

Parameters
tabname    The database table I'm querying. Anything that will work inside a from clause is OK, such as a subquery in parens.
r1         The column of the data set that will indicate the rows of the output crosstab.
r2         The column of the data set that will indicate the columns of the output crosstab.
datacol    The column of the data set holding the data for the cells of the crosstab.
  • If the query to get data to fill the table (select r1, r2, datacol from tabname) returns an empty data set, then I will return a NULL data set and, if apop_opts.verbosity >= 1, print a warning.
  • This setup presumes that there is one value for each (row, col) coordinate in the data. You may want an aggregate instead. There are two ways to do this, both of which exploit the fact that this function runs a simple select query to generate the data. One is to specify an ad hoc table to pull from:
apop_data * out = apop_db_to_crosstab("(select row, col, count(*) ct from base_data group by row, col)", "row", "col", "ct");

The other is to use the fact that the table name will be at the end of the query, so you can add conditions to the table:

apop_data * out = apop_db_to_crosstab("base_data group by row, col", "row", "col", "count(*)");
//which will expand to "select row, col, count(*) from base_data group by row, col"
See also
apop_crosstab_to_db
Exceptions
out->error='n'    Name not found error.
out->error='q'    Query returned an empty table (which might mean that it just failed).
char* prep_string_for_sqlite ( int  prepped_statements,
char const *  astring 
)

  • If the string has zero length, then it's probably a missing value.
  • If the string isn't a number, it needs quotes.

Autogenerated by doxygen on Wed Oct 15 2014 (Debian 0.999b+ds3-2).