Create a histogram from data by putting data into bins of fixed width.
- Parameters
indata | The input data that will be binned. This is copied and the copy will be modified. |
close_top_bin | Normally, a bin covers the range from the point equal to its minimum to points strictly less than the minimum plus the width. If 'y', then the top bin includes points less than or equal to the upper bound. This solves the problem of displaying histograms where the maximum would otherwise sit alone in a top bin of its own. |
binspec | This is an apop_data set with the same number of columns as indata. If you want a fixed size for the bins, then the first row of the bin spec is the bin width for each column. This allows you to specify a width for each dimension, or specify the same size for all with something like:
Apop_row(indata, 0, firstrow);
apop_data *binspec = apop_data_copy(firstrow);
gsl_matrix_set_all(binspec->matrix, 10); // bins of size 10 in every dimension
apop_data_to_bins(indata, binspec);
The presumption is that the first bin starts at zero in all cases. You can add a second row to the spec to give the offset for each dimension. Default: NULL. If there is no binspec, I use a grid whose offset is the minimum of each column and whose bin size is such that bin_count bins cover the range up to the column's maximum. |
bin_count | If you don't provide a bin spec, I'll provide this many evenly-sized bins. Default: $\sqrt{N}$, where N is the number of rows in indata. |
- Returns
- A pointer to a binned apop_data set. If you didn't give me a binspec, then I attach one to the output set as a page named <binspec>, so you can snap a second data set to the same grid using
apop_data *first_binned = apop_data_to_bins(first_set, NULL);
apop_data *second_binned = apop_data_to_bins(second_set, apop_data_get_page(first_binned, "<binspec>"));
The text segment, if any, is not binned. I use apop_data_pmf_compress as the final step in the binning, and that does respect the text segment.
Here is a sample program highlighting the difference between apop_data_to_bins and apop_data_pmf_compress.
#define _GNU_SOURCE
#include <apop.h>
#include <assert.h>
#include <math.h>
#include <stdio.h>

#ifdef Testing
#define printdata(dataset) ;
#else
#define printdata(dataset) \
    printf("\n-----------\n\n"); \
    apop_data_print(dataset);
#endif
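int main(){
    //Illustrative data (an assumption, not given in the text): six observations,
    //each a number plus a one-column text label. The pair (1, "A") occurs twice.
    apop_data *d = apop_text_alloc(apop_data_alloc(6), 6, 1);
    apop_data_fill(d, 1, 2, 3, 3, 1, 2);
    apop_text_fill(d, "A", "A", "A", "A", "A", "B");
    asprintf(&d->names->title, "Original data set");
    printdata(d);

    //Binning works on a copy: values are snapped to a grid, then duplicates compressed.
    apop_data *binned = apop_data_to_bins(d, NULL);
    asprintf(&binned->names->title, "Post binning");
    printdata(binned);

    //Compression alone keeps the original values and merges repeated rows into weights.
    apop_data_pmf_compress(d);
    asprintf(&d->names->title, "Post compression");
    printdata(d);

    //(1, "A") was two of six observations, so the PMF should give it 2/6.
    apop_model *d_as_pmf = apop_estimate(d, apop_pmf);
    Apop_row(d, 0, firstrow);
    assert(fabs(apop_p(firstrow, d_as_pmf) - 2./6) < 1e-5);
}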
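The compression step in the program above can be pictured with a short plain-C sketch, assuming it merges rows that match in both their number and their text and sums their weights. The `row` type and `compress_rows` function are hypothetical stand-ins, not Apophenia's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical observation: one number, one text label, one weight. */
typedef struct { double value; const char *label; double weight; } row;

/* Merge rows whose value AND label both match, adding their weights.
   Works in place, keeps first occurrences in order, returns the new count. */
size_t compress_rows(row *rows, size_t n){
    size_t out = 0;
    for (size_t i = 0; i < n; i++){
        size_t j = 0;
        while (j < out && !(rows[j].value == rows[i].value
                            && strcmp(rows[j].label, rows[i].label) == 0))
            j++;
        if (j < out) rows[j].weight += rows[i].weight;  /* duplicate: add weight */
        else rows[out++] = rows[i];                      /* first sighting: keep row */
    }
    return out;
}
```

Given rows (1, "A"), (2, "A"), (1, "A"), (1, "B") with weight 1 each, this leaves three rows, and (1, "A") ends up with weight 2. Rows (1, "A") and (1, "B") stay separate because their text differs, which is how this sketch interprets respecting the text segment.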