Patterns in static

Apophenia

Maximum likelihood estimation

Functions

void apop_maximum_likelihood (apop_data *data, apop_model *dist)
 
apop_model* apop_estimate_restart (apop_model *e, apop_model *copy, char *starting_pt, double boundary)
 

Detailed Description

Notes on simulated annealing

Simulated annealing is a controlled random walk. As with the other methods, the system tries a new point and switches to it if it is better. Initially, the system is allowed to make large jumps; with each iteration, the jumps get smaller, eventually converging. There is also some decreasing probability that the new point will be accepted even if it is *less* likely. Simulated annealing is best for situations where there may be multiple local optima: early in the random walk, the system can readily jump from one to another; later, it will fine-tune its way toward the optimum. The number of points tested is essentially independent of the function: if you give it a 4,000-step program, that is roughly how many steps it will take. If you know your function is globally convex (as are most standard probability functions), then this method is overkill.

The GSL's simulated annealing system doesn't actually do very much. It basically provides a for loop that calls a half-dozen functions that we the users get to write. So, the file apop_mle.c handles all of this for you. The likelihood function is taken from the model, the metric is the Manhattan metric, the copy/destroy functions are just the usual vector-handling functions, et cetera. The reader who wants further control is welcome to override these functions.

Verbosity: if ep->verbose==1, show the likelihood, temperature, &c. in a table; if ep->verbose>1, show all that plus the vector of parameters.

Function Documentation

apop_model* apop_estimate_restart ( apop_model *  e,
apop_model *  copy,
char *  starting_pt,
double  boundary 
)

The simplest use of this function is to restart a model at the latest parameter estimates.

apop_model *m = apop_estimate(data, model_using_an_MLE_search);
for (int i=0; i< 10; i++)
    m = apop_estimate_restart(m);
apop_data_show(m);

By adding a line to reduce the tolerance each round [e.g., Apop_settings_set(m, apop_mle, tolerance, pow(10,-i))], you can start broad and home in on a precise optimum.

You may have a new estimation method, such as first doing a coarse simulated annealing search, then a fine conjugate gradient search. When reading this example, recall that the form for adding a new settings group differs from the form for modifying existing settings:

Apop_model_add_settings(your_base_model, apop_mle, .method=APOP_SIMAN);
apop_model *m = apop_estimate(data, your_base_model);
Apop_settings_set(m, apop_mle, method, APOP_CG_PR);
m = apop_estimate_restart(m);
apop_data_show(m);

Only one estimate is returned, either the one you sent in or a new one. The loser (which may be the one you sent in) is freed. That is, there is no memory leak in the above loop.

Parameters
e            An apop_model that is the output from a prior MLE estimation. (No default; must not be NULL.)
copy         Another not-yet-parametrized model that will be re-estimated with (1) the same data and (2) a starting point as per the next setting (probably the parameters of e). If this is NULL, then copy e. (Default = NULL)
starting_pt  "ep" = last estimate of the first model (i.e., its current parameter estimates); "es" = starting point originally used by the first model; "np" = current parameters of the new (second) model; "ns" = starting point specified by the new model's MLE settings. (Default = "ep")
boundary     I test whether the starting point you give me is outside this bound, so I can warn you if there's divergence in your sequence of re-estimations. (Default = 1e8)
Returns
At the end of this procedure, we'll have two apop_model structs: the one you sent in, and the one produced using the new method/scale. If the new estimate includes any NaNs/Infs, then the old estimate is returned (even if the old estimate included NaNs/Infs). Otherwise, the estimate with the largest log likelihood is returned.
void apop_maximum_likelihood ( apop_data *  data,
apop_model *  dist 
)

The maximum likelihood calculations.

  • I assume that apop_prep has been called on your model. The easiest way to guarantee this is to use apop_estimate, which calls this function if the input model has no estimate method.
  • All of the settings are specified by adding an apop_mle_settings struct to your model, so see the many notes there. Notably, the default method is the Fletcher-Reeves conjugate gradient method, and if your model does not have a dlog likelihood function, then a numeric gradient will be calculated via apop_numerical_gradient. Add an apop_mle_settings group to your model for other methods, including the Nelder-Mead simplex and simulated annealing.
Parameters
data   An apop_data set.
dist   The apop_model object: apop_gamma, apop_probit, apop_zipf, &c. You can add an apop_mle_settings struct to it (Apop_model_add_group(your_model, apop_mle, .verbose=1, .method="PR cg", and_so_on)). See the apop_mle_settings documentation for the many options, such as choice of method and tuning parameters.
Returns
None, but the input model is modified to include the parameter estimates, &c.
  • There is auxiliary info in the ->info element of the post-estimation struct. Get elements via, e.g.:
    apop_model *est = apop_estimate(your_data, apop_probit);
    int status = apop_data_get(est->info, .rowname="status");
    if (status)
        //trouble
    else
        //optimum found
  • During the search for an optimum, ctrl-C (SIGINT) will halt the search, and the function will return whatever parameters the search was on at the time.

Autogenerated by doxygen on Wed Oct 15 2014 (Debian 0.999b+ds3-2).