Ticket #904 (closed assessed: fixed)

Opened 8 years ago

Last modified 2 years ago

Add AIC (Akaike Information Criterion)

Reported by: villyc Owned by:
Priority: high Milestone: NOT SET
Component: Fit to Time Series Version: 6.5 beta
Severity: feature Keywords:
Cc:

Description (last modified by jeroens)

We need to give users information on how to avoid overfitting when fitting to time series. For this, we need a new routine for the time series fitting form. There are two options for how to do this: a simple one, and a more complicated one that involves a series of runs. I'll describe both:

The simple version

We calculate the AIC as:

AIC = 2 x number of search parameters + number of data points x ln (SS)

The number of search parameters is the number of vulnerabilities we estimate + the number of spline points + the number of years we find primary production anomalies for. The number of data points is trickier. In principle, per time series type, it is:

Relative biomass = # data points –1 
Absolute biomass = # data points 
Catches = # data points

This is, however, likely to overestimate the real number of data points because the values are correlated: numbers in a data series are not independent observations. Carl suggests that we divide the number of data points by 5. I'm inclined to say that each time series gives us 1-2 data points. In any case, I'll give directions for what we do; this is still under discussion. So we need to make this an entry with a default value that users can override. On the form, we thus need two fields: "Data points:" and "AIC =". The easy way is to just write the AIC in the iterations field after a search is done.
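The simple version can be sketched roughly as follows. This is a minimal illustration, not EwE code: the function names, the series-kind strings, and the `(kind, n)` input shape are all assumptions made for the example. The divisor of 5 is Carl's suggestion from the text and is left to the caller.

```python
import math

# Hypothetical series kinds; the strings are illustrative, not EwE identifiers.
RELATIVE_BIOMASS = "relative_biomass"
ABSOLUTE_BIOMASS = "absolute_biomass"
CATCHES = "catches"


def nominal_data_points(series):
    """Sum data points over (kind, n) pairs; relative biomass counts n - 1."""
    total = 0
    for kind, n in series:
        total += n - 1 if kind == RELATIVE_BIOMASS else n
    return total


def aic_simple(n_params, n_data_points, ss):
    """AIC = 2 x number of search parameters + number of data points x ln(SS)."""
    return 2 * n_params + n_data_points * math.log(ss)


# Example: one relative biomass series and one catch series, 20 values each.
series = [(RELATIVE_BIOMASS, 20), (CATCHES, 20)]
nominal = nominal_data_points(series)   # 19 + 20
default_points = nominal / 5            # default value the user could override
```

The default computed here would only pre-fill the "Data points" entry; whatever value the user ends up with is what gets passed to `aic_simple`.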

The more complicated version

See the attached spreadsheet. I made the analysis there, and it can be done fairly simply as a batch run. For this we need to run through the following steps:

First make a run with time series loaded, calculate SS, then AIC with no search parameters

If vulnerability is checked, then reset the vulnerabilities. If anomaly is checked, then reset the forcing function in use.

If vulnerability is checked, first run sensitivity to find the order in which to include predators in the search. Search for vulnerability for the most sensitive predator; record SS and AIC; reset the vulnerabilities; search for the two most sensitive predators; and so on, until we search for all consumers.

If anomaly search is checked (but not vulnerability search): search for 0 spline points (done already; it gives the same results as before we set any search parameters), then try 2, 3, 4, 5, 10, 15, 20, 25, 30, 35, up to number of years – 5. Then make an anomaly search with spline = 0, i.e., search for primary production anomaly for all years.

Finally, if both vulnerability and anomaly are searched, then leave the anomaly as is, i.e., if it is, e.g., 3 spline points, then we only do those 3 spline points. We now do exactly the same as for the vulnerability search above, only we reset both the vulnerabilities and the forcing function before each run.
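The batch steps above could be sketched along these lines. Everything here is hypothetical scaffolding: `run_search` stands in for whatever resets the vulnerabilities (and, in the combined case, the forcing function), runs the search, and returns SS. The sketch also assumes one vulnerability parameter per predator, and it reads the spline sequence as "2 to 5, then steps of 5 up to number of years – 5", which is one interpretation of the list in the description.

```python
import math


def spline_point_schedule(n_years):
    """Spline point counts to try: 2-5, then steps of 5, capped at n_years - 5."""
    limit = n_years - 5
    fine = [p for p in (2, 3, 4, 5) if p <= limit]
    coarse = list(range(10, limit + 1, 5))
    return fine + coarse


def incremental_vulnerability_runs(ranked_predators, run_search, n_data_points):
    """Search the 1, 2, ..., N most sensitive predators; record SS and AIC.

    ranked_predators is ordered from most to least sensitive.
    run_search is a hypothetical callback that resets state, searches
    vulnerabilities for the given predators, and returns the resulting SS.
    """
    results = []
    for k in range(1, len(ranked_predators) + 1):
        subset = ranked_predators[:k]
        ss = run_search(subset)
        # Assumption: each predator in the search adds one search parameter.
        aic = 2 * k + n_data_points * math.log(ss)
        results.append((k, ss, aic))
    return results
```

For a 40-year run, `spline_point_schedule(40)` reproduces the sequence listed in the description, ending at 35.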

Perhaps we can just dump the results from this to a csv file?
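A CSV dump of the batch results could look something like this. The column names and the row layout are assumptions for illustration only; the ticket does not specify a format.

```python
import csv
import io


def write_results_csv(rows, fileobj):
    """Write one line per run: run label, number of parameters, SS, AIC."""
    writer = csv.writer(fileobj)
    writer.writerow(["run", "n_params", "ss", "aic"])  # hypothetical columns
    writer.writerows(rows)


# Example with made-up placeholder values, written to an in-memory buffer.
buffer = io.StringIO()
write_results_csv([("baseline", 0, 120.5, 95.8)], buffer)
csv_text = buffer.getvalue()
```

Passing a file opened with `open(path, "w", newline="")` instead of the buffer would write the same content to disk.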

Attachments

AIC.xlsx (53.1 KB) - added by jeroens 8 years ago.
AIC complicated version

Change History

comment:1 Changed 8 years ago by jeroens

  • Description modified

Changed 8 years ago by jeroens

AIC complicated version

comment:2 Changed 8 years ago by jeroens

  • Component changed from Overall to Fit to Time Series

comment:3 Changed 8 years ago by joeb

A simple version has been implemented. 

A user can run multiple trials with a different number of blocks (V's) selected. For each trial, the SS and Akaike Information Criterion are saved to a grid. The number of AIC data points can be changed by updating the AIC value in the grid.

comment:4 in reply to: ↑ description Changed 7 years ago by brianl

I would expect that if a fit to 20 time series data points gives the same SS as a fit to 10 time series data points, and the number of parameters is the same, the scenario with 20 points would be preferred. This assumes that comparisons with different numbers of data points are even valid. Nonetheless, shouldn't there be a penalty for the number of data points? Thus

AIC = 2 x number of search parameters + number of data points x ln (SS/number of data points)
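For reference, the formula proposed in this comment matches the usual least-squares form of AIC (up to an additive constant). A minimal sketch, with a hypothetical function name:

```python
import math


def aic_normalized(n_params, n_data_points, ss):
    """AIC = 2 x parameters + data points x ln(SS / data points)."""
    return 2 * n_params + n_data_points * math.log(ss / n_data_points)


# Same SS and parameter count, different numbers of data points.
aic_10 = aic_normalized(4, 10, 50.0)
aic_20 = aic_normalized(4, 20, 50.0)
```

As the comment itself notes, whether AIC values computed with different numbers of data points can be compared at all is a separate question.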

comment:5 Changed 2 years ago by jeroens

  • Status changed from new to closed
  • Version set to 6.5 beta
  • Resolution set to fixed

AIC was implemented at some point in the past years and was released with EwE 6.3 (I believe). We just never got around to updating this ticket.

However, in EwE 6.5 the AIC data points are calculated correctly.
