Tuesday, October 31, 2017

Meledrio, or a simple reflection on Hydrological modelling - Part VI - A little about calibration

The normal calibration strategy is to split the data we want to reproduce into two setz:

  • one for the calibration phase
  • one for the "validation" phase
Let's assume that we have an automatic calibrator. It usually:
  • generates a set of model's parameters, 
  • estimates with the rainfall-runoff hydrological model and any given set of parameters the discharges, 
  • compares what computed with what is measured by using a goodness of fit indicator
  • keeps the set of parameter that gives the best performances
  • repeats the operation a huge number of times (and use some heuristics for searching the best set overall)

This  set of parameters is the one used for "forecasting" and

  • is now used against the validation set to check its performances.
However, my  experience (with my students who usually perform it) is that the best parameter set in the calibration procedure, is not usually the best in validation procedure. So I suggest, at least as a trial and for further investigations to:

  • separate the initial data set into 3 parts (one for first calibration, one for selection, and one for validation).
  • Among the 1% (or x% where x is let at your decision) of best performing in the calibration phase  is selected (called the behavioural set). Then 1% (one over 10^4) best performing in the selection phase is further sieved. 
  • This 1 per ten thousand is chosen to be used in the validation phase
The hypothesis to test is that this three steps way to calibrate returns usually better performances in validation than the original two step steps one.

1 comment:

  1. I have a felling that this problem is more of a problem for semi-distributed models, but useful for test by comparing fully-distributed and semi-distributed models. In semi-distributed models, important variables such as vegetation and land cover may not be included as time-series data, but they does change significantly in few years (e.g. in 10 years) resulting significant impact on the hydrological processes (on evapotranspiration and vegetation). So obviously, the parameters used in before 10 years can not be the same after 10 years.