Tuesday, October 31, 2017

Meledrio, or a simple reflection on hydrological modelling - Part VI - A little about calibration

The normal calibration strategy is to split the data we want to reproduce into two sets:

  • one for the calibration phase
  • one for the "validation" phase
Let's assume that we have an automatic calibrator. It usually:
  • generates a set of model parameters,
  • runs the rainfall-runoff hydrological model with any given parameter set to estimate the discharges,
  • compares what is computed with what is measured by using a goodness-of-fit indicator,
  • keeps the parameter set that gives the best performance,
  • repeats the operation a huge number of times (and uses some heuristics to search for the best set overall).

This set of parameters is the one used for "forecasting" and

  • is now run against the validation set to check its performance.
However, my experience (with my students, who usually perform the exercise) is that the best parameter set in the calibration procedure is usually not the best in the validation procedure. So I suggest, at least as a trial and for further investigation, to:

  • separate the initial data set into 3 parts (one for a first calibration, one for selection, and one for validation);
  • select the best-performing 1% (or x%, with x left to your decision) of parameter sets in the calibration phase (called the behavioural set); then sieve further the best-performing 1% of those (one in 10^4 overall) in the selection phase;
  • use this one-in-ten-thousand subset in the validation phase.
The hypothesis to test is that this three-step way to calibrate usually returns better performance in validation than the original two-step one. A minimal sketch of the sieving follows.
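To fix ideas, here is a sketch of the sieving in Python. Everything in it is hypothetical: `params` stands for a NumPy array of candidate parameter sets (one row per set), and `gof(p, data)` for whatever goodness-of-fit function your calibrator uses (say, Nash-Sutcliffe efficiency), with higher values meaning better.

```python
import numpy as np

def three_step_selection(params, calib, select, valid, gof, frac=0.01):
    """Sieve candidate parameter sets in three stages: keep the best
    `frac` on the calibration data (the behavioural set), then the best
    `frac` of those on the selection data (one in 10^4 overall), and
    finally report how the survivors score on the validation data."""

    def best(candidates, data):
        scores = np.array([gof(p, data) for p in candidates])
        k = max(1, int(frac * len(candidates)))
        return candidates[np.argsort(scores)[-k:]]   # top-k parameter sets

    behavioural = best(params, calib)     # stage 1: calibration phase
    sieved = best(behavioural, select)    # stage 2: selection phase
    return sieved, np.array([gof(p, valid) for p in sieved])  # stage 3
```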

Sunday, October 29, 2017

Open Science Framework - OSF

And recently I discovered OSF, the Open Science Framework. My students told me that many tools of this type exist: on-line tools that leverage the cloud to store files and help groups manage their workflow. However, OSF seems particularly well suited to scientists' groups, since it links to various science-oriented services, like Mendeley, Figshare, Github and others. An OSF "project" can contain writings, figures, code, data. All of this can be uploaded for free to their servers or maintained in one of your cloud storage services, like Dropbox or Google Drive.

To start, you can take one hour of your time to follow one of their YouTube videos, like the one below.

Their web page also contains some useful guides that do the rest (do not hesitate to click on the icons: they contain useful material!). The first you can start with is the one about the wiki, a customizable initial page that appears in any project or sub-project. There are some characteristics that I want to emphasize here. Starting a new project is easy, and once you have learned how to do it, you have learned almost all of it. Any project can have subprojects, called "components". Each component behaves like a project by itself, so when dealing with it, you do not have to learn anything really new. Any (sub)project can be private (the default) or public, separately, and therefore your global workflow can contain both private and public stuff.

Many people are working on OSF. For instance, Titus Brown's Living in an Ivory Basement blog has a detailed review of it. They also coded a command-line client for downloading files from OSF, which can be useful as well.

Wednesday, October 25, 2017

Return Period


Some people, I have realised, have problems with the concept of return period. This is the definition in Wikipedia (accessed October 25th, 2017):
A return period, also known as a recurrence interval (sometimes repeat interval) is an estimate of the likelihood of an event, such as an earthquake, flood[1], landslide[2], or a river discharge flow to occur.
It is a statistical measurement typically based on historic data denoting the average recurrence interval over an extended period of time, and is usually used for risk analysis (e.g. to decide whether a project should be allowed to go forward in a zone of a certain risk, or to design structures to withstand an event with a certain return period). The following analysis assumes that the probability of the event occurring does not vary over time and is independent of past events.
Something that Wikipedia does not include is rainfall intensity. The first paragraph should then read something like:
"A return period of x time units, also known as a recurrence interval (sometimes repeat interval) is an estimate of the likelihood of an event, such as an earthquake, flood[1], landslide[2], rainfall intensity, a river discharge flow or any observable, to occur (or be overcome) on average every x time units."

A return period clearly involves a statistical concept, which traces back to a probability, and a time concept, which is the sampling time.
Let us assume we have a sequence of data, for which, at the moment, the sampling time is unknown, composed of a discrete number, $n$, of data.
The empirical cumulative distribution function (ECDF) of the data is a representation of the empirical statistics of those data. Let $ECDFc$ be the complementary empirical cumulative distribution function, meaning $ECDFc(h) \equiv 1 - ECDF(h)$.
Let $h^*$ be one of the possible values of these data (not necessarily present in the sequence, but included in the range of experimental values). We are interested in the probability of $h^*$ being overcome. If $m$ is the number of times $h^*$ is matched or overcome, then
$$ ECDFc(h^*)= m/n $$
$$ECDF(h^*) = 1 - m/n$$
We can, at this point, assume that the ECDF resembles some probability function, but this is a further topic we do not want to discuss here. What we want to stress is that ECDFs (probabilities) are not automatically associated with a time. The data in the sequence refer to different draws of a random variable, and these draws are not necessarily time-ordered; they could even have happened all at the same time. So the "frequencies" that can be associated with the above events are not time frequencies.
Now introduce time by saying, for instance, that each datum was sampled at a regular time step $\Delta t$, what I called "time units" before, and that, for practical reasons, we are not interested in the ECDF of the data but in knowing how frequently (in a clock-time sense) a value is exceeded. So we can say that the total time of our record is
$$T = n\, \Delta t$$
and in this time span the number of times $h^*$ is overcome is (by construction)
$$m = ECDFc(h^*)\, n$$
On average, along the record obtained, the time frequency on which values greater than $h^*$ are obtained is the empirical return period:
$$T_r := \frac{T}{m} = \frac{n\, \Delta t}{ECDFc(h^*)\, n} = \frac{\Delta t}{ECDFc(h^*)}$$
So, the empirical return period of $h^*$ is inversely proportional to the complementary ECDF, $ECDFc(h^*)$, but, properly, there is a "$\Delta t$" to remind us that it is given in time units. One basic assumption in our definition is that the underlying probability is well defined, which it is not if climate change is in action. This is a delicate and well-discussed topic*, but, again, not the core of this page.

There is a crucial initial step, the sampling of the data, which affects the final result. If the data in the sequence are, for instance, annual maxima of precipitation, then the return period is given in years. If the data were daily precipitation totals, then the return period would be given in days. And so on. Because the time unit usually has value "1" (but the dimension of a time), the numeric value of the return period is just the inverse of the $ECDFc$. We should not forget, however, that the equation carries a hidden dimension: we are talking about times, not dimensionless numbers (probabilities). A minimal numerical sketch follows.
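As a toy example in Python, following the definitions above (the data are invented annual maxima, so $\Delta t$ is one year):

```python
import numpy as np

def empirical_return_period(data, h_star, dt=1.0):
    """Empirical return period of the threshold h_star, expressed in the
    units of dt (e.g. dt = 1 year for a series of annual maxima)."""
    n = len(data)
    m = np.sum(data >= h_star)   # times h_star is matched or overcome
    ecdfc = m / n                # complementary ECDF at h_star
    return dt / ecdfc            # T_r = T / m = dt / ECDFc(h_star)

# hypothetical annual maxima of daily precipitation (mm)
annual_max = np.array([42.0, 55.0, 61.0, 38.0, 70.0, 49.0, 90.0, 58.0])
print(empirical_return_period(annual_max, h_star=60.0))  # 8/3, about 2.7 years
```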

Being Bayesian, you can probably introduce all this in a different way. I leave it to you as an exercise.

* On the topic of stationarity, please have a look at:

Milly, P. C. D., Betancourt, J., Falkenmark, M., Hirsch, R. M., Kundzewicz, Z. W., Lettenmaier, D. P., & Stouffer, R. J. (2008). Stationarity Is Dead: Whither Water Management? Science, 319, 573–574.

Montanari, A., & Koutsoyiannis, D. (2014). Modeling and mitigating natural hazards: Stationarity is immortal! Water Resources Research, 50(12), 9748–9756. http://doi.org/10.1002/2014WR016092

Serinaldi, F., & Kilsby, C. G. (2015). Stationarity is undead: Uncertainty dominates the distribution of extremes. Advances in Water Resources, 77, 17–36. http://doi.org/10.1016/j.advwatres.2014.12.013

Sunday, October 22, 2017

Simple models for hydrological hazard mapping

This contains the second talk I gave to high-school teachers at MUSE for the Life FRANCA project. My intention was to show (under many simplifying assumptions) how hydrological models work, and to give a few hints on which types of hydraulic models of sediment transport can be useful.
Clicking on the figure above you can access the slides (in Italian, but, given a little time, I will provide a translation). In their simplicity, the slides are a storyboard for actions that could be taken in the SteepStreams project to provide an estimation of the hazards of the Meledrio river basin (and the other two selected catchments).

Friday, October 20, 2017

On some Hydrological Extremes

This is the talk given at MUSE for the Life FRANCA project. Life FRANCA has the objective of communicating with people about hydrological hazards and risk. In particular, the audience in this case was composed of high-school teachers.


Clicking on the Figure you will be redirected to the presentation.

Wednesday, October 18, 2017

Using Colorblind friendly Plots

Brought to my attention by Michele Bottazzi. I rarely think about this; instead, it is important. Please refer to this Brian Connelly post:

Click on the figure to be redirected. BTW, this was the 500th post!🎉
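For a taste, here is a minimal matplotlib sketch (the plotted series are dummies) that sets the Okabe-Ito palette, eight colors designed to remain distinguishable under the common color-vision deficiencies:

```python
import matplotlib.pyplot as plt
from cycler import cycler

# Okabe-Ito colorblind-safe palette (eight colors)
OKABE_ITO = ["#000000", "#E69F00", "#56B4E9", "#009E73",
             "#F0E442", "#0072B2", "#D55E00", "#CC79A7"]

plt.rc("axes", prop_cycle=cycler(color=OKABE_ITO))
for i in range(4):  # four dummy series, now colorblind-friendly
    plt.plot(range(10), [i + 0.3 * x for x in range(10)], label=f"series {i}")
plt.legend()
plt.show()
```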

Tuesday, October 17, 2017

TranspirAction

This post contains the presentation given by Michele Bottazzi. His presentation looks forward to digging into the forecasting of transpiration from plants (and evaporation from soils) through lumped-parameter modelling. His findings will have a counterpart in our JGrass-NewAGE system.
The figure illustrates his intention to find a new, modern way to scale up leaf theories to canopy and landscape. The starting point is a recent work by Schymanski and Or, but it will go, hopefully, far beyond it. Click on the figure to access his presentation.

An ML based meta modelling infrastructure for environmental models

This is the presentation Francesco gave for his admission to the third year of Ph.D. studies. He summarizes his work done so far and foresees his work for the next year.
Francesco's work is a keystone of the work in our group, since he sustains most of our informatics and our commitment to OMS3. Besides this, his two major achievements are: the building of the Ne3 infrastructure (an infrastructure inside an infrastructure!), which allows an enormous flexibility in our modelling, and the new road opened towards modelling discharges through machine learning techniques. But there are other connections he opens that are visible through his talk. Please click on the figure to access the presentation.

Sunday, October 15, 2017

A few topics for a Master thesis in Hydrology

After the series about Meledrio, I thought that each one of those posts actually identifies at least one thesis topic:

Actually, each one of them could be material for more than one thesis, depending on the direction we want to take. All the thesis topics assume that JGrass-NewAGE is the tool used for the investigations.
There are also some spinoffs of those topics:
  • Using machine learning to set part of model inputs and/or 
  • Doing hydrological modeling with machine learning
  • Preprocessing and treating (via Python or Java) satellite data as input of JGrass-NewAGE (a systematisation of some work made by Wuletawu Abera on the Posina catchment and/or the Blue Nile)
  • Implementation of the new version of JGrass-NewAGE on Val di Sole
  • Using satellite data, besides geometric features, to extract river networks
  • Snow models intercomparison (GEOtop and those in JGrass-NewAGE, with reference to work done by Stefano Tasin and Gabriele Massera) 
Others relate to other hydrological topics:
  • Mars (also here) and planetary Hydrology (with GEOtop or some of its evolutions which account for different temperature ranges and other fluid fluxes)
  • Coping with evapotranspiration and irrigation at various scales
  • Coupling the carbon cycle to the hydrological cycle (either in GEOtop or in JGrass-NewAGE)
Other possible topics regarding water management:
  • Hypothesis on the management of reservoir for optimal water management in river Adige.
  • Managing Urban Waters Complexity
Other possible topics are on a more theoretical (mathematical-physical) side:
On the side of informatics:
For those who want to work with us on a Master thesis, the rules to follow are those for Ph.D. students, even if to a lesser extent. See here:

Saturday, October 14, 2017

Meledrio, or a simple reflection on hydrological modelling - Part V

Another question related to discharges is, obviously, their measurement. Is the discharge measurement correct? Is the stage-discharge relation reliable? Why not give confidence intervals for the measurements? Yesterday a colleague of mine told me: a measure without an error band is not a measure. That is obviously an issue. But today's reflection is on a different question. We have a record of discharges. It could look like this (forgive me the twisted lines):
Actually, what we imagine is the following:
I.e., we think it is all water. However, a little reflection should make us think that a more realistic picture is:
Meaning that part of the discharge volume is actually sediment transported around. This opens the issue of how to quantify it. The figure highlights that during some floods the sediment can be a consistent part of the volume and, if we are talking of small mountain catchments like Meledrio, it can be the major part of the discharge. So far, hydraulics and sediment transport were used separately from hydrology, and hydrology separately from sediment transport, but what people see is both of them (water and sediment).
This could still not be enough. The real picture could actually be like this:
Where we have some darker water. The mass transport phenomena, in fact, could affect part of the basin during intense storms, while the liquid water could be unable to sustain all this transport. Aronne Armanini suggested to me that, in that case, debris flows can start and be stopped somewhere inside the basin. The water content they carry, instead, could equally likely be released to the streams, boosting the flood even further. Isn't it interesting? Who said that modelling discharges is a settled problem?

Friday, October 13, 2017

Meledrio, or a simple reflection on hydrological modelling - Part IV

An issue that is often raised concerns the complexity of models. Considering the same Meledrio basin, what is the simplest model we can think of for getting the water budget quantitatively?
The null-null hypothesis model is obviously using past averages to predict the future. Operatively (a minimal sketch follows the list):
  • Get precipitation and discharge.
  • Precipitation is separated by temperature (T) into rainfall (T>0) and snowfall. Satellite data can be used for the separation.
  • Take their averages (maybe monthly averages).
  • Take their difference.
  • Assume that the difference is 50% recharge and 50% ET.
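In code, the null-null model is a few lines. A sketch in Python with pandas; the forcing series are randomly generated placeholders, not real Meledrio data:

```python
import numpy as np
import pandas as pd

# hypothetical daily precipitation and discharge (mm/day) over two years
idx = pd.date_range("2015-01-01", "2016-12-31", freq="D")
rng = np.random.default_rng(42)
df = pd.DataFrame({"P": rng.gamma(0.5, 6.0, len(idx)),
                   "Q": rng.gamma(0.5, 3.0, len(idx))}, index=idx)

monthly = df.resample("M").mean()         # monthly averages
residual = monthly["P"] - monthly["Q"]    # what did not leave as discharge
monthly["recharge"] = 0.5 * residual      # null-null assumption: 50% recharge
monthly["ET"] = 0.5 * residual            # ... and 50% evapotranspiration
```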

My null hypothesis is the following. I kept it simple, but not too simple:
  • Precipitation, discharge and temperature are the measured data.
  • Their time series are split into 2 parts (one for calibration and one for validation).
  • Precipitation is measured and separated by temperature (T) into rainfall (T>0) and snowfall (T<0). Satellite data can alternatively be used for the separation. These variables can be made spatial by using Kriging (or another spatial interpolator).
  • Infiltration is estimated by the SCS-CN method (a sketch of this and the following components is given after the list). The SCS parameter intervals are set according to soil cover, distinguishing qualitatively 4 classes of CN (high infiltrability, medium-high, medium-low, low). In each subregion, identified by soil cover, CN is allowed to vary in the range permitted by its classification. The soil needs to have a maximum storage capacity (see also ET below). Once this has been exceeded, water goes to runoff.
  • Discharge is modeled as a set of parallel linear reservoirs, one per HRU (Hydrologic Response Unit).
  • Total discharge is simply the summation of all the discharges of the HRUs.
  • CN and the mean residence time (the parameter of the linear reservoirs) are calibrated to reproduce the total discharge (so a calibrator must be available).
  • A set of optimal parameters is selected.
  • Precipitation that does not infiltrate is separated into evapotranspiration, ET, and recharge.
  • ET is estimated with Priestley-Taylor (so you need an estimator of radiation), corrected by a stress factor linearly proportional to the water storage content. The PT alpha coefficient is taken at its standard value, i.e. 1.28.
  • What is not ET is recharge. Please notice that there is a feedback between recharge and ET because of the stress factor.
  • If present, snow is modeled through Regina Hock's model (paper here), in case calibrated through MODIS.
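To make the ingredients concrete, here is a minimal Python sketch of the three main components (SCS-CN runoff, one linear reservoir, Priestley-Taylor ET). It is a toy under strong assumptions, not the JGrass-NewAGE implementation; all numbers and parameter values are illustrative.

```python
import numpy as np

def scs_cn_runoff(p, cn):
    """Direct runoff (mm) from rainfall depth p (mm) via the SCS-CN method."""
    s = 25400.0 / cn - 254.0        # potential maximum retention (mm)
    ia = 0.2 * s                    # initial abstraction (common assumption)
    pe = np.maximum(p - ia, 0.0)
    return pe ** 2 / (pe + s + 1e-12)

def priestley_taylor(rn, delta, gamma=0.066, alpha=1.28, lam=2.45):
    """Potential ET (mm/day) from net radiation rn (MJ m-2 day-1, ground heat
    flux neglected); delta and gamma in kPa/C; lam = latent heat (MJ/kg)."""
    return alpha * delta / (delta + gamma) * rn / lam

def linear_reservoir(inflow, t_r, dt=1.0):
    """Route inflow (mm per step) through one linear reservoir, Q = S / t_r."""
    s, q = 0.0, np.zeros_like(inflow)
    for i, x in enumerate(inflow):
        s += x * dt
        q[i] = s / t_r
        s -= q[i] * dt
    return q

# hypothetical daily rainfall for one HRU (mm)
rain = np.array([0.0, 12.0, 30.0, 5.0, 0.0, 18.0])
q_hru = linear_reservoir(scs_cn_runoff(rain, cn=75.0), t_r=3.0)
# total discharge = sum over the HRUs; CN and t_r are the parameters
# to calibrate against the measured total discharge
```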
The Petri Net representation of the model (no snow) can be sketched as follows:

The setup of this model is therefore not so simple, indeed, but not overwhelmingly complicated.

Any other model has to do better than this. If successful, it becomes hypothesis 1.
A related question is how we measure goodness of fit, and whether we can distinguish the performances of one model from another. That is, obviously, another issue.

Thursday, October 12, 2017

Meledrio, or a simple reflection on hydrological modelling - Part III

Well, this is not exactly Meledrio. It starts a little downstream of it. In fact, we do not have discharge data in Meledrio (so far), and we want to anchor our analysis to something measured. So we have a gauge station in Malè. A gauge station, for those who do not know it, measures just water levels (stages) and then converts them to water discharge through a stage-discharge relation (see USGS here). Anyway, a sample signal is here:
The orange lines represent the discharge simulated with one of our models (uncalibrated at this stage). The blue line is the measured discharge (meaning the measured stage after applying a stage-discharge relationship unknown to us, because the people who should have provided it did not). But look a little closer:
We could have provided a better zoom; however, the point of discussion is: what the hell is all that noise in the measured signal? Is it natural? Is it measurement error? Is it due to some human action?
With a better zoom, one could see that the signal is almost a square wave going up and down in a few hours, and therefore the suspected cause is humans.
Next question: how can we calibrate a model, which does not have this unknown action inside, to reproduce the measured signal?
Clearly the question is ill-posed, and we should work the other way around. Can we filter out of the measured signal the effect of humans?
Hints: we could try to analyze the measured signal first. Analyzing could actually mean, in this case, decomposing it, for instance into Fourier series or wavelets, and wiping away the square signal (a hint within the hints), reconstructing an "undisturbed signal" to cope with. A crude sketch of the idea follows.
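A sketch of the Fourier route in Python, on a synthetic signal (both the flood wave and the square-wave disturbance below are invented; a real analysis would need more care than a brute low-pass cut):

```python
import numpy as np

def fourier_lowpass(q, dt_hours=1.0, cutoff_period_hours=24.0):
    """Keep only the components slower than the cutoff period, wiping away
    the fast, square-wave-like fluctuations attributed to human regulation."""
    coeffs = np.fft.rfft(q)
    freqs = np.fft.rfftfreq(len(q), d=dt_hours)
    coeffs[freqs > 1.0 / cutoff_period_hours] = 0.0
    return np.fft.irfft(coeffs, n=len(q))

# synthetic hourly discharge: a slow flood wave plus a 6-hour on/off square wave
t = np.arange(240.0)                                     # hours
flood = 5.0 + 8.0 * np.exp(-((t - 120.0) / 40.0) ** 2)   # "natural" signal
square = 2.0 * (np.sin(2 * np.pi * t / 6.0) > 0)         # human disturbance
cleaned = fourier_lowpass(flood + square)                # ~ undisturbed signal
```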
Then we could probably calibrate the model against the cleaned data. Ah! You do not know what calibration means? That is another story.

P.S. - This is actually part of a more general problem, which is the treatment of measurements. Often we naively treat them as true values. Instead they are not, and they should be pre-analyzed for consistency and validated beforehand. MeteoIO is a tool that answers part of these requests. But, for instance, it does not treat the specific question above.

Wednesday, October 11, 2017

Meledrio, or a simple reflection on hydrological modelling - Part II

In the previous studies made on the hydrology of Meledrio, some ancillary data were produced. For instance:

Soil Use
Geo-lithology

Usually other maps are also produced, for instance soil cover (which, in principle, could be different from soil use). The problem I have is that, usually, I do not know what to do with these data. There are actually two questions related to maps of this kind.
  • The first is: are these characteristics part of the model (see, for instance, the previous post)?
  • The second is: if the model somehow contains a quantity, or a parameter, that can be affected by the mapped characteristic but is not directly the characteristic itself, how can the parameter be deduced? In other words, is there a (statistical) method to relate soil use to model parameters?
I confess that the only systematic attempt at this type of inference that I know of is pedotransfer functions. While the concept could be exported to more general model attributes, pedotransfer functions refer to very specific models that contain hydraulic conductivity or porosity as parameters, and not to other models, for instance those based on reservoirs, where hydraulic conductivity is usually not explicitly present.
Another typology of sub-models where something similar exists is the SCS-CN model. Specific models can sometimes contain specific conversion tables, produced either by the authors or by practitioners (SWAT, for instance). In SCS-CN, tables of soil categories are associated with values of the Curve Number parameter, and people pretend to believe that the association is reliable. But it is fiction, not science.
In a time when reviewers say that modelling discharges is not enough to assess the validity of a hydrological model, they at the same time allow holes in the peer-review process where papers make an unscrupulous use of the same concept.
There is actually a whole new science branch, hydropedology, that seems devoted to the task of transforming maps of soil properties into significant hydrological numbers (this is my brutal interpretation of it; obviously hydropedology has the scope to understand, not only to predict), and I add below some relevant references. However, while the analyses are fine and interesting food for thought, the practical outcome is still scanty. Probably for two reasons: because normal statistical inference is not sophisticated enough to obtain important results (beyond pedotransfer functions), and because (reservoir-type) models have parameters that are too convoluted to be interpreted as a simple function of a mapped characteristic. An opportunity for machine learning techniques?

References

Lin, H., Bouma, J., Pachepsky, Y., Western, A., Thompson, J., van Genuchten, R., et al. (2006). Hydropedology: Synergistic integration of pedology and hydrology. Water Resources Research, 42(5), 2509–13. http://doi.org/10.1029/2005WR004085

Pachepsky, Y. A., Smettem, K. R. J., Vanderborght, J., Herbst, M., Vereecken, H., & Wösten, J. (2004). Reality and fiction of models and data in soil hydrology (pp. 1–30).

Vereecken, H., Schnepf, A., Hopmans, J. W., Javaux, M., Or, D., Roose, T., et al. (2016). Modeling Soil Processes: Review, Key Challenges, and New Perspectives. Vadose Zone Journal. http://doi.org/10.2136/vzj2015.09.0131

Vereecken, H., Weynants, M., Javaux, M., Pachepsky, Y., Schaap, M. G., & van Genuchten, M. T. (2010). Using Pedotransfer Functions to Estimate the van Genuchten–Mualem Soil Hydraulic Properties: A Review. Vadose Zone Journal, 9(4), 795–820. http://doi.org/10.2136/vzj2010.0045

Terribile, F., Coppola, A., Langella, G., Martina, M., & Basile, A. (2011). Potential and limitations of using soil mapping information to understand landscape hydrology. Hydrology and Earth System Sciences, 15(12), 3895–3933. http://doi.org/10.5194/hess-15-3895-2011

Tuesday, October 10, 2017

Meledrio, or a simple reflection on hydrological modelling - Part I

The problem is well explained by the following figure, which represents the statistics of slopes in the Meledrio basin.
The overall distribution is bimodal, which makes us suspect that something is going on. In fact, below is the Google view of the basin.
It clearly shows that the hydrographic right side of the basin (on the left in the figure) is the one with the steeper slopes, and the left side the one with the gentler slopes. This is definitely shown by the slope map
(please observe that the map is reversed with respect to the Google view, since there we were looking at the basin from the North). Different slopes would be associated, in our minds, with different runoff and subsurface water velocities. This would clearly be accounted for in a model like GEOtop, but not (at least explicitly) by a system of reservoirs, especially when we calibrate all the reservoirs together. A possible partition of the basin in the JGrass-NewAGE system is represented below
Because the single Hydrologic Response Units lie mostly on one side of the catchment, each could be said to be in an area which is homogeneous from the point of view of slope statistics. Therefore, when we treat the basin as a collection of reservoirs, in principle we could parameterise them differently, according to their slope. In practice, however, we do not have enough measurements to be able to do this separate calibration, and we look at the basin homogeneously. Are we not missing something?
Well, we are. The first thought would be to try to add to our reservoirs the knowledge gained from geomorphology, and to assume that the mean travel time, or some relevant parameter connected to it, depends proportionally on (mean) slope (or some power of it) and inversely on the distance water has to cross to get out of the HRU. This is obviously possible, and maybe we could easily try it, as in the sketch below.
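For instance, a minimal sketch of such a geomorphic parameterisation in Python (the scaling constants `k` and `b`, and all the numbers, are hypothetical):

```python
def residence_time(mean_slope, flow_length, k=1.0, b=0.5):
    """Hypothetical geomorphic scaling for a linear-reservoir HRU: the mean
    residence time grows with the distance water must travel and decreases
    with a power of the mean slope."""
    return k * flow_length / mean_slope ** b

# two HRUs on opposite sides of the valley (illustrative values)
steep = residence_time(mean_slope=0.6, flow_length=2000.0)   # faster response
gentle = residence_time(mean_slope=0.2, flow_length=2000.0)  # slower response
```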
In general, however, hydrologists who are not stupid do not care about it. Why? The reason may be that our assumption that slope counts is blurred by the heterogeneity of the other factors that concur to form the hydrologic response. However, the magnitude of the heterogeneity can differ at different scales, and it would be really nice to do some investigation in this direction.

Friday, October 6, 2017

SteepStreams preliminary Hydrological works

This contains the talk given at the 2017 meeting of the SteepStreams ERANET project. It is supposed to discuss the hydrological cycle of the Noce river in the Val di Sole valley (Trentino, Italy). It is a preliminary view of what we are going to do in the project and does not pretend to present particularly deep results. However, it could give some interesting hints on methodology.
https://www.slideshare.net/GEOFRAMEcafe/lisbon-talk-for-steepstreams

Clicking on the figure, you can access the presentation. Below you also find a more detailed summary, with links to material about the Meledrio basin, one of the experimental catchments used in the project.
As above, clicking on the figure, you can access the presentation.