Wednesday, August 27, 2014

Which hydrological model is better?

The talk below (you can also click on the image) is about the GEOtop and JGrass-NewAGE models, their physical bases, their informatics, based on older (the former) and newer (the latter) programming paradigms, the lessons I learned in building them with my group in an academic environment, their future, and the understanding that there is no single best model, but certainly a better way to build models.

Hydrological modelling was for a long time, and still is, almost a synonym for simulating rainfall-runoff. Recently, however, the scope of hydrology has become wider, even among engineers. Modelling in hydrology certainly still means modelling discharges, but also modelling snow, evapotranspiration and turbulent exchanges, and surface/subsurface interactions. With the goal of reproducing the whole picture of the terrestrial hydrological fluxes, my coworkers and I worked over the last decade to build new models and new types of models. We started from the lesson of P. Eagleson, and we first built the process-based (grid-based) GEOtop model. GEOtop is a “terrain-based” (it is based on digital terrain models and uses the knowledge of the interaction between morphology and processes), “distributed” (all the simulated variables are calculated for each pixel of the basin) model of “the water cycle” (it simulates all the components of the water cycle, accounting for both the mass budget and the energy budget, the two budget equations being coupled through the temperature of the soil, which controls evaporation, hydraulic conductivity, and accumulation of the snowpack). However, GEOtop intimidated many, both for the complexity of the processes it describes and for its internals, and it is possibly not suited to large-scale modelling, where faster solutions are required.

Therefore we also worked on a different, more parsimonious model, called JGrass-NewAGE. From the lessons learned by implementing and maintaining GEOtop, we also found it necessary to build the new model on new informatics. This system sacrifices process details in favour of efficient calculations. It is made of components suited to returning statistical hydrological quantities, suitably averaged in time and space. One of the goals of this implementation effort was to create the basis for a physico-statistical hydrology in which the spatially distributed hydrological dynamics are reduced to low-dimensional components, when necessary replacing the internal heterogeneities with "suitable noise" and a probabilistic description. Unlike other efforts of synthesis, JGrass-NewAGE keeps the spatial description explicit, at various degrees of simplicity. This has been made possible by appropriate processing of distributed information which, in this way, has become part of the model itself.

In conclusion, modelling remains a "liquid" practice in which various needs must be fulfilled each time we face a new problem (if science is driven by problems and not tools). Therefore an infrastructure that puts this fluidity at its center is necessary. This is the reason we adopted OMS3.

Friday, August 22, 2014

A little bridge between JGrasstools and R

I received the following from Emanuele Cordano, and I publish it, knowing that it is of interest to many:

"Dear all, 

some months ago I developed an R package on GitHub which allows executing some classes of jgrasstools from R. I did it because I needed to do hydro-geomorphological analysis with R raster maps. It is quite trivial. The package creates a groovy script file from an R S3 object and then executes the script.

The R code is here on Github with GPL license: 

It also contains a jar of jgrasstools, or they can be compiled and downloaded from GitHub. It requires groovy to be installed beforehand.
The code is still experimental and needs more testing. The examples are limited to geomorphological analysis and basin extraction.
In the next months I'm going to continue the development and to work on the documentation and the examples.
I would like to share this experimental R package with those who are potentially interested. 

Any feedback is appreciated and please let me know if you know something similar already existing. 

Regards 

Emanuele Cordano

"
Those who want more information about JGrasstools can browse this link. To learn more about geomorphological analysis, a quite comprehensive set of slides in Italian is available here, and a less comprehensive introduction in English here.

The link is also added to the main R hydrological resources post.

Thursday, August 21, 2014

Compiling meteoIO Examples with Eclipse (under Mac OS 10.4.9)

This is a step in the ongoing task of compiling GEOtop 2.0 under Eclipse and subsequently embedding part of it in OMS v3.

Since working with GEOtop 2.0 requires knowing what meteoIO does, I started from it.

To keep it simple, I first compiled the meteoIO libraries from the command line, as described in the related post (compiling meteoIO under Eclipse is a further task to explore eventually). Then I opened Eclipse. For the occasion I did a fresh installation of the recent Luna IDE, installing the default CDT for managing C/C++ projects.

I created a new project for one of the examples (from the command line, following the instructions given on the meteoIO website, you can compile all the examples at once. However, if you run into problems executing them, you do not know what to do, and you need to explore the content of the examples. For that you need an IDE or, at least, a text editor to browse the files ….).

Then you have to add the example file you need. Take, for instance, the file called time.cpp.
Import it into Eclipse using:

  • ->File
  • ->Import
  • ->File System
  • -> Next Button 
  • -> Browsing directories 
  • -> Selecting the desired file by checking the box 
  • -> Finish


Before you can compile successfully you have to tell the compiler two things: where the meteoIO include files are (they contain the declarations of all the meteoIO methods), and where the meteoIO libraries are. The first is required at compile time, the second at link time.

In particular, the meteoIO libraries are meteoio, meteoio.2, and meteoio.2.4.3 (please note that under Mac OS X each of them is named lib*.dylib, where * stands for any of the three names above). I actually placed them in one of my working directories, but a standard location would probably be a better choice.

The include files were instead placed under /usr/local/include (they were actually under /usr/local/include/meteoio, but the last directory is specified inside the code).

Here are the instructions for making the libraries visible to the linker:
  • -> right click on the project
  • -> find “Properties” on the menu and click it
  • -> choose “C/C++ General”
  • -> choose “Path and symbols”
  • -> select Libraries
  • -> click on Add
  • -> specify the name of the libraries (without lib and .dylib)  
  • -> select Libraries Path
  • -> click on "Add"
  • -> specify the path where the linker can locate the libraries
  • -> Finish


Here, instead, are the instructions for making the include files visible:
  • -> Right click on the project
  • -> Select Properties
  • -> Expand the options
  • -> choose “C/C++ General”
  • -> choose “Path and symbols”
  • -> Browse the filesystem to find the /usr/local/include directory [details as above]
  • -> Finish
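
The two Eclipse settings above correspond to the usual include-path, library-path and library-name options of a g++-style compiler. As a minimal sketch (in Python, only to show how the pieces fit together), the snippet below assembles the equivalent command line. The library directory /path/to/meteoio/lib and the output name time_example are hypothetical placeholders, not values from this post; the command is printed, not executed.

```python
from pathlib import Path


def linker_name(library_file: str) -> str:
    """Return the name the linker expects: no 'lib' prefix, no '.dylib' suffix."""
    name = Path(library_file).name
    if name.startswith("lib"):
        name = name[len("lib"):]
    if name.endswith(".dylib"):
        name = name[: -len(".dylib")]
    return name


def build_command(source, include_dir, library_dir, library_file, output):
    """Assemble a g++-style command line (an assumption: Eclipse CDT drives a similar toolchain)."""
    return [
        "g++", source,
        "-I" + include_dir,                  # where the include files are (compile time)
        "-L" + library_dir,                  # where the libraries are (link time)
        "-l" + linker_name(library_file),    # which library to link
        "-o", output,
    ]


command = build_command(
    source="time.cpp",
    include_dir="/usr/local/include",
    library_dir="/path/to/meteoio/lib",      # hypothetical: use your own directory
    library_file="libmeteoio.dylib",
    output="time_example",                   # hypothetical output name
)
print(" ".join(command))
```

The same name-stripping rule is what the "specify the name of the libraries (without lib and .dylib)" step asks for by hand.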


Then you are ready to build your project and execute it. ^1^2^3

______________________________________________________________________________

^1 - Please be aware that when you change the build configuration, e.g. from "Debug" to "Release" (under the hammer icon in Eclipse), you have to tell the linker again where the libraries are. Actually, there is a checkbox that you can select to apply the configuration to both the Debug and Release modes.

^2 The executables can be found under Workspace Folder/Project Folder/Release, where "Workspace Folder" is the directory declared when Eclipse starts and "Project Folder" is the name of the current project.

^3 Many MeteoIO examples require some input and do not fail gracefully when this input is not provided (they give the error "segmentation fault 11").

Data cleaning is part of any science process


I take this post verbatim from the R blog Revolution.

"A New York Times article yesterday discovers the 80-20 rule: that 80% of a typical data science project is sourcing cleaning and preparing the data, while the remaining 20% is actual data analysis. The article gives short shrift to this important task by calling it "janitorial work", but whether you call it data munging, data wrangling or anything else, it's a critical part of the data science. I'm in agreement with Jeffrey Heer, professor of computer science at the University of Washington and a co-founder of Trifacta, who is quoted in the article saying,

“It’s an absolute myth that you can send an algorithm over raw data and have insights pop up.”

As an illustration of this point, check out the essay by Julia Evans, Machine learning isn't Kaggle competitions (hat tip: Drew Conway). A Kaggle competition typically presents a nice, clean, regularized data set to the competitors, but this isn't representative of the real-world process of making predictions from data. As Julia points out:

Cleaning up data to the point where you can work with it is a huge amount of work. If you’re trying to reconcile a lot of sources of data that you don’t control like in this flight search example, it can take 80% of your time.

While there are projects underway to help automate the data cleaning process and reduce the time it takes, the task of automation is made difficult by the fact that the process is as much art as science, and no two data preparation tasks are the same. That's why flexible, high-level languages like R are a key part of the process. As Mitchell Sanders notes in a Tech Republic article,

Data science requires a difficult blend of domain knowledge, math and statistics expertise, and code hacking skills. In particular, he suggests that expert knowledge of tools like R and SAS are critical. "If you can't use the tools, you can't analyze the data."

This is a critical step to gaining any kind of insight from data, which is why data scientists still command premium salaries today, according to data from Indeed.com."

Now consider the data problem in hydrology. Some agencies do their best to give data to researchers and the public; however, these data have to be further processed to get results and insight into physical processes. They cannot be provided to models "as they are": they usually have to be pre-processed. That is why tools like meteoIO are needed. Moreover, the data themselves have to be inspected and parsed before use, even with the meteoIO tools. Therefore the 80%-20% rule tends to hold in hydrology too.
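
To make the point concrete, here is a minimal sketch in Python of the kind of inspection a raw series needs before it can feed a model. It is not what meteoIO does internally. The no-data sentinel (-9999), the plausibility range and the sample values are all assumptions chosen for illustration.

```python
import math

NO_DATA = -9999.0  # assumed sentinel; agencies use different conventions


def clean_series(values, lower, upper, no_data=NO_DATA):
    """Replace sentinels, NaNs and out-of-range values with None."""
    cleaned = []
    for value in values:
        is_missing = value == no_data or math.isnan(value)
        if is_missing or not (lower <= value <= upper):
            cleaned.append(None)
        else:
            cleaned.append(value)
    return cleaned


def valid_fraction(cleaned):
    """Fraction of the series that survived the checks."""
    if not cleaned:
        return 0.0
    return sum(v is not None for v in cleaned) / len(cleaned)


# Made-up air temperature record in degrees Celsius (illustrative values only).
raw_temperature = [12.1, 12.4, -9999.0, 85.0, 11.8, float("nan"), 11.5]
cleaned = clean_series(raw_temperature, lower=-50.0, upper=60.0)
print(cleaned)
print(round(valid_fraction(cleaned), 2))
```

Even in this toy case almost half of the record has to be discarded or filled before use, which is the hydrological face of the 80%-20% rule.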

Wednesday, August 20, 2014

Rills in Utah

Going from Zion National Park to Arches National Park, I had the occasion to see several astonishing geological landscapes. Particularly exciting to me were the rills that I could observe along the road, eroded in different types of lithology. Here below are some images at low resolution (by clicking on an image you can get the higher resolution). Unfortunately I took them with a cellular phone and therefore they are not as clear as they could be.
In the first image, taken close to Capitol Reef National Park, on 24th National Road, where red sedimentary rocks are present, the rills are in the central part of the image. At the top, processes other than erosion dominate (e.g. rockfall), but in the deposits at the bottom rills and fluvial-type forms appear everywhere.
Same as above, but in a different material (taken close to Hanksville). At the bottom, rock formations that repeat sequentially are also present. The geometry there is convex-divergent but still filled with rills.
Same material as above. Different geometries, more pronounced aggregative formations.
Same area as above: a complete network eroded in the center of the image. Remarkably, a sedimentary (?) layer runs across the formation without much effect on the rilling structure.


I do not know if a laser altimeter survey of the area is available, but if so, it would be really interesting for geomorphologists to analyse the literally thousands of rills and river networks that formed in this arid environment.

"Two Rivulets side by side,
Two blended, parallel, strolling tides,
Companions, travelers, gossiping as they journey."

W. Whitman

Friday, August 1, 2014

What is life ? (by Erwin Schroedinger) and Hydrology

The excuse for this blog post was the reading of an old (1944) little book entitled “What is life ?” by Erwin Schroedinger. It presents the point of view of a physicist on life, before the discovery of the structure of DNA, and actually influenced the subsequent research by Watson and Crick.
My reading, besides being driven by general curiosity, had a purpose. Hydrology, especially in its very modern form called ecohydrology (see also here), has a lot to do with the complexity of physical, chemical and biological interactions. However, even the more physical aspects of hydrology, deployed in space, present patterns, heterogeneities and feedbacks that are by themselves of an overwhelming degree of complexity. Therefore getting the method there, for understanding life, could help in finding a method here, in hydrology. The whole book is enjoyable; however, my commentary here covers mostly three chapters: the first, the sixth and, very briefly, the seventh. Excerpts from the book are in italics, my own notes in normal characters.

CHAPTER 1 - The Classical Physicist’s Approach to the Subject

INTRODUCTION

“.. though warned at the outset that the subject-matter was a difficult one a …, even though the physicist’s most dreaded weapon, mathematical deduction, would hardly be utilized. The reason for this was not that the subject was simple enough to be explained without mathematics, but rather that it was much too involved to be fully accessible to mathematics.”

Here I see a parallel with many hydrological processes, for instance the hillslope processes. Many outstanding colleagues support the idea that the physics of the subject is too complex to be treated mathematically.

The large and important and very much discussed question is: How can the events in space and time which take place within the spatial boundary of a living organism be accounted for by physics and chemistry? The preliminary answer which this little book will endeavor to expound and establish can be summarized as follows: The obvious inability of present-day physics and chemistry to account for such events is no reason at all for doubting that they can be accounted for by those sciences.

Now, just substitute “river basin” for “living organism” and you have an answer to the first question for hydrology. It is indubitable that, in the seventy years since the publication of the book, biology itself, and molecular biology in particular, took many steps in the direction traced by E.S. It is equally clear that the knowledge of hydrological processes, and the establishment of Hydrology as a physical science since the work of P. Eagleson, made extraordinary leaps forward.

STATISTICAL PHYSICS. THE FUNDAMENTAL DIFFERENCE IN STRUCTURE


Yet the difference which I have just termed fundamental is of such a kind that it might easily appear slight to anyone except a physicist who is thoroughly imbued with the knowledge that the laws of physics and chemistry are statistical throughout.

This statement applies verbatim to Hydrology.

THE NAIVE PHYSICIST'S APPROACH TO THE SUBJECT

I propose to develop first what you might call 'a naive physicist's ideas about organisms', that is, the ideas which might arise in the mind of a physicist who, after having learnt his physics and, more especially, the statistical foundation of his science, begins to think about organisms and about the way they behave and function and who comes to ask himself conscientiously whether he, from what he has learnt, from the point of view of his comparatively simple and clear and humble science, can make any relevant

Substitute “organisms” with hydrology, hydrological processes, watersheds, at your convenience.

WHY ARE THE ATOMS SO SMALL?

Why are atoms so small? … Suppose that you could mark the molecules in a glass of water; then pour the contents of the glass into the ocean and stir the latter thoroughly so as to distribute the marked molecules uniformly throughout the seven seas; if then you took a glass of water anywhere out of the ocean, you would find in it about a hundred of your marked molecules.

Besides being a truly hydrological example, attributed to Lord Kelvin, it also spans the scales of hydrology, from the molecule (in the quantum domain) to the oceans (the so-called global hydrology).
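
The order of magnitude in the glass-of-water example can be checked with a few lines of arithmetic. The sketch below is in Python; the glass volume (0.2 litres) and the ocean volume (about 1.3 billion cubic kilometres) are round numbers assumed here, not taken from the book. With them the estimate comes out at roughly a thousand marked molecules per glass, within an order of magnitude of the figure quoted above. The result scales with the square of the assumed glass volume, so a smaller glass brings it down quickly.

```python
AVOGADRO = 6.022e23        # molecules per mole
MOLAR_MASS_WATER = 18.0    # grams per mole
WATER_DENSITY = 1000.0     # grams per litre

GLASS_VOLUME_L = 0.2       # assumed glass size, in litres
OCEAN_VOLUME_KM3 = 1.3e9   # assumed, rough volume of the oceans
LITRES_PER_KM3 = 1.0e12    # 1 km^3 = 1e9 m^3 = 1e12 litres

# Molecules marked in the first glass.
molecules_in_glass = GLASS_VOLUME_L * WATER_DENSITY / MOLAR_MASS_WATER * AVOGADRO

# How many glasses the ocean holds; after perfect mixing each one
# receives the same share of the marked molecules.
glasses_in_ocean = OCEAN_VOLUME_KM3 * LITRES_PER_KM3 / GLASS_VOLUME_L

marked_per_glass = molecules_in_glass / glasses_in_ocean
print(f"{molecules_in_glass:.2e} molecules in the glass")
print(f"{glasses_in_ocean:.2e} glasses in the ocean")
print(f"{marked_per_glass:.0f} marked molecules per glass")
```

The point of the exercise is that the number of molecules in a glass exceeds the number of glasses in the ocean, which is exactly what makes the statistical description of water possible.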

CHAPTER 6 - Order, Disorder and Entropy

A REMARKABLE GENERAL CONCLUSION FROM THE MODEL

From Delbruck's general picture of the … substance it emerges that living matter, while not eluding the 'laws of physics' as established up to date, is likely to involve 'other laws of physics' hitherto unknown, which, however, once they have been revealed, will form just as integral a part of this science as the former.

Substitute “modern Hydrology” for “Delbruck's” and “hydrological processes” for “living matter”. Where these other laws are is the new frontier of hydrology. It is a frontier envisioned for some time already, indeed, because I cannot deny that I can see in it the “Gold medal search” of Ignacio Rodriguez-Iturbe's own work.

….

LIVING MATTER EVADES THE DECAY TO EQUILIBRIUM 

When a system that is not alive is isolated or placed in a uniform environment, all motion usually comes to a standstill very soon as a result of various kinds of friction; differences of electric or chemical potential are equalized, substances which tend to form a chemical compound do so, temperature becomes uniform by heat conduction. After that the whole system fades away into a dead, inert lump of matter. A permanent state is reached, in which no observable events occur. The physicist calls this the state of thermodynamical equilibrium, or of ‘maximum entropy'

There is poetry in this sentence, but it could be subtly imperfect: natural systems usually work under disequilibrium conditions. In fact, E.S. remarks on this later in the chapter. However, not only living organisms but also eco-hydro-systems work the same way, even if at a more aggregate and “higher” level of organisation. The organisation of spatial physical systems, like river networks, and hydrological interactions work the same way, and often they show the same type of complex organisation. For their organisation, obviously, we would be less inclined to talk about evading equilibrium conditions, and there we would probably be correct, but at the same time a little wrong …

IT FEEDS ON ‘NEGATIVE ENTROPY’

By eating, drinking , breathing and (in case of plants) assimilating. The technical term is metabolism. The Greek word means change or exchange. Exchange of what? Originally the underlying idea is, no doubt, exchange of material …That the exchange of material should be the essential thing is absurd …  For a while in the past our curiosity was silenced by being told that we feed upon energy …Needless to say, taken literally, this is just as absurd. … Every process, event, happening -call it what you will; in a word, everything that is going on in Nature means an increase of the entropy of the part of the world where it is going on.

I do not completely agree with the excerpted phrases. E.S. himself, in commenting further, does move away from this strict vision. Entropy represents uncertainty in the microscopic configurational space of kinetic energy. However, it is driven by energy which is, as well as mass (because space-time is locally hyperbolic and we work in non-relativistic conditions), conserved. It is just the supply of heat that moves water from a less entropic state (ice) to a more entropic state (vapor). Once in an energetic state, the configuration of water molecules is the most probable one (more or less), but, as experience teaches, the way the passage between energetic states is obtained can strongly affect the final “metastable” configuration (snowflakes, for instance, are an example). So for living systems, as well as for hydrological fluxes and states, metastable, out-of-equilibrium states are the key. Once the systems are no longer fed with mass and energy, they decay to a stable state, which is at the same time a state of feasible minimal potential energy and feasible maximum entropy. Metastability is intrinsic to everything. The universe itself, as we conceive it, is a metastable state that moves out of the Big Bang. It would be an oddity if the same were not true for hydrological fluxes.

CHAPTER 7 - Is Life based on the Laws of Physics ?

The title itself is compelling. E.S. certainly opens many questions, as in: NEW LAWS TO BE EXPECTED IN THE ORGANISM. He concludes that new laws are to be expected, emerging (although the word, with this meaning, was not in use seventy years ago) from disorder, or organising the new order that appears at macroscopic scales:

The orderliness encountered in the unfolding of life springs from a different source. It appears that there are two different 'mechanisms' by which orderly events can be produced: the 'statistical mechanism' which produces order from disorder and the new one, producing order from order

The same type of question can arise even in watershed hydrology (read the title as: Is Hydrology based on the Laws of Physics ?). Current practice holds that the collective work of many water molecules and their interactions can be described, under certain circumstances, by macroscopic laws, in which the collective behaviour, the spatial structure of the problem, or other factors are more important than the simple molecular dynamics (think of the residence time interpretation of the Instantaneous Unit Hydrograph, for the Italians, here, or, remaining on the same topic, the fact that the hydrologic response is determined more by the geomorphic organisation than by the Navier-Stokes equations).
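
The residence time interpretation mentioned above can be sketched in a few lines: the Instantaneous Unit Hydrograph is read as the probability density of the times water spends in the basin, and the discharge is the convolution of effective rainfall with that density. The Python sketch below assumes an exponential density (a single linear reservoir), a mean residence time of 3 time steps and a made-up rainfall pulse; none of these choices comes from the post.

```python
import math


def exponential_iuh(mean_residence_time, dt, n_steps):
    """Probability that a drop leaves the basin during each time step
    (exponential residence-time distribution, an assumption)."""
    k = mean_residence_time
    return [
        math.exp(-i * dt / k) - math.exp(-(i + 1) * dt / k)
        for i in range(n_steps)
    ]


def convolve(rain, iuh):
    """Discrete convolution: each rainfall pulse is spread in time by the IUH."""
    runoff = [0.0] * (len(rain) + len(iuh) - 1)
    for i, pulse in enumerate(rain):
        for j, weight in enumerate(iuh):
            runoff[i + j] += pulse * weight
    return runoff


rain = [0.0, 5.0, 10.0, 2.0, 0.0]   # effective rainfall per step, illustrative values
iuh = exponential_iuh(mean_residence_time=3.0, dt=1.0, n_steps=60)
runoff = convolve(rain, iuh)

# The IUH integrates to (almost) one, so the water volume is conserved.
print(round(sum(iuh), 6), round(sum(rain), 3), round(sum(runoff), 3))
```

Nothing in this description refers to molecular dynamics or to the Navier-Stokes equations: all the physics is summarised in the shape of the residence-time distribution, which in a real basin is largely set by the geomorphic organisation.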

In “THE NEW PRINCIPLES ARE NOT ALIEN TO PHYSICS”, E.S. in fact claims that the new physics is still physics, even if, in some sense, super-physical. He seems to me to be in search, certainly not concluded, of a unifying principle for understanding the stratification of reality, even the physical one, in layers, each one governed by its own rules. This was enunciated more recently (translation into English is mine) as follows:

“We cannot deny that our universe is not a chaos; we recognise beings, objects that we call by names. These objects or things are forms, structures endowed with a certain stability; they fill a certain portion of space and endure for a certain time …”


The search for scaling, scale invariance and scale breaking in hydrology, which made history in the last two decades, was the analogous search for an understanding of these higher levels of organisation of hydrological processes, which are still quite elusive indeed.

_______________________________________________________________________________

On the same topics as What is life ?, I also found the Ph.D. thesis by Nathaniel Virgo, entitled “Thermodynamics and the structure of living systems”. He is also the author of interesting papers listed on his website.
The thesis, besides the work of E.S., also cites the earlier work by Morowitz and an interesting paper by Schneider.

References

- N.Virgo, Thermodynamics and the structure of living systems, University of Sussex, 2011
- Morowitz, H. (1968). Energy flow in biology. New York and London: Academic Press.
- Morowitz, H. (1978). Foundations of bioenergetics. Academic Press.

- Schneider, E. D., & Kay, J. J. (1994). Life as a manifestation of the second law of thermodynamics. Mathematical and Computer Modelling, 19(6–8), 25–48.

Wednesday, July 30, 2014

Uncertainty and Information Theory

We are all persuaded that uncertainty is a big topic, in life but also in hydrology. It is so important that many hydrologists dedicate their life to its estimation, in connection with hydrological processes. Uncertainty, being uncertain, also generates confusion, and some of its literature is confused and confusing (I don't want to cite anyone negatively, but I could).
Whatever the case, one of the best talks I attended at the last Fall Meeting of the American Geophysical Union was the invited lecture by Hoshin Gupta. Hoshin has an outstanding (really outstanding, I mean) career in finding calibration methods, studying the identifiability of parameters and understanding uncertainty in models. Recently (see for instance Gong et al., 2013) he started to apply concepts derived from information theory to hydrology. BTW, you can find the pdfs of his AGU presentations here: one on the necessity of applying information theory concepts to evaluate the structural hypotheses of models, and another one about information theory and Bayesian inference in hydrology (both with a lot of citations).

I never really understood why hydrologists do not use information theory concepts. I-Theory is a well-developed mathematical theory with a lot of tools, and it could help to get out of the fuzziness around the determination of uncertainty in models. Besides, using the I-Theory concept of information/uncertainty, one can gain knowledge about the complexity of process outputs and, possibly, infer something about the "complexity" of the models required to account for it mathematically in a proper way (remember: "Everything should be made as simple as possible but not simpler").

Hoshin is not the only one who was attracted by information theory. In my occasional browsing of the topic, I also found some other interesting papers. The first one, by Majda and Gershgorin, is concerned with climate models. This is encouraging, because climate models are certainly at least as involved as hydrological models, if not more. A second is Weijs et al. (2013), which is concerned with time series: we compare time series, therefore knowing how much information is hidden in a time series (at least with reference to some encoding key) is certainly useful. For Weijs and van de Giesen, this paper is just a return to the topic (see also Weijs et al., 2010, and Weijs CV).
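
As a toy illustration of what "information hidden in a time series with reference to some encoding" can mean, the Python sketch below quantizes a series into equal-width bins and computes the Shannon entropy of the resulting symbols, in bits. This is only the textbook definition, not the method of any of the papers cited here; the number of bins plays the role of the encoding key and is an arbitrary choice.

```python
import math
from collections import Counter


def shannon_entropy_bits(series, n_bins):
    """Shannon entropy (bits per value) of a series quantized into n_bins equal-width bins."""
    lo, hi = min(series), max(series)
    if hi == lo:
        return 0.0  # a constant series carries no information
    width = (hi - lo) / n_bins
    # Map each value to a bin index; the maximum falls into the last bin.
    symbols = [min(int((x - lo) / width), n_bins - 1) for x in series]
    counts = Counter(symbols)
    n = len(series)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


constant = [3.0] * 8
spread = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]

print(shannon_entropy_bits(constant, n_bins=8))  # no surprise at all
print(shannon_entropy_bits(spread, n_bins=8))    # every bin equally likely
```

A series that never changes needs zero bits per value, while one that visits all eight bins equally often needs three; real hydrological series sit somewhere in between, and where exactly depends on the quantization chosen.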

Another paper came from Ruddell in Eos, remarkably highlighting that last year the applications of I-Theory to hydrology attracted many more people than they used to.
To feel among the smarter ones, I bought a book by Mezard (see also, and GS) and Montanari (Andrea, not our colleague Alberto, who also has quite a production on uncertainty: please see his website), which can be a further source of ideas and thoughts.

So far, I have never actually read any of the papers (or the book) carefully, but I am excited at the idea of having time to do it in depth.

References

Gong, W., H. V. Gupta, D. Yang, K. Sricharan, and A. O. Hero III (2013), Estimating epistemic and aleatory uncertainties during hydrologic modeling: An information theoretic approach, Water Resour. Res., 49, 2253–2273, doi:10.1002/wrcr.20161.

Mézard, M. and Montanari, A. , Information, Physics, and Computation, Oxford University press, 2009

Majda, A. J.,  and Gershgorin, B., Quantifying uncertainty in climate change science through empirical information theory, PNAS, August 24, 2010, vol. 107,no. 34, 14958–14963

Ruddell, B. L., N. A. Brunsell and P. C. Stoy, Applying Information Theory in the Geosciences to Quantify Process Uncertainty, Feedback, Scale, Eos, Vol. 94, No. 5, 29 January 2013

 Weijs, S. V.;  Schoups, G.  and van de Giesen, N., Why hydrological predictions should be evaluated using information theory, Hydrol. Earth Syst. Sci., 14, 2545-2558, 2010, www.hydrol-earth-syst-sci.net/14/2545/2010/, doi:10.5194/hess-14-2545-2010

Weijs, S. V., van de Giesen, N. and Parlange, M. B., Data compression to define information content of hydrological time series, Hydrol. Earth Syst. Sci., 17, 3171–3187, 2013 www.hydrol-earth-syst-sci.net/17/3171/2013/ doi:10.5194/hess-17-3171-2013