Sunday, November 30, 2014

H2O - The Java way to Machine Learning

I discovered H2O thanks to the R community, where I found a post about the connection of this tool with R. Besides my interest in Machine Learning and Statistics (Data Science), which has been increasing during this year (yes, I have not done anything with it yet: I need to learn first, and it takes me years to do so ;-( ), I was intrigued by the fact that it is implemented in Java, is said to be fast, has connections to R, Scala and Python, and has Joshua Bloch among its advisors, so, I guess, it should be REALLY good Java from which to learn. Obviously I added it to my previous overviews of Java and R tools for hydrologists.

If they are really as good as they promise, there are several things that we can copy from them: the connection to R, their classes for reading data, their strategies for doing math.

The tool is open source; information and downloads can be found at the H2O site. During the H2O conference, the famous statisticians Trevor Hastie and John Chambers gave interesting talks that can be found here.

Thursday, November 27, 2014

Ning Lu lectures on hillslope processes and (especially) stability, at the Summer School on Landslides

In 2013 the University of Calabria organised a very interesting School on Landslide triggering (many thanks to Lino Versace, Giovanna Capparelli and Giuseppe Formetta). I actually gave a hand in organising it, and I also gave a lecture on the Richards equation. While waiting for the official posting of the lectures on the school site (after which I will remove my videos), I cannot wait any longer to have the lectures by Ning Lu on-line. He gave four talks taken from his beautiful book, Hillslope Hydrology and Stability, written with Jonathan Godt, new coordinator of the USGS landslide hazards program and former co-advisor of my Ph.D. student Silvia Simoni (her thesis here). A must-watch for anyone in the field!

First talk: A brief conceptual history of soil hydrology and soil mechanics (from Chapter 6 of his book)

Third talk, part II: Hydro-mechanical properties of hillslopes (Chapter 8 of the book)

Fourth talk, part I: Failure surfaces (Chapter 9 of the book)

Fourth talk, part II: Field based stability analysis (Chapter 10 of his book)

Saturday, November 22, 2014

JGrass-NewAGE history - Version zero and version one

JGrass-NewAGE (from now on, simply NewAGE) was conceived when the Adige River Authority requested a model of the river Adige to help the management of droughts. We decided to name the project Nuovo Adige: “Nuovo” since we had already implemented a model for the same river almost twenty years before, and that was the Adige model. Because “Nuovo Adige” sounds very much like “new age” in English, this soon became the name of the model. The first model was implemented mostly by the group headed by Alberto Bellin for the hydrology and by Aronne Armanini for the hydraulic part (I had some part in it, designing the file organisation required by the model and making some suggestions about the geomorphological unit hydrograph approach). The new model, however, had another ancestor: the real-time model operational at the Civil Protection of the Province of Trento, also implemented by Alberto Bellin and collaborators, and including a snow model by some of my former students (Hydrologis, and Stefano Endrizzi).

When I got the project, my first thought was that I had to abandon my beloved GIUH models for a more general one. We needed to estimate the discharges at multiple points of the river network, while the GIUH gives the discharge just at the outlet of a basin, and this called, at least, for a generalisation of the usual GIUH to a system where multiple GIUHs were used, one for each subbasin. (Well, the GIUH also has some limitations, but I will come back to this in a future post.)
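To make the idea of "one unit hydrograph per subbasin" concrete, here is a minimal Java sketch. It is only an illustration under stated assumptions, not the NewAGE code: the names Subbasin and NetworkNode are hypothetical, each subbasin response is a plain discrete convolution of its rainfall with its own unit hydrograph, and the transfer from the subbasin outlet to the network node is reduced to a pure delay in time steps (no attenuation along the channel).

```java
import java.util.List;

// Hypothetical classes for illustration only; they are not part of NewAGE or OMS.
class Subbasin {
    final double[] unitHydrograph; // response to a unit rainfall pulse, one value per time step
    final int delaySteps;          // assumed pure travel delay from the subbasin outlet to the node

    Subbasin(double[] unitHydrograph, int delaySteps) {
        this.unitHydrograph = unitHydrograph;
        this.delaySteps = delaySteps;
    }

    /** Adds the delayed convolution of rainfall and unit hydrograph into total[]. */
    void addContribution(double[] rainfall, double[] total) {
        for (int k = 0; k < rainfall.length; k++) {
            for (int j = 0; j < unitHydrograph.length; j++) {
                int t = k + j + delaySteps;
                if (t < total.length) {
                    total[t] += rainfall[k] * unitHydrograph[j];
                }
            }
        }
    }
}

class NetworkNode {
    /** Discharge at a node as the sum of the responses of all upstream subbasins. */
    static double[] discharge(List<Subbasin> upstream, List<double[]> rainfall, int steps) {
        double[] total = new double[steps];
        for (int i = 0; i < upstream.size(); i++) {
            upstream.get(i).addContribution(rainfall.get(i), total);
        }
        return total;
    }
}
```

Calling NetworkNode.discharge for each node of interest, with the list of subbasins draining to it, gives discharges at several points of the network instead of only at the basin outlet.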

If one browses the slides presented at the beginning of the project (in Italian), one can also notice that the emphasis of the project was not only on the models of the physical processes, either conceptual or physically based, but on the whole modelling infrastructure, including a database for storing the data (even the geographic data, of input as well as of output), and a visualisation system based on a GIS (uDig), aimed to grow into a Decision Support System (DSS).

Since the early nineties, I had in fact been intrigued by the image of a DSS, found in a lost book of proceedings, which included a database, a visualisation system, and models, and where all of the needed concepts were already developed. Twenty years or so later, I am still asking myself why the people who envisioned the picture never actually realised it in practice: probably it was because a lot of tools (which I spent years creating consistently) were missing.
Another key point in the presentation was the inclusion in the modelling of infrastructures like water intakes, withdrawals, dams and any other devices. An explicit treatment of the flow in channels, through a solid algorithm [Casulli, 1990] solving the 1D de Saint Venant equation, was also implemented.
These features were known to be necessary to account for human action which, potentially, during low flows, could withdraw all the river water for irrigation and other uses [a clarification of the concept of Anthropocene at this local scale].

After the experience gained with GEOtop, it was clear that the modelling of such systems could no longer be implemented in the classical way, as a monolithic program. We therefore looked at OpenMI as a system with which to pursue a strategy of modelling by components. The presentation given at the CUAHSI 2008 biennial meeting (a must-read, also including some considerations about GEOtop) clarifies these and other design issues.

From the point of view of the mathematical description of the processes, the plan was to “recycle” the snow model of the real-time Adige model; to absorb the GEOtransf model [e.g. Majone et al., 2010] into the picture; to implement the Penman-Monteith scheme for the estimation of evapotranspiration (see also here the lectures by Dara Entekhabi); and to recover all the tools of the Horton Machine for the geomorphometric analyses required for basin delineation. An ambitious idea was to include directly in the modelling, through the use of GEOtools, the formats suitable for a direct GIS representation.
Unfortunately, the GEOtransf model was not available, mainly due to licensing issues, so we had to change the direction of our efforts, and we grabbed the Chris Duffy model [Duffy, 1996] as implemented in Mantilla’s CUENCAS model. A couple of spatial interpolators for data measured at hydro-meteorological stations were implemented, i.e. an ordinary Kriging, and Just Another Model Interpolator (JAMI), a simple, bare-bones, robust method (which never breaks) for making measurements available at any point of the catchment. Finally, an original description of the river network, describing the topology and governing the order of execution of the various modules, was implemented (here an early view of the watershed description, given as a poster at EGU-Wien 2008, and here a more mature paper, just on the generalised Pfafstetter numbering in NewAGE).

We can call the above the zero version of NewAGE, which actually never became operational. It needed a testing and calibration phase that, for various reasons, could not be pursued. The financial support from the River Basin Authority terminated, and the database, the model components, and whatever else had been developed were closed in a drawer. A big lost opportunity to have a new type of model working on the river Adige! What actually remains of version zero has to be archaeologically retrieved. But we are doing it.

However, I (we) did not give up. Eventually we abandoned OpenMI in favour of OMS, and the reasons were explained in a previous post. The porting to OMS was also done by Hydrologis. We also reduced our scope and concentrated just on the model components, to improve them and verify their correspondence with reality.
To reach this goal, Giuseppe Formetta, at the start of his Ph.D., implemented goodness of fit (GoF) methods and the calibration methods DREAM and Particle Swarm, which became two new OMS components. With them, a systematic analysis of the other components started.
We soon realised that Duffy’s model was not easy to calibrate (or maybe, at that time, we were not experienced in using the tools), and Giuseppe decided to implement an entirely new module, based on Hymod. This was successfully accomplished, and the results are reported in Formetta et al., 2011.
Functional to this work was the incremental improvement of the Kriging components, which now include, besides Ordinary Kriging, Detrended Kriging, their local versions (accounting not for all the measurement points but just for the nearest neighbours), and five variogram models that can be fitted automatically at any time step for which data are available (also made by Giuseppe).

The refocused development of NewAGE was described in a concept paper actually published this year (2014).

NewAGE zero had a radiation simulation; however, the short wave radiation component was entirely rewritten on the basis of Javier Corripio’s work. The implementation was initiated by Daniele Andreis, and enhanced, cleaned, and completed by Giuseppe Formetta with various contributions on how to estimate the attenuation of radiation by atmosphere and clouds. All of this work is summarised in this GMD paper (for a refresher on solar radiation, you can look at the slides referred to here).

On the basis of the work on radiation, components for snow accumulation and melting were built and documented in another paper, whose case study was the Cache La Poudre basin, close to Fort Collins, Colorado: another piece of Giuseppe Formetta's Ph.D. thesis.

The current state of the art of NewAGE also includes the implementation of a new version of the Penman-Monteith formula, its FAO counterpart, and the Priestley-Taylor formula for the estimation of evapotranspiration (also implemented by Giuseppe F.). The potential of these latter components has not yet been exploited as it could be. But we will do it.
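As a reminder of what the simplest of these formulas computes, here is a small Java sketch of the Priestley-Taylor estimate, ET = α Δ/(Δ+γ) (Rn − G)/λ. It is an illustration only, not the NewAGE component: the class name is hypothetical, the constants (α = 1.26, λ = 2.45 MJ/kg, γ = 0.0674 kPa/°C near sea level) are common textbook values assumed here, and the slope Δ of the saturation vapour pressure curve uses the FAO-56 expression.

```java
// Hypothetical class for illustration; it is not the NewAGE/OMS component.
class PriestleyTaylor {
    static final double ALPHA = 1.26;   // Priestley-Taylor coefficient (assumed typical value)
    static final double LAMBDA = 2.45;  // latent heat of vaporisation [MJ/kg] (assumed constant)
    static final double GAMMA = 0.0674; // psychrometric constant near sea level [kPa/degC] (assumed)

    /** Slope of the saturation vapour pressure curve [kPa/degC], FAO-56 form. */
    static double slope(double airTemperatureC) {
        double es = 0.6108 * Math.exp(17.27 * airTemperatureC / (airTemperatureC + 237.3));
        return 4098.0 * es / Math.pow(airTemperatureC + 237.3, 2);
    }

    /**
     * Evapotranspiration [mm/day] from air temperature [degC], net radiation
     * and ground heat flux [MJ m^-2 day^-1]. One kg of water per m^2 is 1 mm.
     */
    static double evapotranspiration(double airTemperatureC, double netRadiation, double groundHeatFlux) {
        double delta = slope(airTemperatureC);
        return ALPHA * delta / (delta + GAMMA) * (netRadiation - groundHeatFlux) / LAMBDA;
    }
}
```

With 20 °C, a net radiation of 15 MJ m^-2 day^-1 and no ground heat flux, PriestleyTaylor.evapotranspiration(20.0, 15.0, 0.0) gives roughly 5.3 mm/day under these assumed constants.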

At present, therefore, NewAGE consists of a set of components that can be used to simulate the whole hydrological cycle for any basin, from a few square kilometres to continental scale rivers. We can call the current version JGrass-NewAGE version 1 but, actually, it is a set of components that can be arranged in various modelling solutions, for various analyses that could not be easily obtained with more traditional models.

The story continues, and soon other components will be made available to the public; those available now are listed in a previous post, here. A different view of the same concepts presented here can be found here (seen from the Visualisation and Informatics perspective) and here (the idea of building a physico-statistical model). If you hold out and look at the recent presentation given at Fort Collins, you can see a rework of the ideas developed at CUAHSI in 2008 and complete your overview.

The source code of the system can be found on Github.


Casulli V., Semi-implicit finite difference methods for the two-dimensional shallow water equations, Journal of Computational Physics, 86 (1), 56-74, 1990

Formetta G., Mantilla R., Franceschi S., Antonello A., Rigon R., The JGrass-NewAge system for forecasting and managing the hydrological budgets at the basin scale: models of flow generation and propagation/routing, Geoscientific Model Development, 4 (4), 943-955, DOI: 10.5194/gmd-4-943-2011, 2011

Formetta G., Antonello A., Franceschi S., David O. and Rigon R., The informatics of the hydrological modelling system JGrass-NewAge, 2012 International Congress on Environmental Modelling and Software Managing Resources of a Limited Planet, Sixth Biennial Meeting, Leipzig, Germany R. Seppelt, A.A. Voinov, S. Lange, D. Bankamp (Eds.) 2012-proceedings, 2012

Formetta G., Rigon R., Chavez J.L., David O., The short wave radiation model in JGrass-NewAge System, Geosci. Model Dev., 6, 915-928, 2013

Formetta G., Antonello A., Franceschi S., David O., and Rigon R., Hydrological modelling with components: A GIS-based open-source framework, Environmental Modelling & Software, 55, 190-200, 2014

Majone B., Bertagnoli A., Bellin A., A non-linear runoff generation model in small Alpine catchments, Journal of Hydrology, 385 (1), 300-312, 2010

Vrugt, J. A., ter Braak, C. J. F., Diks, C. G. H., Robinson, B. A., Hyman, J. M., Higdon, D., 2009. Accelerating Markov chain Monte Carlo simulation by differential evolution with self-adaptive randomized subspace sampling. International Journal of Nonlinear Sciences and Numerical Simulation 10 (3), 273-290. DOI:10.1515/IJNSNS.2009.10.3.273

Thursday, November 20, 2014

At the Disasters' School of Dr. Unavoidable (after the recent flooding in Italy) - by Stefano Benni

Tagged with #La Repubblica 19 November 2014 #Stefano Benni
(I am sorry for the bad English, but here is the satirical article by Stefano Benni.)

About the latest climate catastrophes in our country, we interviewed an expert who is little known but plays an important role. He is Dr. Unavoidable, head of UTMA, Ufficio Tutela Mutamento Ambientale (Protection Bureau against Environmental Change).
- Dr. Unavoidable, first let us define the role of your office. Clearly, it is to protect the soil and the Italian citizens against climate disasters -.
- No, let me correct you. Our office is responsible for protecting and maintaining the state of environmental destruction, preventing solutions that would create costly and utopian expectations -.
- Excuse me, but why?
- For many reasons. First, because environmental change requires adaptation, and as long as the Italian people do not get used to collapses, floods and landslides, they will always be scared and insecure. And because climate disaster is unavoidable, a new culture becomes necessary: that of inevitability -.
- Give us some examples ...

- Our office is looking for new forms of communication to help Italians accept climate change with patience. For example, we coined the term "water bomb". It is obvious that against the old showers you could once do something, but against a water bomb there is nothing to do. The fault lies with militarised and bellicose clouds. They used to say: here comes the bad weather. Now they say: here comes cyclone Charon, anticyclone Polyphemus, hurricane Cynthia. Everybody feels inside an epic event, or as if waiting for a nosy friend. Close the door firmly, Cynthia is arriving. And, please, stop with ideological speculations, be quantitative! When I say that 200 mm of water fell in one place, that is, as much as it usually rains in a month in Caracas, I explain mathematically the misfortune of what happened. It is not true that we do not bustle about: we have state-of-the-art and tiring detection systems ... do you know how much time it takes to collect two hundred millimeters of water with a spoon?
- Climate change is known. Against flooding, illegal buildings, landslides, can't you do prevention?
- Unfortunately we do not have the money for prevention. We need to spend it to repair the damage that we did not prevent. If we spent the money on prevention, then we would not have the money to repair the damage -.
- But perhaps, with prevention, there would be no damage ...
- This is a bizarre aspect of the matter, we are studying it. But we do a lot of prevention. For example, in fifty years the television weather forecasts have increased from three to three thousand a day, and the graphics are much improved. Another example: if homes are built in a geologically hazardous area, we ...
- Do not allow them to be built, and move people away -.
- No, we cannot intervene, it would take the army. But we immediately declare them illegal. Then we grant them an amnesty. Indeed, from now on we are thinking of granting the amnesty even before they are illegally built. Isn't it a great idea?
- You think in a strange way. And the flooding?
- We were not ready. Once the rivers "came out of their bed", "inundated", "overflowed", "overran". But now they do a new thing: they "esondano". We did not expect it -.
- C'mon, it's the same thing. The Po river "esonda" and floods, but it has already done so many times -.
- Of course, the Po can do it, it is a great river. But now any creek or channel feels entitled to overflow. We cannot check them all, it seems that they do it on purpose -.
- And the banks? The containment works? Reforestation?
- You see, if I have to build the Milan Expo or ghost palaces at La Maddalena, I have no obstacles: large contracts must be pushed through quickly and, with a few bribes, everything speeds up. But every time there is a contract for a levee, for a consolidation, for dredging a river, the competing firms sue, they appeal to the TAR, and everything is delayed. It is not our fault. We should entrust hydrogeological hazard prevention to a pool (in English in the original), or to the mafia or to FIAT, and then things would go quickly. But they do not let you do it -.
- So in the future it will be even worse?
- It depends on how we see the situation. We are preparing a new scientific and media approach. First, we created the event ω, the omega event -.
- What is it?
- The event ω, omega, is a very rare and unpredictable type of occurrence. For example, the rain in Genoa, a clash between comets, an internet connection that works properly, soccer refereeing without controversy. These exceptional events can be dealt with in only one way -.
- What's that?
- You see the shape of the omega: what does it remind you of? We trust in our ass and, above all, we politicians must have a face like an ass -.
- I do not see much prevention in it -.
- The government does not have to do prevention. It already has too much to worry about with European banks and shopping sprees in dildos. It is the people who must assume their responsibility with regard to climate change. We have forgotten that Homo sapiens comes from water, that we are born amphibians. We must be ready to return to our natural element. In every Italian house there must be at least a raft or a boat, life jackets for everyone and a wetsuit ("muta" in Italian, a word derived from mutare, "to change"), and also snorkel and fins. Stop complaining that the subway is flooded! Dive! This is what it means to be good citizens ...
- But we have been waiting for years for a new Hydro-geological Plan -.
- And we have many new ideas. Against the otters that eat away the banks, we will introduce dozens of crocodiles into the rivers. Bed and breakfasts in the craters of volcanoes will be prohibited. The committee for earthquake risk will be replaced by a fortune-teller. To prevent flooding, homes will be built with only the fifth floor. To avoid complaints about train delays, the schedules in the stations will be written in Chinese. But above all, from now on, the fuchsia code applies all over the country, which means that we are always in an emergency. If you go by car, on foot, by bike, fuck you. You were warned -.
- So you think Italians will have to get used to the disasters?
- Yes, they will have to live them happily, because they are the unavoidable future. Farewell Mediterranean climate, we have entered the Omega climate. Excuse me, but I have a call on the phone -.
- Dr. Unavoidable, it is your secretary. They say that the road is flooded and your car was swept away ...
- How? But it's a scandal! What has happened?
- Excuse me, but 132 millimeters of rain have fallen, the garage is flooded as usual and the culverts are clogged -.
- That's enough with this bullshit about millimeters of rain! Where are the firemen? The culverts clogged, what a scandal! My new Mercedes. What is the government doing?
- Excuse me, doctor, but the government is you, and you just told us that we need to adapt to the omega climate -.
- Who gives a damn, the car is mine. Where are my fishing boots and the duck life jacket? What kind of shitty country do we live in? And as for the omega event, you know what?
- I can imagine ... thanks for the interview, Dr. Unavoidable.

Stefano Benni

And if you want some statistics on hydrological and geological hazards in Italy, see the report from IRPI here.

Sunday, November 9, 2014

Design Patterns

Programming the object oriented (OO) way is not simply writing down algorithms that do "for" loops. The core concept to understand, for an OO programmer, is how many classes have to be implemented on the basis of the analysis of the problem under scrutiny, to be eventually managed by a client (the "main" method in the C family of languages) to solve a task. Therefore, a series of questions arises:
  • Are there strategies for producing a minimum number of classes without losing functionality in the work done, while promoting its extensibility?
  • What goes into a class, and what into other OO features (like the methods of a class)?
  • How to create classes with a minimum of generality, so that they can be reused in other problems?
  • How to build classes that can be easily maintained, modified and evolved without disrupting other parts of the code?
(In practice, these are the questions for which OO was born, but knowing an OO language does not directly answer them: it just offers an infrastructure for finding the answers.)

An entire discipline, software engineering (SE), was established to find the answers, and various behaviours were codified to improve software writing practice and management (but software engineering also covers the organisation of software production, and the methods for giving clear specifications to pariah-programmers for practical coding).
The key actions implied by the answers, however, are not the ones a scientific programmer is used to facing: s/he would expect to have a formalised mathematical problem, and the scope of her/his work to consist in finding the best (shortest, fastest, cleanest) algorithms to solve it (see Knuth’s classic books, or the popularisation of Numerical Recipes - I hate their licensing scheme), not in answering questions about the organisation of the code.
Maintainability and efficiency come at a cost, and OO adds a further level of complexity to the programming practice that scientists are not always ready to face: it is quite paradoxical that these aspects of SE are not widely recognised as fundamental in our times, when computer programs have entered the daily practice of many scientists (see also the introduction to this paper of ours). As a matter of fact, bad coding practices easily develop into bad science.

Back to the general questions posed at the beginning of the post: proficient programmers observed that certain problems were recurrent, and that some solutions were better than others in terms of maintainability and generality. These were called "design patterns":

"The elements of this language are entities called patterns. Each pattern describes a problem that occurs over and over again in our environment, and then describes the core of the solution to that problem, in such a way that you can use this solution a million times over, without ever doing it the same way twice."  (from A Pattern Language by Christopher Alexander, which is said to have inspired the Gang of Four book - see below).
Other definitions of patterns can be found in the Portland Patterns Repository and in The Hillside group on patterns.

The patterns idea gained popularity after the book by Gamma, Helm, Johnson & Vlissides, Design Patterns, which presents 23 patterns, grouped in three categories, called creational, structural and behavioural patterns.

Beyond these initial patterns, others were identified, for instance for parallel computing (see Concurrency patterns), and in other fields.
The style and use (and concept) of patterns are not immediate to grasp. They in fact derive from long practice with some specific software issues, and from a subsequent conceptualisation and abstraction. As often happens, their abstraction makes them general but quite difficult to assimilate without going back and working through many examples.
A few guiding ideas are behind the selection of patterns: code should be easy to modify without large refactoring efforts, the encapsulation of code parts should be maintained as much as possible, and subclassing kept to a minimum. An explicit slogan was "program to an interface, not an implementation". Roughly speaking this means: first implement abstract classes or interfaces (in the C family of languages), then, as far as possible, delay the choice of concrete classes to run time. Instead of creating subclasses, create other classes to which to "delegate responsibility", in order to reduce coupling between classes. The reader is invited to browse the web to understand for her/himself what this means, with the warning that explanations are usually full of computer science slang (I actually think that the language is part of the success of the book).
In any case, the right way to get used to Design Patterns is, as I said, to use and practise them a lot, by trial and error. Java aficionados can find several popularisations, many of them on the web [Ava Java].
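The two slogans above, "program to an interface" and "delegate responsibility", can be sketched in a few lines of Java. This is only a minimal illustration under my own assumptions: all the names (Interpolator, NearestNeighbour, InverseDistance, StationEstimator) are hypothetical and do not come from any library or from NewAGE. The client class knows only the interface, and the concrete class is chosen at run time.

```java
// Hypothetical names throughout; none of these types come from an existing library.

/** The abstraction the client depends on. */
interface Interpolator {
    /** Estimates a value at a target point from station values and their distances to it. */
    double interpolate(double[] distances, double[] values);
}

/** Concrete strategy 1: take the value of the closest station. */
class NearestNeighbour implements Interpolator {
    @Override
    public double interpolate(double[] distances, double[] values) {
        int best = 0;
        for (int i = 1; i < distances.length; i++) {
            if (distances[i] < distances[best]) best = i;
        }
        return values[best];
    }
}

/** Concrete strategy 2: inverse distance weighting with exponent 2. */
class InverseDistance implements Interpolator {
    @Override
    public double interpolate(double[] distances, double[] values) {
        double weighted = 0.0;
        double weights = 0.0;
        for (int i = 0; i < distances.length; i++) {
            if (distances[i] == 0.0) return values[i]; // target coincides with a station
            double w = 1.0 / (distances[i] * distances[i]);
            weighted += w * values[i];
            weights += w;
        }
        return weighted / weights;
    }
}

/** Client: delegates the estimate instead of being subclassed once per method. */
class StationEstimator {
    private final Interpolator interpolator; // known only through the interface

    StationEstimator(Interpolator interpolator) {
        this.interpolator = interpolator;
    }

    double estimate(double[] distances, double[] values) {
        return interpolator.interpolate(distances, values);
    }
}

class PatternDemo {
    public static void main(String[] args) {
        double[] distances = {2.0, 1.0, 3.0};
        double[] values = {10.0, 20.0, 30.0};
        // The concrete class is selected at run time; the client code does not change.
        Interpolator chosen = args.length > 0 && args[0].equals("idw")
                ? new InverseDistance()
                : new NearestNeighbour();
        System.out.println(new StationEstimator(chosen).estimate(distances, values));
    }
}
```

Adding a third interpolation method means writing one more class that implements Interpolator; StationEstimator is not touched and no subclass of it is needed, which is the reduction in coupling the slogans aim at.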

However, the relevant questions here, in this blog, are:

  • Which of the original (as well as other) patterns are useful in scientific programming?
  • Is there any pattern that is characteristic of hydrological problems?

Certainly there are hydrologists who apply some patterns to the tasks the patterns were created for. I do not know if experienced hydrology programmers apply patterns to something specific to hydrology. Please let me know if you are one of those: I would really like to exchange ideas and (very few) experiences.

In the general science framework, instead, I find a few more references. The first books I can reference are

Other resources come from the paper by the Izaguirre group, and are referenced below.

Further Readings

Erich Gamma, Richard Helm, Ralph Johnson and John Vlissides: Design Patterns: Elements of Reusable Object-Oriented Software
'Thinking in Patterns', notes by Bruce Eckel

[Portland Patterns Repository] - Patterns are the recurring solutions to the problems of design. People learn patterns by seeing them and recall them when need be without a lot of effort. Patterns link together in the mind so that one pattern leads to another and another until familiar problems are solved. That is, patterns form languages, not unlike natural languages, within which the human mind can assemble correct and infinitely varied statements from a small number of elements.

[The Hillside group on patterns] - Fundamental to any science or engineering discipline is a common vocabulary for expressing its concepts, and a language for relating them together. The goal of patterns within the software community is to create a body of literature to help software developers resolve recurring problems encountered throughout all of software development. Patterns help create a shared language for communicating insight and experience about these problems and their solutions. Formally codifying these solutions and their relationships lets us successfully capture the body of knowledge which defines our understanding of good architectures that meet the needs of their users. Forming a common pattern language for conveying the structures and mechanisms of our architectures allows us to intelligibly reason about them. The primary focus is not so much on technology as it is on creating a culture to document and support sound engineering architecture and design

- [Design Pattern for scientific software] by Dan Gezelter. He pointed to some papers on Design Patterns in scientific computing by the Izaguirre group, of which I found [1] and [2]

Design Patterns in Wikipedia