Friday, November 6, 2020

The State of the Art and the Perspectives for Future GEOframe Research

These are the contents of an e-mail I sent to a friend and colleague to push forward our collaboration. Although it was written to a specific person, I think some of the topics it covers can be of general interest, at least to those interested in hydrological modelling.

Dear Friend,
I am trying to simplify here the objectives of my research in order to see where we can find convergence of aims and goals.

Overall, I want to pursue a tight connection between the theory of hydrological processes and their sound (replicable, robust, reliable) implementations (see also here for other explanations). Actually, I work on both the theory and the implementation. Early in my career, I realized that a poor implementation of a correct theory often produces wrong results and, moreover, that its incorrect falsification gives credit to flawed ideas (the "tranchant" judgments on the Richards equation, based on unreliable integrators, are one such case). I also grew convinced that, unlike some other colleagues, I wanted to build not just programs but programming systems products (e.g. Brooks, 1975), i.e. reusable software on which other people can build new knowledge (yes, the idea that by standing on each other's shoulders and working that way, possibly with a couple of smart intuitions, we can arrive where usually only the giants arrive).

My main goal is to build better models than the existing ones, more controllable and less prone to devastating bugs. My research tends to be more “methodological” than applied. It is exactly this approach that moved me towards OMS3. In comparison with other options followed by successful colleagues, it presents a clean design, support for model controllability, encapsulation of modelling solutions, easy and orderly reuse of modules, intrinsic documentation through annotations, and support for technical issues like parallelization and calibration, without overwhelming the hydrologist who is concentrated on the physics. Besides, a wise use of components could help to dose the information given to users, making it visible only when and where required.

Coming to the practice of my research, I focused mainly on two types of models: those I call Hydrological Dynamical Systems (HDSys), mainly based on the solution of multiple coupled, non-linear, non-autonomous ordinary differential equations, and those that treat the space variables explicitly and solve partial differential equations. From a different perspective, my goal has been to cover the various aspects of the hydrological budget in their entirety (often including terrain and soil/sediment), rather than focusing, as traditional models do, on one or a few aspects such as discharge, evaporation, infiltration, or groundwater. Coupling the water budget with the energy budget has been an objective of both types of modelling, especially, but not only, because temperature is easily measurable, even from remote sensing. Tracers, nutrients and pollutants were never considered in depth until recently, but they have been part of my modelling tools for many years.
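
To make the idea of a Hydrological Dynamical System more concrete, here is a minimal sketch, which is not GEOframe code: two coupled storages, the first with a non-linear outflow, forced by a time-varying rainfall series (which is what makes the system non-autonomous) and integrated with plain explicit Euler steps. The function name, the parameter values and the choice of integrator are all illustrative assumptions.

```python
def simulate_storages(rain, dt=1.0, a=0.05, b=1.5, k=0.02, s1=10.0, s2=5.0):
    """Integrate two coupled storages with explicit Euler steps.

    dS1/dt = J(t) - a * S1**b        (non-linear upper storage)
    dS2/dt = a * S1**b - k * S2      (linear lower storage)

    rain is the sequence of rainfall intensities J(t), one per time step.
    All parameters are hypothetical values chosen for illustration.
    """
    history = []
    for j in rain:
        q1 = a * s1 ** b          # flux from upper to lower storage
        q2 = k * s2               # discharge leaving the system
        # Limit the fluxes so that a storage cannot be emptied below zero.
        q1 = min(q1, s1 / dt + j)
        q2 = min(q2, s2 / dt + q1)
        # max(0.0, ...) only guards against tiny negative rounding errors.
        s1 = max(0.0, s1 + dt * (j - q1))
        s2 = max(0.0, s2 + dt * (q1 - q2))
        history.append((s1, s2, q2))
    return history


if __name__ == "__main__":
    for step, (s1, s2, q) in enumerate(simulate_storages([5.0, 0.0, 0.0, 2.0, 0.0])):
        print(f"step {step}: S1={s1:.3f} S2={s2:.3f} Q={q:.3f}")
```

A real HDSys would use a more robust integrator and many more storages, but the structure is the same: a set of state variables, the fluxes among them, and an external forcing.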

For a community to grow around the previous ideas, some key tools are still missing. The points below summarize: i) what I envision is needed from the perspective of different users; ii) what future OMS developments should potentially consider. Some of these points are already being reached with the work we did, some are planned, and for others we need your support to reach a critical mass and make them happen.

Power User side:
  • Smoothing out some parts of the process of deploying a modelling solution to a specific, concrete catchment. With the students of the GWSs and of the Hydrological Modelling class, the two processes of extracting the HRUs and interpolating the data were instructive but too detailed and cumbersome.
  • Using the console is easy and usually hassle-free. However, because we mostly use Jupyter notebooks for the treatment and analysis of inputs and outputs, the console is a further environment to learn. Using Docker and the command line inside Jupyter could be an option, but with Docker we had a hard time on Windows. A console inside Jupyter would be the best choice.
  • In general, a convergence of our tools with the tools people use the most, such as Jupyter, lowers the learning curve and the developer commitment needed to bring in and maintain tools.
  • Some ancillary tools for “joining” catchments studied by different people are required.
  • Calibration proved to be a time-consuming effort that needs parallelization and speed-up.
  • Probably a server or a “hub” is needed to store and retrieve the collective work, including parameterisations, inputs and so on.
  • Managing multiple treatments of the same catchment will also be required at some point in the future.

For institutional users
  • Connections to Delft-FEWS could be an option to investigate.
  • Our group also needs to experiment with CSIP.
  • A distribution of all the material and the code through some tool like Anaconda would be desirable.
  • They need dedicated interfaces. For them, modifying parameters and model structure should not be an option, as it is for researchers or power users.
  • Scalability of the computing effort should be the standard (and this should include Net3).


  • Source code should be available on a public repository like GitHub.
  • Improving appropriate, developer-specific documentation should be a continuous effort.

Potential developers/researchers

  • When they come from Environmental Engineering or the sciences, they are usually familiar with Python, R or Matlab. They have no notion of OO programming and no basic software engineering background. Therefore, appropriate material providing all of this knowledge should be produced. I started with a Java for Hydrologist 101, but I am far from having completed it.
  • They are not comfortable using tools like Git, GitHub, Docker, unit tests, and other common tools that are necessary for software carpentry and collaborative work. Therefore, some training courses on these should also be provided.

The above is more of a wish list that we are keeping in mind. Frankly, we do not yet have all the competence to address every item. As you see, I did not list any machine learning tool, but this does not mean that we are not looking at them. For the moment it is just safer for us to concentrate on enhancing what we have and bringing it to an optimal state, and on publishing the ten or so papers that we have in production on the work we have already done. We are looking for resources, though, and, once resources arrive, we could also think about introducing statistical/machine learning methods. One thing to be remarked is that GEOframe-NewAGE can easily replace PRMS. The modules we have are usually different from those PRMS has, but implementing them the very same way PRMS does should be VERY easy, if this is the goal. A greater integration with AGEs would also be advisable. The main differences to be treated for compatibility are in the IO. For now, we often stick with complex data formats, but abstracting the algorithms from them is an objective we have in mind. Mostly, we have had to follow our own way so far to be sufficiently comprehensive. To be sincere, IMHO, some parameterisations of the processes inside AGEs are simply old hydrology, no longer supported by researchers but, yes, still in use by practitioners (who worldwide use SWAT, though). In all of the above I forgot to mention the work by Daniele Dalla Torre, who ported SWMM to OMS3. That thread is at present at a dead end, but it can come back alive at any time.

Below, I give further information on: i) the reasons why we use OMS; ii) the new components that I mentioned before or that we are going to develop.

When I arrived at OMS3, my most recent achievement was a stable version of GEOtop, a model that solves the water and energy budgets, as its foundational paper described. After 15 years, GEOtop remains quite unique in the panorama of “process-based” models. It combines what is usually present in other process-based models, i.e. an integrator of the Richards, groundwater and surface water equations, with what usually appears in soil-vegetation-atmosphere models. Besides, it has a solid model for snow height evolution, used operationally all over the Alps, and one for freezing soil, which constitutes a third type of process-based model usually cared for by a different scientific community. I’ve certainly sinned arrogantly in doing what others still do not do, even with much larger resources, and I will probably go to hell for that. GEOtop has a decently extended literature and I could have capitalized better on its treasures, but I preferred to move on because, while I was getting GEOtop stable, I was also reaching its limits.

Its monolithic structure, made of thousands of lines of code, made it hard to modify and improve as research and understanding advanced.

Its ambition to cover all the areas of hydrological modelling made the number of input parameters explode, which most researchers found overwhelming. Introducing competing ideas to model some of the processes became practically impossible, and any scientific advancement was nullified.

That’s why I was looking for an intrinsically modular system that could resolve the above issues and boost collaborative work, and that’s why I moved to OMS3 in 2008.
I would have stuck with that GEOtop objective, but in the same year I quite unexpectedly received financial support for studying the management of droughts in the river Adige. GEOtop was impractical for that use because it could not be calibrated and had some flaws in its subsurface-surface water interactions. Necessity led to the implementation of GEOframe-NewAGE version 0.

In the subsequent decade my collaborators and I worked on the GEOframe model perspective, which is faster to calibrate and nevertheless quite complete from the point of view of process integration. For many ancillary parts of the system this meant reinventing the wheel from scratch, but it finally produced the mature product that GEOframe-NewAGE is today. It was conceived to use the natural, fractal, graph-like spatial structure of rivers to distribute the physics and computation of the hydrologic response units (HRUs) in space. The main driving idea behind GEOframe is that we do computation on the nodes of a graph, which exchange mass and energy according to the interactions among parts described by the graph's connections. These nodes can be spatially distinct entities, like hillslopes and HRUs, or concurrent processes, like discharge and transpiration. In principle, an engine under the hood is responsible for distributing the computation along the graph, while the hydrologist takes care of describing the processes with the appropriate degree of refinement. This had a first implementation with Net3, but I believe it can be improved in several directions. Before Net3, sure, river networks were schematized as graphs, but their topology was hardcoded and no variation in the spatial structure of the model was possible without disrupting the whole. With Net3 the topological structure of the connections can be modified just before run time, inserting or eliminating human infrastructures, diversions, new calculation nodes, lakes, and reservoirs.
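
The idea of computing on a graph whose topology is data rather than code can be sketched in a few lines. This is only an illustration under stated assumptions and not the Net3 API: the network is a plain dictionary mapping each node to its downstream node, the "physics" of each node is reduced to a fixed local runoff value, and the function and node names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def accumulate_runoff(downstream, local_runoff):
    """Accumulate runoff from upstream to downstream along a river network.

    downstream:   dict mapping each node to its downstream node (None at the outlet)
    local_runoff: dict mapping each node to the runoff it produces locally
    """
    # Build, for every node, the set of nodes that drain directly into it.
    upstream = {node: set() for node in downstream}
    for node, target in downstream.items():
        if target is not None:
            upstream[target].add(node)

    total = {}
    # static_order() yields each node only after all of its upstream nodes.
    for node in TopologicalSorter(upstream).static_order():
        total[node] = local_runoff[node] + sum(total[u] for u in upstream[node])
    return total


if __name__ == "__main__":
    network = {"h1": "j1", "h2": "j1", "j1": "outlet", "h3": "outlet", "outlet": None}
    runoff = {"h1": 1.0, "h2": 2.0, "j1": 0.5, "h3": 1.5, "outlet": 0.0}
    print(accumulate_runoff(network, runoff)["outlet"])

    # Insert a reservoir node between j1 and the outlet just before the run:
    # only the data describing the topology change, not the code.
    network["j1"] = "reservoir"
    network["reservoir"] = "outlet"
    runoff["reservoir"] = 0.0
    print(accumulate_runoff(network, runoff)["outlet"])
```

In this toy version the traversal is sequential. Nodes that do not depend on each other (here h1, h2 and h3) could be computed concurrently, which is the kind of work an engine under the hood would take over.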

Net3 also opened the possibility for different researchers to work simultaneously on different parts of a catchment (actually, of the graph), enabling a sort of “crowd modelling action” to cover the whole Earth with GEOframe-based modelling performed by a crowd of researchers or simply trained people. Clearly, the infrastructural work for this is still missing, but potentially it could provide a collective work that greatly surpasses the present global-scale hydrological applications, which are based on a rough characterization of parameters and on scanty local reanalysis of data, something that no single research group, however large, can do thoroughly. The first application of this modelling strategy will be to the river Adige, divided into a thousand or more HRUs, for which we have, so far, some work done by the students of my Hydrological Modelling course. The river Adige basin is relatively small (10^4 square kilometers) but has a variety of climatic situations, anthropic activities and settlements that make it very challenging to model. Eventually the simulations will be extended to the whole Alps and beyond.

From the point of view of interacting components, GEOframe has many: the traditional set of tools coming from Hydrologis for terrain analysis; a set of Kriging tools for the interpolation of hydrometeorological variables; the estimation of shortwave and longwave radiation, including shadows and topography effects; the interception of rainfall by canopies; three simplified models for snow water equivalent; various tools for reservoir-like modelling (à la PRMS); Muskingum-Cunge and 1D de Saint-Venant propagation; Priestley-Taylor, Penman-Monteith and a new model called Prospero, which implements a revision of Penman-Monteith due to Schymanski and Or. All the models can make use of the LUCA and PSO tools for calibration, and the dedicated papers constitute a guideline for their use.
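
As an example of how small one of these components can be at its core, here is a sketch of the Priestley-Taylor formula in its textbook form. It is not the GEOframe implementation: the function name, the default values and the FAO-56 style expressions used for the saturation vapour pressure slope and the psychrometric constant are assumptions made for illustration.

```python
import math


def priestley_taylor(net_radiation, air_temperature, soil_heat_flux=0.0,
                     pressure=101.3, alpha=1.26):
    """Potential evapotranspiration in mm/day (textbook Priestley-Taylor).

    net_radiation, soil_heat_flux: MJ m-2 day-1
    air_temperature:               degrees Celsius
    pressure:                      kPa
    alpha:                         Priestley-Taylor coefficient (dimensionless)
    """
    # Saturation vapour pressure [kPa] and slope of its curve [kPa/degC].
    es = 0.6108 * math.exp(17.27 * air_temperature / (air_temperature + 237.3))
    delta = 4098.0 * es / (air_temperature + 237.3) ** 2
    # Psychrometric constant [kPa/degC].
    gamma = 0.000665 * pressure
    # Latent heat of vaporization [MJ/kg], taken as constant.
    latent_heat = 2.45
    # No evaporation is computed when the available energy is negative.
    available_energy = max(net_radiation - soil_heat_flux, 0.0)
    return alpha * delta / (delta + gamma) * available_energy / latent_heat


if __name__ == "__main__":
    print(f"{priestley_taylor(15.0, 20.0):.2f} mm/day")
```

In a component-based system, a function like this would be wrapped in a component with declared inputs and outputs, so that it can be swapped for Penman-Monteith or Prospero without touching the rest of the modelling solution.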
Giuseppe and I just hired a couple of Ph.D. students and they, among other things, will work on data assimilation and possibly on some OMS issues.
Did I then abandon process-based modelling? Not at all: the original plan to build the new GEOtop 4.0 is actually very much alive.
Its nucleus is the set of tools growing around WHETGEO. At present we have:
  • An integration of the Richards equation in 1D, with and without temperature, decoupled from and coupled with the energy budget (returning the soil temperature profile)
  • An integration of Richards equation 2D (hillslope profile, for instance). No coupling with the energy budget yet.

These tools leverage the terrain analysis, radiation estimation, data interpolation, and estimation of evaporation and transpiration already present in GEOframe, but they have a gridded domain instead of an HRU separation. With respect to GEOtop, the integration algorithms are completely redesigned around the Newton-Casulli-Zanolli (NCZ) algorithm for the Richards equation and an appropriate implementation of the grids, in which topology and geometry are separated. GEOtop used a more traditional Newton-Krylov (NK) algorithm, whose convergence is not guaranteed a priori, and its equations were written for a structured (regular) grid, which causes artifacts in the results. Using appropriate design patterns, a twofold objective was obtained: to make room for changes in the parameterisations of the equations, and to keep the contents of the inputs as simple as possible (but not too simple). The use of standard formats like NetCDF for the outputs contained the otherwise exploding number of output files. The input and output formats were nevertheless decoupled from the algorithms, in order to maintain the flexibility to change the output format. As seen, many components developed for GEOframe-NewAGE could be reused for WHETGEO and, in fact, in the foreseeable future the Net3 infrastructure could possibly also be used to aggregate various hillslopes simulated by independent WHETGEO runs.
Because I have experience with them, and notwithstanding the opinion of many colleagues, I do not in fact see how lumped reservoir models can cope with processes that have a well-defined spatial history. Remote sensing data are a new frontier and, besides, WHETGEO is fully able to exploit them.

Evolving Prospero will bring green water and vegetation into WHETGEO, and GEOtop 4.0 will be much greener than GEOtop 3.0. This achievement is almost there. Carbon cycle evolution, for forestry and crops, will be a set of ODEs attached to sites, each described either as a grid cell or as an HRU. Their mathematics is much the same as for HDSys, with just a different interpretation of the parameters. Transport of tracers and pollutants via the advection-dispersion equation is also almost obtained, because these equations belong to the same family as the heterogeneous transport of heat in porous media that we have already implemented.
Giuseppe, in his magic hat, has already set up models for hillslope stability analysis that just wait for the 2D and 3D WHETGEO to be tested (the first) and fully implemented (the second). But the 3D solution is more a problem of drawing the grid than anything else, compared with the 2D solver.
A possible threat to WHETGEO is its computational burden. The algorithms are efficient, but a high-resolution three-dimensional grid has potentially millions of nodes and simulations are time-consuming. Parallelizing its core routines in a way that does not clash with the other forms of parallelism present in OMS3 will be a challenge.
Therefore, what I foresee for the next years is that these tools will reach maturity and be used by a potentially large set of users. GEOtop 4.0, built on WHETGEO, Prospero, and other tools, would be a truly operational tool, for instance for landslide risk early warning; for detailed soil moisture accounting in precision and regenerative agriculture; and for runoff and sediment production in small catchments (the latter a feature yet to be implemented). It would obviously also be a great tool for studying any aspect of the critical zone, in any climate, past, present or future. At present we cannot foresee when we could add accurate modules for snow (like those already in GEOtop, or better), but the modules already present in GEOframe could easily be used on a pixel basis as surrogates. Calibration of the WHETGEO process-based modules will, I think, require some adjustment of the calibration tools now present in OMS3, and we have not tried anything in this direction yet.
This mail was pretty long, but I hope it serves to clarify my point of view and the legacy of my previous research. I also included Tim in the mail, because I think he can be interested in much of the research I described. Thanks to the friendship we have, I hope we can find the way to strengthen our past collaboration around a few future objectives in which we can find mutual satisfaction.

All the best,

ric (and the guys)
