Monday, January 29, 2018

Grids - Notes for an implementation

This post covers the same subject analyzed in a previous post, but from a slightly different point of view, hoping to add clarity to the concepts. We assume that the grid has already been delineated, as for instance the one in the Figure: some other program, or someone else, provided it to us. All the information is written in a file, maybe in a redundant form, but it is there and we just have to read it.
Assume we are talking about a three-dimensional grid. Nodes, edges, faces, and volumes are each identified by a number (key, label), which is specified in the grid’s file.

Therefore the problem is to read this file and implement the right (Java) structures/objects to contain it, keeping in mind that our goal, besides loading the data into memory, is to estimate the time marching of a variable $A$ (and maybe some other variables) in a given volume. Its time variation depends on fluxes of the same quantity (mass, to simplify) that are localised at the faces that constitute the boundary of the volume.

Getting the things done

The simplest thing to do is, then, to use a vector whose entries are the values of $A$ for each of the volumes in the grid. Let us say that, setting aside any problem connected with execution speed, caching [1], and boxing-unboxing of variables, we use a HashMap to represent these values.
We will also use a HashMap to contain the fluxes through each face. This HashMap contains $F$ elements: as many as the number of faces. The file from which we started contains all of this information, and therefore we have no problem building and filling these “vectors”.
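As a minimal sketch of these two “vectors”, assuming integer labels for volumes and faces and double values (the class name GridValues and its methods are hypothetical, not taken from any existing library):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical holder for the two "vectors" described in the text.
class GridValues {
    // key: volume label read from the grid file; value: A in that volume
    final Map<Integer, Double> a = new HashMap<>();
    // key: face label read from the grid file; value: flux through that face
    final Map<Integer, Double> flux = new HashMap<>();

    void setA(int volume, double value) { a.put(volume, value); }

    void setFlux(int face, double value) { flux.put(face, value); }

    // V, the number of volumes, is simply the size of the map of A
    int numberOfVolumes() { return a.size(); }

    // F, the number of faces, is the size of the map of fluxes
    int numberOfFaces() { return flux.size(); }
}
```

Note that every value is boxed as a Double here, which is exactly the kind of cost we decided to set aside for the moment.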
Let’s have a look at what the system to solve can look like. The problem we have to solve may change but, schematically, it could be:
For any volume (we omit the index, $i$ of the volume for simplicity):
$$ A^t = A^{t-1} + \sum_l \frac{a_l^{t}\, i_l^{t}\, f_l^{t}}{d_l} $$
where:
$t$ is time (discretized in steps) and $t-1$ is the previous step;
$l$ is the index of the faces belonging to the volume;
$f_l$ is the flux through the face $l$;
$d_l$ is the distance between the centroids of the two volumes that share the same face;
$i_l$ is a sign, +1 or -1, which depends on the volume and the face we are considering (volume index omitted);
$a_l$ is the area of the face $l$ or some function of it.

For generality, the right-hand side of the equation is evaluated at time $t$, i.e. the equation is assumed to be implicit; at a certain point in the solution algorithm, however, the function will be expressed as depending on values from some previous time (even if only in the sense of internal iterations). For a more detailed case than this simplified scheme, please see, for instance, [2].
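To fix ideas, here is a minimal sketch of one evaluation of the template equation for a single volume, assuming that the values of $a_l$, $f_l$ and $d_l$ are already known (for instance from the previous internal iteration). The class VolumeUpdate, the method step and the way faces and signs are passed as arrays are assumptions made only for illustration:

```java
import java.util.Map;

class VolumeUpdate {
    // Evaluates A^t = A^{t-1} + sum_l a_l * i_l * f_l / d_l for one volume.
    // faces[k] is the label of the k-th face of the volume, sign[k] its i_l (+1 or -1).
    static double step(double aPrevious, int[] faces, int[] sign,
                       Map<Integer, Double> area,
                       Map<Integer, Double> flux,
                       Map<Integer, Double> distance) {
        double sum = 0.0;
        for (int k = 0; k < faces.length; k++) {
            int l = faces[k];
            sum += area.get(l) * sign[k] * flux.get(l) / distance.get(l);
        }
        return aPrevious + sum;
    }
}
```

The loop over the faces of the volume is where the topology comes in: the algorithm only needs to know which face labels belong to the volume and with which sign.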
The HashMap of $A$ implicitly contains the number of volumes, $V$. To evaluate the summation, however, we also need:
(I) an indication of the faces belonging to each volume, and
(II) the information about which volumes are adjacent.
To obtain this, we have to store information about the topology of our grid. In the previous posts, we tried to investigate and answer the question: which is the most convenient way to store this information? (Admittedly, more from a conceptual point of view than from a practical one.)
From our previous analysis, we know that, to encode the faces of each volume, we have to introduce a second (2) container that has as many positions as the number of volumes and, for each volume, a variable number of slots, one for each face of that volume (if the grid is composed of volumes of the same shape, the number of slots is constant for the internal elements of the grid, and variable just for the boundary volumes).
In this preliminary analysis, a HashMap seems appropriate to contain this information. We leave unspecified, for the moment, which types or objects this topology HashMap contains, but eventually they will contain a key or a number that uniquely identifies a given face.
In this way the information about any (internal) face is present in two slots, belonging to the two volumes that share that face.
We then have the various quantities to store for each face:

  • $a_l$ (3) 
  • $f_l$ (4) 
  • $d_l$ (5) 

Each of the above quantities requires a container with as many elements as there are faces. We could, then, use three HashMaps, whose indexes (keys) coincide with the numbers (keys) that, in the topology HashMap (2), relate faces to volumes.
To work out our equation we then need five containers, of which the topology one has a structure to be specified later. Well, actually, the internals of all the HashMaps have to be specified.
The elements of $a$ and $d$ are geometrical quantities that can, and have to, be computed outside the time loop, if the grid structure is not modified during the computation. However, to be estimated they require further topological information that we still do not have (but which can be in the grid file).
To estimate the faces’ areas, we need to know the nodes of the grid [3], which can be a sixth (6) container, and the way they are arranged in the faces, which is a seventh (7) container. Given the choices made so far, we again choose HashMaps to contain them. The HashMap of nodes just contains the numbers (or keys) of the nodes (and is, maybe, redundant in most problems). The HashMap of faces needs to contain the arrangement of nodes, ordered in one of the two directions (left-hand rule or right-hand rule; clockwise and counterclockwise depend on the side from which you observe the face, so what is clockwise from one volume is counterclockwise from the other).
Container (7) has to have as many elements as there are faces, and each element contains (a link or reference to) the ordered nodes. To estimate the area of the faces we actually need the geometry of the nodes, meaning their coordinates in some coordinate system. In most approaches, nodes are directly identified by their coordinates, which are therefore inserted directly (in the appropriate way) in container (7) instead of the link/reference to the nodes' number (key, label).
However, I think that keeping the geometry separate from the topology could be useful, because topology has its own scope, for instance in guiding the iteration in the summation that appears in our template equation.
Therefore we need a further container (the eighth, 8) for the geometry, containing the coordinates of the points. This container has $N$ elements, as many as the nodes.
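To see how containers (7) and (8) work together, here is a minimal sketch of the area of one face, under the assumption (mine, not stated in the grid file) that the face is a planar polygon. It sums the cross products of consecutive node positions, which gives twice the vector area of the polygon. The class FaceArea and the method area are hypothetical names:

```java
import java.util.Map;

class FaceArea {
    // orderedNodes: one element of container (7), the node labels of a face in order.
    // coordinates: container (8), node label -> {x, y, z}.
    // Assumes the face is a planar polygon.
    static double area(int[] orderedNodes, Map<Integer, double[]> coordinates) {
        double sx = 0.0, sy = 0.0, sz = 0.0;
        int n = orderedNodes.length;
        for (int k = 0; k < n; k++) {
            double[] p = coordinates.get(orderedNodes[k]);
            double[] q = coordinates.get(orderedNodes[(k + 1) % n]);
            // accumulate the cross product p x q
            sx += p[1] * q[2] - p[2] * q[1];
            sy += p[2] * q[0] - p[0] * q[2];
            sz += p[0] * q[1] - p[1] * q[0];
        }
        // the accumulated vector has length twice the area
        return 0.5 * Math.sqrt(sx * sx + sy * sy + sz * sz);
    }
}
```

Reversing the order of the nodes flips the direction of the accumulated vector but not its length, so the area is the same seen from either of the two volumes, while the direction could be used to recover the sign $i_l$.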
To fill the container of distances, $d$, we need to know between which volumes the distances have to be calculated. This information, about the adjacency of volumes, needs a further container (the ninth, 9) with as many elements as the faces, i.e. with $F$ elements. Every element, in turn, must contain the indexes of the volumes between which the distance is estimated.
The information that goes into container (9) should already be in the file from which we are reading everything. If it is not, we can recover it by scanning all the volumes and finding which ones have a face in common. The latter is a calculation that can be made off-line, and in any case we can consider it as given.
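The scan of the volumes can be sketched as follows, assuming container (2) is available as a map from volume labels to arrays of face labels (the class FaceAdjacency and the method volumesOfFace are hypothetical names used only here):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FaceAdjacency {
    // Input: container (2), volume label -> labels of its faces.
    // Output: container (9), face label -> labels of the volumes that own it
    // (one volume for a boundary face, two for an internal face).
    static Map<Integer, List<Integer>> volumesOfFace(Map<Integer, int[]> facesOfVolume) {
        Map<Integer, List<Integer>> result = new HashMap<>();
        for (Map.Entry<Integer, int[]> entry : facesOfVolume.entrySet()) {
            for (int face : entry.getValue()) {
                // create the list the first time a face is met, then append the volume
                result.computeIfAbsent(face, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return result;
    }
}
```

Each face label is visited once per volume that contains it, so the cost of the scan grows with the total number of slots in container (2).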
At this stage, we do not have much information about $f_l$. Certainly, it will need to know which volumes are adjacent, and so it requires the knowledge in container (9). Because $f_l$ is time-varying, the information in (9) has to be maintained throughout the simulation.
Any other piece of information will require a further container. To sum up, we have a container of:

  1. quantity A;
  2. topology of Volumes;
  3. the area of faces;
  4. fluxes;
  5. distances between volumes’ centroids;
  6. nodes number (label, key)
  7. nodes that belong to a face
  8. coordinates of nodes
  9. topology of faces (referring to the volumes they separate)

Towards generalizations that look to information hiding and encapsulation

We can observe that we have three types of containers: those which contain topological information (2, 6, 7, 9), those which contain physical quantities (1, 4), and those which contain geometric quantities (3, 5, 8).
If, instead of a 3D problem, we had a 2D or 1D one, the number of containers would change, but not their types.
Going deeper, the first problem to deal with is how, in a topology container, for instance that of volumes (2), to make room for the slots indicating their faces, since their number is variable. In traditional programming, a “brute force” approach would usually have been adopted: each slot would have been sized for the largest number of elements to be contained, with the empty elements replaced by a conventional number to be checked. Essentially, all of this would have resulted in a matrix whose rows (columns) correspond to the elements (volumes, faces) and whose columns (rows) to the variable number of elements they contain (faces in the case of volumes, edges in the case of faces, and so on).
In an OO language like Java, the sub-containers of variable dimension can be appropriate objects, for instance generically called “cell”, containing an array int[ ]. Therefore the global container of a topology could be a HashMap of cells.
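A minimal sketch of such a “cell” and of a HashMap of cells could be the following. The names Cell and CellTopology and their methods are only assumptions for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// A sub-container of variable dimension: the labels of the elements that bound
// a given element (e.g. the faces of a volume, or the edges of a face).
final class Cell {
    private final int[] members;

    Cell(int... members) { this.members = members.clone(); }

    int size() { return members.length; }

    int get(int k) { return members[k]; }
}

// The global container of a topology: element label -> its cell.
class CellTopology {
    private final Map<Integer, Cell> cells = new HashMap<>();

    void put(int label, Cell cell) { cells.put(label, cell); }

    Cell get(int label) { return cells.get(label); }

    int size() { return cells.size(); }
}
```

With this arrangement a tetrahedron (4 faces) and a hexahedron (6 faces) can live in the same container without padding and without a conventional “empty” number to check.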
In principle we could use the containers defined above without any wrapper, defining them directly in terms of the standard objects in the Java 9 library.
However, we may want to use types other than those we defined. For instance, in some cases, for speed reasons, we could substitute an ArrayList for a HashMap, or someone of us, working on the complexity of caching, could come up with some more exotic objects.
To respond to these cases, we would then like to introduce some abstraction which does not penalize performance (too much). Sure, we can define wrapper classes, for instance:
  • for topologies (essentially used to drive iterations)
  • for geometries
  • for physical quantities (used to contain data, immutable for parameters and time-varying for variables)
These three classes would fit all the cases for any dimension (1D, 2D, 3D): just the number of topology elements would vary.
However, this strategy might not be open enough to extensions that do not require breaking the code (that is, open to extension while closed to modification).
Using interfaces or abstract classes instead of classes could be the right solution.
Classes or, for that matter, interfaces could also have the added value of containing enough fields to specify the characteristics of the entities (e.g. whether they work in 2D or 3D, their “name”, their units, all the types of information requested by the UGRID convention). All these types of information are, obviously, also useful to make the programs or the libraries we are going to implement more readable and easier to inspect for an external user.
While the topology class is self-explanatory, the geometry class (interface) has a connection to its topology. Therefore the geometry class should contain a reference to its topology, to make its dependence explicit. A quantity object, for the same reason, should contain a reference to both its topology and its geometry.
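These dependencies could be sketched with three interfaces as below. Everything here (interface names, method names, the choice of int labels and double values) is an assumption of mine, just one of many possible designs:

```java
// Drives iterations: which elements exist and what bounds them.
interface Topology {
    int size();                  // number of elements (volumes, faces, ...)
    int[] members(int label);    // labels of the elements bounding the given one
}

// Geometric measures (areas, distances) defined over a topology.
interface Geometry {
    Topology topology();         // explicit reference to the topology it depends on
    double measure(int label);   // e.g. the area of a face
}

// A physical quantity (parameter or variable) living on a topology and a geometry.
interface Quantity {
    Topology topology();
    Geometry geometry();
    String name();               // descriptive metadata, useful to an external user
    String units();
    double get(int label);
    void set(int label, double value);
}
```

An algorithm written against these interfaces does not need to know whether the data behind them sits in a HashMap, an ArrayList or something more exotic.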
Using classes directly is tempting for its simplicity; however, the generality gained by interposing interfaces or abstract classes is an investment for the future.
Berti [4] advises, in fact, separating the algorithm from the data structure, which allows writing a specific algorithm once and for all, and changing the data it uses as we like. This is an ideal condition that may be impossible to attain, but working to confine possible changes to limited parts of the codebase is an added value to keep as a reference. That is why “encapsulation” is one of the paradigms of OO programming.

Some final notes

1 - Using CW-complexes to manage topology could add some overhead in speed. For instance, to access the values in a face of a volume, we have to:

access the volume,
access the address of the face,
redirect to the appropriate quantity container to access the value.
It could then be useful to eliminate one step and, once the volume is accessed, have directly associated with it not the addresses of the faces but the values contained in them.
If we have more than one value per face to access, related to different quantities and parameters, then maybe this added computational overhead can be considered negligible with respect to the simplicity of managing many quantities. In any case, it is an alternative to test.

2 - At any time step, not only the quantity at time $t$, $A^{t}$, is required, but also that at the previous time $t-1$, $A^{t-1}$. The two data structures share the same topology (which could save memory). During time marching, the programmer obviously has to take care not to allocate a new grid at every time step. We can limit ourselves to using only two grids across the simulation.

As an example, let us assume that time $t-1$ is contained in vector $A^1$ and time $t$ in $A^2$. Then the above requirement can be met by switching the two vectors, as schematized below:
  • Create A1 and A2
  • Set A1 to the initial conditions
  • For any t:
  • A2 = f(A1)
  • cwComplex.switch(A1,A2)
The switch method exchanges the names, but does not write anything in the memory of $A^1$ and $A^2$. It could be schematised as follows:
  • cwComplex.switch(A1,A2):
  • B = A1;
  • A1 = A2;
  • A2 = B;
It is clear that, in this way, all the vectors are always filled with values, while, for some operations, clearing them could be worthwhile.
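In Java the scheme needs a small adaptation: switch is a reserved word, so it cannot be a method name, and since references are passed by value a method receiving A1 and A2 as arguments could not exchange them for the caller. A minimal sketch, assuming the two time levels are kept as fields of a hypothetical holder class TimeLevels and stored as plain arrays, could be:

```java
// Hypothetical holder of the two time levels; the arrays are allocated only once.
class TimeLevels {
    double[] previous;  // A at time t-1 (A1 in the text)
    double[] current;   // A at time t   (A2 in the text)

    TimeLevels(int numberOfVolumes) {
        previous = new double[numberOfVolumes];
        current = new double[numberOfVolumes];
    }

    // Exchanges the two references: nothing is allocated and no value is copied.
    void swap() {
        double[] b = previous;
        previous = current;
        current = b;
    }
}
```

Inside the time loop one would fill current from previous and then call swap(), so that the values just computed become the previous ones for the next step.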

3 - At the core of the method of solution of the equation under scrutiny there is usually a Newton method, e.g. [3], Appendix A, equation A8. Any efficiency improvement for the solver then reduces to improving the speed of this core which, eventually, can be parallelised.


[1] - Lund, E. T. (2014). Implementing High-Performance Delaunay Triangulation in Java. Master Thesis (A. Maus, Ed.).

[2] - Cordano, E., and R. Rigon (2013), A mass-conservative method for the integration of the two-dimensional groundwater (Boussinesq) equation, Water Resour. Res., 49, doi:10.1002/wrcr.20072.

[3] - O'Rourke, J., Computational geometry in C, Cambridge University Press, 2007

[4] - Berti, G. (2000, May 25). Generic Software Components for Scientific Computing. Ph.D. Thesis

Wednesday, January 24, 2018

My Questions for the 23 Hydrological Questions initiative

In November 2017 IAHS launched the new initiative to generate the 23 unsolved problems in Hydrology that would revolutionise research in the 21st century with the following YouTube video:

I probably have to formulate them differently. However at present my points are

1 - What future is there for process-based modelling beyond persistent dilettantism? How can we converge towards new types of open model infrastructures for hydrology, where the crowd can contribute, big institutions do not dominate, and reinventing the wheel is no longer necessary?

2 - How can we solve the energy budget, the carbon budget and the sediment budget together to constrain the results of hydrologic models?

3 - Which new mathematics should we choose for the hydrology of this century? Does the new hydrology (Earth System Science) need new mathematics?

4 - Will machine learning have a real role in hydrological modelling?

5 - How can we really couple hydrological modelling with remote sensing measurements?

6 - How do plants and grass work and interact with soil and atmosphere to produce evaporation? Can we converge to unifying concepts that overcome the present fragmented understanding?

7 - How can we detect and measure spatial hydrological patterns?

8 - Does hydrology need non-equilibrium thermodynamics or even a new type of thermodynamics?

9 - How can we make hydrological science more open and replicable?

10 - How do dominant hydrological processes emerge and disappear across the scales? What tools are needed to follow the entanglement of processes? Will we finally be able to cope with feedbacks among processes?

Tuesday, January 23, 2018

My Hydrology Class 2018

Foreseen schedule



Grades of the midterm exam (Voti Prova in Itinere)
If you are interested, here you can find the statistics on your answers to the survey on the first part of the course.

Tuesday, January 9, 2018

Project: La gestione del sedimento nella realizzazione di servizi ecosistemici e nel controllo dei processi alluvionali.

The proposal "La gestione del sedimento nella realizzazione di servizi ecosistemici e nel controllo dei processi alluvionali" was submitted yesterday for the call of MATTM.
The call is at this link (and it is for geologists?!). Actually the topics require some geology and a lot of hydrology and hydraulics. This is how the world goes.
The proposal can be found on this OSF site, called "Gestione del Sedimento". It is in Italian, but I provide the translation of the following:

Abstract: The management of sediments for providing ecosystem services and controlling alluvial processes.

The project is about the management of sediments in mountain catchments, with the quantitative determination of erosion and mass transport. The research is carried out with a view to the application of the 2000/60 and 2007/60 EU directives.
In the project’s first phase:
The hydrological analysis uses a multi-model strategy based on GEOtop, GEOFRAME-NewAGE and other open-source models.
The sediment availability and its connectivity to the river network are estimated by using field surveys, data made available from previous research, and models.
The transport of sediments will be obtained with biphasic models, where water and sediment are treated separately.
The objective of the above phases is to localise the sediment sources and the sediment residence time, to detect the sediment’s interaction with anthropic works and infrastructures, and to determine how the sediments can interact with the climatic forcings.

The objectives of the application phase are:
  • the production of flooding hazard and risk maps;
  • the forecasting, in the near and long term, of the morphologic changes of river beds, under climate change simulated through “weather generators”;
  • the estimation of the impact of hydraulic works, also back in the years.
In the present project we will use a connectivity index to estimate the connection between hillslopes (sediment source areas) and some target elements of the catchments (the river network, specific streams, the outlet). Sediment source areas are partially already available from existing databases (CNR IRPI, Provincia Autonoma di Trento, Regione Sicilia), from field surveys and from remote sensing. These data are partially already available from previous projects (ASI MORFEO, CLIMAWARE, AQUATERRA, GLOBAQUA) and from the local institutions (Geological Service of Trento Province and Regione Sicilia).

Terrain analysis will be coupled with models of landslide triggering, able to account for climate and land use variability (in space and time), described as variation of:

  • intensity and frequency of precipitation,
  • shift of precipitation from snow to rain,
  • phenology of the vegetation cover.

Two areas will be studied, one in the Alps and another in the Apennines. The first is the Avisio torrent, and in particular the subcatchment closed at the Stramentizzo dam (Molina di Fiemme, TN), analysed in detail especially in some specific parts.

The Apennine basin is the Giampilieri torrent in Messina Province.


Saturday, January 6, 2018

Miles Traer - It is time for superheroes to be environmentally concerned ;-)

I could not make it to the New Orleans Fall Meeting this year (but we had a couple of presentations).
Among others, a funny session came to my attention, entitled: PA13C Science and Sci-Fi: Using Real Science to Explore Fictional Worlds Posters, with which it is nice to begin the series of 2018 posts.

The argument is made to draw attention to Climate Change and Earth Sciences, and to have some fun in doing it (see here the Washington Post report).

A couple of posters from the session are available: the first one, by the convener, Traer himself, analyzes the energy requirements of some superheroes, and you can see the poster in the figure above. His arguments remind me of the story of the banned superheroes in The Incredibles.

The science behind it is kind of weak, because you have to violate physics (at least the known one) from the beginning, when you accept that superheroes can exist (but see the celebrated Kakalios book, which takes another route to it), and actually many concerns can be raised about the calculations (geoscientists are nerds too).

A second poster of some interest is the one by Engelman and Chure, concerned with T-Rex and Godzilla.

Miles Traer’s blog is nice to visit too, both for the comics and for the rest.