Below are some provisional notes, to be improved in the coming days, about our deployment of the GEOframe system to the Po river, carried out for the Po River Basin Authority.
Basin extraction
Basin extraction is not a straightforward operation; in fact, it has never been done systematically over the whole of Italy. It serves two competing needs: to be objective and to align with the official grids provided by basin Authorities and Regions. The initial phase relies mainly on slope analysis and requires processing digital terrain data, which have become available only in recent years, especially where data produced with laser altimetry are concerned. The starting point is the Digital Elevation Models (DEMs) provided by the Regions, which have been reprojected and standardized to the correct reference systems. The initiation of the hydrographic network is determined by a contributing-area threshold, while the sub-basins of the Po river are delineated to have an average area of about 10 km2. These procedures have been standardized in geographic information systems (GIS) over the last twenty years, but for this specific task the Horton Machine library developed by the University of Trento and HydroloGIS was used (Abera et al., 2016, serves as a reference), with some innovative additions: a parser to aggregate smaller basins into adjacent larger ones, and the handling of certain topological situations, especially in flat areas, for the subsequent use with GEOframe.
The tool was named GEOframeInputBuilder.
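Just to fix ideas, the channel-initiation criterion mentioned above (a contributing-area threshold computed on a depression-free DEM) can be sketched in a few lines of Python. This is only a toy D8 flow-accumulation example with an invented DEM and an illustrative threshold, not the Horton Machine / GEOframeInputBuilder implementation:

```python
import numpy as np

def d8_flow_accumulation(dem, cell_size=30.0):
    """Contributing area (m^2) for each cell of a pit-filled DEM, with D8 routing.

    Assumes the DEM has no depressions or flat areas, so that every cell
    (except outlets at the border) has at least one strictly lower neighbour.
    """
    rows, cols = dem.shape
    acc = np.full(dem.shape, cell_size ** 2)          # each cell contributes its own area
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]
    dists = [np.hypot(di, dj) * cell_size for di, dj in offsets]
    # visit cells from highest to lowest, so contributors are always processed first
    order = np.argsort(dem, axis=None)[::-1]
    for i, j in zip(*np.unravel_index(order, dem.shape)):
        best_slope, receiver = 0.0, None
        for (di, dj), dist in zip(offsets, dists):
            ni, nj = i + di, j + dj
            if 0 <= ni < rows and 0 <= nj < cols:
                slope = (dem[i, j] - dem[ni, nj]) / dist
                if slope > best_slope:                 # steepest-descent neighbour
                    best_slope, receiver = slope, (ni, nj)
        if receiver is not None:                       # outlets keep their accumulation
            acc[receiver] += acc[i, j]
    return acc

# channel initiation: cells whose contributing area exceeds an (illustrative) threshold
dem = np.add.outer(np.arange(50, 0, -1.0), np.arange(50, 0, -1.0))   # toy sloping surface
area = d8_flow_accumulation(dem)
channels = area >= 2e4        # 2 ha threshold, purely illustrative
print(channels.sum(), "channel cells out of", channels.size)
```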
The extraction of lakes, particularly the large Lombard lakes and Lake Garda, required special attention and made the process less automated. Visual analysis reveals a differentiated geometry between mountain basins and lowland inter-basins, something noted since the early years of fluvial geomorphology but now observed objectively. The database now available enables statistical analysis of their geometry and topology, which previously relied on more qualitative cartographic analysis. Initiating basins with an area threshold is functional to the hydrological modelling, but the reader should be aware that channel initiation is a very active research topic, especially in the work by Gianluca Botter and coworkers [insert CITATION].
The grid, as currently constructed, will be distributed for free use and will serve as a fundamental standard for further cartographic-digital and hydrological analyses and developments.
[Photo by Luigi Ghirri]
Interpolation
Interpolation techniques saw significant development between the 1980s and the 1990s [insert citation], and geostatistical methods in particular have slowly made their way into the practice of digital analysis of the meteorological forcings of the hydrological cycle. These methods require the definition of a model of the correlation between measurements, known as a variogram, whose robustness is fundamental to the reliability of the result.
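As a reminder of what a variogram is in practice, the empirical (experimental) semivariogram can be estimated by binning the pairwise squared differences between station values by distance. The sketch below is purely illustrative (synthetic stations, arbitrary binning choices) and is not the GEOframe code:

```python
import numpy as np

def empirical_variogram(coords, values, n_bins=15, max_dist=None):
    """Binned empirical semivariogram: gamma(h) = 0.5 * mean[(z_i - z_j)^2] per distance bin."""
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    g = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)        # each station pair counted once
    d, g = d[iu], g[iu]
    if max_dist is None:
        max_dist = d.max() / 2                    # common rule of thumb
    edges = np.linspace(0, max_dist, n_bins + 1)
    lags, gammas = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (d > lo) & (d <= hi)
        if mask.any():
            lags.append(d[mask].mean())
            gammas.append(g[mask].mean())
    return np.array(lags), np.array(gammas)

# toy example: 80 random "stations" with a weak spatial trend plus noise
rng = np.random.default_rng(42)
xy = rng.uniform(0, 100e3, size=(80, 2))                   # coordinates in metres
z = 15.0 - 1e-4 * xy[:, 0] + rng.normal(0, 1.0, 80)        # temperature-like values
lags, gammas = empirical_variogram(xy, z)
```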
The starting database is made up of measurements collected by the ground stations of the regional agencies operating on the Po basin. These data have been analyzed, cleaned, and subsequently interpolated, for the moment onto the centroid of each sub-basin identified in the first phase of the work. The interpolation was carried out for precipitation and temperature at the daily scale, as a first step towards producing hourly or sub-hourly interpolations at any point of a suitable one-kilometer grid.
The interpolation technique used was kriging with drift, to account for orographic effects, especially on temperature. For fitting the experimental variogram, a (linear? exponential? to be specified) theoretical model was used, through the interpolators implemented in GEOframe.
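For the curious reader, the kriging-with-drift system itself is compact. The sketch below solves it with numpy for a single target point, using elevation as the drift variable and an exponential variogram with made-up parameters; it only shows the structure of the calculation, while the actual GEOframe interpolators are more general and more carefully engineered:

```python
import numpy as np

def exp_variogram(h, nugget=0.1, sill=1.0, vrange=30e3):
    """Exponential semivariogram model (illustrative parameters)."""
    return nugget + sill * (1.0 - np.exp(-h / vrange))

def kriging_with_drift(coords, z, elev, target_xy, target_elev, vario=exp_variogram):
    """Kriging with external drift, using elevation as the auxiliary variable."""
    n = len(z)
    d = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(-1))
    # left-hand side: semivariances plus unbiasedness and drift constraints
    A = np.zeros((n + 2, n + 2))
    A[:n, :n] = vario(d)
    np.fill_diagonal(A[:n, :n], 0.0)                 # gamma(0) = 0 by definition
    A[:n, n] = A[n, :n] = 1.0                        # weights sum to one
    A[:n, n + 1] = A[n + 1, :n] = elev               # weights reproduce the drift
    # right-hand side: semivariances to the target point plus constraint values
    d0 = np.sqrt(((coords - target_xy) ** 2).sum(-1))
    b = np.concatenate([vario(d0), [1.0, target_elev]])
    w = np.linalg.solve(A, b)[:n]                    # kriging weights (multipliers dropped)
    return w @ z

# toy usage: temperature decreasing with elevation, estimated at a sub-basin centroid
rng = np.random.default_rng(0)
xy = rng.uniform(0, 100e3, size=(40, 2))
elev = rng.uniform(100, 3000, size=40)
temp = 20.0 - 0.0065 * elev + rng.normal(0, 0.5, 40)      # lapse-rate-like signal
t_centroid = kriging_with_drift(xy, temp, elev, np.array([50e3, 50e3]), 1500.0)
print(round(t_centroid, 2))
```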
The interpolation covered the entire period from 1990 to today, and the data are stored in CSV files in folders containing the data for each individual sub-basin.
It is clear that the procedure is a first approximation that will serve as the basis for future improvements. The extension of the interpolation to the one-kilometer grid is one aspect. A further improvement could be the introduction of high-resolution reanalysis data, combining geostatistical techniques with simulations of atmospheric circulation and with any data coming from radar and satellite. Convergent research comes from atmospheric physics and meteorology, whose model resolutions have now reached scales useful for hydrology; some work should be done to better connect the two communities.
Setup
GEOframe-NewAGE allows numerous configurations, as various components are available for the same phenomenon. For the basic configuration of each single Hydrologic Response Unit (HRU), the one already partially tested in [insert citation], called the Embedded Reservoir Model (ERM), was chosen; its description can be found in the cited bibliography or in the linked videos. In summary, the ERM model is composed of a component for interception, one for snow (when present), a fast surface-runoff separator based on the Hymod model, a nonlinear reservoir describing the root zone, and a second nonlinear reservoir for groundwater. Structurally, it is not much different from the HBV model [insert citation]. In the basic configuration, flood propagation is neglected.
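To give a flavour of what "embedded reservoirs" means, here is a deliberately oversimplified two-reservoir sketch (a nonlinear root-zone storage feeding a nonlinear groundwater storage, with a Hymod-like partition of rainfall into quick runoff). All parameter names and values are invented for illustration; the real ERM components in GEOframe also treat interception and snow explicitly:

```python
import numpy as np

def erm_like_step(s_root, s_gw, rain, pet, dt=1.0,
                  s_max=150.0, alpha=0.4,         # root-zone capacity (mm), quick-flow share
                  c_root=0.02, b_root=1.5,        # root-zone nonlinear drainage
                  c_gw=0.005, b_gw=1.2):          # groundwater nonlinear recession
    """One daily step of a two-reservoir sketch (storages in mm, fluxes in mm/day)."""
    # Hymod-like partition: more quick runoff as the root zone fills up
    quick = alpha * rain * (s_root / s_max)
    infiltration = rain - quick
    aet = min(pet, s_root)                          # ET limited by available water
    recharge = c_root * s_root ** b_root            # nonlinear drainage to groundwater
    s_root = max(s_root + dt * (infiltration - aet - recharge), 0.0)
    baseflow = c_gw * s_gw ** b_gw                  # nonlinear groundwater release
    s_gw = max(s_gw + dt * (recharge - baseflow), 0.0)
    return s_root, s_gw, quick + baseflow           # total discharge in mm/day

# run the sketch on a synthetic year of daily forcing
rng = np.random.default_rng(1)
rain = np.where(rng.random(365) < 0.3, rng.exponential(8.0, 365), 0.0)   # mm/day
pet = 2.0 + 2.0 * np.sin(np.arange(365) / 365.0 * 2 * np.pi)             # mm/day
s_root, s_gw, discharge = 50.0, 100.0, []
for p, e in zip(rain, pet):
    s_root, s_gw, q = erm_like_step(s_root, s_gw, p, e)
    discharge.append(q)
```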
For evapotranspiration, a simple Priestley-Taylor model was used, in which, however, the radiation is provided by a rather accurate model [insert citations].
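For reference, the Priestley-Taylor estimate is simple enough to write down directly; the snippet below uses standard textbook constants (psychrometric constant, latent heat of vaporization) and is only a reminder of the formula, not the GEOframe ET component, which receives its radiation from the dedicated radiation model:

```python
import numpy as np

def priestley_taylor_et(net_radiation, air_temp, ground_heat_flux=0.0, alpha=1.26):
    """Daily potential ET (mm/day) from the Priestley-Taylor formula.

    net_radiation, ground_heat_flux in MJ m-2 day-1, air_temp in deg C.
    """
    # slope of the saturation vapour pressure curve (kPa / deg C)
    delta = 4098.0 * 0.6108 * np.exp(17.27 * air_temp / (air_temp + 237.3)) \
            / (air_temp + 237.3) ** 2
    gamma = 0.066            # psychrometric constant (kPa / deg C), near sea level
    lam = 2.45               # latent heat of vaporization (MJ / kg)
    return alpha * delta / (delta + gamma) * (net_radiation - ground_heat_flux) / lam

# e.g. Rn of 15 MJ m-2 day-1 at 22 deg C gives roughly 5-6 mm/day
print(round(priestley_taylor_et(15.0, 22.0), 2))
```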
Each of these ERM models was then connected to the others through the Net3 infrastructure [insert citation] to form a directed acyclic graph in which each node represents an HRU. Potentially, each HRU can be characterized not only by its own topographic and meteorological data, but also by its own models.
In the basic configuration, however, the same model structure is usually used for all HRUs while the values of the model parameters are obtained by subsequent calibration with spatially differentiated parameters, if the available data allow it.
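The directed-acyclic-graph idea behind Net3 can be illustrated with a toy example: each node is an HRU producing its own local runoff, and discharge is accumulated by visiting the graph in topological order. Node names, connections, and runoff values below are invented, and this is of course not the Net3/OMS infrastructure, just the underlying concept:

```python
from graphlib import TopologicalSorter

# downstream connections: each HRU drains into another one (None marks the outlet)
downstream = {"hru_1": "hru_3", "hru_2": "hru_3", "hru_3": "hru_4", "hru_4": None}
local_runoff = {"hru_1": 2.1, "hru_2": 1.4, "hru_3": 0.9, "hru_4": 0.5}   # m3/s, invented

# predecessor lists, so that each node is processed after all of its upstream HRUs
upstream = {node: [] for node in downstream}
for node, down in downstream.items():
    if down is not None:
        upstream[down].append(node)

discharge = {}
for node in TopologicalSorter(upstream).static_order():
    # discharge at a node = its own runoff plus everything routed from upstream
    discharge[node] = local_runoff[node] + sum(discharge[u] for u in upstream[node])

print(discharge["hru_4"])   # 4.9 m3/s at the outlet
```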
The potential setup variants are numerous, encompassing at least three options for snow modeling, three for evapotranspiration modeling, and an array of choices for reservoir modeling. The inclusion or exclusion of flow-propagation modules, as well as the potential elimination or addition of compartments to be modeled and their diverse connections, further expand the possibilities. An overview of potential topological configurations is presented, for instance, in [insert MaRmot citation]. As even a novice reader can appreciate, the possible combinations grow combinatorially with the number of connected Hydrological Response Units (HRUs), which can, in turn, be linked in various manners. This complexity underscores why our comprehensive study on the Po River needs to be distributed and further refined by others, to enhance the precision of the results and better align them with local needs, something that cannot be achieved by a single, albeit very productive, team. In turn, this opens the question of how the re-analyses performed by external researchers or teams can be accepted and merged back into the main project.
The analysis of multiple configurations is therefore entrusted to later phases of the project.
Calibration
Among the phases of a simulation, the calibration phase is the most time-consuming. It essentially consists of a large number of attempts to combine the model parameters to reproduce the measured data as faithfully as possible. The space of possible parameters is generally very large, even for a single simulation HRU. Therefore, the tools for calibration try to use intelligent strategies (including ML) to quickly guess which are the best parameter configurations.
The goodness of fit of the simulated values to the measured ones is usually quantified through some goodness-of-fit (GOF) indices. In our case, these are generally the KGE [insert citation] or the NS [insert citation]. An analysis of the various GOFs can be found in [insert citation], and their results can be further detailed, in the validation phase (see below), with additional indicators such as those presented, for example, in Addor et al., 2017. Another, much more refined, post-hoc method of analysing the goodness of the simulations is that presented in [insert Shima work citation]. The latter can also serve as a bias corrector of the final result, and it is going to be applied systematically to the results of the Po project.
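Both indicators amount to short formulas; a plain numpy version is sketched below (the 2009 form of KGE and the Nash-Sutcliffe efficiency) simply as a reference for what the calibration tries to maximize:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 is as good as the mean of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency: combines correlation, variability ratio and bias ratio."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()       # variability ratio
    beta = sim.mean() / obs.mean()      # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

# example on a short synthetic series
obs = np.array([1.0, 2.5, 4.0, 3.0, 2.0, 1.5])
sim = np.array([1.2, 2.3, 3.6, 3.2, 2.1, 1.4])
print(round(nse(obs, sim), 3), round(kge(obs, sim), 3))
```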
From an algorithmic point of view, the calibration carried out in the project is based on the LUCA tool [insert citation], which is a sophisticated implementation of SCEM-UA [insert citation], but a particle swarm [insert citation] could also be used. The calibration procedure follows some standards. Having a set of data on which to base the calibration, the data are usually divided into two subsets, one used for calibration and another for the so-called validation phase. In the former, both input and output data are available and the parameters (or models) are determined in a way similar to what is done in standard ML techniques (which, for this purpose, could probably be used profitably). In the latter, the performance of the calibrated model is evaluated on data not used for parameter determination (which should be "independent" of the former). As already mentioned, in the validation phase additional GOF indicators can be used to better discern the performance of the adopted solution.
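The split-sample logic can be made concrete with a toy example: calibrate on one part of the record, validate on the rest. In the sketch below a simple random search over a one-parameter linear reservoir stands in for LUCA or a particle swarm, and both model and data are synthetic; it only illustrates the workflow, not the actual GEOframe calibration:

```python
import numpy as np

def simple_reservoir(rain, k):
    """Toy linear reservoir with a single parameter k, returning simulated discharge."""
    storage, q = 0.0, []
    for p in rain:
        storage += p
        out = k * storage
        storage -= out
        q.append(out)
    return np.array(q)

def nse(obs, sim):
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

# synthetic "observations" generated with a known parameter plus noise
rng = np.random.default_rng(3)
rain = np.where(rng.random(730) < 0.3, rng.exponential(8.0, 730), 0.0)
obs = simple_reservoir(rain, k=0.15) + rng.normal(0, 0.2, 730)

# split-sample test: first year for calibration, second year for validation
cal, val = slice(0, 365), slice(365, 730)

# a plain random search stands in for LUCA / particle swarm
best_k, best_score = None, -np.inf
for k in rng.uniform(0.01, 0.9, 500):
    score = nse(obs[cal], simple_reservoir(rain, k)[cal])
    if score > best_score:
        best_k, best_score = k, score

sim = simple_reservoir(rain, best_k)
print("calibrated k:", round(best_k, 3))
print("calibration NSE:", round(nse(obs[cal], sim[cal]), 3))
print("validation NSE:", round(nse(obs[val], sim[val]), 3))
```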
A note concerns the word "validation". This is the term in use, but it does not imply any ontological claim about the nature of the truth described by the model; it has only the practical meaning of assessing the reliability of the model in predicting a certain sequence of numbers.
The calibration/validation procedure can be implemented for a single variable, in our case usually the discharge at a section of the hydrographic network, or for more variables, for example snow cover, soil water content, or evapotranspiration, if these measurements are available. These latter measurements, however, have a different character from discharge: while discharge is an aggregate variable, resulting from the concentration at a single point of the water fallen over the watershed area, the others remain spatially distributed variables before being aggregated for the purposes of the watershed budget, and therefore the methods for assessing how well the measured data are reproduced follow more articulated, if not more complex, paths. The good thing is that GEOframe allows the various quantities to be calibrated separately, as each of them is modeled by different components that can be used independently of the overall model. So far this use case requires quite a lot of manual intervention and could be made more automatic.
In any case, if the target variables are more than one, we speak of multi-objective calibration, while if there are variables measured at multiple sites, we speak of multi-site calibration [insert citation].
I would further like to suggest an enhancement to our analysis: moving from the daily to the hourly time scale. This is particularly crucial for understanding processes within smaller watersheds, approximately at the 1 km2 scale, where many significant phenomena show sub-daily dynamics.
Simulation / Analysis / ECP
The validation phase is already a simulation stage (with predetermined parameters) and represents the normal completion of operations in a production phase. This production phase is usually understood in the hydrological literature as hindcasting, that is, as functional to understanding past events for which an explanation is sought in a quantitative framework. It involves the use of more accurate analyses and indicators than those used in the calibration/validation phase, which requires a certain speed. One of these is the analysis through empirical conditional distributions, as illustrated in Azimi et al., 2023. These analyses can eventually lead to a rethinking of the setup and calibration/validation phases in order to obtain more accurate results. As shown in Azimi et al. (2023, 2024), ECPs can also be used as bias correctors and improve the overall statistical performance of the model's results, at least if the system shows a certain stationarity in its temporal behavior, that is, if, for example, the effects attributable to global warming do not significantly alter the structure of the model (including its parameters). The determination of the "reliability" of the models is then a key concept in the development of digital twins of the hydrological system (Rigon et al., 2022).
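As an aside on what a post-hoc statistical correction looks like, here is a generic quantile-mapping sketch: the distribution of simulated discharge over a reference period is mapped onto the observed one. This is not the ECP method of Azimi et al., only a simpler cousin that makes the stationarity assumption mentioned above explicit; all data are synthetic:

```python
import numpy as np

def quantile_map(sim_new, sim_ref, obs_ref):
    """Map simulated values onto the observed distribution of a reference period."""
    # empirical non-exceedance probability of each new simulated value
    probs = np.searchsorted(np.sort(sim_ref), sim_new) / len(sim_ref)
    probs = np.clip(probs, 0.0, 1.0)
    # read the observed quantile at the same probability
    return np.quantile(obs_ref, probs)

# reference period: the model systematically underestimates the observed discharge
rng = np.random.default_rng(7)
obs_ref = rng.gamma(2.0, 30.0, 2000)                       # "observed" discharge, m3/s
sim_ref = 0.7 * obs_ref + rng.normal(0, 5.0, 2000)         # biased "simulation"
# a new simulated period, corrected against the reference statistics
sim_new = 0.7 * rng.gamma(2.0, 30.0, 500) + rng.normal(0, 5.0, 500)
corrected = quantile_map(sim_new, sim_ref, obs_ref)
print(round(sim_new.mean(), 1), round(corrected.mean(), 1), round(obs_ref.mean(), 1))
```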
Another matter, much less frequented by hydrologists, is that of forecasting future events. These future events obviously depend on the time series used as input to hydrological models and therefore require forecasts of precipitation, temperature, wind speed, and air humidity. It is known that the meteorological system (global and local) suffers from a lack of predictability due to deterministic chaos effects [insert citation]. To date, weather predictions are reliable, with respect to weather categories, for a few days; they are able to predict atmospheric temperatures, but are still very imprecise in determining the amount of precipitation. In essence, they can be used to predict the hydrological future, but with questionable quantitative value. The theoretical reason for this shortcoming has just been mentioned, but there are others, for example the heterogeneity of ground conditions and the absence of a description of soil-atmosphere feedbacks, both of which are not represented in meteorological models. Hydrological forecasts can therefore only be of a statistical nature and produce scenarios [insert citation], which are not devoid of meaning and practical value. In this area between hydrology and meteorology, the search for common ground is mandatory for any further progress. In GEOframe, however, the treatment/modelling of input data is quite well separated from the hydrological computation, and any new source of data can be included easily (though not without person-months of work).
Distribution of results and participatory science
A fundamental aspect, already widely discussed in Rigon et al., 2022, is to understand how the results of a model can be shared with various users, but also how the model and its setup (including the very expensive phases of DEM analysis, weather data interpolation, and calibration/validation) can be shared, saving other researchers time. GEOframe is built in such a way that this is possible (shareability is built into the design of the informatics), and some experiences have already been made in this sense: some within the Trento working group, others with external research groups from the University of Milan (whose work is to be incorporated) and the Polytechnic of Turin, where the basic data and models already pre-digested by the University of Trento served for further developments and analyses on parts of the Po basin already processed.
The question of how to preserve and make use of multiple contributions to code, data, simulation configurations, and simulations is, however, still open.
It should be clarified that the GEOframe system is not only a set of data and models, but also a library of analysis tools, especially developed through Python Notebooks and often documented through a series of slides and video lessons [add the links here] and Schools [https://abouthydrology.blogspot.com/2021/10/the-geoframe-schools-index.html]. Although this system can be improved and automated, it has allowed the group from the Polytechnic of Turin to dramatically shorten the modeling times for a series of basins in Piedmont, and it will allow, for the moment at the planning stage, the sharing of the setup and analysis of the Lombard area of the large Alpine lakes. Other analyses, developed in parallel on areas such as Friuli by the University of Udine, can easily be inserted into a possible national system covering all of Italy, even though they were developed separately.
From the informatics point of view, organizing all of this information through appropriate repositories will be mandatory in the future for an efficient use of the resources.
Conclusions
The GEOframe-Po project is more than just a collection of models; it envisions a comprehensive system that encompasses a variety of input and output datasets, model configurations, and the flexibility to operate on diverse platforms such as laptops, servers, and the cloud (leveraging the OMS/CSIP platform). The interfaces, as evidenced by the available self-instruction materials, can range from simple text-based designs to more sophisticated visual tools, including augmented reality devices.
The system is designed for continuous improvement and customization, with the ability to implement changes with minimal overhead. This was a strategic requirement pursued at various levels of the information technology aspect of the project [insert citations]. The current models can be broadly categorized as physically based, with the majority of the implementation comprising what is referred to in literature as "lumped" models. However, the system is designed to accommodate extensions to more distributed models, a possibility that has already been partially realized in some research lines of the Trento group.
The integration of machine learning techniques into the system is also possible [insert citation], even though they have not been utilized to date. The design of the GEOframe-Po project, therefore, represents a flexible, adaptable, and forward-thinking approach to modeling and data analysis.