Wednesday, August 11, 2021

Causal inference and time series

In the last post, moved by the need to compare time series, I browsed the literature and my library of papers looking for solutions (essentially, I was trying to understand whether two time series are related by a time lag). Along the way I found other things, and my reading list grew beyond the original scope. One direction I had actually explored before is distinguishing causal connections from mere correlations. In my previous searches I was fascinated by the work of Judea Pearl and by some of the findings that grew out of it. Pearl's theory has been presented in several places, including his 2000 book (see the references). His teachings were directly absorbed by Hannart and Naveau, themselves good statisticians working in climatology, who produced some papers (2016, 2017) using Pearl's theory and notation.


The idea can be generalized from two to multiple time series, as Eichler (2013) shows; Eichler had already produced such an analysis in 2003 (Dahlhaus and Eichler 2003). A more recent line of work is represented by Jakob Runge (GS), who also has the merit of having implemented and shared his TiGraMITe package. He also holds a prestigious ERC research grant on this topic, called Causal Earth. On the concepts he wants to develop within the ERC project, he has given talks and produced many interesting papers, among them one in Nature Communications and one in Science Advances (see the references).
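To get a hands-on feeling for what "causal" can mean for time series, here is a minimal sketch of an older and much simpler idea than Pearl's framework or the methods in TiGraMITe: Granger causality. Series x is said to "Granger-cause" y if past values of x improve the prediction of y beyond what y's own past provides. The sketch compares two least-squares autoregressions with an F statistic; the lag order p=2, the helper names and the synthetic data are our own assumptions. Note that Granger causality is predictive, not interventional, and can be fooled by a common driver that is not included in the test.

```python
import numpy as np

def _rss(design, target):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    resid = target - design @ coef
    return float(resid @ resid)

def granger_f(x, y, p=2):
    """F statistic: do p lags of x improve an AR(p) prediction of y?"""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    target = y[p:]
    # Column j-1 holds the series lagged by j steps, aligned with target
    lags_y = np.column_stack([y[p - j : n - j] for j in range(1, p + 1)])
    lags_x = np.column_stack([x[p - j : n - j] for j in range(1, p + 1)])
    ones = np.ones((n - p, 1))
    restricted = np.hstack([ones, lags_y])        # y's own past only
    full = np.hstack([ones, lags_y, lags_x])      # y's past plus x's past
    rss_r = _rss(restricted, target)
    rss_f = _rss(full, target)
    dof = (n - p) - full.shape[1]
    return ((rss_r - rss_f) / p) / (rss_f / dof)

# Synthetic example (hypothetical data): y responds to x with a one-step delay
rng = np.random.default_rng(1)
x = rng.standard_normal(500)
y = np.zeros(500)
y[1:] = 0.8 * x[:-1]
y += 0.5 * rng.standard_normal(500)

print(granger_f(x, y))  # large: past x helps predict y
print(granger_f(y, x))  # small: past y adds little information about x
```

The asymmetry of the two F values is what suggests a direction; with many interacting variables, one needs conditioning on the other series, which is where the multivariate approaches cited above come in.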

Because we like to do calculations, not just read or write abstractly, we can find relief in the Causality handbook, a viable (open source) way to put into practice some of the ideas from the papers above. A final, honorable mention goes to the San Liang (2014) paper.

References

Dahlhaus, Rainer, and Michael Eichler. 2003. “Causality and Graphical Models in Time Series Analysis.” Oxford Statistical Science Series, 115–37.

Eichler, Michael. 2013. “Causal Inference with Multiple Time Series: Principles and Problems.” Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 371 (1997): 20110613.

Hannart, A., J. Pearl, F. E. L. Otto, P. Naveau, and M. Ghil. 2016. “Causal Counterfactual Theory for the Attribution of Weather and Climate-Related Events.” Bulletin of the American Meteorological Society 97 (1): 99–110.

Hannart, A., and P. Naveau. 2017. “Probabilities of Causation of Climate Change.” arXiv, December, 1–54.

Pearl, Judea. 2000. “Models, Reasoning and Inference.” Cambridge, UK: Cambridge University Press.

Runge, Jakob, Sebastian Bathiany, Erik Bollt, Gustau Camps-Valls, Dim Coumou, Ethan Deyle, Clark Glymour, et al. 2019. “Inferring Causation from Time Series in Earth System Sciences.” Nature Communications 10 (1): 2553.

Runge, Jakob, Peer Nowack, Marlene Kretschmer, Seth Flaxman, and Dino Sejdinovic. 2019. “Detecting and Quantifying Causal Associations in Large Nonlinear Time Series Datasets.” Science Advances 5 (11): eaau4996.

San Liang, X. 2014. “Causality between Time Series.” arXiv [stat.ME]. http://arxiv.org/abs/1403.6496.

Relations between two time series (thinking of rainfall-runoff)

When investigating hydrological quantities, one interesting issue is understanding whether two time series are correlated, especially whether the correlation comes with a lag and, if so, what that lag is. This is no different from many other analyses, and in fact the tools developed are ubiquitous in science. Looking for how to correlate rainfall and discharge, I stumbled upon this ready-made post: "Four ways to quantify synchrony between time series data" by Jin Hyun Cheong, PhD.

The added value of this post is that the tools described are also available as open-source Python code in a Jupyter Notebook, so anybody can easily re-execute them and learn while working. Applying the notebook to your own dataset will probably not be hassle-free, and you will certainly have to dig a little into the literature to make sense of what you are doing, but it is a great starting point for anyone who needs to cope with this type of analysis. Jin Hyun Cheong's material is available on OSF. Please, if you use it, cite it.
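The basic idea of finding the lag can be sketched in a few lines: correlate x(t) with y(t + lag) for a range of lags and pick the lag with the highest correlation. This is a minimal NumPy sketch, not the code from the notebook above; the function names and the synthetic "rainfall" and "runoff" series are hypothetical, chosen so that the true delay is known (3 steps).

```python
import numpy as np

def lagged_correlation(x, y, max_lag):
    """Pearson correlation between x(t) and y(t + lag), for lag = 0..max_lag.

    A positive lag means y follows x (e.g. discharge responding to rainfall).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    corrs = {}
    for lag in range(max_lag + 1):
        # Align x(t) with y(t + lag) by trimming the ends of both series
        xs = x[: len(x) - lag]
        ys = y[lag:]
        corrs[lag] = np.corrcoef(xs, ys)[0, 1]
    return corrs

def best_lag(x, y, max_lag):
    """Lag with the highest correlation, plus the full lag-correlation table."""
    corrs = lagged_correlation(x, y, max_lag)
    return max(corrs, key=corrs.get), corrs

# Hypothetical data: "runoff" is "rainfall" delayed by 3 steps plus noise
rng = np.random.default_rng(0)
rain = rng.gamma(shape=0.5, scale=2.0, size=500)
runoff = np.roll(rain, 3) + 0.1 * rng.standard_normal(500)

lag, corrs = best_lag(rain, runoff, max_lag=10)
print(lag)  # 3 for this synthetic example
```

Real catchments smear the response over many time steps, so the lag-correlation curve is usually broad rather than a single spike; its peak is only a first, rough estimate of the response time.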

A second way to look at their relation is mutual information, which is the Kullback-Leibler divergence between the joint distribution of the two series and the product of their marginal distributions. It is a concept from information theory (see also here), briefly illustrated in the Veyrat-Charvillon and Standaert (2009) paper cited below. Here is a notebook that teaches how to estimate it in Python using PyTorch, and here is a bottom-up calculation in standard Python.
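A minimal histogram ("plug-in") estimate of mutual information can be written directly from its definition as a Kullback-Leibler divergence. This NumPy sketch is ours, not the linked notebooks; the number of bins is an assumption, and plug-in estimates are biased upward for small samples or many bins. The example shows why mutual information complements correlation: a purely nonlinear dependence (b = a²) has near-zero correlation but clearly positive mutual information.

```python
import numpy as np

def mutual_information(x, y, bins=16):
    """Histogram (plug-in) estimate of I(X;Y) in nats.

    I(X;Y) = KL( p(x,y) || p(x) p(y) ) = sum p(x,y) log[ p(x,y) / (p(x) p(y)) ]
    """
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)  # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)  # marginal of Y, shape (1, bins)
    nz = pxy > 0                         # empty cells contribute 0 (0 log 0 = 0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px * py)[nz])))

rng = np.random.default_rng(0)
a = rng.standard_normal(5000)
b = rng.standard_normal(5000)            # independent of a

print(mutual_information(a, b))          # close to 0 (small positive bias)
print(mutual_information(a, a ** 2))     # clearly positive: nonlinear dependence
print(np.corrcoef(a, a ** 2)[0, 1])      # small: linear correlation misses it
```

To look for a lag, the same function can be applied to x(t) and y(t + lag) for several lags, exactly as with the lagged correlation.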

The time series analyses above are quite interesting because they can also suggest new types of comparison between observed and simulated time series, if you start to get bored with the standard goodness-of-fit indicators such as the Kling-Gupta efficiency (KGE) and the Nash-Sutcliffe efficiency (NSE).
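For reference, here is how the two standard indicators are usually computed, as a short NumPy sketch. The KGE below is the original formulation, combining linear correlation r, variability ratio α = σ_sim/σ_obs and bias ratio β = μ_sim/μ_obs (a later "modified" variant exists). The short observed series is a made-up example.

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 equals predicting the observed mean."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    """Kling-Gupta efficiency (original form): 1 is perfect."""
    sim, obs = np.asarray(sim, dtype=float), np.asarray(obs, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]   # linear correlation
    alpha = sim.std() / obs.std()     # variability ratio
    beta = sim.mean() / obs.mean()    # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

obs = np.array([1.0, 3.0, 7.0, 4.0, 2.0, 1.5])   # hypothetical discharges
print(nse(obs, obs), kge(obs, obs))              # both 1, up to rounding
print(kge(2 * obs, obs))                         # 1 - sqrt(2), about -0.414
```

Both indicators compare the two series time step by time step, so a perfectly shaped but slightly delayed hydrograph is penalized heavily; this is one reason why lag-aware comparisons can be informative.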

If your main focus is the relationship between rainfall and runoff time series, a recent paper worth mentioning is the one by Giani et al. (2021), listed in the references. The work of Serinaldi and Kilsby (2013) also contains relevant information, although it seems quite complicated (boring or interesting? I have not read it yet).

References

Giani, G., M. A. Rico‐Ramirez, and R. A. Woods. 2021. “A Practical, Objective, and Robust Technique to Directly Estimate Catchment Response Time.” Water Resources Research 57 (2). https://doi.org/10.1029/2020wr028201.  

Veyrat-Charvillon, Nicolas, and François-Xavier Standaert. 2009. “Mutual Information Analysis: How, When and Why?” In Cryptographic Hardware and Embedded Systems - CHES 2009, 429–43. Springer Berlin Heidelberg.

Serinaldi, Francesco, and Chris G. Kilsby. 2013. “The Intrinsic Dependence Structure of Peak, Volume, Duration, and Average Intensity of Hyetographs and Hydrographs.” Water Resources Research 49 (6): 3423–42.

Tuesday, August 10, 2021

How to learn (La)TeX

If you want to know an interesting story, go and see what TeX is and why Donald Knuth created it. It is a typesetting system with a language behind it, and it is the way most scientists who use mathematical formulas write their papers (and equations). Actually, most of us use LaTeX, Leslie Lamport's TeX, which many journals customize to obtain their desired layout. Digital natives used to WYSIWYG editors may find the way it works strange, but after a little practice no one can really avoid using it for formulas.



Assuming I have convinced you (and my students SHOULD agree 😉), you now have to learn it. There are many resources on the web. Starting from the quickest:

Obviously, there are several video tutorials available. The best way to find them is to Google "LaTeX Tutorial Video" yourself. One for beginners can be found here:
Because TeX and LaTeX have a glorious history, there are several groups promoting them. The oldest is the TeX Users Group, TUG for short.

P.S. - Italians can also read the beautiful:

Friday, July 2, 2021

Chandigarh 2021-07-05 International Faculty Development Program

I was pleased to be invited to speak at the 2021 Faculty Development Program of Chandigarh (the beautiful) University in India. Naturally, I will talk about water and hydrology, and the provisional title is:

"Science and practice of river basin modelling in the near future. From Hydrology to Informatics and back in four acts"



Given the time I have, I decided to divide my presentation into four parts:

As usual, I uploaded the slides of my presentation in advance, and they will be made available later. The talk summarizes other talks I have given over the years and, while getting technical in some parts, is addressed to an audience wider than hydrologists.

Tuesday, June 15, 2021

How CO2 and H2O Flux Measurements have contributed to our understanding of Global Change Biology

Dennis Baldocchi (GS) is a legend to me, for his knowledge of the processes and issues related to soil-atmosphere interactions, for the wide range of materials he has shared with the community, and for his efforts in FLUXNET. So it was a great pleasure to hear that he had agreed to give a talk to our class "Biosphere Atmosphere and Climate Interactions" (taught by Mirco Rodeghiero, GS) in the Master in Environmental Meteorology. His talk was very dense and illuminating and, fortunately, he agreed to let us share the videos presented below.


The talk is full of suggestions, rich in references and in knowledge of the physics of the field, and good food for thought for all of us. Please find below the videos of the talk and of the discussion that followed.

  • The  talk (Vimeo)
  • The Discussion (Vimeo)

Monday, June 7, 2021

Where do we stand with theory, tools and methods and where to go with applications

As followers know, from time to time I need to summarize what we are doing and where we stand. For me, it is clearly necessary to frame what we are doing and look ahead without wasting too much energy on unsustainable directions. Please also see the complementary discussion here.


HDSys (Hydrological Dynamical System studies)

A further theoretical effort is needed to capture the main structural features of this type of model, pushing as far as possible the use of tools shared with other branches of science, such as environmental science, chemical reaction kinetics, cell biology, population theory, systems and control science, and the study of disease spreading. The rest of the effort must go into generalizing the implementation of solvers, so that they can be applied seamlessly wherever possible. The story of how spatially distributed systems can be described as compartmental systems is still to be written, but it is the first time in the history of hydrology that we can try alternative modelling solutions on a massive scale.
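As a minimal illustration of what "compartmental" means here, this sketch uses the simplest case we could think of: n linear reservoirs in series (a Nash cascade), written in the matrix form dS/dt = A S + b u that is shared with linear chemical kinetics and compartmental models in biology and epidemiology. The cascade, its parameters and the function names are our illustrative assumptions, not GEOframe code. A generic solver only needs A, b and the input u; the implicit Euler step chosen here is stable for any time step and conserves mass exactly.

```python
import numpy as np

def cascade_matrix(n, k):
    """Matrix A of n linear reservoirs in series (residence time k): dS/dt = A S + b u."""
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = -1.0 / k          # each compartment drains at rate S_i / k
        if i > 0:
            A[i, i - 1] = 1.0 / k   # what leaves compartment i-1 enters compartment i
    return A

def simulate(A, b, u, dt=1.0):
    """Backward (implicit) Euler: (I - dt A) S_new = S_old + dt b u_t."""
    n = A.shape[0]
    M = np.eye(n) - dt * A
    S = np.zeros(n)
    out = np.empty(len(u))
    for t, u_t in enumerate(u):
        S = np.linalg.solve(M, S + dt * b * u_t)
        out[t] = -A[-1, -1] * S[-1]  # flux leaving the last compartment
    return out, S

A = cascade_matrix(n=3, k=5.0)
b = np.array([1.0, 0.0, 0.0])        # input enters the first compartment
u = np.zeros(60)
u[0] = 10.0                          # a single hypothetical rainfall pulse
q, S_end = simulate(A, b, u)

# Mass balance: total input = total output + water still stored
print(u.sum(), q.sum() + S_end.sum())
```

Changing the model means changing A (and b); the solver stays the same, which is the kind of generalization of solvers the paragraph above argues for.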

Evapotranspiration and ecohydrology

The Prospero model has to be refactored and enhanced with the introduction of the Ball-Berry parameterization of stomatal conductance. Work on the Rosalia model (a plant hydraulics model) has to be pursued, and LysGEO 1D finally finished. LysGEO 2D should be derived as soon as WHETGEO 2D is stable. Evapotranspiration has to be connected with travel times for the needs of the WATZON project. Simple models of carbon production have to be implemented (see HDSys) to interoperate.
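For readers unfamiliar with it, the textbook Ball-Berry model expresses stomatal conductance as a linear function of net assimilation times leaf-surface relative humidity, divided by leaf-surface CO2 concentration: g_s = g0 + g1 · A_n · h_s / C_s. The sketch below is only that textbook formula, not how Prospero will implement it; the parameter values g0 and g1 are illustrative assumptions that would need calibration, and clamping negative assimilation to zero is a common but not universal choice.

```python
def ball_berry_gs(a_n, h_s, c_s, g0=0.01, g1=9.0):
    """Stomatal conductance to water vapour (mol m-2 s-1), Ball-Berry form.

    a_n : net CO2 assimilation (umol m-2 s-1)
    h_s : relative humidity at the leaf surface (fraction, 0-1)
    c_s : CO2 concentration at the leaf surface (umol mol-1, i.e. ppm)
    g0, g1 : illustrative parameter values (assumed, to be calibrated)
    """
    # Assumption: negative assimilation (e.g. night-time respiration) is clamped
    # to zero, so conductance falls back to the residual value g0.
    # umol m-2 s-1 divided by umol mol-1 gives mol m-2 s-1.
    return g0 + g1 * max(a_n, 0.0) * h_s / c_s

print(ball_berry_gs(a_n=10.0, h_s=0.7, c_s=400.0))  # about 0.1675
print(ball_berry_gs(a_n=-1.0, h_s=0.7, c_s=400.0))  # 0.01, i.e. g0
```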

Soil and Critical Zone

WHETGEO 1D has been built. It needs to be cleaned up and made easier to use. Systematic tests against experimental data should be performed, and its success depends on them. Future developments should aim to introduce preferential flow in soil using the clone scheme. Similarly, it is possible to remove the constraint of thermal equilibrium between the soil matrix and soil water. The coupling of the water and energy budgets should be completed by considering the phase change of water, as well as the modelling of the snowpack at the soil surface. Thanks to its robust numerics, WHETGEO 1D can be used to investigate soil celerity in depth. An interesting development could be combining WHETGEO 1D with the concept of laterally coupled tiles, in which lateral fluxes between interacting tiles are defined through some transport laws. This represents an intermediate solution between WHETGEO 1D and WHETGEO 2D, which we can call WHETGEO 1.5D.

WHETGEO 2D must be cleaned up and brought to the same level of operability as the 1D version. The first step is completing the coupling between surface and subsurface flows. This is necessary to properly simulate runoff generation processes and, numerically speaking, to properly define the boundary condition at the soil surface. A further key aspect of WHETGEO 2D is optimizing its computational cost, in tandem with an efficient strategy for saving data. This will require abandoning netCDF-3 in favor of a more efficient file format.

The extension to 3D simply must be implemented, but we are on the ball.

…all that remains is to play with WHETGEO.

Travel Time and Tracers modelling

The basic theoretical framework has been clarified. Now we have to extend it to tracers and systematically apply it to all cases. A description of tracers has to be added to the whole set of models.

Information technology

OMS has to be cleaned up, made to work with Java 17, and made buildable with Maven or Gradle. Net3 and intrinsic parallelization have to be improved. GEOframe bottlenecks have to be eliminated to improve its usability, and the GitHub site made more attractive and usable. Perhaps everything should be controllable from Jupyter Notebooks. R and Python should be used more extensively for data I/O. The classes for teaching GEOframe programming finally have to be completed.

Applications/Deployments

The Po project, the Nera work, the Ressi modelling and the modelling of various hillslope experiments have to be brought to completion. Since I am concentrated on theory, methodology and software, little time remains for applications. This means that the people working with me have to carry the applications on their own shoulders, to a larger extent than I do. As for me, a decadal objective could be to progressively bring our tools to work on the Po, the Adige, Italy and the whole Mediterranean area, with an eye to both the water and energy budgets (and, in the future, the carbon budget) at unprecedented detail and precision.
From Allam, Antoine, Roger Moussa, Wajdi Najem, and Claude Bocquillon. 2020. “Chapter 1 - Hydrological Cycle, Mediterranean Basins Hydrology.” In Water Resources in the Mediterranean Region, edited by Mehrez Zribi, Luca Brocca, Yves Tramblay, and François Molle, 1–21. Elsevier.



Overall keywords

Remote sensing - Machine Learning - Information theory - Data assimilation

Slowly but steadily, remote sensing and machine learning have to be introduced among the GEOframe tools; we are going to pursue this within a few doctoral projects. Deeper assimilation of data into models must be seen as mandatory, for the good of both modellers and experimenters. New techniques for analysing data and model outputs have to be deployed ... and so on.


Saturday, June 5, 2021

Theoretical and Numerical Tools for Studying the Critical Zone from Plots to Catchments

What a valuable work the thesis by Niccolò Tubini is, presented here in draft form. It covers the hydrology of the critical zone, numerics, programming, software engineering and open science methods. With such a wide range of interests, it may not be easy to grasp in all its details, but it is well written and, we hope, inspiring. As the author, a Ph.D. candidate, says: "In the following we suggest that studying the CZ requires tools that are not yet readily available to researchers; then we propose one of our own. These tools should be flexible enough to allow the quick embedding of advancements in science"

Those who want to access the draft can click on the image of the thesis's first page below.

The thesis includes the work presented in Niccolò's two submitted papers, one to The Cryosphere and a second one, on WHETGEO-1D, presented in GMDD. Although a thesis is usually considered a definitive work, this one remains very much a work in progress: extensions of the codes are expected soon, and their informatics has already been implemented. All the tools developed during the thesis are open source and freely available on GitHub, both as executables and as source code. Any comments or suggestions on the thesis, as well as on the papers, are welcome.

The Video of the defense is here.