## Monday, January 20, 2014

### Learn Statistics and Probability!

In one of my previous posts, I talked about the need to use (and therefore learn) statistics. As said in "What is Statistics?", anyone working with data is using statistics. This simplifies the approach a lot. Actually, I arrived at statistics mostly from the teaching side. As a scientist, indeed, I often overlooked statistics. Even though statistics appears in part of my research, it would be a gross exaggeration to say that I approached it consciously (I kind of took it for granted). From time to time I used (ripped off) methods to fit data, but I never had a systematic approach to it.
From the teaching side, instead, I had to communicate some concepts to students, and thus I tried to be more methodical. My efforts at synthesis produced my slides on probability and on statistics. In fact, I resolved the dualism between the two by saying that statistics has to do with reality, while probability is an axiomatic theory, which leaves out the identification of what probability itself is (as De Finetti teaches). What people usually do is search for a "model" among those available in the "models market" (and therefore there is a phase where these models are "invented" and analysed theoretically). The models, in turn, are distribution functions, regression functions, or whatever function(al) is necessary. In a second phase, you then have to see how the model of your choice adapts (fits!) to real life. In this adaptation you rely on statistics and statistical methods, and on Bayes' theorem. You can interpret the procedure according to a frequentist approach or a Bayesian one. The latter is becoming dominant in the field I frequent, but it probably has to escape some inductive traps (i.e. the idea that knowledge can be obtained from induction alone, while the scientific method has a hypothetico-deductive structure: look at Bayes here). In fact, there could be a third approach, where "the machines" find the model for you (see Breiman, 2001, and the discussions therein ^1).
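
The two phases described above (pick a model from the "market", then see how it fits the data) can be sketched in a few lines. This is only a minimal illustration, written in Python rather than R, and everything in it is an assumption made for the example: the exponential model, the value of `true_rate`, and the helper names `fit_exponential` and `log_likelihood` are hypothetical and do not come from any of the cited books.

```python
import math
import random


def fit_exponential(sample):
    """Maximum-likelihood estimate of the rate of an exponential model."""
    return len(sample) / sum(sample)


def log_likelihood(rate, sample):
    """Log-likelihood of the sample under an Exponential(rate) model."""
    return len(sample) * math.log(rate) - rate * sum(sample)


# Phase 1: choose a model. Here, by assumption, an exponential distribution.
# Phase 2: confront it with data. The "reality" below is simulated, so the
# true rate is known only because we generated the data ourselves.
random.seed(1)
true_rate = 0.5  # hypothetical value, unknown in a real application
data = [random.expovariate(true_rate) for _ in range(5000)]

rate_hat = fit_exponential(data)
print(f"estimated rate: {rate_hat:.3f} (rate used in the simulation: {true_rate})")

# The fitted rate should explain the data at least as well as nearby values.
for candidate in (0.4, rate_hat, 0.6):
    print(f"rate={candidate:.3f}  log-likelihood={log_likelihood(candidate, data):.1f}")
```

The estimate is the frequentist reading of the procedure: the parameter is a fixed unknown, and the data select the value under which they are most likely.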

While the concept of distribution always remains under the hood of any approach (maybe less evidently in Machine Learning) and can probably be used as a “connecting principle”, in my ignorant perception of the matter the whole picture remains a little obscure.

The fact is that the principles of statistics (and of probability) are taught abstractly, thinking of a mapping A -> [0,1], where A is some undefined set. Most of the time, when thinking of applications, we map some subspace of R^n (the real numbers) into [0,1], but the object of investigation is a single value of a single quantity (let us say a unique measure of a quantity; do not charge the word "measure" here with its mathematical meaning). Using the concepts of hydrology, this would cover a zero-dimensional domain (or zero-dimensional modelling).

In fact hydrology, and reality, are perceived as multidimensional. So important applications and important measures vary, for instance, in time. This fact confuses the picture, since in principle we have to analyse many quantities (as many as the instants of time), so our application is no longer, at the most general level, the study of a single quantity but of many quantities. However, for either practical or physical reasons, we often conceive of these quantities as manifestations of a single one (as realisations of the same hidden probabilistic structure repeated many times, not necessarily with any relation between them).
For better or worse, this is usually ignored theoretically while, in practice, it leads to separate subfields, of which time series analysis is an example. Fitting one variable against another (or others) also falls in the same dimensional domain. In books this step appears smoothly, as if it were natural, but it has always left me with some discomfort. Only in the statistics book by von Storch and Zwiers does my dimensional distinction appear (especially looking at the book's index). In classical books, this passage is actually mediated by looking at random walks, Markov chains, martingales and other similar topics. The key to this change of dimensionality is the introduction of some correlation (in the common-language sense, but also in the probabilistic sense) that ties one datum to another (the subsequent one).
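
The role of correlation in passing from the 0-D to the 1-D case can be made concrete with a small Python sketch. It compares a sequence of independent draws with a first-order autoregressive series, in which each value depends on the previous one. The AR(1) process, the coefficient `phi` and the helper `lag1_autocorrelation` are choices made only for this illustration, not something taken from the books mentioned above.

```python
import random


def lag1_autocorrelation(x):
    """Sample correlation between each value and the one that follows it."""
    n = len(x)
    mean = sum(x) / n
    num = sum((x[i] - mean) * (x[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in x)
    return num / den


random.seed(2)
n = 5000
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

# "0-D" view: every datum is an independent realisation of the same distribution.
independent = list(noise)

# "1-D" view: each datum is tied to the previous one (hypothetical AR(1) series).
phi = 0.8  # assumed strength of the link between consecutive values
correlated = [noise[0]]
for eps in noise[1:]:
    correlated.append(phi * correlated[-1] + eps)

print(f"independent draws: r1 = {lag1_autocorrelation(independent):+.3f}")
print(f"AR(1) series:      r1 = {lag1_autocorrelation(correlated):+.3f}")
```

For the independent draws the lag-1 autocorrelation should stay close to zero, while for the autoregressive series it should come out close to `phi`: the ordering in time carries information that the 0-D treatment ignores.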

A further, and consequent, step happens when one moves on to analyse not a line of events but a space of events, with the further complication that multiple dimensions cannot even exploit the ordering of 1-D problems.

Nowadays patterns in two-dimensional or multidimensional spaces are, in fact, discovered by machines (at least if I properly understand the concept of Machine Learning), with, again, some danger of falling into excessive inductivism.

Coming to the point: how to learn this stuff. I would start from a book on probability where the axiomatic structure of the field is clear. In my education, this role was played by the old classic, Feller’s (1968) book. (Let us say the first two chapters, which are now reproduced in almost all textbooks, then the following chapters, skipping the * sections. Possibly section XI concludes this first, 0-D, part. Waiting times appear there, but they are not actually related to “time” but to an “ensemble” of trials.) Looking for online resources, I also found the book by Grinstead and Snell, which covers more or less the same topics.

The probabilistic part should be complemented, at this point, by some statistics. Most good statistics books simply redo all of probability theory from a more practical point of view before going on to their specific topic, how to infer from data the underlying distribution (if any exists), but these introductory parts can be skipped or just browsed. As J. Franklin says, “Mathematicians, pure and applied, think there is something weirdly different about statistics. They are right. It is not part of combinatorics or measure theory, but an alien science with its own modes of thinking. Inference is essential to it, so it is, as Jaynes says, more a form of (non-deductive) logic.”

Being prepared for controversy, a couple of good books for learning statistics with a climate and/or hydrologic orientation are the book by Hans von Storch and Francis W. Zwiers and the (expensive) book by Kottegoda and Rosso. These have the advantage of using hydro-meteorological datasets as examples. In these, after the first chapters, the subsequent chapters follow a perspective where the goal is to choose “a model, a distribution or process that is believed from tradition or intuition to be appropriate to the class of problems in question”, and subsequently “statistically validate” it, using data to estimate the parameters of the model. “That picture, standardised by Fisher and Neyman in the 1930s, has proved in many ways remarkably serviceable. It is especially reasonable where it is known that the data are generated by a physical process that conforms to the model. As a gateway to these mysteries, the combinatorics of dice and coins are recommended; the energetic youth who invest heavily in the calculation of relative frequencies will be inclined to protect their investment through faith in the frequentist philosophy that probabilities are all really relative frequencies.” (also from J. Franklin, 2005).

My favorite reading on many of these statistical computing techniques is Cosma Shalizi’s notes, which certainly present the topics in an original way that cannot be found elsewhere. Shalizi’s notes, as well as those of James et al. (2013), also have the advantage of using R as a computational tool and of presenting some modern topics, like a chapter on Machine Learning. Hastie et al. (2005) is instead an advanced text on the same topics. These books are actually more decidedly oriented to statistical modelling, as is the free (and simple) online book by Hyndman and Athanasopoulos (2013), which also uses R.

The Kottegoda and Rosso book also hosts a chapter on Bayesian statistics, which is the other way to see statistical inference. A brief introduction to the Bayesian mystery is Edward Campbell’s short technical report, which can be found here. Possibly the longest one, which presents a different approach to probability, is the posthumous masterpiece by Jaynes (2003), which is probably fundamental reading on the topic.

However, my personal understanding of the Bayesian methodology gained some consistency only after reading G. D’Agostini’s (2003) book. Actually, D’Agostini can be described as a Bayesian evangelist, but his arguments, even if some examples from high-energy particle physics remain unclear to me, convinced me to a mild conversion.
As a matter of fact, my real understanding of the Bayesian approach is still poor. Not because I did not understand the theory, but because between the theory and its application there is a gap that I have not yet filled. (Practice it!)
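
In the spirit of "Practice it!", here is about the smallest Bayesian exercise I can think of, sketched in Python: estimating the probability of a rainy day from a count of rainy and dry days. It assumes a Beta prior, which is conjugate to the binomial likelihood, so Bayes' theorem reduces to adding counts. The numbers (7 rainy days out of 20) and the function names are invented for the example.

```python
def update_beta(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data.

    With a Beta(alpha, beta) prior and a binomial likelihood, Bayes' theorem
    gives a Beta(alpha + successes, beta + failures) posterior.
    """
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Uniform prior on the probability of rain: Beta(1, 1).
prior = (1.0, 1.0)

# Hypothetical observations: 7 rainy days out of 20.
rainy, dry = 7, 13
posterior = update_beta(*prior, successes=rainy, failures=dry)

print("posterior parameters:", posterior)
print("posterior mean:      ", round(beta_mean(*posterior), 4))
print("relative frequency:  ", rainy / (rainy + dry))

# Updating in two batches gives the same posterior as updating all at once.
step1 = update_beta(*prior, successes=3, failures=7)
step2 = update_beta(*step1, successes=4, failures=6)
print("sequential update matches:", step2 == posterior)
```

The posterior mean sits between the prior mean (0.5) and the observed relative frequency (0.35), and moves towards the latter as data accumulate. The last lines show that yesterday's posterior can serve as today's prior.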

Taken together, all of these contributions sum up to thousands of pages. Possibly many of these pages repeat the same concepts, sometimes from slightly different points of view. To the reader the choice of what to do.

A last note regards how to do the calculations. For this, R is certainly a good choice, one that some of the cited books’ authors have already made, and the support for it is really large and growing.

Notes

^1 - The paper is also remarkable for the reply by Sir David Cox (who also has his own introductory and conceptual book). Cox, besides being a prominent British statistician, has quite a career in hydrology, especially considering his work on rainfall and eco-hydrology together with Ignacio Rodriguez-Iturbe.

References (with some additions to the text)

Berliner, L. M., & Royle, J. A. (1998). Bayesian Methods in the Atmospheric Sciences, 6, 1–17.

Breiman, L. (2001). Statistical Modeling: The Two Cultures (with comments and a rejoinder by the author). Statistical Science, 16(3), 199–231. doi:10.1214/ss/1009213726

Campbell, E. P. (2004). An Introduction to Physical-Statistical Modelling Using Bayesian Methods (pp. 1–18).

Cox, D. R., & Donnelly, C. A. (2011). Principles of applied statistics. Cambridge University Press.

Durrett, R. (2010). Probability: theory and examples.

Feller, W. (2007). The fundamental limit theorems in Probability, 1–33.

Fienberg, S. E. (2014). What Is Statistics? Annual Review of Statistics and Its Application, 1(1), 1–9. doi:10.1146/annurev-statistics-022513-115703

Gelman, A. (2003). A Bayesian Formulation of Exploratory Data Analysis and Goodness‐of‐fit Testing*. International Statistical Review. doi:10.1111/j.1751-5823.2003.tb00203.x

Grinstead, C. M., & Snell, J. L. (2007). Introduction to Probability (pp. 1–520).

Guttorp, P. (2014). Statistics and Climate. Annual Review of Statistics and Its Application, 1(1), 87–101. doi:10.1146/annurev-statistics-022513-115648

James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning, 103. doi:10.1007/978-1-4614-7138-7

Kharin, S. (2008c, May 19). Statistical concepts in climate research - I. slides
Kharin, S. (2008b, May 19). Classical Hypothesis Testing. -II slides
Kharin, S. (2008a). Climate Change Detection and Attribution: Bayesian view, 1–35. III slides

Madigan, D., Stang, P. E., Berlin, J. A., Schuemie, M., Overhage, J. M., Suchard, M. A., et al. (2013). A Systematic Statistical Approach to Evaluating Evidence from Observational Studies. Annual Review of Statistics and Its Application, 1(1), 131125173259005. doi:10.1146/annurev-statistics-022513-115645

Shalizi, C. R. (2014). Advanced Data Analysis from an Elementary Point of View (pp. 1–584).

Storch, von, H., & Zwiers, F. W. (2003). Statistical analysis in climate research. Cambridge University Press.

Zwiers, F. W., & Storch, von, H. (2004). On the role of statistics in climate research. International Journal of Climatology, 24(6), 665–680. doi:10.1002/joc.1027