Friday, February 15, 2013

The mononota song as a paradigm for writing a scientific paper

This year at the Sanremo festival (which I have not followed for many years) a particular song caught my attention (because it is actually very tasty): the mononota song. It is by Elio e le Storie Tese. Obviously only Italians can truly appreciate it, because the crazy, ironic lyrics are an integral part of the whole.

It struck me that it has a lot of structural similarities with a good scientific paper. There is a simple statement to demonstrate: one can build a nice song using a single note (mono nota). The text of the song contains a literature review (Rossini^1 did it, Bob Dylan^2 too; Tintarella di Luna^3 again) in which examples of wrong applications are presented (Jobim^4 implemented a samba which, however, drifted away from the single note: "he did not have the b***s", they say). The music, besides the text, is full of citations.
It shows various examples of how the statement can be violated. EelST also demonstrated the feasibility of the statement, showing various possible declinations of it: whistling it, changing rhythms in a way that reminded me of certain Frank Zappa compositions, changing the tone, and so on. The song also has a conclusion where it is remarked that fidelity to the initial statement was maintained, except for the very last note.

Naturally the result is good not because it follows a schema, but because the song, while not necessarily the best ever and not for all tastes, is overall enjoyable. The same applies to any good paper, which should also, if possible, be pleasant to read^5. Eventually the song also recalls a set of other beautiful songs, which is part of the pleasure.

... Just an example.

References

^1 Adieux à la vie (G.Rossini)
^2 Bob Dylan's Subterranean Homesick Blues, which I think is mononota. They say that many other blues sung by Dylan follow the same type of intonation, for instance Levee's Gonna Break, which is kind of pertinent to this blog
^3 Mina
^4 Samba de uma nota so (N.Mendonca / A.C.Jobim)
* See also the nice comment by Cesare Picco in his Video (unfortunately in Italian)
^5 I also know papers that are certainly not well written, and not formally enjoyable, but that nevertheless convey a lot of information and knowledge: so, in the end, the contents matter by themselves too
** Finally in a comment to a The Rolling Stones' song Doom and Gloom  I also got this: "... So do a thousand punk and hip-hop songs; so did Louis Armstrong in his solo on the seminal recording of “West End Blues”; so did Cole Porter in the verse to “Night and Day”; so did Harold Arlen in the refrain to “Come Rain or Come Shine”; and so has Bob Dylan in dozens and dozens of songs since “Subterranean Homesick Blues. Jagger, like both Armstrong and Dylan, is an interpretive artist with the skill and audacity it takes to extract a scale’s worth of colors in just one note. "

Thursday, February 7, 2013

About doing a Ph.D.

Doing a Ph.D. is a totally absorbing activity, but it can be one of the most exciting periods of your life. However, I realize that students usually do not have a clear idea of what it is about.
I actually wrote a brief post some time ago about the core of Ph.D. studies, which is well summarised by the picture below:
The figure and the idea are by Matt Might, whose blog is a nice reading experience (here the full story).
Recently I discovered that my colleague Davide Geneletti usually provides his students with a set of links to useful readings. In my best tradition as a robber I post them here below, with some additions, since I found them quite informative. They come from authors of various disciplines and roles, but their contents are of quite general interest. So here is the list:



Last but not least, he suggests reading this book, which I found amusing:

which more or less completes the picture. I would also not pass up a look at these "slides offering advice that is wickedly and memorably to the point"

The web is full of other good links, and you can certainly find your favorite web site or blog. A suggestion: do not linger too much on this stuff: eventually you have a Ph.D. to pursue ;-)

Friday, February 1, 2013

Paraglacial geomorphology

We often talk about landscape evolution (and I have a little history on the subject). However, we usually forget that between 65 ky and 12 ky ago a lot of our Earth was covered by glaciers. The image below, robbed from a presentation by Kurt Werth at the "At North of Trento and South of Bolzano"(1,2,3) meeting, illustrates the situation in the place where I live, the river Adige basin.
I do not know how precise the map is, but it clearly illustrates the general situation. Because of this, many of the geomorphic features we see nowadays were created by the glacier retreat and by subsequent land sculpting. Alluvial fans (some hundreds of them) were formed later. Big rock avalanches (according also to isotopic measurements) crumbled down between 10 ky and 3 ky ago.
Therefore it is time that alpine geomorphology (and modelling) brings paraglacial situations within its horizon. Present-day landslides and sediment production are strongly affected by what the glaciers left.

Reference
Ballantyne, C.K. - Paraglacial geomorphology, Quaternary Science Reviews 21 (2002) 1935–2017




Tuesday, January 29, 2013

The law of small numbers

I knew the law of large numbers (and its violations), but I had never reflected on the law of small numbers.

You can learn about it by following this link. It is mostly about the Poisson distribution, which is indeed ubiquitous in hydrology too. So reading this R-related post is certainly interesting and useful for us as well.
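Not from the linked post, but a minimal Python sketch of why small Poisson counts look so erratic: for a Poisson variable the standard deviation is the square root of the mean, so the relative fluctuation grows as the mean count shrinks (the flood-count numbers here are made up, just for illustration):

```python
import math

def poisson_pmf(k, lam):
    """P(K = k) for a Poisson-distributed count with mean lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

# For a Poisson count, sd/mean = 1/sqrt(lam): the smaller the
# expected count, the larger the relative fluctuations -- the
# intuition behind the "law of small numbers".
for lam in (100.0, 4.0, 1.0):
    print(lam, 1.0 / math.sqrt(lam))

# e.g. with a mean of 1 flood per year, a flood-free year is common:
print(round(poisson_pmf(0, 1.0), 3))  # 0.368
```

This is why rare-event statistics (floods, extreme storms) demand long records before the sample mean says much.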


No code No paper

This is entirely from Simply Statistics » R, and I completely agree with it. It applies in the very same way to the hydrological literature.

"I think it has been beat to death that the incentives in academia lean heavily toward producing papers and less toward producing/maintaining software. There are people that are way, way more knowledgeable than me about building and maintaining software. For example, Titus Brown hit a lot of the key issues in his interview. The open source community is also filled with advocates and researchers who know way more about this than I do.


This post is more about my views on changing the perspective of code/software in the data analysis community. I have been frustrated often with statisticians and computer scientists who write papers where they develop new methods and seem to demonstrate that those methods blow away all their competitors. But then no software is available to actually test and see if that is true. Even worse, sometimes I just want to use their method to solve a problem in our pipeline, but I have to code it from scratch!

I have also had several cases where I emailed the authors for their software and they said it “wasn’t fit for distribution” or they “don’t have code” or the “code can only be run on our machines”. I totally understand the first and last, my code isn’t always pretty (I have zero formal training in computer science so messy code is actually the most likely scenario) but I always say, “I’ll take whatever you got and I’m willing to hack it out to make it work”. I often still am turned down.

So I have a new policy when evaluating CV’s of candidates for jobs, or when I’m reading a paper as a referee. If the paper is about a new statistical method or machine learning algorithm and there is no software available for that method – I simply mentally cross it off the CV. If I’m reading a data analysis and there isn’t code that reproduces their analysis – I mentally cross it off. In my mind, new methods/analyses without software are just vapor ware. Now, you’d definitely have to cross a few papers off my CV, based on this principle. I do that. But I’m trying really hard going forward to make sure nothing gets crossed off.

In a future post I’ll talk about the new issue I’m struggling with – maintaing all that software I’m creating."

Wednesday, January 23, 2013

Object Modelling System Resources

As readers know from previous posts, we (my collaborators, my students, and I) use OMS3 (and we will use it even more in the future, embedding all of our modelling efforts in it), in collaboration with OMS3 developer-in-chief Olaf David and others. Any involvement with OMS should start with browsing the OMS3 web site and the information available there (for instance, but not only, this).
To use it, first download the console, then read the installation notes and the console FAQ (we will provide a brief description of its use soon), which remain the main sources of information about the tool.

However, during the BioMA summer school, Olaf, Jim Ascough, Jack Carlson and Giuseppe Formetta provided some further material, which you can finally find below.

Jgrasstools uses OMS3 (even if a version older than 3.1), and one can also find relevant information by browsing their site.

Other examples of using OMS3 console and scripting will follow soon.

Monday, January 21, 2013

PostgreSQL your data

Science is a matter of hypotheses and data. Hypotheses become formal models, and then you have to acquire data to prove them (a big word indeed) in the feeble light of statistics. At the beginning you collect data everywhere on your hard disk (I assume the data were digitised). After a few months you are submerged by them. You throw them away and restart from the beginning.
Fortunately some institutions store their data in databases, and the reboot is relatively easy. However, this covers the primary data sets, not the data that you yourself produce by running your models and doing your inferences.
So, sooner or later, you have to face the reality that you should store your stuff in a more ordered way and build your own database. This opens various questions. Is it really necessary to use database software (c'mon, learning yet another tool!)? Obviously not: a database, in its general meaning, can be just an ordered set of data. So you can simply use your filesystem for it (I say it, but I do not really believe it). However, then you have to remember where the data are and use your operating system's search utility to find what you are looking for (assuming you documented every step you made in a searchable way).
Databases help with that, and often offer a query language (usually SQL) that helps you find and select the data you need, again and again. So, at a certain moment, one has to take seriously the hypothesis of using a database.
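To give a flavour of what SQL buys you, here is a tiny sketch using Python's built-in sqlite3 module (the table, station names and values are invented for illustration; with PostgreSQL you would issue the same SQL through a client such as psql):

```python
import sqlite3

# An in-memory toy database; a real setup would be a PostgreSQL server.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE rainfall (station TEXT, day TEXT, mm REAL)")
cur.executemany(
    "INSERT INTO rainfall VALUES (?, ?, ?)",
    [("Trento", "2013-01-01", 4.2),
     ("Trento", "2013-01-02", 0.0),
     ("Bolzano", "2013-01-01", 7.5)])

# The point of a query language: one declarative statement selects and
# aggregates the data, instead of hunting through files on disk.
cur.execute("SELECT station, SUM(mm) FROM rainfall "
            "GROUP BY station ORDER BY station")
print(cur.fetchall())  # [('Bolzano', 7.5), ('Trento', 4.2)]
conn.close()
```

The same SELECT can be re-run as new data arrive, which is exactly the "again and again" part that a filesystem cannot give you.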
Nowadays there exist many free and open source database solutions (besides, obviously, the commercial solutions by Oracle, IBM and others). Among the most widespread I cite MySQL, PostgreSQL and H2. Each one is a valid choice, with different characteristics.
In recent years we focused our attention on PostgreSQL for its completeness and for having been the first to include a way to manage geographic (geometric) data such as shapefiles^*. This is actually done by a plugin, called PostGIS, developed by the same Refractions guys who also promote uDig.
Alban De Lavenne, a Ph.D. student from Rennes Agricampus, who spent a few months among us, gave a talk about how he uses PostgreSQL to support his research. His presentation is, as usual, on Slideshare (I am working to provide the data to run his examples).

The first step is certainly to install PostGIS. The first time, I (am a Mac guy and) used the KyngChaos instructions for installing PostgreSQL. However, I noticed that nowadays there are various other possibilities, listed on the main PostgreSQL page.

Alban's instructions and suggestions follow the installation and cover some typical hydrological problems. For a complete understanding, the tutorial on the PostgreSQL site can certainly help. Around the web one can also find other video tutorials, such as this one provided by David Fetter, or this comprehensive set of iTunesU screencasts by Selena Deckelmann and others.

Obviously I am open to any contribution to improve this post.

^* - Recently PostgreSQL/PostGIS acquired the capability to store and manage "raster data" and images, which makes it even more appealing.