Monday, January 21, 2013

PostgreSQL your data

Science is a matter of hypotheses and data. Hypotheses become formal models, and then you have to acquire data to prove them (a big word indeed) in the feeble light of statistics. At the beginning you collect data all over your hard disk (I assume the data are digitised). After a few months you are submerged by them. You throw them away and start again from the beginning.
Fortunately, some institutions store their data in databases, so the reboot is relatively easy. However, this covers only the primary data sets, not the data that you yourself produce by running your models and making your inferences.
So, sooner or later, you have to face the reality that you should store your stuff in a more ordered way and build your own database. This raises various questions. Is it really necessary to use database software (c'mon, learning another tool!)? Obviously not: a database, in its general meaning, can be just an ordered set of data. So you can just use your filesystem for it (I say it, but I do not really believe it). However, you then have to remember where the data are, and use the search utility of your operating system to find what you are looking for (assuming that you documented every step you made in a searchable way).
Databases help with that, and they usually offer a query language (typically SQL) that lets you find and select the data you need again and again. So, at a certain point, one has to take seriously the idea of using a database.
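To give an idea of what "find and select" means in practice, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the table `discharge`, its columns, the station names and the numbers are invented, and the example uses SQLite (bundled with Python) as a stand-in so that it runs without installing a server. The SQL itself is plain enough that a very similar query should work on PostgreSQL.

```python
import sqlite3

# In-memory SQLite database, used here only as a stand-in for PostgreSQL
# so that the example runs without a database server.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE discharge (
        station TEXT NOT NULL,   -- hypothetical gauging station name
        day     TEXT NOT NULL,   -- ISO date, e.g. '2012-11-01'
        q_m3s   REAL NOT NULL    -- discharge in cubic metres per second
    )
    """
)

# Hypothetical measurements, invented for illustration.
rows = [
    ("A", "2012-11-01", 12.5),
    ("A", "2012-11-02", 30.0),
    ("B", "2012-11-01", 4.0),
    ("B", "2012-11-02", 6.0),
]
conn.executemany("INSERT INTO discharge VALUES (?, ?, ?)", rows)


def mean_discharge(conn, threshold):
    """Return (station, mean discharge) for stations whose mean exceeds threshold."""
    cur = conn.execute(
        """
        SELECT station, AVG(q_m3s) AS mean_q
        FROM discharge
        GROUP BY station
        HAVING AVG(q_m3s) > ?
        ORDER BY station
        """,
        (threshold,),
    )
    return cur.fetchall()


print(mean_discharge(conn, 10.0))
```

The point is that the same question ("which stations have a mean discharge above a threshold?") can be asked again with a different threshold, or months later, without remembering in which folder the numbers ended up.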
Nowadays there are many free and open source database solutions (besides, obviously, the commercial ones from Oracle, IBM and others). Among the most widespread I cite MySQL, PostgreSQL and H2. Each one is a valid choice, with different characteristics.
In recent years we have focused our attention on PostgreSQL, for its completeness and for having been the first to include a way to manage geographic (geometric) data such as shapefiles^*. This is actually done by a plugin called PostGIS, developed by the same Refractions guys who also promote uDig.
Alban De Lavenne, a Ph.D. student from Rennes Agrocampus who spent a few months with us, gave a talk about the use he makes of PostgreSQL to support his research. His presentation is, as usual, on SlideShare (I am working to provide the data to run his examples).

The first step is certainly to install PostGIS. The first time (I am a Mac guy) I used the KyngChaos instructions for installing PostgreSQL. However, I noticed that nowadays there are various other possibilities, listed on the main PostgreSQL page.

Alban's instructions and suggestions pick up after the installation and cover some typical hydrological problems. For a complete understanding, the Tutorial on the PostgreSQL site can certainly help. Around the web, one can also find video tutorials, such as this one provided by David Fetter, or this comprehensive set of screencasts on iTunes U by Selena Deckelmann and others.

Obviously I am open to any contribution to improve this post.

^* - Recently PostgreSQL/PostGIS acquired the capability to store and manage "raster data" and images, which makes it even more appealing.
