Friday, March 10, 2017

The tale of open source codes

Prologue

Why did I choose to produce open source software with the people working directly with me (Ph.D. students, master students, postdocs)?

- because it is good for science
- because I am paid by a public institution
- because it is a neutral condition that can serve the rights of all the participants (in particular my own right to freely use and modify the software at will, and to defend myself from anyone, person or institution, who would like to close the software, even against me). On the other hand, my clear intention is that my projects serve as a seed for the developments of my students (or others), who can freely use the products of my research and keep them alive beyond me and despite me. *

I use the GPL (for its interpretation, see here), but many other licences could work.

A declaration

In this way, I think, I have the right to claim that I can use or peruse the software outcomes of my group. I declare that I want to follow “fair play” rules, but it should be clear that these rules cannot be stretched to limit my research freedom. People who claim participation in papers to which they gave no contribution, other than having produced the code that we produced together, have wrong arguments. People who claim to be involved in projects or research for no other reason than that I want to use the software they contributed (under GPL) have wrong arguments.
Nor can they claim that I have to warn them and tell them personally what I am going to do in my research with the common code in order to get their consent.
They would be right to protest only if I did not properly acknowledge their contribution to previous work.
My research is, by my own conviction, very public, and so is its evolution. It can be found at the abouthydrology blog. My core research is shared with my team. This includes just the people for whom I have direct responsibility by age and role (Master students, Ph.D. students and postdocs) and whom I sustain with funding, my own time and ideas.

With all the others, including my masters, my former students, colleagues, friends, women and men who like my research topics and achievements, and me, I can have collaborations. This means that we can share part of our views, beliefs, discussions, fights, friendship, papers, parts of code. However, our own agendas, in this imperfect world, do not coincide, and if they do, it is for an incredibly short time. This may seem a declaration of distance, but it is just awareness of how life works, and the first step towards an effective and respectful collaboration.

Q&A

Can my students refuse to develop OS software?
No, as long as it is the product of a common intellectual effort in which they, maybe, write the code, but I say what to write.

Do I start collaborations in which non-OS code can be developed?
Never say never. However, there should be very strong reasons for me, on my side, to support this. Certainly, in projects there could be partners who develop non open source software, but this falls under the responsibility of whoever gives the financial support.

Is the requirement of open sourceness enough?
No, it isn't. Open Sourceness is useless if not followed by good practices of using open repositories and collaborative modalities of action.

Can my students refuse to learn these practices?
For the common work, no. I am not responsible for the rest. I tend to fully book their time, though.

Do I start collaborations where these practices are not followed?
I would prefer not, but I do. Certainly, collaborations can be at different levels, and they are rarely about co-producing software. I would not participate in joint projects where I put in ideas and expertise and others write closed code, unless they pay me or my group a lot. Really a lot. I can participate in projects where other subjects put their ideas, or ideas from the literature, in their own closed code, and I put mine in OS code. However, the situation I prefer would be a common production, as a community, of open source code.

My own use cases

Below I summarise (with quite large simplifications) my software history, in order to further justify what I wrote above.

Professor means someone who puts in science, time, and money (as funds derived from projects). Student means someone who puts in time and science. Company means someone who puts in time, money and business-related efforts. Agents and Subjects are generic actors of the play (they can be students, professors or someone else). Community is the informal group that happened to gather around the projects and, eventually, evolves them.

Case 0

Professor Z writes the initial library. On top of that, A builds the radiation budget. Student B writes surface water flows. Student C implements soil-atmosphere interactions. Student D writes vadose zone components. Student E writes snow treatment. Student F rewrites the snow components, then rewrites most of the code, interacting with student G and student H. Student I writes code for the treatment of landslide triggering. Student G writes a small but important portion of the code, the little but successful part on freezing soil hydrology. Professor L hires F. F continues to rewrite parts. G starts a huge operation of cleaning the code, moving it to C++ and uploading it to an open repository. C comes back and starts to use the code in his research, and occasionally hires H to do some ancillary work for treating data. In the meantime, G has founded a company where the common code is the basis of the business. Company M, initially hired by L, works on the code to refactor and enhance it. M works collaboratively with G and F. M sets up continuous integration. Student N starts to produce executables for the main operating systems and eventually on Cooker (fictional name). M immediately embraces this philosophy.


Case 1

Professor Z writes the initial library. Z writes more than fifty tools for terrain analysis. Student A (not the same as above) ports them to a major Open Source GIS. Z and A start the construction of a new GIS, say JG. Initially JG contains just the terrain analysis tools and some simple hydrological models. They start to run schools to finance their project. This works for some years. Student B, in the meantime, has joined the crew, and A & B found the company AB. They live on schools, support from a main project of Z and other resources (a main research project). A cleans the tools' suite and inaugurates the name JGT for them. Z uses JGT in his classes.
Z, A, and B decide to join the development of UGIS. Some research projects support them, together with resources raised by the company AB on its own. Students C and D write some further modules. UGIS funding disappears, and UGIS slowly becomes an almost inactive project. AB brings JGT to an intermediate product, ST. Z continues to use JGT in his classes. AB finally joins the development of a new GIS, say GS.
(In the meantime, A adds new tools, and AB writes an Android app, Aapp, and expands its business. Aapp is not related to JGT, but is worth mentioning.) Over the years, A and B get Ph.D.s whose topics are related to the GIS work. Student E, with a small effort, also brings JGT back to a platform, OI, that Z uses with his students.

Case 2

Thanks to unexpected financial support from project 00, Professor Z hires five students to build a new modelling platform from scratch. For this new software enterprise, he and company AB (founded by his former students A and B) choose the open source framework OI. He hires former student C to help with software development, and former students D and E for the general management of the project and for data gathering, respectively. C works more on improving and enriching JGT (see case 1 above), which serves as a basis for the terrain analysis functional to modelling. A and B develop a full suite of model components (the new paradigm) for: temperature and rainfall interpolation, rainfall-runoff, evapotranspiration, and various tools to visualise the components' inputs and outputs. AB also designs and populates an SQL database that contains all the data of the project. Project 00 ends. The Institution that supported the project closes it in a drawer.

With other financial support, the tools of the former 00 project are kept alive. The open source framework OI is replaced by the open source framework OM, with a notable reduction in lines of code (but it is a huge coding effort, indeed, almost entirely on AB's shoulders). With the adoption of OM, a research collaboration with professors U and W also starts.

Student E comes into play. He realises that rainfall-runoff does not work well. The AB company has to survive on its own and cannot give very much support (https://vimeo.com/144089061). E implements a new rainfall-runoff model. AB, however, hires F for a small project where he works on radiation. Eventually, E refactors F's work and greatly expands it. E adds a new snow modelling component and writes/refactors evapotranspiration. In doing this (pouring sweat and blood) he has, however, the guidance of the open source code already written. E spends some periods at U and W. E also refactors and enhances the Kriging code. Eventually E graduates and starts his career as a post-doc elsewhere. In the meantime, he finalises his research in a series of papers.
Student G comes. He does not have programming skills, but quietly learns to use E's components and produces some interesting papers where E is co-author.
A new student, H, comes into the game. She works first on radiation on top of E's code, then she starts to implement tools for travel time analysis and another rainfall-runoff component.
Student L comes. He has a strong aptitude for informatics. He brings in new ways to manage projects. H and L implement the OpenOpenSoftware repository and the site BeatifulGEO (names are fictional, but the tools are real). H refactors the old code and, together with L (who, sort of, leads the learning process), introduces design patterns to increase code reusability. L provides the trickery to have continuous integration on OpenOpenSoftware using GETIT, and connects software deployment to ISTOREIT to store official versions of the components. Students M and N come on stage and start to use the code. Professor Z (with the help of H) starts to use the components with his students in his classes. Student L evolves the original OM capabilities to allow for more flexibility and to increase the computational power of the models. H brings her models into the new infrastructure.

Discussion and Conclusions

The above is a summary (where, I say again, I simplified many passages) of my main software enterprises. Could they all have evolved differently (and better) if I had not applied an open source strategy? Probably yes, but I would have had to bind the students to a contract about the ownership of the software. In this way I would have deprived my students of parts of their own work.
At the same time, I could not have simply left the software to them. The histories themselves show that I built my own work and research on the software we developed, and being free to use it and modify it was a necessity. If I had needed to ask permission to use it, or to sign a contract or the like with someone (for instance, whoever gave financial support), all the development would have been much more difficult to pursue. The same applies to the other Actors who invested time and resources in the software development just because it was open. They are usually individuals or low-budget companies that could not have afforded the expenses related to other types of licenses, nor accepted limitations on the use of the software.
Other researchers used the model. Its being free and open source was clearly an added value for them.

Keeping the software closed and commercial, besides not having scientific reasons (which require the contrary), would have obliged me to turn myself into a businessman and to turn away from my science. There are several cases of scientists who turned into captains of companies. But, for instance, Stephen Wolfram, a gifted scientist, did not contribute very much to science after he devoted his energies to Mathematica. Mathematica (probably the best computing environment ever) is itself his main achievement (which is not negligible), despite his own claims in "A New Kind of Science".
The overhead required for managing commercial software is not for everyone and has its own dynamics, which I personally could not bear.

The fact that my code is free and open source has allowed (not without difficulties) the self-instruction of newcomers. Various Agents had the possibility to start experiments and investigate new directions of development. Nobody needed to ask before starting them. Asking is a process that would have dramatically decreased the pro-activity of people or groups.

The Community benefited from this policy. In some cases, single Actors may have thought that their contribution was not recognised enough and did not give them an advantage. Their argument is flawed. All of them had advantages from the collaborative environment, and nobody (me included) could have produced what s/he has achieved without building on the shoulders of others and of other open source projects.

Forgetting the above, some feel that their work is not protected enough and that, everything being open source, newcomers can more easily jump in and take advantage of their work.
Uncertainty about the future, a competitive society, and the pure necessity to find something that pays for a decent life incline even the best to a moderate selfishness or a moderately parasitic behavior. They do not want to give back to the community, after having got a lot from it, and they act defensively.

Well, this behavior is absolutely possible if their developments do not use the original code that was released under GPL. In particular, the components strategy used in project 2 above allows building new, undisclosed material on top of the open source material, which anyone can use for his/her own profit with a non open license.
I have to warn, however, that if the moderate parasitism grows too much, the enthusiasm, which is always necessary, decreases, the projects die, the source of benefits disappears and the community falls.

I would say that a mild parasitism is functional to the community if it is necessary to sustain the collaborative Subjects, and if the Subjects eventually give something back to the community. Parasitic Subjects themselves act in favour of the community by spreading and advertising the products, and sooner or later this will bring benefits back (so do not blame them, they are, in any case, part of the stream).

Some Subjects actually want an opaque management of the GPL philosophy, in which people maintain an informal (but, they demand, recognized) ownership of the software that goes beyond the copyleft and the intellectual recognition of their contribution. This would imply, in their mind: preferential redirection of funds towards them; involvement in papers or conference contributions that use their code; veto power over the actions of third parties.
These desiderata are based on misunderstandings. It is clear that they will be involved in papers, conferences, and decisions. Any (wo)man and community of good will will apply this policy in their favour, if they do not grow too greedy. But these actions are not mandatory and not even necessary. GPL does not imply them.

To be clearer: especially in hydrology, the market out there treats our models and software as a fungible commodity, that is, the market tends to treat all codes as equivalent, or nearly so, with no regard to who produced them. (I think this is wrong, highly wrong, when brought to an excess.)
But the internal market, inside the community, also treats them as commodities, meaning that any contribution is perceived as a thing that can be replaced, even if replacing it would be dysfunctional and would waste precious time (this is part of the untold history of projects 0, 1, and 2). Everybody is important, nobody is necessary.


The A. paradox

One common argument of reluctant open sourcers is: “I have not yet reaped the results of my own work, and I should already share it” (statement 1), or "if I share it, others will use it without me and I will have no personal gain" (statement 2).
The first danger can be overcome by an appropriate delay in the disclosure of documentation and explanatory material (I would not argue, in general, that keeping industrial secrets is useless, however). That is, it is a matter of having strategies that prevent the negative cases. In our field, however, since everything is perceived as a commodity (see above), nobody will care to use our model or achievement instead of another one that gives what is (wrongly) perceived as similar, especially if our code is not known. Being open source, with proper support actions, helps the model to spread.
Besides, looking at my histories (see also here), software changes fast and is by no means immutable. Histories 0, 1, and 2 are marked by change. So the advantage one has with a new code in hand is ephemeral. By my own estimate, you have just a year of advantage for small codes, and a few years of advantage with a large and complex code. This small advantage, if you are smart, can be appropriately managed and used to produce new and more innovative code, and so on. (Open sourceness is against stagnation.)
Often, however, it is not the fear of faraway threats that causes problems, but the fear of nearby Agents. Guy A fears that B, who came into the group after her/him, will get positions or funding with her/his work. I would say that this could happen, but it is unlikely. In a fair (not fear) competition, A always wins over B if the quality of B can be attributed just to the code that A developed. The real problem is when B is much better than A. But in that case, having A's work is not important for B. B will almost always get the rewards instead of A. For A, the best thing, in the medium term, is to collaborate with B.
What I finally, really call the A. paradox lies in statement (2). If it is so easy to grab your work, then it would be equally easy for anyone to replicate it. Therefore your work is not giving you any competitive advantage, even if you keep it secret for a while. If it is not easy to grab, then whoever wants to use it proficiently needs you. So you are the winner, not because you keep your code top secret, but because all the issues it solves require a complex expertise that only you, the author, can have. So ….

Epilogue

Professor Z eventually disappears. Not because he dies (please do exorcisms), but because his role in the growing group of people around the projects has become more and more marginal. Subjects have also acquired maturity, as well as the will to maintain the advantages that the work has produced with respect to competitors.
This passage requires that the initially informal community establishes itself as a formal Community (they wrote here for Academics) with its rules, etiquette, and wise management. This, in turn, requires that Subjects coordinate and share their views alike, plan new developments together, and plan events to make the common work grow. Balkanisation of the code (which GPL could allow) and internal conflicts (never avoidable, since the Subjects have different agendas) should be managed appropriately, and this requires clear agreements, smart actions, good will, and wise arguments.
If the community grows, everybody will be safer, because cooperating is better than competing (see also coopetition).
A partial adoption of the Open Source strategy is instead quite useless. Open source codes that are practically not available (such as those that are open source but not freely downloadable) cannot grow a healthy community and, sooner or later, die.

* A final note

Actually, even if in my intention this is a project for my students too, not a few of my students do not deeply endorse it. The reasons for this can maybe be found in their personal history, the chemistry of their bodies and minds, or something else which is hidden from me. So far, I have overreacted, feeling betrayed when they dismiss what I believe is right. So, probably, my attitude is not correct. Sons do whatever they want, and they are probably right to try to find their own way. So I have to conclude that the above is MY dream, and I will not be upset anymore if my academic sons search for their own in a different way.

Friday, February 24, 2017

Scale & Hydrology in 2020

This is the second lecture given at Potenza. The topic Salvatore Manfreda (GS) gave me is about how to move from one scale to another in Hydrological Modeling. I started from distributed modelling. My point there is that, for some tasks, distributed modelling can be scaled up to millions of square kilometers and so upscaling theory is, in principle, not necessary.
But clearly this is a provocative statement I made just for pushing away some misconceptions.
Then I moved on to consider other ways to upscale problems: through simplifications, integration, heuristic thinking. Eventually I took a look at the "theories of everything" that were so popular in the last decades and still remain possibilities and ideas to explore. By clicking on the figure you get to the presentation.

Wednesday, February 22, 2017

JGrass-NewAGE: the first Potenza lecture

This is the presentation of JGrass-NewAGE structure and achievements. A lot of posts were dedicated to it. But there is always space for new perspectives and details, since it is a work in progress where talented students of mine put all of their efforts.
JGrass-NewAGE has grown into a stable and operational set of OMS components, documented in the GEOframe blog. In the meantime, we also developed good practices for software design and for the traceability of our efforts, which could be interesting to know.
Aficionados will recognize (by clicking on the figure) that the presentation contains various topics already widely covered in other posts. However, there are a few little things that could be interesting. Or, BTW, the arrangement given to the matter here can clarify some choices that may have seemed obscure on other occasions (the slides are in Italian but contain links to other material and papers in English).

Monday, February 13, 2017

GRAL

Some time ago we had the Gruppo Italiano delle Catastrofi Idrogeologiche (GNDC, Italian Group of Hydrological and Geological Hazards). But, as the site testifies, it languished. When Civil Protection became more dominant, maybe not the knowledge, but certainly the funding went in other directions (or was it just the natural fate of all things?), and all the initiatives stopped. The discussion never slept and, BTW, Italian hydrology is much stronger now than it used to be.
So now is the time for a new scientific initiative, with renewed objectives, to fill the gap between research and practice in defending our beautiful country from flooding. This is Gruppo Alluvioni. If they're roses they'll bloom (Time will tell).

Monday, February 6, 2017

Hydrology 2017

This year I decided to introduce major changes in my Hydrology course. Not only a change of topics, but also a change of perspective. I greatly increased the hours in the lab (up to 60% of the class), and I arranged the lectures in such a way that they could be followed by a three hour laboratory. Almost no lecture will be without numerical experiments. Another innovation is the use of Python instead of R.
I did this because of the large endorsement Python has among hydrologists and because:

  •  its object-oriented structure is much firmer than that of R;
  •  Python seems to be easy for engineering students to learn;
  •  some of my colleagues seem to agree to converge towards the use of Python in their classes.
R remains the first choice for doing statistics. However, we have limited time. The class is 60 hours, and the material to convey is a lot.
Here is the foreseen schedule of the class:
Hydrology Course 2017

Legend: T - Theoretical lecture  - L - Laboratory class (this can include theoretical parts, but mostly students will exercise with tools)
  1. T - Introduction to the class
  2. T - A terrain analysis  primer. 
  3. L - Introduction to QGIS. Introduction to the JGrasstools in OMS.
  4. T - A little of Statistics and Probability. 
  5. L -  Delineation of catchments' characteristics with JGrasstools and QGIS.
  6. T - Precipitations. Mechanisms  of formation of precipitation. Ground based statistics. Extreme precipitations. 
  7. L - Intro to Python - Loading/reading files. Time series and their visualisation. (See Notebooks 0 and 1 here.)
  8. T - Extreme precipitation statistics (parameters' estimation)
  9. L - Estimation of extreme distributions' parameters. (See Notebooks 2 to 5 here.)
  10. T -  Radiation
  11. L - Estimation of shortwave and longwave radiation in a catchment. 
  12. T - Spatial interpolation - Some concepts about the spatial representation of hydrological quantities. Inverse distance weighting. Ordinary Kriging. Detrended Kriging. 
  13. L - Practical spatial interpolation of rainfall and temperature.  
  14. T - Water in soils - Darcy-Buckingham law - Soil water retention curves and hydraulic conductivity. 
  15. L - Numerical experiments on soil water retention curves and hydraulic conductivity.
  16. T -  Richards equation and its extensions.
  17. L - Simulation of infiltration with the Richards equation (1d)
  18. T - Water movements in a hillslope and runoff generation. 
  19. L - Runoff estimation at hillslope scale.
  20. T - Elements of theory of evaporation from water and soils - Dalton. Penman-Monteith. Priestley-Taylor
  21. L - Estimation of potential evapotranspiration with Penman-Monteith and Priestley-Taylor.
  22. T - Vegetation role in the hydrological cycle and transpiration.
  23. L - Estimation of transpiration at catchment scale.
  24. T -  Snow. Snow water and energy budgets. 
  25. L - Degree-Day/Regina Hock's models of snow budget
  26. T - On the impact of climate change on the hydrological cycle
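
To give a flavour of the kind of numerical experiment meant for the laboratory classes, here is a minimal sketch of a plain degree-day snow budget (topic 25). It is only an illustration under stated assumptions, not the code used in class: the function name, the parameter values (melt factor, threshold temperatures) and the five-day input series are all hypothetical.

```python
def degree_day_snowpack(temps_c, precip_mm, ddf=3.0, t_melt=0.0, t_snow=0.0):
    """Daily snow water equivalent (SWE) and melt with a plain degree-day rule.

    ddf    : degree-day melt factor in mm / (degC day)  (assumed value)
    t_melt : temperature above which melt occurs, degC  (assumed value)
    t_snow : temperature at or below which precipitation is snow (assumed)
    """
    swe = 0.0
    swe_series, melt_series = [], []
    for temp, precip in zip(temps_c, precip_mm):
        # Precipitation accumulates as snow only on cold days.
        if temp <= t_snow:
            swe += precip
        # Potential melt is proportional to the positive degrees above t_melt,
        # and the actual melt cannot exceed the snow that is available.
        potential_melt = ddf * max(temp - t_melt, 0.0)
        melt = min(potential_melt, swe)
        swe -= melt
        swe_series.append(swe)
        melt_series.append(melt)
    return swe_series, melt_series


# Hypothetical five-day series: two cold days with snowfall, then a warm spell.
temps = [-5.0, -2.0, 1.0, 4.0, 6.0]
precip = [10.0, 5.0, 0.0, 0.0, 0.0]
swe, melt = degree_day_snowpack(temps, precip)
print(swe)
print(melt)
```

In this sketch the snowpack only stores water and releases it when the air is warm. Energy-budget effects, which the theoretical lecture covers, are deliberately left out.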

Saturday, February 4, 2017

Water supply systems and Stormwater management infrastructures 2017

Work in progress !!! Starts 02/27

This year I decided to renew the teaching of my class of "Hydraulic Constructions". Usually, under this name, one thinks of dams, levees, or other infrastructures. In fact, what I will teach is how to design a water supply system for a city or a city district, and how to design the infrastructures for storm water management.

This is the foreseen schedule of the course. L means a laboratory class, where the students are asked to calculate, think or design something. Actually, I will do the work for them first, introducing some tools and asking them to repeat and complete the task on their own dataset. Tentatively, it will be a "learning by doing" approach, which I also used in the last years, but to a lesser extent.


I have 60 hours in total over thirteen weeks. So the schedule could be the following one

Storm waters
  1. T - Introductory Class
  2. T - Statistical properties of ground precipitations. Mechanisms  of formation of precipitation. Ground based statistics. Extreme precipitations.  
  3. L - Explorative data analysis. Investigating data with Python (or R).  
  4. T - Extreme precipitations. Around the concept of return period. Extreme distributions. 
  5. L - Estimation of Extreme distributions with Python (or R)
  6. T - Elements for the design of storm water management infrastructures.  
  7. L - Short introduction to GIS for representing urban infrastructures. 
  8. T - Urban flood wave: a primer. 
  9. L - Introduction to EPA SWMM
  10. T - Designing a sewer system with a small synthesis of  pipes hydraulics. 
  11. L - Designing some part of a sewer network with SWMM and Python. 
  12. T - Pumping stormwaters.
  13. L - Discussion and analysis of students' projects 

Clean water supply - Aqueducts
  1. T - Aqueducts in 2020
  2. L - Introduction to EPANET and related GIS
  3. T - Introduction to intakes  for water supply
  4. L - Some hydraulic infrastructure for aqueducts
  5. T - External aqueducts
  6. L - Water buildings.  EPANET
  7. T - Aqueducts' distribution networks - Theory and numerics
  8. L - Design and verification of distribution networks with EPANET - I 
  9. T - Houses' infrastructures
  10. L - Design and verification of distribution networks with EPANET - II

Tools

During the class I will introduce several tools for calculations. 
  • Python - Python is a modern programming language. It will be used for data treatment, estimation of the idf curves of precipitation, some hydraulic calculations and data visualisation. I will use Python mostly as a scripting language to bind and use existing tools. 
  • SWMM - It is an acronym for Storm Water Management Model. Essentially, it is a model for the estimation of runoff, adapted to the urban environment. I do not endorse its hydrology very much. However, it is the tool most used by colleagues who care about storm water management, and I adopt it. It is not a tool for designing storm water networks and, therefore, some more work should be done with Python to fill the gaps.
  • EPANET - It is the tool developed by EPA to model water distribution networks. 
  • LaTeX - The system for writing and typesetting mathematical and engineering texts. The text by Lorenzo Pantieri and Tommaso Gordini is a small gem.
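
As an example of the "Python as a scripting language" use mentioned above, here is a minimal sketch of how the precipitation depth for a given return period could be estimated for a single duration. It assumes a Gumbel distribution fitted by the method of moments, which is only one of several possible choices and not necessarily the one used in class. The function names and the ten annual maxima are hypothetical. An idf curve would come from repeating the same steps for several durations.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant


def fit_gumbel_moments(annual_maxima):
    """Gumbel location and scale from the sample mean and standard deviation."""
    mean = statistics.mean(annual_maxima)
    std = statistics.stdev(annual_maxima)
    scale = std * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc, scale


def gumbel_cdf(x, loc, scale):
    """Non-exceedance probability of x under the Gumbel distribution."""
    return math.exp(-math.exp(-(x - loc) / scale))


def gumbel_quantile(loc, scale, return_period):
    """Depth with the given return period (years), i.e. P[X <= x] = 1 - 1/T."""
    non_exceedance = 1.0 - 1.0 / return_period
    return loc - scale * math.log(-math.log(non_exceedance))


# Hypothetical annual maxima of 1-hour precipitation, in mm.
maxima_mm = [22.0, 31.5, 18.4, 27.9, 35.2, 24.1, 29.8, 20.6, 40.3, 26.7]
loc, scale = fit_gumbel_moments(maxima_mm)
h10 = gumbel_quantile(loc, scale, 10)
h100 = gumbel_quantile(loc, scale, 100)
print(f"10-year depth: {h10:.1f} mm, 100-year depth: {h100:.1f} mm")
```

The method of moments is chosen here only because it needs nothing beyond the standard library. Maximum likelihood or L-moments would be reasonable alternatives.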