Friday, March 10, 2017

The tale of open source codes


Why did I choose to produce open source software with the people working directly with me (Ph.D. students, master students, postdocs)?

- because it is good for science
- because I am paid by a public institution
- because it is a neutral condition that serves the rights of all the participants (in particular mine, to freely use and modify the software at will and to defend myself from anyone, person or institution, who would like to close the software, even against me). On the other hand, my clear intention is that my projects serve as a seed for developments by my students (or others), who can freely use the products of my research and keep them alive beyond me and despite me. *

I use the GPL (for its interpretation, see here), but many other licences could work.

A declaration

In this way, I think, I have the right to use and peruse the software produced by my group. I declare that I want to follow “fair play” rules, but it should be clear that these rules cannot be stretched to limit my research freedom. People who claim a place on papers to which they made no contribution, other than having produced the code that we wrote together, have wrong arguments. People who claim to be involved in projects or research for no other reason than that I want to use the software they contributed (under GPL) have wrong arguments.
Nor can they claim that I have to warn them and tell them personally what I am going to do in my research with the common code in order to get their consent.
They would be right to protest only if I did not properly acknowledge their contribution to previous work.
My research, by my own conviction, is actually very public, and so is its evolution. It can be found at the abouthydrology blog. My core research is shared with my team. This includes only the people for whom I have direct responsibility by age and role (master students, Ph.D. students and postdocs) and whom I sustain with funding, my own time and ideas.

With all others, including my masters and my former students, colleagues, friends, women and men who like my research topics and achievements, and me, I can have collaborations. This means that we can share part of our views, beliefs, discussions, fights, friendship, papers, and parts of code. However, our own agendas, in this imperfect world, do not coincide, and if they do, it is for an incredibly short time. This may seem a declaration of distance, but it is just awareness of how life works, and the first step towards an effective and respectful collaboration.


Can my students refuse to develop OS software?
No, as long as it is the product of a common intellectual effort in which they, maybe, write the code, but I say what to write.

Do I start collaborations in which non-OS code can be developed?
Never say never. However, there would have to be very strong reasons for me to support this. Certainly, in projects there could be partners that develop non open source software, but this falls under the responsibility of whoever gives the financial support.

Is the requirement of open sourceness enough?
No, it isn't. Open sourceness is useless if it is not followed by the good practices of using open repositories and working collaboratively.

Can my students refuse to learn these practices?
For the common work, no. I am not responsible for the rest. I tend to fully book their time, though.

Do I start collaborations where these practices are not followed?
I would prefer not to, but I do. Certainly, collaborations can be at different levels, and they are rarely about co-producing software. I would not participate in joint projects where I put in ideas and expertise and others write closed code, unless they pay me or my group a lot. Really a lot. I can participate in projects where other subjects put their ideas, or ideas from the literature, in their own closed code, and I put mine in OS code. However, the situation I prefer is a common production, as a community, of open source code.

My own use cases

Below I summarise (with quite large simplifications) my software history in order to further justify what I wrote above.

Professor means someone who puts in science, time, and money (funds derived from projects). Student means someone who puts in time and science. Company means someone who puts in time, money and business-related efforts. Agents and Subjects are generic actors in the play (they can be students, professors or someone else). Community is the informal group that happened to gather around the projects and, eventually, to evolve them.

Case 0

Professor Z writes the initial library. On top of that, A builds the radiation budget. Student B writes surface water flows. Student C implements soil-atmosphere interactions. Student D writes the vadose zone components. Student E writes the snow treatment. Student F rewrites the snow components, then rewrites most of the code, interacting with student G and student H. Student I writes code for the treatment of landslide triggering. Student G writes a small but important portion of the little but successful part on freezing soil hydrology. Professor L hires F. F continues to rewrite parts. G starts a huge operation of cleaning the code, moving it to C++ and uploading it to an open repository. C comes back and starts to use the code in his research, and occasionally hires H to do some ancillary work on data treatment. In the meantime, G has founded a company where the common code is the basis of the business. Company M, initially hired by L, works on the code to refactor and enhance it. M works collaboratively with G and F. M sets up continuous integration. Student N starts to produce executables for the main operating systems and eventually on Cooker (fictional name). M immediately embraces this philosophy.

Case 1

Professor Z writes the initial library. Z writes more than fifty tools for terrain analysis. Student A (not the same as above) ports them to a major open source GIS. Z and A start the construction of a new GIS, say JG. Initially JG contains just the terrain analysis tools and some simple hydrological models. They start to run schools to finance their project. This works for some years. Student B, in the meanwhile, has joined the crew, and A and B found the company AB. They live on schools, support from a main project of Z, and other resources (a main research project). A cleans up the tool suite and inaugurates the name JGT for it. Z uses JGT in his classes.
Z, A, and B decide to join the development of UGIS. Some research projects support them, together with resources raised by the company AB on its own. Students C and D write some further modules. UGIS funding disappears, and UGIS slowly becomes an almost inactive project. AB brings JGT to an intermediate product, ST. Z continues to use JGT in his classes. AB finally joins the development of a new GIS, say GS.
(In the middle, A adds new tools, and AB writes an Android app, Aapp, and expands its business. Aapp is not related to JGT, but it is worth mentioning.) During these years A and B get Ph.D.s whose topics are related to the GIS work. Student E, with a small effort, also brings JGT back to a platform, OI, that Z uses with his students.

Case 2

Thanks to unexpected financial support from project 00, Professor Z hires five students to build a new modelling platform from scratch. For this new software enterprise, he and company AB (founded by his former students A and B) choose the open source framework OI. He hires former student C to help with software development, and former students D and E for the general management of the project and for data gathering, respectively. C works mainly on improving and enriching JGT (see Case 1 above), which serves as the basis for the terrain analysis needed for modelling. A and B develop a full suite of model components (the new paradigm) for temperature and rainfall interpolation, rainfall-runoff, evapotranspiration, and various tools to visualise the components' inputs and outputs. AB also designs and populates an SQL database that contains all the data of the project. Project 00 ends. The institution that supported the project shuts it away in a drawer.

With other financial support, the tools of the former project 00 are kept alive. Open source framework OI is replaced by open source framework OM, with a notable reduction in lines of code (but it is a huge coding effort, almost entirely on AB's shoulders). With the adoption of OM, a research collaboration with professors U and W also starts.

Student E comes into play. He realises that rainfall-runoff does not work well. AB company has to survive on its own and cannot give very much support. E implements a new rainfall-runoff model. AB, however, hires F for a small project where he works on radiation. Eventually, E refactors F's work and greatly expands it. E adds a new snow modelling component and writes or refactors evapotranspiration. In doing this (pouring sweat and blood) he has, however, the guidance of the open source code already written. E spends some periods at U and W. E also refactors and enhances the Kriging code. Eventually E graduates and starts his career as a post-doc elsewhere. In the meanwhile he finalises his research in a series of papers.
Student G comes. He does not have programming skills, but quietly learns to use E's components and produces some interesting papers on which E is co-author.
A new student, H, comes into the game. She works first on radiation on top of E's code, then she starts to implement tools for travel time analysis and another rainfall-runoff component.
Student L comes. He has a strong aptitude for informatics. He brings in new ways to manage projects. H and L implement the OpenOpenSoftware repository and the site BeatifulGEO (the names are fictional, but the tools are real). H refactors the old code and, together with L (who, sort of, leads the learning process), introduces design patterns to increase code reusability. L provides the trickery to have continuous integration on OpenOpenSoftware using GETIT, and connects software deployment to ISTOREIT to store official versions of the components. Students M and N come on stage and start to use the code. Professor Z (with the help of H) starts to use the components with his students in his classes. Student L evolves the original OM capabilities to allow for more flexibility and to increase the computational power of the models. H brings her models into the new infrastructure.

Discussion and Conclusions

The above is a summary (where, I say again, I simplified many passages) of my main software enterprises. Could they all have evolved differently (and better) if I had not applied an open source strategy? Probably yes, but I would have had to bind the students to a contract about the ownership of the software. In this way I would have deprived my students of parts of their own work.
At the same time, I could not simply have left the software to them. The histories themselves show that I built my own work and research on the software we developed, and being free to use and modify it was a necessity. If I had needed to ask permission to use it, or to sign a contract with someone (for instance, whoever gave financial support), all the development would have been much more difficult to pursue. The same applies to other Actors who invested time and resources in the software development just because it was open. They are usually individuals or low-budget companies that could not have afforded the expenses related to other types of licenses, or accepted limitations on the use of the software.
Other researchers used the model. Its being free and open source was clearly an added value for them.

Keeping the software closed and commercial, besides having no scientific justification (science requires the contrary), would have obliged me to turn myself into a businessman and away from my science. There are several cases of scientists who turned into captains of companies. But, for instance, Stephen Wolfram, a gifted scientist, did not contribute very much to science after he devoted his energies to Mathematica. Mathematica itself (probably the best computing environment ever) is his main achievement (which is not to be despised), despite his own claims about "A New Kind of Science".
The overhead required for managing commercial software is not for everyone and has its own dynamics, which I personally could not bear.

The fact that my code is free and open source has allowed (not without difficulties) the self-instruction of newcomers. Various Agents had the possibility to start experiments and investigate new directions of development. Nobody needed to ask before starting them. Asking is a process that would have dramatically decreased the proactivity of people and groups.

The Community has benefited from this policy. In some cases, single Actors may have thought that their contribution was not recognised enough and did not give them an advantage. Their argument is flawed. All of them gained from the collaborative environment, and nobody (me included) could have produced what s/he has achieved without standing on the shoulders of others and of other open source projects.

Forgetting the above, some feel that their work is not protected enough and that, everything being open source, newcomers can more easily jump in and take advantage of their work.
Uncertainty about the future, a competitive society, and the plain necessity of finding something that pays for a decent life incline even the best to a moderate selfishness or a moderately parasitic behavior. They do not want to give back to the community, after having got a lot from it, and they act defensively.

Well, this behavior is absolutely possible if their developments do not use the original code that was released under the GPL. In particular, the component strategy used in Case 2 above allows for building, on top of the open source material, new undisclosed material that anyone can use for his/her own profit under a non-open license.
I have to warn, however, that if the moderate parasitism grows too much, the enthusiasm that is always necessary decreases, the projects die, the source of benefits disappears and the community falls.

I would say that a mild parasitism is functional to the community if it is necessary to sustain the collaborating Subjects, and if the Subjects eventually give something back to the community. Parasitic Subjects themselves act in favour of the community by spreading and advertising the products, and sooner or later this will bring benefits back (so do not blame them; they are, in any case, part of the stream).

Some Subjects actually want an opaque management of the GPL philosophy, in which people maintain an informal ownership of the software (which they expect to be recognised) that goes beyond copyleft and the intellectual recognition of their contribution. This would imply, in their mind: preferential redirection of funds towards them; involvement in papers or conference contributions that use their code; veto power over the actions of third parties.
These desiderata are based on misunderstandings. It is clear that they will be involved in papers, conferences, and decisions. Any (wo)man and community of good will will apply this policy in their favour, if they do not grow too greedy. But these actions are not mandatory and not even necessary. The GPL does not imply them.

To be clearer: especially in hydrology, the market out there treats our models and software as a fungible commodity, that is, the market tends to treat all codes as equivalent or nearly so, with no regard to who produced them. (I think this is wrong, highly wrong, when taken to excess.)
But the internal market, inside the community, also treats them as commodities, meaning that any contribution is perceived as a thing that can be replaced, even though replacing it would be dysfunctional and would waste precious time (this is part of the untold history of projects 0, 1, and 2). Everybody is important; nobody is necessary.

The A. paradox

One common argument of reluctant open sourcers is: “I have not yet reaped the results of my own work, and I should share it?” (statement 1), or “if I share it, others will use it without me and I will have no personal gain” (statement 2).
The first danger can be overcome by an appropriate delay in the disclosure of documentation and explanatory material (I would not argue, however, that keeping industrial secrets is useless in general). That is, it is a matter of having strategies that prevent the negative cases. In our field, however, since everything is perceived as a commodity (see above), nobody will care to use our model or achievement rather than another one that gives what is (wrongly) perceived as similar, especially if our code is not known. Being open source, with proper support actions, helps the model to spread.
Besides, looking at my histories (see also here), software changes fast and is by no means immutable. Histories 0, 1, and 2 are marked by change. So the advantage one has with new code in hand is ephemeral. By my own estimate, you have just a year of advantage for small codes, and a few years of advantage with a large and complex code. This small advantage, if you are smart, can be appropriately managed and used to produce new and more innovative code, and so on. (Open sourceness is against stagnation.)
Often, however, it is not the fear of faraway threats that causes problems, but the fear of nearby Agents. Guy A fears that B, who came into the group after her/him, will get positions or funding with her/his work. I would say that this could happen, but it is unlikely. In a fair (not fear) competition A always wins over B, if the quality of B can be attributed only to code that A developed. The real problem is when B is much better than A. But in that case, having A's work is not what matters for B. B will almost always get the rewards instead of A. For A, the best thing, in the medium term, is to collaborate with B.
What I really call the A. paradox, finally, is in statement 2. If it is so easy to grab your work, then it would be equally easy for anyone to replicate it. Therefore your work is not giving you any competitive advantage, even if you keep it secret for a while. If it is not easy to grab, then whoever wants to use it proficiently needs you. So you are the winner, not because you keep your code top secret, but because all the issues it solves require a complex expertise that only you, the author, can have. So ….


Eventually Professor Z disappears. Not because he dies (please do exorcisms), but because his role in the growing group of people around the projects has become more and more marginal. Subjects have also acquired maturity, as well as the will to maintain the advantages that the work has produced with respect to competitors.
This passage requires that the initially informal community establish itself as a formal Community (they wrote here for Academics) with its rules, etiquette, and wise management. This, in turn, requires that Subjects coordinate and share their views, plan new developments together, and plan events to make the common work grow. Balkanisation of the code (which the GPL could allow) and internal conflicts (never avoidable, since the Subjects have different agendas) should be managed appropriately, and this requires clear agreements, smart actions, good will, and wise arguments.
If the community grows, everybody will be safer, because cooperating is better than competing (see also coopetition).
A partial adoption of the open source strategy is, instead, quite useless. Open source codes that are practically unavailable (such as those that are open source but not freely downloadable) cannot grow a healthy community and, sooner or later, die.

* A final note

Actually, even if my intention is that the project is also for my students, quite a few of my students do not deeply endorse it. The reasons for this can perhaps be found in their personal history, the chemistry of their bodies and minds, or something else that is hidden from me. So far, I have overreacted, feeling betrayed when they dismiss what I believe is right. So probably my attitude is not correct. Sons do whatever they want, and probably they are right to try to find their way. So I have to conclude that the above is MY dream, and I will not be upset anymore if my academic sons search for their own in a different way.
