Tuesday, September 15, 2020

DevOps - Or about streamlining what is needed to do modelling carpentry

This morning I learnt a new word: DevOps, which seems to be a contraction of software Development and IT Operations. I got it from the presentation my former student Daniele Dalla Torre gave at the biennial iEMSs 2020 conference (find his presentation and others about his work here). He captured my attention with the picture you find below.


Wikipedia comes to our help in understanding what exactly this means by listing the following items:

  1. Coding – code development and review, source code management tools, code merging.
  2. Building – continuous integration tools, build status.
  3. Testing – continuous testing tools that provide quick and timely feedback on business risks.
  4. Packaging – artifact repository, application pre-deployment staging.
  5. Releasing – change management, release approvals, release automation.
  6. Configuring – infrastructure configuration and management, infrastructure as code tools.
  7. Monitoring – applications performance monitoring, end-user experience.

The figure contains some further information by annotating some tools. Also note that he intelligently added "Plan", so the steps are actually 8, not 7. Planning in his figure falls under OSF, the Open Science Framework, the place where we upload the material of our projects, literature and thinking. Some colleagues use Slack for this, but with the latter I do not have much experience. Certainly a place where all the material regarding a project, and the interactions among people, is preserved is necessary.

Under Coding he puts Git. GitHub is the public repository where all of our software is uploaded. Git is a version control system (VCS), a tool with which researchers record the versions and modifications of their software, maintaining its history. Others exist, like, for instance, Mercurial, but we use Git. There is actually a misuse of Git among my students: while it is meant to be used continuously along the development process, they use it only at the end, to upload a reasonable version of their work. I think this does not exploit the strengths of a VCS, but that is the state of the art at the moment.

When you have written your software (in contemporary practice it is quite obvious that you used an IDE for doing it - Eclipse, NetBeans or IntelliJ in our case), you have to build it, which means compiling and assembling it into some executable. Naive and simple building is done inside the IDEs, but complex software building needs a build tool. Our chain of tools uses Gradle, whose symbol is a small elephant. In turn, Gradle means learning a further DSL for writing the build.

Testing means two things: preparing tests that assess the correctness of the code, and running them. This practice too is not so common in scientific work. One reason is that our models usually have complex outputs which are difficult to characterize, but it happens mostly because hydrologists are ignorant of good software-building practice. For instance, the figure mentions JUnit 5, which is quite a natural choice for those working with Java (a small example is sketched at the end of this walkthrough).

If the software passes its tests, it is supposed to be packaged into a release. Travis CI, symbolized by the mustachioed face, should do this step. In fact, I believe it does only part of the job, which amounts to running the tests and providing a final compiled version of the code. Actually this sort of packaging is not completely operational, since the software must eventually be brought to a computer and installed on it with the right switches to work. Docker (the whale under Deploy) simplifies this by providing a standard container on any computer (with some problems on MS Windows, indeed). It works fine on my Mac, and calling it from within JupyterLab notebooks provides a nice solution. However, I feel that the problem of distributing the software (for which a tool like Anaconda's conda can be used) and treating all the dependencies among the libraries is still an open question. Configuration and maintenance of the code, and the details of the software's configuration, are the further step.
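To give an idea of what such a test looks like, here is a minimal sketch using JUnit 5. The LinearReservoirTest class, the storage method and the linear-reservoir decay formula it checks are invented for illustration and are not part of Daniele's toolchain; the point is only that a test states an expected result and verifies it automatically.

    import static org.junit.jupiter.api.Assertions.assertEquals;
    import org.junit.jupiter.api.Test;

    class LinearReservoirTest {

        // Hypothetical model fragment: a linear reservoir whose storage
        // decays as S(t) = S0 * exp(-t / k).
        private double storage(double s0, double k, double t) {
            return s0 * Math.exp(-t / k);
        }

        @Test
        void storageDecaysToKnownValueAfterOneCharacteristicTime() {
            // After t = k the storage must equal S0 / e, within a tolerance.
            assertEquals(100.0 / Math.E, storage(100.0, 2.0, 2.0), 1e-9);
        }

        @Test
        void initialStorageIsPreserved() {
            // At t = 0 nothing has been released yet.
            assertEquals(50.0, storage(50.0, 3.0, 0.0), 1e-12);
        }
    }

Once tests like this exist, Gradle's java plugin runs them all with its test task, so the Building and Testing steps above become one automated pass rather than a manual check.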

Finally comes application performance management (APM), which is the monitoring and management of the performance and availability of software applications. APM strives to detect and diagnose complex application performance problems in order to maintain an expected level of service; it is "the translation of IT metrics into business meaning ([i.e.] value)." This last step is maybe not so central in scientific applications, for which other types of tests, screening and case studies are probably more relevant.
