Monday, June 6, 2016

Java, Python, C/C++ or FORTRAN in scientific programming?

I found this simple post by Michael Scharf on Quora that addresses part of the question. It is easy to read, clear, and probably true, so I publish it verbatim (with some comments of mine, especially about FORTRAN).

"I have used each of them [Java, Python, C/C++; my note] for 15-20+ years. There is no best. They have different strengths and weaknesses.
  • C and C++ require a lot of discipline because you have to do memory management yourself.
  • C++ is extremely powerful but also very complex.
  • C and C++ are "dangerous" because, if you are not careful, your program can access and modify data that it is not supposed to touch.
  • Python is elegant and designed to be easy to use and read. It has the least distractions when it comes to syntax.
  • The syntax of C, C++ and Java looks somewhat similar. Python looks different: it uses indentation instead of {} to group code.
  • Java has the best IDE support (e.g. Eclipse or IntelliJ).
  • C and C++ are also statically typed, but the preprocessor can add a level of complexity that can make it difficult to be sure what actually happens.
  • In terms of speed, C/C++ are fastest, but for most problems Java is very close in speed. Python can be slow, but if needed, critical parts can be written in C. On modern [hardware] execution speed is rarely the limit - cache behaviour, memory and disc access are the limits.
If you want to learn programming, I would learn Python first, then Java, then C and finally C++.
I personally would not recommend C++ because of its complexity. However, if you are disciplined and have a strict set of rules for a project, C++ can be fantastic.
I would use C only for low level stuff, like writing device drivers.

Java is good for large projects, provided you write good APIs and you are carefully modularizing your software.

Python is good for small projects. If the team and the software get bigger, it can become hard to maintain unless you have very good test coverage."
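To make the remark about writing the critical parts of a Python program in C a bit more concrete, here is a minimal sketch that uses the standard-library ctypes module to call a compiled C function (cos from the C math library) from Python. It assumes a Unix-like system (Linux or macOS); the helper name c_cos is hypothetical. In a real model one would compile one's own routine into a shared library, or use tools such as Cython or f2py, rather than wrap a libm function.

```python
import ctypes
import ctypes.util

# Locate the C math library (assumes Linux or macOS).
# If find_library returns None on Linux, CDLL(None) loads the symbols of
# the running Python process itself, which normally already links libm.
libm_name = ctypes.util.find_library("m")
libm = ctypes.CDLL(libm_name)

# Declare the C signature: double cos(double).
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x: float) -> float:
    """Hypothetical helper: compute cos(x) with the compiled C routine."""
    return libm.cos(x)
```

Declaring argtypes and restype is the important design detail: without restype, ctypes assumes the C function returns an int, and the value coming back from a function that returns a double would be wrong.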

There is no mention of FORTRAN because, for many years now, it has been a language confined to the niche of users who do numerics (of any type). Here is a discussion on Stack Overflow comparing this language (more or less) with the other languages mentioned above.
I never really used FORTRAN much, so I am not the best person to talk about it. I actually moved away from it in 1988, when I decided to use a language with dynamic memory allocation, and I never went back. In general, I believe its speed is overestimated, especially considering that the average scientific programmer is a bad programmer, and a badly written FORTRAN program can be much slower than a well-written program in any of the above languages.

I have seen many tools written in FORTRAN, and some large models, but I am not sure it is the best choice for a large project. Anyway, FORTRAN plus Python is the choice of many these days. Still, I believe it is a sub-optimal choice for managing software projects that go beyond programming a few algorithms, and I personally dislike its syntax and its convoluted object orientation. But I will never personally test whether this is actually true.

Moreover, it is true that "Its [FORTRAN's] array handling is nice, with succinct array operations on both whole arrays and on slices, comparable with matlab or numpy but super fast. The language is carefully designed to make it very difficult to accidentally write slow code -- pointers are restricted in such a way that it's immediately obvious if there might be aliasing, as the standard example -- and so the optimizer can go to town on your code. Current incarnations have things like coarray fortran, and do concurrent and forall built into the language, allowing distributed memory and shared memory parallelism, and vectorization." (cit. from Stack Overflow). However, it is precisely this attitude that, in my view, encourages a certain laziness in software design, which I do not appreciate.
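The whole-array and slice operations mentioned in the quotation can be illustrated with NumPy, which the quotation itself uses as the point of comparison. This is a sketch in Python, not Fortran: the comments give only rough Fortran equivalents, the rainfall values are made up for illustration, and NumPy must be installed.

```python
import numpy as np

# Made-up daily rainfall depths in mm (illustrative data only).
rain = np.array([0.0, 2.5, 10.0, 4.0, 0.0, 7.5])

# Whole-array operation, no explicit loop.
# Rough Fortran equivalent: rain_cm = rain / 10.0
rain_cm = rain / 10.0

# Operation on a slice: elements 2 to 4 in Fortran's 1-based, inclusive
# indexing are indices 1..3 in NumPy's 0-based, end-exclusive indexing.
# Rough Fortran equivalent: rain(2:4) = rain(2:4) * 2.0
rain[1:4] = rain[1:4] * 2.0

# Masked assignment over the whole array.
# Rough Fortran equivalent: where (rain < 5.0) rain = 0.0
rain[rain < 5.0] = 0.0

# Reductions also work on whole arrays.
# Rough Fortran equivalent: total = sum(rain)
total = rain.sum()
```

The indexing difference is worth keeping in mind: Fortran arrays are 1-based by default and slices include both ends, while NumPy is 0-based and excludes the end, a common source of off-by-one errors when code is translated between the two.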

For most people, however, it is just a matter of what they were taught, i.e. a matter of legacy. They have a lot of material they are used to, and they do not want (or do not have the time) to change. So keep in mind that your initial choice may stay with you for most of your life, because what Peter Norvig says ("Teach Yourself Programming in Ten Years") is well founded. On the other hand, consider that, if your work involves programming, you will embrace more than one programming language during your life (though one will remain your main one). Not because it is mandatory (although for some jobs it is), but because you are the type of guy (or gal) who likes this sort of thing.

So what would my choice be for students? Well, for students in hydrology, I would choose Python and R. But for real-life applications, I would choose Java first (as I did). Ultimately, my recent reflections tend to support the idea that the language (with its syntax and attitudes) is just one part of the problem, and that the real focus should be (without making an idol of it) the design of the code and its proper documentation (which has some peculiarities when the concern is a scientific activity).


  1. Dear Riccardo,

    thank you for making me turn my attention to this topic once again. As you know, I agree with every single char of your post.

    I strongly support OOP because I think that the speed of code is relative in some way, while good design is not!

    How do you measure the speed of a hydrological model, for instance: just by evaluating the time spent running the executable, or by measuring the time spent adapting the model to the latest discoveries in research?

    In this way I hope to open a discussion here, on this web page, because I'm really interested in hearing other ideas and opinions about this highly relevant topic. And this blog is the right place.

    Best and GWH!


  2. The f2py tool (which converts Fortran subroutines into Python functions) supercharges Python for software-evolution purposes. The basic multiprocessing and NumPy libraries provide real speed-ups in calculations, which make Python the best choice for common modelling applications. I think the SUMMA framework proposed by M. Clark is the best way to connect model blocks through an open API (simpler than OpenMI), and that we will use statically typed languages for solvers and dynamically typed languages for connecting the blocks.

    1. I know Martyn's work (see: …), however not so deeply.

    2. First of all, thanks Riccardo for making me turn my attention to this topic once again.

      Unfortunately, I do not know anything about Python, so I'm not really able to reply to hydrogo in a technical way.
      However, I can share my small experience in programming environmental models (especially hydrological ones).

      Some months ago I had to deal with a piece of monolithic software originally written in C. It was supposed to be fast because of the peculiarities of the language and the algorithms implemented, but modifying it was a nightmare.

      In this sense, my question is:

      What is speed for software: the time spent getting results from running the executable, or the time you need to adapt the model to the latest discoveries in research?

      Thus, in my opinion, the speed of software is something relative, while well-thought-out design is not. Considering the pace at which research evolves, we can wait a few more minutes for results, but we cannot wait months for new modelling solutions to be implemented.

      In conclusion, I believe that Object-Oriented Programming (in the true sense of the words) is what research needs to speed up the evolution of modelling. It is not a matter of programming language.

      But I'm open-minded, and I look forward to an interesting discussion about this topic. And this blog is the right place.

      Thanks again, Riccardo, for the food for thought.



  3. Good and interesting information shared here.