Thursday, February 3, 2011

Characteristics of good modeling software

Making good models is just one part of a hydrologist's job. In our research field it is a tradition to do good research with computer code that is not very good. Please do not misunderstand me: I do not mean that the algorithms used are wrong. I mean that the overall simulation machinery is usually not very well engineered, and that using the code produced by researchers is not as easy as it could be (and actually is for many industrial programs). This eventually makes scientists (and users too) lose a lot of time redoing the same things, even when the original codes are available, simply because these codes are not well documented or do not provide the functionality that makes them usable. The following paper (which was pointed out to me by the author of the CSparse library, T. Davis) covers some aspects of making good, usable code, and is a must-read for anyone who does modeling.

Please follow the link below to get the paper (last accessed February 3rd, 2011):
Characteristics of Industrial strength software.

The main conclusions of the authors are summarized below, slightly edited, for the laziest readers.

" … It is important to design a ... software to be easy to use and robust. Often it is better to assume that the user is not an expert in … algorithms, but someone who has a problem to solve and wishes to solve it accurately and efficiently with minimal effort. After all, even experienced users were once novices and a user’s initial experiences of using a solver are likely to determine whether he or she goes on to become an expert user. Based on our experiences …., in addition to the requirements of good performance (in terms of memory and speed) and the availability of comprehensive well-written documentation, in our opinion the following features characterize an ideal …. solver.

• Simplicity: the interface should be simple and enable the user to be shielded from algorithmic details (note: in OO this is called information hiding). The code should be easy to build and install, with no compiler warning messages. During the building of the software from supplied source, minimum effort and intervention by the user should be required. …. dynamic memory allocation should be used so that the user need not preallocate memory. In fact, the software developer needs very good reasons for not selecting a language that includes dynamic memory allocation.

The software developer should consider providing interfaces to popular high-level programming environments, such as Matlab, Mathematica, and Maple (note: and I add R, because being Open Source is an added value. Besides, offering an appropriate interface is also the idea behind the whole JGrass Project).

• Clarity: …. Furthermore, to allow repeated solves and iterative refinement there should be a clear distinction between (note by RR:) preprocessing and solve phases. … Developers should consider offering a simple (all-in-one) interface as well as an interface with the greater flexibility of access to the different phases of modeling.

• Smartness: good choices for the default parameters and of the algorithms to be used should be automatically made without the user having to understand the algorithms and to read a large amount of detailed technical documentation. There should be an option to check the user-supplied input data, particularly for any assumptions that the code relies on. (Note by RR:) Parameters of the models should be explained as much as possible in documentation and code.

• Flexibility: for more experienced users and those with specific applications in mind, the solver should offer a wide range of options, ….. There should also be options for the user to specify the information that he or she requires …. The software should …. support 64-bit architectures, (note by RR) and be platform independent.

• Persistence: the solver should be able to recover from failure. For example, if it is found that there is not enough memory, a code that contains both in-core and out-of-core algorithms should automatically switch to out-of-core mode. Reverse communication should be designed to allow corrections to the input data.

• Threadsafety: The code should be threadsafe to enable the user to safely run multiple instances of the package simultaneously in different threads or on different processors.
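
To make a few of these points concrete, here is a minimal sketch in Python of what the Clarity and Smartness items could look like in practice. It is only an illustration under my own assumptions: a toy dense linear solver stands in for whatever model or solver you have, and all the names (Factorization, check_input, factorize, solve, solve_system, pivot_tol) are hypothetical and do not come from the paper. The preprocessing phase (factorize) is separated from the solve phase (solve), an all-in-one call is offered on top of them, the defaults are meant to be sensible, and the input check is optional.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


@dataclass(frozen=True)
class Factorization:
    """Result of the preprocessing phase (hypothetical name).

    It is immutable, so it can be shared and reused for many solves.
    """
    lu: Tuple[Tuple[float, ...], ...]  # L (unit lower) and U stored together
    perm: Tuple[int, ...]              # row permutation from partial pivoting


def check_input(a: Matrix) -> None:
    """Optional check of the assumptions the solver relies on."""
    n = len(a)
    if n == 0 or any(len(row) != n for row in a):
        raise ValueError("matrix must be square and non-empty")


def factorize(a: Matrix, check: bool = True,
              pivot_tol: float = 1e-12) -> Factorization:
    """Preprocessing phase: LU factorization with partial pivoting.

    The defaults (check=True, pivot_tol=1e-12) are illustrative choices.
    """
    if check:
        check_input(a)
    n = len(a)
    lu = [[float(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest entry in column k as pivot row.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if abs(lu[p][k]) < pivot_tol:
            raise ValueError("matrix is singular to working precision")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return Factorization(tuple(tuple(r) for r in lu), tuple(perm))


def solve(f: Factorization, b: Sequence[float]) -> List[float]:
    """Solve phase: reuse an existing factorization for a new right-hand side."""
    n = len(f.perm)
    if len(b) != n:
        raise ValueError("right-hand side has the wrong length")
    x = [float(b[f.perm[i]]) for i in range(n)]  # apply the row permutation
    for i in range(n):                           # forward substitution (L)
        for j in range(i):
            x[i] -= f.lu[i][j] * x[j]
    for i in reversed(range(n)):                 # back substitution (U)
        for j in range(i + 1, n):
            x[i] -= f.lu[i][j] * x[j]
        x[i] /= f.lu[i][i]
    return x


def solve_system(a: Matrix, b: Sequence[float]) -> List[float]:
    """All-in-one interface for users who do not care about the phases."""
    return solve(factorize(a), b)
```

A novice would just call solve_system(a, b), while a more experienced user can call factorize(a) once and then solve(f, b) repeatedly for different right-hand sides. Since nothing here relies on global state and the factorization object is immutable, this kind of design also goes in the direction of the Threadsafety item, although the sketch makes no claim beyond that.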

