In the recently submitted manuscript about DARTHs (Digital eArth Twins of Hydrology) we delineated five categories of models, ordered by their increasing suitability to be part of a DARTH or of a DARTH component:
- MaaA, Model as an Application
- MaaT, Model as a Tool
- MaaS, Model as a Service
- MaaR, Model as a Resource
- MaaC, Model as a Commodity
I'll try to explain below what the acronyms mean. It should be remarked that the characteristics listed are not connected to the domain-science content of the models but to their software architecture characteristics and requirements.
[Image: Tom Hagen's photo]
MaaA - Model as an Application
MaaAs (Models as Applications) are full-fledged models with a closed architecture that includes the data formats and the visualization tools. What follows for MaaA is taken with little modification from Rizzoli et al. (2006).
- A MaaA bundles the data, the algorithms and the graphical user interface of a model in a single application. This makes the model very hard to re-use outside its original context. Most MaaAs are also monoliths composed of hundreds of thousands of lines of code.
- A MaaA usually runs on just one operating system: MS Windows, Mac OS or Linux.
On MaaA, Knoben et al. (2021) provide the following description:
"These tools are typically provided as self-contained packages. Packages tend to be easy to use for their intended purpose but take time to understand and do not necessarily provide much flexibility to deviate from their intended purpose. Layering additional functions on top of an existing package or modifying a package’s source code is certainly possible, but can be outside the comfort zone of many users."
Other usual characteristics of MaaA are:
- The evolution of the application is entirely in the hands of the original developers. This is good for intellectual property rights and in a commercial environment, but it is bad for science and the way it is supposed to progress: independent revisions and third-party contributions are nearly impossible.
- MaaAs often do not come with associated data sets for testing. Moreover, the adoption of object-oriented programming, while good for model reusability and portability, makes testing more complex because of problems such as observability in virtual method calls and the state-dependent behaviour of objects.
- Because they are coded as monolithic entities, they display a strong level of internal cohesion: a modeller interested in reusing a particular function within a bigger model can find it very hard to isolate and extract it, given the strong dependencies among the parts of the source code.
- Their data formats do not come from a community agreement. Their developers typically decide to have output data in a format relevant to their own application, which may not be a format that is widely used by others. It is cumbersome for developers to have their tools ingest multiple different data formats, and such functionality is therefore somewhat rare (slightly modified from Knoben et al., 2021).
It could be observed that a MaaA can evolve to eliminate the characteristics in the bullet list above. In fact, a variety of MaaAs have recently pursued exactly this goal (modular code, openness to common data formats, separation between the graphical interface of the model and the rest of the code).
MaaT - Model as a Tool (mainly from Nativi et al., 2021)
In MaaT, differently from MaaA, there are at least two levels of abstraction: the interface is abstracted from the model, in a client-server way, and the model is loosely coupled to the data. The interaction tool is distinct from the model itself and can eventually be changed.
However, a given implementation of the model runs on a specific server, and users interact with the model through the user interface. In a MaaT the models are preloaded on a specific machine.
Besides, it is not possible to modify the interaction between the server and the client, which is kept fixed by the user interface. Benefits include strong control over the use and execution of the model (which could be useful to control what happens in an operational service). There are limitations on the usability and flexibility of the model, as well as on its scalability, due to the limits of the specific server. Machine-to-machine interoperability (chaining capabilities) is not allowed.
Knoben et al. (2021), without knowing the acronym, define MaaT well when talking of some web-based services: "... several of these tools are provided as web-based services. This can be appealing because, for example, data can be pre-downloaded to speed up model configuration and model simulations can be easily shared. The advantage of such approaches is that they can be combined with some form of server-side data transformations (e.g., subsetting or averaging), which minimizes data transfers. Storing the inputs for and outputs of large-domain simulations can, however, be cumbersome, and keeping pre-downloaded data up-to-date and sufficient for all user needs takes sustained, long-term effort. A further complication is that it is regrettably common that such web-based services require some form of manual interaction with the webpage, limiting opportunities to automate data acquisition tasks".
In a MaaT a certain level of abstraction is implemented to make the data and the models loosely coupled, but MaaTs do not necessarily provide tools for automating data acquisition. However, it can be said that the model core in a MaaT is agnostic with respect to the data sources and formats (for a detailed explanation see Knoben et al., 2021).
In an open MaaT:
- model evolution should be in the hands of a community;
- models should come with an appropriate set of tests for both the informatics and the physics;
- a modular structure for the code should be the rule;
- tools for data brokering should be available.
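The loose coupling between the model core and the data described above can be illustrated with a minimal sketch. Everything in it is hypothetical: the `DataSource` interface, the two sources and the toy `linear_reservoir` model are names invented for this illustration and do not come from any of the cited works. The point is only that the model core sees an interface and never a file format.

```python
import csv
import io
from typing import Iterable, List, Protocol


class DataSource(Protocol):
    """Hypothetical interface: anything that can deliver a rainfall series."""

    def rainfall(self) -> Iterable[float]: ...


class InMemorySource:
    """Rainfall already available as a Python list."""

    def __init__(self, values: List[float]) -> None:
        self._values = values

    def rainfall(self) -> Iterable[float]:
        return iter(self._values)


class CsvTextSource:
    """Reads a 'rain' column from CSV text; stands in for a file or a data broker."""

    def __init__(self, text: str) -> None:
        self._text = text

    def rainfall(self) -> Iterable[float]:
        for row in csv.DictReader(io.StringIO(self._text)):
            yield float(row["rain"])


def linear_reservoir(source: DataSource, k: float = 0.5) -> List[float]:
    """Toy model core: at each step, discharge is k times the current storage."""
    storage = 0.0
    discharge = []
    for rain in source.rainfall():
        storage += rain
        q = k * storage
        storage -= q
        discharge.append(q)
    return discharge


# The same model core runs unchanged on two different data sources.
q_mem = linear_reservoir(InMemorySource([10.0, 0.0, 0.0]))
q_csv = linear_reservoir(CsvTextSource("rain\n10.0\n0.0\n0.0\n"))
print(q_mem, q_csv)
```

Swapping the data source requires no change in the model core, which is the property that makes data brokering tools pluggable.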
MaaS - Model as a Service (mainly from Nativi et al., 2021 and David et al., 2014)
A “Model-as-a-Service” provides the capability to execute simulation models as a service. As Wikipedia reports: "In the contexts of software architecture, service-orientation and service-oriented architecture, the term service refers to a software functionality or a set of software functionalities (such as the retrieval of specified information or the execution of a set of operations) with a purpose that different clients can reuse for different purposes, together with the policies that should control its usage (based on the identity of the client requesting the service, for example)."
As in the previous case of MaaT, a given implementation of the analytical model runs on a specific server but, this time, APIs are exposed for interacting with the model. Interoperability therefore consists of machine-to-machine interaction through a published API, e.g., for configuring and executing a run. Nevertheless, it is not possible to move the model and make it run on a different machine (without having to "manually" install the model and its managing software on that machine). Concerns remain about the still limited flexibility and possible scalability issues (depending on the server capacity). Note that, this time, there may also be concerns about reduced control over the (re-)use of the model.
There are two main usage patterns. (i) The model can be pre-deployed, has a well-known service endpoint, and is supported by supplemental data services. This is quite common for operational models used in a production environment. (ii) The model can be dynamically deployed from the client before execution (implying that a MaaS is made up of a pool of modelling components that can be linked just before run time with a scripting language). Model service development for research purposes needs this behavior. The two approaches address different workflows and different needs for availability and security. A specific model execution method may also be specified in such a service.
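The two usage patterns can be sketched with an in-process stand-in for a server. This is only an illustration under stated assumptions: the `ModelService` class, the JSON message layout (`action`, `model`, `chain`, `inputs`) and the two toy components are hypothetical and do not reproduce the API of CSIP or of any other real platform. A real MaaS would expose the same kind of entry point over a network protocol.

```python
import json
from typing import Callable, Dict, List

# Hypothetical pool of modelling components, each mapping a series to a series.
COMPONENTS: Dict[str, Callable[[List[float]], List[float]]] = {
    "scale_half": lambda xs: [0.5 * x for x in xs],
    "cumulate": lambda xs: [sum(xs[: i + 1]) for i in range(len(xs))],
}


class ModelService:
    """In-process stand-in for a server exposing a JSON API (assumed layout)."""

    def __init__(self) -> None:
        self._models: Dict[str, List[str]] = {}

    def deploy(self, name: str, chain: List[str]) -> None:
        """Register a model as an ordered chain of known components."""
        unknown = [c for c in chain if c not in COMPONENTS]
        if unknown:
            raise ValueError(f"unknown components: {unknown}")
        self._models[name] = chain

    def handle(self, request: str) -> str:
        """Single machine-to-machine entry point: JSON in, JSON out."""
        msg = json.loads(request)
        if msg["action"] == "deploy":  # pattern (ii): deployment requested by the client
            self.deploy(msg["model"], msg["chain"])
            return json.dumps({"status": "deployed"})
        if msg["action"] == "run":
            if msg["model"] not in self._models:
                return json.dumps({"status": "error", "reason": "unknown model"})
            data = msg["inputs"]
            for step in self._models[msg["model"]]:
                data = COMPONENTS[step](data)
            return json.dumps({"status": "ok", "outputs": data})
        return json.dumps({"status": "error", "reason": "unknown action"})


service = ModelService()
service.deploy("runoff", ["scale_half"])  # pattern (i): pre-deployed by the operator

reply_pre = json.loads(service.handle(json.dumps(
    {"action": "run", "model": "runoff", "inputs": [2.0, 4.0]})))

service.handle(json.dumps(
    {"action": "deploy", "model": "cum_runoff", "chain": ["scale_half", "cumulate"]}))
reply_dyn = json.loads(service.handle(json.dumps(
    {"action": "run", "model": "cum_runoff", "inputs": [2.0, 4.0]})))
print(reply_pre, reply_dyn)
```

In both patterns the client never touches the model code: it only exchanges messages with the published entry point, and the model stays on the machine where the service runs.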
MaaR - Model as a Resource
With MaaR, the interoperability level follows the same patterns used for any other shared digital resource, like a dataset.
- This time, the model itself (and not a given implementation) is accessed through a resource-oriented interface, i.e., an API.
- A software infrastructure layer manages a set of compliant models (with some constraints, whose invasiveness should not be significant).
- This makes it possible to effectively move the model and run it on the machine that performs best for a specific use case.
Cloud services can distribute the model runs over various architectures (such as cloud platforms, high-performance computing machines, multicore machines, and clusters of computers), dynamically adapting the resources requested to the demand.
There are clear benefits in terms of flexibility, scalability, and interoperability. The main concerns are, perhaps, about the sound use of the model.
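A minimal sketch of the idea that an infrastructure layer places a model on the most suitable machine is given below. The descriptors `ModelResource` and `Backend`, the "most free cores" placement rule and the machine names are all assumptions made for the illustration. A real infrastructure would also ship the model to the chosen machine (for instance as a container), while here the model is simply run locally.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ModelResource:
    """A model packaged as a movable resource (hypothetical descriptor)."""
    name: str
    min_cores: int
    run: Callable[[List[float]], float]


@dataclass(frozen=True)
class Backend:
    """A machine the infrastructure layer can dispatch to (hypothetical)."""
    name: str
    cores: int
    busy_cores: int

    @property
    def free_cores(self) -> int:
        return self.cores - self.busy_cores


def place(model: ModelResource, backends: List[Backend]) -> Backend:
    """Pick, among the backends that satisfy the model, the one with most free cores."""
    candidates = [b for b in backends if b.free_cores >= model.min_cores]
    if not candidates:
        raise RuntimeError(f"no backend can host {model.name}")
    return max(candidates, key=lambda b: b.free_cores)


def execute(model: ModelResource, backends: List[Backend],
            inputs: List[float]) -> Tuple[str, float]:
    backend = place(model, backends)
    # A real system would move the model to `backend` here; this sketch runs it locally.
    return backend.name, model.run(inputs)


mean_flow = ModelResource("mean_flow", min_cores=4, run=lambda xs: sum(xs) / len(xs))
pool = [Backend("laptop", 8, 6), Backend("hpc", 128, 100), Backend("cloud", 64, 10)]
chosen, result = execute(mean_flow, pool, [1.0, 2.0, 3.0])
print(chosen, result)
```

The client asks for a model run and not for a machine: the placement decision belongs to the infrastructure layer and can change from one request to the next.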
MaaC - Model as a Commodity
MaaCs are MaaSs or MaaRs that, in addition, provide some control over the science they implement and over its explanation.
Differently from the previous categories, the definition of Model as a Commodity does not involve information technology issues but programming and science issues related to how DARTHs work. Are MaaC models a mass-produced, unspecialized product (the meaning of commodity)? Obviously not, but once DARTHs are produced, the models inside them will be used as if they were. Most of the people who access them will do so to take decisions on other aspects of science and social life. Therefore hydrological modelling in DARTHs has to acquire features that make its use safer and less prone to spreading false information to the public. We envision this as providing DARTHs with error (uncertainty) estimates for all the quantities hindcasted and forecasted.
The topic is a difficult one, with a large literature (e.g. Beven, 2016) that is often hard and obscure (e.g. Nearing et al., 2016). The requirement here of quantifying uncertainty does not enter the dispute about the origin of errors. It rests instead on the Cox (1946) statement that "Purely empirically, probability and statistics can, of course, describe anything from observations to model residuals regardless of the actual sources of uncertainty as an expression of our reasonable expectations" (taken from Beven, 2016), which means that at least an empirical estimation of the error on the basis of recorded data is possible.
Because modelling requires a first phase of training/calibration on past data, the modelling error must include an analysis of the model's performance over those past data. Therefore a MaaC is a MaaS or a MaaR that provides error estimates for every quantity hindcasted or forecasted and a warning for the use of any quantity. This demands a major effort from modellers.
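A purely empirical error estimate of the kind mentioned above can be sketched as follows. This is one possible illustration and not the method of the manuscript: it assumes that past residuals (observed minus simulated) are representative of future errors and it uses a simple nearest-rank quantile. The function names and the numbers are invented for the example.

```python
import math
from typing import List, Sequence, Tuple


def empirical_quantile(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank quantile of an already sorted sample, with 0 < p <= 1."""
    # Rounding guards the rank against floating-point noise in p * n.
    rank = max(1, math.ceil(round(p * len(sorted_values), 9)))
    return sorted_values[rank - 1]


def residual_band(observed: Sequence[float], simulated: Sequence[float],
                  coverage: float = 0.8) -> Tuple[float, float]:
    """Lower and upper bounds of the past residuals for the requested coverage."""
    residuals = sorted(o - s for o, s in zip(observed, simulated))
    tail = (1.0 - coverage) / 2.0
    return (empirical_quantile(residuals, tail),
            empirical_quantile(residuals, 1.0 - tail))


def forecast_with_band(forecast: Sequence[float],
                       band: Tuple[float, float]) -> List[Tuple[float, float, float]]:
    """Attach (lower, central, upper) values to each forecasted quantity."""
    low, high = band
    return [(f + low, f, f + high) for f in forecast]


# Invented hindcast period: observations against model simulations.
observed = [12, 9, 15, 20, 18, 11, 7, 14, 16, 10]
simulated = [10, 10, 14, 22, 15, 12, 8, 13, 12, 10]

band = residual_band(observed, simulated, coverage=0.8)
banded = forecast_with_band([10.0, 20.0], band)
print(band, banded)
```

Each forecasted value is delivered together with a band derived from the recorded performance of the model, with no claim about where the errors come from.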
MaaCs inherit from MaaS and MaaR their composable structure, but with a purpose. Components are self-contained building blocks, modules or units of code. Each well-designed component usually implements a single modeling concept. Multiple algorithms can be implemented within the same component or in various components, and inserted in modeling solutions as alternatives, thus opening the way to comparing different approaches inside the same chain of tools. This also responds to a science requirement, i.e. the idea that models should be used in DARTHs as hypotheses to be tested among various possibilities (Clark et al., 2011). This flexibility will usually not be directly available to less expert end users, but it will certainly be useful for scientists to provide more reliable modelling. Therefore MaaC requires tools for supporting the workflow of hypothesis testing. These tools are usually provided as "literate computing" workflows, such as those explained in Rigon et al. (2022).
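The use of interchangeable components as competing hypotheses can be illustrated with a small sketch. The two runoff components, the data and the choice of the Nash-Sutcliffe efficiency as the score are assumptions made for the example, not a prescription of how DARTHs must compare models.

```python
from typing import Callable, Dict, List

Component = Callable[[List[float]], List[float]]


def runoff_fixed_fraction(rain: List[float]) -> List[float]:
    """Hypothesis A (invented): a constant 50% of rainfall becomes runoff."""
    return [0.5 * r for r in rain]


def runoff_threshold(rain: List[float]) -> List[float]:
    """Hypothesis B (invented): only rainfall above 2 mm becomes runoff."""
    return [max(0.0, r - 2.0) for r in rain]


def nse(observed: List[float], simulated: List[float]) -> float:
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(observed) / len(observed)
    num = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    den = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - num / den


def compare(components: Dict[str, Component], rain: List[float],
            observed: List[float]) -> Dict[str, float]:
    """Run every alternative on the same inputs and score it against observations."""
    return {name: nse(observed, comp(rain)) for name, comp in components.items()}


rain = [0.0, 4.0, 10.0, 2.0]      # invented forcing
observed = [0.0, 2.0, 8.0, 0.0]   # invented observations

scores = compare(
    {"fixed_fraction": runoff_fixed_fraction, "threshold": runoff_threshold},
    rain,
    observed,
)
best = max(scores, key=scores.get)
print(scores, best)
```

Because both alternatives share the same signature, they can be swapped inside the same chain of tools, which is what makes the comparison a controlled test of hypotheses.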
DARTHs in their essentials are modeling infrastructures that deploy the MaaC paradigm.
References
Beven, Keith. 2016. “Facets of Uncertainty: Epistemic Uncertainty, Non-Stationarity, Likelihood, Hypothesis Testing, and Communication.” Hydrological Sciences Journal 61 (9): 1652–65.
Clark, Martyn P., Dmitri Kavetski, and Fabrizio Fenicia. 2011. “Pursuing the Method of Multiple Working Hypotheses for Hydrological Modeling.” Water Resources Research 47 (9). https://doi.org/10.1029/2010wr009827.
Cox, R. T. 1946. “Probability, Frequency and Reasonable Expectation.” American Journal of Physics 14: 1–13. https://doi.org/10.1119/1.1990764.
David, Olaf, Wes Lloyd, Ken Rojas, Mazdak Arabi, Frank Geter, James Ascough, Tim Green, G. Leavesley, and Jack Carlson. 2014. “Modeling-as-a-Service (MaaS) Using the Cloud Services Innovation Platform (CSIP).” In International Congress on Environmental Modelling and Software. scholarsarchive.byu.edu. https://scholarsarchive.byu.edu/iemssconference/2014/Stream-A/30/.
Knoben, Wouter Johannes Maria, Martyn P. Clark, Jerad Bales, Andrew Bennett, S. Gharari, Christopher B. Marsh, Bart Nijssen, et al. 2021. “Community Workflows to Advance Reproducibility in Hydrologic Modeling: Separating Model-Agnostic and Model-Specific Configuration Steps in Applications of Large-Domain Hydrologic Models.” Earth and Space Science Open Archive. https://doi.org/10.1002/essoar.10509195.1.
Nativi, Stefano, Paolo Mazzetti, and Max Craglia. 2021. “Digital Ecosystems for Developing Digital Twins of the Earth: The Destination Earth Case.” Remote Sensing 13 (11): 2119.
Nearing, Grey S., Yudong Tian, Hoshin V. Gupta, Martyn P. Clark, Kenneth W. Harrison, and Steven V. Weijs. 2016. “A Philosophical Basis for Hydrological Uncertainty.” Hydrological Sciences Journal 61 (9): 1666–78.
Rigon, Riccardo, Giuseppe Formetta, Marialaura Bancheri, Niccolò Tubini, Concetta D’Amato, Olaf David, and Christian Massari. 2022. “HESS Opinions: Participatory Digital Earth Twin Hydrology Systems (DARTHs) for Everyone: A Blueprint for Hydrologists.” Hydrology and Earth System Sciences Discussions, 1–38.
Rizzoli, A. E., M. G. E. Svensson, E. Rowe, M. Donatelli, R. M. Muetzelfeldt, T. van der Wal, F. K. van Evert, and F. Villa. 2006. “Modelling Framework (SeamFrame) Requirements.” SEAMLESS.