Monday, June 29, 2026

What is a DARTH, and how does it implement a digital twin of hydrology?

So what, exactly, is a DARTH? Not the software, not the cloud — the thing itself. This post is my attempt to answer, in thirteen tenets.


Digital twins are everywhere now, or at least many contributions call themselves that. They appear in the great international programmes, in the roadmaps of agencies, in the slides of almost every keynote about the future of Earth science. Many of them are magnificent feats of engineering: high-resolution solvers, elastic cloud, fast emulators. And yet, watching talk after talk, I kept feeling that something was missing. Technology is not the same thing as a vision. A high-resolution engine and a cloud are not, by themselves, a scientific instrument.

The precedent I keep returning to is FAIR. Years ago the FAIR principles changed the way our community treats data — not by inventing any new technology, but by writing down, compactly and citably, what good stewardship requires: that data be findable, accessible, interoperable, reusable. Their power was in the act of definition. We need something similar for twins. But a twin is neither only data nor only software. It is a model bound to a real basin, kept in correspondence with it, and used to make claims about its past, present, and possible futures. So I sat down and tried to write what makes such a twin trustworthy.

I came out with seven points. Talking them over with colleagues, the seven became thirteen, and they arranged themselves naturally into five movements — from what a twin is, through the record it keeps and the machinery that runs it, to the commons that builds it and the ends it serves.

I · Representation — what a twin is

1. A multi-resolution, multi-process representation of the hydrological cycle, built from interchangeable modelling solutions that close mass and energy budgets and quantify their own error (at least in a subjective, Bayesian sense), propagated across scales.

2. Bound to a real system: calibrated, validated, and — where data allow — kept synchronised with the basin through data assimilation. This binding to reality is what distinguishes a twin from a mere simulator.

3. Transparent — no black boxes. Every component is inspectable, and even the machine-learning parts carry a stated structure and an interpretable role.

II · Data — the record it keeps

4. All data, consumed and produced, are geo-registered, versioned, and FAIR.

5. Every result is reproducible end to end — code, parameters, environment, and data lineage recorded as provenance. FAIR governs the data; this governs the computation.

III · Software — the machinery that runs it

6. Model parts are interchangeable by construction, so alternative descriptions can compete as falsifiable hypotheses.

7. Deployable over the web and, through shared standards and interfaces, interoperable across infrastructures — components and twins from different groups composing together, running from laptop to HPC.

8. Open source: built on, released as, and developed in the open, and free of lock-in.

IV · Community — the commons that builds it

9. A shared infrastructure, not isolated codes, that serves the widest community and grows with knowledge.

10. Organised, by construction, for participation and cooperation — for the collective action of scientists.

11. Stewarded: it carries its own documentation, training, and transparent governance, so that it outlives any single project or generation of its makers.

V · Purpose — the ends it serves

12. General-purpose by design — prediction, scenarios, decisions, education, or pure inquiry alike.

13. Bounded by an ethical purpose: to serve the public good and the stewardship of water and land, and to do no harm.
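
To make tenet 5 a little more concrete, here is a minimal Python sketch of what a provenance record for a single run might hold. The function names (`sha256_of`, `provenance_record`) and the field names are illustrative assumptions of mine, not part of any standard or of GEOframe; a real twin would also record container images, data versions, and the full lineage of every input.

```python
import hashlib
import json
import platform
import sys


def sha256_of(obj) -> str:
    """Hash any JSON-serialisable object, independent of dict key order."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def provenance_record(code_version: str, parameters: dict, input_data) -> dict:
    """Collect the minimum needed to re-run and compare a simulation.

    All field names here are hypothetical; they only illustrate the idea
    of recording code, parameters, environment, and data together.
    """
    return {
        "code_version": code_version,                 # e.g. a git tag or commit id
        "parameters": parameters,                     # kept in clear for inspection
        "parameters_sha256": sha256_of(parameters),   # fingerprint for quick comparison
        "input_sha256": sha256_of(input_data),        # fingerprint of the forcing data
        "python": sys.version.split()[0],             # interpreter version
        "platform": platform.system(),                # operating system name
    }
```

Two runs whose records match on every field should give the same result; when they do not, the record tells you where to start looking.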

Please note that tenet 1 excludes practically all existing models, which are certainly neither multi-resolution nor multi-process. For now, therefore, we have to regard DARTHs as an "organizing metaphor", i.e. a working programme.

If you made me choose the tenets I care most about, they would be these. Number 2, the binding to observation: a model that never meets a measurement is a beautiful animation, not a twin. Number 3, no black boxes: I am not against machine learning (I use it), but a learned component still has to say what it is and what role it plays. Number 7, interoperability among infrastructures: no platform is a twin of the planet on its own; the digital Earth appears only when many of them interoperate. One Earth, many infrastructures. And number 13, the ethical bound, which I added last and on purpose: openness and transparency are not only engineering choices, they are how we stay accountable.

Here is the part I most want to insist on: these are not wishes. The means to honour most of them already exist, in the open. The Basic Model Interface (CSDMS) makes components interchangeable and couplable; the OGC API – Processes family lets us deliver and chain models as web services across infrastructures; and the Object Modeling System with its cloud companion CSIP turns model components into open services at scale. GEOframe and GEOtop are simply our own way of walking that path. What no standard hands you for free is the last rung — transparency, honest uncertainty, components that behave as hypotheses rather than as fitted curves. That rung is the science, and it stays our job.
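
As an illustration of components behaving as hypotheses, the sketch below defines a tiny, BMI-inspired interface in Python and puts two competing runoff components behind it. It is deliberately not the real BMI, whose functions and signatures are richer: the `RunoffComponent` protocol, both classes, and `run_and_check` are hypothetical names invented for this example. The driver also checks that each component closes its water budget, in the spirit of tenet 1.

```python
from typing import Protocol, Sequence, Tuple


class RunoffComponent(Protocol):
    """Hypothetical, much-reduced interface in the spirit of BMI."""

    def initialize(self, storage_mm: float) -> None: ...
    def update(self, precip_mm: float) -> None: ...
    def get_value(self, name: str) -> float: ...


class LinearReservoir:
    """Hypothesis A: runoff is a fixed fraction k of the current storage."""

    def __init__(self, k: float = 0.1) -> None:
        self.k = k
        self.storage = 0.0
        self.runoff = 0.0

    def initialize(self, storage_mm: float) -> None:
        self.storage = storage_mm
        self.runoff = 0.0

    def update(self, precip_mm: float) -> None:
        self.storage += precip_mm
        self.runoff = self.k * self.storage
        self.storage -= self.runoff

    def get_value(self, name: str) -> float:
        return {"storage": self.storage, "runoff": self.runoff}[name]


class ThresholdBucket:
    """Hypothesis B: runoff is whatever exceeds a fixed capacity."""

    def __init__(self, capacity_mm: float = 50.0) -> None:
        self.capacity = capacity_mm
        self.storage = 0.0
        self.runoff = 0.0

    def initialize(self, storage_mm: float) -> None:
        self.storage = storage_mm
        self.runoff = 0.0

    def update(self, precip_mm: float) -> None:
        self.storage += precip_mm
        self.runoff = max(0.0, self.storage - self.capacity)
        self.storage -= self.runoff

    def get_value(self, name: str) -> float:
        return {"storage": self.storage, "runoff": self.runoff}[name]


def run_and_check(
    component: RunoffComponent,
    precip_series: Sequence[float],
    initial_storage_mm: float = 10.0,
) -> Tuple[float, float]:
    """Drive any component; return (total runoff, water-balance residual)."""
    component.initialize(initial_storage_mm)
    total_runoff = 0.0
    for precip in precip_series:
        component.update(precip)
        total_runoff += component.get_value("runoff")
    final_storage = component.get_value("storage")
    # Budget: initial storage + inputs - outputs - final storage should be ~0.
    residual = initial_storage_mm + sum(precip_series) - total_runoff - final_storage
    return total_runoff, residual
```

Because the driver only knows the interface, swapping `LinearReservoir()` for `ThresholdBucket()` changes the hypothesis and nothing else, and the residual returned for each tells you whether the budget closed.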

I am posting this as a draft, deliberately, because arguing about it in the open is rather the point of tenet 10. We are also writing it up — in the spirit of the FAIR paper — as a short article, anchored to those existing standards. If you think I have a tenet wrong, or that I have missed one, please tell me. That is how a manifesto earns the name.


The one-page manifesto (PDF): here.  ·  The short paper, in preparation: draft.  ·  The original blueprint: Rigon et al., HESS Opinions: Participatory Digital eARth Twin Hydrology systems (DARTHs), HESS 26, 4773–4800 (2022).
