Sunday, January 25, 2026

Leveraging LLMs in Hydrological Research: A Personal Workflow and Supporting Literature

In the rapidly shifting landscape of scientific research, Large Language Models (LLMs) have emerged as more than just productivity tools; they are powerful engines for augmenting human creativity. As a hydrologist navigating complex phenomena like percolation theory and soil water dynamics, I have developed a hybrid workflow that integrates a suite of LLMs—including Gemini, Grok, ChatGPT, and Claude—into my daily practice.

This post outlines my approach, refined through cross-model interaction, and contextualizes it within recent literature. My goal is to demonstrate how we can exploit these "information reservoirs" with ease while maintaining the rigorous standards required by the physical sciences.


The Workflow: From Intuition to Iteration

My process is built on the principle that LLMs should act as collaborative aids, not replacements. This ensures human oversight counters the inherent risks of AI hallucinations or bias.

1. Conceptualization and Targeted Drafting

Rather than starting with exhaustive preliminary reading, I begin by formalizing concepts that have been "simmering" in my mind. I draft notes with highly specific questions. Precision is the key: the more targeted the query, the more useful the response. For example, I might prompt a model to bridge adjacent fields, such as connecting Minkowski functionals to percolation theory—a link I conceive independently but use AI to flesh out and expand.

2. The Multi-Model Reasoning Loop

Once a draft is mature, I submit it to an ensemble of models to leverage their unique strengths (e.g., Claude’s nuanced reasoning, Gemini’s expansive context window, or Grok’s real-time information access):
  • Structural Refinement: I ask the models to improve the logical flow and hierarchy of arguments.
  • Cross-Verification: I ask ChatGPT to critique the mathematical derivations provided by Gemini, or vice versa.
  • Fact-Checking: Any discrepancies identified between models are looped back for revision until the logic holds across all platforms.
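The loop above can be sketched programmatically. This is a minimal illustration rather than working API code: `query_model` is a hypothetical stand-in for whatever client wrappers you use (the real OpenAI, Google, and Anthropic SDKs all differ), and the critiques here are canned responses so the example is self-contained.

```python
# Sketch of a multi-model cross-verification loop.
# query_model is a hypothetical stand-in for real API clients;
# it returns canned responses so the example runs on its own.

def query_model(model: str, prompt: str) -> str:
    if model == "chatgpt" and "revised" not in prompt:
        return "Sign error in step 3 of the derivation."
    return "Derivation looks correct."

def cross_verify(draft: str, models: list[str], max_rounds: int = 3) -> tuple[str, int]:
    """Loop a draft between models until no critique flags a discrepancy."""
    for round_no in range(1, max_rounds + 1):
        critiques = {m: query_model(m, f"Critique this derivation:\n{draft}") for m in models}
        flagged = {m: c for m, c in critiques.items() if "error" in c.lower()}
        if not flagged:
            return draft, round_no  # logic holds across all platforms
        for model, critique in flagged.items():
            draft += f"\n[revised per {model}: {critique}]"
    return draft, max_rounds

revised, rounds = cross_verify("d(theta)/dt = div(K grad h)",
                               ["gemini", "chatgpt", "claude"])
```

With these stubs, one model flags a problem in round one, the draft is revised, and all models agree in round two, which mirrors how discrepancies get looped back in practice.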

3. Traditional Validation

The AI output is never the final word. I validate the substance through traditional scientific means: consulting the primary literature, re-deriving key equations by hand, and checking conclusions against established theory.

4. Impressions from Use

The use of these tools has significantly boosted my search productivity, and the assistance they provide is genuinely reassuring. While LLMs often capture general ideas and broad directions effectively, they frequently make errors in equations and produce interpretations that feel hallucinated, mixed in among many correct suggestions.

Nevertheless, because LLMs draw on a vastly wider knowledge base than any individual can access, they enable rapid retrieval of relevant information that would otherwise take weeks (or longer) to uncover through manual searching. On balance, I consider the overall impact positive.

Recent versions of several LLMs now provide accurate references in the vast majority of cases (in my experience, nearly 99%), which is a major advantage. I routinely retrieve and verify each cited source one by one, then store them in my personal papers organizer. As a valuable side effect, LLMs demonstrate strong recall of older literature and can point directly to the original sources of key ideas, helping improve the accuracy and integrity of citations.
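One small, concrete piece of that verification habit can be automated: pulling arXiv identifiers out of a reference list so each one can be opened and checked at its source. The sketch below is my own illustration (a simple regex, not part of any particular papers organizer), and it only handles new-style `YYMM.NNNNN` identifiers.

```python
import re

# Extract new-style arXiv identifiers (YYMM.NNNNN) from free-text references,
# so each one can be verified at https://arxiv.org/abs/<id>.
ARXIV_ID = re.compile(
    r"(?:arXiv[:.]?\s*|arxiv\.org/abs/)(\d{4}\.\d{4,5})",
    re.IGNORECASE,
)

def extract_arxiv_ids(text: str) -> list[str]:
    """Return unique arXiv IDs in order of first appearance."""
    seen: list[str] = []
    for match in ARXIV_ID.finditer(text):
        arxiv_id = match.group(1)
        if arxiv_id not in seen:
            seen.append(arxiv_id)
    return seen

refs = "LLM4SR ... arXiv:2501.04306, 2025 https://arxiv.org/abs/2501.04306"
ids = extract_arxiv_ids(refs)  # → ["2501.04306"]
```

Deduplication matters because a reference often names the same paper twice, once as `arXiv:...` and once as a URL, as in the example above.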

The Environmental "Metabolism" of Research

A recurring theme in my work is the tension between the "easy exploitation" of information and the environmental cost. To provide a concrete example, I have estimated the footprint required to produce a substantive research commentary, such as my recent work on the Richards Equation.

Metric            | Per Standard Query | Per Reasoning Query | Total for Research Cycle (est. 15–20 interactions)
------------------|--------------------|---------------------|---------------------------------------------------
Energy            | ~0.34 Wh           | ~4.3–33.6 Wh        | ~110–250 Wh
Carbon (CO₂e)     | ~0.15 g            | ~1.5–12.0 g         | ~25–50 g
Water Withdrawal  | ~0.26 ml           | ~3.5–25.0 ml        | ~150–500 ml
Contextualizing the Impact:

  • Energy: The ~150 Wh consumed is roughly equivalent to leaving a 10 W LED bulb on for 15 hours.
  • Water: At the upper end, the research for a single deep-dive commentary "drinks" about 500 ml of water (one standard bottle), used for cooling data centers.
  • Carbon: 50 g of CO₂ is equivalent to driving a gasoline car for approximately 250 meters.

References


1. LLM4SR: A Survey on Large Language Models for Scientific Research. arXiv preprint arXiv:2501.04306, 2025. https://arxiv.org/abs/2501.04306

2. Exploring the role of large language models in the scientific method: from hypothesis to discovery. npj Artificial Intelligence (Nature Portfolio), 2025. https://www.nature.com/articles/s44387-025-00019-5

3. A Survey of Human-AI Collaboration for Scientific Discovery. Preprints.org, 2026. https://www.preprints.org/manuscript/202601.0405/v1

4. Scientific Discoveries by LLM Agents. OpenReview. https://openreview.net/pdf?id=fxL6eFPsd1

5. How Much Energy Do LLMs Consume? Unveiling the Power Behind AI. ADASCI Blog (notes GPT-3 training at ~1,287 MWh). https://adasci.org/blog/how-much-energy-do-llms-consume-unveiling-the-power-behind-ai

6. Embracing large language model (LLM) technologies in hydrology research. Environmental Research: Infrastructure and Sustainability (IOPscience). https://iopscience.iop.org/article/10.1088/3033-4942/addd43

7. Large Language Models as Calibration Agents in Hydrological Modeling. Geophysical Research Letters (AGU), 2025. https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2025GL120043
