In the rapidly shifting landscape of scientific research, Large Language Models (LLMs) have emerged as more than just productivity tools; they are powerful engines for augmenting human creativity. As a hydrologist working on complex problems such as percolation theory and soil water dynamics, I have developed a hybrid workflow that integrates a suite of LLMs—including Gemini, Grok, ChatGPT, and Claude—into my daily practice.
This post outlines my approach, refined through cross-model interaction, and contextualizes it within recent literature. My goal is to demonstrate how we can exploit these "information reservoirs" with ease while maintaining the rigorous standards required by the physical sciences.
The Workflow: From Intuition to Iteration
My process is built on the principle that LLMs should act as collaborative aids, not replacements; keeping a human in the loop counters the inherent risks of hallucination and bias.
1. Conceptualization and Targeted Drafting
Rather than starting with exhaustive preliminary reading, I begin by formalizing concepts that have been "simmering" in my mind. I draft notes with highly specific questions. Precision is the key: the more targeted the query, the more useful the response. For example, I might prompt a model to bridge adjacent fields, such as connecting Minkowski functionals to percolation theory—a link I conceive independently but use AI to flesh out and expand.
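To make that example concrete, below is a minimal sketch (illustrative only, not my research code) that computes the three 2D Minkowski functionals of a random binary field: area, perimeter, and Euler characteristic. The `minkowski_2d` helper and the uniform 4-connectivity are simplifying assumptions of mine.

```python
import numpy as np
from scipy import ndimage

def minkowski_2d(field: np.ndarray):
    """Area, perimeter, and Euler characteristic of a 2D binary image.

    A sketch using 4-connectivity throughout, not a careful
    digital-topology treatment.
    """
    area = int(field.sum())
    f = np.pad(field, 1).astype(int)               # embed in an empty background
    perimeter = int(np.abs(np.diff(f, axis=0)).sum()
                    + np.abs(np.diff(f, axis=1)).sum())
    _, n_clusters = ndimage.label(field)           # occupied clusters
    _, n_background = ndimage.label(~np.pad(field, 1))
    euler = n_clusters - (n_background - 1)        # chi = clusters - enclosed holes
    return area, perimeter, euler

rng = np.random.default_rng(1)
for p in (0.3, 0.5, 0.7):                          # occupation probabilities
    field = rng.random((256, 256)) < p
    print(p, minkowski_2d(field))
```

At low occupation the field is dominated by isolated clusters (positive Euler characteristic), at high occupation by isolated holes (negative); tracking where the sign flips is one simple place where the morphological and connectivity pictures meet.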
2. The Multi-Model Reasoning Loop
Once a draft is mature, I submit it to an ensemble of models to leverage their unique strengths (e.g., Claude’s nuanced reasoning, Gemini’s expansive context window, or Grok’s real-time information access):
- Structural Refinement: Improving the logical flow and hierarchy of arguments.
- Cross-Verification: Asking one model to critique the mathematical derivations produced by another, e.g., ChatGPT checking Gemini, and vice versa.
- Fact-Checking: Any discrepancies identified between the models are looped back for revision until the logic holds across all platforms (a schematic of this loop follows below).
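In code terms, the loop is roughly the following sketch. The `ask(model, prompt)` helper is hypothetical and stands in for whichever vendor SDK is used (the OpenAI, Google, Anthropic, and xAI clients all differ); the control flow, not the API, is the point here.

```python
def ask(model: str, prompt: str) -> str:
    """Hypothetical helper: wrap your preferred vendor SDK here."""
    raise NotImplementedError

def cross_verify(draft: str, models=("gemini", "chatgpt", "claude"),
                 max_rounds: int = 3) -> str:
    """Have each model critique the draft; revise until none objects."""
    text = draft
    for _ in range(max_rounds):
        critiques = []
        for reviewer in models:
            verdict = ask(reviewer, "List any errors in the derivations "
                                    "below, or reply OK.\n\n" + text)
            if verdict.strip().upper() != "OK":
                critiques.append((reviewer, verdict))
        if not critiques:
            return text                    # the logic holds across all platforms
        feedback = "\n".join(f"[{m}] {c}" for m, c in critiques)
        text = ask(models[0],              # loop discrepancies back for revision
                   f"Revise the draft to address these critiques:\n"
                   f"{feedback}\n\nDraft:\n{text}")
    return text
```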
3. Traditional Validation
The AI output is never the final word. I validate the substance through traditional scientific means:
- Direct discussion with colleagues.
- Implementation of toy models or full-scale simulations to verify the AI-assisted hypotheses empirically (an example toy model appears after this list).
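As an example of the toy-model step, the sketch below is the kind of five-minute sanity check I have in mind: a Monte Carlo estimate of the spanning probability of a 2D site-percolation lattice, which should rise sharply near the known threshold p_c ≈ 0.5927. Lattice size and sample count are arbitrary choices for illustration.

```python
import numpy as np
from scipy import ndimage

def spans(p: float, L: int = 64, rng=None) -> bool:
    """Does an L x L site lattice at occupation p span from top to bottom?"""
    rng = rng or np.random.default_rng()
    field = rng.random((L, L)) < p
    labels, _ = ndimage.label(field)           # 4-connected clusters
    shared = set(labels[0]) & set(labels[-1])  # labels present in both edge rows
    return bool(shared - {0})                  # a shared nonzero label spans

rng = np.random.default_rng(42)
for p in (0.50, 0.55, 0.59, 0.63, 0.70):
    prob = np.mean([spans(p, rng=rng) for _ in range(200)])
    print(f"p = {p:.2f}  spanning probability ~ {prob:.2f}")
```

If an AI-assisted claim contradicts where this cheap experiment places the crossover, the claim goes back into the revision loop.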
4. Impressions from Use
These tools have significantly boosted my search productivity, and the assistance they offer is a genuine comfort. While LLMs often capture the general ideas and broad directions quite effectively, they frequently make errors in equations and produce interpretations that feel hallucinated, mixed in among many correct suggestions.
Nevertheless, because LLMs draw from a vastly wider knowledge base than any individual can access, they enable rapid retrieval of relevant information that would otherwise take weeks (or longer) to uncover through manual searching. On balance, I consider the overall impact positive.
Recent versions of several LLMs now provide accurate references in the vast majority of cases (in my experience, on the order of 99%), which is a major advantage. I routinely retrieve and verify each cited source one by one, then store them in my personal reference manager. As a valuable side effect, LLMs demonstrate strong recall of older literature and can point directly to the original sources of key ideas, helping improve the accuracy and integrity of citations.
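The one-by-one verification is easy to semi-automate. The sketch below queries two public endpoints, the Crossref REST API and the arXiv export API; the helper names are mine, and the arXiv test is deliberately crude (it only checks that the Atom feed returns a non-error entry).

```python
import requests

def check_doi(doi: str):
    """Return the registered title for a DOI via Crossref, or None on failure."""
    r = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
    if r.status_code != 200:
        return None
    return r.json()["message"]["title"][0]

def check_arxiv(arxiv_id: str) -> bool:
    """Crudely check that an arXiv id resolves to a real entry."""
    r = requests.get("http://export.arxiv.org/api/query",
                     params={"id_list": arxiv_id}, timeout=10)
    return r.ok and "<entry>" in r.text and "Error" not in r.text

print(check_doi("10.1029/2025GL120043"))   # reference 7 below
print(check_arxiv("2501.04306"))           # reference 1 below
```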
The Environmental "Metabolism" of Research
A recurring theme in my work is the tension between the "easy exploitation" of information and the environmental cost. To provide a concrete example, I have estimated the footprint required to produce a substantive research commentary, such as my recent work on the Richards Equation.
| Metric | Per Standard Query | Per Reasoning Query | Total for Research Cycle (est. 15–20 interactions) |
|--------|--------------------|---------------------|----------------------------------------------------|
| Energy | ~0.34 Wh | 4.3–33.6 Wh | ~110–250 Wh |
| Carbon ($CO_2e$) | ~0.15 g | ~1.5–12.0 g | ~25–50 g |
| Water Withdrawal | ~0.26 ml | ~3.5–25.0 ml | ~150–500 ml |
Contextualizing the Impact:
- Energy: The ~150 Wh mid-range figure is roughly equivalent to leaving a 10 W LED bulb on for 15 hours.
- Water: At the upper end, the research for a single deep-dive commentary "drinks" about 500 ml of water (one standard bottle), used for cooling data centers.
- Carbon: 50 g of $CO_2e$ is equivalent to driving a gasoline car for approximately 250 meters (the conversion arithmetic is spelled out below).
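For transparency, here is the arithmetic behind these equivalences with the conversion factors stated explicitly. The 200 g $CO_2e$ per km gasoline-car factor is my assumption (a common round figure), chosen because it reproduces the ~250 m quoted above.

```python
# Energy: ~150 Wh (mid-range of the ~110-250 Wh cycle estimate) via a 10 W LED.
cycle_energy_wh = 150.0
led_power_w = 10.0
print(f"{cycle_energy_wh / led_power_w:.0f} LED-hours")          # -> 15

# Carbon: 50 g CO2e (upper end) at an assumed 200 g CO2e per km.
cycle_co2_g = 50.0
car_g_per_km = 200.0                                             # assumption
print(f"{1000 * cycle_co2_g / car_g_per_km:.0f} m of driving")   # -> 250

# Water: upper-end withdrawal versus one 500 ml bottle.
cycle_water_ml = 500.0
print(f"{cycle_water_ml / 500.0:.0f} standard bottle(s)")        # -> 1
```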
References
1. LLM4SR: A Survey on Large Language Models for Scientific Research
arXiv preprint arXiv:2501.04306, 2025
https://arxiv.org/abs/2501.04306
2. Exploring the role of large language models in the scientific method: from hypothesis to discovery
npj Artificial Intelligence (Nature Portfolio), 2025
https://www.nature.com/articles/s44387-025-00019-5
3. A Survey of Human-AI Collaboration for Scientific Discovery
Preprints.org, 2026
https://www.preprints.org/manuscript/202601.0405/v1
4. Scientific Discoveries by LLM Agents
OpenReview
https://openreview.net/pdf?id=fxL6eFPsd1
5. How Much Energy Do LLMs Consume? Unveiling the Power Behind AI
ADASCI Blog
Approximate figure: GPT-3 training ~1,287 MWh
https://adasci.org/blog/how-much-energy-do-llms-consume-unveiling-the-power-behind-ai
6. Embracing large language model (LLM) technologies in hydrology research
Environmental Research: Infrastructure and Sustainability (IOP Publishing)
https://iopscience.iop.org/article/10.1088/3033-4942/addd43
7. Large Language Models as Calibration Agents in Hydrological Modeling
Geophysical Research Letters (AGU Journals), 2025
https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2025GL120043
