Physically Augmented Training for Generalizable Deep Learning in Environmental Modeling

Guglielmo, Gianmarco

In recent years, machine learning (ML) has attracted growing interest across virtually every scientific domain, including the environmental sciences, where it is increasingly used to identify patterns in data, make predictions, and model complex physical systems, with the promise of reducing the computational burden of traditional numerical simulations. However, ML methods generally require large amounts of high-quality data to achieve reliable performance. In many fields, such as hydrology and hydraulics, the limited availability of extensive training data severely restricts the applicability of data-driven approaches. Researchers have therefore resorted to numerical models to build training datasets, as in flood hazard mapping problems. Because data-driven models generalize poorly (that is, they have a reduced ability to perform well on unseen data, transfer knowledge to new domains or configurations, or extrapolate beyond the bounds of the training dataset), they typically need to be retrained for each new application. This substantially diminishes the practical advantages that data-driven methods promised to offer over numerical simulations: their effective use remains closely tied to the ability to construct reliable, albeit computationally demanding, numerical models. In short, the true bottleneck lies in their limited ability to generalize. One possible way to endow ML approaches with this ability is to exploit the underlying physical laws from which environmental data originate. Physics-informed ML builds on this ambition to enhance model performance. Within this broader direction, although with notable conceptual differences (discussed in this work) from the widespread concept of Physics-Informed Neural Networks, the methodology employed here, namely Physically Augmented Training, modifies the loss function by introducing a physical regularization term.
This term couples model inputs and outputs through physical relations, effectively augmenting the informational content available during training. The methodology is evaluated, with the objective of improving model generalization, on three progressively complex testbeds: a simple physical system, a one-dimensional, highly controllable hydraulic experimental setup, and a two-dimensional flood hazard mapping problem. In the last case, a model is trained on data from a single river catchment under multiple synthetic rainfall events and tested on a previously unseen basin in a one-to-one basin-transfer setup. Improving generalization across catchments is a key objective, as it would greatly increase the effectiveness of data-driven flood hazard mapping models, enabling them to go beyond simple interpolation of synthetic data generated by numerical models. The proposed flood-mapping framework shows promise for predicting flood hazard across entire river networks or multiple catchments, particularly in data-scarce regions, and for supporting knowledge transfer from data-rich basins to ungauged catchments, a long-standing and elusive goal of hydrology. Moreover, the methodology for embedding physical information appears versatile and potentially applicable to a broad range of environmental modeling tasks, offering a general strategy for incorporating physics into data-driven approaches.
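As a rough illustration of a loss function with a physical regularization term that couples model inputs and outputs, consider the minimal sketch below. It is not the formulation used in this work; every specific is an assumption made for the example: the function names, the `weight` parameter, and the physical relation itself, which here is steady-flow continuity in a rectangular channel, Q = v h b, with discharge Q and width b as inputs and depth h and velocity v as outputs.

```python
import numpy as np


def data_loss(y_pred, y_true):
    """Standard mean squared error between predictions and labels."""
    return float(np.mean((y_pred - y_true) ** 2))


def physics_residual(x, y_pred):
    """Residual of an assumed physical relation linking inputs and outputs.

    Hypothetical example: continuity in a rectangular channel, Q = v * h * b.
    Columns of x: discharge Q, channel width b (model inputs).
    Columns of y_pred: depth h, velocity v (model outputs).
    The residual is zero when the prediction satisfies the relation.
    """
    q, b = x[:, 0], x[:, 1]
    h, v = y_pred[:, 0], y_pred[:, 1]
    return q - v * h * b


def augmented_loss(x, y_pred, y_true, weight=1.0):
    """Data loss plus a weighted physical regularization term."""
    physics_term = float(np.mean(physics_residual(x, y_pred) ** 2))
    return data_loss(y_pred, y_true) + weight * physics_term


# Two samples: inputs (Q, b) and targets (h, v) that satisfy Q = v * h * b.
x = np.array([[10.0, 5.0], [20.0, 5.0]])
y_true = np.array([[1.0, 2.0], [2.0, 2.0]])

# A prediction with the right depth but half the velocity violates continuity.
y_bad = np.array([[1.0, 1.0], [2.0, 1.0]])

loss_consistent = augmented_loss(x, y_true, y_true, weight=0.1)
loss_inconsistent = augmented_loss(x, y_bad, y_true, weight=0.1)
```

In this sketch the physically inconsistent prediction is penalized by both terms, while a prediction that matches the labels and satisfies the relation incurs zero loss. Setting `weight=0.0` recovers the purely data-driven loss.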

Guglielmo, G. (2026). Physically Augmented Training for Generalizable Deep Learning in Environmental Modeling.