Mingione, M., Alaimo Di Loro, P., Lagona, F., Maruotti, A. (2026). Environmental risk assessment via nonhomogeneous hidden semi-Markov models with penalized vector autoregression. The Annals of Applied Statistics, 20(1), 215-237 [10.1214/26-AOAS2142].
Environmental risk assessment via nonhomogeneous hidden semi-Markov models with penalized vector autoregression
Mingione, Marco; Lagona, Francesco; Maruotti, Antonello
2026-01-01
Abstract
Motivated by the study of pollution trends in the city of Bergen, we introduce a flexible statistical framework for modeling multivariate air pollution data via a nonhomogeneous hidden semi-Markov vector autoregression. The hidden process captures unobserved environmental conditions, while the vector autoregressive structure accounts for temporal autocorrelation and cross-pollutant dependencies. The model further allows time-varying environmental conditions to influence both the average levels of pollutant concentrations and the duration of different transient states. Parameters are estimated via maximum likelihood using a tailored expectation-maximization (EM) algorithm, integrated with state-specific ℓ1 regularization to control overfitting and automatically select relevant temporal lags. The proposal is tested on simulated data under different scenarios and then applied to daily concentrations of nitrogen oxides and particulate matter recorded in an urban area. Environmental risk is assessed by a Shapley value-based decomposition that attributes marginal risk contributions. This approach offers a comprehensive framework for multivariate environmental risk modeling, enabling better identification of high-pollution episodes and informing policy interventions.


