Conti, P. L., Marella, D., Vicard, P., Vitale, V. (2021). Multivariate statistical matching using graphical models. International Journal of Approximate Reasoning [10.1016/j.ijar.2020.12.006].
Multivariate statistical matching using graphical models
Paola Vicard
2021-01-01
Abstract
The goal of statistical matching, at a macro level, is to estimate the joint distribution of variables observed separately in independent samples. The lack of joint information on the variables of interest leads to uncertainty about the data generating model. In this paper we propose the use of graphical models to deal with statistical matching uncertainty for multivariate categorical variables. Using Bayesian networks in the statistical matching context makes it possible both to introduce extra-sample information on the dependence structure between the variables of interest and to use this information to factorize the joint probability distribution according to the graph, decomposing the multivariate dependence into lower-dimensional components. By exploiting local relationships, this representation of the joint probability distribution simplifies both parameter estimation and the evaluation of statistical matching quality in a multivariate context. A simulation experiment evaluates the performance of the proposed methodology with and without auxiliary information and compares it with the saturated multinomial model in terms of uncertainty reduction. Finally, an application to a real case is provided. Results show a considerable improvement in the quality of statistical matching when the dependence structure is taken into account.
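To make the two ideas above concrete, here is a minimal sketch, not the authors' procedure, under loudly stated assumptions: a hypothetical three-variable network X → Y, X → Z, where (X, Y) would come from one sample and (X, Z) from another, with made-up conditional probability tables. It compares the free-parameter count of a Bayesian network factorization with that of the saturated multinomial, computes a joint probability as the product of local conditionals, and, for the pair (Y, Z) that is never observed jointly, contrasts the conditional-independence value P(y|x)P(z|x) with the Fréchet bounds implied by the two marginals alone (a standard way to express matching uncertainty for categorical data; the paper's own uncertainty measures are not reproduced here). The function names `saturated_params`, `bn_params`, `joint_prob` and `frechet_bounds` are hypothetical.

```python
from itertools import product
from math import isclose, prod

# Hypothetical toy DAG: X -> Y and X -> Z. In statistical matching, (X, Y) would be
# observed in one sample and (X, Z) in another; Y and Z are never observed jointly.
CARDS = {"X": 2, "Y": 3, "Z": 2}
PARENTS = {"X": [], "Y": ["X"], "Z": ["X"]}

# Hypothetical conditional probability tables, indexed by the parents' values.
CPT = {
    "X": {(): [0.6, 0.4]},
    "Y": {(0,): [0.5, 0.3, 0.2], (1,): [0.1, 0.6, 0.3]},
    "Z": {(0,): [0.7, 0.3], (1,): [0.2, 0.8]},
}


def saturated_params(cards):
    """Free parameters of a saturated multinomial on the full contingency table."""
    return prod(cards.values()) - 1


def bn_params(cards, parents):
    """Free parameters of a BN: (c_v - 1) for every configuration of v's parents."""
    return sum((cards[v] - 1) * prod(cards[p] for p in parents[v]) for v in cards)


def joint_prob(assign):
    """P(assign) as the product of local conditionals P(v | parents(v))."""
    return prod(
        CPT[v][tuple(assign[p] for p in PARENTS[v])][assign[v]] for v in CARDS
    )


def frechet_bounds(p_y, p_z):
    """Fréchet bounds on P(Y=j, Z=k) when only the two marginals are known."""
    return {
        (j, k): (max(0.0, a + b - 1.0), min(a, b))
        for j, a in enumerate(p_y)
        for k, b in enumerate(p_z)
    }


if __name__ == "__main__":
    print("saturated:", saturated_params(CARDS), "BN:", bn_params(CARDS, PARENTS))

    # The factorized joint must sum to one over every cell of the full table.
    names = list(CARDS)
    total = sum(
        joint_prob(dict(zip(names, vals)))
        for vals in product(*(range(CARDS[v]) for v in names))
    )
    print("joint sums to one:", isclose(total, 1.0))

    # Within X = 0, compare the conditional-independence value P(y|x)P(z|x)
    # with the range of values compatible with the observed marginals.
    p_y, p_z = CPT["Y"][(0,)], CPT["Z"][(0,)]
    for (j, k), (lo, hi) in frechet_bounds(p_y, p_z).items():
        ci = p_y[j] * p_z[k]
        print(f"P(Y={j}, Z={k} | X=0): CI={ci:.3f}, bounds=[{lo:.3f}, {hi:.3f}]")
```

In this toy case the network needs 7 free parameters against 11 for the saturated model; with more variables and sparser graphs the gap grows quickly, which is one sense in which a graph-based factorization eases estimation. The width of each Fréchet interval shows how much the marginals alone leave undetermined, while an assumed dependence structure (here, Y independent of Z given X) pins down a single value inside it.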