Ballin, M., Scanu, ., Vicard, P. (2005). Bayesian networks and complex survey sampling from finite populations. In 2005 FCSM Conference Papers.
Bayesian networks and complex survey sampling from finite populations
VICARD, Paola
2005-01-01
Abstract
We propose a novel methodology, based on the concept of Bayesian network (BN, see Cowell et al., 1999), for estimating the joint probability distribution of a set of categorical variables when samples are drawn according to complex survey designs. Since we restrict ourselves to categorical variables, this aim amounts to estimating a contingency table, a very frequent problem in Official Statistics. BNs are graphical models widely used in many scientific contexts, such as artificial intelligence and multivariate statistics (Neapolitan, 2004). However, whenever BNs have been estimated and used, observations have always been treated as i.i.d. draws from a suitable joint distribution. To date, BNs have never been defined or applied for samples drawn from finite populations. This paper shows that BNs can easily be adapted to finite-population survey sampling by introducing a suitable additional variable, hereafter denoted SD, that represents the survey design. SD is a categorical variable with as many states as there are distinct first-order inclusion probabilities. The BN representation allows the definition of a much larger class of estimators of the model-assisted type (see Särndal et al., 1992). We also illustrate how poststratification methods can be used and, more generally, how different surveys can be integrated.
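The Python sketch below shows how an SD variable can absorb unequal inclusion probabilities. It builds SD from the distinct first-order inclusion probabilities and estimates a two-node factorisation P(x | SD) P(SD) of the contingency table. It then compares the result with a direct design-weighted (Hájek) estimate. The key assumption is that P(SD = h) is estimated by the normalised Horvitz-Thompson class sizes n_h / pi_h. Under that assumption, the two estimates coincide exactly. This is a minimal sketch of the idea, not the estimator proposed in the paper. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def sd_states(inclusion_probs):
    """Map each distinct first-order inclusion probability to one state of SD."""
    return {p: h for h, p in enumerate(sorted(set(inclusion_probs)))}


def estimate_table_via_sd(cells, inclusion_probs):
    """Estimate population cell proportions as sum_h P(x | SD=h) * P(SD=h).

    cells: one hashable category combination (contingency-table cell) per unit.
    inclusion_probs: first-order inclusion probability pi_i of each unit.
    Assumption (hypothetical): P(SD=h) is estimated by n_h / pi_h, normalised.
    """
    states = sd_states(inclusion_probs)
    sd = [states[p] for p in inclusion_probs]

    # Conditional table P(x | SD=h): sample frequencies within each SD state.
    n_h = Counter(sd)
    n_hx = Counter(zip(sd, cells))

    # Marginal P(SD=h): estimated class size n_h / pi_h, normalised to sum to 1.
    pi_of_state = {h: p for p, h in states.items()}
    size = {h: n_h[h] / pi_of_state[h] for h in n_h}
    total = sum(size.values())

    # Combine the two factors of the network SD -> X.
    table = defaultdict(float)
    for (h, x), count in n_hx.items():
        table[x] += (count / n_h[h]) * (size[h] / total)
    return dict(table)


def hajek_table(cells, inclusion_probs):
    """Direct design-weighted (Hajek) estimate of cell proportions, for comparison."""
    weights = defaultdict(float)
    for x, p in zip(cells, inclusion_probs):
        weights[x] += 1.0 / p
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical toy sample: two categorical variables, two inclusion probabilities.
    cells = [("a", 0), ("a", 1), ("b", 0), ("a", 0), ("b", 1), ("b", 1)]
    pis = [0.5, 0.5, 0.5, 0.1, 0.1, 0.1]
    bn = estimate_table_via_sd(cells, pis)
    direct = hajek_table(cells, pis)
    for x in sorted(bn):
        print(x, round(bn[x], 4), round(direct[x], 4))
```

The two agree because the class sizes cancel: (n_hx / n_h) * (n_h / pi_h) = n_hx / pi_h. If the SD class sizes N_h were known from the sampling frame, they could replace the estimates n_h / pi_h in `size`.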