Bayesian networks and complex survey sampling from finite populations

Ballin, M.; Scanu; Vicard, Paola

We propose a novel methodology, based on the concept of a Bayesian network (BN; see Cowell et al., 1999), for estimating the joint probability distribution of a set of categorical variables when samples are drawn according to complex survey designs. Because the variables are categorical, this amounts to estimating a contingency table, a very frequent problem in Official Statistics. BNs are graphical devices widely used in many scientific contexts, such as artificial intelligence and multivariate statistics (Neapolitan, 2004). However, whenever BNs have been estimated and used, observations have been treated as i.i.d. draws from a suitable joint distribution function. Until now, BNs have never been defined or applied for sampling from finite populations. This paper shows that BNs can easily be adapted to finite-population survey sampling by defining a suitable additional variable, denoted SD in what follows, that represents the survey design. SD is a categorical variable with one state for each distinct first-order inclusion probability. The BN representation makes it possible to define a much larger class of estimators of the model-assisted type (see Särndal et al., 1992). The paper also illustrates how post-stratification methods and, more generally, the integration of different surveys can be handled within this framework.
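The role of the SD variable can be illustrated with a small sketch. It assumes that SD is a root node of the network and that no conditional independences among the survey variables are exploited. Under those assumptions, the joint distribution is estimated through the factorization P(X) = sum over s of P(SD = s) * P(X | SD = s), where P(SD = s) is estimated from the design-based state sizes N_s = n_s / pi_s and P(X | SD = s) from unweighted within-state frequencies. The function names (`sd_states`, `bn_estimate`, `hajek_estimate`) are hypothetical, and this is not the authors' estimator. It only shows that conditioning on SD recovers the familiar design-weighted (Hájek) estimate of the contingency table.

```python
import math
from collections import Counter, defaultdict


def sd_states(pis):
    """Hypothetical helper: map each distinct first-order inclusion probability to an SD state."""
    return {pi: k for k, pi in enumerate(sorted(set(pis)))}


def bn_estimate(sample):
    """Estimate P(X) as sum_s P(SD = s) * P(X | SD = s).

    sample: list of (x, pi) pairs, where x is a tuple of categories and
    pi is the unit's first-order inclusion probability (assumed known).
    """
    states = sd_states([pi for _, pi in sample])
    pi_of = {k: pi for pi, k in states.items()}
    # Sample size and estimated population size of each SD state: N_s = n_s / pi_s
    n_s = Counter(states[pi] for _, pi in sample)
    N_s = {s: n_s[s] / pi_of[s] for s in n_s}
    N = sum(N_s.values())
    # Within an SD state all units share pi, so unweighted frequencies estimate P(X | SD = s)
    cond = defaultdict(Counter)
    for x, pi in sample:
        cond[states[pi]][x] += 1
    joint = Counter()
    for s, counts in cond.items():
        for x, c in counts.items():
            joint[x] += (N_s[s] / N) * (c / n_s[s])
    return dict(joint)


def hajek_estimate(sample):
    """Direct design-weighted (Hajek) estimate of the same contingency table."""
    weights = Counter()
    for x, pi in sample:
        weights[x] += 1 / pi
    total = sum(weights.values())
    return {x: w / total for x, w in weights.items()}


if __name__ == "__main__":
    # Toy sample with two SD states (pi = 0.1 and pi = 0.5); the data are made up for illustration
    sample = [(("A", "yes"), 0.1), (("A", "no"), 0.1),
              (("B", "yes"), 0.5), (("B", "yes"), 0.5), (("A", "yes"), 0.5)]
    bn, hj = bn_estimate(sample), hajek_estimate(sample)
    for x in sorted(hj):
        print(x, round(bn[x], 4), math.isclose(bn[x], hj[x]))
```

With a saturated structure the two estimates coincide cell by cell. Any advantage of the BN route would presumably come from conditional independences in the graph that reduce the number of parameters to estimate. That is where the larger class of model-assisted estimators mentioned above would enter.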

Ballin, M., Scanu, Vicard, P. (2005). Bayesian networks and complex survey sampling from finite populations. In 2005 FCSM Conference Papers.