Paradata and Bayesian Networks: A Tool for Monitoring and Troubleshooting the Data Production Process

VICARD, Paola
2006-01-01

Abstract

As stated in Heeringa and Groves (2004): “The ability to continually monitor the streams of process data and survey data creates the opportunity to alter the design during the course of data collection in order to improve cost efficiency and achieve more precise, less biased estimates.” Designs that adapt to the flow of process data are usually called responsive designs. A number of indicators describe the process stream, based on interviewer, housing unit, respondent, attempt, and response characteristics; these indicators are usually called paradata. The use of paradata in responsive designs can be described as follows: (1) altering the design is a decision procedure; (2) each possible decision is associated with costs and benefits; (3) each decision can be made through an optimization procedure, more precisely by finding the decision with the lowest expected cost; (4) the expected costs are updated by the flow of paradata. This procedure shows that the notion of expected value is central not only to planning the survey and evaluating the final results (e.g. variance or mean square error) but also to managing the process flow. The expected value must be applied to the costs associated with the different survey phases. To derive the expected costs, it is important to understand how paradata interact, or in other words, to find a suitable multivariate model for paradata and a simple scheme for updating the model parameters as paradata are observed. The costs are those related to the actions (or decisions) of the survey manager, so the problem is in fact a decision problem. Statistical decision theory is applied in the most diverse settings, but it is still not widely used for survey planning and management. An easy way to deal with this problem is offered by the graphical models known as Bayesian networks (BNs) (Cowell et al., 1999). When BNs are used in a decision context, they are usually called influence diagrams (IDs).
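
The four-step procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the hidden state, the paradata indicator, the actions, and every probability and cost below are hypothetical values chosen only to make the arithmetic easy to follow. The sketch uses a two-node network (one hidden state, one observed paradata outcome), updates the belief with Bayes' rule, and selects the action with the lowest expected cost.

```python
# Illustrative sketch only: all states, probabilities, actions, and costs
# below are hypothetical and are not taken from the paper.

# Prior belief about a hidden unit-level state (the parent node).
prior = {"cooperative": 0.5, "reluctant": 0.5}

# P(paradata outcome | state) for one observed indicator (a call attempt).
likelihood = {
    "cooperative": {"contact": 0.75, "no_contact": 0.25},
    "reluctant": {"contact": 0.25, "no_contact": 0.75},
}

# Cost of each action in each state (lower is better).
cost = {
    "call_again": {"cooperative": 1.0, "reluctant": 5.0},
    "switch_mode": {"cooperative": 3.0, "reluctant": 2.0},
}


def update(belief, outcome):
    """Bayes' rule: belief over states after observing one paradata outcome."""
    joint = {s: belief[s] * likelihood[s][outcome] for s in belief}
    total = sum(joint.values())
    return {s: p / total for s, p in joint.items()}


def expected_costs(belief):
    """Expected cost of every action under the current belief."""
    return {a: sum(belief[s] * c[s] for s in belief) for a, c in cost.items()}


def best_action(belief):
    """Action with the lowest expected cost under the current belief."""
    costs = expected_costs(belief)
    return min(costs, key=costs.get)


if __name__ == "__main__":
    print(best_action(prior))                     # decision before any paradata
    print(best_action(update(prior, "contact")))  # decision after one observation
```

With these made-up numbers, the preferred action changes from `switch_mode` to `call_again` once a successful contact is observed, which is the sense in which the flow of paradata updates the expected costs. A realistic influence diagram would involve many interacting paradata nodes and would normally rely on a dedicated Bayesian network library rather than hand-written tables.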
Ballin, M., Scanu, M., Vicard, P. (2006). Paradata and Bayesian Networks: A Tool for Monitoring and Troubleshooting the Data Production Process. In Proceedings of the European Conference on Quality in Survey Statistics (Q2006).
Use this identifier to cite or link to this document: https://hdl.handle.net/11590/175482