ALFRED: Crowd assisted data extraction

Crescenzi, Valter; Merialdo, Paolo; Qiu, Disheng

2013-01-01

Abstract

The development of solutions to scale the extraction of data from Web sources is still a challenging issue. High accuracy can be achieved by supervised approaches, but the cost of training data, i.e., annotations over a set of sample pages, limits their scalability. Crowdsourcing platforms are making the manual annotation process more affordable. However, the tasks submitted to these platforms should be extremely simple, so that non-expert workers can perform them, and their number should be minimized to contain costs. We demonstrate ALFRED, a wrapper inference system supervised by the workers of a crowdsourcing platform. Training data are labeled values generated by means of membership queries, the simplest form of query, posed to the crowd. ALFRED includes several original features: it automatically selects a representative sample set from the input collection of pages; to minimize wrapper inference costs, it dynamically sets the expressiveness of the wrapper formalism and adopts an active learning algorithm to select the queries posed to the crowd; and it can handle the inaccurate answers that workers engaged through crowdsourcing platforms may provide.

Publication year: 2013

ISBN: 9781450320382

Citation: Crescenzi, V., Merialdo, P., Qiu, D. (2013). ALFRED: Crowd assisted data extraction. In WWW 2013 Companion - Proceedings of the 22nd International Conference on World Wide Web (pp. 297-300).

Appears in types: 4.1 Contribution in conference proceedings

Use this identifier to cite or link to this document: https://hdl.handle.net/11590/307636
