This paper presents RoadRunner, a research project that aims to develop solutions for automatically extracting data from large HTML data sources. The targets of our research are data-intensive Web sites, i.e., HTML-based sites with a fairly complex structure that publish large amounts of data. The paper describes the top-level software architecture of the RoadRunner system and the novel research challenges posed by the attempt to automate the information extraction process.

Crescenzi, V., Mecca, G., Merialdo, P. (2002). Automatic web information extraction in the ROADRUNNER system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp.264-277). Springer [10.1007/3-540-46140-X_21].

Automatic web information extraction in the ROADRUNNER system

Valter Crescenzi; Giansalvatore Mecca; Paolo Merialdo
2002-01-01

Year: 2002
ISBN: 978-3-540-44122-9

Use this identifier to cite or link to this document: https://hdl.handle.net/11590/308040