This paper presents RoadRunner, a research project that aims to develop solutions for automatically extracting data from large HTML data sources. The targets of our research are data-intensive Web sites, i.e., HTML-based sites with a fairly complex structure that publish large amounts of data. The paper describes the top-level software architecture of the RoadRunner system and the novel research challenges posed by the attempt to automate the information extraction process.

Valter Crescenzi, Giansalvatore Mecca, & Paolo Merialdo (2002). Automatic web information extraction in the ROADRUNNER system. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (pp. 264-277). Springer. DOI: 10.1007/3-540-46140-X_21.

Automatic web information extraction in the ROADRUNNER system

Crescenzi, Valter; Mecca, Giansalvatore; Merialdo, Paolo
2002

ISBN: 978-354044122-9

Use this identifier to cite or link to this document: http://hdl.handle.net/11590/308040