We present OpenTriage, a system for extracting structured entities from the detail Web pages of several sites and finding linkages between the extracted data. The system builds an integrated knowledge base by leveraging the redundancy of information with an Open Information Extraction approach: it incrementally processes all the available pages while discovering new attributes. It is based on a hybrid human-machine learning technique that targets a desired quality level. After two preliminary tasks, blocking and extraction, OpenTriage interleaves two integration tasks, linkage and matching, and manages uncertainty by posing very simple questions to an external oracle.
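The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of blocking followed by oracle-assisted linkage, not the authors' method. The function names, the token-based Jaccard similarity, and the `low`/`high` thresholds are all assumptions chosen for illustration; the extraction and matching tasks are not modelled.

```python
from itertools import combinations
from typing import Callable, Dict, Hashable, List, Set, Tuple

Record = Dict[str, str]


def tokens(record: Record) -> Set[str]:
    """Lowercased word tokens drawn from all attribute values of a record."""
    return {tok for value in record.values() for tok in value.lower().split()}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def link_records(
    records: List[Record],
    block_key: Callable[[Record], Hashable],
    oracle: Callable[[Record, Record], bool],
    low: float = 0.3,   # hypothetical threshold: at or below this, reject
    high: float = 0.8,  # hypothetical threshold: at or above this, accept
) -> List[Tuple[int, int]]:
    """Return index pairs judged to describe the same entity.

    Blocking: only records sharing a block key are compared.
    Linkage: confident pairs are decided automatically; uncertain
    pairs (low < score < high) are resolved by a yes/no oracle question.
    """
    blocks: Dict[Hashable, List[int]] = {}
    for idx, rec in enumerate(records):
        blocks.setdefault(block_key(rec), []).append(idx)

    links: List[Tuple[int, int]] = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            score = jaccard(tokens(records[i]), tokens(records[j]))
            if score >= high:
                links.append((i, j))
            elif score > low and oracle(records[i], records[j]):
                links.append((i, j))
    return links
```

In this sketch the oracle is any callable that answers "do these two records describe the same entity?", so it can be a human prompt or a stub; narrowing the gap between `low` and `high` reduces the number of questions at the cost of more automatic decisions.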
Voyat, R., Crescenzi, V., Merialdo, P. (2022). OpenTRIAGE: Entity Linkage for Detail Webpages. In CEUR Workshop Proceedings (pp. 1-12). CEUR-WS.
OpenTRIAGE: Entity Linkage for Detail Webpages
Voyat R.; Crescenzi V.; Merialdo P.
2022-01-01