The product domain contains valuable data for many important applications. Given the large and growing number of sources that provide product specifications, and the velocity and variety with which such data become available, this domain is a challenging scenario for developing and evaluating big data integration solutions. In this paper, we present the results of our efforts towards big data integration for product specifications. We present a pipeline that decomposes the problem into distinct tasks, from source and data discovery to extraction, data linkage, schema alignment, and data fusion. Although we present the pipeline as a sequence of tasks, different configurations can be defined depending on the application goals.
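To illustrate what a configurable pipeline of tasks might look like, the following minimal Python sketch treats each task as a function from a list of records to a list of records and lets a configuration (an ordered list of stage names) decide which stages run and in what order. This is not the authors' implementation. The stage bodies are hypothetical toy placeholders (an id filter for extraction, a hard-coded attribute mapping for schema alignment, id normalization for linkage, and "first value wins" for fusion), and source and data discovery are not modeled.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]
Stage = Callable[[List[Record]], List[Record]]


def extraction(records: List[Record]) -> List[Record]:
    """Toy extraction (hypothetical): keep only records exposing a product id."""
    return [r for r in records if "id" in r]


def schema_alignment(records: List[Record]) -> List[Record]:
    """Rename source-specific attribute names to a shared name (hypothetical mapping)."""
    mapping = {"brand_name": "brand", "mfr": "brand"}
    return [{mapping.get(k, k): v for k, v in r.items()} for r in records]


def data_linkage(records: List[Record]) -> List[Record]:
    """Toy linkage: normalize ids so records of the same product share a key.

    Assumes every record has an "id", i.e. extraction ran earlier.
    """
    return [{**r, "id": r["id"].strip().lower()} for r in records]


def data_fusion(records: List[Record]) -> List[Record]:
    """Merge records sharing an id; the first value seen for each attribute wins."""
    fused: Dict[str, Record] = {}
    for r in records:
        merged = fused.setdefault(r["id"], {})
        for attr, value in r.items():
            merged.setdefault(attr, value)
    return list(fused.values())


# Registry of available stages; a configuration picks and orders them.
STAGES: Dict[str, Stage] = {
    "extraction": extraction,
    "schema_alignment": schema_alignment,
    "data_linkage": data_linkage,
    "data_fusion": data_fusion,
}


def run_pipeline(records: List[Record], config: List[str]) -> List[Record]:
    """Apply the configured stages in the given order."""
    for name in config:
        records = STAGES[name](records)
    return records


if __name__ == "__main__":
    raw = [
        {"id": "ABC-1 ", "mfr": "Acme", "price": "10"},
        {"id": "abc-1", "brand_name": "Acme", "color": "red"},
        {"title": "record without an id"},
    ]
    full = ["extraction", "schema_alignment", "data_linkage", "data_fusion"]
    print(run_pipeline(raw, full))
    # → [{'id': 'abc-1', 'brand': 'Acme', 'price': '10', 'color': 'red'}]
```

Running only `["extraction", "data_linkage"]` instead would return the two linked records unfused, which is one way an application that needs per-source values rather than a single fused view might configure the pipeline differently.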
Barbosa, L., Crescenzi, V., Dong, X. L., Merialdo, P., Piai, F., Qiu, D., et al. (2018). Big Data Integration for Product Specifications. IEEE Data Engineering Bulletin, 41(2), 71-81.