Capturing and querying fine-grained provenance of preprocessing pipelines in data science