Provenance for database queries or scientific workflows is often motivated as
providing explanation (increasing understanding of the underlying data sources
and processes used to compute the query) and reproducibility (the capability to
recompute the results on different inputs, possibly specialized to a part of
the output). Many provenance systems claim to provide such capabilities;
however, most lack formal definitions or guarantees of these properties, while
others provide formal guarantees only for relatively limited classes of
changes. Building on recent work on provenance traces and slicing for
functional programming languages, we introduce a detailed tracing model of
provenance for multiset-valued Nested Relational Calculus, define trace slicing
algorithms that extract subtraces needed to explain or recompute specific parts
of the output, and define query slicing and differencing techniques that
support explanation. We state and prove correctness properties for these
techniques and present a proof-of-concept implementation in Haskell.

Comment: PPDP 201