We track the lineage of tuples throughout their database lifetime. That is, we consider a scenario in which tuples (records) that are produced by a query may affect other tuple insertions into the DB, as part of a normal workflow. As time goes on, exact provenance explanations for such tuples become deeply nested, consuming ever more space and becoming less clear and less readable.

We present a novel approach for approximating lineage tracking, using a Machine Learning (ML) and Natural Language Processing (NLP) technique; namely, word embedding. The basic idea is to summarize (and approximate) the lineage of each tuple via a small set of constant-size vectors (the number of vectors per tuple is a hyperparameter). For tuples that are inserted explicitly (and independently of DB contents), the vectors are obtained via a pre-trained word vectors model over their underlying database domain "text". During the execution of a query, we construct the lineage vectors of the final (and intermediate) result tuples in a similar fashion to that of semiring-based exact provenance calculations. We extend the + and · operations to generate sets of lineage vectors, while retaining the ability to propagate information and preserve the compact representation. Therefore, our solution does not suffer from space complexity blow-up over time, and it "naturally ranks" explanations for the existence of a tuple.

We devise a genetics-inspired improvement to our basic method. The data columns of an entity (and potentially other columns) are a tuple's basic properties, i.e., the "genes" that combine to form its genetic code. We design an alternative lineage tracking mechanism that keeps track of and queries lineage (via embeddings) at the column ("gene") level; thereby, we manage to better distinguish between the provenance features and the textual characteristics of a tuple.
Finding the lineage of a tuple in the DB is analogous to finding its predecessors via DNA examination.

We further introduce several improvements and extensions to the basic method: tuple creation timestamp, column emphasis, and query dependency DAG.

We integrate our lineage computations into the PostgreSQL system via an extension (ProvSQL). Extensive experiments exhibit useful results in terms of accuracy against exact, semiring-based justifications, especially for the column-based (CV) method, which exhibits high precision and high per-level recall. In the experiments, we focus on tuples with multiple generations of tuples in their lifelong lineage and analyze them in terms of direct and distant lineage.
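
To make the idea of bounded sets of lineage vectors concrete, the following is a minimal sketch and not the implementation evaluated here. It assumes, purely for illustration, that the extended + pools the vector sets of alternative derivations, that the extended multiplication averages every pair of vectors from jointly used tuples, and that a set exceeding the per-tuple limit is compressed by merging its two most similar vectors. The names `MAX_VECTORS`, `plus`, `times`, `compress`, and `similarity`, and the toy 4-dimensional vectors, are all hypothetical.

```python
import math
from itertools import product

MAX_VECTORS = 2  # hypothetical hyperparameter: lineage vectors kept per tuple


def mean(vectors):
    """Element-wise average of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; defined as 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def compress(vectors, k=MAX_VECTORS):
    """Assumed compression step: merge the two most similar vectors
    (by averaging them) until at most k vectors remain."""
    vs = [list(v) for v in vectors]
    while len(vs) > k:
        pairs = ((i, j) for i in range(len(vs)) for j in range(i + 1, len(vs)))
        i, j = max(pairs, key=lambda p: cosine(vs[p[0]], vs[p[1]]))
        merged = mean([vs[i], vs[j]])
        vs = [v for idx, v in enumerate(vs) if idx not in (i, j)] + [merged]
    return vs


def plus(a, b, k=MAX_VECTORS):
    """Assumed '+': alternative derivations pool their vector sets."""
    return compress(a + b, k)


def times(a, b, k=MAX_VECTORS):
    """Assumed multiplication: joint use averages every pair of vectors."""
    return compress([mean([u, v]) for u, v in product(a, b)], k)


def similarity(target, candidate):
    """Best pairwise cosine similarity between two lineage-vector sets."""
    return max(cosine(u, v) for u in target for v in candidate)


# Toy base tuples, each starting with one (made-up) embedding vector.
base = {
    "t1": [[1.0, 0.0, 0.0, 0.0]],
    "t2": [[0.0, 1.0, 0.0, 0.0]],
    "t3": [[0.0, 0.0, 1.0, 0.0]],
    "t4": [[0.0, 0.0, 0.0, 1.0]],  # never used by the query below
}

# Result tuple derived either from (t1 joined with t2) or from t3.
result = plus(times(base["t1"], base["t2"]), base["t3"])

# Rank base tuples as lineage candidates, most similar first.
ranked = sorted(base, key=lambda name: similarity(result, base[name]), reverse=True)
print(ranked)
```

In this toy run the result tuple never holds more than `MAX_VECTORS` vectors regardless of how many derivation steps feed into it, and the unused tuple `t4` lands at the bottom of the ranking. A real system would draw the base vectors from a pre-trained word embedding model rather than from hand-written unit vectors.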