Knowledge graphs are important for industrial digitalization. Industrial knowledge graphs are often mapped from multiple large existing data sources, and creating a mapping requires the time of scarce subject matter experts (SMEs). Interactive, literate programming for large-scale mapping would allow mapping engineers to make good use of SME time and to document their work. Currently, there are no open-source tools supporting such a process. To solve this problem, we implement maplib, which leverages existing tooling from data science. In data science, literate programming is widespread: frameworks such as Jupyter notebooks are used to interactively prepare data and create analyses using in-memory tables called DataFrames. Maplib is implemented in Rust using Polars DataFrames and has Python bindings, allowing us to leverage tooling used in data science. Maplib implements the OTTR mapping language, which is highly suited for industrial use cases. Maplib features a SPARQL engine defined directly on DataFrames, making querying possible immediately after mapping. We evaluate our approach by comparing mapping and querying performance with Morph-KGC and SPARQL Anything on the GTFS Madrid benchmark. Our approach materializes the graph and is ready to query 47x-182x faster, and scales to models that are over twice as large. Once the graph has been constructed, Morph-KGC and SPARQL Anything perform better for most, but not all, of the queries. On the end-to-end task of mapping and querying, however, which is very important for interactive mapping, maplib performs better for most queries.
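To make the map-then-query workflow concrete, the following is a minimal sketch in plain Python. It does not use maplib's API or OTTR syntax. The namespace `EX`, the functions `stop_template`, `expand`, and `match`, and the two sample rows are all hypothetical and invented for illustration. A list of dictionaries stands in for a DataFrame, a function stands in for a mapping template, and a simple pattern match stands in for a SPARQL query.

```python
from typing import Callable, Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Row = Dict[str, str]

# Hypothetical namespace, chosen only for this illustration.
EX = "http://example.org/"


def stop_template(row: Row) -> List[Triple]:
    """Expand one table row into triples, as a template instance would."""
    subject = EX + "stop/" + row["stop_id"]
    return [
        (subject, "rdf:type", EX + "Stop"),
        (subject, EX + "name", row["stop_name"]),
    ]


def expand(template: Callable[[Row], List[Triple]], rows: List[Row]) -> List[Triple]:
    """Apply the template to every row and collect the resulting triples."""
    triples: List[Triple] = []
    for row in rows:
        triples.extend(template(row))
    return triples


def match(
    triples: List[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> List[Triple]:
    """Return triples matching a single pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Invented sample rows standing in for a small in-memory table.
rows = [
    {"stop_id": "1", "stop_name": "Atocha"},
    {"stop_id": "2", "stop_name": "Sol"},
]

# Mapping step: the graph is materialized in memory.
graph = expand(stop_template, rows)

# Query step: runs immediately on the materialized triples.
names = [o for (_, _, o) in match(graph, p=EX + "name")]
print(names)
```

Because the triples live in memory as soon as expansion finishes, the query can run without an export or load step. This is the property the end-to-end comparison in the abstract measures.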