Data wrangling, the multi-faceted process by which the data required by an application is identified, extracted, cleaned and integrated, is often cumbersome and labor intensive. In this paper, we present an architecture that supports a complete data wrangling lifecycle, orchestrates components dynamically, builds on automation wherever possible, is informed by whatever data is available, refines automatically produced results in the light of feedback, takes into account the user's priorities, and supports data scientists with diverse skill sets. The architecture is demonstrated in practice for wrangling property sales and open government data.
An enterprise database contains a global, integrated, and consistent representation of a company's data. Multi-level modeling facilitates the definition and maintenance of such an integrated conceptual data model in a dynamic environment of changing data requirements of diverse applications. Multi-level models transcend the traditional separation of class and object with clabjects as the central modeling primitive, which allows for a more flexible and natural representation of many real-world use cases. In deep instantiation, the number of instantiation levels of a clabject or property is indicated by a single potency. Dual deep modeling (DDM) differentiates between source potency and target potency of a property or association and supports the flexible instantiation and refinement of the property by statements connecting clabjects at different modeling levels. DDM comes with multiple generalization of clabjects, subsetting/specialization of properties, and multi-level cardinality constraints. Examples are presented using a UML-style notation for DDM together with UML class and object diagrams for the representation of two-level user views derived from the multi-level model. Syntax and semantics of DDM are formalized and implemented in F-Logic, supporting the modeler with integrity checks and rich query facilities.
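The single-potency scheme that the abstract contrasts with DDM can be illustrated with a minimal sketch. It assumes the classic reading of deep instantiation, in which each instantiation step lowers a potency by one and a potency of zero means "no further instantiation". The class `Clabject`, its fields, and the example names (`ProductCategory`, `listPrice`, `taxRate`) are hypothetical and are not taken from the paper or its F-Logic formalization.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Clabject:
    """Hypothetical clabject: acts as both a class and an object."""
    name: str
    potency: int                              # instantiation levels still available
    type_of: Optional["Clabject"] = None      # the clabject this one instantiates
    properties: Dict[str, int] = field(default_factory=dict)  # property -> potency

    def instantiate(self, name: str) -> "Clabject":
        """Create an instance one level down, lowering every potency by one."""
        if self.potency == 0:
            raise ValueError(f"{self.name} has potency 0 and cannot be instantiated")
        # A property at potency 0 already holds its value here and is not passed on.
        passed_on = {p: k - 1 for p, k in self.properties.items() if k > 0}
        return Clabject(name, self.potency - 1, self, passed_on)


# Assumed example: a property with potency 2 is valued two levels below.
category = Clabject("ProductCategory", 2, properties={"listPrice": 2, "taxRate": 1})
car = category.instantiate("Car")       # taxRate reaches potency 0 here
my_car = car.instantiate("MyCar")       # listPrice reaches potency 0 here
```

In this sketch one number governs each element, so both ends of an association would be tied to the same depth; that is the restriction the separate source and target potencies of DDM are meant to lift.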
Multi-level modeling aims to reduce redundancy in data models by defining properties at the right abstraction level and letting more specific levels inherit them. We revisit one of the earliest such approaches, Telos, and investigate what needs to be added to its axioms to obtain a true multi-level modeling language. Unlike previous approaches, we define levels not with numeric potencies but with hierarchies of so-called most general instances.
Application integration requires the consideration of instance data and schema data. Instance data in one application may be schema data for another application, which gives rise to multiple instantiation levels. Using deep instantiation, an object may be deeply characterized by representing schema data about objects several instantiation levels below. Deep instantiation still demands a clear separation of instantiation levels: the source and target objects of a relationship must be at the same instantiation level. This separation is inadequate in the context of application integration. Dual deep instantiation (DDI), on the other hand, allows for relationships that connect objects at different instantiation levels. The depth of the characterization may be specified separately for each end of the relationship. In this paper, we present and implement set-theoretic predicates and axioms for the representation of conceptual models with DDI.
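The idea of specifying the depth separately for each end of a relationship can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's set-theoretic axioms: it only checks that a pair of objects sits at the declared number of instance-of steps below the source and target, and it ignores intermediate refinement and specialization. `Obj`, `Relationship`, `depth_below`, and the example names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Obj:
    """Hypothetical object with a single instance-of link."""
    name: str
    type_of: Optional["Obj"] = None


def depth_below(obj: Obj, ancestor: Obj) -> Optional[int]:
    """Count instance-of steps from obj up to ancestor; None if unreachable."""
    steps = 0
    current: Optional[Obj] = obj
    while current is not None:
        if current is ancestor:
            return steps
        current = current.type_of
        steps += 1
    return None


@dataclass(frozen=True)
class Relationship:
    """Relationship with an independent potency at each end (assumed encoding)."""
    name: str
    source: Obj
    source_potency: int
    target: Obj
    target_potency: int

    def accepts(self, s: Obj, t: Obj) -> bool:
        """True if s and t lie at the declared depths below source and target."""
        return (depth_below(s, self.source) == self.source_potency
                and depth_below(t, self.target) == self.target_potency)


# Assumed example: owners link objects two levels below ProductCategory
# to objects one level below Person.
product_category = Obj("ProductCategory")
car = Obj("Car", product_category)
my_car = Obj("MyCar", car)
person = Obj("Person")
alice = Obj("Alice", person)

owner = Relationship("owner", product_category, 2, person, 1)
```

Here `owner.accepts(my_car, alice)` holds even though `my_car` and `alice` are reached through a different number of instantiation steps, whereas `owner.accepts(car, alice)` does not, because `car` is only one step below `ProductCategory`.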