Data analysis faces a problem in the extraction and preparation of data. The problem arises from the need to integrate data that are heterogeneous in both structure and format. The technical solution is to use ETL systems, which automate the extraction, transformation, and loading of data into a data store according to strictly defined rules. Current research in this area focuses on increasing performance and on documenting the semantics of the process so that it can be reused. The paper presents a review and analysis of current solutions for extracting large volumes of heterogeneous data.
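As a rough illustration of the extract, transform, and load stages named in this abstract, the sketch below pulls records from two differently shaped sources (CSV and JSON), maps them onto one schema, and writes them to an in-memory SQLite table. The sample data, field names, table name `payments`, and all function names are hypothetical and are not taken from the paper; it is a minimal sketch under those assumptions, not one of the reviewed systems.

```python
import csv
import io
import json
import sqlite3

# Hypothetical sample inputs: the same kind of record in two formats.
CSV_SOURCE = "id,name,amount\n1,alice,10.5\n2,bob,7\n"
JSON_SOURCE = '[{"user_id": 3, "full_name": "carol", "total": "4.25"}]'


def extract_csv(text):
    """Extract: read rows from CSV text as dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def extract_json(text):
    """Extract: read records from a JSON array."""
    return json.loads(text)


def transform(record, mapping):
    """Transform: rename fields to the target schema and normalize types."""
    return (
        int(record[mapping["id"]]),
        str(record[mapping["name"]]).strip().title(),
        float(record[mapping["amount"]]),
    )


def load(conn, rows):
    """Load: write normalized rows into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS payments "
        "(id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO payments VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run all three stages; each source has its own field mapping (the fixed rules)."""
    csv_map = {"id": "id", "name": "name", "amount": "amount"}
    json_map = {"id": "user_id", "name": "full_name", "amount": "total"}
    rows = [transform(r, csv_map) for r in extract_csv(CSV_SOURCE)]
    rows += [transform(r, json_map) for r in extract_json(JSON_SOURCE)]
    load(conn, rows)
    return rows
```

Calling `run_etl(sqlite3.connect(":memory:"))` loads three normalized rows. Keeping the per-source mapping as data rather than code is one simple way to make the transformation rules explicit and reusable.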
In this paper, we explore ways to represent big social graphs using adjacency lists and edge lists. We also describe a list-based graph-folding algorithm that makes it possible to analyze conditionally infinite social graphs on resource-constrained mobile devices. The algorithm (a) partitions the graph, in a certain way, into clusters of different levels, (b) represents each cluster as an edge list, and (c) absorbs the current cluster into the cluster of the next level. The proposed algorithm is illustrated on a sparse social graph.
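The two representations and the three folding steps can be sketched roughly as follows. The abstract does not say how the graph is partitioned, so this sketch assumes a placeholder rule (node `n` belongs to cluster `n // size`) and interprets "absorbing" as turning each cluster into a single node of the next-level graph. Both choices, and all function names, are assumptions for illustration and not the authors' method.

```python
from collections import defaultdict


def to_adjacency(edges):
    """Build an adjacency list (dict of sets) from an undirected edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def to_edge_list(adj):
    """Build a sorted, de-duplicated edge list from an adjacency list."""
    return sorted({(min(u, v), max(u, v)) for u in adj for v in adj[u]})


def fold_once(edges, cluster_of):
    """Fold one level: return (per-cluster edge lists, next-level edge list)."""
    inner = defaultdict(list)  # step (b): edges kept inside each cluster
    outer = set()              # step (c): edges between absorbed clusters
    for u, v in edges:
        cu, cv = cluster_of(u), cluster_of(v)
        if cu == cv:
            inner[cu].append((u, v))
        else:
            outer.add((min(cu, cv), max(cu, cv)))
    return dict(inner), sorted(outer)


def fold(edges, size=2):
    """Fold repeatedly until no inter-cluster edge remains.

    Assumes node ids are non-negative integers and size >= 2, so ids
    shrink at every level and the loop terminates.
    """
    if size < 2:
        raise ValueError("size must be at least 2")
    levels = []
    while edges:
        # Step (a): placeholder partition, node n goes to cluster n // size.
        inner, edges = fold_once(edges, lambda n: n // size)
        levels.append(inner)
    return levels


# Hypothetical sparse graph: a path 0-1-2-3-4-5.
path = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5)]
levels = fold(path, size=2)
```

Only the small per-cluster edge lists and the current level's edge list need to be held at once, which is the property that matters on a memory-constrained device. Each entry of `levels` could be written to storage as soon as it is produced.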