The paper presents a possible solution to the problems of structuring large volumes of data and storing them in integrated structures that ensure integrity and consistency of representation, together with fast and flexible processing of unstructured information. To address these problems, the authors propose a method for developing a multi-level ontological structure that solves the interrelated problems of identifying, structuring and processing big data sets that have primarily natural-language forms of representation. The multi-level model is developed using methods of semantic analysis and relative modeling. It is suitable for the interpretation and efficient integrated processing of unstructured data obtained from distributed sources of information. The multi-level representation of big data determines the methods and mechanisms for a unified meta-description of data elements at the logical level, for pattern search and classification of the feature space at the semantic level, and for the procedures of identifying, consolidating and enriching data at the linguistic level. The method is modified by applying a scalable and computationally efficient genetic algorithm that searches for and generates weight coefficients corresponding to different similarity measures for the set of observed features used in the data clustering model.
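To make the last point concrete, the following is a minimal sketch of how a genetic algorithm might search for weight coefficients that combine several similarity measures. It is an illustration under stated assumptions, not the authors' implementation: the toy records, the reference labels, the three similarity measures, the fitness function (mean within-cluster similarity minus mean between-cluster similarity) and the operators (tournament selection, blend crossover, Gaussian mutation, elitism) are all hypothetical choices that the abstract does not specify.

```python
import random
from itertools import combinations

def numeric_sim(a, b):
    """Similarity in (0, 1] that decreases with absolute difference."""
    return 1.0 / (1.0 + abs(a - b))

def jaccard_sim(a, b):
    """Jaccard similarity of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical toy records: ((numeric feature, token set, noisy feature), reference label).
RECORDS = [
    ((1.0, {"data", "store"}, 5.0), 0),
    ((1.2, {"data", "index"}, 0.1), 0),
    ((0.9, {"store", "index"}, 9.0), 0),
    ((8.0, {"gene", "cell"}, 4.8), 1),
    ((8.3, {"gene", "protein"}, 0.3), 1),
    ((7.9, {"cell", "protein"}, 9.2), 1),
]

# One similarity measure per observed feature (assumed for illustration).
MEASURES = [
    lambda x, y: numeric_sim(x[0], y[0]),
    lambda x, y: jaccard_sim(x[1], y[1]),
    lambda x, y: numeric_sim(x[2], y[2]),
]

def normalize(weights):
    """Rescale non-negative weights so that they sum to 1."""
    total = sum(weights)
    if total <= 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]

def fitness(weights):
    """Mean within-cluster minus mean between-cluster weighted similarity."""
    intra, inter = [], []
    for (x, lx), (y, ly) in combinations(RECORDS, 2):
        s = sum(w * m(x, y) for w, m in zip(weights, MEASURES))
        (intra if lx == ly else inter).append(s)
    return sum(intra) / len(intra) - sum(inter) / len(inter)

def evolve(pop_size=20, generations=40, mutation_sigma=0.1, seed=0):
    """Search for a weight vector that maximizes the fitness above."""
    rng = random.Random(seed)
    k = len(MEASURES)
    # Start from uniform weights plus random candidates.
    population = [[1.0 / k] * k]
    population += [normalize([rng.random() for _ in range(k)])
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = ranked[:2]  # elitism: the two best survive unchanged
        while len(next_gen) < pop_size:
            # Tournament selection of two parents.
            p1 = max(rng.sample(ranked, 3), key=fitness)
            p2 = max(rng.sample(ranked, 3), key=fitness)
            # Blend crossover followed by Gaussian mutation, clipped at zero.
            alpha = rng.random()
            child = [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]
            child = [max(0.0, g + rng.gauss(0.0, mutation_sigma)) for g in child]
            next_gen.append(normalize(child))
        population = next_gen
    return max(population, key=fitness)

if __name__ == "__main__":
    best = evolve()
    print("weights:", [round(w, 3) for w in best], "fitness:", round(fitness(best), 3))
```

In this sketch the third feature is deliberately uninformative, so a reasonable search should shift weight away from it. In an unsupervised setting, where no reference labels exist, the fitness would have to be replaced by an internal cluster-quality criterion; that choice is left open here.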