2022
DOI: 10.1186/s40537-021-00554-3

Addressing big data variety using an automated approach for data characterization

Abstract: The creation of new knowledge from manipulating and analysing existing knowledge is one of the primary objectives of any cognitive system. Most of the effort on Big Data research has been focussed upon Volume and Velocity, while Variety, “the ugly duckling” of Big Data, is often neglected and difficult to solve. A principal challenge with Variety is being able to understand and comprehend the data. This paper proposes and evaluates an automated approach for metadata identification and enrichment in describing …

Cited by 6 publications (4 citation statements)
References 22 publications

“…These three approaches relied on the semantic parsing of the workload in-depth as a limited approach [22]. Moreover, a software-driven model-based non-semantic parsing for the big data workload was proposed in [31]. The proposed method based on metadata identification in big data focuses on the application of self-learning systems to enable automatic data compliance with legal requirements and the possibility of providing essential and easily accessible metadata for data classification.…”
Section: Background Research (mentioning)
confidence: 99%
“…Therefore, if there is no uncertainty in the membership function, this reduces to ordinary fuzzy sets. $TY2(d) = \{\langle (a_i, r),\, m_T(a_i, r) \rangle \mid a_i \in D\}$ (7) where $r \in P_x \subseteq [0, 1]$.…”
Section: Type-2 Fuzzy Sets (mentioning)
confidence: 99%
“…This can include sensor information and data ranging to the subjective interpretations obtained from expert individuals and analysts. Currently, increasingly massive amounts of heterogeneous data and information from multiple sources are prevalent where the problems of Big Data are being managed [4][5][6][7]. However, although effective decision making should be able to make use of all the available and relevant information about such combined uncertainty, an assessment of the value of a generalization result is critical.…”
Section: Introduction (mentioning)
confidence: 99%
“…Another analytical difficulty faced by analysts when using “big data” is that, unlike more consciously constructed research projects where data collection and construction can be carefully planned, datasets collected via mobile computing, data digitization, and social media are often an amalgam of various data streams that, when cobbled together, have missing values for many observations (Einav & Levin, 2014; Emmanuel et al, 2021; Fan et al, 2014; Vranopoulos et al, 2022). When this occurs, a modeler may be tempted to ignore every observation that does not have a complete set of predictive attributes.…”
Section: Introduction (mentioning)
confidence: 99%