2016
DOI: 10.15439/2016f90

Massively Parallel Feature Extraction Framework Application in Predicting Dangerous Seismic Events

Abstract: In this paper we introduce an automated mechanism for knowledge discovery from data streams. As part of this work, we also present a new approach to creating classifier ensembles based on a wide variety of models. Furthermore, we describe an innovative, highly scalable feature extraction and selection framework designed to work with the MapReduce programming model, and we apply this framework to build an ensemble of classifiers which takes into account both the quality and the di…

Cited by 10 publications (7 citation statements)
References 23 publications
“…In order to construct their solution, authors were using only the data available to all participants, however, due to their organizational involvement, team snm was excluded from the final ranking. More details regarding this solution can be found in [15].…”
Section: Overview Of the Competition Results (mentioning)
confidence: 99%
“…Furthermore, we plan to consider changes in users' behavior and preferences by periodically updating frequent itemsets based on recent changes in rating history. It would also be of value to extend the users' and items' data representation by applying a more advanced feature extraction to model the similarities among them more effectively [44], [45], [46].…”
Section: Discussion (mentioning)
confidence: 99%
“…One way to do this is to train models on diverse subsets of objects and attributes [17]. The training set selection, complemented by parallelization of computation, can lead to better generalization and minimize the overall training latency [16], [18]. It is usually essential to ingest several diverse data sources to provide adequately rich data representation with many different attributes.…”
Section: Related Work (mentioning)
confidence: 99%
“…The first step in the developed pipeline is data ingestion and integration. Next, we perform data cleansing, encoding categorical variables to the numeric form, extracting some custom characteristics from text columns, imputing missing values, and conducting further feature extraction (FE) [18], [28].…”
Section: Solution Overview (mentioning)
confidence: 99%