2012
DOI: 10.1007/s13748-011-0004-4

Scaling up data mining algorithms: review and taxonomy

Abstract: The overwhelming amount of data that are now available in any field of research poses new problems for data mining and knowledge discovery methods. Due to this huge amount of data, most of the current data mining algorithms are inapplicable to many real-world problems. Data mining algorithms become ineffective when the problem size becomes very large. In many cases, the demands of the algorithm in terms of the running time are very large, and mining methods cannot be applied when the problem grows. This aspect…

Cited by 30 publications (10 citation statements)
References 112 publications
“…The main problem of the method described above is the scalability [24]. When we deal with a large dataset, the cost of the RCGA is high.…”
Section: Constructing Supervised Projections Using a RCGA
Citation type: mentioning
Confidence: 99%
“…Big data, so far, does not have a formal definition, although it is generally accepted that the concept refers to datasets that are too large to be processed using conventional data processing tools and techniques. Contemporary information systems produce data in huge quantities that are difficult to measure [1]. It means that we already find ourselves in the "big data era," and the question of how to solve large-scale machine learning problems is open and requires a lot of research effort.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
“…It is usually assumed in the literature that linear-time algorithms are acceptable for scaling up to large datasets [8]. We also adopt this assumption.…”
Section: Introduction
Citation type: mentioning
Confidence: 99%
“…8 show the results for the SVHN and XM2VTS datasets when using spherical hashing to speed up the proposed methods. Here we name this version LSH, for Limited Spherical Hashing, since we do not use the original procedure. Instead, we use a limited version that relies on the dissimilarities rather than on the binary codification, since the latter increases the classification error, while the use of dissimilarities still provides…”
Figure captions: Accuracy and execution time results when using LSH to speed up the proposed prototype selection methods on the SVHN dataset; accuracy and execution time results when using SH to speed up the proposed prototype selection methods on the XM2VTS dataset.
Citation type: mentioning
Confidence: 99%
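The excerpt above contrasts standard spherical hashing (binary codes: inside/outside each hypersphere) with a "limited" variant that keeps the raw point-to-pivot dissimilarities. The following is a minimal sketch of that distinction, assuming Euclidean dissimilarities, randomly sampled sphere centers, and a fixed radius; all names and shapes are illustrative, not the cited authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 16))          # toy dataset: 100 points, 16 features (hypothetical)

# Pick 8 pivots (sphere centers) from the data and fix a common radius.
pivots = X[rng.choice(len(X), size=8, replace=False)]
radius = 1.0

# Dissimilarities: Euclidean distance from every point to every pivot.
# Shape (100, 8): one distance per (point, pivot) pair.
d = np.linalg.norm(X[:, None, :] - pivots[None, :, :], axis=2)

# Standard spherical hashing binarizes the distances (inside sphere -> 1).
binary_codes = (d <= radius).astype(np.uint8)

# The "limited" variant keeps the real-valued dissimilarities as the
# reduced representation instead of collapsing them to bits.
dissim_repr = d
```

Either representation reduces the data from 16 dimensions to 8 values per point; binarization is cheaper to store and compare, while the raw dissimilarities retain more information, which is the trade-off the quoted passage attributes to classification error.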