2009
DOI: 10.1109/mis.2009.36

The Unreasonable Effectiveness of Data

Cited by 1,356 publications (743 citation statements)
References 10 publications
“…When dealing with today's large-scale data sets, many data mining practitioners therefore often abandon deterministic approaches and resort to randomized approaches. However, huge data such as collections of online books at Amazon™, image repositories at Flickr™ or Google™, or personal health records [16,19,23,24,29] are becoming ever more common and thus pose a challenge to research on interpretable matrix factorization.…”
Section: Introduction (mentioning)
confidence: 99%
“…In the last several years, an active and inventive group at Google, possibly inspired by Halevy, Norvig, and Pereira [37], collected and analyzed millions of tables harvested from the web [1,38,39]. Visual verification of their results has necessarily been restricted to much smaller samples.…”
Section: Physical Structure Extraction (mentioning)
confidence: 99%
“…Do other types of anomaly detectors, or more generally, learning algorithms for other data mining tasks also exhibit the gravity-defiant behaviour? In a complex domain such as natural language processing, millions of additional data has been shown to continue to improve the performance of trained models (Halevy et al 2009; Banko and Brill 2001). Is this the domain for which algorithms always comply with the learning curve?…”
Section: Implications and Potential Future Work (mentioning)
confidence: 99%