2013
DOI: 10.1016/j.jpdc.2012.10.001

Accelerating text mining workloads in a MapReduce-based distributed GPU environment

Abstract: Scientific computations have been using GPU-enabled computers successfully, often relying on distributed nodes to overcome the limitations of device memory. Only a handful of text mining applications benefit from such infrastructure. Since the initial steps of text mining are typically data-intensive, and the ease of deployment of algorithms is an important factor in developing advanced applications, we introduce a flexible, distributed, MapReduce-based text mining workflow that performs I/O-bound operations on…

citations
Cited by 19 publications
(6 citation statements)
references
References 49 publications
“…It is an abstraction that allows users to easily create parallel applications while hiding the details of data distribution, load balancing, and fault tolerance. At present, it is popular in text mining of various applications, especially natural language processing (NLP) and machine learning [8], [31], [37]. Laclavik et al presented a pattern of annotation tool based on the MapReduce architecture to process large amount of text data [13].…”
Section: Related Work (mentioning)
confidence: 99%
“…So MapReduce is able to handle large amount of data processing problem which is difficult to use general servers. Now it is popular in text mining of various applications [18], especially Natural Language Processing (NLP) and Machine Learning (ML), as the MapReduce paradigm has emerged as a highly successful programing model for large-scale data-intensive computing applications [19]. Laclavik et al presented a pattern of annotation tool based on MapReduce architecture to process large amount of text data [20].…”
Section: Related Work (mentioning)
confidence: 99%
“…This flexibility allows Lucene's API to be independent of the file format. Text from PDFs, HTML, Microsoft Word, and OpenDocument documents, can all be indexed as long as their textual information can be extracted [14]. Since the search operations of Lucene are performed in the indexed file, the metadata records, which are stored in relational database, should be converted to the indexed file in advance.…”
Section: Metadata Retrieval (mentioning)
confidence: 99%