Proceedings of the 21st Annual International Conference on Supercomputing 2007
DOI: 10.1145/1274971.1274975

Scalability of the Nutch search engine

Cited by 22 publications (15 citation statements)
References 9 publications
“…We studied the scalability of P_orgl and P_ghtl by doing experiments on IBM intranet website data as used in [1]. The text data was extracted from HTML files and loaded equally into the memory of the producer nodes, before, the indexing time measurement is started.…”
Section: Methods (mentioning)
confidence: 99%
“…We get peak indexing rate within a single-index group(G = 64) of about 2.44 GB/min. Now assuming search is scalable to around 2K nodes [1], if we have 2K such independent indexgroups, each of size 64 nodes, we will get a peak indexing rate of around 5 TB/min, while maintaining acceptable search performance. As part of our experiment, we instead used 8K nodes and got a peak indexing rate 312 GB/min.…”
Section: Strong Scalability Study (mentioning)
confidence: 99%
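
The arithmetic behind the rates quoted in the statement above can be checked with a short sketch. It assumes, as the quoted passage appears to, that independent index groups scale linearly, so the aggregate rate is the per-group peak multiplied by the number of groups. It also assumes "2K" and "8K" mean 2048 and 8192 and that 1 TB = 1000 GB; the function and constant names are hypothetical and come from neither paper.

```python
# Figures taken from the quoted citation statement.
PEAK_RATE_PER_GROUP_GB_MIN = 2.44  # peak indexing rate of one index group
NODES_PER_GROUP = 64               # G = 64 nodes per index group


def groups_for_nodes(total_nodes, nodes_per_group=NODES_PER_GROUP):
    """Number of full index groups that fit in total_nodes."""
    return total_nodes // nodes_per_group


def aggregate_rate_gb_per_min(num_groups, rate_per_group=PEAK_RATE_PER_GROUP_GB_MIN):
    """Aggregate rate under the assumption of linear scaling across groups."""
    return num_groups * rate_per_group


# Projection: 2K independent groups (assumed to mean 2048 groups).
projected_tb_per_min = aggregate_rate_gb_per_min(2 * 1024) / 1000

# Experiment: 8K nodes (assumed to mean 8192 nodes) split into groups of 64.
experiment_groups = groups_for_nodes(8 * 1024)
experiment_gb_per_min = aggregate_rate_gb_per_min(experiment_groups)

print(f"projected rate:  {projected_tb_per_min:.2f} TB/min")
print(f"8K-node groups:  {experiment_groups}")
print(f"8K-node rate:    {experiment_gb_per_min:.2f} GB/min")
```

Under these assumptions, 8192 nodes form 128 groups, and 128 × 2.44 GB/min is about 312 GB/min, consistent with the reported figure; 2048 groups give roughly 5 TB/min, consistent with the projection.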
“…Nutch is particularly well suited for scaling out with a large number of commodity hardware [32,33].…”
Section: Building An Inverted Index (mentioning)
confidence: 99%