2013
DOI: 10.13053/rcs-70-1-18

Don't Use a Lot When Little Will Do: Genre Identification Using URLs

Abstract: The ever-increasing volume of data on the World Wide Web calls for vertical search engines. Sandhan is one such search engine; it offers search in the tourism and health genres in more than 10 Indian languages. In this work we build a URL-based genre identification module for Sandhan. A direct impact of this work is on building focused crawlers to gather Indian-language content. We conduct experiments on tourism and health web pages in Hindi. We experiment with three approaches: list based, naive …

Cited by 3 publications
References 10 publications