2021
DOI: 10.5614/itbj.ict.res.appl.2021.15.3.1

Development of Focused Crawlers for Building Large Punjabi News Corpus

Abstract: Web crawlers are as old as the Internet and are most commonly used by search engines to visit websites and index them into repositories. They are not limited to search engines but are also widely utilized to build corpora in different domains and languages. This study developed a focused set of web crawlers for three Punjabi news websites. The web crawlers were developed to extract quality text articles and add them to a local repository to be used in further research. The crawlers were implemented using the P…

Cited by 1 publication
References 9 publications