An Approach of Web Scraping on News Website based on Regular Expression

Maududie, Achmad; Yulia, Endah; Rohim, Muhamat Abdul

doi:10.1109/eiconcit.2018.8878550

Cited by 12 publications

(4 citation statements)

References 12 publications


“…Achmad M. et al. [4] describe a method for automatically retrieving the title, publication date, author, clean article text, and URL of a news article from the HTML pages of three news websites, namely Detik, Tribunnews, and Liputan 6, without manually copying and pasting the information. This method consists of three steps: analyzing the structure of news websites, creating Regex patterns, and implementing the patterns as a set of rules for web scraping.…”

Section: Related Works (mentioning)

confidence: 99%
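The three steps named in the citation statement above (analyze the page structure, write Regex patterns, apply them as scraping rules) can be illustrated with a minimal Python sketch. The HTML fragment, the class names, and the `PATTERNS` and `scrape` names are all hypothetical; they are not the patterns used in the cited paper, and real news pages will need patterns written against their own markup.

```python
import re

# Step 1 (structure analysis): a hypothetical fragment standing in for a
# news page. Tag and class names here are assumptions, not real site markup.
SAMPLE_HTML = """
<html><head><title>Example headline</title></head>
<body>
<h1 class="title">Example headline</h1>
<span class="author">Jane Doe</span>
<time datetime="2018-10-30">30 October 2018</time>
<div class="content"><p>First paragraph.</p><p>Second paragraph.</p></div>
</body></html>
"""

# Step 2 (pattern creation): one pattern per field, each with one capture group.
PATTERNS = {
    "title": re.compile(r'<h1 class="title">(.*?)</h1>', re.S),
    "author": re.compile(r'<span class="author">(.*?)</span>', re.S),
    "date": re.compile(r'<time datetime="([^"]+)"'),
    "body": re.compile(r'<div class="content">(.*?)</div>', re.S),
}
TAG = re.compile(r"<[^>]+>")  # used to strip leftover markup from a match


def scrape(html):
    """Step 3 (rule application): run every pattern and clean the capture."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(html)
        if match is None:
            record[field] = None  # field not found on this page
            continue
        text = TAG.sub(" ", match.group(1))    # replace inner tags with spaces
        record[field] = " ".join(text.split())  # collapse whitespace
    return record


article = scrape(SAMPLE_HTML)
```

Keeping the patterns in a dictionary means a second site can be supported by swapping in another set of patterns without touching `scrape`. Regex rules of this kind are sensitive to markup changes, so a missing field is recorded as `None` rather than raising an error.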

Metadata Scraping Using Programmable Customized Search Engine

2023

IJCCCE


The World Wide Web (WWW) is a vast repository of knowledge, including intellectual, social, financial, and security-related data. Online information is typically accessed for instructional purposes. Because information on the internet is published in a variety of formats and through different access interfaces, indexing or semantically processing website data can be difficult. Web data scraping addresses this issue: it converts unstructured web data into structured data that can be stored and examined in a central local database or spreadsheet. This paper presents a metadata scraping system based on a programmable Customized Search Engine (CSE), which extracts metadata from web pages (HTML pages) in the Google database and saves it in XML format for later analysis and retrieval. Documents that contain metadata are a relatively recent phenomenon on the web and increase the likelihood that users will find the information they need.

Index Terms: Programmable CSE, JSON API, API key, metadata scraping.
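The JSON-to-XML step described in this abstract might look roughly like the Python sketch below. It makes no network call: `SAMPLE_RESPONSE` is a hand-written stand-in shaped like a search API result (an `items` list whose entries carry `title`, `link`, `snippet`, and `pagemap.metatags`), and that shape, the XML element names, and the `items_to_xml` function are assumptions for illustration rather than the system built in the paper.

```python
import json
import xml.etree.ElementTree as ET

# Hand-written stand-in for a JSON search response (assumed shape).
SAMPLE_RESPONSE = json.loads("""
{"items": [
  {"title": "Example page",
   "link": "https://example.com/a",
   "snippet": "A short description.",
   "pagemap": {"metatags": [{"og:type": "article", "author": "Jane Doe"}]}}
]}
""")


def items_to_xml(response):
    """Build a <results> tree with one <page> element per search item."""
    root = ET.Element("results")
    for item in response.get("items", []):
        page = ET.SubElement(root, "page", url=item.get("link", ""))
        ET.SubElement(page, "title").text = item.get("title", "")
        ET.SubElement(page, "snippet").text = item.get("snippet", "")
        # Each metatags entry is a flat name -> value mapping.
        for tags in item.get("pagemap", {}).get("metatags", []):
            for name, value in tags.items():
                meta = ET.SubElement(page, "meta", name=name)
                meta.text = str(value)
    return root


xml_root = items_to_xml(SAMPLE_RESPONSE)
xml_text = ET.tostring(xml_root, encoding="unicode")  # serialized XML string
```

Metadata names such as `og:type` are stored in a `name` attribute rather than used as element names, because a colon in an XML element name would be read as a namespace prefix.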



“…The websites include Liputan6.com, Detik.com and Tribunnews.com. These three news websites were chosen because they have a high level of access in Indonesia [2].…”

Section: News Website Selection (mentioning)

confidence: 99%

“…The amount of news that appears every day becomes a problem when news websites do not provide API services for downloading it. The copy and paste method cannot be used to get news from news websites every day because it would take a very long time [2]. The web scraping technique can solve this problem because it can retrieve data from a website quickly.…”

Section: Introduction (mentioning)

confidence: 99%

Web Scraping with HTML DOM Method for Website News API creation

Firdian, Darwiyanto, Adrian

2022

JIPI (Jurnal Ilmiah Penelitian dan Pembelajaran Informatika)


Information is essential in this era, and news is one kind of information that appears every day. The volume of daily news becomes a problem when news websites do not provide API (Application Programming Interface) services for retrieving it. This is an obstacle for researchers who want to analyze news topics. Copying and pasting is ineffective for collecting news from news websites every day because it takes a long time. In this research, web scraping is done with the HTML (Hypertext Markup Language) DOM (Document Object Model) method to retrieve data from news sites. The scraping results are datasets, which are stored in a database and exposed through an API. The API is tested with black box testing and with a data suitability test that compares the scraped data against the data on the news website at the time of testing. The black box tests show that the API's filters work as intended, and the suitability test shows a high percentage of data conformity: 99.2% for Tribunnews.com, 97.9% for Detik.com, and 98.6% for Liputan6.com.
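As a rough illustration of DOM-based scraping and of a data conformity check, the Python sketch below parses a small page into a DOM tree, walks it by tag name, and compares the result against reference records. The sample markup, the `headline` class, and the function names are hypothetical. The abstract does not define how its conformity rate is computed, so the exact-match percentage used here is an assumption. `xml.dom.minidom` only accepts well-formed markup; real news pages usually need a more lenient HTML parser.

```python
from xml.dom import minidom

# Hypothetical, well-formed page fragment (not real site markup).
SAMPLE_PAGE = """<html><body>
<h2 class="headline"><a href="/news/1">First story</a></h2>
<h2 class="headline"><a href="/news/2">Second story</a></h2>
<h2 class="other">Not a headline</h2>
</body></html>"""


def node_text(node):
    """Concatenate the text of a node and all of its descendants."""
    if node.nodeType == node.TEXT_NODE:
        return node.data
    return "".join(node_text(child) for child in node.childNodes)


def scrape_headlines(page):
    """Walk the DOM tree and collect title and URL for each headline node."""
    document = minidom.parseString(page)
    rows = []
    for heading in document.getElementsByTagName("h2"):
        if heading.getAttribute("class") != "headline":
            continue  # skip headings that are not news headlines
        links = heading.getElementsByTagName("a")
        url = links[0].getAttribute("href") if links else None
        rows.append({"title": node_text(heading).strip(), "url": url})
    return rows


def conformity_rate(scraped, reference):
    """Percentage of reference records reproduced exactly, position by position
    (an assumed definition, not the paper's)."""
    if not reference:
        return 0.0
    matches = sum(1 for got, want in zip(scraped, reference) if got == want)
    return 100.0 * matches / len(reference)


rows = scrape_headlines(SAMPLE_PAGE)
```

Selecting nodes by tag and attribute instead of by raw text position is what separates this approach from the regular-expression method of the cited paper: the rules keep working when whitespace or attribute order in the markup changes.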


“…including Sri Lanka, leading to the election of candidates who fail to fulfill their promises and contribute to societal decline [1]. To address these challenges, researchers and scholars have explored various technologies and methodologies to enhance the candidate selection process.…”

Section: Introduction (mentioning)

confidence: 99%

Online Candidate Selection System for Elections

Meregngnage H.E., Fernando J.A.T.C., Fernando W.K., et al.

2023

Int. J. Eng. Mgmt. Res.


This research proposes an innovative online platform for political parties in Sri Lanka to enhance the candidate selection process. The platform incorporates features such as sentiment analysis, background checks, aptitude tests, a ranking system, and analysis of candidates' promises and activities. Developed using advanced technologies, it aims to ensure transparency, efficiency, and accessibility for all eligible candidates, ultimately contributing to a more democratic election.

