The World Wide Web (WWW) contains an enormous number of web pages that users access to search for information. Web pages are formatted using HyperText Markup Language (HTML), and all web pages, pictures, videos, and other online content can be accessed via a web browser. This makes the Web a very useful source of information. Information retrieval systems help retrieve the relevant information from web documents. The retrieval process involves three stages: identifying the documents to be processed, writing the query, and using a search mechanism to retrieve the relevant information. Given the demand for effective page ranking, we discuss how HTML tag structure information can be used in the search mechanism to improve the efficiency of web page information retrieval and return relevant information.
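One way tag structure could feed into ranking is sketched below: query terms that appear inside prominent tags count for more than terms in ordinary body text. This is a minimal illustration only, not the method of the paper. The names `TAG_WEIGHTS`, `TagWeightedIndexer`, and `score_page`, and the weight values themselves, are assumptions introduced here.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical weights per tag (illustrative values, not from the source).
TAG_WEIGHTS = {"title": 5.0, "h1": 4.0, "h2": 3.0, "b": 2.0, "strong": 2.0}
DEFAULT_WEIGHT = 1.0
# Void elements never receive an end tag, so they are kept off the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagWeightedIndexer(HTMLParser):
    """Accumulates a weighted count for each term, based on enclosing tags."""

    def __init__(self):
        super().__init__()
        self.stack = []          # currently open tags, outermost first
        self.scores = Counter()  # term -> accumulated weight

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed inner tags.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # A term takes the highest weight among the tags that enclose it.
        weight = max(
            (TAG_WEIGHTS.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for term in re.findall(r"[a-z0-9]+", data.lower()):
            self.scores[term] += weight


def score_page(html, query):
    """Return the summed tag-weighted score of the query terms in one page."""
    indexer = TagWeightedIndexer()
    indexer.feed(html)
    indexer.close()
    terms = re.findall(r"[a-z0-9]+", query.lower())
    return sum(indexer.scores[t] for t in terms)
```

Under these assumed weights, a page whose title contains the query terms scores higher than a page that mentions them only in a paragraph, so sorting pages by `score_page` gives a simple tag-aware ranking.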
In the huge network of the World Wide Web, web pages contain a large amount of information. Web research often requires the main content (e.g., an article's text) of web pages to be gathered, processed, and stored quickly and efficiently. Mining data on the Web has become a major task for locating useful information. Web pages that carry useful information usually also contain large amounts of noise, such as navigation bars, links, advertisements, and copyright notices. The performance of web mining can be improved by identifying and removing this noise from web pages. In this paper, a new method is proposed that removes noise content tags and extracts the information in the main content tags of web pages.
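The idea of dropping noise tags and keeping main content can be sketched as follows. The abstract does not say which tags are treated as noise, so the `NOISE_TAGS` set is an assumption, as are the names `MainContentExtractor` and `extract_main_content`. This is a simplified illustration, not the proposed method itself.

```python
from html.parser import HTMLParser

# Assumed noise containers; the source does not list specific tags.
NOISE_TAGS = {"nav", "footer", "aside", "header", "form", "script", "style"}


class MainContentExtractor(HTMLParser):
    """Collects text that is not nested inside any assumed noise tag."""

    def __init__(self):
        super().__init__()
        self.noise_depth = 0  # number of noise tags currently open
        self.chunks = []      # text fragments kept as main content

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep text only when no noise tag encloses it.
        if text and self.noise_depth == 0:
            self.chunks.append(text)


def extract_main_content(html):
    """Return the page text with assumed noise regions removed."""
    parser = MainContentExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

A fixed tag list like this only catches noise that is marked up with semantic tags. Pages that build navigation bars or advertisements out of generic `div` elements would need additional signals, such as link density or class names.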