Web data mining has gained popularity in recent years with the advancement of web mining technologies. It is the extraction of data from the web: a technique for crawling various web resources to collect required information, which enables an individual or a company to promote business, understand marketing dynamics, track new promotions circulating on the Internet, and so on. Data on the web is unstructured, irregular, and lacks a fixed, unified pattern, because it is presented in HTML, a format that describes how data is presented rather than how it is structured and that handles semi-structured or unstructured data poorly. These difficulties led to the emergence of XML-based web data mining. XML was created so that richly structured documents could be used over the web, and it provides a standard for data exchange and data storage. This paper presents a web data mining model based on XML. In this model, unstructured data is first transformed to XML, and the XML document is then stored in a database in the form of a string tree; specific records are then searched using a LINQ query. If a record does not exist in the database, the model checks the specific website for updates and repeats the same steps. Finally, the data selected by the LINQ query is displayed in a web browser. The feature that increases the speed of data extraction and reduces extraction time is the database itself, which stores data extracted earlier by one user so that other users can retrieve it by passing a LINQ query. The model needs no separate XSL file, because it stores the XML document in the database in the form of a string tree. The model is implemented in C# with XML.
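The pipeline described above (HTML to XML, XML stored as a string in a database, query, re-extraction on a miss) can be sketched roughly as follows. This is an illustrative sketch only, not the paper's implementation: the paper uses C# and LINQ, whereas this example uses Python, with a list comprehension standing in for the LINQ query. The table layout, the `pages` schema, the per-URL cache key, the example URL, and the helper names `html_to_xml`, `fetch_page`, and `find_items` are all assumptions made for illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical page content; a real system would download this from the site.
SAMPLE_HTML = """
<html><body><table>
<tr><td>Laptop</td><td>900</td></tr>
<tr><td>Phone</td><td>400</td></tr>
</table></body></html>
"""


class RowCollector(HTMLParser):
    """Collects the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


def html_to_xml(html):
    """Step 1: turn presentation-oriented HTML rows into an XML string."""
    parser = RowCollector()
    parser.feed(html)
    root = ET.Element("items")
    for name, price in parser.rows:
        item = ET.SubElement(root, "item")
        ET.SubElement(item, "name").text = name
        ET.SubElement(item, "price").text = price
    return ET.tostring(root, encoding="unicode")


def fetch_page(url):
    """Placeholder for a real download (assumption); returns canned HTML."""
    return SAMPLE_HTML


def find_items(conn, url, max_price):
    """Reuse stored XML if present; otherwise extract, store, then query."""
    row = conn.execute("SELECT xml FROM pages WHERE url = ?", (url,)).fetchone()
    if row is None:
        # Miss: go back to the website and repeat the extraction steps.
        xml_text = html_to_xml(fetch_page(url))
        conn.execute("INSERT INTO pages (url, xml) VALUES (?, ?)", (url, xml_text))
    else:
        xml_text = row[0]
    root = ET.fromstring(xml_text)
    # Comprehension standing in for the LINQ query over the XML tree.
    return [
        item.findtext("name")
        for item in root.iter("item")
        if int(item.findtext("price")) <= max_price
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT PRIMARY KEY, xml TEXT)")
print(find_items(conn, "https://example.com/products", 500))  # prints ['Phone']
```

A second call with the same URL finds the stored XML string and skips extraction, which is the source of the speed-up the abstract attributes to the database. The sketch caches per page rather than per record, a simplification of the record-level check described above.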