Abstract. Data mining applied to databases of data sequences generates a number of sequential patterns, which often require additional processing. This post-processing usually consists of searching the source databases for data sequences that contain a given sequential pattern or a part of it. This type of content-based querying is not well supported by RDBMSs, since traditional optimization techniques focus on exact-match querying. In this paper, we introduce a new bitmap-oriented index structure that optimizes content-based queries on dense databases of data sequences. Our experiments show a significant improvement over traditional database access methods.
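The abstract does not describe the index layout, so the following is only a minimal sketch of the general filter-and-verify idea behind bitmap-based content queries, not the structure proposed in the paper. It assumes one bitmap per item (bit i is set when sequence i contains the item), uses the bitwise AND of those bitmaps to prune candidates, and then verifies order on the survivors. The names `item_bitmaps`, `is_subsequence`, and `query` are hypothetical.

```python
from typing import Dict, List, Sequence


def item_bitmaps(db: List[Sequence[str]]) -> Dict[str, int]:
    """Build one integer bitmap per item; bit i is set if sequence i contains the item."""
    bitmaps: Dict[str, int] = {}
    for i, seq in enumerate(db):
        for item in set(seq):
            bitmaps[item] = bitmaps.get(item, 0) | (1 << i)
    return bitmaps


def is_subsequence(pattern: Sequence[str], seq: Sequence[str]) -> bool:
    """True if the pattern items occur in seq in the same order (gaps allowed)."""
    it = iter(seq)
    # "item in it" advances the iterator, so each match must come after the previous one.
    return all(item in it for item in pattern)


def query(db: List[Sequence[str]], bitmaps: Dict[str, int],
          pattern: Sequence[str]) -> List[int]:
    """Return the indices of sequences that contain the pattern as a subsequence."""
    # Filter step: keep only sequences that contain every item of the pattern.
    candidates = (1 << len(db)) - 1
    for item in set(pattern):
        candidates &= bitmaps.get(item, 0)
    # Verify step: the bitmap ignores order, so check it on the surviving candidates.
    return [i for i in range(len(db))
            if (candidates >> i) & 1 and is_subsequence(pattern, db[i])]


db = [list("abcd"), list("acbd"), list("bd"), list("dcba")]
bitmaps = item_bitmaps(db)
matches = query(db, bitmaps, list("abd"))
```

In this toy database the bitmap filter discards the third sequence without reading it, and the verification step rejects the fourth, which contains the right items in the wrong order.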
Data mining is a useful decision support technique that can be used to find trends and regularities in warehouses of corporate data. A serious problem in its practical application is the long processing time required by data mining algorithms. Current systems take minutes or hours to answer simple queries. In this paper, we present the concept of materialized data mining views. Materialized data mining views store selected patterns discovered in a portion of a database and are used for query rewriting, which transforms a data mining query into a query that accesses a materialized view. Since the transformation is transparent to the user, materialized data mining views can be created and used like indexes.
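As an illustration of the rewriting idea, here is a minimal sketch under simplifying assumptions that do not come from the abstract: patterns are frequent itemsets, the view was built over exactly the same transactions as the query, and a query can be rewritten only when its minimum support is at least as high as the view's. The names `mine`, `MaterializedMiningView`, and `answer` are hypothetical, and the brute-force miner is suitable only for tiny examples.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Optional

Patterns = Dict[FrozenSet[str], float]


def mine(transactions: List[FrozenSet[str]], min_support: float) -> Patterns:
    """Brute-force frequent itemset mining; enumerates every itemset, so toy data only."""
    items = sorted(set().union(*transactions)) if transactions else []
    n = len(transactions)
    result: Patterns = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            support = sum(itemset <= t for t in transactions) / n
            if support >= min_support:
                result[itemset] = support
    return result


class MaterializedMiningView:
    """Stores the patterns mined once at a fixed minimum support."""

    def __init__(self, transactions: List[FrozenSet[str]], min_support: float):
        self.min_support = min_support
        self.patterns = mine(transactions, min_support)


def answer(transactions: List[FrozenSet[str]], min_support: float,
           view: Optional[MaterializedMiningView] = None) -> Patterns:
    """Answer a mining query, rewriting it to a filter over the view when that is safe."""
    # Assumption: the view covers the same transactions as the query.
    if view is not None and min_support >= view.min_support:
        return {p: s for p, s in view.patterns.items() if s >= min_support}
    # Otherwise fall back to running the mining algorithm.
    return mine(transactions, min_support)


transactions = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("b")]
view = MaterializedMiningView(transactions, 0.5)
rewritten = answer(transactions, 0.75, view)
```

The caller uses `answer` the same way whether or not a view exists, which mirrors the transparency that lets such views be created and dropped like indexes.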
example, the use of noninvasive mechanical home ventilation (MHV) [14,28] in the care of patients with chronic respiratory failure. This kind of treatment raises ventilatory effectiveness, improves physical
Abstract. Data clustering methods have many applications in the area of data mining. Traditional clustering algorithms deal with quantitative or categorical data points. However, there exist many important databases that store categorical data sequences, where significant knowledge is hidden behind sequential dependencies between the data. In this paper, we introduce the problem of clustering categorical data sequences and present an efficient, scalable algorithm to solve it. Our algorithm implements the general idea of agglomerative hierarchical clustering and uses frequently occurring subsequences as features describing data sequences. The algorithm not only discovers a set of high-quality clusters containing similar data sequences but also provides descriptions of the discovered clusters.
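The general idea can be sketched as follows. This is a simplified illustration, not the algorithm from the paper: it assumes that features are frequent length-2 subsequences (ordered item pairs with gaps allowed), that similarity is the Jaccard coefficient of feature sets, and that a cluster is described by the union of its members' features. The function names and the `min_count` and `k` parameters are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[str, str]
Cluster = Tuple[List[int], Set[Pair]]  # (member indices, describing features)


def ordered_pairs(seq: Sequence[str]) -> Set[Pair]:
    """All length-2 subsequences of one sequence (x occurs before y, gaps allowed)."""
    return {(seq[i], seq[j]) for i, j in combinations(range(len(seq)), 2)}


def frequent_pairs(db: List[Sequence[str]], min_count: int) -> Set[Pair]:
    """Pairs that occur in at least min_count sequences."""
    counts: Dict[Pair, int] = {}
    for seq in db:
        for pair in ordered_pairs(seq):
            counts[pair] = counts.get(pair, 0) + 1
    return {pair for pair, c in counts.items() if c >= min_count}


def jaccard(a: Set[Pair], b: Set[Pair]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster(db: List[Sequence[str]], min_count: int, k: int) -> List[Cluster]:
    """Agglomerative clustering: start with singletons, merge the most similar pair."""
    features = frequent_pairs(db, min_count)
    clusters: List[Cluster] = [([i], ordered_pairs(seq) & features)
                               for i, seq in enumerate(db)]
    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda ij: jaccard(clusters[ij[0]][1], clusters[ij[1]][1]))
        members_j, feats_j = clusters.pop(j)  # j > i, so index i stays valid
        members_i, feats_i = clusters[i]
        clusters[i] = (members_i + members_j, feats_i | feats_j)
    return clusters


db = ["abc", "abcd", "xyz", "xyzw"]
result = cluster(db, min_count=2, k=2)
members = sorted(m for m, _ in result)
```

Each returned cluster carries its feature set alongside its members, so the frequent subsequences double as a human-readable description of what the grouped sequences have in common.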