Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining 2005
DOI: 10.1145/1081870.1081969

Generation of synthetic data sets for evaluating the accuracy of knowledge discovery systems

Abstract: Information Discovery and Analysis Systems (IDAS) are designed to correlate multiple sources of data and use data mining techniques to identify potential significant events. Application domains for IDAS are numerous and include the emerging area of homeland security. Developing test cases for an IDAS requires background data sets into which hypothetical future scenarios can be overlaid. The IDAS can then be measured in terms of false positive and false negative error rates. Obtaining the test data sets can be a…

citations
Cited by 28 publications
(17 citation statements)
references
References 9 publications
“…In a few approaches the underlying model is well described and defined, e.g., by intersecting planes [72] or variations of the SwissRoll [97]. Other approaches [55,56] use rules and statistics to encode relationships between data instances (e.g., older person implies higher income) and allow one to insert anomalies for different applications. The data is typically created in a black-box manner, making its scope and validity hard to grasp.…”
Section: Multi-dimensional Data Visualization (mentioning)
confidence: 99%
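The rule-based style of generation described in this statement can be illustrated with a minimal sketch. It is not the method of the cited paper or of references [55,56]: the age-to-income rule, the noise level, the anomaly offset, and the function names `generate_people` and `error_rates` are all assumptions chosen for illustration. The sketch encodes one relationship (an older person tends to have a higher income), inserts labelled anomalies that break it, and then scores a simple threshold detector by its false positive and false negative rates, the two measures named in the abstract.

```python
import random


def generate_people(n, anomaly_rate=0.05, seed=0):
    """Return n synthetic records as dicts with age, income, and is_anomaly."""
    rng = random.Random(seed)  # fixed seed keeps the data set reproducible
    records = []
    for _ in range(n):
        age = rng.randint(18, 80)
        # Hypothetical rule: expected income grows linearly with age, plus noise.
        income = 15000 + 800 * age + rng.gauss(0, 3000)
        is_anomaly = rng.random() < anomaly_rate
        if is_anomaly:
            # Deliberately break the rule: income far above what it predicts.
            income += 100000
        records.append(
            {"age": age, "income": round(income, 2), "is_anomaly": is_anomaly}
        )
    return records


def error_rates(records, threshold=50000):
    """Flag records whose income exceeds the rule's prediction by more than
    threshold, and return (false_positive_rate, false_negative_rate)."""
    fp = fn = pos = neg = 0
    for r in records:
        predicted = 15000 + 800 * r["age"]
        flagged = (r["income"] - predicted) > threshold
        if r["is_anomaly"]:
            pos += 1
            fn += not flagged  # inserted anomaly that the detector missed
        else:
            neg += 1
            fp += flagged  # background record wrongly flagged
    return (fp / neg if neg else 0.0, fn / pos if pos else 0.0)


if __name__ == "__main__":
    data = generate_people(1000, seed=1)
    fpr, fnr = error_rates(data)
    print(f"false positive rate: {fpr:.3f}, false negative rate: {fnr:.3f}")
```

Because the generator records which rows were inserted as anomalies, the ground truth is known and both error rates can be computed exactly, which is the main appeal of synthetic test data for this kind of evaluation.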
“…However, it was claimed that based on the heuristic devised, the system could be extended to handle three or higher dimensional data. Jeske et al (2005) proposed an architecture for an information discovery analysis system data and scenario generator that generates synthetic datasets on a to-be-decided semantic graph. Based on this architecture, Lin et al (2006) developed a prototype of this system, which is capable of generating synthetic data for a particular scenario, such as credit card transactions.…”
Section: Related Work (mentioning)
confidence: 99%
“…A solution to these problems could be using synthetic generated data with intrinsic patterns. There are a number of approaches and techniques that have been developed for generating synthetic data (Coyle et al, 2013, Frasch et al, 2011, van der Walt and Bernard, 2007, Sanchez-Monedero et al, 2013, Jeske et al, 2005, Lin et al, 2006, and Pei and Zaiane, 2006. However, since each of the previous research was either focused on a particular category, such as clustering, or using some special techniques, there are still spaces for further research.…”
Section: Introduction (mentioning)
confidence: 99%
“…There are a number of approaches and techniques that have been developed for generating synthetic data (Coyle et al, 2013, Frasch et al, 2011, van der Walt and Bernard, 2007, Sanchez-Monedero et al, 2013, Jeske et al, 2005, Lin et al 2006, and Pei and Zaiane, 2006. However, since each of the previous research was either focused on a particular category, such as clustering, or using some special techniques, there are still spaces for further research.…”
Section: Introduction (mentioning)
confidence: 99%