The authors present two programs of their own design, Augur and iWizard-E, which apply artificial intelligence methods to discover hidden patterns in statistical data. Augur is based on an association rule mining algorithm, and iWizard-E on a modified CART decision tree. To make the comparative analysis more reliable, the study also used four third-party intelligent systems (Deductor, Orange, KNIME and WizWhy) and two statistical datasets, each containing sixteen patterns. A series of seven experiments showed that iWizard-E significantly outperforms Augur, owing to its more advanced algorithm.
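To illustrate what a pattern produced by association rule mining looks like, the sketch below computes support and confidence for single-item rules by brute-force enumeration. It is a generic textbook illustration, not Augur's algorithm. The toy records, the item names (such as `math_high`), and the thresholds `min_support` and `min_confidence` are hypothetical assumptions chosen only for the example.

```python
from itertools import permutations


def support(records, itemset):
    """Fraction of records that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= r for r in records) / len(records)


def confidence(records, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A ∪ C) / support(A)."""
    both = set(antecedent) | set(consequent)
    return support(records, both) / support(records, antecedent)


def mine_pair_rules(records, min_support=0.3, min_confidence=0.7):
    """Brute-force search for rules {a} -> {c} that meet both thresholds.

    Returns (antecedent, consequent, support, confidence) tuples.
    """
    items = sorted(set().union(*records))
    found = []
    for a, c in permutations(items, 2):
        s = support(records, {a, c})
        if s >= min_support:  # s > 0 here, so support({a}) > 0 as well
            conf = confidence(records, {a}, {c})
            if conf >= min_confidence:
                found.append((a, c, s, conf))
    return found


# Hypothetical toy data: each record is the set of attributes of one student.
records = [
    {"math_high", "engineering_graduated"},
    {"math_high", "engineering_graduated"},
    {"math_high", "physics_high", "engineering_graduated"},
    {"math_low", "humanities_graduated"},
    {"math_low", "humanities_graduated"},
]

for a, c, s, conf in mine_pair_rules(records):
    print(f"{a} -> {c}  support={s:.2f}  confidence={conf:.2f}")
```

On this toy data, `physics_high` appears in only one of five records, so rules involving it fall below the support threshold. A decision tree such as CART instead partitions the data recursively on the attributes, which is one reason the two approaches can surface different patterns.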
Intelligent decision support systems process large amounts of data, which often contain duplicate records that slow down the building of predictive models. The iWizard-E system, designed to help university applicants choose a field of study, can remove duplicate records from the data before building a predictive model. The paper analyzes how this function affects the system's operation. To this end, a series of experiments was conducted on various samples containing students' individual features and information about their graduation from the university; for each sample, the system generated recommendations on the choice of a preferred field of study. The samples were drawn from a set containing only unique records. The system's recommendations were then compared with the real data, using the F-measure as the quality criterion. Duplicate removal was found to improve the quality of iWizard-E's results. This is of high practical significance: it reduces the amount of data needed to build reliable predictive models and, consequently, to give applicants reliable recommendations. It also reduces the time required to build the predictive models.
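The two evaluation steps in the abstract, exact-duplicate removal and scoring recommendations with the F-measure, can be sketched as follows. This is a minimal sketch under stated assumptions: records are hashable tuples, "duplicate" means an exact match, and the score is the common F-beta form with beta = 1 (F1) computed for a single target class. The abstract does not say which F-measure variant was used, and all function names and data here are hypothetical.

```python
def deduplicate(records):
    """Drop exact duplicate records, keeping the first occurrence in order."""
    return list(dict.fromkeys(records))


def f_measure(y_true, y_pred, positive, beta=1.0):
    """F-beta score for one class; beta=1 gives the usual F1 (harmonic mean)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Hypothetical records: (student feature, field the student graduated in).
raw = [("math_high", "eng"), ("math_high", "eng"), ("math_low", "hum")]
unique = deduplicate(raw)

# Hypothetical comparison of actual outcomes with the system's recommendations.
actual = ["eng", "eng", "hum", "eng"]
recommended = ["eng", "hum", "hum", "eng"]
print(len(unique), f_measure(actual, recommended, positive="eng"))
```

Here precision for `"eng"` is 2/2 and recall is 2/3, so F1 is about 0.8. Keeping the first occurrence via `dict.fromkeys` preserves the original record order, which matters only if later steps depend on it.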