This paper describes the architecture of a cross-sectorial Big Data platform for the process industry domain. The main objective was to design a scalable analytical platform that supports the collection, storage and processing of data from multiple industry domains. Such a platform should be able to connect to a plant's existing environment and use the gathered data to build predictive functions that optimize production processes. The analytical platform contains a development environment for building these functions and a simulation environment for evaluating the models. The platform is shared among multiple sites from different industry sectors, and this cross-sectorial sharing enables the transfer of knowledge across domains. During development, we adopted a user-centered approach to gather requirements from different stakeholders, which we then used to design architectural models from several viewpoints, from contextual to deployment. The deployed architecture was tested in two process industry domains: aluminium production and plastic molding.
Digital data are all around us and occur in various forms, such as videos, pictures or texts. Digital documents represent the vast majority of such data; examples include e-news articles and social media contributions. These documents can contain useful information, but because of their volume, finding the information relevant to a particular company or person is time-consuming. For that reason, they need to be analyzed automatically. One area that deals with textual data analysis is topic modeling, which offers a way to automatically browse, search and summarize an organization's data. Topic modeling can be useful for time-based analysis of crises, elections, news feeds, launches of new products on the market, and other tasks that support decision-making. In this paper, we survey and compare topic modeling methods and propose a web application that visualizes topics extracted with Latent Dirichlet Allocation (LDA). The selected standard topic modeling methods were compared experimentally on two textual datasets (20 Newsgroups and Reuters) using a standard evaluation metric. The proposed web application uses LDA to extract topic models from textual document datasets, visualize them and show their evolution over time.
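The abstract names LDA but not how it is fitted. The following is a minimal sketch, assuming collapsed Gibbs sampling (one standard inference method for LDA; the paper does not state which inference procedure or library its web application uses). The function names `lda_gibbs`, `top_words` and `topic_trend`, and the hyperparameter values `alpha` and `beta`, are illustrative assumptions. `topic_trend` shows one hypothetical way to trace topic evolution over time, by averaging document-topic proportions per period; it is not necessarily what the application does.

```python
import random
from collections import defaultdict


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Fit LDA with collapsed Gibbs sampling (illustrative sketch).

    docs: list of documents, each a list of token strings.
    Returns (vocab, phi, theta) where phi[k][w] = P(word w | topic k)
    and theta[d][k] = P(topic k | document d).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                n_dk[d][k] -= 1
                n_kw[k][wid] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][wid] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][wid] += 1
                n_k[k] += 1

    # Smoothed point estimates; each row sums to 1.
    phi = [[(n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in range(V)] for k in range(K)]
    theta = [[(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta


def top_words(vocab, phi, n=3):
    """Return the n most probable words of each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in phi]


def topic_trend(theta, periods):
    """Hypothetical evolution view: mean topic proportions per time period."""
    sums = defaultdict(lambda: [0.0] * len(theta[0]))
    counts = defaultdict(int)
    for row, p in zip(theta, periods):
        for k, v in enumerate(row):
            sums[p][k] += v
        counts[p] += 1
    return {p: [v / counts[p] for v in sums[p]] for p in sorted(sums)}


if __name__ == "__main__":
    docs = [s.split() for s in [
        "apple banana fruit apple", "fruit banana apple juice",
        "goal match team goal", "team match player goal",
    ]]
    vocab, phi, theta = lda_gibbs(docs, num_topics=2, iterations=100, seed=1)
    print(top_words(vocab, phi))
    print(topic_trend(theta, ["2020-01", "2020-01", "2020-02", "2020-02"]))
```

Gibbs sampling is used here because it needs only the standard library; production systems usually rely on an existing implementation (for example gensim or scikit-learn) instead of a hand-written sampler.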