Typically, textual information is available as unstructured data, which require processing so that data mining algorithms can handle such data; this processing is known as the pre-processing step in the overall text mining process. This paper aims at analyzing the strong impact that the pre-processing step has on most mining tasks. Therefore, we propose a methodology to vary distinct combinations of pre-processing steps and to analyze which pre-processing combination allows high precision. In order to show different combinations of pre-processing methods, experiments were performed by comparing some combinations such as stemming, term weighting, term elimination based on low frequency cut and stop words elimination. These combinations were applied in text and opinion mining tasks, from which correct classification rates were computed to highlight the strong impact of the pre-processing combinations. Additionally, we provide graphical representations from each pre-processing combination to show how visual approaches are useful to show the processing effects on document similarities and group formation (i.e., cohesion and separation).
Visual analytics techniques are useful tools to support decision-making and cope with increasing data, particularly when monitoring natural or artificial phenomena. When monitoring disease progression, visual analytics approaches help decision-makers to understand or even prevent dissemination paths. In this paper, we propose a new visual analytics tool for monitoring COVID-19 dissemination. We use k-nearest neighbors of cities to mimic neighboring cities and analyze COVID-19 dissemination based on comparing a city under consideration and its neighborhood. Moreover, such analysis is performed within periods, which facilitates the assessment of isolation policies. We validate our tool by analyzing the progression of COVID-19 in neighboring cities of São Paulo state, Brazil.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.