The COVID-19 pandemic created new demands for services in the judicial system, and meeting them called for a data warehouse (DW). Although some approaches apply DWs in the judicial domain, few target the pandemic or make the information extracted from the texts publicly available. To meet the needs of a legal expert, we developed the COVID-19 Portal, which extracts documents from the Brazilian Supreme Federal Court to obtain quantitative information on the words used in their texts. In this paper, we present the design of a DW and show the query performance improvement achieved with its implementation. The DW was developed on Postgres, and its performance is compared with that of the original implementation on MongoDB Cloud and of a local MongoDB database.
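The abstract does not describe the DW schema. As a hedged illustration only, the sketch below assumes a star schema in which a fact table stores per-document word counts, linked to document and word dimension tables, and answers an aggregate query of the kind the portal serves (word totals since a given decision date). It uses Python's standard `sqlite3` module as a stand-in for Postgres; every table, column, and function name (`dim_document`, `dim_word`, `fact_word_count`, `load`, `word_totals`) is hypothetical, and the whitespace tokenizer is a simplification.

```python
import sqlite3
from collections import Counter

# Hypothetical star schema: one fact table of word counts keyed by
# document and word dimensions. Names are illustrative, not the authors'.
SCHEMA = """
CREATE TABLE dim_document (
    doc_id INTEGER PRIMARY KEY,
    decision_date TEXT NOT NULL          -- ISO date, so text comparison works
);
CREATE TABLE dim_word (
    word_id INTEGER PRIMARY KEY,         -- auto-assigned rowid
    word TEXT UNIQUE NOT NULL
);
CREATE TABLE fact_word_count (
    doc_id INTEGER REFERENCES dim_document(doc_id),
    word_id INTEGER REFERENCES dim_word(word_id),
    occurrences INTEGER NOT NULL,
    PRIMARY KEY (doc_id, word_id)
);
"""


def load(conn, docs):
    """Insert documents given as {doc_id: (decision_date, text)} and their word counts."""
    for doc_id, (date, text) in docs.items():
        conn.execute("INSERT INTO dim_document VALUES (?, ?)", (doc_id, date))
        # Simplified tokenizer: lowercase and split on whitespace.
        for word, n in Counter(text.lower().split()).items():
            conn.execute("INSERT OR IGNORE INTO dim_word(word) VALUES (?)", (word,))
            (word_id,) = conn.execute(
                "SELECT word_id FROM dim_word WHERE word = ?", (word,)
            ).fetchone()
            conn.execute(
                "INSERT INTO fact_word_count VALUES (?, ?, ?)", (doc_id, word_id, n)
            )


def word_totals(conn, since):
    """Total occurrences per word in documents decided on or after `since`."""
    return conn.execute(
        """
        SELECT w.word, SUM(f.occurrences) AS total
        FROM fact_word_count f
        JOIN dim_word w ON w.word_id = f.word_id
        JOIN dim_document d ON d.doc_id = f.doc_id
        WHERE d.decision_date >= ?
        GROUP BY w.word
        ORDER BY total DESC, w.word
        """,
        (since,),
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load(conn, {1: ("2020-03-20", "covid habeas corpus covid"),
                2: ("2020-05-02", "covid lockdown")})
    print(word_totals(conn, "2020-01-01"))
    # [('covid', 3), ('corpus', 1), ('habeas', 1), ('lockdown', 1)]
```

Precomputing counts in a fact table turns a query over raw texts into a join and a `GROUP BY` over small integer keys, which is the usual reason a DW speeds up this kind of aggregate query.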
Text classification is one of the tasks of Natural Language Processing (NLP). In this area, Graph Convolutional Networks (GCNs) have achieved better results than CNNs and other related models. For a GCN, the metric that defines the correlation between words in a vector space plays a crucial role in classification, because it determines the weight of the edge between two words (represented by nodes in the graph). In this study, we empirically investigated the impact of thirteen distance/similarity measures. Each document was represented using word embeddings from a word2vec model. In addition, for each measure analyzed, a graph-based representation of each of five datasets was created, in which each word is a node and each edge is weighted by the distance/similarity between the two words it connects. Finally, each model was run on a simple graph neural network. The results show that, for text classification, there is no statistically significant difference between the analyzed metrics and the Graph Convolutional Network. Even when external words or external knowledge were incorporated, the results were similar to those of the methods without them. However, the results indicate that some distance metrics capture context better than others, with Euclidean distance reaching the best values or being statistically similar to the best.
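The following is a minimal sketch of the word-graph construction and propagation step described above, not the authors' implementation. It assumes toy 2-d vectors in place of word2vec embeddings, shows only two of the thirteen measures (cosine similarity and Euclidean distance), converts a distance d into an edge weight with the hypothetical rule 1 / (1 + d), clips negative similarities to 0, and uses the symmetric-normalized propagation rule from Kipf and Welling's GCN as the "simple graph neural network". All function names are illustrative.

```python
import numpy as np


def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def euclidean_distance(u: np.ndarray, v: np.ndarray) -> float:
    return float(np.linalg.norm(u - v))


def build_adjacency(vectors, measure, is_distance):
    """Word graph: node i is word i; edge (i, j) is weighted by the measure.

    Assumption: distances become weights via 1 / (1 + d) so closer words get
    heavier edges; negative similarities are clipped to 0 so degrees stay positive.
    """
    n = len(vectors)
    adj = np.eye(n)  # self-loops, as in the standard GCN formulation
    for i in range(n):
        for j in range(i + 1, n):
            value = measure(vectors[i], vectors[j])
            weight = 1.0 / (1.0 + value) if is_distance else max(value, 0.0)
            adj[i, j] = adj[j, i] = weight  # undirected graph
    return adj


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(D^-1/2 A D^-1/2 H W)."""
    inv_sqrt_deg = 1.0 / np.sqrt(adj.sum(axis=1))
    norm_adj = adj * inv_sqrt_deg[:, None] * inv_sqrt_deg[None, :]
    return np.maximum(norm_adj @ features @ weights, 0.0)


if __name__ == "__main__":
    # Toy vectors standing in for word2vec embeddings of three words.
    vectors = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
    weights = np.random.default_rng(0).normal(size=(2, 4))  # untrained layer
    for name, measure, is_dist in [("cosine", cosine_similarity, False),
                                   ("euclidean", euclidean_distance, True)]:
        adj = build_adjacency(vectors, measure, is_dist)
        out = gcn_layer(adj, vectors, weights)
        print(name, np.round(adj, 3).tolist(), out.shape)
```

Swapping `measure` is the only change needed to compare metrics, which mirrors the study's setup of one graph per measure. Other distance-to-weight conversions (for example, a Gaussian kernel) are equally plausible and would change the edge weights.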