Abstract. Portuguese juridical documents from Supreme Courts and the Attorney General's Office are manually classified by juridical experts into a set of classes belonging to a taxonomy of concepts. In this paper, a preliminary approach to develop techniques to automatically classify these juridical documents, is proposed. As basic strategy, the integration of natural language processing techniques with machine learning ones is used. Support Vector Machines (SVM) are used as learning algorithm and the obtained results are presented and compared with other approaches, such as C4.5 and Naïve Bayes.
Text classification is an important task in the legal domain. In fact, most of the legal information is stored as text in a quite unstructured format and it is important to be able to automatically classify these texts into a predefined set of concepts.Support Vector Machines (SVM), a machine learning algorithm, has shown to be a good classifier for text bases [Joachims, 2002]. In this paper, SVMs are applied to the classification of European Portuguese legal texts -the Portuguese Attorney General's Office Decisions -and the relevance of linguistic information in this domain, namely lemmatisation and part-of-speech tags, is evaluated.The obtained results show that some linguistic information (namely, lemmatisation and the part-of-speech tags) can be successfully used to improve the classification results and, simultaneously, to decrease the number of features needed by the learning algorithm.
Given the continuous increase in the global population, the food manufacturers are advocated to either intensify the use of cropland or expand the farmland, making land cover and land usage dynamics mapping vital in the area of remote sensing. In this regard, identifying and classifying a high-resolution satellite imagery scene is a prime challenge. Several approaches have been proposed either by using static rule-based thresholds (with limitation of diversity) or neural network (with data-dependent limitations). This paper adopts the inductive approach to learning from surface reflectances. A manually labeled Sentinel-2 dataset was used to build a Machine Learning (ML) model for scene classification, distinguishing six classes (Water, Shadow, Cirrus, Cloud, Snow, and Other). This models was accessed and further compared to the European Space Agency (ESA) Sen2Cor package. The proposed ML model presents a Micro-F1 value of 0.84, a considerable improvement when compared to the Sen2Cor corresponding performance of 0.59. Focusing on the problem of optical satellite image scene classification, the main research contributions of this paper are: (a) an extended manually labeled Sentinel-2 database adding surface reflectance values to an existing dataset; (b) an ensemble-based and a Neural-Network-based ML models; (c) an evaluation of model sensitivity, biasness, and diverse ability in classifying multiple classes over different geographic Sentinel-2 imagery, and finally, (d) the benchmarking of the ML approach against the Sen2Cor package.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.