Plant species identification is a task of interdisciplinary interest, desirable in many contexts such as gardening, botanical research, and agriculture. For some plant species, such as Acer palmatum, the characteristics of leaves, petioles, and trunks can vary drastically among the different varieties of the same species. Research in Computer Vision and Machine Learning has made it possible to create classifiers trained to assist in plant species recognition based on digital images. However, the success of training a classification model is directly linked to the quality and adequacy of the dataset used. For the classification of Acer palmatum plants, no datasets composed of samples of the different varieties within this species were identified. Thus, in this paper, we propose the creation of a new dataset and of a classifier to support the identification of distinct varieties of Acer palmatum. We believe that our proposal aggregates relevant information not currently available and will encourage further work on automatically classifying varieties within a plant species, a task considered non-trivial even for experienced growers.
Natural Language Processing (NLP) has become highly relevant in several areas. In software engineering (SE), NLP applications are based on the classification of similar texts (e.g., software requirements) and are applied to tasks such as software effort estimation and the selection of human resources. Classifying software requirements is a complex task, given the informality and complexity inherent in the texts produced during the software development process. Pre-trained embedding models are a viable alternative given the low volume and poor quality of labeled textual data in software engineering. Although there is much research on the application of word embeddings in several areas, to date we know of no studies that have explored their use in creating a model specific to the SE domain. Thus, this article proposes a contextualized embedding model, called BERT_SE, which allows the recognition of specific and relevant terms in the context of SE. BERT_SE was assessed on the software requirements classification task, showing an average improvement rate of 13% over the BERT_base model made available by the authors of BERT. The code and pre-trained models are available at https://github.com/elianedb.