This article proposes a methodology for discovering patterns in climatological data, particularly temperatures and precipitation, observed in subnational political units, using an automatic classification algorithm (a decision tree produced by the C4.5 algorithm). The patterns therefore take the form of classification trees, under the assumptions that: 1) each political division unit contains at least one climatological station, and 2) the recording periods of the stations are relatively similar in duration and in their initial and final years. A series of classification models is produced using different subsets of an experimental dataset. This dataset contains information from 3606 climatological stations in Mexico whose recording periods vary in duration and in their initial and final years. The target (dependent) variable in all these models is the name of the political unit (that is, the state). The predictors are 36 monthly features for each climatological station: 12 correspond to a minimum temperature, 12 to a maximum temperature, and 12 to cumulative precipitation. Altitude was also used as a predictor in addition to these 36, but only to quantify its additional contribution to the modeling. The results show that classification trees are effective models for describing and representing the nontrivial patterns that characterize political division units, based on their monthly temperatures and precipitation. One notable finding is that cumulative precipitation in May is the feature with the greatest discriminatory power in this characterization task, which is consistent with the theoretical background of Mexican climatology. In addition, classification trees are highly expressive for people unfamiliar with machine learning.
ABSTRACT: This article proposes a methodology to discover patterns in observed climatological data, particularly temperatures and rainfall, in subnational political division units using an automatic classification algorithm (a decision tree produced by the C4.5 algorithm). Thus, the patterns represent classification trees, assuming that: (1) every political division unit contains at least one climatological station, and (2) the recording periods of the stations are relatively similar in duration and in their initial and ending years. A series of classification models is produced by using different subsets from an experimental dataset. This dataset contains information from 3606 climatological stations in Mexico with recording periods whose durations, initial and ending years are diverse. The target (dependent) variable in all these models is the name of the political unit (i.e., the state). The predictors are 36 monthly features for each climatological station: 12 features corresponding to a minimum temperature, 12 to a maximum temperature, and 12 to cumulative rainfall. The altitude feature is also used as one of the predicto...
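As an illustration of how a single monthly feature can discriminate between states, the following minimal sketch computes the gain ratio, the split criterion associated with C4.5, for a binary threshold on one numeric feature. It is a simplified sketch, not the authors' pipeline: the rainfall values are made up, the function names are hypothetical, and a full C4.5 implementation also handles pruning, missing values and multi-way splits.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(values, labels, threshold):
    """Gain ratio of the binary split `value <= threshold` on one numeric feature."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing carries no information
    total = len(labels)
    weights = (len(left) / total, len(right) / total)
    info_gain = entropy(labels) - weights[0] * entropy(left) - weights[1] * entropy(right)
    split_info = -sum(w * math.log2(w) for w in weights)
    return info_gain / split_info

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values; return (threshold, gain ratio)."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    return max(((t, gain_ratio(values, labels, t)) for t in candidates), key=lambda p: p[1])

# Made-up May rainfall totals (mm) for six hypothetical stations in two states.
may_rain = [5.0, 8.0, 12.0, 60.0, 75.0, 90.0]
state = ["Sonora", "Sonora", "Sonora", "Chiapas", "Chiapas", "Chiapas"]
threshold, ratio = best_threshold(may_rain, state)
print(threshold, ratio)
```

In this toy data the two states are perfectly separable on the one feature, so the best threshold falls in the gap between the dry and the wet stations. With 36 monthly features, the same score would be computed per feature and the highest-scoring one placed at the root of the tree.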
In this article we present a procedure for automatically building an ontology from a corpus of text documents without external aids such as dictionaries or thesauri. The proposed method finds relevant concepts, in the form of topical phrases, in the document corpus, together with non-hierarchical relations among them, in an unsupervised manner.
Web-published news written in Spanish was analyzed using categories related to its content, such as 'Culture', 'Sports' and 'Finances', or very general categories such as 'National' or 'International'. The large volume of documents created the need to provide users with an analysis of them, particularly where search engines are involved. First, a preprocessing step was applied to enable text mining; it includes lemmatization, homologation of synonyms and a Boolean representation of the documents. This preprocessing also includes a dimensionality reduction of the resulting matrix. Second, different classification methods were applied and their performance compared in order to find the one that best assigns a category to each news item.
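The Boolean representation and the dimensionality reduction mentioned above can be sketched as follows. This is a minimal sketch under stated assumptions: the documents are taken to be already lemmatized and synonym-normalized, the three sample documents are invented, and the reduction shown (dropping terms that occur in too few documents) is only one simple possibility, since the abstract does not say which reduction technique was used. The function names are hypothetical.

```python
def boolean_matrix(documents):
    """Return (vocabulary, rows) where rows[i][j] is 1 if term j occurs in document i."""
    tokenized = [set(doc.lower().split()) for doc in documents]
    vocabulary = sorted(set().union(*tokenized))
    rows = [[1 if term in tokens else 0 for term in vocabulary] for tokens in tokenized]
    return vocabulary, rows

def drop_rare_terms(vocabulary, rows, min_docs=2):
    """Assumed reduction: keep only terms present in at least `min_docs` documents."""
    keep = [j for j in range(len(vocabulary)) if sum(row[j] for row in rows) >= min_docs]
    return [vocabulary[j] for j in keep], [[row[j] for j in keep] for row in rows]

# Invented, already-lemmatized sample documents (two sports items, one finance item).
docs = ["gol partido equipo", "equipo gana partido", "banco sube tasa"]
vocab, matrix = boolean_matrix(docs)
small_vocab, small_matrix = drop_rare_terms(vocab, matrix)
print(small_vocab, small_matrix)
```

Each row of the reduced matrix is the feature vector that a classifier would receive for one news item.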
The accelerated development of Grid computing has positioned it as a promising next-generation computing platform. Grid computing involves resource management, task scheduling, security, information management and so on. In the context of database query processing, existing parallelisation techniques cannot operate well in Grid environments because of the way they select machines and allocate queries. This is due to the geographic distribution of resources that are owned by different organizations. The resource owners have different usage or access policies, cost models, and varying loads and availability. This makes the design and implementation of efficient scheduling algorithms a major challenge. In this paper, a heuristic approach based on the particle swarm optimization algorithm is adopted to solve the parallel query scheduling problem in a Grid environment.
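To make the idea of swarm-based scheduling concrete, here is a minimal sketch of particle swarm optimization applied to a toy assignment problem. It is not the paper's formulation, which the abstract does not describe: the encoding (one continuous coordinate per sub-query, truncated to a machine index), the fitness function (makespan on identical machines), the parameter values and the sample costs are all assumptions made for illustration.

```python
import random

def makespan(assignment, costs, n_machines):
    """Finish time of the busiest machine when task i runs on assignment[i]."""
    loads = [0.0] * n_machines
    for task, machine in enumerate(assignment):
        loads[machine] += costs[task]
    return max(loads)

def decode(position, n_machines):
    """Map each continuous coordinate to a machine index in [0, n_machines)."""
    return [min(int(x), n_machines - 1) for x in position]

def pso_schedule(costs, n_machines, n_particles=20, iterations=100, seed=0):
    rng = random.Random(seed)
    dim = len(costs)
    positions = [[rng.uniform(0, n_machines) for _ in range(dim)] for _ in range(n_particles)]
    velocities = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in positions]
    best_val = [makespan(decode(p, n_machines), costs, n_machines) for p in positions]
    g = min(range(n_particles), key=best_val.__getitem__)
    global_pos, global_val = best_pos[g][:], best_val[g]
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia and acceleration weights (assumed values)
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                velocities[i][d] = (w * velocities[i][d]
                                    + c1 * rng.random() * (best_pos[i][d] - positions[i][d])
                                    + c2 * rng.random() * (global_pos[d] - positions[i][d]))
                # Clamp so that decode() always yields a valid machine index.
                positions[i][d] = min(max(positions[i][d] + velocities[i][d], 0.0), n_machines)
            value = makespan(decode(positions[i], n_machines), costs, n_machines)
            if value < best_val[i]:
                best_pos[i], best_val[i] = positions[i][:], value
                if value < global_val:
                    global_pos, global_val = positions[i][:], value
    return decode(global_pos, n_machines), global_val

query_costs = [4.0, 3.0, 3.0, 2.0, 2.0, 2.0]  # hypothetical sub-query run times
assignment, finish_time = pso_schedule(query_costs, n_machines=2)
print(assignment, finish_time)
```

A Grid setting would replace the makespan function with a cost model that reflects heterogeneous machines, access policies and varying loads; the swarm update itself would stay the same.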