Clustering is an unsupervised process for determining which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives to such proximity-based approaches. They rest on the assumption that a cluster is a subset with the minimal possible degree of "disorder", and so they attempt to minimize the entropy of each cluster. We propose a clustering method based on the maximum entropy principle. Such a method explores the space of all possible probability distributions of the data to find one that maximizes the entropy subject to extra conditions based on prior information about the clusters. The prior information is the assumption that the elements of a cluster are "similar" to each other according to some statistical measure. As a consequence of this principle, distributions of high entropy that satisfy the conditions are favored over others. Searching the space for the optimal distribution of objects among the clusters is a hard combinatorial problem, which rules out traditional optimization techniques. Genetic algorithms are a good alternative for solving this problem. We benchmark our method against the best theoretical performance, which is given by the Bayes classifier when data are normally distributed, and against a multilayer perceptron network, which offers the best practical performance when data are not normal. In general, a supervised classification method will outperform an unsupervised one, since in the supervised case the elements of the classes are known a priori. In what follows, we show that our method's effectiveness is comparable to that of a supervised one. This clearly exhibits the superiority of our method.
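The idea of preferring high-entropy assignments that still satisfy a similarity condition can be illustrated with a minimal Python sketch. It is not the authors' objective function or their genetic algorithm: the functions `shannon_entropy` and `satisfies_similarity`, the threshold `max_spread`, and the toy one-dimensional data are all assumptions made for illustration. The entropy here is that of the cluster-membership distribution, and the "statistical measure" of similarity is taken, hypothetically, to be the within-cluster standard deviation.

```python
import math
from collections import Counter
from statistics import pstdev


def shannon_entropy(assignment):
    """Shannon entropy (bits) of the empirical cluster-membership distribution."""
    counts = Counter(assignment)
    n = len(assignment)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def satisfies_similarity(data, assignment, max_spread):
    """Hypothetical constraint: every cluster's population std. dev. <= max_spread."""
    clusters = {}
    for x, c in zip(data, assignment):
        clusters.setdefault(c, []).append(x)
    return all(pstdev(xs) <= max_spread for xs in clusters.values())


# Toy 1-D data (illustrative only) and two candidate assignments into k = 2 clusters.
data = [1.0, 1.1, 0.9, 1.2, 5.0, 5.1, 4.9, 5.2]
balanced = [0, 0, 0, 0, 1, 1, 1, 1]
skewed = [0, 0, 0, 0, 0, 0, 0, 1]

for name, candidate in [("balanced", balanced), ("skewed", skewed)]:
    feasible = satisfies_similarity(data, candidate, max_spread=0.5)
    print(name, round(shannon_entropy(candidate), 3), feasible)
```

In this toy setting the balanced assignment has the higher entropy and also meets the similarity condition, while the skewed one fails it. A search procedure such as a genetic algorithm would score many such candidates rather than only two.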
Introduction: Mathematical models and field data suggest that human mobility is an important driver of Dengue virus transmission. Nonetheless, little is known on this matter owing to the lack of instruments for precise mobility quantification and to study design difficulties.
Materials and methods: We carried out a cohort-nested case-control study with 126 individuals (42 cases, 42 intradomestic controls and 42 population controls) with the goal of describing human mobility patterns of recently Dengue virus-infected subjects and comparing them with those of non-infected subjects living in an urban endemic locality. Mobility was quantified using a GPS data logger registering waypoints at 60-second intervals for a minimum of 15 calendar days.
Results: Although absolute displacement was highly biased towards the intradomestic and peridomestic areas, occasional displacements exceeding a 100-km radius from the center of the studied locality were recorded for all three study groups, and individual subjects were recorded traveling across six states of central Mexico. Additionally, cases made more visits outside the municipality's administrative limits than intradomestic controls (cases: 10.4 versus intradomestic controls: 2.9, p = 0.0282). We were able to identify extradomestic places within and outside the locality that were independently visited by apparently unrelated infected subjects, consistent with houses, workplaces and leisure places.
Conclusions: Results of this study show that human mobility in a small urban setting exceeded the administrative limits considered by local health authorities, and differed between recently infected and non-infected subjects living in the same household. These observations provide important insights into the role that human mobility may have in Dengue virus transmission and persistence across endemic geographic areas, which need to be taken into account when planning preventive and control measures. Finally, these results are a valuable reference when setting the parameters for future mathematical modeling studies.
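As an illustration of how displacement from a locality center might be derived from GPS waypoints, the following Python sketch uses the haversine great-circle distance. It is only one plausible approach, not the study's actual processing pipeline: the function names, the mean Earth radius of 6371 km, and the coordinates are assumptions chosen for illustration and are not data from the study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def max_displacement_km(center, waypoints):
    """Largest distance from `center` reached by any waypoint in the track."""
    return max(haversine_km(center[0], center[1], lat, lon) for lat, lon in waypoints)


# Hypothetical center and track (not real study coordinates).
center = (0.0, 0.0)
track = [(0.0, 0.001), (0.0, 0.5), (0.0, 1.0)]

farthest = max_displacement_km(center, track)
print(round(farthest, 1), farthest > 100.0)
```

A threshold such as the 100-km radius mentioned in the results could then be applied to each subject's track.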