With the increasing use of the Internet and mobile devices, social networks are becoming the most widely used medium for citizens to communicate their ideas and thoughts. This information is very useful for identifying communities with common ideas based on what they publish on the network. This paper presents a method to automatically detect city communities based on machine learning techniques applied to a set of tweets from Bogotá's citizens. The analysis was performed on a collection of 2,634,176 tweets gathered from Twitter over a period of six months. Results show that the proposed method is an interesting tool for characterizing a city's population using machine learning methods and text analytics.

reducing the digital gap in Colombia. For instance, from 2011 to 2014, several portions of the city experienced an increase of up to 148.3% in homes with an Internet connection [2]. By 2011, Facebook and Twitter were the most popular social networks in Bogotá [1]. Moreover, 65.5% of Bogotá's Internet users used social networks by 2014 [2]. This study concentrates on Twitter because it allows its users to write posts with a length ranging from 1 to 140 characters¹, making Twitter a tool for microblogging, a form of communication in which users express their opinions about several topics in short posts (tweets) [3].

Because we take only the users' posts and not the explicit relations between the users, our large data set is an interesting basis for testing unsupervised community detection methodologies.
In fact, the only common factor that relates the users in our data set is their geographical location. Such unsupervised community detection methodologies help to understand the social phenomena that take place in that geographical region during a particular period of time, using data from any social network with no a priori knowledge of people's relations.

With the objective of identifying the main topics discussed by Bogotá's population on Twitter, as well as detecting possible communities, we collected tweets posted from Bogotá. Then, a Word2Vec model [4] was built in order to represent the set of tweets corresponding to each user as a vector in a vector space. The gap statistic [5] was used to estimate the number of clusters that could be formed using these vectors. Finally, a frequency distribution of words was built for each cluster, so that each cluster could be identified by its most frequent words, which ultimately characterize a topic.

¹ Data was collected before the change in the maximum tweet length to 280 characters.

Our work acknowledges that online social networks constitute one of the main scenarios where people express their opinions, which makes them an outstanding source of information for characterizing the topics that matter to the citizenry. Therefore, we provide a robust method for studying