Now-a-days, social media and communications in social media have become very important for services providers and those play a key role in service quality improvement as well as in decision making. The services consumers' discussions usually are written in their local languages and extracting important knowledge sometimes is very hard and problematic. In this field the natural language processing techniques are helpful, but different languages have their specifics and difficulties, and some languages are not prosperous enough in the techniques and methods on NLP, especially the local speaking of the language. In this scientific paper, we have tried to solve such a problem for the Albanian language spoken in Kosovo. Namely, for a dataset of the comments, written in Albanian language in Kosovo (local speaking), collected from the social media, by use of unsupervised clustering techniques, to make clustering regarding the topic of discussion in the comment. In this research, the different techniques of text feature extraction (vectorization and others) and clustering algorithms (K-means, Spectral, Agglomerative, etc.), are used with the idea to find and define more appropriate techniques for the Albanian language. In this paper are shown the results of the conducted experiments as well as discussions about what to use in case of the Albanian language and other languages similar or in group with Albanian (those which have a weak NLP).
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.