Stop words play an important role in information retrieval and text analysis. This study aimed to automatically analyze and detect stop words in Uzbek-language texts. Because few methods exist for the automatic detection of stop words in Uzbek texts, we analyzed a newly prepared corpus. Uzbek belongs to the family of agglutinative languages. As in all agglutinative languages, detecting stop words in Uzbek texts is a more complex process than in inflected languages. In inflected languages, words such as auxiliary words, articles, and prepositions can be included in the stop word group. In agglutinative languages, the meanings of such words are hidden in the text. It is therefore not appropriate to apply all known stop word detection methods for inflected languages directly to agglutinative languages. In this work, the “School corpus”, which contains 731156 Uzbek words, was investigated. The bigram method of analysis was applied to the corpus, and we proposed a collocation method for automatically detecting stop words in Uzbek texts. The results show that the collocation method is 6 times better than the bigram method.
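The abstract names a bigram method and a collocation method but does not spell out either. The following is a minimal sketch of the two underlying statistics, assuming that bigram analysis means counting adjacent word pairs and that collocation strength is scored with pointwise mutual information (PMI). The function names and the toy sentence are illustrative and are not taken from the paper.

```python
import math
from collections import Counter


def bigram_counts(tokens):
    """Count adjacent word pairs (bigrams) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def pmi(bigram, unigrams, bigrams, n_tokens):
    """Pointwise mutual information of a bigram (an assumed collocation score)."""
    w1, w2 = bigram
    p_pair = bigrams[bigram] / (n_tokens - 1)  # there are n_tokens - 1 bigram positions
    p_w1 = unigrams[w1] / n_tokens
    p_w2 = unigrams[w2] / n_tokens
    return math.log2(p_pair / (p_w1 * p_w2))


# Toy sample (hypothetical, not from the School corpus):
# "this book and this notebook and this pencil"
tokens = "bu kitob va bu daftar va bu qalam".split()
unigrams = Counter(tokens)
bigrams = bigram_counts(tokens)

# Frequent words and frequent pairs are natural stop word candidates to inspect.
top_word = unigrams.most_common(1)[0]
top_pair = bigrams.most_common(1)[0]
score = pmi(("va", "bu"), unigrams, bigrams, len(tokens))
```

On a real corpus, the raw counts and the PMI scores would each yield a ranked candidate list. How the paper turns such rankings into a final stop word list is not stated in the abstract.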
Associate Professor of the Department of Information Technologies, Candidate of Technical Sciences. Bekchanov Shukurla Kurbanbayevich, doctoral student at the Department of Information Technologies, UrDU.

STOP WORDS IN UZBEK LANGUAGE TEXTS

Annotation. It is well known that, in text analysis, removing stop words that are unimportant enough that their removal does not change the content of the text is a task of great importance. The article considers the problem of automatically detecting and deleting stop words in a given Uzbek-language text and, when necessary, restoring the original text.

Key words: stop words, deep learning, machine learning

When the task is to classify a text or analyze its meaning, we need to remove stop words, because they carry no importance for the model we build. By removing them we make the model easier to build, increase its speed, and reduce the volume of data. However, when the task is machine translation, stop words are significant, so we do not delete them.

Words that carry little meaningful information, or that have no independent meaning, or that are widespread across all texts, are called stop words.

The concept of stop words has a long history; Hans Peter Luhn coined the term in 1960 [Luh, 1960]. English examples of such words are "a", "the", "of", and "not". These words are very common and usually some
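The annotation describes deleting stop words and, when needed, restoring the original text. The following is a minimal sketch of that idea, assuming whitespace tokenization and a small hand-picked stop word set. The set `STOP_WORDS` and both function names are hypothetical and are not taken from the article.

```python
# Hypothetical stop word set for illustration only ("and", "with", "for").
STOP_WORDS = {"va", "bilan", "uchun"}


def remove_stop_words(tokens, stop_words):
    """Split tokens into kept tokens and a log of removed (index, token) pairs."""
    kept, removed = [], []
    for i, tok in enumerate(tokens):
        if tok.lower() in stop_words:
            removed.append((i, tok))  # remember the original position
        else:
            kept.append(tok)
    return kept, removed


def restore(kept, removed):
    """Rebuild the original token list from the kept tokens and the removal log."""
    tokens = list(kept)
    # The log is in ascending index order, so each insert lands at its original position.
    for i, tok in removed:
        tokens.insert(i, tok)
    return tokens


original = "Ali va Vali maktab uchun kitob oldi".split()
kept, removed = remove_stop_words(original, STOP_WORDS)
restored = restore(kept, removed)
```

Keeping the removal log next to the filtered tokens lets a classification model work on the shorter sequence while the full text stays recoverable.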