Research in Natural Language Processing (NLP) is growing, especially with the arrival of "big data". Ready-to-use programming libraries have become very important for speeding up the research process. Several mature libraries are available, but they generally target English and are language-dependent, so they cannot be used for other languages. Stemming is one of the basic processes in NLP, and the Indonesian stemming algorithm most often used is ECS (Enhanced Confix-Stripping). One library that already implements this algorithm is Sastrawi¹. Experiments show that stemming with Sastrawi is still slow. This research therefore optimizes stemming speed using multiprocessing (MP). The test data used in this research were taken manually from Wikipedia². The experimental results show that the MP technique reduces the average stemming time by about 98.45%.
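A minimal sketch of the MP idea follows. It assumes documents are independent, so each one can be stemmed in a separate worker process through Python's `multiprocessing.Pool`. The `toy_stem` function and its affix lists are hypothetical stand-ins that are far simpler than ECS. A real setup would call Sastrawi's stemmer inside `stem_text` instead. In the PySastrawi package that stemmer is typically obtained with `StemmerFactory().create_stemmer()`. Passing `processes=1` runs the same work sequentially, which gives a baseline for timing comparisons.

```python
import multiprocessing

# Hypothetical toy affix lists; NOT the ECS rule set implemented by Sastrawi.
PREFIXES = ("mem", "men", "ber", "ter", "me", "di")
SUFFIXES = ("nya", "kan", "lah", "an")


def toy_stem(word):
    """Hypothetical stand-in stemmer: strip at most one suffix, then one prefix,
    keeping a stem of at least 3 characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            word = word[: -len(suffix)]
            break
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= 3:
            word = word[len(prefix):]
            break
    return word


def stem_text(text):
    """Stem one document; this is the unit of work handed to a worker."""
    return " ".join(toy_stem(w) for w in text.lower().split())


def parallel_stem(texts, processes=None):
    """Stem documents across worker processes (None = one per CPU core)."""
    if processes == 1:
        # Sequential baseline for comparing stemming time.
        return [stem_text(t) for t in texts]
    with multiprocessing.Pool(processes) as pool:
        # pool.map splits the list into chunks and preserves input order.
        return pool.map(stem_text, texts)
```

On platforms that start workers with the `spawn` method (Windows, macOS), the call to `parallel_stem` with more than one process must sit under an `if __name__ == "__main__":` guard.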
Every Real Time Operating System (RTOS) has different characteristics, so testing is needed to determine which kinds of real-time applications are suited to a particular RTOS. In this research, benchmarking is performed on two Linux-based RTOSes: Real Time Patch Linux and Xenomai. The benchmark runs an encryption application on each RTOS, and RTOS performance is assessed through the performance of that application. We use three performance metrics: processing time, jitter, and throughput. Tests are conducted under low-load and overload conditions. The results show that RT Patch Linux achieves higher throughput than Xenomai, but processing time in Xenomai is more predictable than in RT Patch Linux. Under overload, Xenomai provides more stable performance than RT Patch Linux.
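The three metrics can be computed from per-job timings, as in the sketch below. It assumes each encryption job processes a fixed-size payload and that jobs run back to back. Jitter is taken here as the population standard deviation of processing time, with the peak-to-peak spread as an alternative, because the abstract does not say which definition the study used. The function name `summarize_runs` is hypothetical.

```python
import math
import statistics


def summarize_runs(durations_s, bytes_per_job):
    """Hypothetical helper summarizing per-job timings of an encryption workload.

    durations_s: processing time of each job in seconds, e.g. deltas of
        time.perf_counter() taken around each job on the RTOS under test.
    bytes_per_job: payload size encrypted by each job (assumed constant).
    """
    if not durations_s:
        raise ValueError("need at least one measurement")
    return {
        "mean_processing_time_s": statistics.fmean(durations_s),
        # Assumed jitter definition: population standard deviation.
        "jitter_std_s": statistics.pstdev(durations_s),
        # Alternative jitter definition: peak-to-peak spread.
        "jitter_peak_to_peak_s": max(durations_s) - min(durations_s),
        # Assumes jobs run back to back: bytes processed per second of busy time.
        "throughput_bytes_per_s": bytes_per_job * len(durations_s) / math.fsum(durations_s),
    }
```

Running the same workload under low load and under overload and comparing the resulting dictionaries mirrors the comparison described in the abstract.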
Question classification is one of the essential tasks of a question answering system. It determines the expected answer type (EAT) of a question given to the system. The Multinomial Naïve Bayes algorithm is one learning algorithm that can be used to classify questions. At the classification stage, it uses a set of features in the knowledge model. A large number of features can lead to the curse of dimensionality when the feature space is high-dimensional. Feature selection can reduce the feature dimension and may increase system performance, and the Chi-Square algorithm can be used to select features that characterize each category. In this research, Multinomial Naïve Bayes is used to classify question sentences and the Chi-Square algorithm is used for feature selection. The dataset consists of Indonesian question sentences: 519 labeled factoid, 491 labeled non-factoid, and 185 labeled other. Feature selection increased accuracy by 0.1: with feature selection, the system reached an accuracy of 0.87 using 248 features; without it, accuracy was 0.77 using 1374 features.
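The pipeline described above, Chi-Square feature selection followed by Multinomial Naïve Bayes, can be sketched in plain Python. This is an illustrative sketch and not the authors' implementation. The Chi-Square score uses a 2x2 table of document presence for each term and class. Each term keeps its highest score across classes, and the top `k` terms form the vocabulary. The classifier uses add-one (Laplace) smoothing. All function and class names are hypothetical.

```python
import math
from collections import Counter


def chi_square_scores(docs, labels):
    """Chi-square score of each term (document presence), maximized over classes."""
    n = len(docs)
    doc_sets = [set(d.split()) for d in docs]
    class_size = Counter(labels)
    scores = {}
    for term in set().union(*doc_sets):
        best = 0.0
        for cls, size in class_size.items():
            # 2x2 contingency table for (term, cls).
            a = sum(1 for s, y in zip(doc_sets, labels) if y == cls and term in s)
            b = sum(1 for s, y in zip(doc_sets, labels) if y != cls and term in s)
            c = size - a        # docs in cls without the term
            d = (n - size) - b  # docs outside cls without the term
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, n * (a * d - c * b) ** 2 / denom)
        scores[term] = best
    return scores


def select_features(docs, labels, k):
    """Keep the k highest-scoring terms (ties broken alphabetically)."""
    scores = chi_square_scores(docs, labels)
    return set(sorted(scores, key=lambda t: (-scores[t], t))[:k])


class MultinomialNB:
    """Multinomial Naive Bayes restricted to a given vocabulary, add-one smoothing."""

    def fit(self, docs, labels, vocab):
        self.vocab = set(vocab)
        self.classes = sorted(set(labels))
        counts = {c: Counter() for c in self.classes}
        for doc, y in zip(docs, labels):
            counts[y].update(w for w in doc.split() if w in self.vocab)
        self.log_prior = {c: math.log(labels.count(c) / len(labels)) for c in self.classes}
        self.log_lik = {}
        for c in self.classes:
            total = sum(counts[c].values()) + len(self.vocab)
            self.log_lik[c] = {w: math.log((counts[c][w] + 1) / total) for w in self.vocab}
        return self

    def predict(self, doc):
        # Words outside the selected vocabulary are ignored.
        words = [w for w in doc.split() if w in self.vocab]
        return max(self.classes,
                   key=lambda c: self.log_prior[c] + sum(self.log_lik[c][w] for w in words))
```

For example, given a handful of hypothetical labeled questions such as `("kapan indonesia merdeka", "factoid")` and `("mengapa air laut asin", "nonfactoid")`, `select_features(docs, labels, k)` returns the vocabulary passed to `MultinomialNB().fit(docs, labels, vocab)`, after which `predict` returns a class label for a new question.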
Query reformulation is an Information Retrieval (IR) task that automatically creates new queries from previous ones. Its main challenge is to create a new query whose meaning or context is similar to that of the original query. Query reformulation can improve the search for relevant documents in Open-domain Question Answering (OpenQA): the more queries are given to the search system, the more documents are retrieved. We propose a Word Predicted and Substituted (WPS) method for query reformulation using word2vec word embeddings. We tested this method on the Indonesian Question Answering System (IQAS). The tests yielded an E-1 value of 81% and an E-2 value of 274%. These results show that query reformulation with WPS and word embeddings can improve the search for potential IQAS answers.
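The abstract does not detail how WPS predicts and substitutes words, so the following is only a sketch of embedding-based substitution under that assumption. For each query word that has a vector, it takes the word's `k` nearest neighbors by cosine similarity and emits one new query per substitution. The function names and the tiny embedding table in the example are hypothetical. With a trained gensim word2vec model, the neighbor lookup would typically be `model.wv.most_similar(word, topn=k)`.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_words(word, embeddings, k):
    """Hypothetical helper: the k words most similar to `word`, excluding itself."""
    if word not in embeddings:
        return []
    vec = embeddings[word]
    candidates = [w for w in embeddings if w != word]
    candidates.sort(key=lambda w: cosine(vec, embeddings[w]), reverse=True)
    return candidates[:k]


def reformulate(query, embeddings, k=1):
    """Hypothetical helper: substitute one word at a time with each of its
    k nearest neighbors, producing one new query per substitution."""
    words = query.split()
    new_queries = []
    for i, word in enumerate(words):
        for alt in nearest_words(word, embeddings, k):
            new_queries.append(" ".join(words[:i] + [alt] + words[i + 1:]))
    return new_queries
```

For instance, with hypothetical 2-D vectors in which "pemimpin" lies close to "presiden" and "nusantara" close to "indonesia", `reformulate("siapa presiden indonesia", embeddings, k=1)` yields "siapa pemimpin indonesia" and "siapa presiden nusantara"; "siapa" has no vector and is kept unchanged. Each new query can then be sent to the search system alongside the original.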