In this paper we focus on the problem of question ranking in community question answering (cQA) forums in Arabic. We address the task with machine learning algorithms using advanced Arabic text representations. The latter are obtained by applying tree kernels to constituency parse trees combined with textual similarities, including word embeddings. Our two main contributions are: (i) an Arabic language processing pipeline based on UIMA, covering segmentation through constituency parsing, built on top of Farasa, a state-of-the-art Arabic language processing toolkit; and (ii) the application of long short-term memory neural networks to identify the best text fragments in questions to be used in our tree-kernel-based ranker. Our thorough experimentation on a recently released cQA dataset shows that the Arabic linguistic processing provided by Farasa produces strong results and that neural networks combined with tree kernels further boost the performance in terms of both efficiency and accuracy. Our approach also enables an implicit comparison between different processing pipelines, as our tests on the Farasa and Stanford parsers demonstrate.
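As an illustration of the tree-kernel component, the following is a minimal sketch of a subset-tree kernel in the style of Collins and Duffy, operating on constituency trees encoded as nested tuples. The tree encoding, the decay parameter `lam`, and the exact kernel variant are illustrative assumptions; the kernel actually used in the paper may differ.

```python
# Subset-tree (SST) kernel sketch. Trees are nested tuples:
# (label, child1, child2, ...); leaves are plain strings (words).

def nodes(tree):
    """Collect all internal nodes of a tree."""
    if isinstance(tree, str):
        return []
    result = [tree]
    for child in tree[1:]:
        result += nodes(child)
    return result

def production(node):
    """The CFG production at a node: (label, tuple of child labels)."""
    return (node[0], tuple(c if isinstance(c, str) else c[0] for c in node[1:]))

def common(n1, n2, lam):
    """C(n1, n2): decayed count of common subset trees rooted at n1 and n2."""
    if production(n1) != production(n2):
        return 0.0
    score = lam
    for c1, c2 in zip(n1[1:], n2[1:]):
        if not isinstance(c1, str):            # skip lexical leaves
            score *= 1.0 + common(c1, c2, lam)
    return score

def tree_kernel(t1, t2, lam=0.4):
    """K(T1, T2): sum of C(n1, n2) over all node pairs."""
    return sum(common(a, b, lam) for a in nodes(t1) for b in nodes(t2))
```

For example, two parses that share a `VP` subtree but differ in their `NP` score lower than two identical parses, which is the similarity signal the ranker exploits.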
The automated processing of Arabic dialects is challenging due to the lack of spelling standards and the scarcity of annotated data and resources in general. Segmenting words into their constituent tokens is an important step in natural language processing. In this paper, we show how a segmenter can be trained on only 350 annotated tweets using neural networks, without any normalization or reliance on lexical features or linguistic resources. We treat segmentation as a sequence labeling problem at the character level. We show experimentally that our model can rival state-of-the-art methods that heavily depend on additional resources.
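The character-level sequence-labeling formulation can be sketched as follows, assuming a simple B/I labeling scheme where "B" starts a new token; the paper's actual tag set and the neural tagger that predicts the labels are not shown.

```python
# Word segmentation as character-level sequence labeling:
# each character receives a label, and labels decode into tokens.

def segments_to_labels(segments):
    """Encode a gold segmentation as per-character B/I labels (for training)."""
    labels = []
    for seg in segments:
        labels.append("B")                 # first character opens a token
        labels.extend("I" * (len(seg) - 1))  # the rest continue it
    return labels

def labels_to_segments(word, labels):
    """Decode predicted per-character labels back into segments (tokens)."""
    segments = []
    for ch, lab in zip(word, labels):
        if lab == "B" or not segments:
            segments.append(ch)
        else:
            segments[-1] += ch
    return segments
```

A neural model would emit one label per character; decoding is then a single linear pass, which is what makes the character-level framing attractive for dialects with no segmentation lexicon.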
This paper compares Support Vector Machine based ranking (SVM-Rank) and bidirectional long short-term memory (bi-LSTM) neural-network-based sequence labeling for building a state-of-the-art Arabic part-of-speech tagging system. Using SVM-Rank leads to state-of-the-art results, but with a fair amount of feature engineering. Using a bi-LSTM, particularly when combined with word embeddings, may lead to competitive POS-tagging results by automatically deducing latent linguistic features. However, we show that augmenting bi-LSTM sequence labeling with some of the features that we used for the SVM-Rank-based tagger yields further improvements. We also show that gains realized using embeddings may not be additive with the gains achieved due to features. We are open-sourcing both the SVM-Rank and the bi-LSTM based systems for the research community.
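To make the "fair amount of feature engineering" concrete, here is a minimal sketch of the kind of hand-engineered, word-level features an SVM-based tagger typically consumes (affix character n-grams, context words, word shape); the actual feature set used in the paper may differ.

```python
# Illustrative feature extraction for one token in a sentence,
# of the sort fed to an SVM-based POS tagger.

def pos_features(words, i):
    """Feature dict for the word at position i in a sentence."""
    w = words[i]
    return {
        "word": w,
        "prefix1": w[:1], "prefix2": w[:2],      # leading character n-grams
        "suffix1": w[-1:], "suffix2": w[-2:],    # trailing character n-grams
        "prev": words[i - 1] if i > 0 else "<s>",
        "next": words[i + 1] if i < len(words) - 1 else "</s>",
        "is_digit": w.isdigit(),
    }
```

A bi-LSTM learns analogous signals from characters and embeddings automatically, which is why the two approaches are compared and, in the paper, combined.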
Recently, the COVID-19 pandemic has had a major impact on the day-to-day lives of people all over the globe, and it demands various kinds of screening tests to detect the coronavirus. Meanwhile, deep learning (DL) models applied to radiological images are useful for accurate detection and classification. DL models are full of hyperparameters, and identifying the optimal parameter configuration in such a high-dimensional space is not a trivial challenge. Since setting the hyperparameters requires expertise and extensive trial and error, metaheuristic algorithms can be employed. With this motivation, this paper presents an automated glowworm swarm optimization (GSO) with an inception-based deep convolutional neural network (IDCNN) for COVID-19 diagnosis and classification, called the GSO-IDCNN model. The presented model applies a Gaussian smoothening filter (GSF) to remove noise from the radiological images. Additionally, an IDCNN-based feature extractor, built on the Inception v4 model, is utilized. To further enhance the performance of the IDCNN technique, the hyperparameters are optimally tuned using the GSO algorithm. Lastly, an adaptive neuro-fuzzy classifier (ANFC) is used for classifying the existence of COVID-19. The design of the GSO algorithm with the ANFC model for COVID-19 diagnosis constitutes the novelty of the work. For experimental validation, a series of simulations were performed on benchmark radiological imaging databases to highlight the superior outcome of the GSO-IDCNN technique. The experimental values show that the GSO-IDCNN methodology achieves a maximal sensitivity of 0.9422, specificity of 0.9466, precision of 0.9494, accuracy of 0.9429, and F1-score of 0.9394.
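The Gaussian smoothening filter (GSF) preprocessing step can be sketched as constructing a normalized 2D Gaussian kernel that would then be convolved with the radiological image; the kernel size and sigma below are illustrative assumptions, not values from the paper.

```python
import math

def gaussian_kernel(size=5, sigma=1.0):
    """Normalized 2D Gaussian kernel as a list of lists (sums to 1)."""
    center = size // 2
    kernel = [
        [math.exp(-((x - center) ** 2 + (y - center) ** 2) / (2 * sigma ** 2))
         for x in range(size)]
        for y in range(size)
    ]
    total = sum(sum(row) for row in kernel)   # normalize so smoothing
    return [[v / total for v in row] for row in kernel]  # preserves brightness
```

Convolving with this kernel averages each pixel with its neighbors, weighted by distance, which suppresses high-frequency noise before feature extraction.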
Human-centric biomedical diagnosis (HCBD) has become a hot research topic in the healthcare sector, assisting physicians in the disease diagnosis and decision-making process. Leukemia is a pathology that affects both younger people and adults, causing early death and a number of other symptoms. Computer-aided detection models are useful for reducing the probability of recommending unsuitable treatments and for helping physicians in the disease detection process. Besides, the rapid development of deep learning (DL) models assists in the detection and classification of medical-imaging-related problems. Since the training of DL models necessitates massive datasets, transfer learning models can be employed for image feature extraction. In this view, this study develops an optimal deep transfer learning-based human-centric biomedical diagnosis model for acute lymphoblastic leukemia detection (ODLHBD-ALLD). The presented ODLHBD-ALLD model mainly intends to detect and classify acute lymphoblastic leukemia using blood smear images. To accomplish this, the ODLHBD-ALLD model involves the Gabor filtering (GF) technique as a noise removal step. In addition, it makes use of a modified fuzzy c-means (MFCM) based segmentation approach for segmenting the images. Besides, the competitive swarm optimization (CSO) algorithm with the EfficientNetB0 model is utilized as a feature extractor. Lastly, the attention-based bidirectional long short-term memory (ABiLSTM) model is employed for the proper identification of class labels. For investigating the enhanced performance of the ODLHBD-ALLD approach, a wide range of simulations were executed on an open-access dataset. The comparative analysis showed the superiority of the ODLHBD-ALLD model over other existing approaches.
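The standard fuzzy c-means membership update that underlies the MFCM segmentation step can be sketched as follows (1D case, fuzzifier `m`); the paper's specific modification of FCM is not reproduced here.

```python
# Fuzzy c-means membership of a single data point (e.g. a pixel intensity)
# to each cluster center: u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)).

def fcm_memberships(point, centers, m=2.0):
    """Soft membership of `point` to each center; memberships sum to 1."""
    dists = [abs(point - c) for c in centers]
    if any(d == 0 for d in dists):          # point coincides with a center
        return [1.0 if d == 0 else 0.0 for d in dists]
    exp = 2.0 / (m - 1.0)
    memberships = []
    for d_i in dists:
        denom = sum((d_i / d_k) ** exp for d_k in dists)
        memberships.append(1.0 / denom)
    return memberships
```

Unlike hard k-means, each pixel belongs to every cluster with a graded weight, which is why FCM-style segmentation handles the soft boundaries of blood smear cells well.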