Statistical machine translation for Arabic benefits from external linguistic resources such as part-of-speech (POS) tags. This research presents a Bidirectional Long Short-Term Memory (Bi-LSTM) - Conditional Random Fields (CRF) segment-level Arabic dialect POS tagger, which is integrated into a multitask Neural Machine Translation (NMT) model. The proposed NMT solution is based on the recently introduced recurrent neural network encoder-decoder model. The study proposes and develops a unified multitask NMT model that shares an encoder between two tasks: Arabic Dialect (AD) to Modern Standard Arabic (MSA) translation and segment-level POS tagging. A shared layer and an invariant layer are also shared between the translation tasks. By training the translation and POS tagging tasks alternately, the proposed model can leverage the linguistic information captured by each task and improve translation quality from Arabic dialects to MSA. Experiments are conducted on Levantine Arabic (LA) to MSA and Maghrebi Arabic (MA) to MSA translation tasks, with segment-level POS tags for Arabic dialects exploited as an additional linguistic resource. The results suggest that both translation quality and POS tagger performance improve with the multitask learning approach.
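A minimal sketch of the shared-encoder idea described above, assuming PyTorch: one Bi-LSTM encoder feeds both an AD-to-MSA translation decoder and a segment-level POS tagging head, and the two tasks are trained alternately. Layer sizes are illustrative, and the CRF layer is replaced here by a plain softmax head; this is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class SharedEncoder(nn.Module):
    """Bi-LSTM encoder shared by the translation and POS tagging tasks."""
    def __init__(self, vocab_size, emb_dim=256, hid_dim=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.bilstm = nn.LSTM(emb_dim, hid_dim, batch_first=True,
                              bidirectional=True)

    def forward(self, src_ids):
        # src_ids: (batch, src_len) dialect token ids
        return self.bilstm(self.embed(src_ids))  # outputs, (h, c)

class MultitaskNMT(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, n_pos_tags, hid_dim=256):
        super().__init__()
        self.encoder = SharedEncoder(src_vocab, hid_dim=hid_dim)
        # Translation branch: unidirectional LSTM decoder over MSA tokens.
        self.dec_embed = nn.Embedding(tgt_vocab, hid_dim)
        self.decoder = nn.LSTM(hid_dim, 2 * hid_dim, batch_first=True)
        self.gen = nn.Linear(2 * hid_dim, tgt_vocab)
        # POS branch: per-segment tag scores over the encoder states
        # (a softmax head standing in for the CRF layer of the paper).
        self.pos_head = nn.Linear(2 * hid_dim, n_pos_tags)

    def translate_logits(self, src_ids, tgt_in_ids):
        # Attention over encoder states is omitted for brevity.
        _, (h, c) = self.encoder(src_ids)
        # Merge the two Bi-LSTM directions to initialise the decoder state.
        h0 = torch.cat([h[0], h[1]], dim=-1).unsqueeze(0)
        c0 = torch.cat([c[0], c[1]], dim=-1).unsqueeze(0)
        dec_out, _ = self.decoder(self.dec_embed(tgt_in_ids), (h0, c0))
        return self.gen(dec_out)

    def pos_logits(self, src_ids):
        enc_out, _ = self.encoder(src_ids)
        return self.pos_head(enc_out)

# Training (sketch): alternate one batch of the translation objective with
# one batch of the POS tagging objective, both updating the shared encoder.
```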
In this research article, we study the problem of employing a neural machine translation model to translate Arabic dialects into Modern Standard Arabic. The proposed solution is inspired by the recently proposed recurrent neural network-based encoder-decoder NMT model, which casts machine translation as a sequence-to-sequence learning problem. We propose a multitask learning (MTL) model that shares one decoder among the language pairs, while each source language has its own encoder. The proposed model can be applied to both limited and large volumes of data. Experiments show that the proposed MTL model achieves higher translation quality than individually trained models.
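A hedged sketch of the one-shared-decoder, per-source-encoder layout described above, again assuming PyTorch; the GRU cell type, the dimensions, and the absence of attention are illustrative assumptions rather than details taken from the paper.

```python
import torch.nn as nn

class MTLTranslator(nn.Module):
    """Several dialect-specific encoders feeding one shared MSA decoder."""
    def __init__(self, src_vocab_sizes, tgt_vocab_size, hid_dim=256):
        super().__init__()
        # One encoder and embedding table per source dialect,
        # e.g. src_vocab_sizes = {"LEV": 20000, "MAG": 20000}.
        self.src_embeds = nn.ModuleDict({
            lang: nn.Embedding(v, hid_dim)
            for lang, v in src_vocab_sizes.items()
        })
        self.encoders = nn.ModuleDict({
            lang: nn.GRU(hid_dim, hid_dim, batch_first=True)
            for lang in src_vocab_sizes
        })
        # Decoder and output projection are shared by all language pairs.
        self.tgt_embed = nn.Embedding(tgt_vocab_size, hid_dim)
        self.decoder = nn.GRU(hid_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, tgt_vocab_size)

    def forward(self, lang, src_ids, tgt_in_ids):
        _, h = self.encoders[lang](self.src_embeds[lang](src_ids))
        dec_out, _ = self.decoder(self.tgt_embed(tgt_in_ids), h)
        return self.out(dec_out)
```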
Languages with free word order, such as Arabic dialects, pose significant difficulty for neural machine translation (NMT) because of their many rare words and the inability of NMT systems to translate them. Unknown-word (UNK) tokens represent out-of-vocabulary words, since NMT systems operate with a fixed-size vocabulary. Rare words are instead encoded entirely as sequences of subword pieces using the WordPiece model. This research paper introduces the first Transformer-based neural machine translation model for Arabic vernaculars that employs subword units. The proposed solution is based on the recently introduced Transformer model. The use of subword units and a vocabulary shared between the Arabic dialect (the source language) and Modern Standard Arabic (the target language) improves the behavior of the encoder's multi-head attention sublayers by capturing the global dependencies between the words of the input Arabic dialect sentence. Experiments are carried out on Levantine Arabic (LEV) to MSA, Maghrebi Arabic (MAG) to MSA, Gulf-MSA, Nile-MSA, and Iraqi Arabic (IRQ) to MSA translation tasks. Extensive experiments confirm that the proposed model adequately addresses the unknown-word issue and improves translation quality from Arabic vernaculars to MSA.
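The sketch below illustrates the shared-subword-vocabulary idea: one segmentation model is trained on concatenated dialect and MSA text, and a single embedding table serves both sides of a Transformer. The sentencepiece library is assumed here as a stand-in for the WordPiece segmentation named above, file names and sizes are illustrative, and positional encodings are omitted for brevity.

```python
import sentencepiece as spm
import torch
import torch.nn as nn

# 1) Train one subword model over both sides so source and target share ids.
spm.SentencePieceTrainer.train(
    input="dialect_plus_msa.txt",   # assumed: AD and MSA sentences, one per line
    model_prefix="shared_subword",
    vocab_size=8000,
)
sp = spm.SentencePieceProcessor(model_file="shared_subword.model")

# 2) One shared embedding table feeding a standard Transformer encoder-decoder.
class SubwordTransformer(nn.Module):
    def __init__(self, vocab_size, d_model=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)   # shared AD/MSA table
        self.transformer = nn.Transformer(
            d_model=d_model, nhead=8,
            num_encoder_layers=4, num_decoder_layers=4,
            batch_first=True,
        )
        self.out = nn.Linear(d_model, vocab_size)

    def forward(self, src_ids, tgt_ids):
        return self.out(self.transformer(self.embed(src_ids),
                                         self.embed(tgt_ids)))

model = SubwordTransformer(sp.get_piece_size())
# Real input would be an Arabic dialect sentence; a placeholder is used here.
src_ids = torch.tensor([sp.encode("example dialect sentence")])
```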
Understanding DNA damage intensity (concentration) levels is critical for biological and biomedical research, including cellular homeostasis, tumor suppression, immunity, and gametogenesis. Recognizing and quantifying DNA damage intensity levels is therefore a substantial problem that requires more robust and effective approaches. DNA damage has several intensity levels, and these levels in malignant and other unhealthy cells are significant for assessing lesion stages in normal cells. More insight is needed from the available biological data to predict, explore, and classify DNA damage intensity levels. The development process here relied on an available biological dataset related to DNA damage signaling pathways, which play a crucial role in DNA damage in the mammalian cell system. The dataset used in the proposed model consists of 15,000 intensity (concentration) level records for a set of five proteins that regulate DNA damage. This research paper proposes a deep learning model, an attention-based long short-term memory (AT-LSTM) model, for multi-class DNA damage prediction. The proposed model splits the prediction procedure into two stages. In the first stage, the relevant feature sequences are fed as input to the LSTM network. In the second stage, attention is applied over the LSTM outputs, and the resulting representation is passed to a softmax layer to predict the class of the following frame. The developed framework not only addresses the long-term dependence problem of prediction effectively, but also improves the interpretability of neural-network-based prediction methods. We evaluated the proposed model on large and complex biological datasets for prediction and multi-class classification tasks. The AT-LSTM model can predict and classify DNA damage into several classes: No-damage, Low-damage, Medium-damage, High-damage, and Excess-damage. The experimental results show that our framework for DNA damage intensity levels can be considered state of the art for biological DNA damage prediction.
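A minimal sketch, assuming PyTorch, of an attention-based LSTM classifier of the kind described above: an LSTM reads a sequence of protein-concentration features, a learned attention weighting pools the hidden states, and a softmax layer outputs one of the five damage classes. The dimensions and the exact attention formulation are illustrative assumptions, not the authors' configuration.

```python
import torch
import torch.nn as nn

DAMAGE_CLASSES = ["No-damage", "Low-damage", "Medium-damage",
                  "High-damage", "Excess-damage"]

class ATLSTM(nn.Module):
    def __init__(self, n_features=5, hid_dim=64, n_classes=len(DAMAGE_CLASSES)):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hid_dim, batch_first=True)
        self.attn = nn.Linear(hid_dim, 1)           # scores each time step
        self.classifier = nn.Linear(hid_dim, n_classes)

    def forward(self, x):
        # x: (batch, time, n_features) protein concentration measurements
        h, _ = self.lstm(x)                           # (batch, time, hid_dim)
        weights = torch.softmax(self.attn(h), dim=1)  # attention over time
        context = (weights * h).sum(dim=1)            # weighted pooling
        return self.classifier(context)               # logits over 5 classes

model = ATLSTM()
logits = model(torch.randn(8, 30, 5))                 # 8 sequences, 30 steps
pred = DAMAGE_CLASSES[logits.argmax(dim=-1)[0].item()]
```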
Organism network systems provide biological data of high complexity. These data reflect complex activities in organisms and exhibit nonlinear behavior. Mathematical modelling methods such as ordinary differential equation (ODE) models have therefore become significant tools for prediction and for exposing implicit knowledge in the data. Unfortunately, these approaches suffer from drawbacks such as the scarcity and vagueness of the biological knowledge needed to predict protein concentration measurements. The main objective of this research is therefore to present a computational model, a feed-forward neural network trained with the backpropagation algorithm, that can cope with imprecise and missing biological knowledge and provide more insight into biological systems in organisms. The model predicts protein concentrations and captures the nonlinear dynamic behavior of the biological system accurately. The results match those of a recent ODE model while being obtained in a simpler form than the ODEs.
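A hedged sketch of the feed-forward/backpropagation approach described above: a small multilayer perceptron regressing a protein concentration from input features, trained by gradient descent. The layer sizes, the number of input features, and the loss are illustrative assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

mlp = nn.Sequential(                 # feed-forward network
    nn.Linear(3, 32), nn.Tanh(),     # assumed 3 input features (e.g. time, conditions)
    nn.Linear(32, 32), nn.Tanh(),
    nn.Linear(32, 1),                # predicted protein concentration
)
opt = torch.optim.SGD(mlp.parameters(), lr=1e-2)
loss_fn = nn.MSELoss()

def train_step(x, y):
    """One backpropagation update on a batch of (features, concentration)."""
    opt.zero_grad()
    loss = loss_fn(mlp(x), y)
    loss.backward()                  # backpropagate the prediction error
    opt.step()
    return loss.item()

# Synthetic placeholder data; real inputs would come from the pathway dataset.
x = torch.randn(64, 3)
y = torch.randn(64, 1)
print(train_step(x, y))
```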