Natural language processing has been studied far more extensively for Western languages than for Eastern languages, particularly South Asian languages. Western languages are considered resource-rich: core linguistic resources (e.g., corpora, WordNet, dictionaries, gazetteers) and associated tools are readily available for them. Most South Asian languages are low-resource languages. Urdu, for example, is among the widely spoken languages of the subcontinent, yet resource scarcity has limited the work conducted on it. The core objective of this paper is to survey the linguistic resources that exist for Urdu language processing (ULP), to highlight the tasks in ULP, and to discuss the available state-of-the-art techniques. In doing so, the paper describes in detail the recent increase in interest and the progress made in ULP research. First, the available datasets for Urdu are discussed. The characteristics of Urdu, resource sharing between Hindi and Urdu, and Urdu orthography and morphology are then presented. Pre-processing activities such as stop-word removal, diacritics removal, normalization, and stemming are illustrated. State-of-the-art research is reviewed for tasks such as tokenization, sentence boundary detection, part-of-speech tagging, named entity recognition, parsing, and WordNet development. In addition, the impact of ULP on application areas such as information retrieval, classification, and plagiarism detection is investigated. Finally, open issues and future directions for this new and dynamic area of research are provided. The goal of this paper is to organize ULP work so that it can serve as a platform for future ULP research.
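The pre-processing activities named in the abstract above (diacritics removal and stop-word removal) can be illustrated with a minimal Python sketch. This is not taken from the survey: the diacritic code-point range, the three-word stop list, and the function names `strip_diacritics`, `remove_stop_words`, and `preprocess` are all assumptions made for illustration, and whitespace splitting is only a rough stand-in for real Urdu tokenization.

```python
import re

# Assumed set of Arabic-script combining marks used as Urdu diacritics:
# U+064B..U+0652 (tanween, zabar, pesh, zer, tashdid, sukun) and
# U+0670 (superscript alef). A real system may need a broader set.
DIACRITICS = re.compile("[\u064B-\u0652\u0670]")

# Tiny illustrative stop-word list (hypothetical, not from the paper):
# "ka", "ki", "hai".
STOP_WORDS = {"\u06A9\u0627", "\u06A9\u06CC", "\u06C1\u06D2"}


def strip_diacritics(text):
    """Delete the diacritic marks listed above, leaving base letters intact."""
    return DIACRITICS.sub("", text)


def remove_stop_words(tokens):
    """Drop tokens that appear in the illustrative stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def preprocess(text):
    """Strip diacritics, split on whitespace, then filter stop words."""
    return remove_stop_words(strip_diacritics(text).split())
```

Diacritics are stripped before the stop-word lookup so that a vocalised and an unvocalised spelling of the same word match the same list entry.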
Bardet–Biedl syndrome (BBS) is a recessive disorder characterized by heterogeneous clinical manifestations, including truncal obesity, rod-cone dystrophy, renal anomalies, postaxial polydactyly, and variable developmental delays. At least 20 genes have been implicated in BBS, and all are involved in primary cilia function. We report a 1-year-old male child from Guyana with obesity, postaxial polydactyly on his right foot, hypotonia, ophthalmologic abnormalities, and developmental delay, which together indicated a clinical diagnosis of BBS. Clinical chromosomal microarray (CMA) testing and high-throughput BBS gene panel sequencing detected a homozygous 7p14.3 deletion of exons 1–4 of BBS9 that was encompassed by a 17.5 Mb region of homozygosity at chromosome 7p14.2–p21.1. The precise breakpoints of the deletion were delineated to a 72.8 kb region in the proband and carrier parents by third-generation long-read single molecule real-time (SMRT) sequencing (Pacific Biosciences), which suggested non-homologous end joining as a likely mechanism of formation. Long-read SMRT sequencing of the deletion breakpoints also determined that the aberration included the neighboring RP9 gene implicated in retinitis pigmentosa; however, the clinical significance of this was considered uncertain given the paucity of reported cases with unambiguous RP9 mutations. Taken together, our study characterized a BBS9 deletion, and the identification of this shared haplotype in the parents suggests that this pathogenic aberration may be a BBS founder mutation in the Guyanese population. Importantly, this informative case also highlights the utility of long-read SMRT sequencing to map nucleotide breakpoints of clinically relevant structural variants.
We present a prototype software system with sufficient capacity and speed to estimate radiation exposures in a mass casualty event by counting dicentric chromosomes (DCs) in metaphase cells from many individuals. Top-ranked metaphase cell images are segmented by classifying and defining chromosomes with an active contour gradient vector field (GVF) and by determining centromere locations along the centreline. The centreline is extracted by discrete curve evolution (DCE) skeleton branch pruning and curve interpolation. Centromere detection minimises the global width and DAPI-staining intensity profiles along the centreline. A second centromere is identified by reapplying this procedure after masking the first. Dicentrics can be identified from features that capture width and intensity profile characteristics as well as local shape features of the object contour at candidate pixel locations. The correct location of the centromere is also refined in chromosomes with sister chromatid separation. The overall algorithm has both high sensitivity (85 %) and specificity (94 %). Results are independent of the shape and structure of chromosomes in different cells, or the laboratory preparation protocol followed. The prototype software was recoded in C++/OpenCV; image processing was accelerated by data and task parallelisation with Message Passing Interface and Intel Threading Building Blocks and an asynchronous non-blocking I/O strategy. Relative to a serial process, metaphase ranking, GVF and DCE are, respectively, 100 and 300-fold faster on an 8-core desktop and 64-core cluster computers. The software was then ported to a 1024-core supercomputer, which processed 200 metaphase images each from 1025 specimens in 1.4 h.
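The two-pass centromere search described in the abstract above (minimise the width and intensity profiles along the centreline, then mask the first hit and repeat) might look roughly like the Python sketch below. It is not the authors' implementation: combining the two profiles as a product of max-normalised values, the `mask_radius` parameter, and the function names are all assumptions chosen for illustration, and the profiles are assumed to be already sampled along the centreline.

```python
def find_centromere(width, intensity, masked=frozenset()):
    """Return the centreline index with the smallest combined score.

    The score is (width / max width) * (intensity / max intensity);
    this product is an assumed way of combining the two profiles.
    Indices in `masked` are skipped.
    """
    w_max = max(width)
    i_max = max(intensity)
    best_index, best_score = None, float("inf")
    for k, (w, i) in enumerate(zip(width, intensity)):
        if k in masked:
            continue
        score = (w / w_max) * (i / i_max)
        if score < best_score:
            best_index, best_score = k, score
    return best_index


def find_two_centromeres(width, intensity, mask_radius=2):
    """Find a first candidate, mask its neighbourhood, then search again."""
    first = find_centromere(width, intensity)
    # Hypothetical masking rule: exclude indices within mask_radius of the first hit.
    masked = frozenset(range(first - mask_radius, first + mask_radius + 1))
    second = find_centromere(width, intensity, masked)
    return first, second


# Toy profiles with constrictions at indices 2 and 6.
width = [5, 6, 3, 6, 7, 6, 2, 6, 5]
intensity = [9, 9, 5, 9, 9, 9, 4, 9, 8]
candidates = find_two_centromeres(width, intensity)
```

On the toy profiles the deeper constriction at index 6 is found first and the one at index 2 second. Deciding whether the second candidate is a true centromere (that is, whether the chromosome is dicentric) would still require the feature-based classification the abstract mentions.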