Gene promoters are the key DNA regulatory elements positioned around transcription start sites and are responsible for regulating the gene transcription process. Various alignment-based, signal-based and content-based approaches have been reported for the prediction of promoters. However, since not all promoter sequences show explicit features, the prediction performance of these techniques is poor. Therefore, many machine learning and deep learning models have been proposed for promoter prediction. In this work, we studied methods for vector encoding and promoter classification using genome sequences of three distinct higher eukaryotes, viz. yeast (Saccharomyces cerevisiae), plant (Arabidopsis thaliana) and human (Homo sapiens). We compared the one-hot vector encoding method with frequency-based tokenization (FBT) for data pre-processing on a 1-D Convolutional Neural Network (CNN) model. We found that FBT gives a shorter input dimension, reducing the training time without affecting the sensitivity and specificity of classification. We employed deep learning techniques, mainly a CNN and a recurrent neural network with Long Short-Term Memory (LSTM), along with a random forest (RF) classifier, for promoter classification at k-mer sizes of 2, 4 and 8. We found the CNN to be superior both in distinguishing promoters from non-promoter sequences (binary classification) and in species-specific classification of promoter sequences (multiclass classification). In summary, the contribution of this work lies in the use of a synthetic shuffled negative dataset and frequency-based tokenization for pre-processing. This study provides a comprehensive and generic framework for classification tasks in genomic applications and can be extended to various classification problems.
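The two encodings compared above, and the shuffled negative set, can be illustrated with a minimal sketch. The abstract does not give the details of FBT, so everything here is an assumption: k-mers are ranked by how often they occur in the training sequences, each k-mer is replaced by its rank, index 0 is reserved for unseen k-mers, and negatives are made by shuffling the bases of a promoter sequence. The function names and the `stride` parameter are hypothetical and are not taken from the paper.

```python
import random
from collections import Counter

NUCLEOTIDES = "ACGT"


def one_hot(seq):
    """Encode each base as a 4-element indicator vector (other symbols become all zeros)."""
    return [[1 if base == n else 0 for n in NUCLEOTIDES] for base in seq.upper()]


def kmers(seq, k, stride=1):
    """Split a sequence into k-mers; stride=1 gives overlapping k-mers (an assumption)."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(seqs, k, stride=1):
    """Rank k-mers by frequency: the most frequent k-mer gets index 1, the next 2, and so on."""
    counts = Counter(km for s in seqs for km in kmers(s, k, stride))
    # Ties are broken alphabetically so the mapping is deterministic.
    ranked = sorted(counts, key=lambda km: (-counts[km], km))
    return {km: idx for idx, km in enumerate(ranked, start=1)}


def tokenize(seq, vocab, k, stride=1):
    """Replace each k-mer with its frequency rank; unseen k-mers map to 0."""
    return [vocab.get(km, 0) for km in kmers(seq, k, stride)]


def shuffled_negative(seq, rng):
    """Build a synthetic negative by shuffling the bases, which preserves base composition."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


if __name__ == "__main__":
    promoter = "ACGTAC"  # toy sequence, not real promoter data
    vocab = build_vocab([promoter], k=2)
    print(len(one_hot(promoter)), "x 4 one-hot matrix")
    print(tokenize(promoter, vocab, k=2), "token sequence")
    print(shuffled_negative(promoter, random.Random(0)), "shuffled negative")
```

Under these assumptions a sequence of length L becomes L - k + 1 integers instead of an L x 4 matrix, which is one plausible reading of the shorter input dimension reported for FBT. A larger `stride` would shorten the input further.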
Computational analysis methods, including machine learning, have a significant impact in the fields of genomics and medicine. High-throughput gene expression analysis methods such as microarray technology and RNA sequencing produce enormous amounts of data. Traditionally, statistical methods are used for comparative analysis of gene expression data. However, more complex analysis, such as classification of sample observations or discovery of feature genes, requires sophisticated computational approaches. In this review, we compile various statistical and computational tools used in the analysis of expression microarray data. Even though the methods are discussed in the context of expression microarrays, they can also be applied to the analysis of RNA sequencing and quantitative proteomics datasets. We discuss the types of missing values and the methods and approaches usually employed in their imputation. We also discuss methods of data normalization, feature selection, and feature extraction. Lastly, methods of classification and class discovery, along with their evaluation parameters, are described in detail. We believe that this detailed review will help users select appropriate methods for preprocessing and analysis of their data based on the expected outcome.
Green synthesis of nanoparticles has gained importance due to its eco-friendly, low-toxicity and cost-effective nature. This study deals with the biosynthesis of silver nanoparticles (AgNPs) from the bark extract of Amentotaxus assamica. The AgNPs were synthesised by reducing silver ions into stable AgNPs using the bark extract of Amentotaxus assamica under the influence of sunlight irradiation. The biosynthesised AgNPs were characterised by UV-vis spectroscopy, X-ray diffraction analysis (XRD), Fourier transform infrared spectroscopy, scanning electron microscopy (SEM) and energy dispersive X-ray analysis. The UV-vis spectrum showed a broad peak at 472 nm, the XRD confirmed the crystalline structure of the AgNPs, and the SEM analysis revealed that the biosynthesised AgNPs were spherical in shape. Dynamic light scattering was used to evaluate the size distribution profile of the biosynthesised AgNPs. Furthermore, the biosynthesised AgNPs showed a prominent inhibitory effect against both Escherichia coli (MTCC 111) and Staphylococcus aureus (MTCC 97). Thus, the biosynthesis of AgNPs from the bark extract of Amentotaxus assamica is found to be an eco-friendly way of producing AgNPs compared to chemical methods.
An audio fingerprint is a small set of features that uniquely identifies a song. An audio fingerprint can be used for broadcast monitoring, audience measurement and metadata collection. The general framework for building an audio fingerprint includes a front end and a fingerprint modeling block. This paper details various uses and properties of an audio fingerprint, as well as the various stages included in the front end. Two algorithms, namely PRH and MLH, are discussed.