Accurate estimation of Intrinsic Dimensionality (ID) is crucial to many data mining and machine learning tasks, including dimensionality reduction, outlier detection, similarity search and subspace clustering. However, because existing ID estimation methods generally require sample sizes (that is, neighborhood sizes) on the order of hundreds of points to converge, they may be of limited use in applications where the data consists of many small natural groups. In this paper, we propose a local ID estimation strategy that remains stable even for 'tight' localities consisting of as few as 20 sample points. The estimator applies MLE techniques over all available pairwise distances among the members of the sample, based on a recent extreme-value-theoretic model of intrinsic dimensionality, the Local Intrinsic Dimension (LID). Our experimental results show that the proposed estimation technique can achieve notably smaller variance, while maintaining comparable levels of bias, at much smaller sample sizes than state-of-the-art estimators.
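For orientation, the kind of neighborhood-based MLE that such work builds on can be sketched as follows. This is the classical Hill-type estimator computed from the distances between one query point and its k neighbors, not the pairwise-distance estimator proposed in the abstract, whose details are not given here. The function name `lid_mle` and its interface are illustrative assumptions.

```python
import math


def lid_mle(distances):
    """Classical Hill-type MLE of local intrinsic dimensionality.

    `distances` holds the distances from a query point to its k nearest
    neighbors. With w the largest distance, the estimate is
        -1 / ((1/k) * sum(ln(x_i / w))).
    This is a baseline sketch only; it is NOT the pairwise-distance
    estimator described in the abstract.
    """
    # Zero distances (duplicates of the query) carry no scale information.
    positive = [d for d in distances if d > 0]
    if len(positive) < 2:
        raise ValueError("need at least two positive distances")
    w = max(positive)
    mean_log = sum(math.log(d / w) for d in positive) / len(positive)
    if mean_log == 0:
        raise ValueError("all distances are equal; estimate is undefined")
    return -1.0 / mean_log


if __name__ == "__main__":
    # Idealized neighbor distances inside a 3-dimensional ball of radius 1:
    # the i-th of k distances sits at the (i/k) quantile, r = (i/k) ** (1/3).
    k, dim = 1000, 3
    sample = [(i / k) ** (1.0 / dim) for i in range(1, k + 1)]
    print(round(lid_mle(sample), 2))
```

With only a few dozen distances this baseline tends to fluctuate considerably, which is the small-sample regime the abstract is concerned with.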
Degenerate primer sets were designed to reveal the biodiversity of genes using dynamic pattern matching. Bacterial nucleotide sequences were randomly selected from ten different genera and used to find universal primers for bacterial gene selection. Aligned nucleotide sequences from the different bacteria were entered into a system consisting of three steps: data reformation, primer design, and property filtering. First, degenerate and consensus sequences were calculated using statistical models. The results were then combined with Gibbs free energy to design and select the most appropriate sequences as a series of primer sets. Users can also adjust the criteria for each primer set. The results indicate that the degenerate primers designed by the proposed system yielded positive results.
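To illustrate what a degenerate sequence is, the sketch below collapses each column of a set of aligned sequences into its IUPAC nucleotide code and reports how many concrete primers the result stands for. It is a deliberately simple substitute for the statistical models mentioned in the abstract, which are not described here; the function names and the rule of mapping gaps or unknown symbols to `N` are assumptions made for the example.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}
CODE_TO_BASES = {code: bases for bases, code in IUPAC.items()}


def degenerate_consensus(aligned):
    """Collapse equal-length aligned sequences into one IUPAC string."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("expected a non-empty set of equal-length sequences")
    codes = []
    for column in zip(*aligned):
        bases = frozenset(b.upper() for b in column)
        # Assumption: gaps or unknown symbols make the position fully degenerate.
        codes.append(IUPAC.get(bases, "N"))
    return "".join(codes)


def degeneracy(primer):
    """Number of distinct concrete sequences a degenerate primer represents."""
    count = 1
    for code in primer:
        count *= len(CODE_TO_BASES[code])
    return count


if __name__ == "__main__":
    primer = degenerate_consensus(["ACGT", "ACGA", "ATGT"])
    print(primer, degeneracy(primer))
```

A property-filtering step like the one the abstract mentions could then reject candidates whose degeneracy exceeds a user-chosen threshold, although the actual criteria used by the system are not specified in the text.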