Functionally related genes often appear in each others neighborhood on the genome, however the order of the genes may not be the same. These groups or clusters of genes may have an ancient evolutionary origin or may signify some other critical phenomenon and may also aid in function prediction of genes. Such gene clusters also aid toward solving the problem of local alignment of genes. Similarly, clusters of protein domains, albeit appearing in different orders in the protein sequence, suggest common functionality in spite of being nonhomologous. In the paper we address the problem of automatically discovering clusters of entities be it genes or domains: we formalize the abstract problem as a discovery problem called the πpattern problem and give an algorithm that automatically discovers the clusters of patterns in multiple data sequences. We take a model-less approach and introduce a notation for maximal patterns that drastically reduces the number of valid cluster patterns, without any loss of information, We demonstrate the automatic pattern discovery tool on motifs on E Coli protein sequences.
The growing gap between the advanced capabilities of static compilers as reflected in benchmarking results and the actual performance that users experience in real-life scenarios makes client-side dynamic optimization technologies imperative to the domain of static languages. Dynamic optimization of software distributed in the form of a platform-agnostic Intermediate-Representation (IR) has been very successful in the domain of managed languages, greatly improving upon interpreted code, especially when online profiling is used. However, can such feedback-directed IR-based dynamic code generation be viable in the domain of statically compiled, rather than interpreted, languages? We show that fat binaries, which combine the IR together with the statically compiled executable, can provide a practical solution for software vendors, allowing their software to be dynamically optimized without the limitation of binary-level approaches, which lack the highlevel IR of the program, and without the warm-up costs associated with the IR-only software distribution approach. We describe and evaluate the fat-binary-based runtime compilation approach using SPECint2006, demonstrating that the overheads it incurs are low enough to be successfully surmounted by dynamic optimization. Building on Java JIT technologies, our results already improve upon common real-world usage scenarios, including very small workloads.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.