LitMiner and WikiGene: identifying problem-related key players of gene regulation using publication abstracts

Maier, Holger R.; Döhr, Stefanie; Grote, Korbinian; O’Keeffe, Sean; Werner, Thomas; Angelis, Martin Hrabé de; Schneider, Ralf

doi:10.1093/nar/gki417

Cited by 49 publications

(37 citation statements)

References 7 publications

“…While other similar systems exist, such as XplorMed [69], MedlineR [70], LitMiner [71] and Anni [72], FACTA was chosen because of its ability to pre-index words and concepts, which results in fast, real-time responses. Two sets of biomedical concepts, namely "drug" and "protein" were examined and ranked through FACTA according to their frequencies of appearing in MEDLINE abstracts.…”

Section: Biomedical Corpus Design and Title Extraction (mentioning)

confidence: 99%

Genomic Taxonomy Boost by Lexical Clustering

Gramatikoff

2014

JIG


Summary: Multiple sequence alignment is a foundational technique in bioinformatics, and is often the first step in DNA and protein sequence analyses. However, it can be a slow step for genomic-scale datasets, a problem that will only get worse as the sheer scale of biological sequence analyses continues to increase. Sequence alignment is also potentially inappropriate when there have been many small- and large-scale rearrangements among the sequences to be aligned, and subsequent analyses may be sensitive to uncertainties in the alignment. In this paper, we propose an alignment-free methodology for sequence comparison, based on n-gram frequency vectors, and demonstrate its ability to detect ontological relationships in biological literature and DNA sequence families (specifically kinases, Alu repeats and promoter sequences of co-expression networks). The methodology is versatile for clustering methods such as classical hierarchical clustering, as well as non-negative matrix factorization. It is also highly efficient in terms of computational time and space requirements, and we foresee it becoming an indispensable tool in genomic sequence analysis.

Introduction: August Schleicher, Ernst Haeckel, and other 19th century...

Abstract: In the post-genomic era, drawing inferences from multiple massive data sets is a ubiquitous challenge in the computational life sciences. Multiple sequence alignment has played a key role in genomics (and other "omics") as a means of summarizing and representing relationships between sequences. However, two problems with alignment-based strategies are apparent: the computational expense of constructing alignments and the sensitivity of subsequent analyses to alignment uncertainties.

Here we present a novel alignment-free alternative. We use frequency profiles (or n-gram vectors) for sequence comparison, a method inspired by lexical statistics. Such profiles can be used to infer relationships between texts or between biological sequences, and we demonstrate that two statistical techniques, hierarchical clustering (HC) and non-negative matrix factorization (NMF), provide invaluable insights in both contexts.

We present four case studies. First, we show that bigram frequency profiles can be used to reconstruct the ontology of 102,402 PubMed titles selected for their relevance to nine drugs and nine therapeutic proteins. Second, we apply the same methodology to classify 63 protein kinase coding DNA sequences into functional categories, based on trigram frequency profiles. The two major classes (Tyr vs Ser/Thr) are correctly identified. Third, and similarly, we show that Alu subfamilies can be identified in 58,122 Alu sequences, in perfect agreement with the accepted topology of the Alu phylogeny, again based only on trigram frequency profiles. Fourth, we clustered 8,885 human promoters using trigram frequency profiles for ab initio discovery of co-expression networks associated with disease.

We demonstrate that "lexical" statistics offers a viable alignment-free approach to identifying and representing...
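The n-gram frequency profiles described in this abstract can be illustrated with a minimal Python sketch. It is not the cited authors' implementation: the function names (`ngram_profile`, `euclidean`, `nearest_pair`), the choice of Euclidean distance, and the toy sequences in the usage note are all assumptions made for illustration. The sketch builds a trigram profile per sequence and finds the closest pair of profiles, which corresponds to the first merge an agglomerative hierarchical clustering would make under that distance.

```python
from collections import Counter
from itertools import product
import math


def ngram_profile(seq, n=3, alphabet="ACGT"):
    """Relative frequencies of every n-gram over `alphabet`, in a fixed order.

    N-grams containing characters outside the alphabet are ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    keys = ["".join(p) for p in product(alphabet, repeat=n)]
    total = sum(counts[k] for k in keys)
    return [counts[k] / total if total else 0.0 for k in keys]


def euclidean(u, v):
    """Euclidean distance between two equal-length profiles (an assumed metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def nearest_pair(profiles):
    """Return the names of the two closest profiles.

    `profiles` maps a sequence name to its profile vector. The closest pair
    is what agglomerative clustering would merge first.
    """
    names = sorted(profiles)
    pairs = [
        (euclidean(profiles[a], profiles[b]), a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    ]
    return min(pairs)[1:]
```

For example, given hypothetical toy sequences `s1 = "ACGTACGTACGTACGT"`, `s2 = "ACGTACGTACGAACGT"` and `s3 = "GGGGCCCCGGGGCCCC"`, calling `nearest_pair` on their profiles groups `s1` with `s2`, since they share most of their trigrams while `s3` shares none. No alignment is computed at any point; only the fixed-length profile vectors are compared.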

“…Within the biomedical field, the notion of community annotation has also recently started to be adopted. For instance, WikiProteins (Mons et al, 2008) or WikiGene (Maier et al, 2005) deliver appropriate environments in which it is possible to address the annotation of genes and proteins. Since 2007, GoPubMed also includes a collaborative curation tool for the annotation of concepts and Pubmed authors profiles.…”

Section: Social Annotation and Tagging In Life Sciences (mentioning)

confidence: 99%

Social and Semantic Web Technologies for the Text-to-Knowledge Translation Process in Biomedicine

Cano, Labarga, Blanco et al.

2011

Biomedical Engineering, Trends, Research and Technologies


“…It is easy-to-edit and track changes made by visitors to the site. The literature contains several examples where wikis have been used to promote effective knowledge management and efficient systems in healthcare settings [16][17][18][19]. Of particular note is the study carried out by Carvalho et al, 2010 [14] who used a wiki-based platform to establish a collaborative environment for the sharing of information on epidemiological and clinical research data sets, with notable success.…”

Section: Initial Assessment Of Impacts (mentioning)

confidence: 99%