Ortholog Clustering on a Multipartite Graph

Vashist, Akshay; Kulikowski, Casimir A.; Muchnik, Ilya

doi:10.1109/tcbb.2007.1004

Cited by 16 publications

(11 citation statements)

References 30 publications


“…This is because the algorithm must: (i) iterate over all edges e(u, v) in C (Step 3), with the worst-case complexity O(m) = O(g^2); and for each, (ii) look for a vertex w and edge f(u, w) in G (Step 4), which is at worst O(g) if it must look through all other genomes in the g-partite graph; and finally for each of these, (iii) check whether u and w are adjacent in G, which is an efficient O(log g) lookup from the list of all adjacent vertices of w (or v). The worst-case complexity of EdgeSearch is comparable to the O(V^3) (V = number of vertices) of another heuristic method described in Vashist et al. (2007), but uses different topological information, i.e. triangles in a SymBets graph rather than dense clusters (quasi-cliques) in a graph that may include all edges and does not require a species tree.…”

Section: Results (mentioning)

confidence: 97%
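The three nested steps in the statement above can be illustrated with a minimal Python sketch. It is not the EdgeSearch implementation from the cited paper: the function name `triangles_via_edges` and the toy gene labels are hypothetical, the loop runs over every edge of the graph rather than over a growing cluster C, and Python sets give average constant-time adjacency tests instead of the O(log g) sorted-list lookup mentioned in the quote.

```python
from collections import defaultdict


def triangles_via_edges(edges):
    """Enumerate triangles by iterating over edges (illustrative sketch only)."""
    # Build an adjacency map: vertex -> set of neighbouring vertices.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    found = set()
    for u, v in edges:            # (i) every edge e(u, v)
        for w in adj[u]:          # (ii) every edge f(u, w) sharing the vertex u
            if w == v:
                continue
            if w in adj[v]:       # (iii) adjacency test that closes the triangle
                found.add(frozenset((u, v, w)))
    return found


# Toy 4-genome graph: genes a1, b1, c1 form a triangle; d1 hangs off a1.
toy_edges = [("a1", "b1"), ("b1", "c1"), ("a1", "c1"), ("a1", "d1")]
toy_triangles = triangles_via_edges(toy_edges)
```

With at most O(g^2) edges, at most O(g) candidate third vertices per edge, and a logarithmic adjacency lookup, the nesting matches the O(g^3 log g) worst case quoted for EdgeSearch.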

“…Examples of automated implementations of the former approach include the publicly available algorithms EnsemblCompara (Vilella et al., 2009), SYNERGY (Wapinski et al., 2007), RIO (Zmasek and Eddy, 2002), Orthostrapper (Storm and Sonnhammer, 2002) and the databases of orthologous protein families HOBACGEN, HOVERGEN and HOGENOME (Dufayard et al., 2005), whereas examples of the latter include OrthoMCL (Li et al., 2003), eggNOG (Jensen et al., 2008), InParanoid and MultiParanoid (Alexeyenko et al., 2006; O'Brien et al., 2005; Remm et al., 2001), MSOAR and MultiMSOAR (Fu and Jiang, 2007; Fu et al., 2007), Homologene (Sayers et al., 2010), RoundUp (Deluca et al., 2006) and OMA (Roth et al., 2008). Still other methods exist that do not fall neatly into either category, such as that described in (Vashist et al., 2007), which uses topological distance in a species tree as a factor in a linkage equation to find dense clusters in a multipartite graph (whose edges are not restricted to SymBets).

Fig.…”

Section: Introduction (mentioning)

confidence: 99%


A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches

et al. 2010


Motivation: Identifying orthologous genes in multiple genomes is a fundamental task in comparative genomics. Construction of intergenomic symmetrical best matches (SymBets) and joining them into clusters is a popular method of ortholog definition, embodied in several software programs. Despite their wide use, the computational complexity of these programs has not been thoroughly examined.

Results: In this work, we show that in the standard approach of iteration through all triangles of SymBets, the memory scales with at least the number of these triangles, O(g^3) (where g = number of genomes), and construction time scales with the iteration through each pair, i.e. O(g^6). We propose the EdgeSearch algorithm that iterates over edges in the SymBet graph rather than triangles of SymBets, and as a result has a worst-case complexity of only O(g^3 log g). Several optimizations reduce the run-time even further in realistically sparse graphs. In two real-world datasets of genomes from bacteriophages (POGs) and Mollicutes (MOGs), an implementation of the EdgeSearch algorithm runs about an order of magnitude faster than the original algorithm and scales much better with increasing number of genomes, with only minor differences in the final results, and up to 60 times faster than the popular OrthoMCL program with a 90% overlap between the identified groups of orthologs.

Availability and implementation: C++ source code freely available for download at ftp.ncbi.nih.gov/pub/wolf/COGs/COGsoft/

Contact: dmk@stowers.org

Supplementary information: Supplementary materials are available at Bioinformatics online.
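As a rough illustration of what a symmetric best match is, the sketch below treats a pair of genes from two genomes as a SymBet when each is the other's highest-scoring hit. The function name `symmetric_best_matches`, the score dictionary layout, and the toy scores are assumptions made for this example and are not taken from the cited software. Ties are resolved in favour of the first pair encountered, which is also an assumption.

```python
def symmetric_best_matches(scores):
    """Return gene pairs that are mutual best hits between two genomes.

    `scores` maps (gene_in_A, gene_in_B) to a similarity score
    (hypothetical input format, higher means more similar).
    """
    best_for_a = {}  # gene in A -> its best-scoring partner in B
    best_for_b = {}  # gene in B -> its best-scoring partner in A
    for (a, b), s in scores.items():
        if a not in best_for_a or s > scores[(a, best_for_a[a])]:
            best_for_a[a] = b
        if b not in best_for_b or s > scores[(best_for_b[b], b)]:
            best_for_b[b] = a
    # Keep only the pairs where the best-hit relation holds in both directions.
    return {(a, b) for a, b in best_for_a.items() if best_for_b[b] == a}


toy_scores = {
    ("a1", "b1"): 90,
    ("a1", "b2"): 50,
    ("a2", "b1"): 80,
    ("a2", "b2"): 40,
}
toy_symbets = symmetric_best_matches(toy_scores)
```

In the toy data, a2 prefers b1, but b1 prefers a1, so only the pair (a1, b1) qualifies.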


“…This combinatorial optimization problem has been studied in [20] and it has been shown that an efficient algorithm exists for finding the global optimal solution H* if the linkage function π(i, H) is monotone increasing. The monotone increasing property requires that the value of the linkage function for the vertex i can only increase when the second argument H increases in a set theoretic sense, i.e.…”

Section: Combinatorial Selection Of Characteristic Image Patches (mentioning)

confidence: 99%

“…The algorithm for solving this combinatorial optimization problem is given [20], and is described in the pseudocode form in Algorithm 3.1. This iterative algorithm begins by calculating F(V^+) and finds the set M_1 containing the set of vertices from V^+ which have the minimum value of the linkage function, i.e.…”

Section: Combinatorial Selection Of Characteristic Image Patches (mentioning)

confidence: 99%
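The two statements above describe a peeling procedure for a monotone increasing linkage function. The sketch below is one plausible reading, not the authors' Algorithm 3.1: it assumes the objective is F(H) = min over i in H of π(i, H), and, because the quoted text is truncated after the first step, it assumes the remaining steps repeat "remove the minimum-linkage vertices, re-evaluate" until the set is empty. The names `peel_max_min`, `sum_linkage`, and `WEIGHTS` are hypothetical, and the summed-edge-weight linkage is just one example of a monotone increasing function.

```python
# Toy symmetric edge weights (hypothetical data for illustration).
WEIGHTS = {
    frozenset(("a", "b")): 3,
    frozenset(("a", "c")): 3,
    frozenset(("b", "c")): 3,
    frozenset(("c", "d")): 1,
}


def sum_linkage(i, subset):
    """Example linkage pi(i, H): total weight from i to the other members of H.

    With non-negative weights this can only grow when H grows,
    so it is monotone increasing in the sense of the quote.
    """
    return sum(WEIGHTS.get(frozenset((i, j)), 0) for j in subset if j != i)


def peel_max_min(vertices, linkage):
    """Return the subset H maximising F(H) = min_{i in H} linkage(i, H)."""
    current = set(vertices)
    best_set, best_score = set(), float("-inf")
    while current:
        values = {i: linkage(i, current) for i in current}
        score = min(values.values())          # F(current)
        if score > best_score:                # strict '>' keeps the largest optimum
            best_set, best_score = set(current), score
        # Remove every vertex attaining the minimum, then re-evaluate.
        current -= {i for i, val in values.items() if val == score}
    return best_set, best_score


toy_best = peel_max_min({"a", "b", "c", "d"}, sum_linkage)
```

On the toy weights, the first pass scores the full set at 1 (vertex d), d is removed, and the remaining triangle {a, b, c} scores 6, which is the best value seen. The monotone property is what justifies the greedy removal: a vertex dropped from a larger set could not have had a higher linkage inside any smaller set containing it.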


Combinatorial and statistical methods for part selection for object recognition

Zhao, Vashist, Elgammal, et al. 2007

International Journal of Computer Mathematics

Self Cite


In object recognition tasks, where images are represented as constellations of image patches, often many patches correspond to the cluttered background. In this paper, we present a two-stage method for selecting the image patches which characterize the target object class and are capable of discriminating between the positive images containing the target objects and the complementary negative images. The first stage uses a combinatorial optimization formulation on a weighted multipartite graph. The following stage is a statistical method for selecting discriminative patches from the positive images. Another contribution of this paper is the part-based probabilistic method for object recognition, which uses a common reference frame instead of a reference patch to avoid possible occlusion problems. We also explore different feature representations using principal component analysis (PCA) and 2D PCA. The experiment demonstrates our approach has outperformed most of the other known methods on a popular benchmark dataset while approaching the best known results.


Comparative Genomics: Algorithms and Applications

Yang, Aluru 2010

Algorithms in Computational Molecular Biology

