Abstract-An evolutionary network (EN) in formatted protein sequence space is a very large graph that represents sequence similarity among relatively short protein fragments. This graph can be used to detect hidden relatedness between proteins, which is highly significant in protein annotation. Effective EN analysis requires an appropriate graph clustering approach. Because biological relatedness depends strongly on the number of independent connections between graph nodes, we develop a network clustering method capable of producing high-quality clusters whose members have a satisfactory level of relatedness. In this article we describe a new network partitioning method based on the k-cycles graph connectivity approach. After formally defining a unique structure, named k-ladder connectivity, we demonstrate that the k-ladder-based algorithm successfully detects groups of functionally related proteins. To assess the quality of the method, we conducted a set of experiments in which it proved effective in clustering ENs as well as the significantly denser protein-protein interaction networks (PPINs). Furthermore, it can be readily adapted to structures more complicated than cycles and applied to other large networks of different types.

Index Terms-K-ladder, connectivity algorithm, network clustering, protein evolutionary network, formatted protein sequence space, protein-protein interaction networks.
Manuscript received May 14, 2014; revised July 19, 2014. This work was supported by the European Union seventh framework program via the PathoSys Project (grant number 260429).

The authors are with the ORT Braude College of Engineering, Karmiel, Israel and Research Fellow at Institute of Evolution, University of Haifa, Israel (e-mail: reshma.iidsalld2007@gmail.com, asoffer@braude.ac.il and ahumu@yahoo.com, vlvolkov@braude.ac.il, zakharf@research.haifa.ac.il).

I. INTRODUCTION

Proteins are the main components of all living organisms. Significant progress in molecular genetic technology during the last decade has provided us with a vast number of protein sequences that exist in nature. For example, a recent release of the UniProt database (http://www.uniprot.org/) contains more than 40,000,000 protein sequences. However, many of these protein sequences have no proper annotation, meaning that the structure and biological function of the corresponding proteins are unknown. Characterizing these proteins on the basis of their known sequences, often together with other high-throughput information, is one of the main challenges in computational biology.

Among the multitude of bioinformatics methods and algorithms dedicated to revealing the sought-after protein organization and biological functionality, there is a group of approaches that apply graph analysis techniques to various kinds of protein-associated networks. A common example of such a network is the Protein-Protein Interaction network (PPI or PPIN), a dedicated graph used for integ...