“…Character n-grams have long been used successfully in a wide variety of text processing problems and domains, including approximate word matching (Zobel and Dart, 1995; Mustafa, 2005), string-similarity measures (Angell et al., 1983), language identification (Gottron and Lipka, 2010; Gökçay and Gökçay, 1995), authorship attribution (Kešelj et al., 2003), text compression (Wisniewski, 1987), and bioinformatics (Pavlović-Lažetić et al., 2009; Cheng and Carbonell, 2007; Tomović et al., 2006).…”
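For readers new to the representation, the sketch below shows what a character n-gram profile is and how one might use it for approximate string matching. It is illustrative only. The helper names `char_ngrams` and `dice_similarity` are hypothetical. The Dice coefficient over distinct n-grams is one common choice of similarity measure, and we do not claim it is the exact measure used in any of the works cited above. The sketch also omits boundary padding, which many implementations add.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count the overlapping character n-grams of `text` (no padding)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def dice_similarity(a: str, b: str, n: int = 3) -> float:
    """Dice coefficient over the sets of distinct character n-grams.

    Returns 2*|A & B| / (|A| + |B|). If both strings are too short to
    yield any n-grams, it falls back to exact string equality.
    """
    ga, gb = set(char_ngrams(a, n)), set(char_ngrams(b, n))
    if not ga and not gb:
        return 1.0 if a == b else 0.0
    return 2 * len(ga & gb) / (len(ga) + len(gb))


if __name__ == "__main__":
    print(char_ngrams("banana"))                 # Counter({'ana': 2, 'ban': 1, 'nan': 1})
    print(dice_similarity("night", "nacht", 2))  # 0.25 (only the bigram 'ht' is shared)
```

Because the profile is built from raw character sequences, it does not depend on a tokenizer or a language-specific lexicon. This is one reason the same representation carries over to language identification or to protein and DNA sequences.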