“…A substantial amount of work has been carried out on text region localization in everyday scenes (Pilu, 2001; Chen and Yuille, 2004; León et al, 2005; Liu and Samarabandu, 2005; Liu and Samarabandu, 2006; Fu et al, 2005; Merino and Mirmehdi, 2007; Retornaz and Marcotegui, 2007; Lintern, 2008; Jung et al, 2009; Zini et al, 2009; Epshtein et al, 2010; Zhang et al, 2010; Pratheeba et al, 2010; Chen et al, 2011; Yi and Tian, 2011; Pan et al, 2011; Neumann and Matas, 2011a; Neumann and Matas, 2011b; Merino-Gracia et al, 2011). Many of these works deal explicitly with the text aggregation stage (Pilu, 2001; Retornaz and Marcotegui, 2007; Epshtein et al, 2010; Neumann and Matas, 2011a; Chen et al, 2011; Pan et al, 2011; Merino-Gracia et al, 2011), although they often use other terminology for it, such as word or line formation. We now focus our review on these specific works, especially as we use several of them for comparative analysis.…”