“…); and in machine translation or cross-language information retrieval, special transliteration processes can be applied to entity names across languages with different alphabets [4], provided that the names have been identified. Although NERs that employ mostly hand-crafted rules [5,6] may perform very well, NERs that use statistical and machine learning techniques, including Hidden Markov or Maximum Entropy Models [7,8,9,10], decision tree learning and/or boosting [11,12,13], and Support Vector Machines [14,15], usually outperform them and are easier to port to new text genres (e.g., biomedical articles instead of news articles), where new name categories (e.g., protein names) may also need to be supported. However, supervised statistical and machine learning-based NERs still require a tedious manual annotation phase, during which humans must tag occurrences of entity names in a training corpus.…”