Pattern matching, or finding the occurrences of a pattern in a text, arises frequently in many applications. The task of splitting a character stream or text into words is called tokenization. Search engines use tokenizers[1], and the first phase of a compiler outputs a stream of tokens of the given high-level language program, with the pattern rules specified as regular expressions. Many tools that generate tokenizers automatically have been developed in the past, but they are mostly sequential. The advent of multicore architectures makes it important to exploit their features, such as multiple threads and SIMD instructions, when building software tools. This work attempts to parallelize tokenization. It presents a simple prototype implementation of a parallelized lexical analyzer that recognizes the tokens of a given source program. Each Synergistic Processing Element (SPE) of the Cell processor works on a block of source code and tokenizes it independently. The Power Processing Element (PPE) is responsible for splitting the source code into a finite number of blocks to be used by the different processing elements. Each SPE sends its stream of identifiers to the PPE, which maintains the symbol table. The parallel lexical analyzer runs on the IBM Cell processor simulator, and execution times are plotted while varying the code size and the number of processing elements.