Threads of parallel applications need to communicate in order to fulfill their tasks. The communication performance between the cores in modern multi-core architectures differs because of the memory and interconnection hierarchies. In these architectures, it is important to map the threads of parallel applications by taking into account the communication between them, to improve their performance and energy consumption. In parallel applications based on shared memory, communication is implicit, which makes it difficult to detect the communication pattern between the threads.

In this paper, we introduce a new lightweight mechanism to detect the communication pattern between threads of shared memory applications using the translation lookaside buffer. Our mechanism relies on hardware features, which make it transparent to the programmer and allow the detection to be performed by the operating system during the execution of the application. We also developed a heuristic mapping algorithm that uses the detected pattern to dynamically map the threads to cores. Experiments were performed with applications from the NAS-OMP and PARSEC parallel benchmark suites in a simulated machine as well as a real machine. Results show that our mechanism can substantially improve parallel application performance, as well as processor and DRAM energy consumption.

COMMUNICATION-AWARE THREAD MAPPING USING THE TRANSLATION LOOKASIDE BUFFER

replicated cache lines optimizes the usage of the caches [4] and also reduces the amount of invalidation messages sent by cache coherence protocols. Furthermore, mapping threads according to the communication can reduce energy consumption, because each coherence and cache line transfer message between the caches increases the amount of energy used by the interconnections.

Communication-aware mapping requires a method to detect the communication between the threads and algorithms that use this information to perform the mapping of threads to cores.
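To make these two ingredients concrete, the following is a minimal sketch and not the mechanism or the mapping algorithm evaluated in this paper. It assumes that the pages recently touched by each thread are already available as plain sets of page numbers (a stand-in for TLB contents), and it assumes a hypothetical machine in which cores come in pairs that share a cache. The names `communication_matrix` and `greedy_pair_mapping` are illustrative only.

```python
from itertools import combinations


def communication_matrix(pages_per_thread):
    """Estimate communication as the number of pages two threads share.

    pages_per_thread: list of sets of page numbers, one set per thread
    (an assumed input standing in for per-core TLB contents).
    """
    n = len(pages_per_thread)
    matrix = [[0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        shared = len(pages_per_thread[i] & pages_per_thread[j])
        matrix[i][j] = matrix[j][i] = shared
    return matrix


def greedy_pair_mapping(matrix):
    """Group threads in pairs, heaviest-communicating pair first.

    Each returned tuple is meant to be placed on cores that share a
    cache (an assumption of this sketch). A simple greedy heuristic,
    not an optimal matching.
    """
    n = len(matrix)
    pairs = sorted(combinations(range(n), 2),
                   key=lambda p: matrix[p[0]][p[1]],
                   reverse=True)
    placed, groups = set(), []
    for i, j in pairs:
        if i not in placed and j not in placed:
            groups.append((i, j))
            placed.update((i, j))
    # With an odd thread count, one thread remains without a partner.
    groups.extend((t,) for t in range(n) if t not in placed)
    return groups


# Toy input: threads 0 and 2 share three pages, threads 1 and 3 share two.
pages = [{1, 2, 3}, {10, 11}, {1, 2, 3, 4}, {10, 11, 12}]
matrix = communication_matrix(pages)
mapping = greedy_pair_mapping(matrix)
```

In this toy input the greedy pass groups threads 0 and 2 first, then threads 1 and 3. A real system would additionally have to weigh how often pages are shared and respect the full cache and interconnection hierarchy, which this sketch ignores.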
In shared memory architectures, detecting the communication between threads presents challenges, because the communication is implicit and happens through memory accesses to shared variables. Most previous research in this area focuses on static profiling methods to provide information for the mapping [5,6], which usually incur a high overhead and cannot be used if the application's behavior changes between executions. Dynamic methods have also been proposed, but they have several disadvantages, such as low accuracy [7,8], the need to modify the source code of the applications [9] or parallelization libraries, or a restriction to specific processor architectures [10].

In this paper, we propose a new lightweight, dynamic mechanism to detect the communication patterns of parallel applications based on shared memory. Our approach consists of examining the virtual memory pages most recently accessed by each core. This is carried out by checking the contents of the translation lookaside buffer (TLB), which performs the translation of virtual addresses to physical...