The objective of this work is to bring the benefits of recent advances in GPU technology to a state-of-the-art software framework. We analyze the existing map-reduce (MR) framework and modify it for newer GPU architectures. We identify several significant opportunities for improvement, arising mainly from GPU architectures that were introduced after the MR framework was developed. Our experiments show an average 2.5x speedup of the MR framework on these architectures. We also investigate cache reconfiguration and obtain performance gains ranging from 10% to 200% across cache sizes. Based on this analysis, we develop three techniques to enhance the performance of the MR framework. First, we exploit the principle of locality by restructuring the code, reducing cache misses per thread by over 32%. Second, we reduce the number of comparisons per thread in the group phase; the optimized group phase gives an average 1.5x speedup. Third, we delay writes in the mapperCount function to make it cache-sensitive, which significantly reduces cache misses and improves the execution time of this function by 10% to 25%.