We present efficient realization of Generalized Givens Rotation (GGR) based QR factorization that achieves 3-100x better performance in terms of Gflops/watt over state-of-the-art realizations on multicore, and General Purpose Graphics Processing Units (GPGPUs). GGR is an improvement over classical Givens Rotation (GR) operation that can annihilate multiple elements of rows and columns of an input matrix simultaneously. GGR takes 33% lesser multiplications compared to GR. For custom implementation of GGR, we identify macro operations in GGR and realize them on a Reconfigurable Data-path (RDP) tightly coupled to pipeline of a Processing Element (PE). In PE, GGR attains speed-up of 1.1x over Modified Householder Transform (MHT) presented in the literature. For parallel realization of GGR, we use REDEFINE, a scalable massively parallel Coarse-grained Reconfigurable Architecture, and show that the speed-up attained is commensurate with the hardware resources in REDEFINE. GGR also outperforms General Matrix Multiplication (gemm) by 10% in-terms of Gflops/watt which is counter-intuitive.Index Terms-Parallel computing, orthogonal transforms, dense linear algebra, multiprocessor system-on-chip, instruction level parallelism ! Ranjani Narayan has over 15 years experience at IISc and 9 years at Hewlett Packard. She has vast work experience in a variety of fields computer architecture, operating systems, and special purpose systems. She has also worked in the Technical University of Delft, The Netherlands, and Massachusetts Institute of Technology, Cambridge, USA. During her tenure at HP, she worked on various areas in operating systems and hardware monitoring and diagnostics systems. She has numerous research publications.She is currently Chief Technology Officer at Morphing Machines he was the chief engineer at the Embedded Systems chair at TU Dortmund. In 2002, he joined RWTH Aachen University as a professor for Software for Systems on Silicon. His research comprises software development tools, processor architectures, and system-level electronic design automation, with focus on application-specific multicore systems. He published numerous books and technical papers and served in committees of the leading international EDA conferences. He received various scientific awards, including Best Paper Awards at DAC and twice at DATE, as well as several industrial awards. Dr. Leupers is also engaged as an entrepreneur and in turning novel technologies into innovations. He holds several patents on system-on-chip design technologies and has been a co-founder of LISATek (now with Synopsys), Silexica, and Secure Elements. He has served as consultant for various companies, as an expert for the European Commission, and in the management boards of large-scale projects like HiPEAC and UMIC. He is the coordinator of EU projects TETRACOM and TETRAMAX on academia/industry technology transfer.