Cyclic Redundancy Check (CRC) is a widely used error detection technique in many contemporary communication systems, such as Fourth Generation (4G) mobile communication (Long Term Evolution (LTE) and LTE-Advanced), Wi-Fi, and wireless LAN. For real-time embedded systems, code size (memory), processor machine cycles (speed), and power are the three important parameters that need to be optimized. CRC is simple and very effective for error detection, but its software implementation is not efficient. This paper presents software implementations of CRC using the Bit-by-Bit (BYB) and Look-Up Table (LUT) approaches reported in the earlier literature. Using these approaches, we compare the machine cycle requirements for computing CRC with the CRC-3/5/8/12/16 generator polynomials on the TMS320C6713 and Freescale StarCore SC140 architectures. We then modify our C-based software implementation of the LUT approach using In-Place Computation (IPC). The resulting IPC-LUT CRC computation is found to be more optimized in terms of machine cycles and memory than the LUT method: it reduces the machine cycle requirement by 39.47% compared to the conventional LUT. We have also developed inline assembly code for the SC140 architecture using the IPC-LUT approach that takes only 45 machine cycles for the computation. Peak-to-Average Power Ratio (PAPR) is one of the major drawbacks of contemporary communication systems. For the third parameter (power), we carry out an analysis to establish the decision criteria for identifying sequences with low PAPR.
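
To make the two baseline approaches concrete, the following is a minimal C sketch of BYB and LUT computation for a CRC-8. It is an illustration under stated assumptions, not the implementation evaluated in this paper: the generator polynomial 0x07 (x^8 + x^2 + x + 1), the zero initial value, the MSB-first bit order, and all function names are chosen here for the example only. The IPC-LUT variant is not shown because its details are not given in this abstract.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed generator for this sketch: x^8 + x^2 + x + 1 (0x07), MSB first. */
#define CRC8_POLY 0x07u

/* Bit-by-bit (BYB): process one bit per inner iteration. When the bit
 * shifted out of the register is 1, XOR in the generator polynomial. */
uint8_t crc8_byb(const uint8_t *data, size_t len)
{
    uint8_t crc = 0; /* assumed initial value */
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            if (crc & 0x80u)
                crc = (uint8_t)((crc << 1) ^ CRC8_POLY);
            else
                crc = (uint8_t)(crc << 1);
        }
    }
    return crc;
}

/* Look-up table (LUT): 256 precomputed entries, one per byte value. */
static uint8_t crc8_table[256];

/* Fill the table by running the BYB routine on each single byte value.
 * Must be called once before crc8_lut(). */
void crc8_init_table(void)
{
    for (int v = 0; v < 256; v++) {
        uint8_t b = (uint8_t)v;
        crc8_table[v] = crc8_byb(&b, 1);
    }
}

/* LUT-based CRC: one table access per input byte instead of eight
 * shift/XOR steps, at the cost of 256 bytes of table memory. */
uint8_t crc8_lut(const uint8_t *data, size_t len)
{
    uint8_t crc = 0; /* same assumed initial value as crc8_byb() */
    for (size_t i = 0; i < len; i++)
        crc = crc8_table[crc ^ data[i]];
    return crc;
}
```

After `crc8_init_table()` has been called once, `crc8_byb()` and `crc8_lut()` return the same value for any input buffer. The sketch shows the trade-off the abstract refers to: the LUT form spends table memory to remove the per-bit inner loop.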