Highly efficient arithmetic operations are necessary to achieve the desired performance in many real-time systems and digital image processing applications. In these applications, one of the most frequently performed arithmetic operations is multiply-and-accumulate, which must be carried out with low computation time. In this paper, a serial-parallel multiplier that can perform either signed or unsigned multiplication is presented. In this multiplier, one factor, B(n), is fed serially with word length n = 4, while the other, A(m), is stored in parallel with m = 4 bits. The Baugh-Wooley algorithm requires complementing the last bit of each partial product, except for the last partial product, in which all bits but the last are complemented. In the proposed algorithm, all bits of the last partial product are complemented. This modification results in a considerable reduction in hardware compared to the Baugh-Wooley multiplier. The multiplier can be used to implement discrete orthogonal transforms, which are used in many applications, including image and signal processing. A fully pipelined 2-D bit-level systolic architecture is also presented for efficient implementation of discrete orthogonal transforms using a serial-parallel matrix-vector multiplication scheme. A comparison with similar structures shows that the proposed structure requires less computation time.
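To make the complementation rule concrete, the following is a minimal software sketch of the standard textbook Baugh-Wooley scheme for signed n x n multiplication, not the modified algorithm proposed in the paper and not its hardware. The function name `baugh_wooley_multiply`, the LSB-first bit ordering, and the correction constants 2^n and 2^(2n-1) are assumptions taken from the usual formulation of the algorithm rather than from this paper. The outer loop consumes the bits of B one at a time to mimic the serial input, while all bits of A are available at once.

```python
def baugh_wooley_multiply(a: int, b: int, n: int = 4) -> int:
    """Signed n x n multiply built from Baugh-Wooley partial products.

    Illustrative sketch only: a and b are n-bit two's-complement values,
    and the result is a 2n-bit two's-complement value.
    """
    lo, hi = -(1 << (n - 1)), (1 << (n - 1)) - 1
    if not (lo <= a <= hi and lo <= b <= hi):
        raise ValueError(f"operands must fit in {n}-bit two's complement")

    # Two's-complement bits, least significant bit first.
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> j) & 1 for j in range(n)]

    # Correction constants of the standard formulation (assumed here).
    acc = (1 << n) + (1 << (2 * n - 1))

    # B is consumed serially, one bit per step; A is held in parallel.
    for j, bj in enumerate(b_bits):
        for i, ai in enumerate(a_bits):
            bit = ai & bj
            # Complement the last bit of every partial product except the
            # last one, and all bits but the last in the last one.
            if (i == n - 1) != (j == n - 1):
                bit ^= 1
            acc += bit << (i + j)

    # Keep 2n bits and reinterpret them as a signed value.
    acc &= (1 << (2 * n)) - 1
    if acc >> (2 * n - 1):
        acc -= 1 << (2 * n)
    return acc
```

Calling `baugh_wooley_multiply(a, b)` for every pair of 4-bit signed operands and comparing the result with `a * b` is a quick way to check the sketch. Because every partial-product bit is a plain AND (optionally inverted), the scheme needs no sign extension, which is what makes it attractive for regular array and systolic layouts.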