Serial arithmetic cores reduce area compared to bit-parallel alternatives, but are generally assumed to be inappropriate for high-performance FPGA applications because of their significantly lower throughput. In this paper, we analyze the performance and tradeoffs of a novel serial adder tree and multiplier whose architectures are specialized for Xilinx 7-series FPGAs. We show that these serial arithmetic architectures significantly improve functional density due to an average 2× clock speedup compared to bit-parallel alternatives, which provides attractive tradeoffs for different usage scenarios. We also show that serial arithmetic can, surprisingly, provide better performance than bit-parallel alternatives when replication is limited solely by an area constraint rather than by application parallelism or input bandwidth. We evaluate this performance improvement on several highly parallel sliding-window applications, showing average speedups of 4.8× and 4.4× compared to bit-parallel implementations over a variety of area constraints.
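The area-constrained argument can be illustrated with a minimal back-of-the-envelope sketch. It assumes a simple model in which cores are replicated until a fixed area budget is exhausted, a bit-serial core needs one cycle per operand bit, and a bit-parallel core produces one result per cycle. The function name, operand width, core areas, area budget, and clock frequencies below are all hypothetical and are not measurements from this work; only the 2× clock ratio mirrors the average stated above.

```python
def replicated_throughput(area_budget, core_area, clock_mhz, cycles_per_result):
    """Results per microsecond when cores are replicated until the area runs out."""
    replicas = area_budget // core_area  # whole cores that fit in the budget
    return replicas * clock_mhz / cycles_per_result


# All values below are hypothetical, chosen only to illustrate the tradeoff.
WIDTH = 16            # operand width in bits (assumed)
AREA_BUDGET = 10_000  # available area in arbitrary units (assumed)

# Bit-parallel core: large, one result per cycle, slower clock.
parallel = replicated_throughput(AREA_BUDGET, core_area=400,
                                 clock_mhz=250, cycles_per_result=1)

# Bit-serial core: small, WIDTH cycles per result, 2x faster clock.
serial = replicated_throughput(AREA_BUDGET, core_area=20,
                               clock_mhz=500, cycles_per_result=WIDTH)

speedup = serial / parallel
print(f"parallel={parallel}, serial={serial}, speedup={speedup}")
```

Under this model the serial design wins whenever the ratio of bit-parallel to serial core area exceeds the operand width divided by the clock speedup (here 16 / 2 = 8); the assumed 20-fold area ratio clears that threshold. The model ignores routing, control overhead, and input bandwidth, which can limit replication in practice.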