In this paper, we present modular multipliers for hardware implementations of (hyper)-elliptic curve cryptography on FPGAs. The prime modulus P is generic and can be configured at run-time to provide flexible circuits. A finelypipelined architecture is proposed for overlapping the partial products and reductions steps in the pipeline of hardwired DSP slices. For instance, 2, 3, or 4 independent multiplications can share the hardware resources at the same time to overlap internal latencies. We designed a tool, distributed as open source, for generating VHDL codes with various parameters: width of operands, number of logical multipliers per physical one, speed or area optimization, possible use of BRAMs, target FPGA. Our modular multipliers lead to, at least, 2 times faster as well as 2 times smaller circuits than state of the art operators.