In an effort to reduce the productivity gap between hardware design and software programming practices, this paper presents the application of our synchronized-transfer-level hardware design methodology to the implementation of a hardware matrix multiplication accelerator. The methodology builds on a hardware description language in which the designer manages dynamic connections between sources and sinks that may not always be ready to send or receive data tokens. In addition to these connections, the designer can constrain the authorization of data transfers by means of logical rules that make transfers dependent on each other. Combining the finite state machine and constraint programming paradigms, the description language enhances the ability to express and exploit low-level parallelism. A compiler automates the generation and optimization of the synchronization logic, whose low-level complexity is thus hidden from the designer. Applied to the design of a pipelined matrix multiplication circuit, the proposed methodology achieves computing performance similar to that of the dedicated designs reported in the literature, but with a shorter design time (a single day), simpler source code, and no need for advanced hardware design skills.
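
As a rough illustration of the idea of transfers that are authorized only when both endpoints are ready and the designer's rules are satisfied, the following Python sketch models a single clock cycle. It is not the paper's description language or compiler output: the names `Transfer`, `authorized`, and the rule format are hypothetical, and the sketch assumes rules express only positive dependencies ("this transfer requires that one").

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class Transfer:
    """One source-to-sink connection in a given cycle (hypothetical model)."""
    name: str
    source_ready: bool  # the source has a data token to send
    sink_ready: bool    # the sink can accept a data token


# A rule receives the set of transfers currently considered authorized and
# returns True if its own transfer may still fire alongside them.
Rule = Callable[[Set[str]], bool]


def authorized(transfers: List[Transfer], rules: Dict[str, Rule]) -> Set[str]:
    """Return the names of the transfers allowed to fire in this cycle.

    Start from every transfer whose source and sink are both ready, then
    repeatedly withdraw any transfer whose rule is violated until nothing
    changes. Assumes rules are positive dependencies, so removing a
    transfer can only invalidate further transfers, never enable one.
    """
    fired = {t.name for t in transfers if t.source_ready and t.sink_ready}
    changed = True
    while changed:
        changed = False
        for name in sorted(fired):  # sorted() copies, so discarding is safe
            rule = rules.get(name)
            if rule is not None and not rule(fired):
                fired.discard(name)
                changed = True
    return fired


# Assumed example: transfers "a" and "b" must happen in the same cycle,
# while "c" is unconstrained.
rules: Dict[str, Rule] = {
    "a": lambda fired: "b" in fired,
    "b": lambda fired: "a" in fired,
}

stalled = [
    Transfer("a", source_ready=True, sink_ready=True),
    Transfer("b", source_ready=True, sink_ready=False),  # sink not ready
    Transfer("c", source_ready=True, sink_ready=True),
]
all_ready = [Transfer(n, True, True) for n in ("a", "b", "c")]

print(authorized(stalled, rules))
print(authorized(all_ready, rules))
```

In the first case, "b" cannot fire because its sink is not ready, so the rule on "a" withdraws "a" as well and only "c" proceeds. In the second case all three transfers fire together. In the methodology described above, this kind of synchronization logic is generated by the compiler rather than written by hand.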