This paper presents a fast analytical method for estimating the throughput of pipelined asynchronous systems, and then applies that method to develop a fast solution to the pipelining problem of "slack matching." The approach targets systems with hierarchical topologies, which typically result when high-level (block-structured) language specifications are compiled into data-driven circuit implementations. A significant contribution is that our approach is the first to efficiently handle architectures with choice (i.e., the presence of conditional computation constructs such as if-then-else and conditional loops).

The key idea behind the speed of our analysis method is to exploit information about the hierarchy of a given block-structured system, thereby yielding a runtime that is linear in the number of pipeline stages. In contrast, existing approaches typically represent an entire system as a single Petri net or marked graph, and then apply Markov chain analysis or other state enumeration methods with costly runtimes.

Building upon our analysis approach, we introduce a novel solution to the problem of slack matching, i.e., determining the optimal insertion of FIFO stages into a pipelined design to improve performance. We present both an optimal solution using an MILP formulation and a fast heuristic algorithm that yielded optimal results for all of our examples.