Recent efforts in network function virtualization (NFV) have targeted reducing the end-to-end latency of service function chains (SFCs) through network function (NF) composition. NF composition breaks NFs into building blocks and decides how to combine these blocks. However, two issues remain in current NF composition methods: (1) in the sequential scope, fine-grained block-level composition eliminates redundancy, but at the cost of restricted flexibility; (2) in the vertical scope, complete NF parallelism adds packet copying and reordering overhead. To address both issues, we present ParaGraph, a subgraph-level NF composition approach with delay-balanced parallelism. ParaGraph has three main components: an NF subgraph-extraction module that extracts appropriately grained core function subgraphs from NFs; an orchestrator that dynamically composes subgraphs with delay-balanced parallelism; and an infrastructure that performs lightweight packet copying and merging. We implement a ParaGraph prototype based on Click and the Data Plane Development Kit (DPDK). Extensive evaluations show that, with minimal overhead, ParaGraph reaches line-rate packet processing and reduces latency by up to 55% compared to state-of-the-art methods.