Network Functions (NFs) perform on-path processing of network traffic. ISPs are deploying NF Virtualization (NFV), in which software NFs run on commodity servers. ISPs aim to ensure that NF chains, which are directed acyclic graphs of NFs, do not violate the Service Level Objectives (SLOs) promised by ISPs to their customers. To meet SLOs, NFV systems sometimes leverage on-path hardware (such as programmable switches and smart NICs) to accelerate NF execution. Lemur places and executes NF chains across heterogeneous hardware while meeting SLOs. Lemur's novel placement algorithm yields an SLO-satisfying NF placement while weighing many constraints: hardware memory and processing stages, server cores, link capacity, NF profiles, and NF chain interactions. Lemur's metacompiler automatically generates code and rules (in P4, Python, eBPF, C++, and OpenFlow) to stitch cross-platform NF chain execution while also optimizing resource usage. Our experiments show that Lemur is alone among competing strategies in meeting SLOs for canonical NF chains while maximizing marginal throughput (the traffic rate in excess of the SLO).
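
To make the marginal throughput metric concrete, the following is a minimal sketch, not Lemur's implementation. It assumes that each chain's SLO is expressed as a minimum rate, that rates are measured in Gbps, that the per-chain excess is summed across chains, and that a placement violating any SLO is treated as infeasible. The function name `marginal_throughput`, its parameters, and the chain names are hypothetical.

```python
from typing import Dict, Optional


def marginal_throughput(
    achieved_gbps: Dict[str, float],
    slo_min_gbps: Dict[str, float],
) -> Optional[float]:
    """Sum, over all chains, of the achieved rate in excess of the SLO minimum.

    Assumption: a placement in which any chain falls below its SLO minimum is
    treated as infeasible, so None is returned instead of a negative score.
    """
    total = 0.0
    for chain, slo in slo_min_gbps.items():
        # A chain with no measured rate is treated as carrying no traffic.
        rate = achieved_gbps.get(chain, 0.0)
        if rate < slo:
            return None  # SLO violated: placement is not acceptable
        total += rate - slo
    return total


# Hypothetical SLO minimums for two chains, in Gbps.
chains_slo = {"chain_a": 2.0, "chain_b": 5.0}

print(marginal_throughput({"chain_a": 3.5, "chain_b": 5.0}, chains_slo))  # 1.5
print(marginal_throughput({"chain_a": 1.0, "chain_b": 9.0}, chains_slo))  # None
```

In the second call, the large excess on `chain_b` does not compensate for the shortfall on `chain_a`, which reflects the requirement that every chain meets its SLO before any excess rate is counted.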