Given the explosive growth of video traffic, efficient video distribution is essential to the future Internet, and optimizing its networking cost is therefore a critical research problem. In this paper, we introduce Network Function Virtualization (NFV) in conjunction with Software-Defined Networking (SDN) to minimize this cost through joint orchestration of caching, transcoding, and routing functions. Specifically, we propose a two-step iterative approach. First, in the NFV-based resource allocation phase, we maximize total cache hits by optimally allocating storage and computing resources for a given routing policy. Second, in the SDN-based routing phase, we minimize the networking cost by optimally configuring the routing matrix for a given resource placement. We then prove analytically that iterating these two phases converges to the joint optimum. Through extensive simulations, we verify this convergence and the performance gains over the optimal solution of either phase alone. The numerical results also yield operational guidelines. For resource allocation, more resources should be allocated to nodes with heavier request rates. For routing, each node should split its traffic toward a given server across all paths that share the minimum hop count when more than one such path exists, and use the single shortest path otherwise.
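The routing guideline above can be illustrated with a minimal sketch. It is not the paper's optimization procedure: it only enumerates all minimum-hop paths between a node and a server with breadth-first search and divides the demand among them. The equal split, the adjacency-list graph format, and the function names `min_hop_paths` and `split_traffic` are assumptions made for illustration; the abstract does not specify the split ratios.

```python
from collections import deque


def min_hop_paths(graph, src, dst):
    """Return every minimum-hop path from src to dst (hypothetical helper).

    graph is an adjacency list: {node: [neighbor, ...]}.
    """
    dist = {src: 0}        # hop distance from src
    parents = {src: []}    # predecessors lying on some shortest path
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                parents[v] = [u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:
                # v is reachable in the same number of hops via u as well
                parents[v].append(u)
    if dst not in dist:
        return []

    def build(node):
        # Walk the predecessor lists back to src to rebuild full paths.
        if node == src:
            return [[src]]
        return [p + [node] for par in parents[node] for p in build(par)]

    return build(dst)


def split_traffic(graph, src, dst, demand):
    """Divide demand equally over all minimum-hop paths (assumed equal split)."""
    paths = min_hop_paths(graph, src, dst)
    if not paths:
        return {}
    share = demand / len(paths)
    return {tuple(p): share for p in paths}


# Two equal-length paths between node "n" and server "s": traffic is split.
diamond = {"n": ["a", "b"], "a": ["n", "s"], "b": ["n", "s"], "s": ["a", "b"]}
print(split_traffic(diamond, "n", "s", 10.0))

# A single shortest path: all traffic follows it.
line = {"n": ["a"], "a": ["n", "s"], "s": ["a"]}
print(split_traffic(line, "n", "s", 10.0))
```

In the diamond topology the demand is divided between the two two-hop paths, while in the line topology the single path carries all of it, which mirrors the two cases of the guideline.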