In this work, we investigate the advantages of multiscale methods in a Petrov-Galerkin (PG) formulation within a general framework. The framework is based on a localized orthogonal decomposition of a high-dimensional solution space into a low-dimensional multiscale space with good approximation properties and a high-dimensional remainder space that contains only negligible fine-scale information. The multiscale space can then be used to obtain accurate Galerkin approximations. As a model problem, we consider the Poisson equation. We prove that a Petrov-Galerkin formulation does not suffer from a significant loss of accuracy and still preserves the convergence order of the original multiscale method. We also prove inf-sup stability of a PG Continuous and a Discontinuous Galerkin Finite Element multiscale method. Furthermore, we demonstrate that the Petrov-Galerkin method can decrease the computational complexity significantly, allowing for more efficient solution algorithms. As another application of the framework, we show how the Petrov-Galerkin framework can be used to construct a locally mass conservative solver for two-phase flow simulation that employs the Buckley-Leverett equation. To achieve this, we couple a PG Discontinuous Galerkin Finite Element method with an upwind scheme for a hyperbolic conservation law.

Here, $a(v, w) := \int_\Omega A \nabla v \cdot \nabla w$ and $(v, w) := (v, w)_{L^2(\Omega)}$. The problematic term in the equation is the diffusion matrix $A$, which is known to exhibit very fast variations on a very fine scale (i.e., it has a multiscale character). These variations can be highly heterogeneous and unstructured, which is why it is often necessary to resolve them globally by an underlying computational grid that matches this heterogeneity. Using standard finite element methods, this results in high-dimensional solution spaces and hence an enormous computational demand, which often cannot be handled even by today's computing technology.
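The need to resolve the fine scale can be illustrated with a minimal one-dimensional sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the 1D analogue $-(A u')' = 1$ on $(0,1)$ with $u(0) = u(1) = 0$, the coefficient $A(x) = 1/(2 + \cos(2\pi x/\varepsilon))$ with $\varepsilon = 1/64$, midpoint quadrature for $A$, and the two mesh sizes. The names `coefficient`, `exact`, `solve_p1` and `max_nodal_error` are hypothetical helpers.

```python
import math

EPS = 1.0 / 64.0  # assumed period of the fine-scale oscillation (illustrative)


def coefficient(x):
    """Assumed oscillating coefficient A(x) = 1 / (2 + cos(2*pi*x/EPS))."""
    return 1.0 / (2.0 + math.cos(2.0 * math.pi * x / EPS))


def exact(x):
    """Closed-form solution of -(A u')' = 1, u(0) = u(1) = 0.

    Valid because 1/EPS is an integer, so the flux constant equals 1/2.
    """
    w = 2.0 * math.pi / EPS
    return (x - x * x
            + (0.5 - x) * math.sin(w * x) / w
            - (math.cos(w * x) - 1.0) / (w * w))


def solve_p1(n):
    """P1 finite elements on n uniform cells, A sampled at cell midpoints."""
    h = 1.0 / n
    k = [coefficient((i + 0.5) * h) / h for i in range(n)]  # cell stiffness
    m = n - 1                                   # number of interior nodes
    diag = [k[i] + k[i + 1] for i in range(m)]
    off = [-k[i + 1] for i in range(m - 1)]     # symmetric off-diagonal
    rhs = [h] * m                               # load vector for f = 1
    # Thomas algorithm: forward elimination ...
    for i in range(1, m):
        factor = off[i - 1] / diag[i - 1]
        diag[i] -= factor * off[i - 1]
        rhs[i] -= factor * rhs[i - 1]
    # ... followed by back substitution.
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off[i] * u[i + 1]) / diag[i]
    return [0.0] + u + [0.0]                    # add boundary values


def max_nodal_error(n):
    """Largest nodal deviation from the closed-form solution."""
    u = solve_p1(n)
    return max(abs(u[i] - exact(i / n)) for i in range(n + 1))


err_coarse = max_nodal_error(8)     # mesh width 1/8, much larger than EPS
err_fine = max_nodal_error(1024)    # 16 cells per period of A
print(err_coarse, err_fine)
```

With these particular choices, every midpoint of the coarse mesh hits a point where $A = 1/3$, so the coarse solve behaves like a constant-coefficient problem and overshoots the solution at $x = 1/2$ by about $0.125$, while the mesh that resolves $\varepsilon$ is accurate at the nodes. The alignment between mesh and period is deliberate and makes the failure easy to verify by hand; other coarse meshes fail less cleanly but for the same reason.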
Consequently, there is a need for alternative methods, so-called multiscale methods, which can either operate below linear computational complexity by using local representative elements (cf. [1,2,11,18,19,24,37]) or split the original problem into very localized subproblems that cover $\Omega$ but can be solved cheaply and independently of each other (cf. [5,8,12,13,17,26,39,28,29,32,34,38]). In this paper, we focus on a rather recent approach called Localized Orthogonal Decomposition (LOD), which was introduced by Målqvist and Peterseim [36] and further generalized in [25,20]. We consider a coarse space $V_H$, which is low-dimensional but possibly inadequate for finding a reliable Galerkin approximation to the multiscale solution of problem (1.2). The idea of the method is to start from this coarse space and to update the corresponding set of basis functions step by step to improve the approximation properties of the space. In summarized form, this can be described in four steps: 1) define a (quasi-)interpolation operator $I_H$ from $H^1_0(\Omega)$ onto $V_H$, 2) information in the kernel of th...