In this paper we study backward stochastic differential equations (BSDEs) driven by the compensated random measure associated to a given pure jump Markov process X on a general state space K. We apply these results to prove well-posedness of a class of nonlinear parabolic differential equations on K that generalize the Kolmogorov equation of X. Finally, we formulate and solve optimal control problems for Markov jump processes, relating the value function and the optimal control law to an appropriate BSDE; this BSDE also allows us to construct probabilistically the unique solution to the Hamilton-Jacobi-Bellman equation and to identify it with the value function.

The law of X is determined by a rate transition measure ν(t, x, dy) on K: roughly speaking, starting from x at time t, the conditional probability that the process is in A immediately after a jump at time T_1 = s is proportional to ν(s, x, A); see below for precise statements. We denote by F the natural filtration of the process X. Denoting by T_n the jump times of X, we consider the marked point process (T_n, X_{T_n}) and the associated random measure
\[
p(dt\,dy) = \sum_n \delta_{(T_n, X_{T_n})}(dt\,dy)
\]
on (0, ∞) × K, where δ denotes the Dirac measure. In the Markovian case the dual predictable projection p̃ of p (shortly, the compensator) has the following explicit expression:
\[
\tilde p(dt\,dy) = \nu(t, X_{t-}, dy)\,dt.
\]
In the first part of the paper we introduce a class of BSDEs driven by the compensated random measure q(dt dy) := p(dt dy) − p̃(dt dy) and having the form
\[
Y_t + \int_t^T \int_K Z_s(y)\, q(ds\,dy) = g(X_T) + \int_t^T f\big(s, X_s, Y_s, Z_s(\cdot)\big)\, ds, \qquad t \in [0, T],
\]
for a given generator f and terminal condition g. Here Y is real-valued, while Z is indexed by y ∈ K, i.e. it is a random field on K with appropriate measurability conditions, and the generator depends on Z as a functional. For fixed (t, x) we consider the Markovian version (1.4) of this BSDE, driven by the process started from x at time t: by the results above there exists a unique solution (Y^{t,x}_s, Z^{t,x}_s)_{s∈[t,T]}, and the previous estimates on the BSDEs are used to prove well-posedness of (1.3). As a by-product we also obtain representation formulae, which are sometimes called, at least in the diffusive case, nonlinear Feynman-Kac formulae.

The second application, presented in Section 5, is an optimal control problem.
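To make the objects above concrete, the following is a minimal numerical sketch, not taken from the paper: it simulates a time-homogeneous pure jump Markov process on a finite state space K = {0, 1, 2}, where ν(t, x, dy) reduces to a rate matrix (the matrix `NU` below is an illustrative choice), records the marked point process (T_n, X_{T_n}), and checks empirically that the compensated quantity p((0, T] × A) − ∫_0^T ν(X_s, A) ds has mean close to zero, as the martingale property of q suggests.

```python
import random

# Illustrative rate measure nu(x, dy) on K = {0, 1, 2}: NU[x][y] is the jump
# rate from x to y (diagonal zero). These numbers are assumptions for the demo.
NU = [[0.0, 1.0, 0.5],
      [0.3, 0.0, 0.7],
      [0.6, 0.4, 0.0]]

def simulate(x0, horizon, rng):
    """Return the jump times T_n and post-jump marks X_{T_n} up to the horizon."""
    t, x, jumps = 0.0, x0, []
    while True:
        total = sum(NU[x])                        # nu(x, K): total jump rate
        t += rng.expovariate(total)               # holding time ~ Exp(nu(x, K))
        if t >= horizon:
            return jumps
        # post-jump state drawn with probabilities nu(x, {y}) / nu(x, K)
        x = rng.choices(range(3), weights=NU[x])[0]
        jumps.append((t, x))

def compensated_count(jumps, x0, horizon, target):
    """p((0,T] x {target}) minus the compensator integral of nu(X_s, {target})."""
    count, integral, t_prev, x = 0, 0.0, 0.0, x0
    for t, y in jumps:
        integral += NU[x][target] * (t - t_prev)  # X_s = x on [t_prev, t)
        if y == target:
            count += 1
        t_prev, x = t, y
    integral += NU[x][target] * (horizon - t_prev)
    return count - integral

rng = random.Random(0)
vals = [compensated_count(simulate(0, 5.0, rng), 0, 5.0, 2)
        for _ in range(20000)]
mean = sum(vals) / len(vals)  # should be near 0: q((0,T] x A) has zero mean
```

The same construction works for any finite-state rate matrix; the mean-zero check is the discrete analogue of the statement that stochastic integrals against q are martingales.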
This is formulated in a classical way by means of a change of probability measure, see e.g. [11], [12], [4]. For every fixed (t, x) ∈ [0, T] × K, we define a class A_t of admissible control processes u; the cost to be minimized and the corresponding value function are
\[
J(t, x, u(\cdot)) = \mathbb{E}^{u}_{t,x}\left[ \int_t^T l(s, X_s, u_s)\, ds + g(X_T) \right],
\qquad
v(t, x) = \inf_{u(\cdot) \in A_t} J(t, x, u(\cdot)),
\]
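As a rough illustration of the cost functional, the following sketch evaluates J(t, x, u) by Monte Carlo for constant controls on a two-state example. Everything here is an assumption for the demo: the control u > 0 simply rescales the jump rates (a crude stand-in for the change-of-measure formulation), and the running cost l and terminal cost g are arbitrary illustrative choices.

```python
import random

# Illustrative base rates on K = {0, 1}; the control u scales them.
BASE_RATE = [[0.0, 1.0], [0.5, 0.0]]

def l(s, x, u):          # running cost: control effort plus state cost (assumed)
    return u * u + float(x)

def g(x):                # terminal cost (assumed)
    return 2.0 * float(x)

def J(t, x, u, T=1.0, n_paths=5000, seed=0):
    """Monte Carlo estimate of J(t, x, u) for a constant control u."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_paths):
        s, y, cost = t, x, 0.0
        while True:
            rate = u * sum(BASE_RATE[y])        # controlled total jump rate
            s_next = s + rng.expovariate(rate)
            cost += l(s, y, u) * (min(s_next, T) - s)  # l is constant between jumps
            if s_next >= T:
                break
            y = rng.choices([0, 1], weights=BASE_RATE[y])[0]
            s = s_next
        total += cost + g(y)
    return total / n_paths

# crude approximation of v(0, 0) by minimizing over a grid of constant controls
values = {u: J(0.0, 0, u) for u in (0.5, 1.0, 1.5)}
v_approx = min(values.values())
```

Minimizing over constant controls only gives an upper bound for v(t, x); the paper's point is that the exact value function and an optimal feedback law can instead be read off from the associated BSDE.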