We consider an optimal control problem for a weighted deterministic transition system that must satisfy a constraint expressed as a Linear Temporal Logic (LTL) formula over its labels. The executions of the system incur time-varying penalties modeled as Markov chains, and our goal is to minimize the expected average cumulative penalty incurred between consecutive satisfactions of a desired property. Using concepts from theoretical computer science, we provide two solutions to this problem. First, we derive a provably correct optimal strategy within the class of strategies that do not exploit penalty values sensed in real time. Second, we show that by sensing the penalties locally, we can construct heuristic strategies that lower the collected penalty. These heuristic strategies still ensure satisfaction of the LTL constraint, but we cannot guarantee their optimality. We provide a user-friendly implementation of the proposed algorithms and an analysis of two case studies.