We propose a method to develop trustworthy reinforcement learning systems. To ensure safety especially during exploration, we automatically synthesize a correct-by-construction runtime enforcer, called a shield, that blocks all actions of the agent that are unsafe with respect to a temporal logic specification. Our main contribution is a new synthesis algorithm for computing the shield online. Existing offline shielding approaches compute exhaustively the safety of all states-action combinations ahead-of-time, resulting in huge computation times, large memory consumption, and significant delays at runtime due to the look-ups in huge databases. The intuition behind online shielding is to compute at runtime the set of all states that could be reached in the near future. For each of these states, the safety of all available actions is analysed and used for shielding as soon as one of the considered states is reached. Our proposed method is general and can be applied to a wide range of planning problems with stochastic behaviour. For our evaluation, we selected a 2player version of the classical computer game Snake. The game requires fast decisions and the multiplayer setting induces a large state space, computationally expensive to analyze exhaustively. The safety objective of collision avoidance is easily transferable to a variety of planning tasks.