Finding the path that minimizes the time to navigate between two given points in a fluid flow is known as Zermelo's problem. Here, we investigate it using a Reinforcement Learning (RL) approach, for the case of a vessel that has a slip velocity of fixed intensity, $V_s$, but variable direction, navigating in a 2D turbulent sea. We show that an Actor-Critic RL algorithm is able to find quasi-optimal solutions for both time-independent and chaotically evolving flow configurations. For the frozen case, we also compare the results with strategies obtained analytically from continuous Optimal Navigation (ON) protocols. We show that, for our application, ON solutions are unstable over the typical duration of the navigation process and are therefore not useful in practice. RL solutions, on the other hand, are much more robust with respect to small changes in the initial conditions and to external noise, even when $V_s$ is much smaller than the maximum flow velocity. Furthermore, we show how the RL approach is able to take advantage of the flow properties in order to reach the target, especially when the steering speed is small.

Zermelo's point-to-point optimal navigation problem in the presence of a complex flow is key for a variety of geophysical and applied problems. In this work, we apply Reinforcement Learning to solve Zermelo's problem in a multi-scale 2D turbulent snapshot, for both frozen-in-time velocity configurations and fully time-dependent flows. We show that our approach is able to find a quasi-optimal path between two distant points with high efficiency, even when compared with policies obtained from optimal control theory. Furthermore, we connect the learned policy with the topological flow structures that must be harnessed by the vessel to navigate fast. Our results can be seen as a first step towards more complicated applications to surface oceanographic problems and/or 3D chaotic and turbulent flows.
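For concreteness, the setting described above can be stated in the standard form of Zermelo's problem; the notation below (vessel position $\mathbf{X}$, flow field $\mathbf{u}$, heading angle $\theta$, unit steering direction $\hat{\mathbf{n}}$, and endpoints $\mathbf{x}_A$, $\mathbf{x}_B$) is introduced here for illustration and is not taken from the original text:
\[
  \dot{\mathbf{X}}(t) = \mathbf{u}\bigl(\mathbf{X}(t),t\bigr) + V_s\,\hat{\mathbf{n}}\bigl(\theta(t)\bigr),
  \qquad
  \min_{\theta(\cdot)} T
  \quad \text{subject to } \mathbf{X}(0)=\mathbf{x}_A,\ \mathbf{X}(T)=\mathbf{x}_B,
\]
i.e. the vessel is advected by the flow while steering with fixed slip speed $V_s$ along a freely chosen direction, and the control is the heading history $\theta(t)$ that minimizes the arrival time $T$ at the target.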