Deep reinforcement learning (deep RL) has achieved superior performance in complex sequential tasks by using deep neural networks as function approximators and learning directly from raw input images. However, learning directly from raw images is data inefficient: the agent must learn a feature representation of complex states in addition to learning a policy. As a result, deep RL typically suffers from slow learning and often requires a prohibitively large amount of training time and data to reach reasonable performance, making it inapplicable to real-world settings where data is expensive. In this work, we improve data efficiency in deep RL by addressing one of these two learning goals, feature learning. We leverage supervised learning to pre-train on a small set of non-expert human demonstrations, and our results show significant improvements in learning speed even when the provided demonstrations are noisy and of low quality.

We empirically evaluate our approach using the Asynchronous Advantage Actor-Critic (A3C) algorithm in six Atari games (Bellemare et al. 2013). Unlike previous work, where a large amount of expert human data is required to achieve a good initial performance boost, our approach shows significant learning-speed improvements in all experiments with only a relatively small amount of noisy, non-expert demonstration data. The simplicity of our approach makes it readily adaptable to other deep RL algorithms, and potentially to other domains, since demonstration data is easy to collect. In addition, we apply Gradient-weighted Class Activation Mapping (Grad-CAM) (Selvaraju et al. 2017) to the learned feature maps for both the human data and the agent data, providing a detailed analysis of why pre-training helps to speed up learning.

Our work makes the following contributions:

1. We show that pre-training on a small amount of non-expert human demonstration data is sufficient to achieve significant performance improvements.
2. We are the first to apply the transformed Bellman (TB) operator (Pohlen et al. 2018) to the A3C algorithm, further improving A3C's performance for both the baseline and the pre-training methods.
3. We propose a modified version of the Grad-CAM method (Selvaraju et al. 2017) and are the first to provide an empirical analysis of which features are learned from pre-training, indicating why pre-training on human demonstration data helps.
4. We release our code and all collected human demonstration data at https://github.com/gabrieledcjr/DeepRL.

This article is organized as follows. In the next section, we review related work on using pre-training to improve data efficiency. Section 3 provides background on deep RL algorithms and the transformed Bellman operator. In Section 4, we propose our pre-training methods for deep RL. Section 5 describes the experimental design. Results and analysis are presented in Section 6. We conclude the article in Section 7 with a discussion.
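To make the pre-training step concrete before its full description in Section 4, the following is a minimal sketch of supervised pre-training on human demonstration data, written here in PyTorch with a standard small Atari encoder; the network sizes, loss, and training loop are illustrative assumptions rather than the exact configuration used in our experiments.

    # Minimal sketch (PyTorch; hypothetical names, not the paper's exact setup):
    # supervised pre-training of the convolutional encoder and the policy head on
    # human (state, action) pairs, treated as action classification before any RL updates.
    import torch
    import torch.nn as nn

    class ActorCriticNet(nn.Module):
        def __init__(self, num_actions):
            super().__init__()
            # Small Atari encoder; layer sizes are illustrative.
            self.encoder = nn.Sequential(
                nn.Conv2d(4, 16, 8, stride=4), nn.ReLU(),
                nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
                nn.Flatten(),
                nn.Linear(32 * 9 * 9, 256), nn.ReLU(),
            )
            self.policy = nn.Linear(256, num_actions)  # actor head
            self.value = nn.Linear(256, 1)             # critic head

        def forward(self, frames):
            features = self.encoder(frames)
            return self.policy(features), self.value(features)

    def pretrain_on_demonstrations(net, demo_loader, epochs=5, lr=1e-4):
        """Behavioral-cloning-style pre-training: cross-entropy between the
        policy logits and the human action labels; RL training then starts
        from the resulting weights."""
        optimizer = torch.optim.Adam(net.parameters(), lr=lr)
        criterion = nn.CrossEntropyLoss()
        for _ in range(epochs):
            for frames, human_actions in demo_loader:  # (B, 4, 84, 84), (B,)
                logits, _ = net(frames)
                loss = criterion(logits, human_actions)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
        return net

After pre-training, the resulting weights serve as the initialization for A3C training; which heads are pre-trained and how the demonstration data is batched are design choices covered in Section 4.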
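For reference, the transformed Bellman (TB) operator of Pohlen et al. (2018), which contribution 2 brings into A3C, avoids reward clipping by squashing value targets with an invertible function h. In the form given by Pohlen et al.,

    h(z) = \operatorname{sign}(z)\left(\sqrt{|z| + 1} - 1\right) + \varepsilon z,
    (\mathcal{T}_h Q)(x, a) = \mathbb{E}\left[ h\left( R(x, a) + \gamma \max_{a'} h^{-1}\big(Q(x', a')\big) \right) \right],

where \varepsilon is a small constant (10^{-2} in Pohlen et al. 2018); Section 3 provides the full background and how we adapt it to A3C.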