Carefully crafted computer worms such as Stuxnet and recent data breaches at retail organizations (e.g., Target, Home Depot) are highly sophisticated attacks on critical cyber infrastructure. Such attacks are referred to as advanced persistent threats (APTs), and they are steadily on the rise, with severe implications. In all of these attacks, the attacker's very presence is difficult to detect because the attacker logs in as a legitimate user. Hence, these multi-step attacks are hard to distinguish from benign activity, and common detection techniques suffer from high false-positive rates. Both machine learning and game-theoretic models have been applied to intrusion detection, but machine learning techniques lack the ability to model the rationality of the players, while game-theoretic approaches rely on the strict assumptions of full rationality and complete information. This thesis proposes Q-Learning to model the decision process of a security administrator, an approach that addresses the joint limitations of game theory and machine learning for this problem. The work compares variations of Q-Learning with a traditional stochastic game model by running simulations under different pairs of attacker and defender profiles, using parameters derived from real incident data of a large computer organization. The strengths and weaknesses of the algorithms, and the effect of their parameters on performance, are analyzed. Simulation results show that Naive Q-Learning, despite its restricted information about the opponent, reduces the impact of an attacker more effectively than Minmax Q-Learning against all attackers, and more effectively than Stochastic Games players against less rational opponents.

To my parents, for their love and support.

ACKNOWLEDGMENTS