Distracted driving is a common road-safety problem worldwide. As smart, connected devices become smaller and more widely used, people increasingly use them while on the move, and using such a device while operating a vehicle poses a serious threat to road safety. Eating, drinking, and drowsiness at the wheel also contribute to distracted driving. In this paper, we use a popular, publicly available dataset of images captured by cameras embedded inside cars, showing drivers who are either distracted or not. Unlike existing works that classify distracted driving from the entire image, we first localize the objects within the image that are relevant to distracted driving. We localize three broad categories of objects: external entities (smartphones and bottles), entities within the car (the steering wheel), and human-centered entities (the left and right hands). Our localization approach is based on Region-based Convolutional Neural Networks (R-CNNs). Once these objects are localized, we design simpler machine learning techniques that process their relative locations within the image to detect instances of distracted driving. Our performance evaluations demonstrate the validity of this approach. To the best of our knowledge, this work is unique; we believe it provides more contextual relevance for detecting instances of distracted driving and could yield new approaches to educating drivers on safe driving.
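To make the second stage more concrete, the following is a minimal sketch of how the relative locations of localized objects might be turned into features for a simple classifier. It is an illustration only, not the method evaluated in this paper: the label names, the `(x_min, y_min, x_max, y_max)` box format, the normalization by image diagonal, and the `-1.0` sentinel for undetected objects are all assumptions made for this example.

```python
import math

# Hypothetical label names for the five localized object types.
HANDS = ("left_hand", "right_hand")
TARGETS = ("steering_wheel", "phone", "bottle")


def center(box):
    """Return the center (x, y) of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2.0, (y_min + y_max) / 2.0)


def relative_features(detections, image_diag):
    """Compute hand-to-object center distances, normalized by the image diagonal.

    detections: dict mapping a label to its bounding box; labels that were
                not detected are simply absent (an assumption of this sketch).
    image_diag: length of the image diagonal in pixels.
    Returns a dict of feature name -> value, using -1.0 as an assumed
    sentinel when either object in a pair was not detected.
    """
    features = {}
    for hand in HANDS:
        for target in TARGETS:
            key = f"{hand}_to_{target}"
            hand_box = detections.get(hand)
            target_box = detections.get(target)
            if hand_box is None or target_box is None:
                features[key] = -1.0
                continue
            hx, hy = center(hand_box)
            tx, ty = center(target_box)
            features[key] = math.hypot(hx - tx, hy - ty) / image_diag
    return features
```

Under these assumptions, the resulting feature dictionary could be flattened into a fixed-length vector and passed to any lightweight classifier; a small distance between a hand and a phone, for example, would be one cue the classifier could learn from.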