This paper introduces a framework for contextual thinking on mobile devices. It infers significant activities from real-time sensing of local time, significant locations, location-dwelling states, and user states. A significant activity is a well-defined activity to be inferred, for example, waiting for a bus, having a meeting, working in an office, or taking a break in a coffee shop. A significant location is defined as a geofence, which can be either a circle centered on a node or a polygon. A location-dwelling state is one of entering a significant location, dwelling in it for some duration, or exiting it. A user state is a combination of user mobility states, user actions, user social states, and even psychological states. In this initial study, we focus only on user motion states: static, slow walking, walking, and fast moving (which covers both fast walking and driving). However, the framework and the activity inference algorithm are flexible enough to adopt other user states in the future. Using measurements from the built-in sensors and radio signals of mobile devices, we capture a contextual tuple every second, consisting of a time tag, the ID of a significant location, a location-dwelling duration, and a user state. The sequence of contextual tuples serves as the input for inferring the user's significant activities. For each contextual tuple, the contextual thinking engine evaluates the posterior probability of every significant activity using a Bayesian approach. An "un-defined" activity is adopted to cover all activities other than the selected significant activities. A prototype of the contextual thinking engine has been developed in the Geospatial Computing Lab at Texas A&M University Corpus Christi, and a test environment was set up on campus. Six significant activities were defined and tested by two testers over three days using two different smartphones.
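The per-tuple Bayesian inference described above can be illustrated with a minimal sketch. It assumes a naive-Bayes model (time, location, dwelling duration, and motion state conditionally independent given the activity), which is only one possible instance of "a Bayesian approach" and not necessarily the one used in the prototype. The time and dwell-duration discretisations, the probability tables, the activity names, and the `floor` value for unseen feature values are all hypothetical.

```python
import math
from dataclasses import dataclass
from enum import Enum


class Motion(Enum):
    """The four user motion states considered in this initial study."""
    STATIC = "static"
    SLOW_WALKING = "slow_walking"
    WALKING = "walking"
    FAST_MOVING = "fast_moving"  # fast walking or driving


@dataclass(frozen=True)
class ContextTuple:
    """One per-second snapshot: time tag, significant location, dwell time, user state."""
    hour: int          # local hour of day taken from the time tag
    location_id: str   # significant-location ID, "none" when outside every geofence
    dwell_s: int       # seconds spent in the current significant location
    motion: Motion


def period(hour: int) -> str:
    # Hypothetical coarse discretisation of the time tag.
    if hour < 11:
        return "morning"
    if hour < 14:
        return "midday"
    return "afternoon"


def dwell_bucket(dwell_s: int) -> str:
    # Hypothetical thresholds for the location-dwelling duration.
    if dwell_s < 60:
        return "short"
    if dwell_s < 900:
        return "medium"
    return "long"


def posterior(t, priors, likelihoods, floor=1e-3):
    """Return P(activity | tuple) for every activity under a naive-Bayes
    assumption. Feature values missing from a table get `floor` instead of 0."""
    features = {
        "period": period(t.hour),
        "location": t.location_id,
        "dwell": dwell_bucket(t.dwell_s),
        "motion": t.motion,
    }
    log_scores = {}
    for activity, prior in priors.items():
        s = math.log(prior)
        for name, value in features.items():
            s += math.log(likelihoods[activity][name].get(value, floor))
        log_scores[activity] = s
    # Normalise in log space for numerical stability.
    m = max(log_scores.values())
    z = sum(math.exp(v - m) for v in log_scores.values())
    return {a: math.exp(v - m) / z for a, v in log_scores.items()}


# Hypothetical priors and likelihood tables for two significant activities
# plus the catch-all "undefined" activity.
PRIORS = {"working_in_office": 0.4, "waiting_for_bus": 0.1, "undefined": 0.5}
LIKELIHOODS = {
    "working_in_office": {
        "period": {"morning": 0.4, "midday": 0.2, "afternoon": 0.4},
        "location": {"office": 0.9},
        "dwell": {"short": 0.05, "medium": 0.15, "long": 0.8},
        "motion": {Motion.STATIC: 0.8, Motion.SLOW_WALKING: 0.15, Motion.WALKING: 0.05},
    },
    "waiting_for_bus": {
        "period": {"morning": 0.4, "midday": 0.2, "afternoon": 0.4},
        "location": {"bus_stop": 0.95},
        "dwell": {"short": 0.3, "medium": 0.6, "long": 0.1},
        "motion": {Motion.STATIC: 0.6, Motion.SLOW_WALKING: 0.4},
    },
    "undefined": {
        "period": {"morning": 0.3, "midday": 0.3, "afternoon": 0.4},
        "location": {"none": 0.7, "office": 0.1, "bus_stop": 0.1},
        "dwell": {"short": 0.4, "medium": 0.3, "long": 0.3},
        "motion": {m: 0.25 for m in Motion},
    },
}

t = ContextTuple(hour=10, location_id="office", dwell_s=1800, motion=Motion.STATIC)
post = posterior(t, PRIORS, LIKELIHOODS)
print(max(post, key=post.get))  # working_in_office
```

Including "undefined" as an ordinary class with its own broad tables lets the engine decline to assign a significant activity when no specific hypothesis explains the tuple well.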
These significant activities are: 1) working in an office; 2) having a meeting; 3) having lunch; 4) having a coffee break; 5) visiting the library; and 6) waiting for a bus. An "un-defined" activity was also included to cover all other activities. The inferred activities were then compared with the labeled activities to assess the performance of the contextual thinking engine. The average inference success rate exceeded 90%. We found that positioning accuracy plays a significant role in the inference algorithm because it directly affects two elements of the contextual tuple: the significant location and the location-dwelling duration.
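The comparison between inferred and labeled activities can be sketched as below. This assumes that success rate means the fraction of per-second tuples whose inferred activity matches the label, and reports both the overall rate and the unweighted mean of the per-activity rates, since "on average" could refer to either; the exact definition used in the evaluation is not stated here. The function name and sample data are hypothetical.

```python
from collections import Counter


def success_rates(labeled, inferred):
    """Compare aligned per-tuple label and inference sequences.

    Returns (per-activity rate, overall rate, mean of per-activity rates)."""
    if len(labeled) != len(inferred):
        raise ValueError("sequences must be aligned tuple by tuple")
    totals = Counter(labeled)
    hits = Counter(lab for lab, inf in zip(labeled, inferred) if lab == inf)
    per_activity = {a: hits[a] / n for a, n in totals.items()}
    overall = sum(hits.values()) / len(labeled)
    macro = sum(per_activity.values()) / len(per_activity)
    return per_activity, overall, macro


# Hypothetical six-second excerpt: one office tuple is misclassified.
labeled = ["office"] * 4 + ["bus"] * 2
inferred = ["office", "office", "office", "un-defined", "bus", "bus"]
per_activity, overall, macro = success_rates(labeled, inferred)
print(per_activity)  # {'office': 0.75, 'bus': 1.0}
```

The two averages differ whenever activities have unequal durations: long activities such as working in an office dominate the overall rate, while the per-activity mean weights a short bus wait equally.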