Intelligent city transportation systems are one of the core infrastructures of a smart city. The true ingenuity of such an infrastructure lies in providing the commuters with real-time information about citywide transports like public buses, allowing her (him) to pre-plan the travel. However, providing prior information for transportation systems like public buses in real-time is inherently challenging because of the diverse nature of different stay-locations that a public bus stops. Although straightforward factors stay duration, extracted from unimodal sources like GPS, at these locations look erratic, a thorough analysis of public bus GPS trails for 720km of bus-travels at the city of Durgapur, a semi-urban city in India, reveals that several other fine-grained contextual features can characterize these locations accurately. Accordingly, we develop BuStop, a system for extracting and characterizing the stay locations from multi-modal sensing using commuters' smartphones. Using this multi-modal information BuStop extracts a set of granular contextual features that allow the system to differentiate among the different stay-location types. A thorough analysis of BuStop using the collected in-house dataset indicates that the system works with high accuracy in identifying different stay locations like regular bus-stops, random ad-hoc stops, stops due to traffic congestion stops at traffic signals, and stops at sharp turns. Additionally, we also develop a proof-of-concept setup on top of BuStop to analyze the potential of the framework in predicting expected arrival time, a critical piece of information required to pre-plan travel, at any given bus-stop. Subsequent analysis of the PoC framework, through simulation over the test dataset, shows that characterizing the stay-locations indeed helps make more accurate arrival time predictions with deviations less than 60 seconds from the ground-truth arrival time.CCS Concepts: • Human-centered computing → Smartphones; Ubiquitous computing.