Object tracking is crucial for planning safe maneuvers of mobile robots in dynamic environments, in particular for autonomous driving among surrounding traffic participants. Obtaining abstracted high-level objects, such as vehicles, requires multistage processing of sensor measurement data, including sensor fusion, data association, and temporal filtering. Objects are often abstracted at an early stage, which is problematic because it discards information that subsequent processing steps could otherwise use. We present a new grid-based object tracking approach that instead operates on already fused measurement data. The input is pre-processed, without abstracting objects, by the spatial grid cell discretization of a dynamic occupancy grid, which enables generic multi-sensor detection of moving objects. Building on the association of occupied cells presented in our previous work, this paper investigates the subsequent object state estimation. The object pose and shape estimation benefit from the free-space information contained in the input grid, which is evaluated to determine the current visibility of extracted object parts. An integrated object classification concept further refines the assumed object size. For precise estimation of the dynamic motion state, radar Doppler velocity measurements are integrated into the input data and processed directly at the object level. Our approach is evaluated with real sensor data in the context of autonomous driving in challenging urban scenarios.
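
To illustrate how Doppler measurements can constrain an object's motion state at the object level, the following minimal sketch fits a 2D velocity to several radial velocity measurements assigned to one object. It is a generic least-squares formulation, not necessarily the estimator used in this work. It assumes that ego motion has already been compensated, that the object moves by pure translation (no yaw rate), and that each azimuth is given in the same frame as the estimated velocity. The function name `estimate_object_velocity` and the input layout are hypothetical.

```python
import math


def estimate_object_velocity(measurements):
    """Least-squares fit of (vx, vy) from radar Doppler measurements.

    measurements: iterable of (azimuth_rad, radial_velocity) pairs, all
    assumed to belong to the same rigidly translating object.
    Each measurement obeys: v_r = vx * cos(azimuth) + vy * sin(azimuth).
    """
    # Accumulate the 2x2 normal equations (A^T A) v = A^T b,
    # where each row of A is [cos(azimuth), sin(azimuth)].
    s_cc = s_cs = s_ss = 0.0
    b_c = b_s = 0.0
    for azimuth, v_r in measurements:
        c, s = math.cos(azimuth), math.sin(azimuth)
        s_cc += c * c
        s_cs += c * s
        s_ss += s * s
        b_c += c * v_r
        b_s += s * v_r

    # The system is only solvable if the azimuths differ enough;
    # a single measurement yields just the radial component.
    det = s_cc * s_ss - s_cs * s_cs
    if abs(det) < 1e-9:
        raise ValueError("azimuth diversity too low to resolve 2D velocity")

    # Closed-form inverse of the symmetric 2x2 matrix.
    vx = (s_ss * b_c - s_cs * b_s) / det
    vy = (s_cc * b_s - s_cs * b_c) / det
    return vx, vy
```

With a single detection, or with detections at nearly identical azimuths, only the radial velocity component is observable, which is why the sketch raises an error in that case instead of returning an ill-conditioned estimate.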