Deep neural networks have achieved remarkable progress on a wide range of intelligence tasks. However, their massive computation and storage requirements limit deployment on resource-constrained devices. Although channel pruning has been widely used to compress models, it is difficult to reach very high compression ratios with such a coarse-grained pruning structure without significant performance degradation. In this article, we propose an acceleration-aware fine-grained channel pruning (AFCP) framework for accelerating neural networks, which optimizes trainable gate parameters by estimating residual errors between pruned and original channels together with hardware characteristics. Our fine-grained design operates at both the algorithm and structure levels. Unlike existing methods that rely on a single pre-defined pruning criterion, AFCP explicitly considers both a zero-out criterion and a similarity criterion for each channel and adaptively selects the suitable one via residual gate parameters. At the structure level, AFCP adopts a fine-grained channel pruning strategy for residual neural networks together with a decomposition-based structure, which further enlarges the pruning optimization space. Moreover, instead of relying on theoretical computation costs such as FLOPs, we propose a hardware predictor that bridges the gap between realistic acceleration and the pruning procedure to guide the learning of pruning, improving the efficiency of pruned models when deployed on accelerators. Extensive evaluation results demonstrate that AFCP outperforms state-of-the-art methods and achieves a favorable balance between model performance and computation cost.
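
To make the criterion-selection idea concrete, the following is a minimal PyTorch-style sketch of how per-channel residual gates could weigh a zero-out criterion against a similarity criterion; the class name, gate parameterization, and similarity measure are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn as nn


class FineGrainedChannelGate(nn.Module):
    """Illustrative per-channel gate: each channel is softly kept, zeroed out,
    or replaced by its most similar channel. All names and the exact
    parameterization are assumptions for exposition only."""

    def __init__(self, num_channels: int):
        super().__init__()
        # Three trainable logits per channel: keep / zero-out / similar.
        self.gate_logits = nn.Parameter(torch.zeros(num_channels, 3))

    def forward(self, x: torch.Tensor):
        n, c, h, w = x.shape
        # For each channel, find its most similar channel (cosine similarity
        # of flattened responses) as a simple stand-in for the 'similar' criterion.
        flat = x.permute(1, 0, 2, 3).reshape(c, -1)
        flat = nn.functional.normalize(flat, dim=1)
        sim = flat @ flat.t()
        sim.fill_diagonal_(-1.0)            # exclude self-matches
        nearest = sim.argmax(dim=1)         # (C,) index of the most similar channel
        similar_x = x[:, nearest]           # responses substituted channel-by-channel

        probs = torch.softmax(self.gate_logits, dim=-1)   # (C, 3)
        keep = probs[:, 0].view(1, c, 1, 1)
        simw = probs[:, 2].view(1, c, 1, 1)
        # probs[:, 1] is the zero-out weight; that branch contributes nothing to the sum.
        pruned = keep * x + simw * similar_x

        # Residual error between pruned and original channels, which a training
        # objective can penalize while learning the gate parameters.
        residual_error = (pruned - x).pow(2).mean()
        return pruned, residual_error
```

In such a sketch, the learned gate weights indicate, per channel, whether zeroing it out or substituting a similar channel introduces less residual error; a hardware-aware term (for example, predicted latency rather than FLOPs) could then be added to the loss to steer which channels are ultimately removed.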