Neuromorphic event cameras, which asynchronously capture per-pixel brightness changes in a scene, have drawn increasing attention due to their high speed and low power consumption. However, event data are noisy, sparse, and nonuniform in the spatio-temporal domain, with extremely high temporal resolution, making them challenging to process with conventional deep learning algorithms. To make events compatible with convolutional neural networks, most methods encode them into point-cloud or voxel representations, but their performance leaves considerable room for improvement. Moreover, because event cameras only detect changes in the scene, relative motion can cause misalignment, i.e., the same pixel may correspond to different real-world points at different times. To this end, this work proposes the aligned compressed event tensor (ACE) as a novel event data representation, together with a framework called branched event net (BET), for event-based vision in both static and dynamic scenes. We evaluate them on multiple datasets for object classification and action recognition and show that they surpass state-of-the-art methods by significant margins. Specifically, our method achieves 98.88% accuracy on the DVS128 action recognition task, and outperforms the second-best method by large margins of 4.85%, 9.56%, and 2.33% on the N-Caltech101, DVSAction, and NeuroIV datasets, respectively. Furthermore, the proposed ACE-BET is efficient, achieving the fastest inference speed among all methods tested.
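For background on the baseline representations the abstract refers to, the following is a minimal sketch of a generic voxel-grid event encoding, not the proposed ACE; the function name, array layout, and bin count are illustrative assumptions rather than details from this work.

```python
import numpy as np

def events_to_voxel_grid(events, num_bins, height, width):
    """Accumulate an event stream (x, y, t, polarity) into a fixed-size
    spatio-temporal voxel grid of shape (num_bins, height, width).

    `events` is an (N, 4) array with columns x, y, t, p, where p is +1/-1.
    This is a generic baseline encoding, not the ACE representation.
    """
    voxel = np.zeros((num_bins, height, width), dtype=np.float32)
    if len(events) == 0:
        return voxel

    x = events[:, 0].astype(np.int64)
    y = events[:, 1].astype(np.int64)
    t = events[:, 2].astype(np.float64)
    p = events[:, 3].astype(np.float32)

    # Normalize timestamps to [0, num_bins - 1] and assign each event to a temporal bin.
    t_min, t_max = t.min(), t.max()
    t_norm = (t - t_min) / max(t_max - t_min, 1e-9) * (num_bins - 1)
    bins = np.clip(np.round(t_norm).astype(np.int64), 0, num_bins - 1)

    # Signed accumulation: ON events add, OFF events subtract.
    np.add.at(voxel, (bins, y, x), p)
    return voxel

# Example: 5,000 synthetic events on a 128x128 sensor, binned into 9 slices.
rng = np.random.default_rng(0)
ev = np.column_stack([
    rng.integers(0, 128, 5000),          # x coordinates
    rng.integers(0, 128, 5000),          # y coordinates
    np.sort(rng.uniform(0, 1e5, 5000)),  # timestamps (microseconds)
    rng.choice([-1.0, 1.0], 5000),       # polarities
])
grid = events_to_voxel_grid(ev, num_bins=9, height=128, width=128)
print(grid.shape)  # (9, 128, 128), usable as CNN input channels
```

Such dense grids make events consumable by standard CNNs, but they ignore the motion-induced misalignment that the abstract highlights, which is the gap the proposed ACE representation targets.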