Session-based recommendation (SBR) aims to predict the next item for a session, which consists of several items clicked in a transaction. Most SBR approaches rest on the assumption that all sequential information should be strictly utilized, and they therefore model the temporal information of items using implicit, explicit, or ensemble methods. In practice, however, users may recall previously clicked items without remembering the exact order in which they clicked them. Heavy emphasis on encoding item temporal information in various ways may therefore make session intents harder to learn. In this paper, we rethink the necessity of temporal information for items in SBR. We propose ACARec, which Aggregates the Contextual intents of a session with Attentive networks. Specifically, we avoid explicitly modeling positional embeddings and instead learn contextual intents through aggregation methods (convolution or pooling). We also demonstrate that even an entirely position-agnostic aggregation approach can yield promising results. Extensive experiments on real-world datasets validate our arguments. We hope our study can provide insights into SBR and inspire future research in the community.
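To make the notion of position-agnostic aggregation concrete, the following is a minimal sketch, not the ACARec architecture itself. It pools a session's item embeddings by a plain mean and by a softmax attention readout, uses no positional embedding, and checks that shuffling the session leaves both results unchanged. The function names (`mean_pool`, `attention_pool`), the toy embeddings, and the query vector are hypothetical and chosen only for illustration; only pooling is shown, not convolution.

```python
import math


def mean_pool(items):
    """Average the item embeddings; item order has no effect on the result."""
    dim = len(items[0])
    return [sum(v[d] for v in items) / len(items) for d in range(dim)]


def attention_pool(items, query):
    """Softmax-weighted sum of item embeddings, scored against a query vector.

    Each weight depends only on the item's own embedding (no positional
    embedding), so the output is invariant to the order of the items.
    """
    scores = [sum(q * x for q, x in zip(query, v)) for v in items]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(items[0])
    return [
        sum(w * v[d] for w, v in zip(weights, items)) / total
        for d in range(dim)
    ]


def close(a, b):
    """Element-wise comparison that tolerates floating-point rounding."""
    return all(math.isclose(x, y, abs_tol=1e-12) for x, y in zip(a, b))


# Toy session of three 2-dimensional item embeddings (hypothetical values).
session = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
shuffled = [session[2], session[0], session[1]]  # same items, different order
query = [0.2, -0.1]  # hypothetical intent query vector

print(close(mean_pool(session), mean_pool(shuffled)))
print(close(attention_pool(session, query), attention_pool(shuffled, query)))
```

Both checks compare the pooled session representation before and after reordering. A model that added positional embeddings to the items before pooling would, in general, not pass them.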