Developing cognition is crucial for robots yet difficult to achieve. Infants gradually improve their cognition through parental guidance and self-exploration. Conventional learning methods for robots, however, often focus on a single modality and train a pre-defined model offline on large datasets. In this paper, we propose a hierarchical autonomous cognitive architecture that lets robots learn object concepts online by interacting with humans. Two pathways, one visual and one auditory, are devised. Each pathway has three layers based on self-organizing incremental neural networks. In the sample layers, visual features and object names are learned incrementally and self-organized in an unsupervised way; here we propose a dynamically adjustable similarity threshold strategy that lets the network itself control clustering rather than relying on a pre-defined threshold. Two symbol layers abstract the clustering results of their corresponding sample layers into concise symbols and transmit them to an associative layer. An associative relationship between the two modalities is built in real time by binding visual and auditory symbols that are activated at the same time in the associative layer. In this layer, a top-down response strategy lets the robot autonomously recall the associated modality, resolve conflicting associative relationships, and adjust learned knowledge from the top down. Experimental results on two object datasets and a real task show that our architecture efficiently learns and associates object views and names online. Moreover, the robot can autonomously improve its cognitive level by using its own experience, without querying humans.

INDEX TERMS Cognitive development, concept online learning, self-organizing incremental neural network, object recognition, audiovisual integration.
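
To make the role of the associative layer concrete, the following is a minimal sketch of how co-activated visual and auditory symbols might be bound and later recalled. It is an illustration under stated assumptions, not the architecture's actual implementation: the class name `AssociativeLayer`, the methods `bind`, `recall_name`, and `recall_views`, the use of co-activation counts as link weights, and the strongest-link rule for resolving conflicting associations are all hypothetical choices made here for clarity.

```python
from collections import defaultdict


class AssociativeLayer:
    """Toy associative layer binding co-activated visual and auditory symbols.

    Hypothetical illustration: symbol ids, count-based weights, and the
    strongest-link conflict rule are assumptions, not the paper's method.
    """

    def __init__(self):
        # weights[visual_symbol][auditory_symbol] -> co-activation count
        self.weights = defaultdict(lambda: defaultdict(int))

    def bind(self, visual_symbol, auditory_symbol):
        """Strengthen the link between two symbols activated together."""
        self.weights[visual_symbol][auditory_symbol] += 1

    def recall_name(self, visual_symbol):
        """Top-down recall: return the most strongly bound name, or None."""
        links = self.weights.get(visual_symbol)
        if not links:
            return None
        # Assumed conflict rule: the most frequently co-activated name wins.
        return max(links, key=links.get)

    def recall_views(self, auditory_symbol):
        """Return visual symbols whose strongest name is auditory_symbol."""
        return sorted(
            v for v in self.weights if self.recall_name(v) == auditory_symbol
        )


layer = AssociativeLayer()
layer.bind("v1", "apple")
layer.bind("v1", "apple")
layer.bind("v1", "pear")   # conflicting association for the same view
layer.bind("v2", "apple")
```

In this sketch, a view that has been bound to two different names is recalled under the name it co-occurred with most often, which is one simple way a conflicting association could be resolved without asking a human.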