Abstract: This paper presents a human-mimicking model for sound source recognition. It consists of an artificial neural network with three neuron layers (input, middle and output). Feedforward connections run from the input layer to the middle layer and from the middle layer to the output layer, and feedback connections run from the output layer back to the middle layer. The model learns according to the Hebb principle, "cells that fire together, wire together", but with some important alterations to standard Hebbian learning that prevent it from forgetting previously learned patterns when it learns new ones. In addition, short-term memory is introduced into the model to facilitate and guide the learning of neuronal synapses (long-term memory). Because auditory attention is an essential part of human auditory scene analysis (ASA), it is also indispensable in any computational model that mimics ASA. This paper shows that different auditory attention mechanisms emerge naturally from the neuronal behaviour implemented in the model. The learning behaviour of the model is further investigated in the context of an urban sonic environment, and the importance of short-term memory in this process is demonstrated. Finally, the effectiveness of the model is evaluated by comparing the model's output on presented sound recordings with a human expert listener's evaluation of the same fragments.
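The three-layer wiring and the basic Hebbian update mentioned above can be sketched as follows. This is a minimal sketch of plain Hebbian learning, not the paper's modified rule: the alterations that prevent forgetting and the short-term memory mechanism are not reproduced. The logistic activation, the learning rate `lr`, the one-step delay on the feedback path, and the function names (`step`, `hebbian_update`, `matvec`) are illustrative assumptions.

```python
import math


def sigmoid(a):
    """Logistic activation (assumed; the abstract does not specify one)."""
    return 1.0 / (1.0 + math.exp(-a))


def matvec(weights, x):
    """Weighted sums weights @ x for a matrix stored as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in weights]


def hebbian_update(weights, pre, post, lr):
    """Plain Hebbian rule: w[i][j] += lr * post[i] * pre[j].

    Co-active pre- and postsynaptic units strengthen their connection
    ("cells that fire together, wire together"). This is the standard
    rule, NOT the paper's forgetting-resistant variant.
    """
    return [[w + lr * post_i * pre_j for w, pre_j in zip(row, pre)]
            for row, post_i in zip(weights, post)]


def step(w_in_mid, w_mid_out, w_out_mid, x, prev_out, lr=0.1):
    """One forward pass followed by a Hebbian update of all three matrices.

    w_in_mid  : feedforward, input -> middle  (n_mid x n_in)
    w_mid_out : feedforward, middle -> output (n_out x n_mid)
    w_out_mid : feedback,    output -> middle (n_mid x n_out)
    prev_out  : output activity of the previous step (assumed one-step delay)
    """
    # Middle layer sums feedforward input and feedback from the output layer.
    ff = matvec(w_in_mid, x)
    fb = matvec(w_out_mid, prev_out)
    mid = [sigmoid(a + b) for a, b in zip(ff, fb)]
    out = [sigmoid(a) for a in matvec(w_mid_out, mid)]

    # Each connection is strengthened by the product of its pre/post activity.
    w_in_mid = hebbian_update(w_in_mid, x, mid, lr)
    w_mid_out = hebbian_update(w_mid_out, mid, out, lr)
    w_out_mid = hebbian_update(w_out_mid, prev_out, mid, lr)
    return w_in_mid, w_mid_out, w_out_mid, mid, out


def zeros(rows, cols):
    """Hypothetical helper: a rows x cols matrix of zeros."""
    return [[0.0] * cols for _ in range(rows)]


if __name__ == "__main__":
    n_in, n_mid, n_out = 2, 3, 2
    w1, w2, wf = zeros(n_mid, n_in), zeros(n_out, n_mid), zeros(n_mid, n_out)
    out = [0.0] * n_out
    for _ in range(5):
        # Only the first input unit fires, so only its synapses grow.
        w1, w2, wf, mid, out = step(w1, w2, wf, [1.0, 0.0], out)
    print(w1)
```

Because plain Hebbian updates only ever add positive products here, the weights grow without bound under repeated presentation; this is one reason practical Hebbian models, including the one described in the abstract, alter the standard rule.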