Localization is essential for interaction in multisensory integration. Motivated by the Superior Colliculus (SC), this paper investigates how audio and visual signals are processed during stimulus integration. A novel methodology based on a neural network architecture is proposed that localizes effectively, especially when integrating stimuli of varied intensities in lower-order audio and visual signals. During integration, the simultaneous arrival of stimuli that are too weak or too strong can leave the SC unable to localize the source from the given stimulus intensities, giving rise to the enhancement and depression phenomena. This paper presents a dual-layered neural network model that integrates visual and audio sensory stimuli and also provides a way to track the stimulus source. This behavior is applicable to guided robots that help humans by tracking or cooperating in tasks such as personal assistance, route guidance, and incident tracking.