Echocardiography is an ultrasound-based imaging modality that helps the physician visualize the motion of the heart chambers and valves. Deep learning has recently come to play an important role in several clinical computer-assisted diagnostic systems, and there is a real need to employ deep learning methodologies to improve such systems. In this paper, we propose a deep learning system that classifies several echocardiography views and identifies their physiological location. First, spatial CNN features are extracted from each frame of the echo-motion. Second, we propose novel temporal features based on neutrosophic sets; these neutrosophic temporal motion features are extracted from the echo-motion activity. To extract the deep CNN features, we use the activations of a pre-trained deep ResNet model. The spatial and neutrosophic temporal CNN features are then fused by feature concatenation. Finally, the fused CNN features are fed into a deep long short-term memory (LSTM) network to classify the echo-cardio views and identify their location. In our experiments, we employed a public echocardiography dataset consisting of 432 videos covering eight cardio-views. We investigated the activation performance of several pre-trained networks, and the ResNet architecture achieved the best accuracy among them. The proposed system, based on fused spatial and neutrosophic temporal deep features, achieved 96.3% accuracy and 95.75% sensitivity. For the classification of cardio-view location, the proposed system achieved 99.1% accuracy. The proposed system achieved higher accuracy than previous deep learning methods, with a significant decrease in training time. These experimental results are promising for the proposed approach.
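The fusion step can be illustrated with a minimal sketch. It assumes that each video yields one spatial feature vector and one temporal feature vector per frame, and that fusion joins the two vectors frame by frame. The function name `fuse_frame_features`, the toy dimensions, and the feature values are hypothetical and chosen only for illustration; how the neutrosophic temporal features are computed is not shown here.

```python
from typing import List, Sequence

Vector = List[float]


def fuse_frame_features(
    spatial: Sequence[Vector], temporal: Sequence[Vector]
) -> List[Vector]:
    """Concatenate the spatial and temporal feature vectors of each frame.

    Both inputs hold one feature vector per frame, in the same frame order.
    """
    if len(spatial) != len(temporal):
        raise ValueError("both inputs must cover the same number of frames")
    # Per-frame concatenation: the output width is the sum of both widths.
    return [list(s) + list(t) for s, t in zip(spatial, temporal)]


# Hypothetical toy video: 3 frames, 4 spatial and 2 temporal features each.
spatial_feats = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
temporal_feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

fused = fuse_frame_features(spatial_feats, temporal_feats)
print(len(fused), len(fused[0]))  # 3 6
```

The resulting sequence of fused vectors, one per frame, is the kind of input a recurrent classifier such as an LSTM consumes. In practice the features would be arrays or tensors with far larger dimensions than this toy example.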