Object detection, which combines object categorization with object localization within a scene, is one of the most difficult tasks in computer vision. Deep neural networks have recently been shown to outperform alternative approaches to object detection. Their main drawbacks are model complexity and heavy computation, which prevent real-time detection and tracking of objects in high-resolution images. We propose a Scaled YOLOv4 lite model, a single-stage detector neural network for object detection and tracking, trained on the COCO 2017 dataset. Scaled YOLOv4 applies efficient network scaling strategies to create the YOLOv4-CSP-P5-P6-P7-P8 networks. The YOLOv4 lite model adds an extra layer, P8, which improves accuracy. The improved network design uses cross-stage-partial (CSP) connections and Mish activation in the optimized backbone and the neck (PAN). YOLOv4, however, can only be trained once for all resolutions. The width and height activations have been changed, allowing for faster network training. The YOLOv4 lite model uses CSPDarkNet-53 as its backbone. Experimental results show that our YOLOv4 lite model can detect and track objects at up to 28 fps on video input and achieves an accuracy of 86.09% when tested on real-time video at a resolution of 1920 × 1080 (full HD). With the CSPDarkNet-53 backbone, the model achieves AP = 50.81%, AP@50 = 63.6%, and AP@75 = 52.5%.
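
Two terms used above can be made concrete with a minimal sketch. Mish is the activation x · tanh(softplus(x)), and in COCO-style evaluation AP@50 and AP@75 count a detection as correct when its intersection over union (IoU) with a ground-truth box is at least 0.50 or 0.75, respectively. The code below is illustrative only and is not the authors' implementation; the function names (`softplus`, `mish`, `iou`) and the corner-coordinate box layout `(x1, y1, x2, y2)` are assumptions made for this example.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable softplus, log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def mish(x: float) -> float:
    """Mish activation: x * tanh(softplus(x))."""
    return x * math.tanh(softplus(x))


def iou(box_a, box_b) -> float:
    """Intersection over union of two boxes given as (x1, y1, x2, y2).

    Assumes x1 < x2 and y1 < y2 for both boxes (hypothetical layout
    chosen for this sketch).
    """
    # Corners of the overlapping rectangle, if any.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    ground_truth = (0.0, 0.0, 10.0, 10.0)
    prediction = (2.0, 0.0, 12.0, 10.0)
    overlap = iou(ground_truth, prediction)
    # The same prediction can count as a match at one threshold
    # and as a miss at a stricter one.
    print("IoU:", overlap)
    print("match at 0.50:", overlap >= 0.50)
    print("match at 0.75:", overlap >= 0.75)
    print("mish(1.0):", mish(1.0))
```

The stricter 0.75 threshold rewards tighter localization, which is consistent with AP@75 being lower than AP@50 in the reported results.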