Video surveillance is ubiquitous. In addition to understanding various scene objects, extracting human visual attributes from the scene has attracted tremendous traction over the past many years. This is a challenging problem even for human observers. This is a multi-label problem, i.e., a subject in a scene can have multiple attributes that we are hoping to recognize, such as shoes types, clothing type, wearing some accessory, or carrying some object or not, etc. Solutions have been presented over the years and many researchers have employed convolutional neural networks (CNNs). In this work, we propose using Depthwise Separable Convolution Neural Network (DS-CNN) to solve the pedestrian attribute recognition problem. The network employs depthwise separable convolution layers (DSCL), instead of the regular 2D convolution layers. DS-CNN performs extremely well, especially with smaller datasets. In addition, with a compact network, DS-CNN reduces the number of trainable parameters while making learning efficient. We evaluated our method on two benchmark pedestrian datasets and results show improvements over the state of the art.