Deep neural networks show very good performance in phoneme and speech recognition applications when compared to previously used GMM (Gaussian Mixture Model)-based ones. However, efficient implementation of deep neural networks is difficult because the network size needs to be very large when high recognition accuracy is demanded. In this work, we develop a digital VLSI for phoneme recognition using deep neural networks and assess the design in terms of throughput, chip size, and power consumption. The developed VLSI employs a fixed-point optimization method that only uses +Δ, 0, and -Δ for representing each of the weight. The design employs 1,024 simple processing units in each layer, which however can be scaled easily according to the needed throughput, and the throughput of the architecture varies from 62.5 to 1,000 times of the real-time processing speed.