2020
DOI: 10.1109/tc.2019.2954495

Grow and Prune Compact, Fast, and Accurate LSTMs

Abstract: Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to the LSTM's original one-level non-linear control gates. H-LSTM increases accuracy while employing fewer external stacked layers, thus reducing the number of …
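The gate modification the abstract describes can be sketched as follows. This is a minimal, hypothetical PyTorch rendering in which each control gate's single fully-connected layer is replaced by a small multi-layer network; the layer sizes, activations, and the `HLSTMCell` name are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HLSTMCell(nn.Module):
    """Sketch of an H-LSTM cell: each control gate uses a small MLP
    (one hidden layer plus non-linearity) in place of the single
    affine transform of a vanilla LSTM gate. Sizes and activations
    here are illustrative assumptions."""
    def __init__(self, input_size, hidden_size):
        super().__init__()
        def gate(out_act):
            # hidden layer inside the gate: the core H-LSTM change
            return nn.Sequential(
                nn.Linear(input_size + hidden_size, hidden_size),
                nn.ReLU(),
                nn.Linear(hidden_size, hidden_size),
                out_act,
            )
        self.f = gate(nn.Sigmoid())  # forget gate
        self.i = gate(nn.Sigmoid())  # input gate
        self.o = gate(nn.Sigmoid())  # output gate
        self.g = gate(nn.Tanh())     # candidate cell update

    def forward(self, x, state):
        h, c = state
        z = torch.cat([x, h], dim=-1)
        c = self.f(z) * c + self.i(z) * self.g(z)
        h = self.o(z) * torch.tanh(c)
        return h, (h, c)
```

A forward step takes the current input together with the previous (h, c) state, exactly as with `nn.LSTMCell`, so a deeper gate can substitute for extra stacked layers.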

Cited by 79 publications (55 citation statements)
References 46 publications

“…Network growth is a complementary method to pruning that enables a sparser, yet more accurate, model before pruning starts [10], [27]. A grow-and-prune synthesis paradigm typically reduces the number of parameters in CNNs [10], [28] and LSTMs [29] by another 2×, and increases the classification accuracy [10]. It enables NN-based inference even on Internet-of-Things (IoT) sensors [28].…”
Section: Efficient Neural Network
Mentioning, confidence: 99%
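The growth phase this excerpt refers to can be illustrated with a short sketch. It assumes one common formulation of gradient-based growth, re-activating the dormant connections with the largest gradient magnitudes; the function name, mask representation, and `growth_ratio` parameter are hypothetical, not taken from the cited papers.

```python
import torch

def grow_connections(mask, grad, growth_ratio=0.1):
    """Hypothetical growth step: re-activate the dormant (masked-out)
    connections whose loss gradients have the largest magnitude.
    The scoring rule and growth_ratio are illustrative assumptions."""
    flat_mask = mask.flatten().clone()
    dormant = flat_mask == 0
    n_grow = int(growth_ratio * dormant.sum().item())
    if n_grow == 0:
        return mask
    # score dormant positions by |gradient|; active ones get -inf
    scores = grad.flatten().abs().masked_fill(~dormant, float("-inf"))
    idx = torch.topk(scores, n_grow).indices
    flat_mask[idx] = 1.0
    return flat_mask.view_as(mask)
```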
“…Then, it prunes away insignificant connections and neurons based on magnitude information to drastically reduce model redundancy. This leads to improved accuracy and efficiency [10], [29], where the former is highly preferred on the server and the latter is critical at the edge. The training process generates two inference models, i.e., DiabNN-server and DiabNN-edge, for server and edge inference, respectively.…”
Section: Model Training
Mentioning, confidence: 99%
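The magnitude-based pruning step the excerpt describes can be sketched in the same style. The selection rule, dropping the smallest-magnitude active weights, follows the excerpt; the function name, mask representation, and `prune_ratio` parameter are illustrative assumptions.

```python
import torch

def prune_by_magnitude(weight, mask, prune_ratio=0.5):
    """Hypothetical pruning step: zero out the active connections
    with the smallest absolute weights, which the grow-and-prune
    paradigm treats as insignificant. prune_ratio is an assumption."""
    flat_mask = mask.flatten().clone()
    active = flat_mask == 1
    n_prune = int(prune_ratio * active.sum().item())
    if n_prune == 0:
        return mask
    # rank active weights by |value|; dormant ones get +inf so they
    # are never selected for pruning
    scores = weight.flatten().abs().masked_fill(~active, float("inf"))
    idx = torch.topk(scores, n_prune, largest=False).indices
    flat_mask[idx] = 0.0
    return flat_mask.view_as(mask)
```

Alternating such growth and pruning steps yields the sparser, more accurate models the excerpts describe, with the mask applied to the weights at every forward pass.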