Abstract-Matrix multiplication is widely used in a variety of applications and is often one of the core components of many scientific computations. This paper examines three algorithms for computing the product of two matrices: the naive algorithm, Strassen's algorithm, and Winograd's algorithm. A principal measure of an algorithm's efficiency is its execution time, i.e., how long it takes to complete its work. All three algorithms are implemented and their execution times measured, and Winograd's algorithm is found experimentally to be the fastest method for matrix multiplication. Deep Neural Networks are used in many applications, and training a Deep Neural Network is a time-consuming process, especially when the number of hidden layers and nodes is large. The mechanisms of the Backpropagation Algorithm and the Boltzmann Machine Algorithm for training a Deep Neural Network are revisited, with attention to how the weighted sum of inputs is computed. The product of the weight and input matrices is computed several hundreds of thousands of times over the epochs of Deep Neural Network training. We propose to modify the Backpropagation Algorithm and the Boltzmann Machine Algorithm to use the fast Winograd's algorithm. Finally, we find that the proposed methods reduce the long training time of Deep Neural Networks compared with the existing direct methods.
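For illustration, the following is a minimal C sketch of Winograd's (1968) matrix-multiplication algorithm, which roughly halves the number of scalar multiplications in each inner product by precomputing pairwise products of the rows of A and the columns of B. It assumes square, row-major matrices with an even shared dimension n (the odd case adds one correction term per entry); the function and variable names are illustrative, not taken from the paper.

```c
#include <stdlib.h>

/* Sketch of Winograd's algorithm: C = A * B for n x n row-major
 * matrices, n even. Each C[i][j] then needs only n/2 multiplications
 * plus the two precomputed correction factors. */
void winograd_matmul(const double *A, const double *B, double *C, int n)
{
    int half = n / 2;
    double *rowFactor = malloc(n * sizeof(double));
    double *colFactor = malloc(n * sizeof(double));

    /* Pairwise products along each row of A ... */
    for (int i = 0; i < n; ++i) {
        rowFactor[i] = 0.0;
        for (int k = 0; k < half; ++k)
            rowFactor[i] += A[i * n + 2 * k] * A[i * n + 2 * k + 1];
    }
    /* ... and down each column of B. */
    for (int j = 0; j < n; ++j) {
        colFactor[j] = 0.0;
        for (int k = 0; k < half; ++k)
            colFactor[j] += B[(2 * k) * n + j] * B[(2 * k + 1) * n + j];
    }
    /* (a0 + b1)(a1 + b0) summed over pairs equals the inner product
     * plus rowFactor[i] + colFactor[j], which we subtract off. */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            double sum = -rowFactor[i] - colFactor[j];
            for (int k = 0; k < half; ++k)
                sum += (A[i * n + 2 * k] + B[(2 * k + 1) * n + j]) *
                       (A[i * n + 2 * k + 1] + B[(2 * k) * n + j]);
            C[i * n + j] = sum;
        }

    free(rowFactor);
    free(colFactor);
}
```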
The development of fast and efficient training algorithms for Deep Neural Networks has been a subject of interest over the past few years, because the biggest drawback of Deep Neural Networks is their enormous computational cost: training their parameters consumes a great deal of time. This has motivated several researchers to focus on recent advances in hardware architectures and parallel programming models and paradigms for accelerating the training of Deep Neural Networks. We revisited the concepts and mechanisms of typical Deep Neural Network training algorithms, such as the Backpropagation Algorithm and the Boltzmann Machine Algorithm, and observed that matrix multiplication constitutes the major portion of the workload of the training process, because it is carried out a huge number of times during training. With the advent of many-core GPU technologies, matrix multiplication can be performed very efficiently in parallel, so training a Deep Neural Network no longer consumes as much time as it did a few years ago. CUDA is one of the high-performance parallel programming models for exploiting the capabilities of modern many-core GPU systems. In this paper, we propose to modify the Backpropagation Algorithm and the Boltzmann Machine Algorithm with CUDA parallel matrix multiplication and test them on a many-core GPU system. Finally, we find that the proposed methods achieve much faster training of Deep Neural Networks than the classic methods.
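As a concrete sketch of the kind of CUDA matrix multiplication such an approach relies on, the kernel below assigns one thread per output element; the kernel name, block size, and launch configuration are illustrative assumptions, not details given in the abstract.

```c
/* Minimal CUDA kernel: each thread computes one element of C = A * B
 * for n x n row-major matrices. Names and tile sizes are illustrative. */
__global__ void matmulKernel(const float *A, const float *B, float *C, int n)
{
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; ++k)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

/* Host-side launch, assuming dA, dB, dC already reside in device memory:
 *
 *   dim3 block(16, 16);
 *   dim3 grid((n + block.x - 1) / block.x, (n + block.y - 1) / block.y);
 *   matmulKernel<<<grid, block>>>(dA, dB, dC, n);
 */
```

A production implementation would typically tile the computation through shared memory (or call cuBLAS) for much higher throughput, but this one-thread-per-element form shows the basic parallel decomposition.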
Deep Neural Network training algorithms consume a long training time, especially when the number of hidden layers and nodes is large. Matrix multiplication is the key operation, carried out at every node of each layer several hundreds of thousands of times during the training of a Deep Neural Network. Blocking is a well-proven optimization technique for improving the performance of matrix multiplication, and blocked matrix multiplication algorithms can easily be parallelized to accelerate performance further. This paper proposes a novel approach of implementing parallel blocked matrix multiplication algorithms to reduce the long training time. The proposed approach was implemented using the parallel programming model OpenMP with the collapse() clause for the multiplication of the input and weight matrices of the Backpropagation and Boltzmann Machine Algorithms for training Deep Neural Networks, and it was tested on a multi-core processor system. Experimental results showed that the proposed approach achieved approximately two times speedup over the classic algorithms.
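The following C sketch illustrates this technique: blocked matrix multiplication with the two outer block loops parallelized via OpenMP's collapse(2) clause, so the whole grid of output blocks is distributed across threads. The block size and function name are illustrative assumptions, not the paper's actual parameters.

```c
#include <omp.h>

#define BS 64  /* block size; an illustrative choice, not from the paper */

/* Blocked matrix multiplication C += A * B for n x n row-major matrices.
 * collapse(2) flattens the ii/jj block loops into one parallel loop;
 * each thread owns a distinct block of C, so no synchronization is needed. */
void blocked_matmul(const float *A, const float *B, float *C, int n)
{
    #pragma omp parallel for collapse(2)
    for (int ii = 0; ii < n; ii += BS)
        for (int jj = 0; jj < n; jj += BS)
            for (int kk = 0; kk < n; kk += BS)
                for (int i = ii; i < ii + BS && i < n; ++i)
                    for (int k = kk; k < kk + BS && k < n; ++k) {
                        float a = A[i * n + k];
                        for (int j = jj; j < jj + BS && j < n; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
}
```

Blocking keeps a BS x BS working set of each matrix in cache while it is reused, and collapse(2) exposes (n/BS)^2 units of parallel work instead of only n/BS, which helps when the thread count exceeds the number of block rows.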