Paul Harnack scite author profile

Paul Harnack

3 Publications

57 Citation Statements Received

36 Citation Statements Given

Affiliations

Chapman University

Publications

A deep learning approach to identifying source code in images and video

Ott, Atchison, Harnack, et al. 2018

While substantial progress has been made in mining code on an Internet scale, efforts to date have been overwhelmingly focused on data sets where source code is represented natively as text. Large volumes of source code available online and embedded in technical videos have remained largely unexplored, due in part to the complexity of extraction when code is represented with images. Existing approaches to code extraction and indexing in this environment rely heavily on computationally intense optical character recognition. To improve the ease and efficiency of identifying this embedded code, as well as identifying similar code examples, we develop a deep learning solution based on convolutional neural networks and autoencoders. Focusing on Java for proof of concept, our technique is able to identify the presence of typeset and handwritten source code in thousands of video images with 85.6%-98.6% accuracy based on syntactic and contextual features learned through deep architectures. When combined with traditional approaches, this provides a more scalable basis for video indexing that can be incorporated into existing software search and mining tools.

CCS CONCEPTS • Information systems → Video search; • Computing methodologies → Machine learning approaches; • Computer systems organization → Neural networks; • Software and its engineering → Software libraries and repositories;

Learning lexical features of programming languages from imagery using convolutional neural networks

Ott, Atchison, Harnack, et al. 2018

We demonstrate the ability of deep architectures, specifically convolutional neural networks, to learn and differentiate the lexical features of different programming languages presented in coding video tutorials found on the Internet. We analyze over 17,000 video frames containing examples of Java, Python, and other textual and non-textual objects. Our results indicate that not only can computer vision models based on deep architectures be taught to differentiate among programming languages with over 98% accuracy, but can learn language-specific lexical features in the process. This provides a powerful mechanism for carrying out program comprehension research on repositories where source code is represented with imagery rather than text, while simultaneously avoiding the computational overhead of optical character recognition.

CCS CONCEPTS • Computer systems organization → Neural networks; • Software and its engineering → Software libraries and repositories;

A Curated Set of Labeled Code Tutorial Images for Deep Learning

Bergh, Harnack, Atchison, et al. 2020


