Louis G. Michael scite author profile

Louis G. Michael

5 Publications

44 Citation Statements Received

149 Citation Statements Given

How they've been cited

How they cite others

121

147

Affiliations

Virginia Tech

Publications

Why aren’t regular expressions a lingua franca? an empirical study on the re-use and portability of regular expressions

Davis

Michael

Coghlan

et al. 2019

This paper explores the extent to which regular expressions (regexes) are portable across programming languages. Many languages offer similar regex syntaxes, and it would be natural to assume that regexes can be ported across language boundaries. But can regexes be copy/pasted across language boundaries while retaining their semantic and performance characteristics? In our survey of 158 professional software developers, most indicated that they re-use regexes across language boundaries and about half reported that they believe regexes are a universal language. We experimentally evaluated the riskiness of this practice using a novel regex corpus: 537,806 regexes from 193,524 projects written in JavaScript, Java, PHP, Python, Ruby, Go, Perl, and Rust. Using our polyglot regex corpus, we explored the hitherto-unstudied regex portability problems: logic errors due to semantic differences, and security vulnerabilities due to performance differences. We report that developers' belief in a regex lingua franca is understandable but unfounded. Though most regexes compile across language boundaries, 15% exhibit semantic differences across languages and 10% exhibit performance differences across languages. We explained these differences using regex documentation, and further illuminate our findings by investigating regex engine implementations. Along the way we found bugs in the regex engines of JavaScript-V8, Python, Ruby, and Rust, and potential semantic and performance regex bugs in thousands of modules.

CCS CONCEPTS • Software and its engineering → Reusability; • Social and professional topics → Software selection and adaptation.

Regexes are Hard: Decision-Making, Difficulties, and Risks in Programming Regular Expressions

Michael

Donohue

Davis

et al. 2019

Regular expressions (regexes) are a powerful mechanism for solving string-matching problems. They are supported by all modern programming languages, and have been estimated to appear in more than a third of Python and JavaScript projects. Yet existing studies have focused mostly on one aspect of regex programming: readability. We know little about how developers perceive and program regexes, nor the difficulties that they face. In this paper, we provide the first study of the regex development cycle, with a focus on (1) how developers make decisions throughout the process, (2) what difficulties they face, and (3) how aware they are about serious risks involved in programming regexes. We took a mixed-methods approach, surveying 279 professional developers from a diversity of backgrounds (including top tech firms) for a high-level perspective, and interviewing 17 developers to learn the details about the difficulties that they face and the solutions that they prefer. In brief, regexes are hard. Not only are they hard to read, our participants said that they are hard to search for, hard to validate, and hard to document. They are also hard to master: the majority of our studied developers were unaware of critical security risks that can occur when using regexes, and those who knew of the risks did not deal with them in effective manners. Our findings provide multiple implications for future work, including semantic regex search engines for regex reuse and improved input generators for regex validation.

Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions

Michael

Donohue

Davis

et al. 2023

Preprint

Replication package for "Regexes are Hard: Decision-making, Difficulties, and Risks in Programming Regular Expressions"

Michael

Donohue

Davis

et al. 2019

VTLeeLab/LinguaFranca-FSE19: Artifact for the Lingua Franca paper, appearing at ESEC/FSE'19

Davis

Michael

Coghlan

et al. 2019

