A method is presented for determining, within strict bounds, the probability that a regular expression matches with its match start point in a given section of a random data string. In general the method requires time and space exponential in the number of optional characters in the regular expression, but in practice it was used to determine bounds on the matching probabilities of all the ProSite patterns without difficulty.
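To make the idea of bounding such a probability concrete, the following is a minimal sketch of a much simpler case than the one the method addresses. It is not the method presented here. It assumes a fixed-length pattern with no optional characters, written as a list of allowed-character sets, and data characters drawn independently and uniformly from an alphabet. The function names, the pattern representation, and the default alphabet size of 20 are all hypothetical choices made for illustration.

```python
from fractions import Fraction


def start_match_probability(pattern, alphabet_size=20):
    """Probability that the pattern matches at one fixed start point.

    `pattern` is a non-empty list of sets of allowed characters, one set per
    position (an assumed, simplified form with no optional characters).
    Data characters are assumed i.i.d. uniform over `alphabet_size` symbols.
    """
    if not pattern:
        raise ValueError("pattern must have at least one position")
    p = Fraction(1)
    for allowed in pattern:
        p *= Fraction(len(allowed), alphabet_size)
    return p


def section_bounds(pattern, n_starts, alphabet_size=20):
    """Lower and upper bounds on P(at least one match starts in the section).

    The section contains `n_starts` consecutive candidate start points, and
    the data is assumed to extend far enough past the section for a match
    starting at the last point to complete.
    """
    p = start_match_probability(pattern, alphabet_size)
    length = len(pattern)
    # Start points spaced `length` apart read disjoint characters, so the
    # corresponding match events are independent; restricting attention to
    # them can only lower the probability of seeing a match.
    independent_starts = (n_starts + length - 1) // length
    lower = 1 - (1 - p) ** independent_starts
    # Union bound over every start point in the section.
    upper = min(Fraction(1), n_starts * p)
    return lower, upper


# A two-position pattern over a 4-letter alphabet, with 4 start points.
example = section_bounds([{"A"}, {"C", "G"}], n_starts=4, alphabet_size=4)
print(example)  # (Fraction(15, 64), Fraction(1, 2))
```

Exact rational arithmetic is used so that the two values are genuine bounds rather than floating-point approximations. Optional characters, which drive the exponential cost mentioned above, are outside the scope of this sketch.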
Acknowledgements: I am grateful to Matt Jackson for pointing out that UK media don't all prefer the TNCC method, and for detailed comments on the abstract; and to Stephen Mack for reviewing the main text.