Stirling numbers of the first kind are used in the derivation of several population genetics statistics, which in turn are useful for testing evolutionary hypotheses directly from DNA sequences. Here, we explore the cumulative distribution function of these Stirling numbers, which enables a single direct estimate of the sum, using representations in terms of the incomplete beta function. This estimator enables an improved method for calculating an asymptotic estimate for one useful statistic, Fu's Fs. By reducing the calculation from a sum of terms involving Stirling numbers to a single estimate, we simultaneously improve accuracy and dramatically increase speed.

Keywords Population genetics statistics; Evolutionary inference from sequence alignments; Stirling numbers of the first kind; Asymptotic analysis; Numerical algorithms; Cumulative distribution function.

1 Introduction

The dominant paradigm in population genetics is based on a comparison of observed data with parameters derived from a theoretical model [1, 2]. Specifically for DNA sequences, many techniques have been developed to test for extreme relationships between average sequence diversity (the number of DNA differences between individuals) and the number of alleles (distinct DNA sequences in the population). In particular, such methods are widely used to predict selective pressures, where certain mutations confer increased or decreased survival to the next generation [2]. Such selective pressures are relevant for understanding and modeling practical problems such as influenza evolution over time [3] and during vaccine production [4]; adaptations in human populations, which may impact disease risk [5, 6]; and the emergence of new infectious diseases and outbreaks [7].

Many population genetics tests are therefore formulated as unidimensional test statistics, where the pattern of DNA mutations in a sample of individuals is reduced to a single number [1, 2, 8]. Such statistics are heavily informed by combinatorial sampling and probability distribution theories, many of which are built upon the foundational Ewens's sampling formula [9]. Ewens's sampling formula describes the expected distribution of the number of alleles in a sample of individuals, given the nucleotide diversity. Calculating subsets of this distribution is useful for testing deviations of observed data from a null model; such subsets often require the calculation of Stirling numbers of the first kind (hereafter referred to simply as Stirling numbers). In particular, two population genetics statistics, Fu's Fs and Strobeck's S, use this approach [8, 10]. The former has recently been shown to be potentially useful for detecting genetic loci under selection during population expansions (such as an infectious outbreak), both in theory and in practice [7]. However, Stirling numbers rapidly grow large and overwhelm the standard floating point range of ...
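
To make the overflow problem concrete, the following Python sketch computes unsigned Stirling numbers of the first kind with the standard recurrence |s(m+1, k)| = m |s(m, k)| + |s(m, k-1)|, once exactly and once in log space. It then evaluates the allele-number distribution implied by Ewens's sampling formula in its usual textbook form, P(K = k) = |s(n, k)| θ^k / (θ (θ+1) ... (θ+n-1)). The function names and the symbol `theta` are illustrative assumptions and are not taken from the paper; the code illustrates the naive term-by-term approach, not the estimator developed here.

```python
import math

NEG_INF = float("-inf")


def stirling1_unsigned_row(n):
    """Exact |s(n, k)| for k = 0..n, using Python's arbitrary-precision ints."""
    row = [1]  # n = 0: |s(0, 0)| = 1
    for m in range(n):
        new = [0] * (m + 2)
        for k in range(m + 2):
            scaled = m * row[k] if k <= m else 0   # m * |s(m, k)|
            shifted = row[k - 1] if k >= 1 else 0  # |s(m, k-1)|
            new[k] = scaled + shifted
        row = new
    return row


def _logaddexp(a, b):
    """log(exp(a) + exp(b)) without leaving log space."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    hi, lo = max(a, b), min(a, b)
    return hi + math.log1p(math.exp(lo - hi))


def log_stirling1_unsigned_row(n):
    """Natural log of |s(n, k)| for k = 0..n; zero entries are -inf."""
    row = [0.0]  # log |s(0, 0)| = log 1
    for m in range(n):
        new = [NEG_INF] * (m + 2)
        for k in range(m + 2):
            scaled = math.log(m) + row[k] if (m > 0 and k <= m) else NEG_INF
            shifted = row[k - 1] if k >= 1 else NEG_INF
            new[k] = _logaddexp(scaled, shifted)
        row = new
    return row


def ewens_log_pmf(n, theta):
    """log P(K = k) for k = 0..n under Ewens's sampling formula (assumed form)."""
    # log of the rising factorial theta * (theta + 1) * ... * (theta + n - 1)
    log_rising = math.lgamma(theta + n) - math.lgamma(theta)
    log_s = log_stirling1_unsigned_row(n)
    return [ls + k * math.log(theta) - log_rising for k, ls in enumerate(log_s)]


if __name__ == "__main__":
    print(stirling1_unsigned_row(4))  # [0, 6, 11, 6, 1]
    try:
        float(stirling1_unsigned_row(172)[1])  # |s(172, 1)| = 171!
    except OverflowError as err:
        print("double precision overflow:", err)
    # The log-space value stays finite and matches log(171!) = lgamma(172).
    print(log_stirling1_unsigned_row(172)[1], math.lgamma(172))
```

Since |s(n, 1)| = (n-1)!, a double-precision float already overflows at n = 172, which is small compared with many modern sample sizes. Working in log space avoids the overflow, but a tail probability still requires summing one term per value of k.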