Strings of nucleotides that carry biological information are typically described using sequence motifs, which can be represented by weight matrices or consensus sequences. However, many biological signals in DNA or RNA are recognized by multiple factors in temporal sequence, consist of a mixture of alternative and sometimes dissimilar motifs, or are better described by base composition. Here we apply the Latent Dirichlet Allocation (LDA) mixture model to nucleotide sequences, using k-mers as features, in three related approaches. In the first, positions in aligned sequences are used as samples; in the other two, whole sequences are used as samples, with features drawn either from positional k-mers or from bulk k-mers occurring anywhere in the sequence. LDA readily identifies motifs, including such elusive cases as the intron branch site. LDA can also identify sequence subtypes, such as splice site subtypes enriched in long vs. short introns, and can reliably distinguish properties such as reading frame or species of origin. Our results show that LDA is a useful model for describing heterogeneous signals, for assigning individual sequences to subtypes, and for identifying and characterizing sequences that do not fit recognized subtypes. Because LDA topic models are interpretable, they also aid the discovery of new motifs, even those present in only a small fraction of samples, allowing a user to hypothesize potential regulatory factors. In summary, LDA can identify and characterize signals in nucleotide sequences and is useful for identifying candidate regulatory factors involved in complex biological processes.
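To make the idea of k-mers as LDA "words" concrete, here is a minimal pure-Python sketch of the whole-sequence variants, under stated assumptions rather than as the authors' implementation. The abstract does not say which inference method was used, so this sketch swaps in collapsed Gibbs sampling. The function names `kmers`, `positional_kmers` and `lda_gibbs`, the `"position:kmer"` token encoding, the hyperparameters `alpha` and `beta`, and the toy AT-only/GC-only sequences are all hypothetical. The first approach (aligned positions as samples) is not shown.

```python
import random


def kmers(seq, k):
    """Bulk k-mers: every overlapping k-mer in seq, ignoring position."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def positional_kmers(seq, k):
    """Positional k-mers (hypothetical encoding): each token records its start
    position, so the same k-mer at different offsets is a different word."""
    return [f"{i}:{seq[i:i + k]}" for i in range(len(seq) - k + 1)]


def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling (an assumed inference method).

    docs: list of token lists, one per sequence. Returns (theta, phi, vocab):
    theta[d][t] estimates P(topic t | sequence d),
    phi[t][v] estimates P(vocab[v] | topic t).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    w_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # tokens of each topic per sequence
    nkw = [[0] * V for _ in range(n_topics)]  # count of each word per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # current topic of every token

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            ndk[d][k] += 1
            nkw[k][w_id[w]] += 1
            nk[k] += 1
        z.append(z_d)

    topics = range(n_topics)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = w_id[w], z[d][i]
                # Remove this token's current assignment from the counts.
                ndk[d][k] -= 1
                nkw[k][v] -= 1
                nk[k] -= 1
                # Unnormalized full conditional P(z_i = t | all other tokens).
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in topics]
                k = rng.choices(topics, weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][v] += 1
                nk[k] += 1

    # Smoothed point estimates computed from the counts of the final sample.
    theta = [[(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
             for d, doc in enumerate(docs)]
    phi = [[(nkw[t][v] + beta) / (nk[t] + V * beta) for v in range(V)]
           for t in topics]
    return theta, phi, vocab


if __name__ == "__main__":
    gen = random.Random(1)
    # Hypothetical toy data: 10 AT-only and 10 GC-only sequences of length 60.
    seqs = ["".join(gen.choice("AT") for _ in range(60)) for _ in range(10)]
    seqs += ["".join(gen.choice("GC") for _ in range(60)) for _ in range(10)]
    docs = [kmers(s, 2) for s in seqs]  # bulk dinucleotides as words
    theta, phi, vocab = lda_gibbs(docs, n_topics=2)
    for s, th in zip(seqs, theta):
        print(s[:12], "dominant topic:", max(range(2), key=lambda t: th[t]))
    for t, row in enumerate(phi):
        top = sorted(zip(row, vocab), reverse=True)[:4]
        print("topic", t, "top k-mers:", [w for _, w in top])
```

Swapping `kmers` for `positional_kmers` gives the positional variant. With such cleanly separated toy data one would expect the two topics to split AT and GC dinucleotides, but a single Gibbs run can settle in a local optimum, so in practice one would compare several seeds or use an established LDA library.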