Markov chains are a natural and well-understood tool for describing one-dimensional patterns in time or space. We show how to infer k-th order Markov chains, for arbitrary k, from finite data by applying Bayesian methods to both parameter estimation and model-order selection. Extending existing results for multinomial models of discrete data, we connect inference to statistical mechanics through information-theoretic (type theory) techniques. We establish a direct relationship between Bayesian evidence and the partition function, which allows for straightforward calculation of the expectation and variance of the conditional relative entropy and the source entropy rate. Finally, we introduce a novel method that uses finite data-size scaling with model-order comparison to infer the structure of out-of-class processes.
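As a concrete illustration of the inference step described above (a minimal sketch, not the paper's own code), the following shows Bayesian parameter estimation and evidence-based model-order comparison for a k-th order Markov chain over a finite alphabet, assuming a symmetric Dirichlet prior on each history's next-symbol distribution; the function name, the prior hyperparameter, and the example data are illustrative assumptions.

```python
from collections import Counter
from itertools import product
from math import lgamma

def infer_markov_chain(data, k, alphabet, alpha=1.0):
    """Bayesian inference of a k-th order Markov chain from a symbol sequence.

    Assumes a symmetric Dirichlet(alpha) prior on the next-symbol distribution
    conditioned on each length-k history (an illustrative choice).  Returns
    posterior-mean transition probabilities and the log marginal likelihood
    (log evidence) used for model-order comparison.
    """
    # Count occurrences of (length-k history, next symbol) pairs.
    counts = Counter(
        (tuple(data[i:i + k]), data[i + k]) for i in range(len(data) - k)
    )
    A = len(alphabet)
    log_evidence = 0.0
    posterior_mean = {}
    for hist in product(alphabet, repeat=k):
        n_hist = sum(counts[(hist, x)] for x in alphabet)
        # Dirichlet-multinomial log-evidence contribution for this history.
        log_evidence += lgamma(A * alpha) - lgamma(n_hist + A * alpha)
        for x in alphabet:
            n_hx = counts[(hist, x)]
            log_evidence += lgamma(n_hx + alpha) - lgamma(alpha)
            # Posterior-mean estimate of P(next symbol = x | history).
            posterior_mean[(hist, x)] = (n_hx + alpha) / (n_hist + A * alpha)
    return posterior_mean, log_evidence

# Usage: compare candidate orders k = 0..3 on a binary sequence by log evidence.
if __name__ == "__main__":
    data = "0110101101101011" * 8
    for k in range(4):
        _, log_ev = infer_markov_chain(data, k, alphabet="01")
        print(f"k = {k}: log evidence = {log_ev:.2f}")
```

The log evidence computed per history follows from the standard Dirichlet-multinomial conjugacy; comparing it across candidate orders k gives a simple form of Bayesian model-order selection under the stated prior.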