We estimate the unknown parameters (variance, smoothness, and covariance length) of a covariance function from measurements by maximizing the joint Gaussian log-likelihood function. To overcome the cubic complexity of the required linear algebra, we approximate the discretized covariance function in the hierarchical (H-) matrix format. The H-matrix format has log-linear computational cost and storage O(kn log n), where the rank k is a small integer and n is the number of locations. The H-matrix technique lets us work with general covariance matrices efficiently: H-matrices can approximate inhomogeneous covariance functions on fairly general meshes that are not necessarily axes-parallel, and neither the covariance matrix itself nor its inverse has to be sparse. We investigate how the H-matrix approximation error influences the estimated parameters. We demonstrate our method on Monte Carlo simulations with known true parameter values and on an application to soil moisture data with unknown parameters. The C and C++ codes and the data are freely available.

Maximizing the log-likelihood becomes challenging as the sample size grows, due to the computation of the inverse and log-determinant of the n-by-n covariance matrix C(θ). Indeed, this requires O(n^2) memory and O(n^3) computational steps. Hence, scalable methods that can process larger sample sizes are needed. Stationary covariance functions, discretized on a rectangular grid, have block Toeplitz structure. This structure can be further extended to a block circulant form and resolved with the Fast Fourier Transform (FFT) [79,16,25,74,18]. The computing cost in this case is O(n log n).
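The circulant/FFT trick can be sketched in one dimension: a stationary covariance on a regular grid yields a symmetric Toeplitz matrix, which embeds into a circulant matrix whose matrix-vector product costs O(n log n) via the FFT. The sketch below (Python/NumPy; not part of the paper's codes, and the exponential covariance and grid are illustrative choices) verifies the fast product against the dense one.

```python
import numpy as np

def toeplitz_matvec_fft(first_col, x):
    """Multiply a symmetric Toeplitz covariance matrix by a vector
    in O(n log n) via circulant embedding and the FFT."""
    n = len(first_col)
    # Embed the Toeplitz matrix in a circulant matrix of size 2n - 2:
    # first column [c_0, ..., c_{n-1}, c_{n-2}, ..., c_1].
    c = np.concatenate([first_col, first_col[-2:0:-1]])
    eigs = np.fft.fft(c)                       # eigenvalues of the circulant
    x_pad = np.concatenate([x, np.zeros(n - 2)])
    y = np.fft.ifft(eigs * np.fft.fft(x_pad))
    return y[:n].real                          # top block = Toeplitz matvec

# Illustrative setup: exponential covariance exp(-|h|/ell) on a regular 1-D grid.
n, ell = 64, 0.3
h = np.linspace(0.0, 1.0, n)
col = np.exp(-h / ell)
x = np.random.default_rng(0).standard_normal(n)

# Dense O(n^2) reference for comparison.
C = np.exp(-np.abs(h[:, None] - h[None, :]) / ell)
assert np.allclose(toeplitz_matvec_fft(col, x), C @ x)
```

The same idea extends block-wise to rectangular grids in higher dimensions, which is what the block circulant references above exploit; it breaks down once the locations leave the regular grid, as the next paragraph notes.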
However, this approach either does not work for data measured at irregularly spaced locations or requires expensive, non-trivial modifications. During the past two decades, a large amount of research has been devoted to developing scalable methods that tackle this computational challenge: for example, low-rank tensor methods [52,56], covariance tapering [21,38,68], likelihood approximations in both the spatial [72,73] and spectral [19] domains, latent processes such as Gaussian predictive processes [6] and fixed-rank kriging [15], and Gaussian Markov random-field approximations [64,63,20]; see [77] for a review. A generalization of the Vecchia approach and a general Vecchia framework were introduced in [37,78]. Each of these methods has its strengths and drawbacks. For instance, covariance tapering sometimes performs even worse than assuming independent blocks in the covariance [70]; low-rank approximations have their own limitations [71]; and Markov models depend on the observation locations, requiring irregular locations to be realigned on a much finer grid with missing values [76]. A matrix-free approach for solving the multi-parametric Gaussian maximum likelihood problem was developed in [4]. To further address these issues, other recently developed methods include the nearest-neighbor Gaussian process models [17], low-rank updates [66], multiresolution Gaussian process models [57], equivalent ...
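For reference, the exact computation that all of these scalable methods try to avoid is the O(n^3) evaluation of the Gaussian log-likelihood, typically done through a Cholesky factorization that yields both the log-determinant and the quadratic form. A minimal sketch (Python/SciPy; the exponential covariance and the data below are illustrative assumptions, not the paper's setup):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def gaussian_loglik(C, z):
    """Exact zero-mean Gaussian log-likelihood via Cholesky factorization.

    The factorization costs O(n^3) flops and O(n^2) memory, which is the
    bottleneck for large n that motivates the scalable approximations.
    """
    n = len(z)
    L, lower = cho_factor(C, lower=True)
    # log det C = 2 * sum(log diag(L)); C^{-1} z via two triangular solves.
    logdet = 2.0 * np.sum(np.log(np.diag(L)))
    quad = z @ cho_solve((L, lower), z)
    return -0.5 * (n * np.log(2.0 * np.pi) + logdet + quad)

# Illustrative data: exponential covariance on random 1-D locations.
rng = np.random.default_rng(1)
pts = rng.random(50)
C = np.exp(-np.abs(pts[:, None] - pts[None, :]) / 0.2)
z = rng.standard_normal(50)
print(gaussian_loglik(C, z))
```

Maximizing this quantity over the covariance parameters requires many such evaluations, which is why replacing the dense factorization with a log-linear H-matrix approximation pays off.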