Essay writing tests, integral to many educational settings, demand significant resources for manual scoring. Automated essay scoring (AES) can alleviate this burden by automating the scoring process and thereby reducing human effort. However, the multitude of AES models, each with its own features and scoring approach, complicates the selection of a single optimal model, especially when diverse content-related aspects must be evaluated across multiple rating items. We therefore propose a hierarchical rater model-based approach that integrates the predictions of multiple AES models while accounting for their distinct scoring behaviors. We investigated its performance on data from a university essay writing test. The proposed method achieved accuracy comparable to that of the best individual AES model. This result is promising because the proposed method additionally reduced differential item functioning between human and automated scoring and thus established a higher degree of measurement invariance than the individual AES models.