Abstract. The Canadian Land Surface Scheme and Canadian Terrestrial Ecosystem Model (CLASS-CTEM) together form the land surface component of the Canadian Earth System Model (CanESM). Here we investigate the impact of changes to CLASS-CTEM that are designed to improve the simulation of permafrost physics. Eighteen tests were performed, including changing the model configuration (number and depth of ground layers, different soil permeable depth datasets, adding a surface moss layer) and investigating alternative parameterizations of soil hydrology, soil thermal conductivity and snow properties. To evaluate these changes, outputs from CLASS-CTEM were compared to 1570 active layer thickness (ALT) measurements from 97 observation sites that are part of the Global Terrestrial Network for Permafrost (GTN-P), 105,106 monthly ground temperature observations from 132 GTN-P borehole sites, a blend of five observation-based snow water equivalent (SWE) datasets (Blended-5), remotely sensed albedo, and seasonal discharge for major rivers draining permafrost regions. From the tests performed, the final revised model configuration has more ground layers (increased from 3 to 20) extending to greater depth (from 4.1 m to 61.4 m) and uses a new soil permeable depth dataset with a surface layer of moss added. The most beneficial change to the model parameterizations was the incorporation of unfrozen water in frozen soils. These changes to CLASS-CTEM cause a small improvement in simulated SWE with little change in surface albedo, but greatly improve the model performance at the GTN-P ALT and borehole sites.
Compared to the GTN-P observations, the revised CLASS-CTEM ALTs have a weighted mean absolute error (wMAE) of 0.41–0.47 m (depending on configuration), improved from > 2.5 m for the original model. The borehole sites see a consistent improvement in wMAE for most seasons and depths considered; seasonal wMAE values for the shallow surface layers of the revised model simulation are at most 1.2 °C greater than those calculated for the model's driving screen-level air temperature compared to observations at the sites. Sub-grid heterogeneity estimates were derived from the standard deviation of ALT on the 1 km² measurement grids at the GTN-P ALT sites, from the spread in wMAE in grid cells with multiple GTN-P ALT sites, and from 35 boreholes measured within a 1200 km² region as part of the Slave Province Surficial Materials and Permafrost Study. Given the size of the model grid cells (ca. 2.8°), sub-grid heterogeneity likely makes it difficult to reduce the wMAE of ALT or borehole temperatures appreciably further.
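
As an illustration of the wMAE statistic quoted above, the following is a minimal sketch only. The abstract does not state how the weights are chosen; here we assume each site's mean absolute error is weighted by its number of observations. The function names (`site_mae`, `weighted_mae`) and the input layout are hypothetical, and the numbers in the usage example are made up for illustration, not taken from the study.

```python
def site_mae(modelled, observed):
    """Mean absolute error between modelled and observed values at one site."""
    if len(modelled) != len(observed) or len(observed) == 0:
        raise ValueError("modelled and observed must be non-empty and equal in length")
    return sum(abs(m - o) for m, o in zip(modelled, observed)) / len(observed)


def weighted_mae(sites):
    """Weighted MAE over sites.

    `sites` is a list of (modelled, observed) pairs, one pair per site.
    ASSUMPTION: each site's weight is its number of observations; the
    weighting actually used in the study may differ.
    """
    weighted_sum = 0.0
    total_weight = 0
    for modelled, observed in sites:
        n_obs = len(observed)
        weighted_sum += n_obs * site_mae(modelled, observed)
        total_weight += n_obs
    if total_weight == 0:
        raise ValueError("no observations supplied")
    return weighted_sum / total_weight


# Illustrative (made-up) ALT values in metres for two sites.
example_sites = [
    ([1.0, 2.0], [0.5, 2.5]),  # site A: two observations, MAE = 0.5 m
    ([3.0], [1.0]),            # site B: one observation,  MAE = 2.0 m
]
example_wmae = weighted_mae(example_sites)  # (2 * 0.5 + 1 * 2.0) / 3
```

With this choice of weights, sites with longer observation records contribute proportionally more to the statistic, so a few sparsely sampled sites cannot dominate the score.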