Medical coding plays an essential role in medical billing, health resource planning, clinical research and quality assessment. Automated coding systems offer promising solutions to streamline the coding process, improve accuracy and reduce the burden on medical coders. To date, there has been limited research focusing on inpatient diagnosis coding using an extensive comprehensive dataset and encompassing the full ICD-10 code sets. In this study, we investigate the use of language models for coding inpatient diagnoses and examine their performance using an institutional dataset comprising 230,645 inpatient admissions and 8677 diagnosis codes spanning over a six-year period. A total of three language models, including two general-purpose models and a domain-specific model, were evaluated and compared. The results show competitive performance among the models, with the domain-specific model achieving the highest micro-averaged F1 score of 0.7821 and the highest mean average precision of 0.8097. Model performance varied by disease and condition, with diagnosis codes with larger sample sizes producing better results. The rarity of certain diseases and conditions posed challenges to accurate coding. The results also indicated the potential difficulties of the model with long clinical documents. Our models demonstrated the ability to capture relevant associations between diagnoses. This study advances the understanding of language models for inpatient diagnosis coding and provides insights into the extent to which the models can be used.