SUMMARYIt is widely accepted that features such as pI, length, molecular mass and amino acid (AA) sequence have a significant influence on protein solubility. Here, we mainly focused on AA composition and explored those that most affected the soluble expression level of human serum albumin (HSA) domain antibody (dAb). The soluble expression and sequence of 65 dAb variants were analysed using clustering and linear modelling. Certain AAs significantly affected the soluble expression level of dAb, with the specific AA combinations being (S, R, N, D, Q), (G, R, C, N, S) and (R, S, G); these combinations respectively affected the dAb expression level in the broth supernatant, the level in the pellet lysate and total soluble dAb. Among the 20 AAs, R displayed a negative influence on the soluble expression level, whereas G and S showed positive effects. A linear model was built to predict the soluble expression level from the sequence; this model had a prediction accuracy of 80 %. In summary, increasing the content of polar AAs, especially G and S, and decreasing the content of R, was helpful to improve the soluble expression level of HSA dAb.