Background: Despite recent new treatment options for hepatocellular carcinoma (HCC), 5-year survival remains poor, ranging from 50 to 70%, which may be attributable to the lack of early diagnostic biomarkers. Developing new biomarkers for the early diagnosis of HCC is therefore urgently needed to decrease HCC-related deaths.

Methods: We conducted a comprehensive characterization of HCC gene expression data using a bioinformatics method. The results were validated by real-time polymerase chain reaction (RT-PCR) and against the TCGA database to assess the credibility of this integrated analysis.

Results: Integrated analysis of seven HCC gene expression datasets identified 1167 differentially expressed genes (DEGs). These genes mainly participated in the cell cycle, oocyte meiosis, and progesterone-mediated oocyte maturation. The experimental results and the TCGA database validation for 10 genes were in full accordance with the findings of the integrated analysis, indicating the high credibility of our integrated analysis of different gene expression datasets. ASPM, CCT3, and NEK2 were shown to be significantly associated with overall survival of HCC patients in the TCGA database.

Conclusion: This method of integrated analysis may be a useful tool to reduce the heterogeneity of individual microarray studies, to yield more accurate HCC transcriptome profiles based on a large sample size, and to identify potential biomarkers and therapeutic targets for HCC.

Electronic supplementary material: The online version of this article (doi:10.1186/s13000-016-0596-x) contains supplementary material, which is available to authorized users.