Genome-scale models of metabolism can illuminate the molecular basis of cell phenotypes. Because many enzymes are active only in specific cell types, several algorithms use omics data to construct cell line- and tissue-specific metabolic models from genome-scale models. However, these methods have not been rigorously benchmarked, and it is unclear how the choice of algorithm and parameters (e.g., gene expression thresholds, metabolic constraints) affects model content and predictive accuracy. To investigate this, we built hundreds of models of four cancer cell lines using six algorithms, four gene expression thresholds, and three sets of metabolic constraints. Model content varied substantially across parameter sets, but the choice of model extraction method had the largest impact on the accuracy of model-predicted gene essentiality. We further highlight how assumptions made during model development influence the accuracy of model predictions. These insights will guide further development of context-specific models, which should in turn resolve genotype-phenotype relationships more accurately.