There is a pressing need to understand the impact of contaminants on Arctic ecosystems; however, most toxicity tests are based on temperate species, and there are issues with the reliability and relevance of bioassays in general. Together, these factors may result in an underestimation of harm to Arctic organisms and contribute to significant uncertainty in risk assessments. To help address these concerns, we performed a critical review to assess reported effects for Arctic species, quantify gaps in methodological and endpoint relevance, and identify future research needs for testing. We developed uniform criteria to score each study, allowing an objective comparison across experiments to quantify their reliability and relevance. We scored a total of 48 individual studies, capturing 39 tested compounds, 73 unique Arctic test species, and 95 distinct endpoints published from 1975 to 2021. Our analysis shows that of the 253 test substance and species combinations scored (i.e., unique toxicity tests), 207 (82%) failed to meet at least one critical study criterion that contributes to data reliability for use in risk assessment. Arctic-focused toxicity testing needs to ensure that exposures can be analytically confirmed, include environmentally realistic exposure scenarios, and report test methods more thoroughly. We identified significant data gaps in standardized toxicity testing with Arctic species, in the diversity of compounds tested with these organisms, and in the inclusion of ecologically relevant sublethal and chronic endpoints. Overall, the conduct and reporting of tests in the scientific literature need ongoing improvement to support effective risk assessments in an Arctic context.