Introduction
The introduction of immuno-oncology (IO) therapies has changed the treatment landscape of non-small cell lung cancer (NSCLC). Numerous cost-effectiveness analyses (CEAs) and technology appraisals (TAs) evaluating IO therapies have recently been published.

Objective
We reviewed economic models of first-line (1L) IO therapies for previously untreated advanced or metastatic NSCLC, drawn from published literature and TAs, to identify methodological challenges associated with modeling cost effectiveness and to make recommendations for future CEAs in this disease area.

Methods
A systematic literature review was conducted following Cochrane and PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines. We searched MEDLINE, Embase, EconLit (January 2009 to January 2020), and selected conferences (since 2016) for CEAs, published in English, of 1L IO treatments in patients with recurrent or metastatic, epidermal growth factor receptor (EGFR)/anaplastic lymphoma kinase (ALK) mutation-negative NSCLC. TAs from England, Scotland, Canada, Australia, Germany, and France were also examined. Two reviewers screened the results and extracted the data. The quality of the CEAs was described using the Drummond checklist.

Results
In total, 46 records reporting on 38 unique models met protocol-defined criteria and were included. Five models adjusted for treatment switching or crossover in base-case analyses; the remainder considered treatment switching or crossover to represent clinical practice and made no adjustment. Seven models used external real-world data for survival modeling or extrapolation validation. Six models assumed that the long-term treatment benefit stopped at 3 or 5 years after treatment initiation. Seven models used the observed time-on-treatment distribution from the trial, and eight used progression-free survival for treatment duration. All models compared one or more IO monotherapies or combination therapies with chemotherapy. Only one study directly compared different IO agents, and it did not consider the issue of concordance across programmed death-ligand 1 (PD-L1) testing methods. Utilities were modeled by health state in 12 models and by a time-to-death approach in four; ten models explored both. None applied cure models.

Conclusion
Approaches to methodological challenges varied across studies. Approaches taken in earlier models, such as a 2-year stopping rule for IO treatment duration or treatment-effect waning, were followed in subsequent models. Challenges such as heterogeneity in PD-L1 testing, and survival extrapolation and validation using real-world data, should be considered further in future models in advanced or metastatic NSCLC.