Summary. Currently, most surveys ask about occupation with open-ended questions. The verbal responses are coded afterwards, which is error-prone and expensive. We present an alternative approach that allows occupation coding during the interview. Our new technique uses a supervised learning algorithm to predict candidate job categories. These suggestions are presented to the respondent, who can then choose the most appropriate occupation. When the new instrument was tested in a telephone survey, 72.4% of the respondents selected an occupation, which entails potential cost savings. To aid further improvements, we identify factors that can help increase quality and reduce interview duration.
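The abstract does not name the supervised learning algorithm. As a minimal sketch only, the Python code below swaps in a multinomial naive Bayes classifier (a stand-in, not the authors' method) to illustrate the general idea: a model trained on previously coded answers ranks candidate categories, and the top few are offered to the respondent, who may pick one or none. The category labels, the toy training answers, and the names `NaiveBayesCoder` and `code_during_interview` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lower-case whitespace tokenization; a real system would clean text further.
    return text.lower().split()


class NaiveBayesCoder:
    """Hypothetical stand-in learner: multinomial naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()            # number of answers per category
        self.word_counts = defaultdict(Counter)  # token counts per category
        self.vocab = set()

    def fit(self, answers, codes):
        for text, code in zip(answers, codes):
            self.class_counts[code] += 1
            for word in tokenize(text):
                self.word_counts[code][word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text):
        # Unnormalized log posterior for every category seen in training.
        n_answers = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        scores = {}
        for code, count in self.class_counts.items():
            total = sum(self.word_counts[code].values())
            score = math.log(count / n_answers)
            for word in tokenize(text):
                p = (self.word_counts[code][word] + self.alpha) / (total + self.alpha * vocab_size)
                score += math.log(p)
            scores[code] = score
        return scores

    def suggest(self, text, k=3):
        # The k highest-scoring categories, best first.
        scores = self.log_scores(text)
        return sorted(scores, key=scores.get, reverse=True)[:k]


def code_during_interview(coder, answer, choose, k=3):
    # `choose` simulates the respondent: it receives the candidate list and
    # returns one of the candidates, or None if no suggestion fits.
    candidates = coder.suggest(answer, k)
    selected = choose(candidates)
    # None means the answer would still need coding after the interview.
    return selected if selected in candidates else None


# Toy training data with made-up category labels.
answers = ["primary school teacher", "secondary school teacher",
           "nurse in hospital", "hospital nurse", "truck driver", "bus driver"]
codes = ["TEACHER", "TEACHER", "NURSE", "NURSE", "DRIVER", "DRIVER"]
coder = NaiveBayesCoder().fit(answers, codes)

print(coder.suggest("school teacher", k=2))                               # ['TEACHER', 'DRIVER']
print(code_during_interview(coder, "school teacher", lambda c: c[0]))   # TEACHER
print(code_during_interview(coder, "school teacher", lambda c: None))   # None
```

Returning None when the respondent rejects every suggestion mirrors the fact that not all respondents select an occupation; those answers would fall back to conventional coding.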
Asking people about their occupation is common practice in surveys and censuses around the world. The answers are typically recorded in textual form and subsequently assigned (coded) to categories, which have been defined in official occupational classifications. While this coding step is often done manually, substituting it with more automated workflows has been a longstanding goal, promising reduced data-processing costs and accelerated publication of key statistics. Although numerous researchers have developed different algorithms for automated occupation coding, the algorithms have rarely been compared with each other or tested on different data sets. We fill this gap by comparing some of the most promising algorithms found in the literature and testing them on five data sets from Germany. The first two algorithms we test exemplify a common practice in which answers are coded automatically according to a predefined list of job titles. Statistical learning algorithms (that is, regularized multinomial regression, tree boosting, or algorithms developed specifically for occupation coding; algorithms three to six) can improve upon algorithms one and two, but only if a sufficient number of training observations from previous surveys is available. The best results are obtained by merging the list of job titles with coded answers from previous surveys before using this combined training data for statistical learning (algorithm 7). However, the differences between the algorithms are often small compared to the large variation found across different data sets, which we ascribe to systematic differences in the way the data were coded in the first place. Such differences complicate the application of statistical learning, which risks perpetuating questionable coding decisions from the training data into the future.
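To make the contrast between the algorithm families concrete, here is a minimal Python sketch under stated assumptions: dictionary coding in the spirit of algorithms one and two (an exact match of the normalized answer against a list of job titles), and the merging step of algorithm 7, where each listed title becomes one more labeled training example. The token-overlap nearest-neighbour learner is a hypothetical stand-in for the statistical learning methods named above, not one of the compared algorithms, and all titles, codes, and function names are illustrative.

```python
from collections import Counter


def normalize(text):
    # Lower-case and collapse whitespace before matching.
    return " ".join(text.lower().split())


def dictionary_code(answer, job_titles):
    # Dictionary coding: assign a code only on an exact match with a listed title.
    return job_titles.get(normalize(answer))


def merge_training_data(job_titles, coded_answers):
    # Treat each listed job title as one more labeled example, then append
    # the coded answers from previous surveys.
    merged = [(normalize(title), code) for title, code in job_titles.items()]
    merged += [(normalize(answer), code) for answer, code in coded_answers]
    return merged


def jaccard(a, b):
    # Token-set overlap between two normalized texts.
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def knn_code(answer, training, k=3):
    # Hypothetical stand-in learner: majority vote among the k most similar
    # training texts that share at least one token with the answer.
    query = normalize(answer)
    ranked = sorted(training, key=lambda pair: jaccard(query, pair[0]), reverse=True)
    votes = [code for text, code in ranked[:k] if jaccard(query, text) > 0]
    if not votes:
        return None  # left uncoded
    return Counter(votes).most_common(1)[0][0]


job_titles = {"teacher": "TEACHER", "nurse": "NURSE", "truck driver": "DRIVER"}
coded_answers = [("primary school teacher", "TEACHER"),
                 ("nurse on a hospital ward", "NURSE"),
                 ("long distance truck driver", "DRIVER")]

print(dictionary_code("Teacher", job_titles))                 # TEACHER
print(dictionary_code("primary school teacher", job_titles))  # None
training = merge_training_data(job_titles, coded_answers)
print(knn_code("secondary school teacher", training))         # TEACHER
```

The example shows why merging helps: the dictionary alone leaves "primary school teacher" uncoded, while the combined training data lets a learner generalize from both the title list and earlier survey answers.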
Short-time work (STW) in Germany allows for considerable flexibility in actual usage. Ex ante, firms notify the Employment Agency of the total number of eligible employees, and, up to the total granted, firms can flexibly choose how many employees actually use STW. In firm-level surveys, which provide timely information on STW in Germany, over-reporting of the number of employees on STW is prevalent. This study explores reasons for STW over-reporting based on a high-frequency, low-cost survey initiated during the Covid-19 pandemic (BeCovid) and a low-frequency, high-cost, long-running survey (BP). Linking the survey data to administrative records on actual STW use shows that firms using STW are more likely to participate in the BeCovid survey. Multi-establishment firms over-report STW because they tend to report STW for all of their subfirms. The BP uses more interview time and confirms the over-reporting of STW use in the survey month, while, crucially, the over-reporting drops sharply with a few months of retrospection.
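The linking step can be illustrated with a small Python sketch. It assumes, hypothetically, that survey reports and administrative records share a firm identifier and a reference month, and it counts a report as over-reported when the surveyed number of STW employees exceeds the administratively recorded actual use. The field names, the record layout, and the toy numbers are illustrative only and do not describe the BeCovid or BP data.

```python
from typing import NamedTuple


class SurveyReport(NamedTuple):
    firm_id: str
    month: str
    reported_stw: int  # employees on STW according to the survey answer


class LinkedRecord(NamedTuple):
    firm_id: str
    month: str
    reported_stw: int
    actual_stw: int  # employees actually on STW according to administrative records

    @property
    def excess(self):
        return self.reported_stw - self.actual_stw

    @property
    def over_reported(self):
        return self.reported_stw > self.actual_stw


def link(reports, admin_actual):
    # admin_actual: hypothetical dict mapping (firm_id, month) -> actual STW employees.
    linked = []
    for r in reports:
        actual = admin_actual.get((r.firm_id, r.month))
        if actual is None:
            # Assumption of this sketch: unlinked reports are skipped; a real
            # analysis must decide how to treat them.
            continue
        linked.append(LinkedRecord(r.firm_id, r.month, r.reported_stw, actual))
    return linked


def over_reporting_rate(linked):
    # Share of linked reports whose surveyed STW count exceeds actual use.
    if not linked:
        return 0.0
    return sum(rec.over_reported for rec in linked) / len(linked)


reports = [SurveyReport("A", "2020-04", 50),
           SurveyReport("B", "2020-04", 10),
           SurveyReport("C", "2020-04", 5)]
admin_actual = {("A", "2020-04"): 30, ("B", "2020-04"): 10}

linked = link(reports, admin_actual)
print([(rec.firm_id, rec.excess) for rec in linked])  # [('A', 20), ('B', 0)]
print(over_reporting_rate(linked))                    # 0.5
```

Keying on both firm and month matters because STW use can change from month to month; comparing reports against a different month's records would blur exactly the retrospection effect the abstract describes.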