We present LaMDA: Language Models for Dialog Applications. LaMDA is a family of Transformer-based neural language models specialized for dialog, which have up to 137B parameters and are pre-trained on 1.56T words of public dialog data and web text. While model scaling alone can improve quality, it yields smaller improvements in safety and factual grounding. We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements on these two key challenges. The first challenge, safety, involves ensuring that the model's responses are consistent with a set of human values, such as preventing harmful suggestions and unfair bias. We quantify safety using a metric based on an illustrative set of human values, and we find that filtering candidate responses with a LaMDA classifier fine-tuned on a small amount of crowdworker-annotated data offers a promising approach to improving model safety. The second challenge, factual grounding, involves enabling the model to consult external knowledge sources, such as an information retrieval system, a language translator, and a calculator. We quantify factuality using a groundedness metric, and we find that our approach enables the model to generate responses grounded in known sources, rather than responses that merely sound plausible. Finally, we explore the use of LaMDA in the domains of education and content recommendation, and analyze its helpfulness and role consistency.
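To make the filtering mechanism concrete, here is a minimal sketch of the generate-then-filter pattern the abstract describes, assuming hypothetical `generate_candidates`, `safety_score`, and `quality_score` callables; none of these correspond to a published LaMDA API.

```python
# Minimal sketch of classifier-based response filtering: sample several
# candidate replies, score each with a fine-tuned safety classifier, discard
# candidates below a threshold, and return the best remaining one.
# The three callables are illustrative stand-ins, not a real LaMDA interface.

from typing import Callable, List, Optional

def respond(
    context: str,
    generate_candidates: Callable[[str, int], List[str]],
    safety_score: Callable[[str, str], float],
    quality_score: Callable[[str, str], float],
    n_candidates: int = 16,
    safety_threshold: float = 0.8,
) -> Optional[str]:
    """Return the highest-quality candidate that passes the safety filter."""
    candidates = generate_candidates(context, n_candidates)
    safe = [c for c in candidates if safety_score(context, c) >= safety_threshold]
    if not safe:
        return None  # a production system would fall back to a safe canned reply
    return max(safe, key=lambda c: quality_score(context, c))
```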
Expert reviews are frequently used as a questionnaire evaluation method but have received little empirical attention. Questions from two surveys are evaluated by six expert reviewers using a standardized evaluation form. Each of the questions has validation data available from records. Large inconsistencies in ratings across the six experts are found. Despite the lack of reliability, the average expert ratings successfully identify questions that had higher item nonresponse rates and higher levels of inaccurate reporting. This article provides empirical evidence that experts are able to discern questions that manifest data quality problems, even if individual experts vary in what they rate as problematic. Compared to a publicly available computerized question evaluation tool, ratings by the human experts positively predict questions with data quality problems, whereas the computerized tool varies in its success at identifying these questions. These results indicate that expert reviews have value in identifying question problems that result in lower survey data quality.
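A minimal sketch of the kind of check the abstract reports, assuming a ratings matrix (experts by questions) and per-question item nonresponse rates; the data here are randomly generated for illustration and do not reproduce the study's.

```python
# Illustrative sketch: do average expert ratings track item nonresponse?
# All numbers are made up; only the analysis pattern mirrors the abstract.
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
ratings = rng.integers(1, 6, size=(6, 20))      # 6 experts x 20 questions, 1-5 problem scale
nonresponse_rate = rng.uniform(0.0, 0.3, 20)    # per-question item nonresponse

mean_rating = ratings.mean(axis=0)              # averaging smooths rater disagreement
rho, p = spearmanr(mean_rating, nonresponse_rate)
print(f"Spearman rho = {rho:.2f} (p = {p:.3f})")
```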
A common hypothesis about practices to reduce survey nonresponse is that persons brought into the respondent pool through persuasive efforts may provide data filled with measurement error. Two questions flow from this hypothesis. First, does the mean square error of a statistic increase when sample persons who are less likely to be contacted or to cooperate are incorporated into the respondent pool? Second, do nonresponse bias estimates made on the respondents, using survey reports instead of records, provide accurate information about nonresponse bias? Using a unique data set, the Wisconsin Divorce Study, with divorce records as the frame and questions about the frame information included in the questionnaire, this article takes a first look at these two issues. We find that the relationship between nonresponse bias, measurement error bias, and response propensity is statistic-specific and specific to the type of nonresponse. Total bias tends to be lower for estimates calculated using all respondents than for those based only on respondents with the highest contact and cooperation propensities, and nonresponse bias analyses based on respondents yield conclusions similar to those based on records. Finally, we find that the error properties of statistics may differ from the error properties of the individual variables used to calculate them.
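For reference, the textbook decomposition behind the first question (standard definitions assumed here, not quoted from the article):

\[
\mathrm{MSE}(\hat{\theta}) = \mathrm{Var}(\hat{\theta}) + \mathrm{Bias}(\hat{\theta})^{2},
\qquad
\mathrm{Bias}(\bar{y}_{r}) = (1 - r)\,(\bar{y}_{r} - \bar{y}_{m}),
\]

where $r$ is the response rate, $\bar{y}_{r}$ the respondent mean, and $\bar{y}_{m}$ the nonrespondent mean. Bringing low-propensity cases into the respondent pool raises $r$ and can shrink the nonresponse bias term, while potentially adding measurement error to $\bar{y}_{r}$; the net effect on MSE is what the first question asks about.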
Non-response weighting is a commonly used method to adjust for bias due to unit nonresponse in surveys. Theory and simulations show that, to reduce bias effectively without increasing variance, a covariate that is used for non-response weighting adjustment needs to be highly associated with both the response indicator and the survey outcome variable. In practice, these requirements pose a challenge that is often overlooked, because such covariates are often not observed or may not exist. Surveys have recently begun to collect supplementary data, such as interviewer observations and other proxy measures of key survey outcome variables. To the extent that these auxiliary variables are highly correlated with the actual outcomes, they are promising candidates for non-response adjustment. In the present study, we examine traditional covariates and new auxiliary variables for the National Survey of Family Growth, the Medical Expenditure Panel Survey, the American National Election Survey, the European Social Surveys and the University of Michigan Transportation Research Institute survey. We provide empirical estimates of the association between proxy measures and response to the survey request as well as the actual survey outcome variables. We also compare unweighted and weighted estimates under various non-response models. Our results from multiple surveys with multiple recruitment protocols from multiple organizations on multiple topics show the difficulty of finding suitable covariates for non-response adjustment and the need to improve the quality of auxiliary data.
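As a concrete illustration of the adjustment being evaluated, here is a minimal sketch of inverse-propensity non-response weighting, assuming a 0/1 response indicator, a covariate matrix `X` observed for the full sample, and outcomes `y_resp` observed only for respondents; the variable names are illustrative and this is not the authors' code.

```python
# Minimal sketch of non-response propensity weighting: fit a response model
# on covariates known for the whole sample (frame variables, interviewer
# observations, proxy measures), then reweight respondents by the inverse of
# their estimated response propensity.
import numpy as np
from sklearn.linear_model import LogisticRegression

def weighted_mean(X: np.ndarray, responded: np.ndarray, y_resp: np.ndarray) -> float:
    """Respondent mean reweighted by inverse estimated response propensity."""
    model = LogisticRegression(max_iter=1000).fit(X, responded)
    propensity = model.predict_proba(X)[:, 1]      # P(respond | covariates)
    w = 1.0 / propensity[responded == 1]           # inverse-propensity weights
    return float(np.sum(w * y_resp) / np.sum(w))

# As the abstract notes, bias reduction requires covariates associated with
# BOTH response and the outcome; weighting on weak covariates mainly inflates
# variance without removing bias.
```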