Much research in natural language processing (NLP), computational linguistics and lexicography has relied on linguistic corpora. In recent years, many organizations around the world have been constructing their own large corpora to achieve corpus representativeness and/or linguistic comprehensiveness. However, there is no reliable guideline as to how large a machine-readable corpus should be in order to develop practical NLP software and/or complete dictionaries for human and computational use. To shed new light on this issue, we reveal the flaws of several previous studies that aimed to predict corpus size, especially those using pure regression or curve-fitting methods. To overcome these flaws, we devise a new mathematical tool, a piecewise curve-fitting algorithm, and then suggest how to determine the algorithm's error tolerance for good prediction, using a specific corpus. Finally, we show experimentally that the algorithm is valid, accurate and reliable. We are confident that this study can contribute to solving some inherent problems of corpus linguistics, such as corpus predictability, compilation methodology, corpus representativeness and linguistic comprehensiveness.
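The abstract does not spell out the piecewise curve-fitting algorithm, so the following is only a generic sketch of the idea, not the authors' method. It assumes the data are pairs of corpus size and vocabulary size, fits least-squares line segments greedily, and starts a new segment whenever the maximum relative error exceeds a chosen tolerance. The function names (`fit_line`, `piecewise_fit`, `extrapolate`), the greedy strategy, the relative-error criterion and the toy data are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# (start index, end index inclusive, slope, intercept); hypothetical layout
Segment = Tuple[int, int, float, float]


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares line through the given points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def max_rel_error(xs, ys, slope: float, intercept: float) -> float:
    """Largest relative deviation of the line from the observed values."""
    return max(
        abs(slope * x + intercept - y) / max(abs(y), 1e-12)
        for x, y in zip(xs, ys)
    )


def piecewise_fit(xs, ys, tolerance: float) -> List[Segment]:
    """Greedily grow each segment until the tolerance would be exceeded."""
    segments: List[Segment] = []
    start, n = 0, len(xs)
    while start < n - 1:
        end = start + 1
        slope, intercept = fit_line(xs[start:end + 1], ys[start:end + 1])
        while end + 1 < n:
            cand = fit_line(xs[start:end + 2], ys[start:end + 2])
            if max_rel_error(xs[start:end + 2], ys[start:end + 2], *cand) > tolerance:
                break
            end += 1
            slope, intercept = cand
        segments.append((start, end, slope, intercept))
        start = end  # adjacent segments share their boundary point
    return segments


def extrapolate(segments: List[Segment], x: float) -> float:
    """Predict beyond the data using the last fitted segment."""
    _, _, slope, intercept = segments[-1]
    return slope * x + intercept


# Toy data (invented): growth changes slope after the fifth observation.
sizes = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
vocab = [1, 2, 3, 4, 5, 8, 11, 14, 17, 20]
segments = piecewise_fit(sizes, vocab, tolerance=1e-6)
prediction = extrapolate(segments, 12)
```

In this sketch a smaller tolerance yields more and shorter segments, while a larger one approaches a single global regression, which is one reason the choice of tolerance matters for prediction.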
The concepts of explicit and implicit (knowledge) are at the core of SLA studies. We take explicit as conscious and declarative (knowledge), and implicit as unconscious, automatic and procedural (knowledge) (DeKeyser, 2003; R. Ellis, 2005a, 2005b, 2009; Hulstijn, 2005; Robinson, 1996; Schmidt, 1990, 1994). The importance of these concepts, we believe, must also be acknowledged in language teaching, and consequently in language teaching materials. However, explicitness and implicitness are rather complex constructs; this complexity allows for multiple nuances and perspectives in their analysis, and poses a real challenge for identifying them in the learning and teaching process and in materials. We focus here on ELT materials and aim to build a reliable construct that may help identify their potential for promoting implicit and explicit components. We first determined the features that identify the construct for implicitness and explicitness; next, we validated it; and then we applied it to the analysis of the activities included in three sample units of three textbooks. The results were computed along a continuum ranging from 0 to 10 for each activity. The systematization and computation of results will hopefully offer a reliable figure for the degree of explicitness and/or implicitness in the materials analysed.
This chapter begins to explore the potential of co-occurrence data for word sense disambiguation. Findings on the robustness of the differing distributions of co-occurrence data, on the assumption that distinct meanings of the same word attract different co-occurrence data, have led the author to experiment (i) on the possible grouping of word meanings by means of cluster analysis and (ii) on word sense disambiguation using discriminant function analysis. In addition, two priorities have been pursued: first, to find robust statistical techniques, and second, to minimize computational costs. Future research aims at the transition from coarse-grained senses to finer-grained ones by reiterating the same model at different levels of contextual differentiation.
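The underlying assumption, that distinct meanings of a word attract different co-occurrence data, can be illustrated with a small sketch. Note that it does not implement the cluster analysis or discriminant function analysis named in the abstract: it swaps in a much simpler nearest-centroid classifier with cosine similarity over windowed co-occurrence counts. The function names, the window size, the sense labels and the example sentences are all invented for illustration.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List


def cooccurrence_vector(tokens: List[str], target: str, window: int = 3) -> Counter:
    """Count words appearing within `window` tokens of each target occurrence."""
    counts: Counter = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sense_centroids(labelled: Dict[str, List[str]], target: str,
                    window: int = 3) -> Dict[str, Counter]:
    """Sum the co-occurrence vectors of all labelled sentences per sense."""
    centroids: Dict[str, Counter] = {}
    for sense, sentences in labelled.items():
        total: Counter = Counter()
        for sentence in sentences:
            total.update(cooccurrence_vector(sentence.lower().split(), target, window))
        centroids[sense] = total
    return centroids


def disambiguate(sentence: str, target: str, centroids: Dict[str, Counter],
                 window: int = 3) -> str:
    """Assign the sense whose centroid is most similar to the context."""
    vec = cooccurrence_vector(sentence.lower().split(), target, window)
    return max(centroids, key=lambda sense: cosine(vec, centroids[sense]))


# Toy labelled contexts (invented) for the ambiguous word "bank".
labelled = {
    "finance": ["she deposited money at the bank today",
                "the bank approved the loan"],
    "river": ["they fished from the river bank",
              "the bank of the river flooded"],
}
centroids = sense_centroids(labelled, "bank")
guess = disambiguate("we walked along the river bank", "bank", centroids)
```

A sketch like this keeps computational cost low, but function words such as "the" dominate raw counts; a real experiment would presumably weight or filter the co-occurrence data before applying the statistical techniques.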