The purpose of this paper is to describe a new version of the Spoken English Corpus which will be of interest to phoneticians and other speech scientists. The Spoken English Corpus is a well-known collection of spoken-language texts that was collected and transcribed in the 1980s in a joint project involving IBM UK and the University of Lancaster (Alderson and Knowles forthcoming, Knowles and Taylor 1988). One valuable aspect of it is that the recorded material on which it is based is fairly freely available and the recording quality is generally good. At the time when the recordings were made, storing all the recorded material in digital form suitable for computer processing was of limited practicality. Although storage on digital tape was certainly feasible, it did not provide rapid computer access. The arrival of optical disk technology, with the possibility of storing very large amounts of digital data on a compact disk at relatively low cost, has brought about a revolution in ideas on database construction and use. It seemed to us that the recordings of the Spoken English Corpus (hereafter SEC) should now be converted into a form which would enable the user to gain access to the acoustic signal without the laborious business of winding through large amounts of tape. Once this was done, we should be able not only to listen to the recordings in a very convenient way, but also to carry out many automatic analyses of the material by computer.
This paper makes an instrumental analysis of English vowel monophthongs produced by 47 female Malaysian speakers. The focus is on the distribution of Malaysian English vowels in the vowel space, and the extent to which there is phonetic contrast between traditionally paired vowels. The results indicate that, like neighbouring varieties of English, Malaysian English vowels occupy a smaller vowel space than those of British English. The lack of contrast in vowel quality between vowel pairs was more apparent for /i…
The notion of a lemma is so familiar in corpus linguistics that it scarcely needs a formal definition. When a wordlist or a text is lemmatised, the process is apparently transparent, so that any observer can understand how the lemma relates to the original set or string of words. We shall argue in this paper that, on the contrary, the concept of lemma is not well defined, and is in need of a clear formal definition. The lemma is a fundamental concept in the processing of texts in at least some languages, a point we shall illustrate with respect to Arabic and Malay. It so happens that English lemmas are not typical of the general category, so that linguists who base their understanding of the lemma on English obtain a distorted view. It is essential to reverse the direction of argument, and to start with a general understanding of the lemma, and to consider English lemmas in the wider context.
In this article we make use of the methodology of corpus linguistics to organize and search the several millions of words contained in the published speeches of Tun Dr. Mahathir as prime minister. Our corpus consists of over 2.5 million words of speeches, roughly half in English and half in Malay, with over 900 English speeches and over 800 Malay speeches. This study concentrates on the Malay speeches, and follows up previous work on Dr. Mahathir's English speeches. The approach is entirely data driven. Instead of making a subjective choice of words to investigate, we begin by identifying key words. We then use the ranked list of key words to investigate the immediate context in which those words occur. By concentrating on key words connected to Malaysian identity, we obtain a remarkable insight into how that identity is presented by Dr. Mahathir.
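The two-step procedure described in this abstract (rank key words, then inspect their immediate context) can be illustrated with a minimal sketch. The abstract does not say which keyness statistic or concordancing tool was used, so the log-likelihood measure, the function names `keyness` and `kwic`, the `min_freq` and `window` parameters, and the toy tokens below are all assumptions made for illustration, not the authors' method or data.

```python
import math
from collections import Counter


def keyness(target_tokens, reference_tokens, min_freq=2):
    """Rank words over-represented in the target corpus by log-likelihood.

    Assumes both token lists are non-empty. The statistic is a common
    choice in corpus linguistics, used here as an assumption.
    """
    target, reference = Counter(target_tokens), Counter(reference_tokens)
    n_target, n_reference = len(target_tokens), len(reference_tokens)
    scores = {}
    for word, a in target.items():
        if a < min_freq:
            continue
        b = reference.get(word, 0)
        # Expected counts if the word were equally frequent in both corpora.
        e_target = n_target * (a + b) / (n_target + n_reference)
        e_reference = n_reference * (a + b) / (n_target + n_reference)
        ll = 2 * (a * math.log(a / e_target)
                  + (b * math.log(b / e_reference) if b else 0.0))
        # Keep only words that are relatively more frequent in the target.
        if a / n_target > b / n_reference:
            scores[word] = ll
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for each occurrence."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Invented toy tokens, standing in for a target and a reference corpus.
target = ["a", "b", "a", "c", "a", "b"]
reference = ["b", "c", "c", "d", "b", "c"]
ranked = keyness(target, reference)
for word, score in ranked:
    print(word, round(score, 2), kwic(target, word, window=1))
```

In practice the target would be the tokenised speeches and the reference a general corpus of the same language; the ranked list then decides which words to examine in context, rather than the analyst choosing them in advance.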