Diploma thesis topic: Tagging "Participles" in the New Kyrgyz Corpus


Date: 08.02.2022

1.1.2 Brief History of Corpus Linguistics 
The Brown Corpus was the first computer-readable general corpus of texts prepared for 
linguistic research on modern English. It comprises about 500 texts drawn from American 
newspapers, books and magazines published in the US in 1961. Each text consists of about 
2,000 words, so the whole collection contains about 1 million words (500 texts of 2,000 
words each). The authors of the corpus are W. Francis and H. Kučera (Zacharov, Bagdanova, 
2011). 
It is by now customary to distinguish between the pre-electronic and post-electronic eras in 
the development of corpus linguistics. Svartvik (2007), for example, notes that for a corpus 
linguist the initials BC stand for Before Computers. The pre-electronic period refers to corpus 
studies that were predecessors of contemporary work and were mostly done before the 1960s. 
Some trace the early studies back to the thirteenth-century indexing work on the Bible; others 
trace them to a much more recent period, the early twentieth-century work of American 
Structuralism in collecting textual samples of language use (Leech, 1992). 
Advances in computer technology, such as increased storage capacity and more sophisticated 
software, had a major impact on the progress of corpus linguistics. In fact, it is these 
advances that have enabled corpus linguistics to achieve its present status. Equally, 
linguistics gave a strong impetus to the development of many practical applications in 
computing, because processing natural language, with its complex manifestations at different 
levels, demanded new types of software. Apart from the concordances derived from data 
stored on punch cards that appeared in the late 1950s, Francis and Kučera (1964) constructed the 
first electronic corpus of written English at Brown University in 1961. With its size of one 
million words, the Brown Corpus set the standard for corpus design. Renouf (2007, p. 28) 
describes the developments that followed the Brown Corpus as five phases or stages. The stages 
are determined by the period in which a given corpus was constructed and by the 
“types, styles and design” of the corpora of the time. 
1. 1960s onwards: the one-million-word (or less) Small Corpus (standard, general and 
specialized, sampled, multimodal, multidimensional); 
2. 1980s onwards: the multimillion-word Large Corpus (standard, general and specialized, 
sampled, multimodal, multidimensional); 
3. 1990s onwards: the ‘Modern Diachronic’ Corpus (dynamic, open-ended, chronological data 
flow); 
4. 1998 onwards: The Web as corpus (Web texts as sources of linguistic information); 
5. 2005 onwards: The Grid (pathway to distributed corpora, consolidation of existing corpus 
types).
 
1.1.1.1 Early corpus linguistics
The idea of collecting texts for language analysis is not a new concept. In the Middle Ages, 
people began to compile lists of all the words occurring in texts, together with their 
contexts. Other scholars compiled their own lists of the most frequent words in collections of 
texts. 
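The two kinds of list mentioned above, words with their contexts (a concordance) and the most frequent words, can be illustrated with a minimal Python sketch. The sample sentence, the function names `tokenize` and `concordance`, and the context window of two words are assumptions made for illustration only; they are not taken from any of the historical studies.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(tokens, keyword, window=2):
    """Return every occurrence of keyword with `window` words of context on each side."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Hypothetical sample text, used only to demonstrate the two lists.
sample = "The cat sat on the mat. The dog sat on the log."
tokens = tokenize(sample)

freq = Counter(tokens)             # word frequency list
kwic = concordance(tokens, "sat")  # keyword-in-context lines

for left, word, right in kwic:
    print(f"{left:>10} [{word}] {right}")
print(freq.most_common(3))
```

A real concordance would need language-specific tokenization; the regular expression here only handles plain English letters and apostrophes.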
McEnery and Wilson use the term “early corpus linguistics” for all corpus-based work 
done before the advent of Chomsky (Tony McEnery, Andrew Wilson, 2001). 
Early corpus-based methods served as the foundation for a range of linguistic studies. 
Researchers collected and analyzed naturally occurring data in order to describe language 
change and other linguistic phenomena. In collecting texts, linguists also obtained objective 
material in which answers to linguistic questions could be sought. McEnery and Wilson observe 
that if a language were finite, it would be easy to collect its texts and enumerate them 
(McEnery, Wilson, 1996). 
1.1.2.1. Corpus-based work up to the end of the 1950s 
The early empirical studies played a major role in the development of corpus linguistics. 
They laid the basis for ideas that were refined later. In 1897 the German scholar Käding 
used approximately 11 million German words to compare the frequencies of letters and 
sequences of letters and to derive spelling conventions from them. Today it seems hard to 
believe that he could work through such a large number of words without a computer. 
Between 1876 and 1926, research on language acquisition was based on diaries kept by parents 
who recorded their children’s language. Interestingly, those findings are still used today, 
more than half a century later, as a source of normative information on language (Kennedy, 1998). 
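To make the notion of letter and letter-sequence frequencies concrete, here is a minimal Python sketch. It is not a reconstruction of Käding's procedure: the function name `letter_ngrams`, the German sample phrase, and the choice to count sequences only within word boundaries are all assumptions made for illustration.

```python
from collections import Counter


def letter_ngrams(text, n):
    """Count sequences of n consecutive letters, staying inside word boundaries."""
    counts = Counter()
    for word in text.lower().split():
        # Keep alphabetic characters only, so punctuation does not form n-grams.
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


# Hypothetical sample phrase; a real study would use a large corpus.
sample = "das Haus und die Maus"
unigrams = letter_ngrams(sample, 1)  # single-letter frequencies
bigrams = letter_ngrams(sample, 2)   # two-letter sequence frequencies

print(unigrams.most_common(4))
print(bigrams.most_common(2))
```

Counting within word boundaries is a design choice: it avoids spurious sequences such as the last letter of one word followed by the first letter of the next.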
In research on foreign language pedagogy, two scholars, Fries and Traver, also used 
corpus-based data: their vocabulary lists were derived from corpora and built on the 
studies of Thorndike (Kennedy, 1998). 
Eaton, who worked in the field of comparative linguistics, collated the words most 
frequently used in French, Italian and Dutch. His work is still cited as an example today 
and is regarded as one of the best of its kind. Other scholars drew on his word list in their 
own work; one of them was Lorge, who, like Eaton, used a semantic frequency list. Another 
scholar, Fries, based his work on descriptive grammar on telephone conversations. All of these 
works are still considered sophisticated and helped to prepare the development of later corpora.
1.1.2.2. Corpus Linguistics and its Methodology 


A corpus is a body of text. It can also be described as a large collection of authentic texts 
that have been gathered and systematized in electronic form according to a specific set of 
criteria. A corpus thus has four important characteristics: “authentic”, “large”, 
“electronic” and “specific set”. These features distinguish corpora from other types of text 
collection (Lynne Bowker, Jennifer Pearson, 2002). Corpus linguistics has generated a number 
of research methods. According to Wallis and Nelson, there is a “3A perspective”, consisting 
of annotation, abstraction and analysis (Sean Wallis, Gerald Nelson, 2001). 


