Diploma thesis topic: Tagging "participles" in the New Kyrgyz Corpus


The theoretical and practical importance



Date: 08.02.2022

The theoretical and practical importance: 
This diploma paper considers the main priorities in creating the Corpus of the Kyrgyz language, 
as well as the tasks of the first stage, which make it possible to identify the most important 
features of the corpus. The annotation was carried out entirely by hand. We favored manual 
annotation over semi-automatic annotation in order to find language-independent tools that 
support a fine-grained level of annotation. 
1.1.1 What is corpus linguistics?
The word “corpus” comes from the Latin corpus (“body”); the singular form is corpus and the 
plural is corpora. A corpus is a systematic collection of texts, usually used for linguistic 
analysis. The texts are stored electronically, so the information can be accessed with the help 
of computers. “Systematic” means that the content and structure of the corpus follow 
extra-linguistic principles, on the basis of which the included texts were chosen (Nesselhauf, 
2005). Corpora can be drawn from any situation in which language is used: books, magazines, 
fiction, non-fiction, phone conversations, business meetings; in other words, wherever linguistic 
communication occurs. Such corpora consist of thousands to billions of words of authentic, 
naturally occurring written and spoken usage. Corpora provide data on language use, register and 
frequency (Kennedy, 1998). Well-known examples of balanced corpora today are the International 
Corpus of English and the British National Corpus (BNC). 
Following from this definition of a corpus, corpus linguistics is the study of language through 
corpora. According to David Crystal, “Corpora are large and systematic enterprises: whole text or 
whole sections of text are included from different genres.” (Crystal, 2003) This analysis is 
mostly carried out on a computer with software programs. Corpus linguistics is a method that 
analyses data both qualitatively and quantitatively. The corpus linguistic approach can also be 
used to describe language features and to test hypotheses formulated in different linguistic 
frameworks. Many well-known scholars have contributed to the development of corpus linguistics, 
past and present, including Biber, McCarthy, Leech, Hunston and others. One of the most 
influential scholars of modern corpus linguistics is John Sinclair, who observed that a word does 
not carry meaning on its own; rather, it becomes meaningful in combination with other words, and 
meaning is mostly made through several words in sequence (Sinclair, 1991). Corpus linguistics 
addresses two fundamental research questions: 

What particular patterns are associated with grammatical and lexical features? 

How do these patterns differ within varieties and registers? 
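The second question can be made concrete with a small frequency comparison. The sketch below is only an illustration: the two miniature "registers", the tokenizer and the function names are hypothetical and are not part of the Kyrgyz corpus or of any tool used in this paper. It counts how often one word occurs in each register and normalizes the count per 1,000 tokens, so that sub-corpora of different sizes can be compared.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple tokenizer)."""
    return re.findall(r"\w+", text.lower())


def relative_frequency(tokens, word, per=1000):
    """Occurrences of `word` per `per` tokens; 0.0 for an empty token list."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word] * per / len(tokens)


# Hypothetical toy sub-corpora, one per register (invented for illustration only).
toy_corpus = {
    "fiction": "She said that she would come. He said nothing and left the room.",
    "news": "The minister said on Monday that the budget would be revised.",
}

for register, text in toy_corpus.items():
    tokens = tokenize(text)
    # Print the register name, its size in tokens, and the normalized frequency of "said".
    print(register, len(tokens), round(relative_frequency(tokens, "said"), 1))
```

Normalizing by corpus size matters because raw counts from a large sub-corpus and a small one are not directly comparable; with real data the texts would of course be far larger than these two sentences.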
Corpus linguistics cannot show whether a usage is correct or incorrect; it can only show what is 
present in the corpus. Most corpus compilers believe that a corpus is faulty when it does not 
contain every way of expressing a certain idea; instead, they should conclude that the missing 
expression is not common in the register represented by the corpus (Kennedy, 1998). 
One of the great advantages of corpus linguistics is that researchers do not have to depend on 
their own or other native speakers’ intuition, or on examples they have made up themselves; 
rather, they can test their hypotheses against a large amount of authentic, naturally occurring 
language data produced by different writers or speakers.
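One common way to inspect such naturally occurring data is a keyword-in-context (KWIC) concordance, which lists every occurrence of a word together with its neighbours. The following is a minimal sketch under stated assumptions: the sample text, the function name kwic and the window width are hypothetical, and the tokenizer is intentionally naive.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for every hit of `keyword`."""
    tokens = re.findall(r"\w+", text.lower())
    hits = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


# Hypothetical sample text (invented for illustration only).
sample = "He took a strong stand. She drank strong tea and made a strong case."

for left, node, right in kwic(sample, "strong", width=2):
    # Right-align the left context so the keyword lines up in one column.
    print(f"{left:>15} [{node}] {right}")
```

Reading the aligned lines shows which words tend to accompany the keyword, which is the kind of evidence a researcher can set against an intuition about usage.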
Linguistics uses six types of data for linguistic analysis: 
1. Data created by intuition;
2. The researcher’s own intuition;
3. Other people’s intuition; 
4. Naturally existing text; 
5. Randomly collected texts; 
6. Systematic collection of text (Fillmore, 1992). 
The following table (Vladimir V. Rykov) shows the main differences that distinguish corpus linguistics 
from 

