Diploma thesis topic: Tagging "Atoochtuk" (verbal adjectives) in the New Kyrgyz Corpus



Date: 08.02.2022

Our recommendations

1. Verbal adjectives have forms, meanings (aspect, voice, tense, case, number) and functions (subject, predicate, attribute, object), and therefore have the features needed to be included in the part-of-speech range as an independent part of speech.


2. Taking into account all morphological and semantic peculiarities of the verbal adjective, i.e. Atoochtuk, in the Kyrgyz language, we suggest the following tags for verbal adjectives, drawing on the tools of the Apertium Turkic lexicon:

for ‘past-tense verbal adjective’, e.g., келген конок – a guest who has arrived (past)

for ‘future-tense verbal adjective’, e.g., келер конок – a guest who should come (soon) (future)

for ‘present-tense verbal adjective’, e.g., келүүчү конок – an arriving guest (present)
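
The three categories above can be illustrated with a minimal Python sketch. The tag names (`VADJ_PAST`, `VADJ_FUT`, `VADJ_PRES`) and the function `tag_verbal_adjective` are hypothetical placeholders, not the tags proposed in this work and not actual Apertium symbols. The suffix list covers only the three example forms. Real Kyrgyz suffixes vary with vowel harmony, and plain suffix matching will mislabel unrelated words, so a working tagger would need a morphological lexicon.

```python
# Hypothetical tag labels, for illustration only (not actual Apertium tags).
# Longer suffixes come first so that they are tested before shorter ones.
SUFFIX_TAGS = [
    ("үүчү", "VADJ_PRES"),  # келүүчү конок: present-tense verbal adjective
    ("ген", "VADJ_PAST"),   # келген конок: past-tense verbal adjective
    ("ер", "VADJ_FUT"),     # келер конок: future-tense verbal adjective
]


def tag_verbal_adjective(word):
    """Return a hypothetical tag for a word form, or None if no suffix matches."""
    lowered = word.lower()
    for suffix, tag in SUFFIX_TAGS:
        if lowered.endswith(suffix):
            return tag
    return None


for form in ["келген", "келер", "келүүчү", "конок"]:
    print(form, tag_verbal_adjective(form))
```

The noun конок matches none of the listed suffixes, so the function returns `None` for it.
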
Acknowledgment 
I would like to express my deepest gratitude and appreciation to my supervisor, Aida Kasieva, whose guidance, support, encouragement, feedback and advice have been invaluable throughout this research.
It was a great honour for me to write about ‘Kyrgyz Corpus Linguistics’, as it allows us to trace the Kyrgyz language.
GLOSSARY 
Apertium: a free/open-source platform for developing rule-based machine translation systems
Annotation: tagging of language data in text or spoken form
Annotated / labeled corpus: a corpus of texts containing special labels that make it possible to retrieve data (statistics, language examples, etc.) from the corpus for any linguistic parameter (part of speech, grammatical form, syntactic function, etc.)
Balanced corpus: a representative corpus in which the various components are presented in a “layered” form, which makes it possible to establish the pattern of occurrence of the linguistic phenomenon under investigation against the background of extralinguistic information
Grammatical markup tool (“tagger”): a program that automatically performs grammatical (morphological) markup of corpus texts


Collocate: a word or word form that occurs as a close neighbor of a given word (word form)
Collocation: a regular, stable combination of words, taking into account the morphological and syntactic conditions that ensure the compatibility of linguistic units
Concordance: 1) a pointer that associates each usage with its context; 2) an automatically obtained set of contexts for a given phenomenon (word / phrase / grammatical form, etc.)
Corpus: a collection of texts, usually in a machine-readable format, including information about the situation in which each text was produced, such as information about the speaker, author, recipient, or audience
Corpus markup: a system of standard codes inserted into a document stored in electronic form to provide information about the text itself
Lemma: the initial (dictionary) form of a given word form
Lemmatization: the process of generating initial forms for word forms
Parser: a computer program that performs automatic markup of text at the syntactic or semantic level
Parsing: analysis of the syntactic structure of a sentence and its presentation in the form of a tree or constituent structure
Subcorpus: a group of texts within the corpus, united by a shared value of some parameter (language, genre, etc.)
Token: a specific word in the text; a word form, text form, or word usage
Tokenization: splitting the stream of characters in natural-language text into separate significant units (tokens)
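
Two of the glossary terms, tokenization and concordance, can be shown together in a short Python sketch. It is only an illustration under simple assumptions: the functions `tokenize` and `concordance` are hypothetical helpers, tokens are taken to be runs of word characters, and the sample text is assembled from the example phrases used earlier in this work rather than drawn from a real corpus.

```python
import re


def tokenize(text):
    """Split text into tokens, here simply runs of Unicode word characters."""
    return re.findall(r"\w+", text)


def concordance(tokens, keyword, window=2):
    """Return (left context, keyword, right context) for each occurrence."""
    hits = []
    for i, token in enumerate(tokens):
        if token.lower() == keyword.lower():
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            hits.append((" ".join(left), token, " ".join(right)))
    return hits


# Toy sample built from the example phrases above, not real corpus data.
text = "келген конок, келер конок, келүүчү конок"
tokens = tokenize(text)
print(tokens)
for left, word, right in concordance(tokens, "конок", window=1):
    print(f"{left} [{word}] {right}")
```

Each concordance line shows the keyword with one neighboring token on either side, which is also the span in which collocates of the keyword would be looked for.
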

