SEMANTIC MARKUP IS ONE OF THE COMPONENTS OF THE NATIONAL LANGUAGE CORPUS

A.B. Amirbekova; G. Talgatkyzy; L. Urakova

doi:10.48371/PHILS.2022.64.1.001

Authors

A.B. Amirbekova, A. Baitursynuly Institute of Linguistics
G. Talgatkyzy
L. Urakova

DOI:

https://doi.org/10.48371/PHILS.2022.64.1.001

Keywords:

semantic tags, semantic classification, markup, vocabulary, subgroup, subcorpus

Abstract

The article describes the principles of semantic markup in the National Corpus of the Kazakh Language. Its purpose is to examine and develop a system of semantic tags ready for use in the language corpus. The approach is based on the semantic classification of vocabulary; it is universal and applicable to any language. The practical significance of marking up dictionaries and text corpora lies in improving search quality and expanding user capabilities. The scientific significance of the article is that the markup and semantic classification should be oriented toward a programming paradigm; we have chosen the functional paradigm. The main results are twofold. First, semantic markup of national corpora significantly improves search quality and expands the user's capabilities when requesting linguistic information. Second, the semantic information about each token for which an entry is made is presented as a set of semantic tags, which usually reflects the semantic classification of the language's vocabulary. Conclusions are drawn about further possibilities of using corpus data in modern studies of lexical and grammatical semantics.

The publication was prepared within the framework of scientific project No. BR on the topic "DEVELOPMENT OF THE NATIONAL CORPUS OF THE KAZAKH LANGUAGE AS INFORMATION-INNOVATION STATE LANGUAGE BASE: RESEARCH AND TRAINING INTERNET RESOURCE", supported by the Ministry of Education and Science of the Republic of Kazakhstan.
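
The idea that each token carries a set of semantic tags, and that a search filters on those tags, can be pictured with a small sketch. This is not the corpus's actual data model or tag inventory: the `Token` type, the `has_tags` and `search` functions, and every tag name below are assumptions made for illustration. The style is deliberately functional (immutable values, pure functions), in the spirit of the paradigm the abstract mentions.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Token:
    """One corpus token with a hypothetical set of semantic tags."""
    form: str
    lemma: str
    tags: FrozenSet[str]


def has_tags(required: Iterable[str]) -> Callable[[Token], bool]:
    """Build a predicate that holds when a token carries every required tag."""
    wanted = frozenset(required)
    return lambda token: wanted <= token.tags


def search(tokens: Iterable[Token], required: Iterable[str]) -> List[Token]:
    """Pure search: keep the tokens whose tag set contains all required tags."""
    return list(filter(has_tags(required), tokens))


# Illustrative data only; the tag names are invented for this sketch.
corpus = (
    Token("жылқы", "жылқы", frozenset({"noun", "concrete", "animal"})),    # horse
    Token("қуаныш", "қуаныш", frozenset({"noun", "abstract", "emotion"})),  # joy
    Token("үй", "үй", frozenset({"noun", "concrete", "building"})),        # house
)

concrete_nouns = search(corpus, ["noun", "concrete"])
```

Under these assumptions, asking for several tags at once narrows the result to tokens that carry all of them, which is one way semantic markup can make a corpus query more precise than a search on word forms alone.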
