METHODS OF AUTOMATION OF TEXTS IN THE KAZAKH NATIONAL CORPUS

Authors

  • Amirbekova A.B. A. Baitursynuly Institute of Linguistics
  • Konyrova A.T. Kazakh Ablai Khan University of International Relations and World Languages
  • Kaiyrbekova U.S. Peoples' Friendship University named after Academician A. Kuatbekov

DOI:

https://doi.org/10.48371/PHILS.2023.70.3.001

Keywords:

National corpus, semantics, automation, language base, translation, lexical layer, training corpus, digitalization of the language

Abstract

In the era of language globalization, once the lexicographic base is fully accessible in digital form, it becomes possible to optimize language acquisition. The National Corpus of the Kazakh Language is a digitized version of the Kazakh lexicon. Because the corpus is a system of linguistic knowledge made up of several subcorpora, demand for it is growing day by day. This is because Kazakhstan is a multinational state, and representatives of other nationalities who engage with Kazakh culture want to find translation equivalents. The corpus will also become an effective linguistic base for language learners.

The purpose of the article is to semanticize the lexicographic base included in the training corpus, especially the words of the lexical layer of the Kazakh language, and to adapt it to automation in accordance with the digital system. The article describes how the training corpus differs from the other subcorpora and proposes a classification of semantics and methods for interpreting (automating) semantic groups. This constitutes the scientific significance of the article.

The methods of content analysis, generalization, and description were used in introducing the lexical base into the training corpus. The scientific conclusions presented in the article are of practical importance: they contribute to the development of the corpus and of electronic applications for language acquisition.

Published

2023-09-29

Issue

Section

Articles