FROM THE WORLD EXPERIENCE OF THE DEVELOPMENT OF THE HISTORICAL SUBCORPUS

Authors

  • Seitbekova A.A. Institute of Linguistics named after A. Baitursynuly
  • A.K. Seidamat

DOI:

https://doi.org/10.48371/PHILS.2022.66.3.012

Keywords:

historical subcorpus, facsimile, transcription, Arabic script, written monuments

Abstract

In the era of information technology, preparing written texts in electronic format has become a necessity. Many countries are developing their own national corpora. Large-scale research of this kind is also being carried out in Kazakh corpus linguistics. To date, the National Corpus of the Kazakh language has accumulated an impressive database of texts, and it is continuously being improved as an innovative information source. A national corpus is a tool not only for synchronic but also for diachronic research. This article discusses the need to create a "historical subcorpus" within the National Corpus of the Kazakh language. A historical subcorpus is one of the most in-demand online linguistic tools: it lets any user find the materials they need to study the language, history, culture, and literature of the written heritage of the 5th to 20th centuries. The purpose of the work is to collect and digitize texts of the ancient and medieval written heritage and add them to the corpus with informational and linguistic metadata. For Kazakh, such a historical subcorpus is being created for the first time in applied linguistics. Digitizing extant manuscripts written in different scripts poses particular difficulties. The article examines how historical subcorpora are structured and applied in world practice: it analyzes the development of historical subcorpora of world languages, in particular those of Russian and German.
Drawing on the experience of other countries, the article identifies the main directions for developing the historical corpus of the Kazakh language: identifying and collecting texts of written monuments from different eras; sorting, classifying, and processing the collected materials by quality and composition; entering and displaying the texts; and defining the structure of informational metadata for each text. The formulation of linguistic tags is also considered, as it is one of the difficulties in developing a historical subcorpus. The research presented here can be applied to the development of a historical subcorpus for any language.

Published

2022-09-30

Issue

Section

Articles