dialect, dialect corpus, contextual editor, analyzer, internal corpus, metamarkup, Regional Dictionary, metadata
Abstract
The article reviews international experience in building dialectological corpora. It covers the integration of a dialect corpus, built on oral speech materials, into the Russian National Corpus; the problems of phonetic transcription and orthographic rendering of dialect speech; the development of prosodic notation; and methods for automatic morphological analysis of dialect usage.
The main purpose of the article is to describe the development of the dialectological corpus of the Kazakh language, drawing on international experience in creating dialect corpora, and to outline ways to improve it.
A dialectological corpus is necessary for language researchers because it simplifies and speeds up the research behind scientific articles, monographs, and dissertations.
The article applies the methods of review, description, narrative, analysis, and algorithmic programming to the development of the dialectological section of the National Corpus of the Kazakh language, and draws on applied work in this area from world linguistics.
In conclusion, both the standardized words of the literary language and the dialect words in the corpus should first be divided into roots and affixes. For the dialect corpus, a dictionary of dialect keywords is compiled in the same way and stored in the corpus database. The second stage of morphological analysis concerns the second part of the word form: the suffix sequence is divided into individual morphemes, each labeled according to its grammatical function. Dialect word forms in the dialect corpus must undergo the same morphological analysis.
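The two-stage analysis described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the corpus's actual analyzer: the root dictionary `ROOTS`, the suffix inventory `SUFFIX_TAGS`, the tag names, and the function names are all hypothetical, and the toy data covers only a few Kazakh plural, dative, and locative suffixes.

```python
# Hypothetical toy data standing in for the corpus root dictionary
# and the suffix inventory; a real corpus would load these from its database.
ROOTS = {"бала", "кітап"}
SUFFIX_TAGS = {
    "лар": "PL", "лер": "PL", "тар": "PL",
    "ға": "DAT", "ге": "DAT", "қа": "DAT",
    "да": "LOC", "та": "LOC",
}


def split_root(word_form, roots=ROOTS):
    """Stage 1: split off the longest dictionary root that prefixes the word form."""
    for end in range(len(word_form), 0, -1):
        if word_form[:end] in roots:
            return word_form[:end], word_form[end:]
    return None  # no known root


def label_suffixes(chain, tags=SUFFIX_TAGS):
    """Stage 2: segment the suffix chain into morphemes and tag each one.

    Returns a list of (morpheme, tag) pairs, or None if no segmentation exists.
    """
    if not chain:
        return []
    for end in range(len(chain), 0, -1):
        piece = chain[:end]
        if piece in tags:
            rest = label_suffixes(chain[end:], tags)
            if rest is not None:
                return [(piece, tags[piece])] + rest
    return None


def analyze(word_form):
    """Run both stages; returns (root, labeled_suffixes) or None."""
    split = split_root(word_form)
    if split is None:
        return None
    root, chain = split
    labels = label_suffixes(chain)
    if labels is None:
        return None
    return root, labels


print(analyze("балаларға"))
```

In this sketch, only the longest matching root is tried; a fuller analyzer would presumably also fall back to shorter roots and handle phonological alternations, which the toy tables ignore.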
In the future, oral language materials recorded in the regions are planned for inclusion in the dialect corpus. The article gives recommendations for organizing the work toward this goal, which adds to its practical value.