Article information

2021, Volume 26, No. 5, p. 95-105

Zhukova G.N., Ulyanov M.V.

The influence of the cardinality of the alphabet on the quality of reconstruction of a symbolic periodic sequence from a sequence with noise

The relevance of this study stems from the wide range of applied problems in real-world data processing and analysis in which it is sensible to encode information using symbols from a finite alphabet. By varying the cardinality of the alphabet, the symbolic representation can describe the process at a level of detail sufficient for real-world data analysis. However, in a number of subject areas where the trajectories of the examined processes can be coded symbolically, researchers face distortions, noise, and fragmentation of information. This occurs in bioinformatics, medicine, digital economy, time series forecasting, and analysis of business processes. Periodic processes are widely represented in these subject areas. Without noise, these processes correspond to periodic symbolic sequences, i.e. words over a finite alphabet. Instead of the expected periodic symbolic sequence, a researcher often receives as experimental data a sequence distorted by noise of various origins. Under these conditions, identifying the periodicity, which includes determining both the periodically repeating symbolic fragment and its length (hereinafter called the period), requires reducing the effect of noise on the experimental results.

The article deals with the problem of recovering periodic sequences distorted by noise in the form of replaced and deleted symbols. Since the level of detail in the description of the process depends on the cardinality of the alphabet, it is of interest to study how this level of detail in the symbolic description affects the possibility of recovering complete information about the initially periodic sequences.
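The kind of distortion described above can be illustrated with a minimal Python sketch. It is not the authors' noise model: the function names, the independent per-symbol probabilities `p_replace` and `p_delete`, and the uniform choice of the replacement symbol are all assumptions made for illustration.

```python
import random


def make_periodic(fragment: str, length: int) -> str:
    """Repeat `fragment` and cut the result to exactly `length` symbols."""
    repeats = length // len(fragment) + 1
    return (fragment * repeats)[:length]


def add_noise(word: str, alphabet: str, p_replace: float,
              p_delete: float, rng: random.Random) -> str:
    """Distort `word` symbol by symbol (hypothetical noise model).

    Each symbol is deleted with probability `p_delete`, replaced by a
    different symbol of `alphabet` with probability `p_replace`, and kept
    unchanged otherwise.
    """
    noisy = []
    for symbol in word:
        r = rng.random()
        if r < p_delete:
            continue  # noise of deletion: the symbol is dropped
        if r < p_delete + p_replace:
            others = [a for a in alphabet if a != symbol]
            # noise of replacement; a one-symbol alphabet leaves nothing to swap in
            noisy.append(rng.choice(others) if others else symbol)
        else:
            noisy.append(symbol)
    return "".join(noisy)


rng = random.Random(0)  # fixed seed so the experiment can be repeated
clean = make_periodic("abc", 30)
noisy = add_noise(clean, "abc", p_replace=0.1, p_delete=0.1, rng=rng)
```

Passing a longer `alphabet` string changes the cardinality while keeping the noise levels fixed, which is the kind of comparison the abstract describes.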

The article experimentally examines how the quality characteristics of the period recovery method proposed by the authors depend on the cardinality of the alphabet. For alphabets of different cardinalities, the proportion of sequences with a satisfactorily reconstructed period and the relative error in determining the length of the period are given. The quality of reconstruction of the periodically repeating fragment is estimated by the ratio of the edit distance from the reconstructed periodic sequence to the original sequence distorted by noise.
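The quality measures named above can be sketched in Python as follows. The edit distance is the standard Levenshtein distance. The abstract does not state the denominator of the ratio, so dividing by the length of the noisy sequence in `normalized_distance` is an assumption, as are all function names.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-symbol
    insertions, deletions and replacements that turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of `a`
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace or match
        prev = curr
    return prev[-1]


def relative_period_error(estimated: int, true: int) -> float:
    """Relative error in the length of the period."""
    return abs(estimated - true) / true


def normalized_distance(reconstructed: str, noisy: str) -> float:
    """Edit distance divided by the noisy sequence length
    (the choice of denominator is an assumption)."""
    return edit_distance(reconstructed, noisy) / len(noisy)
```

For example, `edit_distance("abcabcabc", "abcbcabc")` is 1, since the second word is the first with a single symbol deleted.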

Keywords: symbolic sequence, cardinality of an alphabet, periodic sequence, sequence with noise, noise of insertion, noise of deletion, noise of change

doi: 10.25743/ICT.2021.26.5.008

Author(s):
Zhukova Galina Nikolayevna
PhD, Associate Professor
Position: Associate Professor
Office: HSE University
Address: 101000, Russia, Moscow, 20, Myasnitskaya ulitsa
E-mail: gzhukova@hse.ru
SPIN-code: 5754-5615

Ulyanov Mikhail Vasilievich
Dr., Professor
Position: Professor
Office: V. A. Trapeznikov Institute of Control Sciences of Russian Academy of Sciences, Lomonosov Moscow State University
Address: 117997, Russia, Moscow, 65 Profsoyuznaya street
Phone Office: (495) 334-89-10
E-mail: muljanov@mail.ru

Bibliography link:
Zhukova G.N., Ulyanov M.V. The influence of the cardinality of the alphabet on the quality of reconstruction of a symbolic periodic sequence from a sequence with noise // Computational technologies. 2021. V. 26. No. 5. P. 95-105
ISSN 1560-7534