№1, 2025

AN EMPIRICAL ANALYSIS OF TRADITIONAL RECOGNITION METHODS USING EXAMPLES OF IDENTIFYING WORDS SPOKEN BY NATIVE SPEAKERS
Elchin Ismailov

Many users now interact with some form of artificial intelligence on a daily basis through search engines, social media, and voice recognition software. As the field matures, it is likely to permeate our lives in ever more surprising ways, so it will be important to create new governance structures that ensure its fair and transparent use. Alongside machine vision algorithms for processing photo and video data and natural language techniques for the semantic analysis of texts, the processing of audio information is among the procedures most in demand for business analytics. The article considers the problem of speech signal recognition using the example of an audio database built from words spoken by a native speaker in different tonalities with his characteristic pronunciation. In the proposed approach, the sound signal is treated as a one-dimensional representation of sound-wave oscillations sampled at a certain frequency. The task is addressed with the classical DTW (dynamic time warping) and DDTW (derivative DTW) methods, as well as with methods based on the Fourier transform and on the discrete and continuous wavelet transforms. A computational experiment on the recognition of speech signals spoken in the Azerbaijani language revealed the continuous wavelet transform to be the most accurate recognition method in the context of the problem (pp. 68–74).
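
As a rough illustration of the kind of pairwise signal comparison described above, the following Python sketch computes a classical DTW distance on two raw one-dimensional signals and a Euclidean distance on continuous-wavelet-transform features. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy signals, the Morlet-style mother wavelet, the scale grid, and the coarse feature grid are all illustrative choices that do not come from the article.

```python
import numpy as np

def dtw_distance(x, y):
    """Classical DTW with a unit step pattern and absolute-difference local cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

def cwt_features(signal, scales, n_points=64):
    """Toy continuous wavelet transform: convolve the signal with a real
    Morlet-like wavelet at several scales and keep the coefficient magnitudes
    on a coarse time grid as a feature matrix."""
    feats = []
    for s in scales:
        t = np.arange(-4 * s, 4 * s + 1)
        wavelet = np.cos(5 * t / s) * np.exp(-0.5 * (t / s) ** 2) / np.sqrt(s)
        coeffs = np.convolve(signal, wavelet, mode="same")
        idx = np.linspace(0, len(signal) - 1, n_points).astype(int)
        feats.append(np.abs(coeffs[idx]))
    return np.array(feats)

# Toy example: two "recordings" of the same word at slightly different tempo.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 800)
word_a = np.sin(2 * np.pi * 6 * t) * np.exp(-3 * t) + 0.05 * rng.standard_normal(t.size)
word_b = np.sin(2 * np.pi * 6 * t ** 1.1) * np.exp(-3 * t) + 0.05 * rng.standard_normal(t.size)

print("DTW distance on raw signals:", round(float(dtw_distance(word_a, word_b)), 3))
fa = cwt_features(word_a, scales=(4, 8, 16, 32))
fb = cwt_features(word_b, scales=(4, 8, 16, 32))
print("Euclidean distance on CWT features:", round(float(np.linalg.norm(fa - fb)), 3))
```

In a recognition setting of the kind the article describes, each recorded word would be compared against every reference recording in the audio database under the chosen distance, and the label of the closest reference would be returned.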

Keywords: Signal recognition, Recognition method, Audio database, Sound recording, Adequacy criteria, Distance metric, Pairwise comparison of signals
References

Afouras, T., Chung, J.S., Senior, A., Vinyals, O., Zisserman, A. (2018). Deep audio-visual speech recognition. IEEE Trans. Pattern Anal. Mach. Intell., 44(12): 8717–8727.

Elmir, B.S., Abdeslam, Y.D. (2019). A study on automatic speech recognition. Journal of Information Technology Review, 10, 77–85.

Geler, Z., Kurbalija, V., Ivanović, M., Radovanović, M., Dai, W. (2024). Dynamic time warping: Itakura vs Sakoe-Chiba. In Proceedings of the International Symposium on Innovations in Intelligent Systems and Applications, https://ieeexplore.ieee.org/document/8778300.

Haridas, A.V., Marimuthu, R., Sivakumar, V.G. (2018). A critical review and analysis on techniques of speech recognition: the road ahead. Int. J. Knowl. Base. Intell. Eng. Syst., 22, 39–57.

Hindarto, H., Anshory, I., Efiyanti, A. (2024). Feature extraction of heart signals using fast Fourier transform. https://jurnal.unej.ac.id/index.php/prosiding/article/view/4187

Itakura, F. (1975). Minimum prediction residual principle applied to speech recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 23(1), 67–72.

Jiang, Sh., Chen, Z. (2023). Application of dynamic time warping optimization algorithm in speech recognition of machine translation. Heliyon, 9(11), e21625. https://doi.org/10.1016/j.heliyon.2023.e21625.

Keogh, E.J., Pazzani, M.J. (2001). Derivative dynamic time warping. In Proceedings of the 2001 SIAM International Conference on Data Mining, https://doi.org/10.1137/1.9781611972719.1.

Kerimov, A.B. (2022). Accuracy comparison of signal recognition methods on the example of a family of successively horizontally displaced curves. Informatics and Control Problems, 42(2), 80–91.

Linh, L.H., Hai, N.T., Thuyen, N.V., Mai, T.T., Toi, V.V. (2014). MFCC-DTW algorithm for speech recognition in an intelligent wheelchair. In Proceedings of the 5th International Conference on Biomedical Engineering, pp. 417–421.

Novozhilov, B.M. (2016). Calculation of the derivative of an analog signal in a programmable logic controller. Aerospace Scientific Journal of Moscow State Technical University, 4, 1–12 (In Russian).

Rajeev, R., Abhishek, T. (2019). Analysis of feature extraction techniques for speech recognition system. Int. J. Innovative Technol. Explor. Eng., 8, 197–200.

Rzayev, R.R., Kerimov, A.B., Garibli, U.G., Salmanov, F.M. (2024). Criteria for assessing the adequacy of image recognition methods and their verification using examples of artificial series of signals. Problems of Information Society, 15(1), 10–17.

Sakoe, H., Chiba, S. (1978). Dynamic programming algorithm optimization for spoken word recognition. IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), 43–49.

Saraswat, S., Srivastava, G., Sachchidanand, N. (2024). Wavelet transform based feature extraction and classification of atrial fibrillation arrhythmia. http://biomedpharmajournal.org/?p=17470

Zhao, M., Chai, Q., Zhang, Sh. (2009). A method of image feature extraction using wavelet. In Proceedings of the International Conference on Intelligent Computing (ICIC): Emerging Intelligent Computing Technology and Applications, pp. 187–192.

Zhi-Qiang, U., Jia-Qi, Z., Xin, W., Zi-Wei, L., Yong, L. (2024). Improved algorithm of DTW in speech recognition. IOP Conference Series: Materials Science and Engineering, 563(5), 24–36.

Yu, D., Deng, L. (2016). Automatic Speech Recognition. Springer, London.

Khurana, D., Koli, A., Khatter, K. et al. (2023). Natural language processing: state of the art, current trends and challenges. Multimedia Tools and Applications, 82, 3713–3744.