The recognition of speech defects using convolutional neural network

Olha Pronina; Olena Piatykop

doi:10.55056/cte.554

Authors

Olha Pronina Pryazovskyi State Technical University https://orcid.org/0000-0001-7085-8027
Olena Piatykop Pryazovskyi State Technical University https://orcid.org/0000-0002-7731-3051

Keywords:

speech defects, smart data processing, CNN, model of a convolutional neural network, Deep Learning

Abstract

The paper proposes a way to improve the recognition of speech defects in children by processing spectrograms of the sound data with convolutional neural network (CNN) models. To live successfully in society, a person needs one essential skill: the ability to communicate with other people. People convey most information through speech, so the normal development of a child necessarily includes mastering coherent speech. Speech is not an innate skill, and children learn it on their own. Speech defects can lead a child to develop psychological complexes, so it is very important to eliminate them at an early age. Identifying speech defects in children is therefore a pressing problem for parents, speech therapists and psychologists, and modern information technologies can help solve it. The literature analysis in the paper shows that CNN models can be used successfully for this task, but the results available to date have not been applied to Ukrainian speech. It is therefore important to develop and study convolutional neural network models and methods for identifying speech disorders in children. The paper describes a mathematical model of oral speech disorders in children, the structure of a convolutional neural network, and the results of experiments. The results make it possible to identify one of four speech defects (dyslexia, stuttering, dysphonia or dyslalia) with a recognition rate of 77-79%.
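
The pipeline named in the abstract (sound data, then a spectrogram, then a convolutional classifier over four defect classes) can be illustrated with a minimal sketch. This is not the authors' model: the abstract gives no architecture, so the frame length, hop size, kernel count, pooling choice and all function names below are assumptions made for illustration, the weights are random rather than trained, and the input is a synthetic tone rather than a recording of a child's speech.

```python
import numpy as np

# The four defect classes named in the abstract (standard spellings).
CLASSES = ["dyslexia", "stuttering", "dysphonia", "dyslalia"]


def log_spectrogram(signal, frame_len=256, hop=128):
    """Log-magnitude spectrogram from a short-time Fourier transform.

    frame_len and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1))  # (n_frames, frame_len // 2 + 1)
    return np.log1p(magnitude).T                     # (freq_bins, n_frames)


def conv2d_valid(image, kernels):
    """Valid 2-D cross-correlation of one image with a bank of square kernels."""
    k = kernels.shape[1]
    patches = np.lib.stride_tricks.sliding_window_view(image, (k, k))
    return np.einsum("ijkl,nkl->nij", patches, kernels)


def classify(spec, kernels, dense_w, dense_b):
    """One convolutional layer, ReLU, global max pooling, dense layer, softmax."""
    features = np.maximum(conv2d_valid(spec, kernels), 0.0)
    pooled = features.max(axis=(1, 2))       # one value per kernel
    logits = pooled @ dense_w + dense_b
    exp = np.exp(logits - logits.max())      # shift for numerical stability
    return exp / exp.sum()


rng = np.random.default_rng(0)
sample_rate = 16000                          # assumed sampling rate
t = np.arange(sample_rate) / sample_rate
signal = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(sample_rate)

spec = log_spectrogram(signal)
kernels = rng.standard_normal((8, 3, 3)) * 0.1          # untrained weights
dense_w = rng.standard_normal((8, len(CLASSES))) * 0.1  # untrained weights
dense_b = np.zeros(len(CLASSES))

probs = classify(spec, kernels, dense_w, dense_b)
for name, p in zip(CLASSES, probs):
    print(f"{name}: {p:.3f}")
```

With untrained weights the printed probabilities carry no diagnostic meaning; the sketch only shows how a one-second signal becomes a two-dimensional spectrogram that a convolutional layer can process like an image. A practical model would stack several trained convolutional layers and be fitted on labelled recordings.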
