Modern Methods of Speech Recognition

IRSTI 28.23.37
No. 1 (2021)
Oralbekova D.O., Mamyrbayev O.Zh.

The article presents the main ideas, advantages, and disadvantages of two families of models: those that combine hidden Markov models (HMM) with Gaussian mixture models (GMM), and end-to-end (integral) systems. It notes that the end-to-end model is a developing direction in speech recognition. The article gives an analytical review of the varieties of end-to-end automatic speech recognition systems, namely models based on connectionist temporal classification (CTC), on the attention mechanism, and on conditional random fields (CRF), and compares them theoretically. It concludes with their respective advantages and disadvantages and the possible future development of these systems.

Keywords: automatic speech recognition, hidden Markov models, end-to-end, neural networks, CTC.
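As a brief illustration of the CTC approach named in the abstract, the sketch below shows how a frame-level CTC output is commonly turned into a label sequence: repeated symbols are merged, then blank symbols are dropped. This is a minimal sketch and is not taken from the article. The blank symbol "_", the function names, and the toy alphabet and probabilities are assumptions chosen for the example.

```python
from itertools import groupby

BLANK = "_"  # assumed blank symbol; real systems usually reserve an index for it


def ctc_collapse(path, blank=BLANK):
    """Collapse a frame-level CTC path: merge adjacent repeats, then drop blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


def ctc_greedy_decode(frame_probs, alphabet, blank=BLANK):
    """Pick the most probable symbol in every frame, then collapse the path.

    frame_probs: one list of per-symbol probabilities for each frame.
    alphabet: symbols in the same order as the probabilities.
    """
    best_path = [
        alphabet[max(range(len(probs)), key=probs.__getitem__)]
        for probs in frame_probs
    ]
    return ctc_collapse(best_path, blank)


if __name__ == "__main__":
    # The blank between the two "l" runs keeps the double letter.
    print(ctc_collapse("hh_e_ll_ll_oo"))

    toy_alphabet = ["_", "a", "b"]  # hypothetical three-symbol alphabet
    toy_probs = [
        [0.1, 0.8, 0.1],
        [0.2, 0.7, 0.1],
        [0.9, 0.05, 0.05],
        [0.1, 0.8, 0.1],
        [0.1, 0.1, 0.8],
    ]
    print(ctc_greedy_decode(toy_probs, toy_alphabet))
```

Greedy decoding is used here only because it is the simplest choice; beam search, optionally combined with a language model, is a common alternative.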
Created by: Larisa Nikolaevna Grebtsova
Created: 22 August 2024
