SPEECH RECOGNITION - HISTORICAL DEVELOPMENT AND MAIN TECHNIQUES

Authors

  • Hasan Hasanov
  • Penka V Georgieva

Keywords:

speech recognition, natural language processing, artificial intelligence

Abstract

Natural language processing is one of the main areas of modern artificial intelligence. Speech recognition is a component of natural language processing in which spoken words are converted into written text using various techniques. It is a field in which researchers face numerous challenges of varied nature. This study surveys the historical development of speech recognition, identifies the types of speech recognition, and presents the main techniques used in the field.

Published

2018-05-18

Section

Computer Science and Communications - peer-reviewed publications. ISSN: 1314-7846

How to Cite

SPEECH RECOGNITION - HISTORICAL DEVELOPMENT AND MAIN TECHNIQUES. (2018). COMPUTER SCIENCE AND COMMUNICATIONS, 6(1), 20-55. https://csc.bfu.bg/index.php/CSC/article/view/38