International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print):2394-5443    ISSN (Online):2394-7454
Volume-7 Issue-62 January-2020
Paper Title : Text independent voiceprint recognition model based on I-vector
Author Name : Jing Zhang and Minfeng Yao
Abstract :

The most commonly used text-independent voiceprint recognition models are the Gaussian mixture model (GMM) and the GMM with a universal background model (GMM-UBM). In the GMM, the mean supervector contains both speaker information and channel information, which makes recognition systems built on the GMM and GMM-UBM unstable and weak in cross-channel conditions. Both models are also limited by the maximum likelihood criterion, so they discriminate poorly between classes. The i-vector, also known as the identity authentication vector, was proposed in recent years on the basis of the Gaussian supervector. The method replaces the two separate spaces, one for differences between speakers and one for differences between channels, with a single space that covers both, and it is regarded as the most advanced speaker modeling technology currently available. This paper therefore adopts the i-vector framework as the speaker recognition model and studies the main problems that must be addressed. The recognition performance of the GMM-UBM model and the i-vector model is also compared experimentally. The comparison verifies that the i-vector model achieves a lower error rate and is more efficient. In the recognition phase, only two seconds of recorded speech are needed to identify the speaker, and the system's recognition accuracy reaches 97%.
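
The abstract compares the two models by error rate, and the keywords list EER (equal error rate). The following is a minimal sketch of how an EER could be estimated from verification scores, assuming that a higher score means a more likely same-speaker match. The function name `equal_error_rate` and the toy scores are hypothetical illustrations and are not taken from the paper.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping a threshold over all observed scores.

    genuine  -- scores from same-speaker trials (assumed: higher = more similar)
    impostor -- scores from different-speaker trials
    Returns (eer, threshold) at the point where FAR and FRR are closest.
    """
    if not genuine or not impostor:
        raise ValueError("both score lists must be non-empty")

    best_gap, best_eer, best_thr = None, None, None
    for thr in sorted(set(genuine) | set(impostor)):
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= thr for s in impostor) / len(impostor)  # false acceptance rate
        frr = sum(s < thr for s in genuine) / len(genuine)     # false rejection rate
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap, best_eer, best_thr = gap, (far + frr) / 2, thr
    return best_eer, best_thr


# Toy scores, made up for illustration only (not results from the paper).
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.6, 0.3, 0.2, 0.1]
eer, thr = equal_error_rate(genuine_scores, impostor_scores)
print(f"EER = {eer:.2f} at threshold {thr}")
```

With real systems the score lists would come from scoring enrolled speakers against test utterances; a lower EER for one model on the same trial list indicates better separation of same-speaker and different-speaker trials.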

Keywords : Speaker recognition, Text-independent, I-vector, EER.
Cite this article : Zhang J, Yao M. Text independent voiceprint recognition model based on I-vector. International Journal of Advanced Technology and Engineering Exploration. 2020; 7(62):1-10. DOI:10.19101/IJATEE.2019.650076.