(Publisher of Peer Reviewed Open Access Journals)

International Journal of Advanced Technology and Engineering Exploration (IJATEE)

ISSN (Print): 2394-5443, ISSN (Online): 2394-7454
Volume-5 Issue-46 September-2018
DOI:10.19101/IJATEE.2018.546018
Paper Title : PSSM amino-acid composition based rules for gene identification
Author Name : Heena Farooq Bhat and M. Arif Wani
Abstract :

One of the major aspects of understanding the molecular mechanisms of the cell is determining the function of each protein encoded in the genome. Genome annotation is very useful for this purpose, and one of its most essential phases is gene prediction. Several methods have been developed to locate or predict gene patterns in genome sequences; however, gene recognition remains a very complicated problem. Identifying the gene that corresponds to a given protein sequence with conventional tools is error-prone, which makes gene recognition a very demanding task. In this paper, we first examine the problem of gene prediction and its challenges. We then present a new two-step method for identifying genes. First, new features are extracted from protein sequences; these features are derived from a position-specific scoring matrix (PSSM), and the PSSM profiles are converted into a uniform numeric representation. Second, a new structured approach that uses a decision-tree-based technique is applied to the PSSM vectors to obtain rules, and the rules derived by the algorithm correspond to genes. The method is demonstrated on the genome DNAset dataset, and the experimental results show that the new approach produces better results.
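The abstract does not give the exact PSSM normalization or the details of the authors' structured rule-induction algorithm, so the following is only a minimal Python sketch of the two steps under stated assumptions. Step one assumes the widely used "PSSM amino-acid composition" (PSSM-400) transform: raw PSSM scores are sigmoid-scaled, the rows at positions holding each residue type are summed, and the 20 x 20 result is divided by sequence length, so proteins of any length map to a 400-dimensional vector. Step two substitutes a one-level decision tree (a decision stump) for the paper's decision-tree-based rule induction, purely to show how a split turns into an IF-THEN rule. The function names `pssm_composition`, `best_stump_rule` and `describe`, and the "gene"/"non-gene" labels, are hypothetical.

```python
import math

# Assumed residue order (the column order of PSI-BLAST ASCII PSSMs).
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"


def sigmoid(x):
    """Squash a raw log-odds PSSM score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def pssm_composition(sequence, pssm):
    """Collapse an L x 20 PSSM into a fixed-length 400-dimensional vector.

    For every residue type a, the sigmoid-scaled PSSM rows at positions where
    the sequence has residue a are summed; the 20 x 20 result is divided by
    the sequence length and flattened row by row.
    """
    if len(sequence) != len(pssm):
        raise ValueError("sequence and PSSM must have the same length")
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    comp = [[0.0] * 20 for _ in range(20)]
    for residue, row in zip(sequence, pssm):
        i = index.get(residue)
        if i is None:  # skip non-standard residues such as X or B
            continue
        for j, score in enumerate(row):
            comp[i][j] += sigmoid(score)
    n = len(sequence)
    return [value / n for comp_row in comp for value in comp_row]


def majority(labels):
    """Most frequent label; ties are broken alphabetically for determinism."""
    return max(sorted(set(labels)), key=labels.count)


def best_stump_rule(vectors, labels):
    """One-level decision tree: the (feature, threshold) split with the
    fewest training misclassifications. Returns None if no feature varies."""
    best = None
    for f in range(len(vectors[0])):
        values = sorted(set(v[f] for v in vectors))
        for low, high in zip(values, values[1:]):
            t = (low + high) / 2  # midpoint between adjacent distinct values
            left = [y for v, y in zip(vectors, labels) if v[f] <= t]
            right = [y for v, y in zip(vectors, labels) if v[f] > t]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, left_label, right_label)
    return best


def describe(rule):
    """Render a stump as an IF-THEN rule over PSSM-composition features."""
    _, f, t, left_label, right_label = rule
    a, b = AMINO_ACIDS[f // 20], AMINO_ACIDS[f % 20]
    return f"IF comp[{a},{b}] <= {t:.3f} THEN {left_label} ELSE {right_label}"


if __name__ == "__main__":
    # Toy PSSM for "AR": every raw score is 0, so each sigmoid value is 0.5.
    features = pssm_composition("AR", [[0] * 20, [0] * 20])
    print(len(features), features[0])  # 400 0.25
    # Toy training set where only the comp[A,A] feature varies.
    toy = [[x] + [0.0] * 399 for x in (0.1, 0.2, 0.8, 0.9)]
    rule = best_stump_rule(toy, ["non-gene", "non-gene", "gene", "gene"])
    print(describe(rule))  # IF comp[A,A] <= 0.500 THEN non-gene ELSE gene
```

Dividing by sequence length is what makes the representation "uniform": a 50-residue and a 500-residue protein both yield 400 comparable values. A real rule learner would grow deeper trees and read one rule off each root-to-leaf path; the stump only illustrates that mapping for a single split.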

Keywords : Gene prediction, Classification, Feature extraction, Binding proteins, Rule induction, PSSM.
Cite this article : Heena Farooq Bhat and M. Arif Wani, "PSSM amino-acid composition based rules for gene identification", International Journal of Advanced Technology and Engineering Exploration (IJATEE), Volume-5, Issue-46, September-2018, pp. 318-325. DOI:10.19101/IJATEE.2018.546018