International Journal of Advanced Computer Research (IJACR)

ISSN (Print):2249-7277    ISSN (Online):2277-7970
Volume-9 Issue-44 September-2019
Paper Title : A review of feature selection in sentiment analysis using information gain and domain specific ontology
Author Name : Ibrahim Said Ahmad, Azuraliza Abu Bakar and Mohd Ridzwan Yaakub
Abstract :

There is continued interest in understanding people's interests through the content they share online. However, the data generated is massive and is characterized by textual jargon and tokens that carry no sentiment or opinion value. One way to reduce the data dimension and prune irrelevant features is feature selection, yet existing feature selection approaches are still inefficient. Two prominent feature selection methods in sentiment analysis are information gain and ontology-based methods. Information gain has the disadvantage of not considering redundancy between features, while the ontology-based approach requires a lot of human intervention. This paper reviews these two methods. The review shows that combining them in a two-step approach can overcome their limitations and provide an optimal feature set for sentiment analysis.
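
To make the two-step idea concrete, the following is a minimal sketch rather than the procedure reviewed in the paper. It assumes binary term presence, documents represented as token sets, and a plain set of terms (`domain_terms`, hypothetical) standing in for a domain-specific ontology. The order of the two steps, the function names, and the toy data are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """IG(term) = H(labels) - H(labels | term present or absent)."""
    present = [y for d, y in zip(docs, labels) if term in d]
    absent = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(present) / n) * entropy(present)
                   + (len(absent) / n) * entropy(absent))
    return entropy(labels) - conditional


def two_step_select(docs, labels, domain_terms, k):
    """Step 1: keep the k terms with the highest information gain.
    Step 2: keep only those that also appear in domain_terms
    (a hypothetical stand-in for an ontology lookup)."""
    vocab = set().union(*docs)
    # Ties are broken alphabetically so the ranking is deterministic.
    ranked = sorted(vocab,
                    key=lambda t: (-information_gain(docs, labels, t), t))
    return [t for t in ranked[:k] if t in domain_terms]


# Toy data (invented for illustration only).
docs = [{"battery", "great"}, {"battery", "poor"},
        {"screen", "great"}, {"screen", "poor"}]
labels = ["pos", "neg", "pos", "neg"]

selected = two_step_select(docs, labels, domain_terms={"great", "battery"}, k=2)
```

In this toy data, "great" and "poor" each separate the classes perfectly and so score equally on information gain; the ranking step alone keeps both, which reflects the point that information gain scores each feature independently of the others. The second step then narrows the set using domain knowledge.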

Keywords : Sentiment analysis, Feature selection, Information gain, Ontology.
Cite this article : Ahmad IS, Bakar AA, Yaakub MR. A review of feature selection in sentiment analysis using information gain and domain specific ontology. International Journal of Advanced Computer Research. 2019; 9(44):283-292. DOI:10.19101/IJACR.PID90.