Exploring Performance of Clustering Methods on Document Sentiment Analysis
Clustering is a powerful unsupervised tool for sentiment analysis of text. However, clustering results may be affected by any step of the clustering process, such as the data pre-processing strategy, the term weighting method in the Vector Space Model, and the clustering algorithm. This paper presents the results of an experimental study of some common clustering techniques applied to the task of sentiment analysis. Unlike previous studies, we investigate the combined effects of these factors through a series of comprehensive experiments. The results indicate that, first, in terms of clustering accuracy, K-means-type clustering algorithms show clear advantages on balanced review datasets but perform rather poorly on unbalanced datasets. Second, the more recently designed weighting models outperform the traditional weighting models for sentiment clustering on both balanced and unbalanced datasets. Furthermore, extracting adjectives and adverbs offers clear improvements in clustering performance, whereas stemming and stopword removal have a negative influence on sentiment clustering. These results should be valuable for both the study and the use of clustering methods in online review sentiment analysis.
Year of publication: | 2017 |
---|---|
Authors: | Ma, Baojun |
Other Persons: | Yuan, Hua (contributor) ; Wu, Ye (contributor) |
Publisher: | [2017]: [S.l.] : SSRN |
freely available
Extent: | 1 online resource (21 p) |
---|---|
Type of publication: | Book / Working Paper |
Language: | English |
Notes: | In: Journal of Information Science, 2017, 43(1): 54-74. According to SSRN, the original version of the document was created on December 3, 2015 |
Source: | ECONIS - Online Catalogue of the ZBW |
Persistent link: https://www.econbiz.de/10012970149
Similar items by person
- Semantic Search for Public Opinions on Urban Affairs : A Probabilistic Topic Modeling-Based Approach
  Ma, Baojun, (2017)
- Towards controlling virus propagation in information systems with point-to-group information sharing
  Yuan, Hua, (2009)
- Optimizing the re-profiling strategy of metro wheels based on a data-driven wear model
  Wang, Ling, (2015)