I've been researching sentiment analysis with word embeddings. I read papers that state that word embeddings ignore sentiment information of the words in the text. One paper states that among the top 10 words that are semantically similar, around 30 percent of words have opposite polarity e.g. happy - sad.
So, I computed word embeddings on my dataset (Amazon reviews) with the GloVe algorithm in R. Then, I looked at the most similar words with cosine similarity and I found that actually every word is sentimentally similar. (E.g. beautiful - lovely - gorgeous - pretty - nice - love). Therefore, I was wondering how this is possible since I expected the opposite from reading several papers. What could be the reason for my findings?
Two of the many papers I read:
- Yu, L. C., Wang, J., Lai, K. R. & Zhang, X. (2017). Refining Word Embeddings Using Intensity Scores for Sentiment Analysis. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 26(3), 671-681.
- Tang, D., Wei, F., Yang, N., Zhou, M., Liu, T. & Qin, B. (2014). Learning Sentiment-Specific Word Embedding for Twitter Sentiment Classification. Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 1: Long Papers, 1555-1565