Until now, the sentiment package for R only worked with English text. Today I released version 1.0 of the sentiment package, which features multi-language support. To perform sentiment analysis on German text, just add the parameter language="german" as shown in this example:
The new code allows you to add any language. So far, I have started preparing German sentiment files. French and Spanish are coming…
Today I’d like to follow up on this and show how to implement sentiment analysis in Tableau using Tableau’s R integration. One of the many uses of social media analytics is sentiment analysis, where we evaluate whether posts on a specific issue are positive, neutral, or negative (polarity), and which emotion is predominant.
What do customers like or dislike about your products? How do people perceive your brand compared to last year?
The sentiment package requires the tm and Rstem packages, so make sure that they are installed properly. Execute these commands in your R console to install sentiment from GitHub (see an alternative way to install it at the end of this blog post):
The sentiment package offers two functions, which can be easily called from calculated fields in Tableau:
The function get_polarity returns “positive”, “neutral”, or “negative”:
The function get_emotion returns “anger”, “disgust”, “fear”, “joy”, “sadness”, “surprise”, or “NA”:
The sentiment package follows a lexicon-based approach and comes with two files: emotions_english.csv.gz (source and structure) and subjectivity_english.csv.gz (source and structure). Both files contain word lists in English and are stored in the R package library in the /sentiment/data directory.
If text is incorrectly classified, you can fix this by extending these two files. If your aim is to analyze text in a language other than English, you need to create word lists for the target language. Kindly share them in the comments!
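To make the lexicon-based idea more concrete, here is a minimal Python sketch. It is not the sentiment package's actual algorithm or file format: the word list, the function name classify_polarity, and the simple hit-counting rule are all assumptions made for illustration. It only shows the general principle that a text is scored against word lists, and that extending the list changes the result.

```python
import re

# Toy word list; a hypothetical stand-in for a subjectivity lexicon file.
LEXICON = {
    "love": "positive",
    "great": "positive",
    "hate": "negative",
    "awful": "negative",
}


def classify_polarity(text, lexicon=LEXICON):
    """Count positive and negative lexicon hits and return a polarity label."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for token in tokens:
        label = lexicon.get(token)
        if label == "positive":
            score += 1
        elif label == "negative":
            score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(classify_polarity("I love this great product"))  # positive

# "terrible" is missing from the toy lexicon, so the text comes out neutral.
print(classify_polarity("The update is terrible"))  # neutral

# Extending the word list fixes the classification.
extended = {**LEXICON, "terrible": "negative"}
print(classify_polarity("The update is terrible", extended))  # negative
```

The same reasoning applies to other languages: with a word list for the target language, the scoring step itself stays unchanged.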
Feel free to download the Packaged Workbook (twbx) here.
[Update 11 Aug 2016]: If you are having trouble with install_github, try to install directly from this website:
When I was doing text mining, I was often tempted to reach for a scripting language like R, Python, or Ruby, and then feed the results into Tableau. Tableau served as a communications tool to represent the insights in a pleasant way.
Wouldn’t it be handy to perform text mining and further analysis at the speed of thought directly in Tableau?
Tableau has some relatively basic text processing functions that can be used in calculated fields. This is, however, not enough to perform text mining such as sentiment analysis, where the text has to be split up into tokens. Even Tableau’s beloved R integration will not help in this case.
As a workaround, I decided to use Postgres’ built-in string functions for such text mining tasks, which perform much faster than most scripting languages. For the following word count example, I applied the function regexp_split_to_table that takes a piece of text (such as a blog post), splits it by a pattern, and returns the tokens as rows:
I joined this code snippet as a Custom SQL Query to my Tableau data source, which is connected to the database that is powering my blog:
And here we go, an interactive word count visualization:
This example could be easily enhanced with data from Google Analytics, or altered to analyse user comments, survey results, or social media feeds. Do you have some more fancy ideas for real-time text mining with Tableau? Leave me a comment!
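For readers who want to try the tokenizing step outside the database, the following Python sketch mimics the idea behind regexp_split_to_table: split a text by a pattern, treat each token as one row, and count the rows. It is only an approximation and not the Custom SQL Query used above. The function names and the sample sentence are made up for illustration, and empty tokens are dropped here as a design choice of the sketch.

```python
import re
from collections import Counter


def split_to_rows(text, pattern=r"\s+"):
    """Split text by a regex pattern and return the non-empty tokens, one per 'row'."""
    return [token for token in re.split(pattern, text) if token]


def word_count(text):
    """Lowercase the text, split on anything that is not a letter, digit, or '#', and count tokens."""
    tokens = split_to_rows(text.lower(), r"[^a-z0-9#]+")
    return Counter(tokens)


counts = word_count("Tableau loves data. Data loves Tableau, and R loves data too!")
print(counts["data"])     # 3
print(counts["tableau"])  # 2
```

The '#' character is kept inside tokens on purpose, so that hashtags survive the split and can be recognized later.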
[Update 19 Jan 2016]: How to identify Twitter hashtags? Do I need another RegEx?
Another regular expression via a Custom SQL Query is not required for identifying words within tweets as hashtags. A simple calculated field in Tableau will do the job:
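The calculated field itself is not reproduced here. As a rough illustration of the underlying idea in Python (not Tableau syntax), a token can be flagged as a hashtag simply by checking its first character. The function name is_hashtag and the sample tweet are assumptions for this sketch.

```python
def is_hashtag(word):
    """Return True if a token starts with '#' and has at least one more character."""
    return len(word) > 1 and word.startswith("#")


tweet = "Loving the new #Tableau release at #data16"
hashtags = [word for word in tweet.split() if is_hashtag(word)]
print(hashtags)  # ['#Tableau', '#data16']
```

Because the tokens already exist as separate rows, a plain prefix check like this is enough and no second regular expression is needed.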