How to implement Sentiment Analysis in Tableau using R

Interactive sentiment analysis with Tableau using R

In my previous post I highlighted Tableau’s text mining capabilities, resulting in fancy visuals such as word clouds:

Today I’d like to follow up on this and show how to implement sentiment analysis in Tableau using Tableau’s R integration. One of the many uses of social media analytics is sentiment analysis, where we evaluate whether posts on a specific issue are positive, neutral, or negative (polarity), and which emotion is predominant.

What do customers like or dislike about your products? How do people perceive your brand compared to last year?

In order to answer such questions in Tableau, we need to install an R package that is capable of performing the sentiment analysis. In the following example we use an extended version of the sentiment package, which was initiated by Timothy P. Jurka.

The sentiment package requires the tm and Rstem packages, so make sure that they are installed properly. Execute these commands in your R console to install sentiment from GitHub (see the alternative installation method at the end of this blog post):


install.packages("devtools")
library(devtools)
install_github("aloth/sentiment/sentiment")

The sentiment package offers two functions, which can be easily called from calculated fields in Tableau:


The function get_polarity returns “positive”, “neutral”, or “negative”:


SCRIPT_STR('
library(sentiment)
get_polarity(.arg1, algorithm = "bayes")
'
, ATTR([Tweet Text]))

The function get_emotion returns “anger”, “disgust”, “fear”, “joy”, “sadness”, “surprise”, or “NA”:


SCRIPT_STR('
library(sentiment)
get_emotion(.arg1, algorithm = "bayes")
'
, ATTR([Tweet Text]))

The sentiment package follows a lexicon-based approach and comes with two files: emotions_english.csv.gz (source and structure) and subjectivity_english.csv.gz (source and structure). Both files contain word lists in English and are stored in the R package library under the /sentiment/data directory.

If text is incorrectly classified, you can fix this by extending these two files. If your aim is to analyze text in a language other than English, you need to create word lists for the target language. Kindly share them in the comments!
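To illustrate what a lexicon-based approach means, here is a minimal Python sketch. It is not the sentiment package's algorithm (the calls above use algorithm = "bayes"); it only counts hits against two tiny word lists. The word lists and the function name simple_polarity are made up for this illustration.

```python
import re

# Toy word lists, invented for this sketch. The real package ships
# much larger lexicons in its /sentiment/data directory.
POSITIVE_WORDS = {"love", "great", "happy", "excellent"}
NEGATIVE_WORDS = {"hate", "bad", "terrible", "slow"}


def simple_polarity(text: str) -> str:
    """Classify text by counting lexicon hits (a simplification, not the Bayes classifier)."""
    words = re.findall(r"[a-z']+", text.lower())
    positive = sum(word in POSITIVE_WORDS for word in words)
    negative = sum(word in NEGATIVE_WORDS for word in words)
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(simple_polarity("I love this great dashboard"))
    print(simple_polarity("This is terrible and slow"))
```

In this toy version, extending a word list (for example adding a new entry to POSITIVE_WORDS) changes how texts containing that word are classified, which is the same idea as extending the two lexicon files.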

Feel free to download the Packaged Workbook (twbx) here.

Update 11 Aug 2016: If you are having trouble with install_github, try to install directly from this website:


install.packages("Rcpp")
install.packages("http://alexloth.com/utils/sentiment/current/sentiment.zip",repos=NULL)

How to perform Text Mining at the Speed of Thought directly in Tableau

Interactive real-time text mining with Tableau 9.2

When I was doing text mining, I was often tempted to reach for a scripting language like R, Python, or Ruby and then feed the results into Tableau. Tableau served as a communication tool to present the insights in a pleasant way.

Wouldn’t it be handy to perform text mining and further analysis at the speed of thought directly in Tableau?

Tableau has some relatively basic text processing functions that can be used for calculated fields. This is, however, not enough to perform text mining such as sentiment analysis, where it is required to split up text into tokens. Also, Tableau’s beloved R integration will not help in this case.

As a workaround, I decided to use Postgres’ built-in string functions for such text mining tasks, which perform much faster than most scripting languages. For the following word count example, I applied the function regexp_split_to_table that takes a piece of text (such as a blog post), splits it by a pattern, and returns the tokens as rows:


select
guid
, regexp_split_to_table(lower(post_content), '\s+') as word
, count(1) as word_count
from
alexblog_posts
group by
guid, word

I joined this code snippet as a Custom SQL Query to my Tableau data source, which is connected to the database that is powering my blog:


And here we go, an interactive word count visualization:

 

This example could be easily enhanced with data from Google Analytics, or altered to analyse user comments, survey results, or social media feeds. Do you have some more fancy ideas for real-time text mining with Tableau? Leave me a comment!
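For readers who want to see the tokenization step of the word count query outside the database, here is a rough Python sketch of the same idea: lowercase the text, split it on whitespace, and count each token. The function name word_counts is hypothetical, and unlike the SQL query it works on a single piece of text rather than grouping by guid.

```python
import re
from collections import Counter


def word_counts(post_content: str) -> Counter:
    """Split lowercased text on whitespace and count each token."""
    # Same pattern as in the SQL query: one or more whitespace characters.
    tokens = re.split(r"\s+", post_content.lower())
    # Drop empty strings produced by leading or trailing whitespace.
    return Counter(token for token in tokens if token)


if __name__ == "__main__":
    counts = word_counts("Tableau loves data and data loves Tableau")
    for word, count in counts.most_common():
        print(word, count)
```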

How to identify Twitter hashtags? Do I need another RegEx?

Another regular expression via a Custom SQL Query is not required for identifying words within tweets as hashtags. A simple calculated field in Tableau will do the job:


CASE LEFT([Word], 1)
WHEN "#" THEN "Hash Tag"
WHEN "@" THEN "User Reference"
ELSE "Regular Content"
END
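The same first-character check can be sketched in Python, which may help if you want to test the logic on a few sample tokens before building the calculated field. The function name classify_word is hypothetical; the labels are the ones used in the calculation above.

```python
def classify_word(word: str) -> str:
    """Mirror the CASE LEFT([Word], 1) logic of the calculated field."""
    first_character = word[:1]  # empty string for an empty word
    if first_character == "#":
        return "Hash Tag"
    if first_character == "@":
        return "User Reference"
    return "Regular Content"


if __name__ == "__main__":
    for token in ["#tableau", "@alexloth", "dashboard"]:
        print(token, "->", classify_word(token))
```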

Looking for an example? Feel free to check out the Tweets featuring #tableau Dashboard on Tableau Public and download the Packaged Workbook (twbx):

Tweets featuring #tableau Dashboard

Any more feedback, ideas, or questions?

How to use a custom Mapbox map as your background map in Tableau

Mapbox map in Tableau

Tableau now comes with more geographical data built in, including updated US congressional districts (CD), local name synonyms for world capitals, Japanese postal codes, and Mapbox integration. I have to admit I really love Mapbox!

What is Mapbox? Mapbox is an online repository of custom-built maps that enables you to create the perfect map to integrate into your Tableau visualization. Mapbox maps are highly customizable: you can design your own map, build applications, extend applications, use satellite imagery, and create static maps. You can even have Pirate Maps!

Mapbox tutorial:

  1. First, register at mapbox.com.
  2. Once you are logged in, go to Account > API access tokens and copy your token. You’ll need this for Tableau.
  3. Open Tableau and connect to a data source that has geographical locations. For this case, we will use the sample sales data set that is preloaded in Tableau.
  4. Go to Map > Background Maps > Map Services to open a popup box.
  5. Add > Mapbox Services > Classic
  6. Fill in a style name for this map and paste in the access token you previously copied.
  7. Open the drop-down list, which provides classic maps ready for your use. For this case, we are going to use Emerald.
  8. Take your city dimension from the data set and double-click or drag and drop it to populate a map. See below the before and after, without Mapbox and with.

If you create multiple Mapbox maps and want to use different styles on different worksheets, you can switch between them:

  1. Map > Background Maps > Emerald. This menu lists the maps that you have created.

So here you have a basic understanding of using Mapbox in Tableau.

Happy mapping, literally go explore! And join me on Twitter:

How to speed up Tableau by using Performance Recordings

Tableau Performance Recording Timeline

Getting your dashboards up to speed can be quite difficult if you don’t know where the latency is situated. The first and most important rule about making workbooks more efficient is to understand that if it loads slowly in Desktop on your computer, then it will be slow on the server too once it is published. Tableau Desktop and Tableau Server each have their own way to enable, record, and analyze performance.

Performance recordings are a must-have for tuning your workbooks. All you have to do is start the Tableau Performance Recording, perform your workbook actions, and stop the Performance Recording. A few seconds later, Tableau opens a new workbook with the Performance Summary dashboard in it.

Create a performance recording in Tableau Desktop

  1. To start recording performance, follow this step: Help > Settings and Performance > Start Performance Recording
  2. Make some dashboard operations and/or refresh your data source(s).
  3. To stop recording, and then view a temporary workbook containing results from the recording session, follow this step: Help > Settings and Performance > Stop Performance Recording
  4. You can now view the Performance Summary dashboard and begin your analysis.

Create a performance recording on Tableau Server

  1. Administrators must enable the feature. This is located under settings, for each site.
  2. Check the box and save for Workbook Performance Metrics.
  3. Navigate to a view on the server.
  4. Remove the :iid=xx parameter from the URL.
  5. Enter :record_performance=yes in its place. Your full URL should now look something like this: https://data.alexloth.com/#/site/AA/views/Superstore/Summary?:record_performance=yes
  6. After the page reloads, you’ll notice that the ID is automatically added back to the URL and that a performance button appears within the view’s toolbar. Don’t click on the performance button yet.
  7. Do some filtering and some clicking within the workbook such as applying filters, selecting marks/rows, and clicks that cause actions to other elements of the visualization.
  8. Now you’re ready to click on the performance button, which will launch a new window with the Performance Summary dashboard.
  9. Don’t forget to disable the performance recording in the admin settings when you are finished.
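The URL edit in steps 4 and 5 is a small string rewrite, sketched here in Python in case you need to prepare several view URLs. The function name add_performance_flag is hypothetical, and the sketch assumes the parameters follow a single "?" as in the example URL above.

```python
def add_performance_flag(view_url: str) -> str:
    """Drop any :iid parameter and append :record_performance=yes."""
    # The view URL contains a "#" fragment, so plain string handling is
    # used here instead of a URL parser.
    base, _, query = view_url.partition("?")
    parameters = [
        parameter
        for parameter in query.split("&")
        if parameter and not parameter.startswith(":iid=")
    ]
    if ":record_performance=yes" not in parameters:
        parameters.append(":record_performance=yes")
    return base + "?" + "&".join(parameters)


if __name__ == "__main__":
    url = "https://data.alexloth.com/#/site/AA/views/Superstore/Summary?:iid=1"
    print(add_performance_flag(url))
```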

Understand the Performance Summary dashboard

The Performance Summary dashboard contains three views:

  • Timeline: a Gantt chart displaying event start time and duration.
  • Events sorted by time: a bar chart showing event duration by type.
  • Query text: appears when you click an Executing Query event in the bar chart.

Timeline Gantt chart

The uppermost view in a performance recording dashboard shows the events that occurred during the recording, arranged chronologically from left to right. The bottom axis shows elapsed time since Tableau started, in seconds.

In the Timeline view, the Workbook, Dashboard, and Worksheet columns identify the context for the events. The Event column identifies the nature of the event, and the final column shows each event’s duration and how it compares chronologically to other recorded events.

The events sorted by time

This section of the workbook shows the duration of recorded events in descending order, so you can see the execution time of each event that occurred during the performance recording. The events with the longest durations show you where to look first if you want to speed up your workbook.
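If you copy the event data out of the Performance Summary workbook, the same ordering is a one-line sort. The following Python sketch uses made-up event names and durations purely for illustration; slowest_first is a hypothetical helper.

```python
# Hypothetical (event type, duration in seconds) pairs, invented for this sketch.
recorded_events = [
    ("Computing layouts", 0.31),
    ("Executing query", 2.40),
    ("Connecting to a data source", 0.85),
    ("Geocoding", 0.12),
]


def slowest_first(events):
    """Return events ordered by duration, longest first."""
    return sorted(events, key=lambda event: event[1], reverse=True)


if __name__ == "__main__":
    for event_type, duration in slowest_first(recorded_events):
        print(f"{duration:6.2f}s  {event_type}")
```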

Different colors indicate different types of events. The range of events that can be recorded is:

  • Computing layouts: If layouts are taking too long, consider simplifying your workbook.
  • Connecting to a data source: Slow connections could be due to network issues or issues with the database server.
  • Executing query: If queries are taking too long, consult your database server’s documentation.
  • Generating extract: To speed up extract generation, consider only importing some data from the original data source. For example, you can filter on specific data fields, or create a sample based on a specified number of rows or percentage of the data.
  • Geocoding: To speed up geocoding performance, try using less data or filtering out data.
  • Blending data: To speed up data blending, try using less data or filtering out data.
  • Server rendering: You can speed up server rendering by running additional VizQL Server processes on additional machines.

Query text

Optionally, the workbook also displays the query text for any specific event that you want to examine in detail. You can access the detail by clicking on any of the green Executing Query events in the bar chart. This is a handy feature that allows you to review any query text of interest without having to leave the Tableau Performance Summary dashboard.

If you click on an Executing Query event in either the Timeline or Events section of a performance recording dashboard, the text for that query is displayed in the Query section.

Data Science Toolbox: How to use R with Tableau

Recently Tableau released an exciting new feature: R integration via Rserve. Tableau with R seems to bring my data science toolbox to the next level! In this tutorial, I’m going to walk you through installing Rserve and connecting Tableau to it. I will also give you an example of calling an R function with a parameter from Tableau and visualizing the results in Tableau.

1. Install and start R and Rserve

You can download base R from r-project.org. Next, start R from the terminal to install and run the Rserve package:

> install.packages("Rserve")
> library(Rserve)
> Rserve()

To ensure Rserve is running, you can try to connect to it with Telnet. By default, Rserve listens on port 6311.
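If Telnet is not available on your machine, a few lines of Python can perform the same reachability check. This is only a sketch: it assumes Rserve runs on localhost with its default port 6311, and is_port_open is a hypothetical helper that confirms something accepts TCP connections on that port, not that it is actually Rserve.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Assumption: Rserve was started locally with its default settings.
    print("Rserve reachable:", is_port_open("localhost", 6311))
```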

Protip: If you prefer an IDE for R, I highly recommend installing RStudio.

2. Connecting Tableau to Rserve

Now let’s open Tableau and set up the connection via Help > Settings and Performance > Manage External Service Connection:

Tableau 10 Help menu

Tableau 10 External Service Connection

3. Adding R code to a Calculated Field

You can invoke R scripts in Tableau’s Calculated Fields, such as k-means clustering controlled by an interactive parameter slider:


SCRIPT_INT('
kmeans(data.frame(.arg1,.arg2,.arg3),' + STR([Cluster Amount]) + ')$cluster;
',
SUM([Sales]), SUM([Profit]), SUM([Quantity]))
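To show what the calculation asks R to do, here is a small, dependency-free Python sketch of k-means. Note the assumptions: it uses the plain Lloyd's algorithm, whereas R's kmeans defaults to the Hartigan-Wong variant, so results on real data may differ; the sample points stand in for the (Sales, Profit, Quantity) values and are invented for this illustration.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_labels(points, k, iterations=20, seed=0):
    """Return a 1-based cluster label per point (Lloyd's algorithm)."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(point, centers[c]))
            for point in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centers[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    # R's $cluster vector is 1-based, so shift the labels to match.
    return [label + 1 for label in labels]


if __name__ == "__main__":
    # Invented (sales, profit, quantity) rows for illustration.
    sample = [(0, 0, 0), (1, 1, 1), (2, 2, 2), (100, 100, 100), (101, 101, 101), (102, 102, 102)]
    print(kmeans_labels(sample, k=2))
```

In the Tableau calculation, the number of clusters is spliced into the R code as text via STR([Cluster Amount]); in this sketch it is simply the k argument.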

Calculated Field in Tableau 10

4. Use Calculated Field in Tableau

You can now use your R calculation like any other Calculated Field in your Tableau worksheet:

Tableau 10 showing k-means clustering

Feel free to download the Tableau Packaged Workbook (twbx) here.

Update 26 Jun 2016: Tableau 8.1 screenshots were updated with Tableau 10.0 (preview) screenshots due to my upcoming Advanced Analytics session at TC16, which is going to reference back to this blog post.