How to perform Text Mining at the Speed of Thought directly in Tableau

Interactive real-time text mining with Tableau Desktop 9.2

Tableau is an incredibly versatile tool, commonly known for its ability to create stunning visualizations. But did you know that with Tableau, you can also perform real-time, interactive text mining? Let’s delve into how we can harness this function to gain rapid insights from our textual data.

Previously, during text mining tasks, you might have found yourself reaching for a scripting language like R, Python, or Ruby, only to feed the results back into Tableau for visualization. This approach has Tableau serving merely as a communications tool to represent insights.

However, wouldn’t it be more convenient and efficient to perform text mining and further analysis directly in Tableau?

While Tableau has some relatively basic text processing functions that can be used for calculated fields, these often fall short when it comes to performing tasks like sentiment analysis, where text needs to be split into tokens. Even Tableau’s beloved R integration does not lend a hand in these scenarios.

The Power of Postgres for Text Mining in Tableau

Faced with these challenges, I decided to harness the power of Postgres' built-in string functions for text mining tasks. These functions typically run much faster than the equivalent processing in a scripting language. For example, I used the function regexp_split_to_table for a word count: it takes a piece of text (like a blog post), splits it on a pattern, and returns the tokens as rows:

-- Split each post into lowercase, whitespace-separated tokens and count them per post
select
    guid
  , regexp_split_to_table(lower(post_content), '\s+') as word
  , count(1) as word_count
from
    alexblog_posts
group by
    guid, word
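
To make the shape of the result concrete, here is a minimal Python sketch of what the query computes: lowercase each post, split it on runs of whitespace, and count the tokens per post. The function name word_counts and the sample post are made up for illustration; in this post the work is done by Postgres, not Python.

```python
import re
from collections import Counter

def word_counts(posts):
    """Return (guid, word, word_count) rows, mirroring the SQL query.

    `posts` maps a post guid to its text. It is a hypothetical stand-in
    for the guid and post_content columns of alexblog_posts.
    """
    rows = []
    for guid, content in posts.items():
        # Same pattern as the SQL: split the lowercased text on whitespace runs.
        tokens = re.split(r"\s+", content.lower())
        # Sort by word so the output order is deterministic.
        for word, count in sorted(Counter(tokens).items()):
            rows.append((guid, word, count))
    return rows

# Made-up sample data, for illustration only.
sample = {"post-1": "Tableau loves data and data loves Tableau"}
rows = word_counts(sample)
for row in rows:
    print(row)
```

Like the SQL version, this splits on whitespace only, so punctuation stays attached to the words and leading or trailing whitespace produces an empty token.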

Incorporating Custom SQL into Tableau Visualization

I joined this code snippet as a Custom SQL Query to my Tableau data source, which is connected to the database that is powering my blog:

Join with Custom SQL Query in Tableau applying the Postgres function regexp_split_to_table

And here we go, I was able to create an interactive word count visualization right in Tableau:

This example can be easily enhanced with data from Google Analytics, or adapted to analyze user comments, survey results, or social media feeds. The possibilities for Custom SQL in Tableau are vast and versatile. Do you have some more fancy ideas for real-time text mining with Tableau? Leave me a comment!

Update (TC Pro Tip): Identifying Twitter Hashtags in Tableau

A simple calculated field in Tableau can help identify words within tweets as hashtags or user references, eliminating the need for another regular expression via a Custom SQL Query:

CASE LEFT([Word], 1)
WHEN "#" THEN "Hash Tag"
WHEN "@" THEN "User Reference"
ELSE "Regular Content"
END
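
If you want to check the classification logic outside Tableau, the same rule can be sketched in Python. This is only an illustrative equivalent of the calculated field above; the function name classify_token is an assumption and not part of Tableau.

```python
def classify_token(word: str) -> str:
    """Classify a token by its first character, like the CASE expression."""
    first = word[:1]  # empty string for an empty token, so no IndexError
    if first == "#":
        return "Hash Tag"
    if first == "@":
        return "User Reference"
    return "Regular Content"

# Made-up sample tokens, for illustration only.
for token in ["#tableau", "@someone", "dashboard"]:
    print(token, "->", classify_token(token))
```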

Looking for an example? Feel free to check out the Tweets featuring #tableau Dashboard on Tableau Public and download the Packaged Workbook (twbx):

Tableau dashboard that shows tweets featuring the hashtag #tableau (presented at Tableau Conference)

Any more feedback, ideas, or questions? I hope this post provides you with valuable insights into how to master text mining in Tableau, and I look forward to hearing about your experiences and creative applications. You can find more tutorials like this in my new book Visual Analytics with Tableau (Amazon).

Transparency: This blog contains affiliate links. If you click on them, you will be redirected to the merchant. If you decide to make a purchase, we will receive a small commission. The price does not change for you. Affiliate links have no influence on our writing.

KPMG Global Automotive Executive Survey 2016

KPMG Global Automotive Executive Survey 2016: click to open interactive story

In recent months, 800 automotive executives from 38 countries shared their insights with KPMG. You can discover the key highlights of the KPMG Global Automotive Executive Survey in this eye-catching interactive Tableau story.

This is a fabulous example of how you can use stories to present a narrative to an audience. Just as dashboards provide spatial arrangements of analysis that work together, stories present sequential arrangements of analysis that create a narrative flow for your audience.

How to load data to Hadoop with Alteryx and visualize with Tableau via Impala?

This YouTube tutorial shows you a handy way to load your Excel data to Cloudera Hadoop with Alteryx, and how to see and understand your data even faster with Tableau connected to Impala.

The same tool chain for loading and accessing data can be used with Hive (e.g. on Hortonworks) or Spark SQL (e.g. on MapR). An overview of common data processing technologies can be found in the Big Data jungle guide.

How to use a custom Mapbox map as your background map in Tableau

Mapbox map in Tableau

Tableau now comes with more geographical data built in, including updated US congressional districts (CD), local name synonyms for world capitals, Japanese postal codes, and Mapbox integration. I have to admit I really love Mapbox!

What is Mapbox? Mapbox is an online platform for custom-built maps that lets you create the perfect map to integrate into your Tableau visualization. Mapbox maps are highly customizable: you can design your own map, build and extend applications, use satellite imagery, and create static maps. You can even have Pirate Maps!

Mapbox tutorial:

  1. First, register with mapbox.com.
  2. Once you are logged in, go to Account > API access tokens and copy your token. You’ll need this for Tableau.
  3. Open Tableau and connect to a data source that has geographical locations. For this example, we will use the sample sales data set that is preloaded in Tableau.
  4. Choose Map > Background Maps > Map Service to open a popup box.
  5. Choose Add > Mapbox Services > Classic.
  6. Fill in a style name for this map and paste in the access token you copied earlier.
  7. Open the drop-down list to see the classic maps that are available. For this example, we will use Emerald.
  8. Take the city dimension from your data set and double-click it or drag and drop it to populate a map. Compare the map before and after applying Mapbox below.

If you create multiple Mapbox maps and want to use different styles on different worksheets, go to Map > Background Maps on each worksheet. There you will find a list of the maps you have created (for example, Emerald) and can pick the one you want.

You now have a basic understanding of how to use Mapbox in Tableau.

Happy mapping, and go explore!

How to speed up Tableau by using Performance Recordings

Tableau Performance Recording Timeline

Getting your dashboards up to speed can be quite difficult if you don’t know where the latency comes from. The first and most important rule for making workbooks more efficient is this: if a workbook loads slowly in Desktop on your computer, it will also be slow on the server once it is published. Tableau Desktop and Tableau Server each have their own way to enable, record, and analyze performance.

Performance recordings are a must-have for tuning your workbooks. All you have to do is start the Tableau Performance Recording, perform some actions in your workbook, and stop the Performance Recording. A few seconds later, Tableau opens a new workbook containing the Performance Summary dashboard.

Create a performance recording in Tableau Desktop

  1. To start recording performance, choose Help > Settings and Performance > Start Performance Recording.
  2. Perform some dashboard operations and/or refresh your data source(s).
  3. To stop recording and open a temporary workbook containing the results of the recording session, choose Help > Settings and Performance > Stop Performance Recording.
  4. You can now view the Performance Summary dashboard and begin your analysis.

Create a performance recording on Tableau Server

  1. An administrator must enable the feature first. The option is located under Settings for each site.
  2. Check the box for Workbook Performance Metrics and save.
  3. Navigate to a view on the server.
  4. Remove iid=xx from the URL.
  5. Enter record_performance=yes in its place. Your full URL should now look something like this: https://data.alexloth.com/#/site/AA/views/Superstore/Summary?:record_performance=yes
  6. After the page reloads, you’ll notice that the ID is automatically added back to the URL and that a Performance button appears in the view’s toolbar. Don’t click the Performance button yet.
  7. Interact with the workbook: apply filters, select marks or rows, and click elements that trigger actions on other parts of the visualization.
  8. Now click the Performance button, which opens a new window with the Performance Summary dashboard.
  9. Don’t forget to disable performance recording in the admin settings when you are finished.
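
If you record performance for many views, the URL edit from the steps above can be scripted. The following Python sketch is only an illustration: the function name performance_recording_url is hypothetical, and it assumes the session ID appears as a :iid=<number> parameter after the question mark, as implied by the example URL.

```python
def performance_recording_url(view_url: str) -> str:
    """Swap the session ID parameter for :record_performance=yes."""
    base, _, query = view_url.partition("?")
    # Keep every existing parameter except the session ID (assumed form ":iid=<n>").
    params = [p for p in query.split("&") if p and not p.startswith(":iid=")]
    if ":record_performance=yes" not in params:
        params.insert(0, ":record_performance=yes")
    return base + "?" + "&".join(params)

url = performance_recording_url(
    "https://data.alexloth.com/#/site/AA/views/Superstore/Summary?:iid=1"
)
print(url)
```

The function uses plain string handling rather than a URL parser because the parameters sit inside the fragment (after the #) of the view URL.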

Understand the Performance Summary dashboard

The Performance Summary dashboard contains three views:

  • Timeline: a Gantt chart displaying event start time and duration.
  • Events sorted by time: a bar chart showing event duration by type.
  • Query text: the text of a query, which appears when you click an Executing Query event in the bar chart.

Timeline Gantt chart

The uppermost view in a performance recording dashboard shows the events that occurred during the recording, arranged chronologically from left to right. The bottom axis shows elapsed time since Tableau started, in seconds.

In the Timeline view, the Workbook, Dashboard, and Worksheet columns identify the context for the events. The Event column identifies the nature of the event, and the final column shows each event’s duration and how it compares chronologically to other recorded events.

The events sorted by time

This section of the workbook shows the duration of recorded events in descending order, so you can see how long each event took during the performance recording. The events with the longest durations are likely causes of performance problems and tell you where to look first if you want to speed up your workbook.
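
The idea behind this view is simply a sort by duration, longest first. As a minimal Python sketch, with a hypothetical Event type and made-up durations that do not come from a real recording:

```python
from typing import NamedTuple

class Event(NamedTuple):
    name: str        # event type, e.g. "Executing query"
    worksheet: str   # worksheet the event belongs to
    seconds: float   # event duration in seconds

def slowest_first(events):
    """Return events ordered by duration, longest first."""
    return sorted(events, key=lambda e: e.seconds, reverse=True)

# Made-up values, for illustration only.
events = [
    Event("Computing layouts", "Summary", 0.4),
    Event("Executing query", "Summary", 2.7),
    Event("Connecting to a data source", "Summary", 1.1),
]
ranked = slowest_first(events)
for event in ranked:
    print(f"{event.seconds:5.1f}s  {event.name}")
```

In this made-up sample, the query event would be the first place to look.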

Different colors indicate different types of events. The range of events that can be recorded is:

  • Computing layouts: If layouts are taking too long, consider simplifying your workbook.
  • Connecting to a data source: Slow connections could be due to network issues or issues with the database server.
  • Executing query: If queries are taking too long, consult your database server’s documentation.
  • Generating extract: To speed up extract generation, consider only importing some data from the original data source. For example, you can filter on specific data fields, or create a sample based on a specified number of rows or percentage of the data.
  • Geocoding: To speed up geocoding performance, try using less data or filtering out data.
  • Blending data: To speed up data blending, try using less data or filtering out data.
  • Server rendering: You can speed up server rendering by running additional VizQL Server processes on additional machines.

Query text

The workbook can also display the query text for any event that you want to examine in detail. If you click on an Executing Query event (shown in green) in either the Timeline or the Events section of a performance recording dashboard, the text for that query is displayed in the Query section. This handy feature lets you review any query text of interest without leaving the Tableau Performance Summary dashboard.