
When I was doing text mining, I was often tempted to reach for a scripting language like R, Python, or Ruby and then feed the results into Tableau. Tableau served as a communication tool to present the insights in a pleasant way.
Wouldn’t it be handy to perform text mining and further analysis at the speed of thought directly in Tableau?
Tableau has some relatively basic text processing functions that can be used in calculated fields. This is, however, not enough for text mining tasks such as sentiment analysis, which require splitting text into tokens. Tableau’s beloved R integration will not help in this case either.
As a workaround, I decided to use Postgres’ built-in string functions for such text mining tasks, which perform much faster than most scripting languages. For the following word count example, I applied the function regexp_split_to_table, which takes a piece of text (such as a blog post), splits it by a pattern, and returns the tokens as rows:
select
    guid
    , regexp_split_to_table(lower(post_content), '\s+') as word
    , count(1) as word_count
from
    alexblog_posts
group by
    guid, word
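To make the logic of the query easier to follow, here is a minimal Python sketch of roughly the same computation: lowercase each post, split it on whitespace, and count each word per post. It is only an illustration, not a replacement for the SQL. The `posts` list and the `word_counts` helper are hypothetical stand-ins for the `alexblog_posts` table, and skipping empty tokens is a choice made for this sketch.

```python
import re
from collections import Counter


def word_counts(posts):
    """Count words per post.

    `posts` is a hypothetical list of (guid, post_content) tuples
    standing in for rows of the alexblog_posts table.
    """
    counts = Counter()
    for guid, content in posts:
        # lower() and the '\s+' pattern follow the SQL query
        for word in re.split(r"\s+", content.lower()):
            if word:  # sketch-only choice: skip empty tokens
                counts[(guid, word)] += 1
    return counts


# Made-up sample data, for illustration only
posts = [
    ("post-1", "Tableau loves data and data loves Tableau"),
    ("post-2", "Text mining in Tableau"),
]

counts = word_counts(posts)
for (guid, word), n in sorted(counts.items()):
    print(guid, word, n)
```

Each printed line corresponds to one row of the query result: a post identifier, a word, and the number of times that word occurs in the post.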
I joined this code snippet as a Custom SQL Query to my Tableau data source, which is connected to the database that is powering my blog:
And here we go, an interactive word count visualization:
This example could easily be enhanced with data from Google Analytics, or altered to analyse user comments, survey results, or social media feeds. Do you have any fancier ideas for real-time text mining with Tableau? Leave me a comment!
How to identify Twitter hashtags? Do I need another RegEx?
Identifying hashtags among the words of a tweet does not require another regular expression in a Custom SQL Query. A simple calculated field in Tableau will do the job:
CASE LEFT([Word], 1)
    WHEN "#" THEN "Hash Tag"
    WHEN "@" THEN "User Reference"
    ELSE "Regular Content"
END
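For readers who want to test the rule outside Tableau, the same first-character check can be sketched in Python. The function name `classify_word` is hypothetical. The labels are the ones used in the calculated field.

```python
def classify_word(word):
    """Label a token by its first character, like the calculated field."""
    first = word[:1]  # counterpart of LEFT([Word], 1); empty for an empty word
    if first == "#":
        return "Hash Tag"
    if first == "@":
        return "User Reference"
    return "Regular Content"


# Made-up sample tokens, for illustration only
for token in ["#tableau", "@alex", "dashboard"]:
    print(token, "->", classify_word(token))
```

Because only the first character is inspected, a token such as `tableau#viz` is treated as regular content.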
Looking for an example? Feel free to check out the Tweets featuring #tableau Dashboard on Tableau Public and download the Packaged Workbook (twbx):
Any more feedback, ideas, or questions?