When doing text mining, I was often tempted to reach for a scripting language like R, Python, or Ruby, and then feed the results into Tableau. Tableau served as a communication tool to present the insights in a pleasant way.
Wouldn’t it be handy to perform text mining and further analysis at the speed of thought directly in Tableau?
Tableau has some relatively basic text processing functions that can be used in calculated fields. This is, however, not enough to perform text mining tasks such as sentiment analysis, which require splitting text into tokens. Tableau’s beloved R integration will not help in this case either.
As a workaround, I decided to use Postgres’ built-in string functions for such text mining tasks, which perform much faster than most scripting languages. For the following word count example, I applied the function regexp_split_to_table, which takes a piece of text (such as a blog post), splits it by a pattern, and returns the tokens as rows:
    , regexp_split_to_table(lower(post_content), '\s+') as word
    , count(1) as word_count
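To make the behaviour of the SQL fragment above easier to check on a small sample, here is a minimal Python sketch of the same idea: lower-case the text, split it on runs of whitespace, and count each token. The function name `word_counts` and the sample sentence are made up for illustration, and dropping empty tokens is an assumption of this sketch rather than something the SQL does.

```python
import re
from collections import Counter


def word_counts(post_content: str) -> Counter:
    """Lower-case the text, split on runs of whitespace, count each token."""
    tokens = re.split(r"\s+", post_content.lower())
    # re.split leaves empty strings when the text starts or ends with
    # whitespace; dropping them is a choice made for this sketch.
    return Counter(token for token in tokens if token)


# Hypothetical sample text standing in for a blog post.
sample = "Tableau loves data and data loves Tableau"
counts = word_counts(sample)

for word, word_count in counts.most_common():
    print(word, word_count)
```

The Python version is only a way to try the tokenization on a sample string; the SQL remains what runs in the database.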
I joined this code snippet as a Custom SQL Query to my Tableau data source, which is connected to the database that is powering my blog:
And here we go, an interactive word count visualization:
This example could easily be enhanced with data from Google Analytics, or adapted to analyse user comments, survey results, or social media feeds. Do you have any fancier ideas for real-time text mining with Tableau? Leave me a comment!
How can I identify Twitter hashtags? Do I need another RegEx?
Another regular expression via a Custom SQL Query is not required for identifying words within tweets as hashtags. A simple calculated field in Tableau will do the job:
    CASE LEFT([Word], 1)
    WHEN "#" THEN "Hash Tag"
    WHEN "@" THEN "User Reference"
    ELSE "Regular Content"
    END
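The logic of the calculated field above can be tried outside Tableau with a small Python sketch that labels each token by its first character. The function name `classify_word` and the sample tweet are hypothetical, used here only for illustration.

```python
def classify_word(word: str) -> str:
    """Label a token by its first character, like the CASE expression."""
    first = word[:1]  # empty string for an empty token, so no IndexError
    if first == "#":
        return "Hash Tag"
    if first == "@":
        return "User Reference"
    return "Regular Content"


# Hypothetical tweet, split on whitespace as in the word count example.
tweet = "Thanks @tableau for the #dataviz tips"
labels = {word: classify_word(word) for word in tweet.split()}

for word, label in labels.items():
    print(word, "->", label)
```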
Any more feedback, ideas, or questions?