Data Science Toolbox: How to use R with Tableau

Recently, Tableau released an exciting feature that extends its data analytics capabilities: R integration via Rserve. By bringing together Tableau and R, data scientists and analysts get a more comprehensive and powerful data science toolbox. Whether you’re an experienced data scientist or just starting out in data analytics, this tutorial will guide you through integrating R with Tableau.

Step by Step: Integrating R in Tableau

1. Install and start R and Rserve

You can download base R from r-project.org. Next, start R from the terminal to install and run the Rserve package:

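> install.packages("Rserve")  # one-time installation from CRAN
> library(Rserve)             # load the package into the session
> Rserve()                    # start the server (port 6311 by default)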

To check that Rserve is running, you can try connecting to it with Telnet (Rserve listens on port 6311 by default):

Telnet
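If Telnet is not installed on your machine, the same check can be scripted. The following is a minimal sketch in Python (not R), assuming Rserve runs on localhost with its default port 6311; the function name rserve_is_reachable is made up for this illustration. It only tests whether the TCP port accepts connections, not whether the service behind it is actually Rserve.

```python
import socket


def rserve_is_reachable(host="localhost", port=6311, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection performs the TCP handshake, much like
        # `telnet host port` does before it shows a prompt.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host could not be resolved.
        return False


if __name__ == "__main__":
    print("Rserve reachable:", rserve_is_reachable())
```

If this reports that the port is not reachable, make sure the Rserve() call from step 1 is still running before you continue.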

Protip: If you prefer an IDE for R, I highly recommend installing RStudio.

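2. Connecting Tableau to Rserve

Now let’s open Tableau and set up the connection. In the Help menu, choose Settings and Performance, then Manage External Service Connection, and enter the host and port of your Rserve instance (localhost and 6311 for a default local setup):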

Tableau 10 Help menu
Tableau 10 External Service Connection

3. Adding R code to a Calculated Field

You can invoke R scripts in Tableau’s Calculated Fields, such as k-means clustering controlled by an interactive parameter slider:

SCRIPT_INT('
kmeans(data.frame(.arg1,.arg2,.arg3),' + STR([Cluster Amount]) + ')$cluster;
',
SUM([Sales]), SUM([Profit]), SUM([Quantity]))
Calculated Field in Tableau 10
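To make clearer what the script hands back to Tableau, here is a rough sketch of k-means in plain Python (not R). It uses the simple Lloyd iteration, whereas R's kmeans() uses a different algorithm by default, so the labels need not match R's output. The function name kmeans_labels and the sample rows are invented for this illustration. Like $cluster in R, the result is one 1-based cluster number per input row, which is why the calculation is wrapped in SCRIPT_INT.

```python
import random


def kmeans_labels(rows, k, iterations=20, seed=0):
    """Cluster rows (tuples of numbers) into k groups; return 1-based labels."""
    rng = random.Random(seed)
    centers = rng.sample(rows, k)  # use k of the input rows as starting centers
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: attach every row to its nearest center.
        for i, row in enumerate(rows):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centers[c])),
            )
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [row for row, lab in zip(rows, labels) if lab == c]
            if members:  # keep the old center if a cluster ends up empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return [lab + 1 for lab in labels]


if __name__ == "__main__":
    # Hypothetical (sales, profit, quantity) rows, standing in for
    # SUM([Sales]), SUM([Profit]), SUM([Quantity]) per mark.
    rows = [(100, 10, 1), (110, 12, 2), (105, 11, 1),
            (900, 200, 9), (950, 210, 10), (920, 190, 8)]
    print(kmeans_labels(rows, k=2))
```

The k argument plays the role of the [Cluster Amount] parameter: moving the slider in Tableau re-runs the script with a new number of clusters.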

4. Use Calculated Field in Tableau

You can now use your R calculation like any other Calculated Field in your Tableau worksheet:

Tableau 10 showing k-means clustering

Feel free to download the Tableau Packaged Workbook (twbx) here.

Connect and Stay Updated

Stay on top of the latest in data science and analytics by following me on Twitter and LinkedIn. I frequently share tips, tricks, and insights into the world of data analytics, machine learning, and beyond. Join the conversation, and let’s explore the possibilities together!

Blog post updates:

Gartner Positions Tableau as a Leader for the First Time in BI Magic Quadrant

Screenshot of Tableau's 2013 February Newsletter featuring: "Gartner Positions Tableau as a Leader in 2013 Magic Quadrant"

One of the most highly anticipated and highly regarded reviews of the business intelligence market was published a couple of days ago. Gartner released the 2013 edition of its famous Magic Quadrant for BI and Analytics Platforms (a.k.a. the Gartner BI MQ), and Tableau was cited as a “Leader” for the first time.

Congratulations, team Tableau!

Transition from Academia to Capgemini: A New Chapter in Data and Analytics

CERN Main Auditorium: my transition from academia to Capgemini

After enjoying research for the last four years, especially during my time at CERN, I have decided to resign from my postgraduate position and move from academia to the exciting world of Capgemini. My passion for Data and Analytics remains strong and will be the core focus of my new role.

Capgemini: A New Adventure After Academia

Capgemini, one of the world’s largest consulting corporations, has caught my attention. Unlike many other consulting companies, Capgemini does not yet have a dedicated team to offer effective strategies and solutions employing Big Data, Analytics, and Machine Learning. This presents an exciting opportunity for me to contribute and innovate.

My Vision: Building a Data-Driven Future at Capgemini

I love these technologies and am confident that I can work out a business development plan that drives growth. Building on a definition of customers and markets, my plan includes new services such as:

  • Data Science Strategy: Enabling organizations to solve problems with insights from analytics.
  • Consulting: Answering questions using data.
  • Development: Building custom tools like interactive dashboards, pipelines, customized Hadoop setups, and data prep scripts.
  • Training: Offering training at various skill levels, from basic dashboard design to deep dives into R, Python, and D3.js.

This plan also includes a go-to-market strategy, which I’ll keep under wraps for now. Stay tuned for a retrospective reveal in the future!

Reflecting on My Transition from Academia

Making this transition from academia to a corporate role has been a considered decision. As I previously shared in my reflection on my software engineering internship at SAP, the blend of technological challenges and team collaboration has always intrigued me. Joining Capgemini allows me to continue pursuing my passion for data in a dynamic business environment.

Conclusion: Exciting Times Ahead

This transition from academia to Capgemini marks a thrilling new chapter in my career. I look forward to leveraging my expertise in Data and Analytics to contribute to Capgemini’s growth and innovation.

Follow my journey as I explore the intersection of data, technology, and business. Connect with me on Twitter and LinkedIn.

Challenges of Big Data Analytics in High-Energy Physics

Challenges of Big Data Analytics: volume, variety, velocity and veracity
Screenshot of CERN Big Data Analytics presentation

There are four key issues to overcome if you want to tame Big Data: volume (quantity of data), variety (different forms of data), velocity (how fast the data is generated and processed) and veracity (variation in quality of data). You have to be able to deal with lots and lots of data, of all kinds and of varying quality, moving really quickly.

That is why Big Data Analytics has a huge impact on how we plan CERN’s overall technology strategy as well as specific strategies for High-Energy Physics analysis. We want to profit from our data investment and extract the knowledge. This has to be done in a proactive, predictive and intelligent way.

The following presentation shows you how we use Big Data Analytics to improve the operation of the Large Hadron Collider.

Displaying Dimuon Events from the CMS Detector using D3.js

Physicists working on the CMS Detector

I have been a Python geek and gnuplot maniac ever since I joined CERN around three years ago. I have to admit, however, that I really enjoy the flexibility of D3.js and its capability to render histograms directly in the web browser.

D3 is a JavaScript library for manipulating documents based on data. It helps you bring data to life using HTML, CSS and SVG, and embed the result in your website.

The following example loads a CSV file, which includes 10,000 dimuon events (i.e. events containing two muons) from the CMS detector, and displays the distribution of the invariant mass M (in GeV, in bins of size 0.1 GeV):

Feel free to download the sample CSV dataset here.
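The binning behind such a histogram is simple enough to sketch without a browser. The following Python snippet (not D3.js) counts invariant masses in bins of 0.1 GeV. The column name M and the five sample values are assumptions made for this illustration; they are not taken from the actual dataset.

```python
import csv
import io
import math
from collections import Counter

BIN_WIDTH = 0.1  # GeV, the bin size used in the text

# Made-up sample in CSV form; the column name "M" is an assumption.
SAMPLE_CSV = """M
3.05
3.07
3.12
9.46
91.15
"""


def read_masses(text, column="M"):
    """Parse CSV text and return the chosen column as a list of floats."""
    return [float(row[column]) for row in csv.DictReader(io.StringIO(text))]


def histogram(masses, bin_width=BIN_WIDTH):
    """Count values per bin; bin i covers [i * bin_width, (i + 1) * bin_width).

    Values lying exactly on a bin edge may land in the neighbouring bin
    because of floating-point rounding.
    """
    return Counter(math.floor(m / bin_width) for m in masses)


if __name__ == "__main__":
    counts = histogram(read_masses(SAMPLE_CSV))
    for index in sorted(counts):
        # Print the lower bin edge and one '#' per event in that bin.
        print(f"{index * BIN_WIDTH:7.1f} GeV  {'#' * counts[index]}")
```

With the real file you would read the CSV from disk instead of the inline string; the counting step stays the same, and D3 then only has to draw one bar per bin.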

Further reading: D3 Cookbook