#datamustread: Visualize This: The FlowingData Guide to Design, Visualization, and Statistics (2nd Edition) by Nathan Yau

A bookshelf neatly arranged with several books on data visualization and analytics: Displayed in the center is the 2nd edition of "Visualize This: The FlowingData Guide to Design, Visualization, and Statistics" by Nathan Yau. Surrounding this book are various other titles, including those by Alexander Loth: "Decisively Digital", "Teach Yourself VISUALLY Power BI", "Visual Analytics with Tableau", "Datenvisualisierung mit Tableau", "Datenvisualisierung mit Power BI", and "KI für Content Creation." Other visible titles include "Rewired" and "Self-Service BI & Analytics." The arrangement highlights a strong focus on data visualization, analytics, and AI.
The 2nd edition of Visualize This by Nathan Yau, surrounded by several influential data and AI books, including my own works like Decisively Digital and Teach Yourself Visually Power BI.

While my latest book, KI für Content Creation, has just been reviewed by the renowned c’t magazine, I’m happy to continue reviewing books myself. Today, I’m reviewing the just-released second edition of a cornerstone of the data visualization community, Visualize This: The FlowingData Guide to Design, Visualization, and Statistics by Nathan Yau.

Visualize This: A Deep Dive into Data Visualization

Nathan Yau’s Visualize This has long been a staple for data enthusiasts, and the updated second edition brings fresh techniques, technologies, and examples that reflect the rapidly evolving landscape of data visualization.

Core Highlights of This Book

Data-First Approach: Yau emphasizes that effective visualizations start with a deep understanding of the data. This foundational principle ensures that the resulting graphics are not just visually appealing but also accurately convey the underlying information.

Diverse Toolkit: The book introduces a wide range of tools, including the latest R packages, Python libraries, JavaScript libraries, and illustration software. Yau’s pragmatic approach helps readers choose the right tool for the job without feeling overwhelmed by options.

Real-World Applications: With practical, hands-on examples using real-world datasets, readers learn to create meaningful visualizations. This experiential learning approach is particularly valuable for grasping the subtleties of data representation.

Comprehensive Tutorials: The step-by-step guides are a standout feature, covering statistical graphics, geographical maps, and information design. These tutorials provide clear, actionable instructions that make complex visualizations accessible.

Web and Print Design: Yau details how to create visuals suitable for various mediums, ensuring versatility in application whether for digital platforms or printed materials.

Personal Insights on Visualize This

Having taught data strategy and visualization for seven years, I find Visualize This to be an exceptional resource for a broad audience. Yau skillfully integrates scientific data visualization techniques with graphic design principles, providing practical advice along the way. The book’s toolkit is extensive, featuring R, Illustrator, XML, Python (with BeautifulSoup), JSON, and more, each with working code examples to demonstrate real-world applications.

The image shows an open page from the second edition of "Visualize This: The FlowingData Guide to Design, Visualization, and Statistics" by Nathan Yau. The page, from Chapter 5 titled "Visualizing Categories," features a colorful visualization titled "Cycle of Many," which depicts a 24-hour snapshot of daily activities based on data from the American Time Use Survey. This visual highlights how categories change over time and demonstrates the book's practical approach to data visualization.
Visualize This featuring a colorful 24-hour activity visualization based on data from the American Time Use Survey.

Even though I read the first edition years ago, I couldn’t put the second edition down all weekend. This book is a must-read for anyone who handles data or prepares data-based reports. Its beautiful presentation and careful consideration of every aspect—from typeface to page layout—make it a pleasure to read.

The book is user-friendly, offering a massive set of references and free tools for obtaining interesting datasets across various fields, from sports to politics to health. This breadth of resources is crucial for anyone looking to create impactful visualizations across different domains.

While the focus on Adobe Illustrator might be daunting due to its cost and learning curve, Yau’s examples show how Illustrator can enhance graphics created in other tools like SAS and R. I personally prefer the open-source Inkscape, but Yau’s insights helped me overcome my initial reluctance to use Illustrator, leading to more polished and professional visuals.

Yau uses R, Python, and Adobe Illustrator to demonstrate what can be achieved with imagination and creativity. Although some readers might desire more complex walkthroughs from raw data to final graphics, such material would require substantial foundational knowledge in R and Python. Including this would make the book significantly thicker and veer off from its focus on creating visually appealing graphics.

Conclusion: Visualize This is Essential Reading for Data Professionals

Visualize This (Amazon) is an indispensable guide for anyone serious about data visualization. Its methodical, data-first approach, combined with practical tutorials and a comprehensive toolkit, makes it a must-read for information designers, analysts, journalists, statisticians, and data scientists.

For those looking to refine their data visualization skills and create compelling, accurate graphics, this book offers invaluable insights and techniques.

Connect with me on LinkedIn and Twitter for more reviews and insights on the latest in data & AI, and #datamustread:


Newsletter: Data & AI Digest #2

Generated with DALL-E

👋 Hello Data & AI Enthusiasts,

Welcome to another edition of the Data & AI Digest! We’re excited to bring you a curated selection of the week’s most compelling stories in the realm of data science, artificial intelligence, and more. Whether you’re a seasoned expert or a curious beginner, there’s something here for everyone.

  1. [AI] Understanding AI Performance: Discover how modern AI models often match or exceed human capabilities in tests, yet struggle in real-world applications. Read more
  2. [AI] Generative AI Strategy for Tech Leaders: CIOs and CTOs need to integrate generative AI into their tech architecture effectively. Explore 5 key elements for successful implementation. Read more
  3. [Statistics] Mastering the Central Limit Theorem in R: Understand the Central Limit Theorem, a cornerstone in statistics, and learn how to simulate it using R in this step-by-step tutorial. Read more
  4. [Graph Theory] Comprehensive Introduction to Graph Theory: This quarter-long course covers everything from simple graphs to Eulerian circuits and spanning trees. Read more
  5. [SQL] SQL Konferenz Highlights on Microsoft Fabric: Get an in-depth look at Microsoft Fabric and its role as a Data Platform for the Era of AI. Read more
  6. [Microsoft] Forbes Insights on Microsoft’s Copilots: Learn six critical things every business owner should know about Microsoft Copilot. Read more
  7. [GitHub] How GitHub’s Copilot is Being Used: GitHub’s Copilot remains the most popular AI-based code completion service. Find out the latest usage trends. Read more
  8. [Apple] iPhone 15 Pro’s Spatial Videos: Teased at Apple’s latest keynote, learn about the new spatial video capabilities of the iPhone 15 Pro. Read more
  9. [Geopolitics] China’s AI Influence Campaign: Researchers from Microsoft and other organizations discuss Beijing’s rapid change in disinformation tactics through AI. Read more

That’s a wrap for this week’s Data & AI Digest! We hope you found these articles insightful and thought-provoking. If you enjoyed this issue, help us make it bigger and better by sharing it with colleagues and friends. 🚀

Don’t forget, for real-time updates and discussions, join our LinkedIn Data & AI User Group. We look forward to your active participation and valuable insights.

Get the Data & AI Digest newsletter delivered to your email weekly.

Until next week, happy reading and exploring!

Data Science Toolbox: How to use Julia with Tableau

Julia in Tableau: R allows Tableau to execute Julia code on the fly, enhancing your data analytics experience.

Michael, a data scientist at a German railway and logistics company, recently told me during a FATUG Meetup that he loves Tableau’s R integration and Tableau’s Python integration. He then raised the question of using functions his team has written in Julia. Julia, a high-level dynamic programming language for high-performance numerical analysis, is an integral part of the newly developed data strategy in Michael’s organization.

Tableau, however, does not come with native support for Julia. I didn’t want to let Michael’s team down, so I looked for an alternative way to integrate Julia with Tableau.

This solution has been working flawlessly in a production environment for several months. In this tutorial, I’m going to walk you through installing R and Julia and connecting them to Tableau. I will also give you an example of calling a Julia statement from Tableau to calculate the volume of a sphere.

Step by Step: Integrating Julia in Tableau

1. Install Julia and add PATH variable

You can download Julia from julialang.org. Add Julia’s installation path to the PATH environment variable.

2. Install R, XRJulia, and Rserve

You can download base R from r-project.org. Next, invoke R from the terminal to install the XRJulia and Rserve packages:

> install.packages("XRJulia")
> install.packages("Rserve")

XRJulia provides an interface from R to Julia. Rserve is a TCP/IP server that allows Tableau to use facilities of R.

3. Load libraries and start Rserve

After the packages are successfully installed, we load them and run Rserve:

> library(XRJulia)
> library(Rserve)
> Rserve()

Make sure to repeat this step every time you restart your R session.

4. Connecting Tableau to Rserve

Now open the Help menu in Tableau Desktop and choose Settings and Performance > Manage External Service Connection to open the External Service Connection dialog box:

TC17 External Service Connection

Enter a server name using a domain or an IP address and specify a port. Port 6311 is the default port used by Rserve. Take a look at my R tutorial to learn more about Tableau’s R integration.
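Before switching to Tableau, it can help to confirm that Rserve is actually listening on the port you entered. The following is a minimal Python sketch and not part of the original setup: the function name probe_rserve is hypothetical, and the check assumes that Rserve greets each new connection with an ID string that starts with "Rsrv", as described for its QAP1 protocol.

```python
import socket


def probe_rserve(host: str = "localhost", port: int = 6311, timeout: float = 3.0) -> bool:
    """Return True if something that looks like Rserve answers on host:port.

    Assumption: the server sends an ID string beginning with b"Rsrv"
    right after the TCP connection is established.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            banner = b""
            # Read until we have the first four bytes or the peer closes.
            while len(banner) < 4:
                chunk = conn.recv(32)
                if not chunk:
                    break
                banner += chunk
    except OSError:
        # Connection refused, timed out, or host not reachable.
        return False
    return banner.startswith(b"Rsrv")


# Usage (on the machine running Tableau Desktop):
#     print("Rserve reachable:", probe_rserve("localhost", 6311))
```

If the function returns False, check that the R session from step 3 is still running and that no firewall blocks the port.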

5. Adding Julia code to a Calculated Field

You can invoke Calculated Field functions called SCRIPT_STR, SCRIPT_REAL, SCRIPT_BOOL, and SCRIPT_INT to embed your Julia code in Tableau, such as this simple snippet that calculates sphere volume:


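SCRIPT_REAL('
library(XRJulia)
# Start the Julia evaluator once and reuse it across calls
if (!exists("ev")) ev <- RJulia()
# The %s field below is filled with the radius passed in as .arg1
y <- juliaEval("
' + STR([Factor]) + ' * 4 / 3 * pi * %s ^ 3
", .arg1)
',
[Radius])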
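To sanity-check the numbers Tableau displays, the sphere volume V = 4/3 · π · r³ can also be computed outside Tableau. This is a minimal Python sketch for comparison only; the function name sphere_volume is illustrative, and it ignores any additional scaling such as the [Factor] parameter.

```python
import math


def sphere_volume(radius: float) -> float:
    """Volume of a sphere with the given radius: V = 4/3 * pi * r^3."""
    return 4.0 / 3.0 * math.pi * radius ** 3


# Print a few reference values to compare against the Tableau worksheet.
for r in (1, 2, 5):
    print(r, round(sphere_volume(r), 2))
```

For a radius of 1 this gives roughly 4.19, so a calculated field that returns a very different value for the same radius points to a problem in the embedded expression.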

6. Use Calculated Field in Tableau

You can now use your Julia calculation as an alternate Calculated Field in your Tableau worksheet:

Using Julia calculations within Tableau (click to enlarge)

Feel free to download the Tableau Packaged Workbook (twbx) here.

Further Reading: Mastering Julia

If you want to go beyond this tutorial and explore more about Julia in the context of data science, I recommend the book Mastering Julia. You can find it here.

Further Reading: Visual Analytics with Tableau

Join the data science conversation and follow me on Twitter and LinkedIn for more tips, tricks, and tutorials on Julia in Tableau and other data analytics topics. If you’re looking to master Tableau, don’t forget to preorder your copy of my upcoming book, Visual Analytics with Tableau (Amazon). It offers an in-depth exploration of data visualization techniques and best practices.

Also, feel free to comment and share my Tableau Julia Tutorial tweet:

Tableau Conference TC17 Sneak Peek: Integrating Julia for Advanced Analytics

Demo: using Julia calculations within Tableau (click to enlarge)

We have already seen some love from Tableau for R and Python, boosting Tableau’s Advanced Analytics capabilities.

So what is the next big thing for our Data Science Rockstars? Julia!

Who is Julia?

Julia is a high-level dynamic programming language introduced in 2012. Designed to address the needs of high-performance numerical analysis, its syntax is very similar to MATLAB. If you are used to MATLAB, you will get on track with Julia very quickly.

Compared to R and Python, Julia is significantly faster (close to C and FORTRAN, see benchmark). Based on Tableau’s R integration, Julia is a fantastic addition to Tableau’s Advanced Analytics stack and to your data science toolbox.

Where can I learn more?

Do you want to learn more about Advanced Analytics and how to leverage Tableau with R, Python, and Julia? Meet me at the 2017 Tableau Conferences in London, Berlin, or Las Vegas and join my Advanced Analytics sessions:

Will there be an online tutorial?

Yes, of course! I published tutorials for R and Python on this blog. And I will also publish a Julia tutorial soon. Feel free to follow me on Twitter @xlth, and leave me your feedback/suggestions in the comment section below.

Further reading: Mastering Julia

A German translation of this post is published on the official Tableau blog: Tableau Conference On Tour Sneak Peek: Julia-Integration für Advanced Analytics

Update 11 Oct 2017: The Julia+Tableau tutorial blog post is now published.

Price and Sentiment Analysis: Why is Bitcoin Going Down?

Bitcoin Price and Sentiment Analysis with variable Moving Average: click to open interactive Tableau dashboard with annotations

Bitcoin has become one of the trendy investment assets in recent years. Whenever bitcoin prices approach historical highs, every investor should watch the currency closely. Bitcoin rallied by more than 20% in the first days of 2017, crossing the $1000 mark for the first time since November 2013.

As many experienced bitcoin traders will remember, the first $1000 peak was a case of obvious overexuberance. Bitcoin was hot, and plenty of money was pouring into it. Bitcoin investors got too excited, causing a price surge. Prices then reversed and suffered a long-term collapse shortly after.

Moving Average Convergence/Divergence Indicator

Many traders rely on the Moving Average Convergence/Divergence (MACD) indicator. The MACD line measures the convergence and divergence between two exponential moving averages (EMAs, usually 12 and 26 days) and is calculated by subtracting the longer EMA from the shorter one. The signal line is an EMA (usually 9 days) of the MACD line.

The signal line crossing the MACD from above is a buy signal. The signal line crossing the MACD from below is a sell signal. Relying only on momentum-based indicators (such as the MACD) and optimization-based models, however, will most certainly fail to indicate heavy price drops, such as the drop in late 2016.
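The indicator described above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the calculation behind the dashboard: the function names ema, macd, and crossovers are hypothetical, each EMA is seeded with the first observation, and the spans (12, 26, and 9 for the signal line) are parameters you can change.

```python
def ema(values, span):
    """Exponential moving average with smoothing factor 2 / (span + 1)."""
    alpha = 2.0 / (span + 1)
    out = [values[0]]  # seed with the first observation (an assumption)
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out


def macd(prices, fast=12, slow=26, signal=9):
    """Return (macd_line, signal_line) for a list of daily prices."""
    fast_ema = ema(prices, fast)
    slow_ema = ema(prices, slow)
    macd_line = [f - s for f, s in zip(fast_ema, slow_ema)]
    signal_line = ema(macd_line, signal)
    return macd_line, signal_line


def crossovers(macd_line, signal_line):
    """List (index, 'buy' | 'sell') where the two lines cross.

    'buy'  : the MACD line moves from below the signal line to above it
             (the signal line crosses the MACD from above).
    'sell' : the MACD line moves from above the signal line to below it.
    """
    events = []
    for i in range(1, len(macd_line)):
        prev = macd_line[i - 1] - signal_line[i - 1]
        curr = macd_line[i] - signal_line[i]
        if prev < 0 <= curr:
            events.append((i, "buy"))
        elif prev > 0 >= curr:
            events.append((i, "sell"))
    return events


# Synthetic prices (not real bitcoin data): down, up, then down again.
prices = list(range(100, 50, -1)) + list(range(50, 100)) + list(range(100, 50, -1))
macd_line, signal_line = macd(prices)
print(crossovers(macd_line, signal_line))
```

On this synthetic series the sketch reports one buy shortly after the first trend reversal and one sell shortly after the second, which also shows why the indicator lags: the signal only appears once the drop is already under way.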

Predicting Fear with Sentiment Analysis

In late 2016, a lot of people began to pour money into bitcoin again, this time because they were worried that stock markets and other assets were due for a drop. For investors, it is essential to figure out whether or not these fears are actually founded. However, such "safe assets" are prone to bubbles. People get scared, invest in gold or bitcoin, then realize that their fears were unfounded. As a result, bitcoin prices could plummet.

So how can we catch emotions such as fear in advance? Twitter is a valuable source of information and emotion. It certainly influences the stock market and can help to predict the market. Sentiment analysis can lead price movements by up to two days. Negative sentiment, however, is reflected in the market much more than positive sentiment. This is probably because most people tweet positive things about bitcoin most of the time. Even more positive news appeared after the price broke the $1000 barrier.
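One simple way to examine such a lead is to correlate the sentiment score of day t with the price return of day t + lead for several values of lead. The following Python sketch only illustrates that idea; the function names, the daily sentiment scores, and the returns are all made up for the example and are not taken from the dashboard.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def lead_correlations(sentiment, returns, max_lead=2):
    """Correlate sentiment on day t with the return on day t + lead."""
    result = {}
    for lead in range(max_lead + 1):
        s = sentiment[: len(sentiment) - lead]  # drop days without a future return
        r = returns[lead:]                      # shift returns back by `lead` days
        result[lead] = pearson(s, r)
    return result


# Hypothetical daily sentiment scores in [-1, 1].
sentiment = [0.1, -0.4, 0.3, 0.8, -0.6, 0.2, -0.1, 0.5, -0.7, 0.4]
# Hypothetical returns constructed to follow sentiment two days later.
returns = [0.0, 0.0] + sentiment[:-2]

correlations = lead_correlations(sentiment, returns)
best_lead = max(correlations, key=correlations.get)
print(correlations, best_lead)
```

With real data the correlations will be far weaker than in this constructed example, and a high value at some lead is only a hint worth testing further, not proof of predictive power.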

This content is part of the session “Price and Sentiment Analysis: Why is Bitcoin Going Down?” that I deliver at the Frankfurt Bitcoin Colloquium. Have a look at my upcoming sessions!

[Update 14 Jun 2017]: Axis for Moving Average adjusted. Relative Date selector added with the last 6 months as default. Screenshot updated.

Feel free to share the Bitcoin Price and Sentiment Analysis dashboard, which is also featured as Viz of the Day on Tableau Public: