Challenges of Big Data Analytics in High-Energy Physics

Challenges of Big Data Analytics: volume, variety, velocity and veracity
Screenshot of CERN Big Data Analytics presentation

There are four key issues to overcome if you want to tame Big Data: volume (quantity of data), variety (different forms of data), velocity (how fast the data is generated and processed) and veracity (variation in quality of data). You have to be able to deal with huge amounts of data of all kinds and of uneven quality, moving really quickly.

That is why Big Data Analytics has a huge impact on how we plan CERN’s overall technology strategy as well as specific strategies for High-Energy Physics analysis. We want to profit from our data investment and extract the knowledge. This has to be done in a proactive, predictive and intelligent way.

The following presentation shows you how we use Big Data Analytics to improve the operation of the Large Hadron Collider.

Displaying Dimuon Events from the CMS Detector using D3.js

Physicists working on the CMS Detector

I have been a Python geek and gnuplot maniac ever since I joined CERN around three years ago. I have to admit, however, that I really enjoy the flexibility of D3.js and its capability to render histograms directly in the web browser.

D3 is a JavaScript library for manipulating documents based on data. It helps you bring data to life using HTML, CSS and SVG, and embed the result in your website.

The following example loads a CSV file, which includes 10,000 dimuon events (i.e. events containing two muons) from the CMS detector, and displays the distribution of the invariant mass M (in GeV, in bins of size 0.1 GeV):
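The binning behind such a histogram can be sketched in a few lines of Python. This is only a minimal illustration: the column name `M` and the function name `histogram_invariant_mass` are assumptions, not the actual layout of the dataset or the code behind the chart.

```python
import csv
import io
import math
from collections import Counter

BIN_WIDTH = 0.1  # GeV, the bin size stated in the text


def histogram_invariant_mass(csv_text, column="M", bin_width=BIN_WIDTH):
    """Count events per invariant-mass bin.

    Returns a dict that maps the lower edge of each bin (in GeV) to the
    number of events in that bin. The column name "M" is an assumption.
    """
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        mass = float(row[column])
        # Round before flooring so that values sitting on a bin edge
        # are not pushed into the lower bin by floating-point error.
        bin_index = math.floor(round(mass / bin_width, 9))
        counts[bin_index] += 1
    # Convert bin indices back to lower bin edges in GeV.
    return {round(i * bin_width, 10): n for i, n in sorted(counts.items())}
```

Passing the text of the CSV file to the function, for example `histogram_invariant_mass(open("dimuon.csv").read())` with a hypothetical file name, yields the per-bin counts that a plotting library such as D3 would then draw as bars.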

Feel free to download the sample CSV dataset here.

Further reading: D3 Cookbook

CERN: Where Big Bang Theory meets Big Data Analytics

Screenshot of SQL Plan Baselines with Oracle Enterprise Manager at CERN

The volume, variety, velocity and veracity of data generated by the LHC experiments at CERN continue to reach unprecedented levels: some 22 petabytes of data this year, after throwing away 99% of what is recorded by the LHC detectors. This phenomenal growth means that we must not only understand Big Data in order to decipher the information that really counts, but also understand what we can achieve with Big Data Analytics.

The raw data from the experiments is stored in structured files (using CERN’s ROOT Framework), which are better suited to physics analysis. Transactional relational databases (Oracle 11g with Real Application Clusters) store the metadata that is used to manage that raw data. For metadata residing in the Oracle Database, Oracle TimesTen serves as an in-memory cache database. The raw data is analysed on PROOF (Parallel ROOT Facility) clusters. The monitoring data, by contrast, is stored in the Hadoop Distributed File System (HDFS).

Just as in the CERN example, there are some significant trends in Big Data Analytics:

  • Descriptive Analytics, such as standard business reports, dashboards and data visualization, has been widely used for some time and is the core application of traditional Business Intelligence. This ad hoc analysis looks at the static past and reveals what has occurred. One recent trend, however, is to include findings from Predictive Analytics, such as sales forecasts, on the dashboard.
  • Predictive Analytics identifies trends, spots weaknesses or determines conditions for making decisions about the future. The methods for Predictive Analytics, such as machine learning, predictive modeling, text mining, neural networks and statistical analysis, have existed for some time. Software products such as SAS Enterprise Miner have made these methods much easier to use.
  • Discovery Analytics is the ability to analyse new data sources. This creates additional opportunities for insights and is especially important for organizations with massive amounts of varied data.
  • Prescriptive Analytics suggests what to do and can identify optimal solutions, often for the allocation of scarce resources. Prescriptive Analytics has been researched at CERN for a long time but is now finding wider use in practice.
  • Semantic Analytics suggests what you are looking for and provides a richer response, bringing a degree of human understanding into Analytics that we have not necessarily been getting out of raw data streams before.

As these trends bear fruit, new ecosystems and markets are being created for broad cross-enterprise Big Data Analytics. Use cases like CERN’s LHC experiments give us greater insight into how important Big Data Analytics is to the scientific community as well as to businesses.

Gothic Majesty of Siena

Duomo di Siena
Streets of Siena's medieval center
Basilica di San Domenico

Truth be told, the real gems of Tuscany are the historic towns and cities. One of my favorites is the Gothic majesty of Siena. Legend tells us that Siena was founded by the son of Remus, and the symbol of the wolf feeding the twins Romulus and Remus is as ubiquitous in Siena as it is in Rome.

The streets of Siena’s medieval center are humongous and gorgeous. During the day the stone ground sizzles under the sun and the wonderfully crafted buildings bake under an incredibly clear sky. To be on the safe side, and because I love film grain, I decided to load my camera with ISO 200 Fuji film to capture the town (click on the photos to enlarge them and to see the grain).

Our first stop was the Duomo di Siena, the cathedral that is Siena’s main landmark, originally designed and completed between 1215 and 1263. The dome rises from a hexagonal base with supporting columns. The magnificent facade of white, green and red polychrome marble was designed by Giovanni Pisano. The lantern atop the dome was added by Gian Lorenzo Bernini.

Later we visited the Basilica di San Domenico, which was constructed between 1226 and 1265 but enlarged in the 14th century, resulting in the stunning Gothic appearance it has now. In the afternoon we continued to stroll around Siena and had plenty of gelato at the Palazzo Pubblico…

First German Marketplace for Bitcoins

Bitcoins are currently a red-hot topic here at CERN as well. Within a few weeks, the value of one bitcoin (BTC) rose from 20 cents in December 2010 to as much as 30 dollars. Even so, mining is hardly worthwhile, at least not at current electricity prices.

The Bitcoin exchange now offers a remedy! A good half year later, on 26 August 2011, the first German marketplace for buying and selling bitcoins started trading. There, users can easily sell bitcoins to other users or buy bitcoins from them.

To do so, users have to register and, if they want to act as sellers, transfer a bitcoin balance to their user account. As soon as a buyer has been found for a seller's bitcoins, all payment information is sent to the buyer automatically.

Payment for the bitcoins takes place directly between buyer and seller. Only once the payment has reached the seller are the bitcoins, less a small fee, transferred from the seller's balance to the buyer's balance.
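The settlement rule described above can be sketched in Python. This is an illustration under stated assumptions, not the marketplace's actual implementation: the class names, the 1% fee rate and the choice to deduct the fee from the transferred amount are all hypothetical.

```python
from dataclasses import dataclass
from decimal import Decimal

# Hypothetical fee rate; the text only speaks of "a small fee".
FEE_RATE = Decimal("0.01")


@dataclass
class Account:
    balance: Decimal = Decimal("0")  # bitcoin balance held on the marketplace


@dataclass
class Trade:
    seller: Account
    buyer: Account
    amount: Decimal
    payment_received: bool = False
    settled: bool = False

    def confirm_payment(self):
        """Record that the buyer's payment has reached the seller."""
        self.payment_received = True

    def settle(self):
        """Move the bitcoins to the buyer, less the fee, and return the fee."""
        if not self.payment_received:
            raise RuntimeError("payment has not been received by the seller")
        if self.settled:
            raise RuntimeError("trade is already settled")
        if self.seller.balance < self.amount:
            raise ValueError("seller balance is too low")
        fee = self.amount * FEE_RATE
        self.seller.balance -= self.amount
        # Assumption: the fee is deducted from the transferred amount.
        self.buyer.balance += self.amount - fee
        self.settled = True
        return fee
```

The point of the ordering is that the bitcoins stay in the seller's balance until the payment is confirmed, so calling `settle()` before `confirm_payment()` fails rather than moving any funds.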