CERN: Where Big Bang Theory meets Big Data Analytics

Screenshot of SQL Plan Baselines with Oracle Enterprise Manager at CERN

The volume, variety, velocity and veracity of data generated by the LHC experiments at CERN continue to reach unprecedented levels: some 22 petabytes of data this year, even after 99% of what the LHC detectors record has been thrown away. This phenomenal growth means that we must not only understand Big Data in order to decipher the information that really counts, but also understand what we can achieve with Big Data Analytics.
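As a back-of-the-envelope check on those figures, assume the 22 PB is exactly the 1% that survives filtering. The volume originally recorded by the detectors then works out to roughly 2.2 exabytes (decimal units):

```latex
V_{\text{recorded}} \approx \frac{V_{\text{kept}}}{1 - 0.99}
  = \frac{22\ \text{PB}}{0.01}
  = 2200\ \text{PB} \approx 2.2\ \text{EB}
```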

The raw data from the experiments is stored in structured files (using CERN’s ROOT Framework), which are better suited to physics analysis than relational tables. Transactional relational databases (Oracle 11g with Real Application Clusters) store the metadata used to manage that raw data. For metadata residing in the Oracle Database, Oracle TimesTen serves as an in-memory cache database. The raw data is analysed on PROOF (Parallel ROOT Facility) clusters. The monitoring data, however, is stored in the Hadoop Distributed File System (HDFS).
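TimesTen's role here is to keep an in-memory copy of hot metadata in front of the relational database. That matches the general read-through caching pattern. The sketch below is a minimal, hypothetical Python illustration of that pattern only. The class name `ReadThroughCache`, the dictionary standing in for the Oracle metadata tables, and the run/file keys are all assumptions, and none of it reflects the actual TimesTen API.

```python
from typing import Callable, Dict, Optional


class ReadThroughCache:
    """Hypothetical in-memory cache in front of a slower metadata store.

    Illustrates the generic read-through pattern only; it is not the TimesTen API.
    """

    def __init__(self, load_from_store: Callable[[str], Optional[dict]]) -> None:
        self._load = load_from_store         # stands in for a query against the relational DB
        self._entries: Dict[str, dict] = {}  # in-memory copy of recently read metadata
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self._entries:
            self.hits += 1                   # served from memory, no database round trip
            return self._entries[key]
        self.misses += 1
        value = self._load(key)              # fall back to the backing store
        if value is not None:
            self._entries[key] = value       # keep it for subsequent reads
        return value

    def invalidate(self, key: str) -> None:
        # Drop the cached copy after the backing row changes, so readers do not see stale data.
        self._entries.pop(key, None)


# Hypothetical metadata row describing one raw ROOT file.
backing_store = {"run-001": {"file": "run-001.root", "events": 1200}}
cache = ReadThroughCache(backing_store.get)
cache.get("run-001")  # first read: miss, loaded from the store
cache.get("run-001")  # second read: hit, served from memory
```

The cost of such a cache is keeping it consistent with the database. That is why the sketch exposes an explicit `invalidate` step rather than assuming the cached rows never change.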

As the CERN example illustrates, there are several significant trends in Big Data Analytics:

  • Descriptive Analytics, such as standard business reports, dashboards and data visualization, has been widely used for some time and forms the core of traditional Business Intelligence. Such ad hoc analysis looks at the static past and reveals what has occurred. One recent trend, however, is to include findings from Predictive Analytics, such as sales forecasts, on the dashboard.
  • Predictive Analytics identifies trends, spots weaknesses or determines conditions for making decisions about the future. Its methods, such as machine learning, predictive modeling, text mining, neural networks and statistical analysis, have existed for some time. Software products such as SAS Enterprise Miner have made these methods much easier to use.
  • Discovery Analytics is the ability to analyse new data sources. This creates additional opportunities for insight and is especially important for organizations with massive amounts of diverse data.
  • Prescriptive Analytics suggests what to do and can identify optimal solutions, often for the allocation of scarce resources. Prescriptive Analytics has been researched at CERN for a long time but is now finding wider use in practice.
  • Semantic Analytics suggests what you are looking for and provides a richer response, bringing a degree of human-level understanding to analytics that we have not necessarily been able to extract from raw data streams before.
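The following minimal Python sketch, using hypothetical monthly sales figures, makes the difference between the first two categories concrete. `describe` summarises the past (descriptive), and `linear_forecast` extrapolates an ordinary least-squares trend line one step ahead (predictive). A straight-line fit is a deliberately simple stand-in for the far richer methods that tools such as SAS Enterprise Miner provide.

```python
from statistics import mean


def describe(values: list[float]) -> dict:
    """Descriptive analytics: summarise what has already happened."""
    return {"count": len(values), "mean": mean(values), "min": min(values), "max": max(values)}


def linear_forecast(values: list[float], steps_ahead: int = 1) -> float:
    """Predictive analytics (simple stand-in): fit y = a + b*t by least squares and extrapolate."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    t = list(range(n))
    t_mean, y_mean = mean(t), mean(values)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, values)) / sum(
        (ti - t_mean) ** 2 for ti in t
    )
    intercept = y_mean - slope * t_mean
    return intercept + slope * (n - 1 + steps_ahead)


monthly_sales = [100.0, 110.0, 120.0, 130.0]  # hypothetical figures
print(describe(monthly_sales))
print(linear_forecast(monthly_sales))  # 140.0
```

Putting the forecast next to the summary on one dashboard is exactly the "descriptive plus predictive" trend described in the first bullet.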

As these trends bear fruit, new ecosystems and markets are being created for broad cross-enterprise Big Data Analytics. Use cases such as CERN’s LHC experiments give us greater insight into how important Big Data Analytics is to the scientific community as well as to businesses.

Gothic Majesty of Siena

Duomo di Siena
Streets of Siena's medieval center
Basilica di San Domenico

Truth be told, the real gems of Tuscany are its historic towns and cities. One of my favorites is the Gothic majesty of Siena. Legend tells us that Siena was founded by the son of Remus, and the symbol of the wolf feeding the twins Romulus and Remus is as ubiquitous in Siena as it is in Rome.

Siena’s medieval center is huge, and its streets are gorgeous. During the day the stone pavement sizzles under the sun and the wonderfully crafted buildings bake under an incredibly clear sky. To be on the safe side, and because I love film grain, I decided to load my camera with ISO 200 Fuji film to capture the town.

Our first stop was the Duomo di Siena, Siena’s main landmark, a cathedral originally designed and completed between 1215 and 1263. The dome rises from a hexagonal base with supporting columns. The magnificent facade of white, green and red polychrome marble was designed by Giovanni Pisano. The lantern on top was added by Gian Lorenzo Bernini.

Later we visited the Basilica di San Domenico, which was constructed between 1226 and 1265 and enlarged in the 14th century, giving it the stunning Gothic appearance it has today. In the afternoon we continued to stroll around Siena and had plenty of gelati at Palazzo Pubblico…

bitcoin.de: The first German marketplace for Bitcoins

Bitcoins are currently a hot topic here at CERN, too. Within a few months the value of a Bitcoin (BTC) rose from 20 cents in December 2010 to as much as 30 dollars. Even so, mining hardly pays off, at least not at current electricity prices.

The Bitcoin exchange bitcoin.de now offers a remedy! A good six months later, on 26 August 2011, the first German marketplace for buying and selling Bitcoins opened for trading. On bitcoin.de, users can easily sell Bitcoins to other users or buy Bitcoins from them.

To do so, users have to register with bitcoin.de and, if they want to act as sellers, transfer a Bitcoin balance to their user account. As soon as a buyer has been found for their Bitcoins, all payment details are automatically sent to the buyer.

Payment for the Bitcoins is made directly between buyer and seller. Only once the payment has reached the seller are the Bitcoins, minus a small fee, transferred from the seller’s balance to the buyer’s balance.
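The trade flow described above works like a simple escrow. The seller's coins sit on the platform, payment happens directly between the two parties, and the coins only move once the seller has received the money. Below is a minimal Python model of that flow. The `Marketplace` class, the user names and the 1% `FEE_RATE` are hypothetical (the text only mentions "a small fee"), and this is not bitcoin.de's actual implementation.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict

FEE_RATE = Decimal("0.01")  # hypothetical; the text only says "a small fee"


@dataclass
class Marketplace:
    balances: Dict[str, Decimal] = field(default_factory=dict)  # user -> BTC held on the platform

    def deposit(self, user: str, btc: Decimal) -> None:
        # A seller first transfers a Bitcoin balance to their platform account.
        self.balances[user] = self.balances.get(user, Decimal("0")) + btc

    def settle(self, seller: str, buyer: str, btc: Decimal, payment_received: bool) -> bool:
        """Release coins to the buyer only once the seller has received payment."""
        if not payment_received:
            return False  # payment is made directly between the parties; nothing moves yet
        if self.balances.get(seller, Decimal("0")) < btc:
            raise ValueError("seller's platform balance is too low")
        fee = btc * FEE_RATE
        self.balances[seller] = self.balances.get(seller, Decimal("0")) - btc
        self.balances[buyer] = self.balances.get(buyer, Decimal("0")) + (btc - fee)
        return True


market = Marketplace()
market.deposit("alice", Decimal("2"))
pending = market.settle("alice", "bob", Decimal("1"), payment_received=False)  # payment not yet in
done = market.settle("alice", "bob", Decimal("1"), payment_received=True)      # payment confirmed
```

`Decimal` is used instead of `float` so that currency amounts and fees are not distorted by binary floating-point rounding.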

CERN: The world’s first website went online 20 years ago today

CERN website displayed in Line Mode Browser

On this day 20 years ago, on August 6, 1991, the world’s first website went live. Created by Tim Berners-Lee at CERN, it was a basic text page with hyperlinks.

The website was hosted on Berners-Lee’s NeXT computer, the first web server ever, which had a note taped to the front that said: “This machine is a server. DO NOT POWER DOWN”.

NeXT computer used as first World Wide Web server

Today this computer is on display in the CERN Computer Center, which is located right next to my office.

[Update 30 Apr 2013]: CERN is bringing the very first website back to life at its original URL. If you’d like to see it, point your browser to: http://info.cern.ch/hypertext/WWW/TheProject.html

Data Science Research: Unlocking the Secrets of the Universe with Big Data at CERN

Time really flies when you are unravelling the mysteries of the universe! The past year, which I have spent immersed in data science research at CERN, has been an incredible journey. For those unfamiliar with it, CERN, set against a stunning backdrop of snow-capped mountains and tranquil Lake Geneva, is home to the Large Hadron Collider (LHC), the world’s most powerful particle accelerator. What often goes unnoticed, however, is the critical role that data science plays in powering this colossal machine and its quest for groundbreaking discoveries such as the elusive Higgs boson.

The Data Tsunami: A Behind-The-Scenes Look

Imagine having to sift through one petabyte (PB) of data every second. Yes, you read that right: that is the amount of data generated by the LHC’s detectors. To make it manageable, high-level triggers act as an advanced filtering system, reducing this torrent of data to a more digestible gigabyte per second. This filtered data then flows into the LHC Computing Grid.

High-Level Trigger data flow, crucial for data science research in the ALICE experiment at CERN.
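Going from one petabyte to one gigabyte per second means discarding all but roughly one part in a million (10^15 B/s ÷ 10^9 B/s = 10^6, in decimal units). The toy Python sketch below shows only the core keep-or-discard decision of a trigger. The `Event` record, its `total_energy_gev` field, the exponential toy distribution and the 500 GeV threshold are all hypothetical. Real trigger systems perform far more elaborate event reconstruction and selection.

```python
import random
from dataclasses import dataclass


@dataclass
class Event:
    event_id: int
    total_energy_gev: float  # hypothetical summary quantity per collision event


def high_level_trigger(events: list[Event], threshold_gev: float) -> list[Event]:
    """Toy trigger: keep only events whose summary energy exceeds the threshold."""
    return [e for e in events if e.total_energy_gev > threshold_gev]


rng = random.Random(42)  # fixed seed so the toy run is reproducible
events = [Event(i, rng.expovariate(1 / 50)) for i in range(100_000)]  # toy spectrum, mean 50 GeV
kept = high_level_trigger(events, threshold_gev=500.0)
print(f"kept {len(kept)} of {len(events)} events ({len(kept) / len(events):.4%})")
```

Raising or lowering `threshold_gev` trades data volume against the risk of throwing away interesting events. The same trade-off, made far more carefully, drives real trigger design.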

About 50 PB of this data is stored on tape, and another 20 PB is stored on disk, managed by a Hadoop-based cloud service. This platform runs up to two million tasks per day, making it a beehive of computational activity.
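Two million tasks per day averages out to roughly 23 tasks per second (2,000,000 / 86,400 ≈ 23). Many Hadoop workloads follow the classic MapReduce programming model, and the single-process Python sketch below mimics that model. Each hypothetical block of records is mapped to key/value pairs independently, which is what lets a cluster spread such tasks across many nodes. A reduce step then sums the values per key. The record layout and subsystem names are assumptions. A real Hadoop job would also shuffle the pairs by key between nodes before reducing.

```python
from collections import defaultdict
from itertools import chain


def map_block(block):
    # Map phase: one task per block, emitting a (key, 1) pair for every record.
    for record in block:
        yield record["subsystem"], 1


def reduce_pairs(pairs):
    # Reduce phase: sum all values emitted for the same key.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


# Hypothetical records, split into blocks as a distributed file system would store them.
blocks = [
    [{"subsystem": "tracker"}, {"subsystem": "calorimeter"}],
    [{"subsystem": "tracker"}, {"subsystem": "muon"}],
]
counts = reduce_pairs(chain.from_iterable(map_block(b) for b in blocks))
print(counts)  # {'tracker': 2, 'calorimeter': 1, 'muon': 1}
```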

The Role of Data Science Research at CERN

Data scientists and software engineers are the unsung heroes at CERN, ensuring the smooth operation of the LHC and the subsequent data analysis. Machine learning algorithms are used to discover new correlations between variables drawn from both LHC data and external data sets. This is critical for real-time analysis, where speed and accuracy are of the essence.

While managing the exponential growth of data is an ongoing challenge, the role of data scientists at CERN goes far beyond that. We are at the forefront of fostering a data-driven culture within the organization, transferring knowledge, and implementing best practices. In addition, as technology continues to evolve, part of our role is to identify and integrate new, cutting-edge tools that meet our specific data analysis needs.

The Road Ahead: A Data-Driven Journey

Looking ahead, scalability will remain a key focus as CERN’s data continues to grow. But the horizon of possibilities is vast. From exploring quantum computing to implementing advanced AI models, the role of data science in accelerating CERN’s research goals will only grow.

As I celebrate my one-year anniversary at CERN, I’m filled with gratitude and awe for what has been an incredible journey. From delving into petabytes of data to pushing the boundaries of machine learning in research, it’s been a year of immense learning and contribution.

For more on the fascinating universe of CERN and the role data science plays in it, be sure to follow me on Twitter for regular CERN updates and data science insights.