CERN: Where Big Bang Theory meets Big Data Analytics

Screenshot of SQL Plan Baselines with Oracle Enterprise Manager at CERN

The volume, variety, velocity and veracity of data generated by the LHC experiments at CERN continue to reach unprecedented levels: some 22 petabytes of data this year, even after 99% of what the LHC detectors record has been thrown away. This phenomenal growth means that we must not only understand Big Data in order to decipher the information that really counts, but also recognize what we can achieve with Big Data Analytics.
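Taken at face value, these two figures also tell us how much is recorded before anything is discarded: if the 22 PB kept is the 1% that survives, the detectors recorded roughly a hundred times as much. The worked arithmetic below is a minimal sketch that assumes the 99% figure applies uniformly to the whole year's volume; the function name is illustrative only.

```python
def recorded_volume_pb(kept_pb: float, discarded_fraction: float) -> float:
    """Back out the pre-filter volume from the volume kept after filtering."""
    kept_fraction = 1.0 - discarded_fraction
    return kept_pb / kept_fraction


# 22 PB kept after discarding 99% of the recorded data (figures from the text)
total = recorded_volume_pb(22, 0.99)
print(f"{total:,.0f} PB recorded")  # roughly 2,200 PB, i.e. about 2.2 exabytes
```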

The raw data from the experiments is stored in structured files (using CERN’s ROOT Framework), which are better suited to physics analysis. Transactional relational databases (Oracle 11g with Real Application Clusters) store the metadata used to manage that raw data, and Oracle TimesTen serves as an in-memory cache database for the metadata residing in the Oracle Database. The raw data itself is analysed on PROOF (Parallel ROOT Facility) clusters, while the monitoring data is stored on the Hadoop Distributed File System (HDFS).

As the CERN example shows, there are several significant trends in Big Data Analytics:

  • Descriptive Analytics, such as standard business reports, dashboards and data visualization, has been widely used for some time and is the core application of traditional Business Intelligence. This ad hoc analysis looks at the static past and reveals what has occurred. One recent trend, however, is to include findings from Predictive Analytics, such as sales forecasts, on the dashboard.
  • Predictive Analytics identifies trends, spots weaknesses or determines conditions for making decisions about the future. The methods for Predictive Analytics, such as machine learning, predictive modeling, text mining, neural networks and statistical analysis, have existed for some time. Software products such as SAS Enterprise Miner have made these methods much easier to use.
  • Discovery Analytics is the ability to analyse new data sources. This creates additional opportunities for insights and is especially important for organizations with large volumes of diverse data.
  • Prescriptive Analytics suggests what to do and can identify optimal solutions, often for the allocation of scarce resources. Prescriptive Analytics has been researched at CERN for a long time but is now finding wider use in practice.
  • Semantic Analytics suggests what you are looking for and provides a richer response, bringing a human level of understanding to analytics that we have not necessarily been able to get out of raw data streams before.
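To make the difference between the first two categories more concrete, here is a minimal Python sketch on hypothetical monthly sales figures: the descriptive step summarises what has already happened, while the predictive step fits an ordinary least-squares trend line and extrapolates it one period ahead. A linear trend is only the simplest stand-in for the predictive methods listed above, and the data and function names are illustrative assumptions.

```python
from statistics import mean


def describe(history):
    """Descriptive analytics: summarise what has already happened."""
    return {"total": sum(history), "mean": mean(history), "max": max(history)}


def linear_forecast(history, steps_ahead=1):
    """Predictive analytics (simplest case): fit y = a + b*t by least squares
    and extrapolate the trend steps_ahead periods past the last observation."""
    n = len(history)
    t = range(n)
    t_mean, y_mean = mean(t), mean(history)
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, history))
    b = sxy / sxx           # slope: average change per period
    a = y_mean - b * t_mean  # intercept at t = 0
    return a + b * (n - 1 + steps_ahead)


sales = [100, 110, 125, 130, 145]  # hypothetical monthly sales figures
print(describe(sales))
print(linear_forecast(sales))
```

With these toy numbers the fitted trend is y = 100 + 11t, so the forecast for the next month is 155; that forecast is exactly the kind of Predictive Analytics finding a dashboard might show alongside the descriptive summary.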

As these trends bear fruit, new ecosystems and markets are being created for broad, cross-enterprise Big Data Analytics. Use cases like CERN’s LHC experiments give us greater insight into how important Big Data Analytics is in the scientific community as well as for businesses.

CERN: The world’s first website went online 20 years ago today

CERN website displayed in Line Mode Browser

On this day 20 years ago the world’s first website went live. The website, created by Tim Berners-Lee at CERN, was a basic text page with hyperlinks and went live on August 6, 1991.

The website was hosted on Berners-Lee’s NeXT computer, the first web server ever, which had a note taped to the front that read: “This machine is a server. DO NOT POWER DOWN”.

NeXT computer used as first World Wide Web server

Today this computer is on display in the CERN Computer Center, which is located right next to my office.

[Update 30 Apr 2013]: CERN is bringing the very first website back to life at its original URL. If you’d like to see it, point your browser to: http://info.cern.ch/hypertext/WWW/TheProject.html

Data Science Research: Unlocking the Secrets of the Universe with Big Data at CERN

Time really flies when you are unravelling the mysteries of the universe! The past year, which I have spent immersed in the world of data science at CERN, has been an incredible journey. For those unfamiliar with it, CERN, set against a stunning backdrop of snow-capped mountains and tranquil Lake Geneva, is home to the Large Hadron Collider (LHC), the world’s most powerful particle accelerator. What often goes unnoticed, however, is the critical role that data science plays in powering this colossal machine and its quest for groundbreaking discoveries such as the elusive Higgs boson.

The Data Tsunami: A Behind-The-Scenes Look

Imagine having to sift through one petabyte (PB) of data every second. Yes, you read that right: that is the amount of data generated by the LHC’s detectors. To make it manageable, high-level triggers act as an advanced filtering system, reducing this torrent to a more digestible gigabyte per second. This filtered data is then passed on to the LHC Computing Grid.
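A back-of-the-envelope calculation shows how aggressive this filtering is. The sketch below takes the round figures above at face value and assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes); the daily volume is an idealised upper bound that assumes the filtered rate is sustained around the clock, which the LHC does not actually do.

```python
PB = 10**15  # bytes, decimal SI prefix (assumption)
GB = 10**9   # bytes, decimal SI prefix (assumption)

raw_rate = 1 * PB       # bytes per second coming off the detectors
filtered_rate = 1 * GB  # bytes per second after the high-level triggers

reduction_factor = raw_rate // filtered_rate  # the triggers cut the rate by a factor of a million
kept_fraction = filtered_rate / raw_rate      # about one byte in a million survives

# Idealised daily volume if the filtered rate ran 24 hours a day
seconds_per_day = 24 * 60 * 60
daily_tb = filtered_rate * seconds_per_day / 10**12  # terabytes per day

print(reduction_factor, kept_fraction, daily_tb)
```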

High-Level Trigger data flow, crucial for data science research in the ALICE experiment at CERN.

About 50 PB of this data is stored on tape, and another 20 PB is stored on disk, managed by a Hadoop-based cloud service. This platform runs up to two million tasks per day, making it a beehive of computational activity.
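Put into more everyday terms, and treating the figures above as round numbers, the sketch below adds up the storage, works out the share held on tape, and converts the peak of two million tasks per day into an average rate per second on such a day. It is plain arithmetic, not a model of how the service actually schedules its work.

```python
tape_pb = 50                    # PB stored on tape (figure from the text)
disk_pb = 20                    # PB stored on disk (figure from the text)
peak_tasks_per_day = 2_000_000  # "up to two million tasks per day"

seconds_per_day = 24 * 60 * 60

total_pb = tape_pb + disk_pb                             # total stored volume in PB
tape_share = tape_pb / total_pb                          # fraction of the volume on tape
tasks_per_second = peak_tasks_per_day / seconds_per_day  # average rate on a peak day

print(f"{total_pb} PB total, {tape_share:.0%} on tape, "
      f"{tasks_per_second:.1f} tasks/s on a peak day")
```

This prints that 70 PB are stored in total, about 71% of it on tape, and that a peak day averages roughly 23 tasks every second.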

The Role of Data Science Research at CERN

Data scientists and software engineers are the unsung heroes at CERN, ensuring the smooth operation of the LHC and of the subsequent data analysis. Machine learning algorithms are used to discover new correlations between variables drawn from both LHC data and external data sets. This is critical for real-time analysis, where speed and accuracy are of the essence.
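As a much simpler illustration of the underlying idea of screening variables for relationships, the sketch below computes pairwise Pearson correlations across a few columns and flags strongly correlated pairs. This is plain statistics rather than machine learning, and the variable names, values and the 0.8 threshold are all hypothetical; it does not reflect any actual CERN pipeline.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def strong_correlations(columns, threshold=0.8):
    """Return every pair of variables whose |r| meets the threshold."""
    hits = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            hits.append((name_a, name_b, round(r, 3)))
    return hits


# Hypothetical data: one "internal" variable and two "external" series
data = {
    "detector_rate": [1.0, 2.0, 3.0, 4.0, 5.0],
    "temperature": [20.1, 20.9, 22.2, 22.8, 24.0],
    "unrelated_series": [3, 1, 4, 1, 5],
}
print(strong_correlations(data))
```

Only the `detector_rate`/`temperature` pair passes the threshold here (r ≈ 0.995). A screen like this surfaces candidate relationships; deciding which of them are physically meaningful is still up to the analyst.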

While managing the exponential growth of data is an ongoing challenge, the role of data scientists at CERN goes far beyond that. We are at the forefront of fostering a data-driven culture within the organization, transferring knowledge, and implementing best practices. In addition, as technology continues to evolve, part of our role is to identify and integrate new, cutting-edge tools that meet our specific data analysis needs.

The Road Ahead: A Data-Driven Journey

Looking ahead, scalability will remain a key focus as CERN’s data continues to grow. But the horizon of possibilities is vast. From exploring quantum computing to implementing advanced AI models, the role of data science in accelerating CERN’s research goals will only grow.

As I celebrate my one-year anniversary at CERN, I’m filled with gratitude and awe for what has been an incredible journey. From delving into petabytes of data to pushing the boundaries of machine learning in research, it’s been a year of immense learning and contribution.

For more on the fascinating universe of CERN and the role data science plays in it, be sure to follow me on Twitter for regular CERN updates and data science insights.

Music on the Lawn & Beach Party

On Saturday we had Music on the Lawn, a small concert on the CERN site. The band members are all colleagues. “Miss Proper & the Moving Targets” played especially well. 😀

The evening brought less rock but all the more house and trance: the party continued with a beach party on the shores of Lac Léman.

Top 10 Angels&Demons Questions

CERN Exhibition: Top 10 Angels&Demons Questions

Yesterday we saw Angels & Demons (German title: Illuminati) at the cinema. The adaptation of Dan Brown’s bestseller of the same name was, above all, visually very appealing. As in The Da Vinci Code, Tom Hanks gave a confident performance as the protagonist Robert Langdon.

Part of the film’s plot is set at CERN. Some shots were in fact filmed at the LHC’s ATLAS detector. Director Ron Howard also visited the CERN site to make the film more authentic. Building a bomb out of antimatter, on the other hand, is just as fictional as the “creation out of nothing”, which in the film merely serves to spark the conflict between religion and science.

CERN dedicated both an exhibition (see photo above) and a website to Angels & Demons to explain “the science behind the story” and to answer frequently asked questions (“Does CERN create black holes?”, etc.).