CERN: Where Big Bang Theory meets Big Data Analytics

Screenshot of SQL Plan Baselines with Oracle Enterprise Manager at CERN
Screenshot of SQL Plan Baselines with Oracle Enterprise Manager at CERN

The volume, variety, velocity and veracity of data generated by the LHC experiments at CERN continue to reach unprecedented levels: some 22 petabyte of data this year, after throwing away 99% of what is recorded by the LHC detectors. This phenomenal growth means that not only must we understand Big Data in order to decipher the information that really counts, but we also must understand the opportunities of what we can achieve with Big Data Analytics.

The raw data from the experiments is stored in structured files (using CERN’s ROOT Framework), which are better suited to physics analysis. Transactional relational databases (Oracle 11g with Real Application Clusters) store metadata information that is used to manage that raw data. For metadata residing on the Oracle Database, Oracle TimesTen serves as an in-memory cache database. The raw data is analysed on PROOF (Parallel ROOT Facility) clusters. Hadoop Distributed File System (HDFS), however, is used to store the monitoring data.

Just as in the CERN example, there are some significant trends in Big Data Analytics:

  • Descriptive Analytics, such as standard business reports, dashboards and data visualization, have been widely used for some time, and are the core applications of traditional Business Intelligence. This ad hoc analysis looks at the static past and reveal what has occurred. One recent trend, however, is to include the findings from Predictive Analytics, such as forecasts of sales on the dashboard.
  • Predictive Analytics identify trends, spot weaknesses or determine conditions for making decisions about the future. The methods for Predictive Analytics such as machine learning, predictive modeling, text mining, neural networks and statistical analysis have existed for some time. Software products such as SAS Enterprise Miner have made these methods much easier to use.
  • Discovery Analytics is the ability to analyse new data sources. This creates additional opportunities for insights and is especially important for organizations with massive amounts of various data.
  • Prescriptive Analytics suggests what to do and can identify optimal solutions, often for the allocation of scarce resources. Prescriptive Analytics has been researched at CERN for a long time but is now finding wider use in practice.
  • Semantic Analytics suggests what you are looking for and provides a richer response, bringing some human level into Analytics that we have not necessarily been getting out of raw data streams before.

As these trends bear fruit, new ecosystems and markets are being created for broad cross-enterprise Big Data Analytics. Use cases like the CERN’s LHC experiments provide us with greater insight into how important Big Data Analytics is in the scientific community as well as to businesses.