I have enjoyed research for the last four years. Yet, I have decided to resign from my postgraduate position at CERN, and to move to Capgemini. I will continue on the areas I love: Data and Analytics!
Capgemini is one of the world’s largest consulting corporations. Like many other consulting company, Capgemini does not yet have a dedicated team to offer effective strategies and solutions employing Big Data, Analytics and Machine Learning.
I love these technologies, and I am very confident that I will elaborate a business development plan to drive business growth, through customer and market definition, including new services such as:
Data Science Strategy (enable organizations to solve business problems increasingly with insights from analytics)
Consulting (answering questions using data)
Development (building custom data-related tools like interactive dashboards, pipelines, customized Hadoop setup, data prep scripts…)
Training (across a variety of skill levels; from basic dashboard design to deep dive in R, Python and D3.js)
This plan is also accompanied by a go-to-market strategy, which I don’t want to unveil on my blog. Maybe retrospective in some years, so stay tuned…
Wow, time flies. One year has passed since I started to work at CERN as a data scientist. CERN, surrounded by snow-capped mountains and Lake Geneva, is known for its particle accelerator Large Hadron Collider (LHC) and its adventure in search of the Higgs boson. Underneath the research there is an tremendous amount of data that are analysed by data scientists.
Filters, known as High Level Triggers, reduce the flow of data from a petabyte (PB) a second to a gigabyte per second, which is then transferred from the detectors to the LHC Computing Grid. Once there, the data is stored on about 50PB of tape storage and 20PB of disk storage. The disks are managed as a cloud service (Hadoop), on which up to two millions of tasks are performed every day.
CERN relies on software engineers and data scientists to streamline the management and operation of its particle accelerator. It is crucial for research to allow real-time analysis. Data extractions need to remain scalable and predictive. Machine learning is applied to identify new correlations between variables (LHC data and external data) that were not previously connected.
So what is coming up next? Scalability remains a very important area, as the CERN’s data will continue to grow exponentially. However, the role of data scientists goes much further. We need to transfer knowledge throughout the organisation and enable a data-driven culture. In addition, we need to evaluate and incorporate new innovative technologies for data analysis that are appropriate for our use cases.
just realized that today is my first year anniversary working for CERN. Thanks for all the memories made within the year!