DevOps / Machine Learning Engineer (IT-PW-PI-2024-75-LD)

CERN European Organization for Nuclear Research

  • Publication date: 05 June 2024
  • Contract type: Permanent position
  • Place of work: Geneva

Company Description

At CERN, the European Organization for Nuclear Research, physicists and engineers are probing the fundamental structure of the universe. Using the world's largest and most complex scientific instruments, they study the basic constituents of matter - fundamental particles that are made to collide together at close to the speed of light. The process gives physicists clues about how particles interact, and provides insights into the fundamental laws of nature. Find out more on http://home.cern.

Job Description

Introduction

Interested in helping shape the next-generation systems that handle the very large volumes of data coming out of CERN physics detectors? Passionate about machine learning (ML) and how it will reshape the way our experiments work?

Join the CERN IT Platforms and Workflows Group and the Next Generation Triggers project and take a lead role in defining our future infrastructure.

You will, in close collaboration with the CERN physics experiments:

  • Lead the effort to design the new system, including specification of hardware resources and participation in tender processes;
  • Evaluate the performance of multiple reference use cases. Develop and run extensive benchmarks against different resource types and configurations, covering distributed training, hyper-parameter optimization and inference;
  • Lead the research and development of a shared platform for machine learning (MLOps) and GPU-accelerated workloads, serving the different CERN teams involved. Iterate with end users on different prototype solutions and engage with industry leaders to ensure the long-term sustainability of our choices.

Functions

In the CERN IT department and the Next Generation Triggers project, you will:

  • Supervise younger team members and coordinate tasks in the Next Generation Triggers project in the area of computing infrastructure and platforms;
  • Research, develop and deploy multiple prototypes for a scalable platform serving machine learning and other accelerated workloads. Report on aspects of performance, total cost of ownership and sustainability; 
  • Contribute to the efficient use of GPU and other accelerator technologies in both the project and the department, including on-premises and external resources (public cloud and HPC);
  • Ensure appropriate collaboration with vendors, research and industry partners, looking for opportunities for further optimization of our systems and platforms in a fast-moving environment.

Qualifications

Master's degree or equivalent relevant experience in the field of computing engineering or a related field.

Experience:

The candidate should have:

  • Demonstrated experience in the implementation and support of platforms and services for Machine Learning (ML);
  • Knowledge of containers and container orchestration systems, in particular Kubernetes and other tools in the cloud native ecosystem;
  • Familiarity and previous experience with DevOps practices.

Additional experience in the following areas would be an asset:

  • Experience in operating and optimising large scale infrastructures;
  • Previous experience deploying and managing infrastructure and services in public cloud providers.

Technical competencies:

  • Knowledge of operating systems;
  • Knowledge of system configuration tools;
  • Architecture and design of ICT systems;
  • Identification and selection of relevant emerging ICT technologies;
  • Knowledge and application of software life-cycle tools and procedures.

Behavioural competencies:

  • Working in Teams: working well in groups and readily fitting into a team; participating fully and taking an active role in team activities; cooperating constructively with others in the pursuit of team goals; balancing personal goals with team goals.
  • Solving Problems: addressing complex problems by breaking them down into manageable components; recognizing what is essential; discriminating between important and peripheral information and being able to see the whole picture; testing solutions for long-term suitability, cross-checking with all concerned before implementation.
  • Managing Self: taking initiative beyond regular tasks and making things happen; working well autonomously; taking on activities and tasks without prompting.
  • Building Relationships: showing appreciation for the ideas and contributions of others and encouraging others to express their views, even if controversial; being able to put oneself in the shoes of others in order to understand their needs and interests.

Language skills:

Spoken and written English: ability to understand and speak the language in professional contexts. Ability to draw up technical specifications and/or scientific reports and to make oral presentations. Willingness to acquire French is an asset.

Additional Information

Eligibility and closing date:

Diversity has been an integral part of CERN's mission since its foundation and is an established value of the Organization. Employing a diverse workforce is central to our success. We welcome applications from all Member States and Associate Member States.

This vacancy will be filled as soon as possible, and applications should normally reach us no later than 06.07.2024.

Employment Conditions

Contract type: Limited duration contract (3 years). Subject to certain conditions, holders of limited-duration contracts may apply for an indefinite position.

These functions require:

  • Work during nights, Sundays and official holidays, when required by the needs of the Organization.

Job grade: 6-7

Job reference: IT-PW-PI-2024-75-LD

Benchmark Job Title: Computing Engineer