Open Systems AG
Zürich
Senior Observability Platform Engineer (80-100%)
- 04 July 2026
- 80 – 100%
- Permanent position
- Zürich
About the job
Location: Zurich / Bern
We are seeking a highly skilled and experienced Senior Observability Platform Engineer to join our team. In this role, you will be responsible for ensuring the reliability, scalability, and efficiency of the core observability infrastructure that supports our engineering teams and customer-facing portal. Your work will include evolving these systems and helping to foster the adoption of observability best practices across the organization.
Key Responsibilities
- Configure, operate, and enhance our observability platforms and frameworks (ClickHouse, Thanos, Loki, Tempo, OpenTelemetry Collector with custom processors).
- Continuously improve and drive organization-wide adoption of observability best practices, ensuring comprehensive monitoring, logging, and tracing.
- Develop and maintain automated solutions for monitoring, alerting, and incident response.
System Optimization
- Collaborate with engineering teams to understand their needs and provide robust, scalable solutions utilizing the observability platform.
- Optimize system performance and ensure high availability through proactive monitoring and maintenance.
- Develop and implement strategies for cost optimization, capacity planning, and performance tuning.
Innovation and Improvement
- Stay up to date with the latest industry trends, tools, and technologies to drive continuous improvement.
- Experiment with and implement new tools, especially around observability and telemetry, to enhance platform capabilities.
- Evaluate and integrate OpenTelemetry Collector where beneficial to enhance telemetry data collection and analysis.
Required Skills and Experience
Essential/Required Skills
- Observability Platforms: Proven track record in managing at least one of the following observability stacks: Thanos, Mimir, Cortex, Tempo, Loki, or ClickHouse, with the ability to configure, operate, and improve these systems.
- Kubernetes: Deep understanding of Kubernetes architecture and hands-on experience in managing resources on clusters.
- Helm: Experience in writing and maintaining Helm charts, and understanding third-party charts to deploy and manage Kubernetes resources efficiently.
- GitOps: Experience in continuous delivery and GitOps practices (version control, CI/CD pipelines).
- Agentic Development: Hands-on experience using agentic AI workflows (e.g., GitHub Copilot, Claude Code, Cursor, or similar) to accelerate day-to-day engineering.
- Docker: Expertise in containerization, orchestration, and optimization of Docker workloads.
Desirable Skills
- Coding Experience: Coding knowledge in Golang or a similar language.
- Open Source: Contributions to an open source project written in Golang or a similar language.
- OpenTelemetry Collector: Knowledge of the OpenTelemetry Collector or direct contributions to the project.
- AI for Observability: Interest in applying AI/ML to the observability domain, such as anomaly detection on metrics and logs, automated root-cause analysis, alert noise reduction and correlation, and natural-language querying over telemetry.
Soft Skills
- Quick Learner: Ability to quickly grasp new concepts and technologies, adapting to the evolving needs of the organization.
- Communication: Excellent communication skills, with the ability to convey complex technical concepts to both technical and non-technical stakeholders.
- Customer Focus: Keen awareness of customer needs and the impact of platform operations on both internal engineering teams and external users.
- Collaborative Mindset: Strong ability to work collaboratively in cross-functional teams, contributing to a culture of continuous improvement and innovation.
Education and Experience
- Bachelor’s degree in Computer Science, Information Technology, or related field (or equivalent experience).
- 5+ years of experience in platform engineering, site reliability engineering, or a related role.
- Demonstrated experience in managing large-scale infrastructures and observability platforms (such as Thanos, Mimir, Cortex, Tempo, Loki, ClickHouse).
- Technical Expertise
- Observability Platform Operations
- You are excited by the prospect of managing more than 20 TB of telemetry data per day, originating from a fleet of 10 000+ nodes (including Linux hosts, Kubernetes clusters, and VMs).
What we offer:
You’ll be among people who believe in:
Caring PASSIONATELY about keeping our customers safe – We’re dedicated to solving problems. Whatever it takes.
Thinking UNCONVENTIONALLY to stay ahead – The world never fails to surprise us. So let’s surprise it first.
Doing the hard work to make things SIMPLE – Craft and hone something that delights in its simplicity.
Working COLLABORATIVELY to build success – The power of the team will always make us faster and better.
As a testament to this, Open Systems has been recognized as an outstanding place to work. You’ll be surrounded by smart teams who enrich your experience and provide the opportunities you need to develop your skills and advance your career.
We look forward to receiving your online application (please note that you have to compress your application into two attachments).
Come as you are! We search for amazing people of diverse backgrounds, experiences, abilities, and perspectives. Open Systems welcomes and encourages diversity in the workplace regardless of race, gender, religion, age, sexual orientation, disability, or veteran status.
Only direct applications will be considered.
About Open Systems:
Open Systems is an international provider of co-managed SASE operating models, helping enterprises and organizations securely operate complex hybrid and multicloud environments. Founded in 1990 and headquartered in Switzerland, the company generates over USD 100 million in annual revenue and supports global enterprise customers with more than 60,000 employees across operations in more than 180 countries.