This position is posted by Jobgether on behalf of a partner company. We are currently looking for a DevOps/Observability Engineer in Canada.
This role is focused on designing and building a next-generation observability ecosystem that enables deep visibility across large-scale, distributed cloud environments. You will lead the architecture of unified telemetry pipelines, ensuring logs, metrics, and traces are efficiently collected, processed, and analyzed. Working within a modern AWS-based infrastructure, you will leverage OpenTelemetry, Kubernetes, and industry-leading monitoring tools to enhance system reliability and performance. The environment is highly technical, cloud-native, and centered on automation, scalability, and continuous improvement. You will collaborate closely with engineering teams to integrate observability into CI/CD workflows and production systems. This position offers the opportunity to shape enterprise-wide monitoring standards and directly influence operational excellence at scale.
Accountabilities
In this role, you will design, implement, and evolve a unified observability platform that supports large-scale distributed systems and ensures operational visibility across environments.
- Architect and implement end-to-end observability pipelines using OpenTelemetry, Prometheus, Grafana, and related tooling in AWS environments
- Design scalable log, metric, and trace collection strategies, including cross-account AWS telemetry integration and centralized monitoring frameworks
- Build and optimize log aggregation, filtering, and routing systems, including integrations with Splunk and other enterprise tools
- Develop advanced alerting, dashboards, and monitoring solutions using PromQL, CloudWatch, and Alertmanager
- Implement Infrastructure as Code using Terraform to deploy and manage observability and cloud infrastructure components
- Support observability for Kubernetes (EKS) and ECS container environments, ensuring full-stack visibility and reliability
- Drive cost optimization initiatives by improving telemetry efficiency, storage strategies, and data filtering approaches
- Collaborate with engineering and platform teams to embed observability into deployment pipelines and production systems
Requirements
This position requires strong technical expertise in DevOps, cloud infrastructure, and observability engineering, with a proven ability to build scalable monitoring systems in complex environments.
- 8+ years of experience in DevOps, SRE, or observability engineering roles
- Strong expertise in AWS cloud services and multi-account observability architectures
- Hands-on experience with OpenTelemetry, Prometheus, Grafana, Splunk, and CloudWatch
- Strong proficiency with Infrastructure as Code tools, particularly Terraform
- Advanced programming/scripting skills (Python, Go, or similar) for automation and tooling
- Experience with Kubernetes (EKS) and containerized environments (Docker, ECS)
- Deep understanding of logging, metrics, tracing, and distributed system observability principles
- Strong analytical, problem-solving, and systems-thinking abilities with a focus on scalability and reliability
- Excellent communication skills and ability to work in cross-functional, fast-paced engineering teams
Benefits
- Competitive compensation aligned with experience and market benchmarks
- Fully remote work setup across Canada
- Opportunity to work on large-scale, cloud-native systems and cutting-edge observability platforms
- Exposure to advanced AI, cloud, and distributed engineering environments
- Career growth within a high-performance, innovation-driven engineering culture
- Collaborative and knowledge-sharing work environment with global teams
- Continuous learning opportunities and access to modern DevOps and cloud technologies
- Inclusive and flexible work culture supporting work-life balance