Site Reliability Engineer

Starcom consulting limited
5 days ago
Contract
On-site
Germany

Job Title: Site Reliability Engineer (SRE)

Job Location: Germany (Remote)

Job Type: Fixed Term Contract (12 Months)

Responsibilities
•    Design, develop, and maintain observability platform components and integrations across Prometheus, Thanos, Grafana, OpenTelemetry, and streaming telemetry systems.
•    Contribute to architecture and technical design of scalable monitoring solutions running on Kubernetes, Docker, and cloud-native environments.
•    Implement standardized instrumentation using OpenTelemetry SDKs, collectors, exporters, and agents across services and infrastructure.
•    Build and optimise telemetry pipelines for metrics, logs, and traces using Prometheus, the OpenTelemetry Collector, Kafka/streaming pipelines, and time-series backends.
•    Develop advanced PromQL queries, recording rules, and Alertmanager logic for complex monitoring scenarios.
•    Create reusable dashboards and visualisation templates using Grafana (and Perses, if applicable).
•    Automate deployments and configuration using Git, GitHub/GitLab, Jenkins, Argo CD, Helm, and Infrastructure-as-Code practices.
•    Troubleshoot and optimise performance across collectors, exporters, storage backends, and query layers.
•    Support performance testing, load validation, and reliability analysis of observability components.
•    Collaborate with engineering and SRE teams to onboard services and improve telemetry coverage across platforms.
•    Document implementations, standards, and operational procedures.

Required skills and expertise
•    Strong programming experience in Go, Python, or Java, with a focus on backend or platform engineering.
•    Hands-on expertise with the Prometheus ecosystem (Prometheus, Alertmanager, exporters, Pushgateway) and PromQL.
•    Experience implementing OpenTelemetry instrumentation, collectors, processors and pipelines.
•    Strong knowledge of Kubernetes, containers, Helm, and microservices architecture.
•    Experience with CI/CD tools such as Jenkins, GitHub Actions, GitLab CI, or Argo CD.
•    Understanding of distributed systems, performance tuning, debugging, and profiling techniques.
•    Familiarity with streaming and messaging systems (for example, Kafka or equivalent) and time-series databases.
•    Experience building or integrating REST/gRPC APIs.
•    Proficiency in Git workflows, scripting (Bash/Python), and automation frameworks.
•    Understanding of SNMP, exporters, and infrastructure/device telemetry collection.
•    Awareness of security, RBAC, secret management, and compliance requirements in platform environments.