Your role
Overview:
- Build and run a reliable platform for services and data workflows across Kubernetes and Prefect.
- Own CI/CD, observability, security, and developer experience for Python/Go/Rust services.
Responsibilities:
- Design, provision, and operate Kubernetes workloads (deployments, networking, autoscaling, storage).
- Build and maintain GitLab CI/CD pipelines for Python, Go, and Rust services (build, test, scan, release).
- Operate Prefect (agents, work queues, deployments, concurrency limits, task execution environments).
- Implement environment strategy and promotion flow (dev/staging/prod) with clear release gates.
- Create golden paths and templates for FastAPI microservices and Prefect flows.
- Manage secrets, configuration, and access (e.g., GitLab variables, K8s secrets).
- Establish observability: logging, metrics, traces, alerting, runbooks, and SLOs.
- Operate data stores (MySQL, PostgreSQL, Redis): provisioning, backups, migration execution, monitoring, and capacity planning.
- Optimise build and runtime costs (container images, caching, autoscaling, resource requests/limits).
- Lead incident response, postmortems, and reliability improvements.
Your profile
You have:
- 4+ years in DevOps/SRE/Platform roles with production Kubernetes.
- Strong GitLab CI/CD experience (pipelines, runners, caching, artifact management).
- Proficiency with containers and image optimization; comfortable with Linux internals and networking.
- Hands-on experience with Prefect in production (deployments, flow orchestration, storage, results).
- Familiarity with operating MySQL/PostgreSQL/Redis in production (availability, performance, backups).
- Scripting/automation with Python or Go; ability to read Rust build pipelines.
- Solid understanding of security fundamentals (least privilege, image scanning, SBOM, secret hygiene).
- Experience instrumenting systems and creating actionable alerts.
Nice to have:
- Helm/Kustomize, policy-as-code (OPA), and basic gRPC.
- Performance tuning for high-throughput data or API services.
- Experience in multi-tenant or multi-cluster environments.
About us
At Stelia, we are building the AI Operating System for a distributed, intelligent world. Our mission is to dismantle the boundaries between humanity and technology by creating an Enterprise AI designed for trust, resilience, and scale.