Site Reliability Engineer

Blockdaemon
Full-time
Remote
Denmark
Azure

As a Site Reliability Engineer (SRE), you will play a critical role supporting our Blockdaemon team by ensuring the reliability, scalability, and performance of our systems and services. You will collaborate closely with cross-functional teams to design, implement, and maintain robust and resilient infrastructure solutions in a multi-cloud environment.

The ideal candidate is passionate about automation, possesses strong analytical skills, and thrives in a fast-paced, dynamic environment.

Blockdaemon is a Blockchain Infrastructure Company operating in a multi-cloud configuration with a global footprint. This role calls for a candidate capable of supporting our systems and infrastructure stack across the major clouds: Google Cloud Platform (GCP), Amazon Web Services (AWS), and Azure.

Your Impact

  • System Architecture and Design: Collaborate with software engineering teams to design scalable, highly available, and resilient systems. Drive architectural improvements to enhance system reliability and performance.
  • Implement Infrastructure as Code to manage services and deployments in a multi-cloud, multi-project configuration.
  • Automation and Tooling: Develop automation tools and scripts to streamline deployment, monitoring, and incident response processes. Implement and maintain infrastructure as code frameworks.
  • Monitoring and Alerting: Configure and maintain monitoring systems to detect and mitigate potential issues proactively. Define alerting thresholds and response procedures to ensure timely incident resolution.
  • Incident Management: Respond to and resolve critical incidents, perform root cause analysis, and implement preventive measures to minimize the likelihood of recurrence. Participate in an on-call rotation to provide 24/7 support as needed.
  • Capacity Planning and Performance Optimization: Analyze system performance metrics, identify bottlenecks, and propose optimizations to improve resource utilization and efficiency.
  • Security and Compliance: Work closely with security teams to implement best practices for data protection, access control, and compliance with regulatory requirements. Conduct periodic security audits and vulnerability assessments.
  • Documentation and Knowledge Sharing: Document system configurations, procedures, and troubleshooting steps. Share knowledge and best practices with team members to foster a culture of continuous learning and improvement.

Role Requirements

Must Have:

  • Proven experience in an independent contributor role working with cloud platforms (GCP, AWS, Azure), Infrastructure-as-Code tooling (Terraform, Helm), and CI/CD orchestration platforms (GitLab CI, Argo CD, GitHub Actions) or similar GitOps workflows.
  • Excellent problem-solving skills and the ability to independently troubleshoot complex issues.
  • Strong communication and collaboration skills, with the ability to work effectively in cross-functional teams.
  • Strong architectural and security mindset.

Should Have:

  • Strong understanding of Linux/Unix systems administration and networking concepts.
  • Hands-on experience configuring and running monitoring tools such as Prometheus and Grafana.
  • 5+ years of experience maintaining infrastructure as code on Google Cloud Platform, Amazon Web Services, and Azure.
  • Experience working in SOC 2 Type 1 and Type 2 certified companies.

Nice-to-Have:

  • Proficiency in scripting and programming languages such as Bash, Golang, Python, and TypeScript.
  • 2+ years of hands-on experience operating highly available Kubernetes clusters.
  • Experience participating in incident management and resolution.
  • Experience with AI development tools and related security considerations.
  • Passion for the Blockchain Industry & Decentralised Systems.
  • Experience with Blockchain Infrastructure, either in a personal or professional capacity.

About Us:


We Power the Blockchain economy.