Description
Sr. SRE, K8s @ PulsePoint
About PulsePoint:
PulsePoint is a fast-growing healthcare technology company (with adtech roots) using real-time data to transform healthcare. We help brands and agencies interpret the hard-to-read signals across the health journey and unify these digital determinants of health with real-world data to produce the most dimensional view of the customer. Our award-winning advertising platforms use machine learning and programmatic automation to seamlessly activate this data, making marketing, predictive analytics, and decision support easy and instantaneous.
Sr. SRE, K8s:
As part of the SRE team (working REMOTELY), you will be challenged and expected to grow your technical knowledge. You will challenge your fellow team members, and they will challenge you back. Our team is not competitive, but we are goal-oriented and driven to succeed.
What you'll be doing:
Ensure reliability and scalability of our multi-datacenter and hybrid Linux environments.
Manage the large-scale Linux infrastructure to ensure maximum uptime.
Conduct performance and reliability testing. This may include reviewing configuration, software choices/versions, hardware specs, etc.
Advance our technology stack with innovative ideas and creative new solutions.
Participate in capacity management of core systems and services, application analysis, and performance and security tuning. Provide operational support for systems and build automation that remediates issues and addresses their root causes, with the goal of automating the response to all non-exceptional service conditions.
Create strategies for long-term, permanent fixes to critical production incidents.
Maintain documentation, build tooling, and create alerts to both identify and address infrastructure reliability issues.
Proactively identify system anomalies.
Who are you:
East Coast U.S. hours 9am-6pm EST preferred, but we can be flexible as long as you can work until 12pm/1pm EST; you can work fully remotely
10+ years of experience
Immaculate knowledge of best practices for architecting cross-datacenter Kubernetes clusters running on-premises with automated etcd management using kubeadm
Profound knowledge of Docker (dockershim), containerd and runc internals at the kernel level
Ability to manually troubleshoot and solve certificate issues within Kubernetes with zero downtime
Vast experience in development of custom Kubernetes operators and autoscalers, as well as tailored ingress/egress controllers
Numerous successful major version upgrades of Elasticsearch and Fluentd in the past are a must, as well as KubeDB operator expertise
Fluency in GitOps automation tools (Flux v1/v2), comprehensive knowledge of the Helm and Kustomize controllers
In-depth understanding of Kubernetes security ACLs, exhaustive previous exposure to RBAC configuration and complete knowledge of Dex
Secret management is essential to success in this role; Vault expertise is also required
Ability to manage BGP configuration, mastery of kube-router and GoBGP, as well as MetalLB
Expert-level skills in KubeDNS and CoreDNS
Understanding of the most intricate details of the Rook/Ceph implementation for Kubernetes
Thorough understanding of RPM-based Linux systems.
Experience administering SQL/NoSQL databases (MySQL, ES, Redis, Cassandra)
Experience with scalable infrastructure monitoring solutions such as Icinga, Prometheus, ELK.
Proficiency in at least one scripting language (Python, Ruby, shell, etc.)
Understanding of basic networking concepts (TCP/IP stack, DNS, CDN, load balancing, BGP)
Ability to resolve complex merge conflicts in Git is an obvious requirement
Golang experience a plus.
WebMD and its affiliates is an Equal Opportunity/Affirmative Action employer and does not discriminate on the basis of race, ancestry, color, religion, sex, gender, age, marital status, sexual orientation, gender identity, national origin, medical condition, disability, veteran status, or any other basis protected by law.