Responsibilities
- Develop data processing pipelines using programming languages such as Java and Python to extract, transform, and load (ETL) log data (see the illustrative sketch after this list)
- Implement scalable, fault-tolerant solutions for data ingestion, processing, and storage.
- Support systems engineering lifecycle activities for data engineering deployments, including requirements gathering, design, testing, implementation, operations, and documentation.
- Automate platform management processes using Ansible or other scripting tools and languages
- Troubleshoot incidents impacting the log data platforms
- Collaborate with cross-functional teams to understand data requirements and design scalable solutions that meet business needs.
- Develop training and documentation materials
- Support log data platform upgrades including coordinating testing of upgrades with users of the platform
- Gather and process raw data from multiple disparate sources (including writing scripts, calling APIs, writing SQL queries, etc.) into a form suitable for analysis
- Enable log data, batch, and real-time analytical processing solutions that leverage emerging technologies
- Participate in on-call rotations to address critical issues and ensure the reliability of data engineering systems
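To give a concrete sense of the pipeline work described above, the following is a minimal Python sketch of an extract-transform-load step over raw log lines; the log format, field names, and file paths are hypothetical and shown only for illustration.

```python
import json
import re
from pathlib import Path

# Hypothetical log line format: "2024-05-01T12:00:00Z INFO auth-service User login succeeded"
LOG_PATTERN = re.compile(
    r"^(?P<timestamp>\S+)\s+(?P<level>\S+)\s+(?P<service>\S+)\s+(?P<message>.*)$"
)

def extract(raw_path: Path):
    """Extract: read raw log lines from a source file."""
    with raw_path.open() as f:
        for line in f:
            yield line.rstrip("\n")

def transform(lines):
    """Transform: parse each line into a structured record, skipping malformed ones."""
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match:
            yield match.groupdict()

def load(records, out_path: Path):
    """Load: write structured records as JSON Lines, suitable for downstream analysis."""
    with out_path.open("w") as f:
        for record in records:
            f.write(json.dumps(record) + "\n")

if __name__ == "__main__":
    load(transform(extract(Path("raw_logs.txt"))), Path("structured_logs.jsonl"))
```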
Experience
- Ability to troubleshoot and diagnose complex issues
- Demonstrated experience supporting technical users and conducting requirements analysis
- Able to work independently with minimal guidance and oversight
- Experience with IT Service Management and familiarity with Incident & Problem management
- Highly skilled in identifying performance bottlenecks and anomalous system behavior, and in resolving the root cause of service issues
- Demonstrated ability to effectively work across teams and functions to influence design, operations, and deployment of highly available software
- Knowledge of standard methodologies related to security, performance, and disaster recovery
Required Technical Expertise
- Expertise in languages such as Java and Python, with implementation knowledge of data processing pipelines in those languages to extract, transform, and load (ETL) data
- Create and maintain data models, ensuring efficient storage, retrieval, and analysis of large datasets
- Troubleshoot and resolve issues related to data processing, storage, and retrieval.
- 3-5 years' experience designing, developing, and deploying data lakes using AWS native services (S3, Glue (Crawlers, ETL, Catalog), IAM, Terraform, Athena); see the query sketch after this list
- Experience in development of systems for data extraction, ingestion and processing of large volumes of data
- Experience with data pipeline orchestration platforms
- Experience with Ansible, Terraform, or CloudFormation and with Infrastructure as Code scripting is required
- Implement version control and CI/CD practices for data engineering workflows to ensure reliable and efficient deployments
- Proficiency in implementing monitoring, logging, and alerting solutions for data infrastructure (e.g., Prometheus, Grafana)
- Proficiency in distributed Linux environments
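As an illustration of the AWS data lake tooling listed above, the sketch below submits an Athena query against a Glue-catalogued table using boto3 and prints the results; the database, table, and S3 bucket names are hypothetical.

```python
import time

import boto3

# Hypothetical names: a Glue-catalogued database "log_lake" containing a "web_logs" table,
# and an S3 bucket where Athena writes query results.
athena = boto3.client("athena", region_name="us-east-1")

query_id = athena.start_query_execution(
    QueryString="SELECT level, COUNT(*) AS events FROM web_logs GROUP BY level",
    QueryExecutionContext={"Database": "log_lake"},
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
)["QueryExecutionId"]

# Poll until the query reaches a terminal state, then fetch the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```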
Preferred Technical Experience
- Familiarity with data streaming technologies such as Kafka, Kinesis, and Spark Streaming (see the consumer sketch after this list)
- Knowledge of cloud platforms (AWS preferred) and of container and orchestration technologies
- Experience with AWS OpenSearch, Splunk
- Experience with common scripting and query languages
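For the streaming experience noted above, here is a minimal sketch of a Kafka consumer using the kafka-python package; the topic name and broker address are hypothetical.

```python
import json

from kafka import KafkaConsumer  # assumes the kafka-python package is installed

# Hypothetical topic and broker; deserializes JSON-encoded log events as they arrive.
consumer = KafkaConsumer(
    "log-events",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda raw: json.loads(raw.decode("utf-8")),
    auto_offset_reset="earliest",
)

for message in consumer:
    event = message.value
    print(f"{message.topic}[{message.partition}] offset={message.offset}: {event}")
```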