Site Reliability Engineer - Senior Dev Operations Engineer (SRE)

https:/www.energyjobline.com/sitemap.xml • Pleasanton, CA, United States • $200k - $250k / year

Pleasanton, CA (100% remote)

Duration: 12+ Months

Interview: Webcam

Rate: $60-$75/Hr on W2

Top 3-5 Must-Haves:

Experience setting up alerts, alarms, and notifications in the AWS cloud using CloudWatch or Dynatrace.
Experience building AWS solutions with services including Kafka, ECS, and EKS.
Experience with Infrastructure as Code (IaC) using CDK or Terraform.

Role:

The Site Reliability Engineer (SRE) will lead the DevOps team responsible for system administration, including the monitoring, installation, configuration, maintenance, operations, and architecture of AWS cloud and on-premises environments.

The candidate will work within a team to implement and maintain all production and pre-production environments using tooling and automation.

We are looking for a candidate with exceptional site reliability and DevOps skills and extensive knowledge of and experience with implementing solutions and tools to maintain and grow all application environments.

Most importantly, the right individual will possess a positive, 'can-do' attitude and a passion for delivering technical solutions in a fast-paced environment.

In addition, the individual should be dedicated and independent and should collaborate effectively to ensure the stability and reliability of infrastructure and applications running in the AWS cloud and on-premises environments.

Advanced experience working in AWS environments is expected, as this role leads the implementation of improvements and enhancements.

Key Responsibilities:

Monitor sites, environments, and software by implementing tools and automation to achieve 99.9% uptime.
Measure, optimize, and tune system performance, ensuring systems run reliably and remain highly available in a 24/7 production environment.
Automate system and application monitoring using monitoring and automation tools.
Conduct post-incident reviews and Root Cause Analysis.
Document work to turn findings into repeatable actions.
Write automation code for site infrastructure.
Implement production monitoring systems.
Utilize strong analytical and problem-solving skills.
Conduct security assessments and address vulnerabilities.
Design and deploy AWS solutions using services (e.g., EC2, S3, Glacier, ELB, RDS, IAM, Route 53, VPC, Auto Scaling, CloudWatch, CloudTrail, CloudFormation, Security Groups, API Gateway, SSM, route tables, endpoint services, etc.).
Provision, manage, and conduct day-to-day operations of AWS environments.
Implement alarms, alerts, and notifications using AWS services (e.g., CloudWatch).
Implement AWS Multi-AZ deployments for High Availability (HA) and Disaster Recovery (DR).
Design AWS infrastructure to minimize operational costs through push-button deployment at scale with near-zero downtime.
Develop and maintain configuration management solutions.
Provide technical guidance, knowledge transfer, and mentorship to internal engineering peers as required.
Oversee server maintenance based on updates, system requirements, data usage, and antivirus requirements.
Design, implement, and support large-scale web farm infrastructure across multiple data centers supporting the Infrastructure as a Service (IaaS) offering.
Assist engineering in implementing new technologies in development for future production deployment.

#J-18808-Ljbffr