The Role:
- You will be part of a highly technical, collaborative and creative team, with a focus on SRE & Software Engineering.
- Responsible for the design, implementation, reliability and management of cloud-based FedRAMP-compliant applications and platforms.
- Responsible for application incident management escalations which involve troubleshooting complex technical problems and resolving application issues within defined service level objectives.
- Design, write, and deliver software that enhances the availability, scalability, and efficiency of our services.
- Partner with platform and application development teams to learn from incidents and improve platform resiliency.
- Share acquired knowledge and document accordingly while implementing SRE best practices.
The qualifications you need:
- A bachelor's or master's degree in a technical field (e.g., Computer Science, Software Engineering) or a comparable education.
- Experience programming with Java, the Spring framework, and Python (or a similar scripting language in a Linux environment).
- A minimum of 5 years of experience developing cloud-based software applications.
- Experience working with public cloud providers (AWS, Azure, or GCP) and modern cloud monitoring and observability frameworks (e.g., Datadog).
- Experience in developing and running large-scale production services with elastic cloud services and Kubernetes.
- Project experience operating within the SRE domain.
- Familiarity with CI/CD processes and tools (ArgoCD, GitHub Actions, etc.).
- Experience with infrastructure as code (Terraform, Kustomize).
- Strong problem-solving skills and the ability to troubleshoot complex technical issues.
- Excellent English verbal and written communication skills.
This position will not be eligible for any form of immigration visa sponsorship now or in the future.