Responsibilities
- Design and implement a robust AI/ML platform that integrates with existing CI/CD pipelines and data platforms.
- Develop automated tools and frameworks for model training, testing, registration, deployment, and monitoring, as well as experiment tracking.
- Develop end-to-end MLOps pipelines, covering both structured-data and LLM use cases, for the internal platform.
- Build and maintain core data pipelines that efficiently handle data flow for training and deploying ML models.
- Maintain and optimize machine learning infrastructure for performance and scalability.
- Implement best practices for data governance, security, and compliance in machine learning operations.
- Develop and maintain documentation, including design documents, user guides, and operational procedures.
- Partner with data scientists and software engineers to ensure operational and architectural alignment.
Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field.
- 5+ years of total industry experience with at least 1 year in AI/ML operations.
- Proficiency in one or more programming languages such as Scala, Python, Java, or Golang.
- Experience with big data technologies and frameworks, specifically Apache Spark, for processing large datasets and optimizing data workflows.
- Hands-on experience with infrastructure as code, specifically the Terraform stack.
- Experience building end-to-end MLOps pipelines (including CI/CD) in at least one production project at scale.
- Experience with AWS services such as SageMaker, Lambda, S3, Glue, and Redshift; familiarity with large language models (LLMs) and Amazon Bedrock.
- Familiarity with Kubernetes for container orchestration and Airflow for workflow orchestration.
- Familiarity with ML libraries like TensorFlow, PyTorch, or Scikit-learn.
- Excellent problem-solving and analytical skills.
- Strong communication and collaboration abilities.
Physical Requirements
Ability to work in-office a minimum of three days a week.