Informatics Data Scientist Lead
Prometheus Federal Services (PFS), a trusted partner to federal health and social services agencies, has an opening for an Informatics Data Scientist Lead. This position is responsible for developing and maintaining our Python codebase, focusing on Extract-Transform-Load (ETL) processes and bioinformatics pipelines. The role requires a blend of technical expertise in data science and bioinformatics, with a strong emphasis on Python programming, data processing, and high-performance computing.
Essential Duties and Responsibilities
The successful candidate may be responsible for, among other things:
- Develop, maintain, and document Python code for ETL processes and bioinformatics pipelines
- Ensure that code is well-documented, version-controlled, and adheres to industry standards such as PEP 8
- Implement automated testing frameworks (e.g., pytest) to ensure the reliability and performance of code
- Create logging mechanisms to monitor processes and troubleshoot issues
- Design and implement ETL processes to extract data from various sources, transform it as needed, and load it into relational databases
- Enhance and maintain existing ETL processes, ensuring they are well-documented and tested
- Align and harmonize data from multiple sources for integration into master datasets
- Develop bioinformatics pipelines for tasks such as variant calling, gene expression analysis, and data annotation
- Work within a Linux-based high-performance computing environment using command-line tools
- Use Python-based workflow tools such as Snakemake to create and manage complex workflows
- Perform testing and validation of bioinformatics pipelines, ensuring accuracy and efficiency
- Collaborate with cross-functional teams, including data engineers, researchers, and project managers
- Participate in regular meetings to discuss project progress, challenges, and goals
- Provide support to research and data teams, helping to structure and prepare data for analysis and modeling
Minimum Qualifications
- Bachelor’s degree in Data Science, Computer Science, Bioinformatics, or a related field
- Minimum of eight (8) years of experience
- Minimum of five (5) years of federal consulting experience
- Strong experience in Python programming, particularly in the context of ETL processes and bioinformatics
- Familiarity with version control systems (e.g., Git) and workflow management tools like Snakemake
- Experience working in Linux-based high-performance computing environments
- Knowledge of relational databases and data integration techniques
- Experience with automated testing and logging best practices
- Strong analytical and problem-solving skills
- Excellent communication and documentation skills
- Ability to work both independently and as part of a team
- Authorized to work in the U.S. indefinitely without sponsorship
- Ability to obtain a Public Trust clearance
Preferred Qualifications
- Experience in healthcare, life sciences, or related industries
- Master’s degree in Data Science, Computer Science, Bioinformatics, or a related field
- Veterans Health Administration (VHA) experience
- Knowledge of bioinformatics tools and pipelines
- Familiarity with AI/ML concepts and their application to data science