Principal Data Scientist

University of Texas MD Anderson Cancer Center • Houston, TX, United States • $200k - $250k / year

The mission of The University of Texas MD Anderson Cancer Center is to eliminate cancer in Texas, the nation, and the world through outstanding programs that integrate patient care, research, prevention, and education. Core to the success of our mission is the ability to orchestrate multidimensional data, data analytics, and machine learning to create sustainable impact within a framework of responsible AI. We are building a dynamic team to drive machine learning operations and accelerate the impact of AI across the enterprise, driving long-lasting improvements in cancer care.

We are seeking a Principal Data Scientist to lead the development and support of innovative generative AI models across the organization. This role is at the heart of our endeavor to pioneer generative AI healthcare solutions, aimed at revolutionizing healthcare operations, enhancing patient outcomes, and making substantial contributions to the fields of medical and AI research. The selected candidate will be responsible for developing individual and comprehensive multi-modal foundational and generative AI models, utilizing their deep expertise in algorithm architecture, machine learning methodologies, and scientific processes. This effort is supported by an extensive repository of contextually relevant data, including medical imaging, electronic health records (EHR), pathology, operational data, and other pertinent healthcare data.

The successful candidate will collaborate closely with clinical and business professionals to identify use cases, select appropriate tools and technologies, and define metrics of impact, ensuring that our generative AI solutions are relevant, efficacious, and safe for use. They will work alongside a team of data scientists and machine learning engineers to ensure the seamless deployment, accessibility, and ongoing maintenance of AI models within our infrastructure. They must also be capable of nurturing a culture of innovation, promoting team unity, and driving technological progress to integrate AI throughout our enterprise, ensuring its ethical application and optimizing for impact.

Key Responsibilities:

Generative AI Development: Innovate and develop state-of-the-art machine learning technologies, focusing on generative AI and multimodal models suitable for complex healthcare applications.
Foundation Model Development: Lead the development and implementation of advanced foundational AI models, concentrating on the domains of imaging, text, structured data, time-series, and various healthcare-related data types. These models should enable and enhance generative AI applications across multiple use cases.
Collaborative Integration & Validation: Work closely with clinical experts, business stakeholders, data scientists, and machine learning engineers to gather requirements, deploy, and maintain foundation and generative AI models in production environments, ensuring they are effectively validated and integrated into enterprise use.
Academic Collaboration & Translation: Engage with academic data scientists and clinical researchers to explore novel AI approaches and use cases, facilitating the transition from research algorithms to practical healthcare solutions.
Operational Excellence & Compliance: Document and manage detailed records of model development, maintain rigorous testing and validation protocols, and ensure AI solutions are aligned with regulatory standards and ethical guidelines.
Leadership & Culture Development: Provide technical leadership to a team of data scientists, fostering a culture of innovation, continuous learning, and responsible AI development. Develop thought leadership through presentations, publications, patents, and participation in the tech community.

Technical Expertise:

An in-depth understanding of machine learning algorithms and modeling (e.g., supervised, unsupervised, semi-supervised or weakly supervised learning, generative models, transfer learning, optimization, and large language models).
Experience developing foundational and/or generative AI models.
Experience working with open-source and closed-source generative AI models.
Proficient in developing, evaluating, and deploying AI/ML algorithms.
Skilled in constructing scalable data pipelines, model artifact management, and model performance analytics.
Experienced with MLOps tools and processes for data, features, code, and model management.
Strong proficiency in Python and either C++ or C#, with practical knowledge of TensorFlow, PyTorch, and Scikit-learn.
Knowledgeable about AI/ML platform infrastructure, including cloud and on-premises architectures.
Familiar with cloud-native tools, services, and computing environments (e.g., Azure, AWS, GCP).

Analytical Expertise:

Experience and demonstrated capability in handling challenges with vague or abstract problem definitions.
In-depth knowledge of AI/ML Model Lifecycle Management.
Proficient in decision-making, problem-solving, and executing AI/ML healthcare solutions.
Skilled at quantitatively assessing machine learning models for performance, workflow impact, and potential risks.
Competent in identifying risks and formulating mitigation plans to prevent project delays.

Oral and Written Communication:

Demonstrated ability to lead and manage data science teams and projects.
Experience with documenting processes, pipelines, workflows, and machine learning experiments.
Report project metrics, including progress, impact, and risks, to leadership, offering strategic recommendations for AI/ML use-case prioritization.
Manage stakeholder relations to facilitate solution adoption and address issues.
Share knowledge and offer technical assistance to researchers and colleagues.
Deliver both technical and non-technical updates in meetings and at professional gatherings.

Education Required:

Bachelor's degree in Biomedical Engineering, Electrical Engineering, Computer Engineering, Physics, Applied Mathematics, Science, Engineering, Computer Science, Statistics, Computational Biology, or related field.

Preferred Education:

Doctorate (Academic)

Experience Required:

Seven years of experience in scientific software or industry programming with a concentration in scientific computing. With a Master's degree, five years of experience required. With a PhD, three years of experience required.

Preferred Experience:

Two years in a technical leadership role, leading the technical execution for a project, providing mentorship, and working collaboratively within and across teams.

It is the policy of The University of Texas MD Anderson Cancer Center to provide equal employment opportunity without regard to race, color, religion, age, national origin, sex, gender, sexual orientation, gender identity/expression, disability, protected veteran status, genetic information, or any other basis protected by institutional policy or by federal, state or local laws unless such distinction is required by law.

Additional Information: