Data Scientist

Location: Pittsburgh

Job Type: Full Time / Permanent

The successful candidate will develop and apply cutting-edge machine learning methods for analyzing multi-omic tumor datasets, generating insights that help identify novel therapeutic targets and treatments for cancer.

Responsibilities:
• Support leadership of the systems immunology and computational biology teams in analyzing and preparing biological datasets.
• Develop and implement appropriate machine learning (ML) models and deploy them at scale.
• Implement metrics to verify model and algorithm effectiveness.
• Automate model training, testing, and deployment, and ensure proper code documentation.
• Collaborate with the computational team to develop an ontology-based NLP platform to support in-house target discovery efforts.
• Be willing to work flexible hours as necessary; work beyond 40 hours is likely to be required.
• Report to the leadership of the systems immunology and computational biology teams.
• Document code, produce reports, and communicate results to the broader TTMS team.
• Support the computational biology and systems immunology teams in developing analytical tools that meet operational requirements for the interrogation and management of clinical and non-clinical datasets.
• Ensure adherence to the standards required for HIPAA-compliant data management and sharing.
• Keep up with emerging trends in ML, Deep Learning (DL), and Natural Language Processing (NLP).
• Perform in accordance with system-wide competencies/behaviors.
• Perform other duties as assigned.

Educational and Knowledge Requirements:
• PhD in a computational field, data science, statistics, mathematics, physics, or a related quantitative field.
• 3+ years of work experience developing and applying ML algorithms to high-dimensional datasets.
• Strong understanding of statistics, ML fundamentals, and modern ML, DL, and NLP libraries.
• Proficient in Python and R scripting.
• Expertise in ML/DL frameworks like TensorFlow, Keras, and scikit-learn/caret, and in NLP models and libraries like BERT, BioBERT, and NLTK.
• Demonstrated ability to write high-quality, production-ready code.
• Experience with version control systems like Git.
• Self-motivated, organized, goal-oriented team player focused on a career in biotech.
• Demonstrated ability to adhere to defined timelines, milestones, and objectives.
• Able to deal with uncertainty and solve problems creatively and independently with solid judgement.
• Experience with multi-omic biological datasets (e.g., RNA-seq, exome-seq, next-gen sequencing) is highly desirable.
• Experience with cloud computing (AWS) and with distributed platforms like SageMaker and Spark preferred.
• Experience with relational databases and SQL preferred.
• Knowledge of cancer biology and immunology preferred.