Lead Data Scientist Machine Learning & Preclinical Pharmacology

Exton, Pennsylvania
Full Time

Location: Exton, PA area, Flexible Hybrid
About the Company:
Our client is a pioneer in phenotypic screening and in vivo pharmacology services, offering transformative insights to clients in the biotech and pharmaceutical sectors. Their mission is to identify novel therapeutic opportunities through rigorous preclinical science. They are seeking a Lead Data Scientist to accelerate their platform by building intelligent, scalable data systems that enhance their experimental rigor and translational insights.
Role Summary:
As Lead Data Scientist, you will drive the development and application of machine learning and data infrastructure to support in vivo R&D efforts. This includes behavior analysis from video data, biosignal processing, and development of automation tools that improve speed, accuracy, and reproducibility in preclinical research. You will collaborate closely with biology, operations, and client-facing teams to turn complex data into actionable insights.
Key Responsibilities:
Data Infrastructure & Strategy

Lead modernization of data infrastructure across preclinical and business functions, improving data access, standardization, and workflow efficiency.
Implement robust documentation and data versioning practices to support reproducibility and regulatory compliance.

Behavioral Analytics & Deep Learning

Build and deploy deep learning pipelines for behavior classification and motion tracking in rodent models using tools like TensorFlow, Keras, and Ultralytics.
Oversee all phases of model development: manual labeling, preprocessing, training, validation, and iteration to reach high classification performance (e.g., an F1 score above 95%).

Electrophysiology & Signal Analysis

Develop and maintain processing pipelines for extracting key features from EEG and other biosignals.
Create dynamic visualizations and analysis tools to support interpretation of complex in vivo datasets.

Automation & Document Intelligence

Design and implement NLP and document-scraping pipelines to transform unstructured client reports into structured databases.
Automate generation of client-facing reports and internal summaries, significantly reducing turnaround time.

Machine Learning Initiatives

Apply supervised and unsupervised learning methods to phenotypic screening datasets for pattern discovery, treatment differentiation, and predictive modeling.
Collaborate with R&D teams to develop interpretable models that inform dosing, mechanism-of-action studies, and novel compound discovery.
Integrate ML outputs with in-lab digital tools to improve experiment monitoring and annotation efficiency.

Qualifications:

Ph.D. or M.S. in Data Science, Computer Science, Neuroscience, Biomedical Engineering, or a related field.
5+ years of experience applying ML to real-world data, preferably in a preclinical, CRO, or in vivo environment.
Deep expertise in Python (pandas, NumPy, scikit-learn, PyTorch/TensorFlow).
Hands-on experience with behavioral video analysis, signal processing, and ML model deployment.
Familiarity with in vivo pharmacology, rodent behavioral models, or electrophysiology is a strong plus.

Preferred Skills: