State-of-the-art Vision Language Models (VLMs) have advanced rapidly, yet they still struggle with physical reasoning and real-world understanding, often due to a "text first, vision second" training paradigm and a lack of large-scale, diverse, real-world datasets. By leveraging Tesla's extensive global vehicle fleet and our rapidly growing humanoid robot platforms, we aim to reshape how VLMs perceive and interpret the physical world.
In this role, you'll have access to unparalleled compute resources, massive multimodal real-world datasets, and close collaboration with a small team of world-class AI research engineers. You'll be involved in every stage of the VLM pipeline: pre-training, alignment, post-training, reinforcement learning, evaluation, distillation, deployment, and efficient inference, pushing the boundaries of vision-language integration for real-world applications.
- Compute and verify scaling laws for real-world understanding using large GPU clusters and extensive datasets
- Develop and debug large distributed training jobs spanning tens of thousands of GPUs
- Align our pre-trained foundation vision models with large language models for unified perception and language comprehension
- Build new human-labeled and synthetic datasets addressing real-world tasks and physical reasoning
- Explore reward functions and SOTA RL techniques to enhance real-world understanding and problem-solving
- Leverage Tesla's data to create robust evaluation sets focused on real-world scenarios and physical accuracy
- Perform knowledge distillation from larger models to smaller, edge-optimized models deployable across Tesla cars and robots
- Apply quantization, inference-time optimizations, and device-specific tweaks to reduce power consumption and latency
- Deep Learning Background: Experience with large-scale vision-language models, multimodal transformers, or related architectures
- Distributed Systems Expertise: Proven ability to train and optimize models on high-performance clusters (thousands of GPUs)
- Practical Dataset Management: Comfort curating or generating large, diverse datasets, whether human-labeled, synthetic, or both
- Reinforcement Learning Knowledge: Familiarity with RL algorithms and reward function design, especially for complex real-world tasks
- Hands-On Approach: Willingness to iterate quickly on experimental ideas, from pre-training to final deployment
- Collaboration & Communication: Strong cross-functional skills, able to work with AI research engineers, robotics teams, and software groups