Research Scientist, Foundation Model, Speech & Audio
- ByteDance
- Seattle, Washington
- Full Time
Team Intro:
The Speech team's mission is to empower content interaction and creation using speech- and audio-related technologies. The team focuses on cutting-edge R&D in areas such as speech & audio, music processing, natural language understanding, and multimodal deep learning. The team builds GPU-based AI training and inference systems and advances the state of the art in AI system technologies to accelerate large audio/music language models. The team is also responsible for the complete engineering cycle of large models, including data preparation/processing and model training/evaluation/deployment.
Responsibilities:
- Contribute cutting-edge research to ByteDance product evolution (e.g., TikTok, CapCut) to impact billions of users worldwide.
- Lead research to advance science and technology in audio processing and generation (e.g., Speech Synthesis, Voice Conversion, Audio Codec Learning, Audio Language Modeling).
- Research, model, design, develop and evaluate novel machine learning models and algorithms.
- Collaborate with research and engineering teams around the world to develop machine learning models and algorithms.