Applied Research Scientist - Text-to-Speech (TTS)

Salient
San Francisco, California
Full Time


Salient is pioneering voice AI solutions to transform consumer loan servicing and compliance. Our initial focus: the $1.5T U.S. auto lending market. In under two years since our launch, Salient has seen rapid market growth, including:

Scaling to more than $10M in ARR
Partnering with some of the largest consumer lenders in America
Becoming cash flow positive
Raising $65M in funding from top-tier venture capital investors
Interfacing with more than 2 million unique US consumers
Processing over $150M in cash transactions
Preventing $30M in fraud and 35k+ CFPB violations
In-person office culture in San Francisco, CA

About the Role

We're looking for an Applied Research Scientist with expertise in Text-to-Speech (TTS) to help us push the boundaries of speech synthesis. You'll work on developing high-quality, low-latency TTS systems that power real-world applications. The ideal candidate combines deep modeling knowledge with a strong engineering mindset to deliver robust, scalable solutions.

Responsibilities

Perform engineering tasks related to model training and serving, e.g., data ingestion, data cleaning, and evaluation
Design and train high-quality, low-latency, state-of-the-art TTS models for real-time agent deployment
Integrate TTS into cascaded LLM+ASR systems; explore joint optimization and feedback loops
Lead research efforts on prosody, speaker identity control, and expressiveness in speech synthesis
Prototype and evaluate new architectures and training pipelines for high-fidelity voice generation
Collaborate with infra and product teams to bring research into production
Contribute to internal tooling for data processing, model training, and inference benchmarking

Requirements

Proven track record developing state-of-the-art TTS systems, or an advanced degree in speech synthesis
Strong modeling skills and experience training deep neural networks for speech synthesis
Deep understanding of audio modeling, phoneme alignment, vocoders, and real-time inference challenges
Ability to move from research to working code; this is a hands-on role
Comfortable working both independently and collaboratively, and defining your own roadmap in an ambiguous, fast-moving environment
Ability to work 4 days a week from our San Francisco office (open to candidates willing to relocate)

Nice to Have

Familiarity with multilingual or code-switched TTS
Experience with voice cloning, style transfer, or emotion conditioning in speech
Contributions to academic publications or open-source projects in speech generation

As an early-stage company building at the frontier of AI, we work with high intensity and commitment. While schedules can vary by role and team, many weeks will demand extra focus, flexibility, and time, particularly during major launches and high-impact sprints. We're seeking candidates who are aligned with and able to commit to that expectation, which includes 4 days per week in our San Francisco office.

Job ID: 483272829

Originally Posted on: 6/30/2025
