Applied Research Scientist - Text-to-Speech (TTS)

  • Salient
  • San Francisco, California
  • Full Time

Salient is pioneering voice AI solutions to transform consumer loan servicing and compliance. Our initial focus is the $1.5T U.S. auto lending market. In under two years since launch, Salient has grown rapidly, including:

  • Scaling to more than $10M in ARR

  • Partnering with some of the largest consumer lenders in America

  • Cash flow positive

  • Raising $65M in funding from top-tier venture capital investors

  • Interfacing with more than 2 million unique U.S. consumers

  • Processing over $150M in cash transactions

  • Preventing $30M in fraud and 35k+ CFPB violations

  • In-person office culture in San Francisco, CA

About the Role

We're looking for an Applied Research Scientist with expertise in Text-to-Speech (TTS) to help us push the boundaries of speech synthesis. You'll work on developing high-quality, low-latency TTS systems that power real-world applications. The ideal candidate combines deep modeling knowledge with a strong engineering mindset to deliver robust, scalable solutions.

Responsibilities

  • Perform engineering tasks related to model training and serving, e.g., data ingestion, data cleaning, and evaluation

  • Design and train high-quality, low-latency, state-of-the-art TTS models for real-time agent deployment

  • Integrate TTS into cascaded LLM+ASR systems; explore joint optimization and feedback loops

  • Lead research efforts on prosody, speaker identity control, and expressiveness in speech synthesis

  • Prototype and evaluate new architectures and training pipelines for high-fidelity voice generation

  • Collaborate with infra and product teams to bring research into production

  • Contribute to internal tooling for data processing, model training, and inference benchmarking

Requirements

  • Proven track record developing state-of-the-art TTS systems, or an advanced degree in speech synthesis

  • Strong modeling skills and experience training deep neural networks for speech synthesis

  • Deep understanding of audio modeling, phoneme alignment, vocoders, and real-time inference challenges

  • Ability to move from research to working code; this is a hands-on role

  • Comfortable working both independently and collaboratively, and defining your own roadmap in an ambiguous, fast-moving environment

  • Ability to work 4 days a week from our San Francisco office (open to candidates willing to relocate)

Nice to Have

  • Familiarity with multilingual or code-switched TTS

  • Experience with voice cloning, style transfer, or emotion conditioning in speech

  • Contributions to academic publications or open-source projects in speech generation

As an early-stage company building at the frontier of AI, we work with high intensity and commitment. While schedules can vary by role and team, many weeks will demand extra focus, flexibility, and time, particularly during major launches and high-impact sprints. We're seeking people who are aligned with and able to commit to that expectation, which includes 4 days per week in our San Francisco office.

Job ID: 483272829
Originally Posted on: 6/30/2025
