The #1 ranked realtime voice AI

Realtime AI that feels as human as it sounds. Top-ranked text-to-speech, speech-to-speech and LLM routing built for realtime conversation.

Status reached 1M DAUs in 19 days.

OtherHalf powers voice-first companions at scale.

Ongoing, personal, emotionally engaging AI interaction. Build for relationship-building, emotional connection, and entertainment at scale.

Realtime TTS

Keep every user engaged with natural, realtime text-to-speech

Ranked the #1 TTS by real users on the Artificial Analysis Speech Arena. Sub-130 ms first-chunk latency. Pricing starts at $15 per million characters, up to 80% cheaper than comparable providers. Clone, design, steer, and stream natural responses.
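
To put the starting price in concrete terms, here is a rough back-of-the-envelope sketch in Python. It assumes the $15 per million characters quoted above applies as a flat rate to every character (actual plans and tiers may differ), and the function names are illustrative only, not part of any Inworld SDK.

```python
from decimal import Decimal

# Starting price quoted above; treated as a flat rate here (assumption).
PRICE_PER_MILLION_CHARS = Decimal("15")


def tts_cost_usd(characters: int, price_per_million: Decimal = PRICE_PER_MILLION_CHARS) -> Decimal:
    """Estimate synthesis cost in USD for a given number of characters."""
    if characters < 0:
        raise ValueError("characters must be non-negative")
    cost = Decimal(characters) * price_per_million / Decimal(1_000_000)
    return cost.quantize(Decimal("0.01"))


def implied_comparison_price(price_per_million: Decimal, percent_cheaper: Decimal) -> Decimal:
    """Price a comparable provider would charge if ours is `percent_cheaper`% lower."""
    return price_per_million / (Decimal(1) - percent_cheaper / Decimal(100))


if __name__ == "__main__":
    # 250,000 characters at the flat starting rate.
    print(tts_cost_usd(250_000))
    # Per-million price implied by "80% cheaper" at the upper bound of the claim.
    print(implied_comparison_price(Decimal("15"), Decimal("80")))
```

Under these assumptions, the "up to 80% cheaper" figure corresponds to a comparison price of at most $75 per million characters.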

Voice AI that feels human, across every industry

What customers shipping at scale are saying about Inworld.

Inworld's TTS-2 marks a real step forward in emotionally expressive voice synthesis. When combined with the conversational intelligence of LiveKit agents, it enables interactions that feel genuinely human, responsive, nuanced, and alive in ways that feel natural.

David Zhao

Co-Founder & CTO, LiveKit

I've never seen steering work like this before TTS-2. The output is extremely natural and faithful to the steering prompt, even when it's hyper-specific. The biggest battle you fight with TTS is feeling bland, stale, and robotic. This level of steering unlocks a whole new axis to keep the experience fresh.

Creston Brooks

Co-founder & CTO, Luvu

We've always believed language learning should have no borders. TTS 2.0 just made that a lot more real.

Dimitri Dekanozishvili

Co-founder, Talkpal

We've been chasing the uncanny valley of voice AI for years. Inworld is finally closing the gap between 'impressive' and 'actually believable' with TTS 2.0. When your character speaks and you forget it's AI, that's when the story becomes real.

Louis Muk

CEO, Isekai Zero

We've had early access to Inworld TTS-2 for a few days and we're all blown away. The expressiveness, language steering and multi-lingual support are genuinely impressive. The subtle details like natural pausing make it hard to differentiate between AI and human.

Nash Ramdial

Developer Relations, Stream

Inworld just made voice AI feel genuinely human across 100+ languages. Partnering with them means we can help bring that experience to kids around the world, safely and compliantly.

Kieran Donovan

CEO, k-ID

AI Native games need characters you can deeply connect with. Voice models that offer full control and emotional complexity to make characters feel real is one of the biggest pieces missing. TTS 2 is a significant advance in helping make that future a reality.

Nick Walton

CEO, Latitude

Inworld was already at the top of the Artificial Analysis TTS Arena and Realtime TTS-2 pushes further on a dimension VoiceRun customers care about: directability. Style, pacing, emphasis, emotion, and delivery can be shaped in ways that matter for real enterprise deployments.

Nick Leonard

CEO & Co-Founder, VoiceRun

If you want top-of-the-line TTS integrated into your product, work with Inworld. It's easy to use, quick, high-quality, and a fraction of the price of comparable providers. Plus, the Inworld team is responsive and easy to work with.

Sara Beykpour

Co-founder & CEO, Particle

When we adopted Inworld TTS it was a game changer. Players immediately began mentioning how magical it was talking to our NPCs.

Devin Reimer

Founder & CEO, AstroBeam

When you work with external parties you never know what you're going to get, but Inworld has been really hands-on and fast. It's honestly been the most positive and best experience I've ever had working with another company on a problem like this.

Fai Nur

CEO, Wishroll (Status)

We found a path to profitability with Inworld. They helped us understand the base usage and keep costs per user low.

Tabish Khan

Co-founder & CEO, Playroom (Death by AI)

Working with Inworld helps us open up new possibilities for developers building expressive, real-time voice agents. The focus is always on giving builders access to great tools, and this integration fits perfectly with that mission.

Jordan Dearsley

Co-Founder, Vapi

Realtime API

Controllable speech-to-speech that understands, reasons, and interacts

End-to-end speech-to-speech with custom voices and tool calling. Fully customizable. Optimize for cost, latency, or what your users care about most.
Live conversation

Full duplex, low-latency streaming
Full-duplex audio streaming over a single WebSocket or WebRTC connection.

Intelligent turn taking
Context-aware turn detection with adjustable eagerness.

Function calling
Register tools mid-session. The assistant calls your functions without breaking audio.

Provider agnostic
Route to the model that fits your latency, cost, or quality requirements, and swap it out at any time.

Dynamic context management
Create, retrieve, delete, or truncate conversation items mid-session to control context length and token cost.

Conversational intelligence
Use acoustic and metadata signals to condition what is said, when it is said, and how it is expressed.

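
As an illustration of the dynamic context management idea above, the following Python sketch decides which conversation items to delete so a session stays within a token budget. It is a local, hypothetical model only: the `ConversationItem` fields, the four-characters-per-token estimate, and the oldest-first policy are all assumptions, not the Realtime API's actual schema or behavior.

```python
from dataclasses import dataclass


@dataclass
class ConversationItem:
    item_id: str  # hypothetical identifier, not the API's actual field name
    role: str     # "user" or "assistant"
    text: str


def estimate_tokens(text: str) -> int:
    """Crude estimate of about 4 characters per token (an assumption for illustration)."""
    return max(1, len(text) // 4)


def items_to_delete(items: list[ConversationItem], token_budget: int) -> list[str]:
    """Return IDs of the oldest items to drop so the remaining ones fit the budget."""
    total = sum(estimate_tokens(item.text) for item in items)
    to_delete = []
    for item in items:  # items are assumed to be ordered oldest first
        if total <= token_budget:
            break
        total -= estimate_tokens(item.text)
        to_delete.append(item.item_id)
    return to_delete


if __name__ == "__main__":
    history = [
        ConversationItem("a", "user", "x" * 40),
        ConversationItem("b", "assistant", "y" * 40),
        ConversationItem("c", "user", "z" * 40),
    ]
    # Each item is estimated at 10 tokens, so a 20-token budget drops the oldest one.
    print(items_to_delete(history, token_budget=20))
```

In a real session, the returned IDs would be passed to whatever delete or truncate operation the API exposes. Dropping the oldest items first is just one policy; summarizing old turns is another option.
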
Realtime Router

Reason in realtime. Route to the best model and tools for every user and context

One API that intelligently routes requests across OpenAI, Anthropic, Google, and 200+ models, with no added latency. Built-in analytics track whether the metrics you care about improve. Failover, A/B testing, and intelligent model selection are built in and require no code changes.
curl 'https://api.inworld.ai/v1/chat/completions' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "model": "inworld/user-aware",
    "messages": [{"role": "user", "content": "Hello"}],
    "extra_body": {
      "metadata": {
        "language": "es",
        "country": "MX",
        "plan": "free"
      }
    }
  }'
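
The curl request above can also be built from Python using only the standard library. This is a minimal sketch that mirrors the URL, headers, model name, and metadata fields shown in the curl example. The function name `build_router_request` is made up for illustration, and the actual network call is left commented out so the sketch runs without a key.

```python
import json
import os
import urllib.request

# Endpoint taken from the curl example above.
ROUTER_URL = "https://api.inworld.ai/v1/chat/completions"


def build_router_request(api_key: str, user_text: str, metadata: dict) -> urllib.request.Request:
    """Build (but do not send) the same POST request as the curl example."""
    body = {
        "model": "inworld/user-aware",
        "messages": [{"role": "user", "content": user_text}],
        "extra_body": {"metadata": metadata},
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {api_key}",
        },
        method="POST",
    )


if __name__ == "__main__":
    request = build_router_request(
        os.environ.get("INWORLD_API_KEY", ""),
        "Hello",
        {"language": "es", "country": "MX", "plan": "free"},
    )
    # Uncomment to send; requires a valid key and network access.
    # with urllib.request.urlopen(request) as response:
    #     print(json.load(response))
    print(request.get_method(), request.full_url)
```

Separating request construction from sending makes it easy to inspect or log the routing metadata before any traffic leaves your service.
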
Top Models
#1 Gemini 3 Flash: 16%
#2 Qwen3 Max: 14%
#3 Claude Sonnet 4.6: 12%
#4 Claude Opus 4.6: 12%
#5 Grok 4.1 Fast: 10%
#6 Gemini 2.5 Flash: 5%
#7 GPT 5.2: 5%
#8 Kimi K2.5: 3%
#9 MiniMax M2.5: 3%
#10 Gemini 2.5 Flash Lite: 2%

Realtime STT

Speech-to-text that truly understands your users in realtime

Understand your users and their context in realtime with built-in voice profiling, along with state-of-the-art latency and accuracy.
Live transcription

Realtime streaming
Realtime, bidirectional streaming over WebSocket for live audio, or synchronous transcription for complete audio files.

Voice profiling: emotion, age, accent, pitch & style
Extract five real-time signals per audio chunk to understand who is speaking and how they feel.

Semantic & acoustic VAD
Automatically detect when speech starts and stops. Enable natural speech patterns.

Unified multi-provider API
A single integration point for industry-leading, high-accuracy transcription providers, with consistent authentication, request formatting, and response handling.

High accuracy & custom vocabulary
Transcribe audio with industry-leading accuracy. Add domain-specific terms, product names, and specialized vocabulary to boost recognition further.

Word-level timestamps & diarization
Per-word timing for subtitles and search. Label speakers in multi-party conversations.

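
To show how per-word timing can feed subtitles, here is a small Python sketch that groups timed words into SRT cues. The input shape (dicts with `word`, `start`, and `end` in seconds) is an assumption for illustration and is not the documented response format of the STT API.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words_per_cue: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most `max_words_per_cue` words."""
    cues = []
    for index in range(0, len(words), max_words_per_cue):
        chunk = words[index:index + max_words_per_cue]
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        text = " ".join(word["word"] for word in chunk)
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}")
    return "\n\n".join(cues) + ("\n" if cues else "")


if __name__ == "__main__":
    # Hypothetical word timings, in seconds.
    sample = [
        {"word": "Hello", "start": 0.0, "end": 0.4},
        {"word": "there", "start": 0.5, "end": 0.9},
        {"word": "friend", "start": 1.0, "end": 1.5},
    ]
    print(words_to_srt(sample, max_words_per_cue=2))
```

Fixed-size grouping keeps the sketch short. A production subtitle pipeline would more likely break cues at pauses, punctuation, or speaker changes from diarization.
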
Security

Build with confidence on secure AI infrastructure

Enterprise-grade security and compliance built into our AI platform. Built on a zero-trust framework with continuous monitoring, providing a secure foundation for teams building the next generation of AI.
SOC2 Type II: Certified
HIPAA: Compliant
GDPR: Compliant

Voice AI, side-by-side comparison

| Capability | Inworld | Google | ElevenLabs | Cartesia | OpenAI | Hume |
| --- | --- | --- | --- | --- | --- | --- |
| Voice quality (Artificial Analysis Speech Arena) | #1 | #2 | #3 | Not stated | #5 | Not stated |
| Natural conversational delivery | Yes | Yes | Not stated | Not stated | Yes | Not stated |
| Realtime latency | Yes | Not stated | Not stated | Yes | Not stated | Not stated |
| Multi-turn aware speech synthesis | Yes | Not stated | Not stated | Not stated | Yes | Not stated |
| Simple voice direction (inline tags) | Yes | Yes | Yes | Yes | Yes | Yes |
| Advanced voice direction (free-form descriptions) | Yes | Not stated | Not stated | Not stated | Yes | Not stated |
| Voice cloning | Yes | Not stated | Yes | Yes | Not stated | Yes |
| Simple voice design (text-based) | Yes | Not stated | Yes | Not stated | Not stated | Not stated |
| Advanced voice design | Yes | Not stated | Not stated | Not stated | Not stated | Not stated |
| Crosslingual (single voice, over 100 languages) | Yes | Not stated | Yes | Not stated | Not stated | Not stated |
| Voice profiling (understand user context) | Yes | Not stated | Not stated | Not stated | Not stated | Not stated |
| Single customizable speech-to-speech API | Yes | Not stated | Not stated | Not stated | Not stated | Not stated |
| User-aware LLM routing | Yes | Not stated | Not stated | Not stated | Not stated | Not stated |
| Optimized alphanumeric support | Yes | Not stated | Yes | Not stated | Not stated | Not stated |

Verified May 2026 from public docs and the Artificial Analysis Speech Arena leaderboard. Based on the latest models from each provider.

Start building

Join millions of developers building the next wave of AI applications.
Copyright © 2021-2026 Inworld AI