The #1 ranked realtime voice AI

Realtime AI that feels as human as it sounds. Top-ranked text-to-speech, speech-to-speech, and LLM routing built for realtime conversation.

Get Started Contact Sales

Status reached 1M users in 19 days.

OtherHalf powers voice-first companions at scale.

Ongoing, personal, emotionally engaging AI interaction. Built for relationship-building, emotional connection, and entertainment at scale.

We are taking on the cost of AI, so consumer apps can scale.

Every layer, against the alternative

Realtime TTS: ElevenLabs vs Inworld, $100 vs $12.50 ($ / 1M characters · Growth plan)

Realtime STT: Deepgram vs Inworld, $0.46 vs $0.10 ($ / hour · Growth plan)

LLM markup: Typical gateway vs Inworld Router (% added to your bill · All plans)

Realtime inference: Public rate vs Inworld, 100 (% of the public rate · All plans)

Dedicated GPUs: Hyperscaler vs Inworld, $10+ ($ / GPU-hour · Starting from)

Provider rates, June 2026. Inworld Realtime TTS-2 and Realtime STT on the Growth plan; rates fall further at enterprise scale.
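
As a rough illustration of what the listed Growth-plan rates mean for a monthly bill, the sketch below multiplies the per-unit TTS and STT prices quoted above by a usage volume. The workload (50M characters, 2,000 audio hours) is a made-up assumption for illustration, not Inworld or customer data, and the helper names are hypothetical.

```python
# Listed rates from the comparison above (provider rates, June 2026, Growth plan).
TTS_USD_PER_1M_CHARS = {"ElevenLabs": 100.00, "Inworld": 12.50}
STT_USD_PER_HOUR = {"Deepgram": 0.46, "Inworld": 0.10}


def tts_cost(provider: str, characters: int) -> float:
    """Monthly TTS cost in USD for a given number of synthesized characters."""
    return TTS_USD_PER_1M_CHARS[provider] * characters / 1_000_000


def stt_cost(provider: str, hours: float) -> float:
    """Monthly STT cost in USD for a given number of audio hours."""
    return STT_USD_PER_HOUR[provider] * hours


def savings_pct(alternative: float, inworld: float) -> float:
    """Percentage by which the Inworld cost is lower than the alternative."""
    return (alternative - inworld) / alternative * 100


# Hypothetical workload, chosen only for illustration.
chars, hours = 50_000_000, 2_000

tts_alt, tts_inw = tts_cost("ElevenLabs", chars), tts_cost("Inworld", chars)
stt_alt, stt_inw = stt_cost("Deepgram", hours), stt_cost("Inworld", hours)

print(f"TTS: ${tts_alt:,.2f} vs ${tts_inw:,.2f} ({savings_pct(tts_alt, tts_inw):.1f}% lower)")
print(f"STT: ${stt_alt:,.2f} vs ${stt_inw:,.2f} ({savings_pct(stt_alt, stt_inw):.1f}% lower)")
```

At these assumed volumes the TTS line item works out to $5,000 versus $625, and the STT line item to $920 versus $200. Swap in your own volumes to see how the gap scales.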

See pricing Read why we cut prices

Realtime TTS

Keep every user engaged with natural, realtime text-to-speech

Ranked the #1 realtime TTS by real users on the Artificial Analysis Speech Arena. Sub-130ms first-chunk latency, priced down to $10 per million characters. Clone, design, steer, and stream natural responses.

Get Started View Docs Read the TTS-2 launch

Voice AI that feels human, across every industry

What customers shipping at scale are saying about Inworld.

“Inworld's TTS-2 marks a real step forward in emotionally expressive voice synthesis. When combined with the conversational intelligence of LiveKit agents, it enables interactions that feel genuinely human, responsive, nuanced, and alive in ways that feel natural.”

David Zhao

Co-Founder & CTO, LiveKit

“I've never seen steering work like this before TTS-2. The output is extremely natural and faithful to the steering prompt, even when it's hyper-specific. The biggest battle you fight with TTS is feeling bland, stale, and robotic. This level of steering unlocks a whole new axis to keep the experience fresh.”

Creston Brooks

Co-founder & CTO, Luvu

“We've always believed language learning should have no borders. TTS 2.0 just made that a lot more real.”

Dimitri Dekanozishvili

Co-founder, Talkpal

“We've been chasing the uncanny valley of voice AI for years. Inworld is finally closing the gap between 'impressive' and 'actually believable' with TTS 2.0. When your character speaks and you forget it's AI, that's when the story becomes real.”

Louis Muk

CEO, Isekai Zero

“We've had early access to Inworld TTS-2 for a few days and we're all blown away. The expressiveness, language steering and multi-lingual support are genuinely impressive. The subtle details like natural pausing make it hard to differentiate between AI and human.”

Nash Ramdial

Developer Relations, Stream

“Inworld just made voice AI feel genuinely human across 100+ languages. Partnering with them means we can help bring that experience to kids around the world, safely and compliantly.”

Kieran Donovan

CEO, k-ID

“AI Native games need characters you can deeply connect with. Voice models that offer full control and emotional complexity to make characters feel real is one of the biggest pieces missing. TTS 2 is a significant advance in helping make that future a reality.”

Nick Walton

CEO, Latitude

“Inworld was already at the top of the Artificial Analysis TTS Arena and Realtime TTS-2 pushes further on a dimension VoiceRun customers care about: directability. Style, pacing, emphasis, emotion, and delivery can be shaped in ways that matter for real enterprise deployments.”

Nick Leonard

CEO & Co-Founder, VoiceRun

“If you want top-of-the-line TTS integrated into your product, work with Inworld. It's easy to use, quick, high-quality, and a fraction of the price of comparable providers. Plus, the Inworld team is responsive and easy to work with.”

Sara Beykpour

Co-founder & CEO, Particle

“When we adopted Inworld TTS it was a game changer. Players immediately began mentioning how magical it was talking to our NPCs.”

Devin Reimer

Founder & CEO, AstroBeam

“When you work with external parties you never know what you're going to get, but Inworld has been really hands-on and fast. It's honestly been the most positive and best experience I've ever had working with another company on a problem like this.”

Fai Nur

CEO, Wishroll (Status)

“We found a path to profitability with Inworld. They helped us understand the base usage and keep costs per user low.”

Tabish Khan

Co-founder & CEO, Playroom (Death by AI)

“Working with Inworld helps us open up new possibilities for developers building expressive, real-time voice agents. The focus is always on giving builders access to great tools, and this integration fits perfectly with that mission.”

Jordan Dearsley

Co-Founder, Vapi

Realtime API

Controllable speech-to-speech that understands, reasons, and interacts

End-to-end speech-to-speech with custom voices and tool calling. Fully customizable. Optimize for cost, latency, or what your users care about most. One WebSocket call, one bill, every layer at the new lower rates.

Live conversation

Full duplex, low-latency streaming
Full-duplex audio streaming over a single WebSocket or WebRTC connection.

Intelligent turn taking
Context-aware turn detection with adjustable eagerness.

Function calling
Register tools mid-session. The assistant calls your functions without breaking audio.

Provider agnostic
Route to the model that fits your latency, cost, or quality requirements, and swap it out at any time.

Dynamic context management
Create, retrieve, delete, or truncate conversation items mid-session to control context length and token cost.

Conversational intelligence
Use acoustic and metadata signals to condition what is said, when it is said, and how it is expressed.
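
To make the dynamic context management idea concrete, here is a minimal client-side sketch of one possible trimming policy: drop the oldest non-system items until the conversation fits a token budget. Everything in it is an assumption for illustration. The `Item` record, the `items_to_delete` helper, and the word-count stand-in for a tokenizer are hypothetical and are not part of the Realtime API. The returned IDs are simply the items you would then ask the session to delete.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Hypothetical local record of one conversation item."""
    item_id: str
    role: str
    text: str

    @property
    def tokens(self) -> int:
        # Crude stand-in for a tokenizer: count whitespace-separated words.
        return len(self.text.split())


def items_to_delete(items: list[Item], budget: int) -> list[str]:
    """Return IDs of the oldest non-system items to drop so the rest fit the budget."""
    total = sum(item.tokens for item in items)
    doomed = []
    for item in items:  # items are assumed to be ordered oldest first
        if total <= budget:
            break
        if item.role == "system":
            continue  # always keep the instructions
        total -= item.tokens
        doomed.append(item.item_id)
    return doomed


history = [
    Item("item_1", "system", "You are a support agent."),
    Item("item_2", "user", "My order never arrived"),
    Item("item_3", "assistant", "Sorry to hear that. Can you share the order number?"),
    Item("item_4", "user", "It is 12345"),
]
print(items_to_delete(history, budget=20))
```

Dropping oldest-first is only one choice. Summarizing old turns or truncating long items are alternatives that trade token cost against recall.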

Realtime Router

Reason in realtime. Route to the best model and tools for every user and context

One API that intelligently routes requests across OpenAI, Anthropic, Google, and 220+ models. Zero markup: you pay provider rates and nothing else. Built-in analytics confirm whether the metrics you care about are improving. No added latency. Built-in failover, A/B testing, and intelligent model selection, with no code changes required.

# User-aware routing: pass user attributes as request metadata.
curl 'https://api.inworld.ai/v1/chat/completions' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "model": "inworld/user-aware",
    "messages": [{"role": "user", "content": "Hello"}],
    "extra_body": {
        "metadata": {
            "language": "es",
            "country": "MX",
            "plan": "free"
        }
    }
  }'

# Context-aware routing: pass conversation signals as request metadata.
curl 'https://api.inworld.ai/v1/chat/completions' \
  -H "Content-Type: application/json" \
  -H "Authorization: Basic $INWORLD_API_KEY" \
  -d '{
    "model": "inworld/context-aware",
    "messages": [
        {"role": "system", "content": "You are a support agent."},
        {"role": "user", "content": "My order never arrived"}
    ],
    "extra_body": {
        "metadata": {
            "intent": "complaint",
            "emotion": "frustrated",
            "session_turns": 4
        }
    }
  }'
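
The same request can be assembled programmatically. The sketch below mirrors the curl examples above using only the Python standard library. The endpoint, header scheme, model name, and body fields are copied from those examples. The helper name `build_router_request` is hypothetical. It builds the request without sending it, so nothing here assumes what the response looks like.

```python
import json
import os
import urllib.request

ROUTER_URL = "https://api.inworld.ai/v1/chat/completions"  # from the curl examples


def build_router_request(model, messages, metadata, api_key):
    """Build (but do not send) a Router chat-completions request.

    Hypothetical helper: the body shape mirrors the curl examples verbatim.
    """
    body = {
        "model": model,
        "messages": messages,
        "extra_body": {"metadata": metadata},
    }
    return urllib.request.Request(
        ROUTER_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {api_key}",
        },
        method="POST",
    )


request = build_router_request(
    model="inworld/user-aware",
    messages=[{"role": "user", "content": "Hello"}],
    metadata={"language": "es", "country": "MX", "plan": "free"},
    # Placeholder fallback so the sketch runs without credentials.
    api_key=os.environ.get("INWORLD_API_KEY", "<your-api-key>"),
)

# To send it: urllib.request.urlopen(request), then parse the JSON response.
```

Keeping request construction separate from sending makes it easy to unit-test the metadata you attach before any traffic reaches the Router.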

Top Models

Most popular models by % of Router traffic

#1 Gemini 3 Flash · 16%
#2 Qwen3 Max · 14%
#3 Claude Sonnet 4.6 · 12%
#4 Claude Opus 4.6 · 12%
#5 Grok 4.1 Fast · 10%
#6 Gemini 2.5 Flash
#7 GPT 5.2
#8 Kimi K2.5
#9 MiniMax M2.5
#10 Gemini 2.5 Flash Lite

Realtime STT

Speech-to-text that truly understands your users in realtime

Understand your users and their context in realtime with built-in voice profiling, along with state-of-the-art latency and accuracy. Realtime streaming down to $0.10 per hour.

Live transcription

Realtime streaming
Realtime, bidirectional streaming over WebSocket for live audio, or synchronous transcription for complete audio files.

Voice profiling: emotion, age, accent, pitch & style
Extract five real-time signals per audio chunk to understand who is speaking and how they feel.

Semantic & acoustic VAD
Automatically detect when speech starts and stops. Enable natural speech patterns.

Unified multi-provider API
A single integration point for industry-leading, high-accuracy transcription providers, with consistent authentication, request formatting, and response handling.

High accuracy & custom vocabulary
Transcribe audio with industry-leading accuracy. Add domain-specific terms, product names, and specialized vocabulary to boost recognition further.

Word-level timestamps & diarization
Per-word timing for subtitles and search. Label speakers in multi-party conversations.
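
As an illustration of what per-word timing enables, the sketch below turns a list of timed words into SRT subtitle cues. The input shape (dicts with `word`, `start`, and `end` in seconds) is an assumption made for this example, not the documented Realtime STT response format. Adapt the field names to whatever the API actually returns.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        start, end = chunk[0]["start"], chunk[-1]["end"]
        text = " ".join(w["word"] for w in chunk)
        cues.append(f"{len(cues) + 1}\n{srt_time(start)} --> {srt_time(end)}\n{text}\n")
    return "\n".join(cues)


# Hypothetical word timings, hand-written for illustration.
words = [
    {"word": "My", "start": 0.00, "end": 0.18},
    {"word": "order", "start": 0.18, "end": 0.52},
    {"word": "never", "start": 0.52, "end": 0.85},
    {"word": "arrived", "start": 0.85, "end": 1.40},
]
print(words_to_srt(words, max_words=2))
```

Grouping by a fixed word count keeps the sketch short. A production subtitle pipeline would more likely break cues on pauses or punctuation.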

Security

Build with confidence on secure AI infrastructure

Enterprise-grade security and compliance built into our AI platform. Built on a zero-trust framework with continuous monitoring, providing a secure foundation for teams building the next generation of AI.

More About Security

SOC2 Type II · Certified

HIPAA · Compliant

GDPR · Compliant

Voice AI, side-by-side comparison

Capability	Inworld	Google	ElevenLabs	Cartesia	OpenAI
Natural conversational delivery	Yes	Yes	Not stated	Not stated	Yes
Realtime latency	Yes	Not stated	Not stated	Yes	Not stated
Multi-turn aware speech synthesis	Yes	Not stated	Not stated	Not stated	Yes
Simple voice direction (inline tags)	Yes	Yes	Yes	Yes	Yes
Advanced voice direction (free-form descriptions)	Yes	Not stated	Not stated	Not stated	Yes
Voice cloning	Yes	Not stated	Yes	Yes	Not stated
Simple voice design (text-based)	Yes	Not stated	Yes	Not stated	Not stated
Advanced voice design	Yes	Not stated	Not stated	Not stated	Not stated
Crosslingual (single voice, over 100 languages)	Yes	Not stated	Yes	Not stated	Not stated
Voice profiling (understand user context)	Yes	Not stated	Not stated	Not stated	Not stated
Single customizable speech-to-speech API	Yes	Not stated	Not stated	Not stated	Not stated
User-aware LLM routing	Yes	Not stated	Not stated	Not stated	Not stated
Optimized alphanumeric support	Yes	Not stated	Yes	Not stated	Not stated