Providers

Overview of the LLM, TTS, and STT provider model in AvatarLayer.

AvatarLayer uses a pluggable provider model. Every AI capability — language model, text-to-speech, speech-to-text — is defined by a simple interface. The SDK ships with adapters for popular services, but you can implement any interface to add your own.

LLM providers

All LLM adapters implement the LLMProvider interface:

```typescript
interface LLMProvider {
  readonly id: string;
  chat(messages: ChatMessage[], opts?: LLMOptions): AsyncIterable<LLMChunk>;
}
```

The chat method returns an async iterable of LLMChunk objects. Each chunk contains a text delta and a done flag.
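
To make the contract concrete, here is a minimal sketch of a custom adapter and a consumer loop. Only the `LLMProvider` interface is taken from the docs above; the `ChatMessage`, `LLMChunk`, and `LLMOptions` shapes are assumptions for illustration.

```typescript
// Assumed supporting types; only LLMProvider itself is quoted from the docs.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }
interface LLMChunk { delta: string; done: boolean; }
interface LLMOptions { temperature?: number; signal?: AbortSignal; }

interface LLMProvider {
  readonly id: string;
  chat(messages: ChatMessage[], opts?: LLMOptions): AsyncIterable<LLMChunk>;
}

// A mock adapter that echoes the last user message back word by word,
// the way a streaming LLM emits deltas.
class EchoAdapter implements LLMProvider {
  readonly id = "echo";
  async *chat(messages: ChatMessage[], _opts?: LLMOptions): AsyncIterable<LLMChunk> {
    const last = messages[messages.length - 1]?.content ?? "";
    const words = last.split(" ");
    for (let i = 0; i < words.length; i++) {
      yield { delta: (i > 0 ? " " : "") + words[i], done: i === words.length - 1 };
    }
  }
}

// Consume the stream by concatenating deltas until the iterable completes.
async function collect(provider: LLMProvider, messages: ChatMessage[]): Promise<string> {
  let text = "";
  for await (const chunk of provider.chat(messages)) text += chunk.delta;
  return text;
}
```

Because `chat` returns a plain `AsyncIterable`, any `for await` loop works as a consumer, whether the chunks come from a mock like this or from a network stream.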

Built-in adapters

| Adapter | Package | Default model |
| --- | --- | --- |
| OpenAIAdapter | openai | gpt-5.4-mini |
| AnthropicAdapter | @anthropic-ai/sdk | claude-sonnet-4.6 |
| GeminiAdapter | @google/generative-ai | gemini-3-flash-preview |
| AzureOpenAIAdapter | openai | deployment name |
| GroqAdapter | openai | llama-4-scout-17b-16e-instruct |
| DeepSeekAdapter | openai | deepseek-chat |
| MistralAdapter | openai | mistral-small-latest |
| XAIAdapter | openai | grok-3-mini-fast |
| OpenRouterAdapter | openai | openai/gpt-4.1-mini |
| TogetherAdapter | openai | meta-llama/Llama-4-Scout-17B-16E-Instruct |
| FireworksAdapter | openai | accounts/fireworks/models/llama4-scout-instruct-basic |
| OllamaAdapter | openai | llama3.2 |
| PromptAPIAdapter | Chrome built-in | |

See LLM Adapters for constructor options and detailed usage.

TTS providers

All TTS adapters implement the TTSProvider interface:

```typescript
interface TTSProvider {
  readonly id: string;
  synthesize(text: string, opts?: TTSOptions): Promise<Blob>;
}
```

The synthesize method takes text and returns an audio blob (MP3 by default).
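
A stub implementation makes the contract easy to see. The `TTSOptions` fields below are assumed for illustration; only `TTSProvider` is quoted from the docs above.

```typescript
// Assumed options shape; only TTSProvider itself is quoted from the docs.
interface TTSOptions { voiceId?: string; signal?: AbortSignal; }

interface TTSProvider {
  readonly id: string;
  synthesize(text: string, opts?: TTSOptions): Promise<Blob>;
}

// A stub adapter that wraps placeholder bytes in a Blob instead of calling
// a real service: one byte per input character, standing in for encoded MP3.
class SilentAdapter implements TTSProvider {
  readonly id = "silent";
  async synthesize(text: string, _opts?: TTSOptions): Promise<Blob> {
    const bytes = new Uint8Array(text.length);
    return new Blob([bytes], { type: "audio/mpeg" });
  }
}
```

In a browser, the returned blob can be handed to `URL.createObjectURL` and an `<audio>` element for playback.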

Built-in adapters

| Adapter | Default voice | Default model |
| --- | --- | --- |
| ElevenLabsAdapter | Rachel (21m00Tcm4TlvDq8ikWAM) | eleven_multilingual_v2 |
| OpenAITTSAdapter | alloy | tts-1 |
| AzureTTSAdapter | en-US-JennyNeural | |
| GoogleTTSAdapter | en-US-Standard-C | |

See TTS Adapters for constructor options and detailed usage.

STT providers

AvatarLayer supports both batch and realtime speech-to-text: batch adapters implement STTProvider, while realtime adapters implement RealtimeSTTProvider.
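
This page does not reproduce the STT interfaces. By analogy with LLMProvider and TTSProvider, the batch side plausibly looks something like the sketch below; this is an assumed shape for orientation only, not the SDK's actual types (see STT Adapters for the real signatures).

```typescript
// Assumed sketch of a batch STT interface, mirroring TTSProvider's shape.
// The real AvatarLayer types may differ; nothing here is quoted from the SDK.
interface STTOptions { language?: string; signal?: AbortSignal; }

interface STTProvider {
  readonly id: string;
  transcribe(audio: Blob, opts?: STTOptions): Promise<string>;
}

// A stub that "transcribes" by reporting the audio size, to show the flow.
class StubSTT implements STTProvider {
  readonly id = "stub";
  async transcribe(audio: Blob, _opts?: STTOptions): Promise<string> {
    return `received ${audio.size} bytes`;
  }
}
```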

Batch adapters

| Adapter | Default model |
| --- | --- |
| OpenAISTTAdapter | whisper-1 |
| GoogleSTTAdapter | |
| AzureSTTAdapter | |

Realtime adapters

| Adapter | Transport | Default model |
| --- | --- | --- |
| DeepgramSTTAdapter | WebSocket | nova-3 |
| ElevenLabsSTTAdapter | WebSocket | |
| AzureSpeechSTTAdapter | WebSocket | |
| AmazonTranscribeSTTAdapter | WebSocket | |
| WebSpeechSTTAdapter | Browser API | |

See STT Adapters for constructor options, token URL patterns, and detailed usage.

LLM options

Options passed to the chat method or set on the session:

| Option | Type | Description |
| --- | --- | --- |
| `model` | `string` | Override the default model |
| `temperature` | `number` | Sampling temperature |
| `maxTokens` | `number` | Maximum output tokens |
| `reasoningEffort` | `"none" \| "low" \| "medium" \| "high"` | Extended thinking budget (Anthropic, OpenAI reasoning models) |
| `systemPrompt` | `string` | System message content |
| `language` | `string` | BCP-47 language code for the expected output (e.g. "en", "es", "ja") |
| `signal` | `AbortSignal` | Cancellation signal |
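
A typical options bag, including wiring up cancellation with a standard `AbortController`. The option names match the table above; the interface restated here is an assumed local copy for illustration.

```typescript
// Local restatement of the options shape from the table above, so this
// snippet is self-contained; the SDK exports its own LLMOptions type.
interface LLMOptions {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
  signal?: AbortSignal;
}

const controller = new AbortController();

const opts: LLMOptions = {
  model: "gpt-5.4-mini",        // override the adapter's default
  temperature: 0.7,
  maxTokens: 256,
  systemPrompt: "You are a concise assistant.",
  signal: controller.signal,    // call controller.abort() to cancel mid-stream
};
```

Calling `controller.abort()` while iterating the chunk stream lets the adapter stop the underlying request.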

TTS options

Options passed to the synthesize method:

| Option | Type | Description |
| --- | --- | --- |
| `voiceId` | `string` | Override the default voice |
| `modelId` | `string` | Override the default TTS model |
| `speed` | `number` | Playback speed |
| `outputFormat` | `string` | Audio format (default: mp3_44100_128) |
| `stability` | `number` | Voice stability (ElevenLabs) |
| `similarityBoost` | `number` | Similarity boost (ElevenLabs) |
| `signal` | `AbortSignal` | Cancellation signal |
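
A typical call might combine these as below. The option names match the table above; the interface restated here is an assumed local copy for illustration, and the provider-specific fields are simply ignored by adapters that do not support them.

```typescript
// Local restatement of the options shape from the table above, so this
// snippet is self-contained; the SDK exports its own TTSOptions type.
interface TTSOptions {
  voiceId?: string;
  modelId?: string;
  speed?: number;
  outputFormat?: string;
  stability?: number;       // ElevenLabs only
  similarityBoost?: number; // ElevenLabs only
  signal?: AbortSignal;
}

const ttsOpts: TTSOptions = {
  voiceId: "21m00Tcm4TlvDq8ikWAM", // Rachel, the ElevenLabs default
  outputFormat: "mp3_44100_128",
  speed: 1.1,
  stability: 0.5,
};
```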