# Providers

Overview of the LLM, TTS, and STT provider model in AvatarLayer.
AvatarLayer uses a pluggable provider model. Every AI capability (language model, text-to-speech, speech-to-text) is defined by a simple interface. The SDK ships with adapters for popular services, but you can also implement any of the interfaces yourself to bring your own service.
## LLM providers

All LLM adapters implement the `LLMProvider` interface:
```typescript
interface LLMProvider {
  readonly id: string;
  chat(messages: ChatMessage[], opts?: LLMOptions): AsyncIterable<LLMChunk>;
}
```

The `chat` method returns an async iterable of `LLMChunk` objects. Each chunk contains a text delta and a `done` flag.
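Streaming replies then reduces to a `for await...of` loop over the chunks. A minimal sketch, with local copies of the SDK types and a stub provider standing in for a real adapter (the stub and its canned reply are illustrative, not part of the SDK):

```typescript
// Minimal local copies of the SDK types, so this sketch runs standalone.
interface ChatMessage { role: "system" | "user" | "assistant"; content: string; }
interface LLMChunk { text: string; done: boolean; }
interface LLMProvider {
  readonly id: string;
  chat(messages: ChatMessage[], opts?: object): AsyncIterable<LLMChunk>;
}

// A stub provider that streams a canned reply, standing in for a real adapter.
const echoProvider: LLMProvider = {
  id: "echo",
  async *chat(_messages) {
    const parts = ["Hello", " from", " the", " provider"];
    for (let i = 0; i < parts.length; i++) {
      yield { text: parts[i], done: i === parts.length - 1 };
    }
  },
};

// Accumulate the streamed deltas into a full reply.
async function collectReply(provider: LLMProvider): Promise<string> {
  let reply = "";
  for await (const chunk of provider.chat([{ role: "user", content: "Hi" }])) {
    reply += chunk.text;
  }
  return reply;
}
```

Because every adapter yields the same chunk shape, the consuming loop is identical regardless of which service sits behind it.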
### Built-in adapters
| Adapter | Package | Default model |
|---|---|---|
| `OpenAIAdapter` | `openai` | `gpt-5.4-mini` |
| `AnthropicAdapter` | `@anthropic-ai/sdk` | `claude-sonnet-4.6` |
| `GeminiAdapter` | `@google/generative-ai` | `gemini-3-flash-preview` |
| `AzureOpenAIAdapter` | `openai` | deployment name |
| `GroqAdapter` | `openai` | `llama-4-scout-17b-16e-instruct` |
| `DeepSeekAdapter` | `openai` | `deepseek-chat` |
| `MistralAdapter` | `openai` | `mistral-small-latest` |
| `XAIAdapter` | `openai` | `grok-3-mini-fast` |
| `OpenRouterAdapter` | `openai` | `openai/gpt-4.1-mini` |
| `TogetherAdapter` | `openai` | `meta-llama/Llama-4-Scout-17B-16E-Instruct` |
| `FireworksAdapter` | `openai` | `accounts/fireworks/models/llama4-scout-instruct-basic` |
| `OllamaAdapter` | `openai` | `llama3.2` |
| `PromptAPIAdapter` | — | Chrome built-in |
See LLM Adapters for constructor options and detailed usage.
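Since every adapter above satisfies the same interface, providers compose. As one sketch of what that enables, here is a hypothetical fallback wrapper (not an SDK feature) that retries a request against a backup provider if the primary fails, using local copies of the types:

```typescript
// Minimal local copies of the SDK types, so this sketch runs standalone.
interface ChatMessage { role: string; content: string; }
interface LLMChunk { text: string; done: boolean; }
interface LLMProvider {
  readonly id: string;
  chat(messages: ChatMessage[], opts?: object): AsyncIterable<LLMChunk>;
}

// Wrap a primary and a backup provider: if the primary fails before the
// stream completes, rerun the whole request against the backup. (A production
// version would avoid re-emitting text the primary already yielded.)
function withFallback(primary: LLMProvider, backup: LLMProvider): LLMProvider {
  return {
    id: `${primary.id}+${backup.id}`,
    async *chat(messages, opts) {
      try {
        yield* primary.chat(messages, opts);
      } catch {
        yield* backup.chat(messages, opts);
      }
    },
  };
}
```

The wrapper is itself an `LLMProvider`, so it can be passed anywhere a plain adapter is accepted.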
## TTS providers

All TTS adapters implement the `TTSProvider` interface:
```typescript
interface TTSProvider {
  readonly id: string;
  synthesize(text: string, opts?: TTSOptions): Promise<Blob>;
}
```

The `synthesize` method takes text and returns an audio blob (MP3 by default).
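In a browser, the returned blob can be played directly via an object URL. A sketch with local copies of the types and a stub provider standing in for a real adapter (the stub returns silence rather than real audio):

```typescript
// Minimal local copies of the SDK types, so this sketch runs standalone.
interface TTSOptions { voiceId?: string; }
interface TTSProvider {
  readonly id: string;
  synthesize(text: string, opts?: TTSOptions): Promise<Blob>;
}

// A stub provider returning a kilobyte of zero bytes in place of real audio.
const stubTTS: TTSProvider = {
  id: "stub",
  async synthesize(_text) {
    return new Blob([new Uint8Array(1024)], { type: "audio/mpeg" });
  },
};

// Browser-only: play the synthesized audio straight from an object URL.
async function speak(provider: TTSProvider, text: string): Promise<void> {
  const blob = await provider.synthesize(text);
  const audio = new Audio(URL.createObjectURL(blob));
  await audio.play();
}
```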
### Built-in adapters
| Adapter | Default voice | Default model |
|---|---|---|
| `ElevenLabsAdapter` | Rachel (`21m00Tcm4TlvDq8ikWAM`) | `eleven_multilingual_v2` |
| `OpenAITTSAdapter` | `alloy` | `tts-1` |
| `AzureTTSAdapter` | `en-US-JennyNeural` | — |
| `GoogleTTSAdapter` | `en-US-Standard-C` | — |
See TTS Adapters for constructor options and detailed usage.
## STT providers

AvatarLayer supports both batch and realtime speech-to-text: batch adapters implement `STTProvider`, while realtime adapters implement `RealtimeSTTProvider`.
### Batch adapters
| Adapter | Default model |
|---|---|
| `OpenAISTTAdapter` | `whisper-1` |
| `GoogleSTTAdapter` | — |
| `AzureSTTAdapter` | — |
### Realtime adapters
| Adapter | Transport | Default model |
|---|---|---|
| `DeepgramSTTAdapter` | WebSocket | `nova-3` |
| `ElevenLabsSTTAdapter` | WebSocket | — |
| `AzureSpeechSTTAdapter` | WebSocket | — |
| `AmazonTranscribeSTTAdapter` | WebSocket | — |
| `WebSpeechSTTAdapter` | Browser API | — |
See STT Adapters for constructor options, token URL patterns, and detailed usage.
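This page does not reproduce the STT interfaces. By analogy with `LLMProvider` and `TTSProvider` above, a batch provider plausibly maps audio to a transcript while a realtime provider streams partial results over a persistent connection. The shapes below are illustrative assumptions only, not the SDK's actual definitions; consult STT Adapters for the real signatures:

```typescript
// Illustrative assumption of a batch shape: one audio blob in, one transcript out.
interface STTProviderSketch {
  readonly id: string;
  transcribe(audio: Blob, opts?: { language?: string }): Promise<string>;
}

// Illustrative assumption of a realtime shape: audio chunks in, partial
// transcripts streamed back over a persistent (e.g. WebSocket) connection.
interface RealtimeSTTProviderSketch {
  readonly id: string;
  start(opts?: { language?: string }): Promise<void>;
  sendAudio(chunk: ArrayBuffer): void;
  onTranscript(cb: (text: string, isFinal: boolean) => void): void;
  stop(): Promise<void>;
}

// A trivial in-memory implementation of the batch sketch, for illustration only.
const fakeSTT: STTProviderSketch = {
  id: "fake",
  async transcribe(audio) {
    return `received ${audio.size} bytes`;
  },
};
```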
## LLM options

Options passed to the `chat` method or set on the session:
| Option | Type | Description |
|---|---|---|
| `model` | `string` | Override the default model |
| `temperature` | `number` | Sampling temperature |
| `maxTokens` | `number` | Maximum output tokens |
| `reasoningEffort` | `"none" \| "low" \| "medium" \| "high"` | Extended thinking budget (Anthropic, OpenAI reasoning models) |
| `systemPrompt` | `string` | System message content |
| `language` | `string` | BCP-47 language code for the expected output (e.g. `"en"`, `"es"`, `"ja"`) |
| `signal` | `AbortSignal` | Cancellation signal |
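The `signal` option follows the standard `AbortController` pattern: abort the controller to stop a stream mid-flight. A sketch with local copies of the option types and a stub stream standing in for a real adapter (`"my-model"` is a placeholder, not a real model id):

```typescript
// Minimal local copies of the SDK types, so this sketch runs standalone.
interface LLMOptions {
  model?: string;
  temperature?: number;
  maxTokens?: number;
  systemPrompt?: string;
  signal?: AbortSignal;
}
interface LLMChunk { text: string; done: boolean; }

// A stub stream that checks the signal between chunks, standing in for a
// real adapter's chat method.
async function* stubChat(opts?: LLMOptions): AsyncIterable<LLMChunk> {
  for (let i = 0; i < 100; i++) {
    if (opts?.signal?.aborted) throw new Error("aborted");
    yield { text: "x", done: i === 99 };
  }
}

const controller = new AbortController();
const opts: LLMOptions = {
  model: "my-model", // placeholder model id
  temperature: 0.7,
  maxTokens: 256,
  signal: controller.signal,
};

async function run(): Promise<number> {
  let count = 0;
  try {
    for await (const _chunk of stubChat(opts)) {
      if (++count === 3) controller.abort(); // cancel after three chunks
    }
  } catch {
    // stream ended early via the abort signal
  }
  return count;
}
```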
## TTS options

Options passed to the `synthesize` method:
| Option | Type | Description |
|---|---|---|
| `voiceId` | `string` | Override the default voice |
| `modelId` | `string` | Override the default TTS model |
| `speed` | `number` | Playback speed |
| `outputFormat` | `string` | Audio format (default: `mp3_44100_128`) |
| `stability` | `number` | Voice stability (ElevenLabs) |
| `similarityBoost` | `number` | Similarity boost (ElevenLabs) |
| `signal` | `AbortSignal` | Cancellation signal |
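Provider-specific knobs like `stability` and `similarityBoost` travel in the same options bag as the generic ones. A sketch with a local copy of the option type and a stub `synthesize` that just records what it received (the voice id is a placeholder, not a real ElevenLabs voice):

```typescript
// Minimal local copy of the SDK options type, so this sketch runs standalone.
interface TTSOptions {
  voiceId?: string;
  modelId?: string;
  speed?: number;
  outputFormat?: string;
  stability?: number;
  similarityBoost?: number;
}

// A stub that records the options it was called with, standing in for a
// real adapter's synthesize method.
let lastOpts: TTSOptions | undefined;
async function synthesize(_text: string, opts?: TTSOptions): Promise<Blob> {
  lastOpts = opts;
  return new Blob([], { type: "audio/mpeg" });
}

// Generic and ElevenLabs-specific options side by side in one call.
const pending = synthesize("Hello there", {
  voiceId: "my-voice-id", // placeholder voice id
  speed: 1.1,
  stability: 0.5,
  similarityBoost: 0.75,
});
```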