Custom Adapters

Implement your own LLM, TTS, or renderer adapters.

AvatarLayer's provider model is interface-based. Implement any of the provider interfaces to add your own integrations.

Custom LLM

Implement the LLMProvider interface:

import type {
  LLMProvider,
  ChatMessage,
  LLMChunk,
  LLMOptions,
} from "avatarlayer";

class MyLLM implements LLMProvider {
  readonly id = "my-llm";

  async *chat(
    messages: ChatMessage[],
    opts?: LLMOptions,
  ): AsyncIterable<LLMChunk> {
    const response = await fetch("https://my-llm-api.com/chat", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ messages, model: opts?.model }),
      signal: opts?.signal,
    });

    if (!response.ok || !response.body) {
      throw new Error(`LLM request failed: ${response.status}`);
    }

    const reader = response.body.getReader();
    const decoder = new TextDecoder();

    while (true) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters split across chunks intact
      yield { text: decoder.decode(value, { stream: true }), done: false };
    }

    yield { text: "", done: true };
  }
}

Key requirements:

  • The chat method must be an async generator yielding LLMChunk objects
  • The final chunk should have done: true
  • Respect opts.signal for cancellation support
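To see the contract in action, here is a minimal sketch of how a consumer (such as AvatarSession) drives the generator — accumulating chunk text until `done: true`. The `fakeChat` generator and `collect` helper below are illustrative stand-ins, not avatarlayer APIs; only the `LLMChunk` shape mirrors the interface above.

```typescript
// Local stand-in for the avatarlayer LLMChunk shape (assumed).
type LLMChunk = { text: string; done: boolean };

// A tiny in-memory generator with the same shape as MyLLM.chat.
async function* fakeChat(): AsyncIterable<LLMChunk> {
  for (const text of ["Hel", "lo"]) yield { text, done: false };
  yield { text: "", done: true };
}

// Consume the stream the way a session would: accumulate until done.
async function collect(stream: AsyncIterable<LLMChunk>): Promise<string> {
  let out = "";
  for await (const chunk of stream) {
    out += chunk.text;
    if (chunk.done) break;
  }
  return out;
}

collect(fakeChat()).then((t) => console.log(t)); // "Hello"
```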

Custom TTS

Implement the TTSProvider interface:

import type { TTSProvider, TTSOptions } from "avatarlayer";

class MyTTS implements TTSProvider {
  readonly id = "my-tts";

  async synthesize(text: string, opts?: TTSOptions): Promise<Blob> {
    const response = await fetch("https://my-tts-api.com/synthesize", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        text,
        voice: opts?.voiceId ?? "default",
      }),
      signal: opts?.signal,
    });

    if (!response.ok) {
      throw new Error(`TTS request failed: ${response.status}`);
    }

    return response.blob();
  }
}

Key requirements:

  • Return an audio Blob that the browser can decode (MP3, WAV, OGG, etc.)
  • Respect opts.signal for cancellation support
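The cancellation requirement means a pending synthesize() call should reject with an AbortError when the signal fires. A minimal sketch of that behavior, using a simulated slow request in place of a real TTS backend (`SlowTTS` is a hypothetical stand-in, not avatarlayer code):

```typescript
// Local stand-in for the avatarlayer TTSOptions shape (assumed).
type TTSOptions = { voiceId?: string; signal?: AbortSignal };

class SlowTTS {
  async synthesize(text: string, opts?: TTSOptions): Promise<Blob> {
    // Simulate a slow network request that honors cancellation.
    await new Promise<void>((resolve, reject) => {
      const t = setTimeout(resolve, 1000);
      opts?.signal?.addEventListener("abort", () => {
        clearTimeout(t);
        reject(new DOMException("Aborted", "AbortError"));
      });
    });
    return new Blob([text], { type: "audio/wav" });
  }
}

const ctrl = new AbortController();
const pending = new SlowTTS().synthesize("hello", { signal: ctrl.signal });
ctrl.abort(); // interruption: the pending synthesis rejects
pending.catch((err) => console.log(err.name)); // "AbortError"
```

When you delegate to fetch as in MyTTS above, passing opts.signal through gives you this behavior for free.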

Custom renderer

Implement the AvatarRenderer interface:

import type { AvatarRenderer, AvatarControl } from "avatarlayer";

class MyRenderer implements AvatarRenderer {
  readonly id = "my-renderer";
  readonly type = "local" as const;

  private container: HTMLElement | null = null;
  private audio: HTMLAudioElement | null = null;
  private resolve: (() => void) | null = null;

  async mount(container: HTMLElement): Promise<void> {
    this.container = container;
    // Set up your rendering surface (canvas, video, etc.)
  }

  update(control: Partial<AvatarControl>): void {
    // Apply avatar state changes (face, emotion, etc.)
  }

  async speak(audio: Blob): Promise<void> {
    return new Promise((resolve) => {
      this.resolve = resolve;
      const url = URL.createObjectURL(audio);
      this.audio = new Audio(url);
      this.audio.onended = () => {
        URL.revokeObjectURL(url);
        this.resolve = null;
        resolve();
      };
      // If playback fails (e.g. autoplay blocked), resolve instead of hanging
      this.audio.play().catch(() => {
        URL.revokeObjectURL(url);
        this.resolve = null;
        resolve();
      });
    });
  }

  interrupt(): void {
    if (this.audio) {
      this.audio.pause();
      URL.revokeObjectURL(this.audio.src); // release the blob URL created in speak()
      this.audio = null;
    }
    if (this.resolve) {
      this.resolve();
      this.resolve = null;
    }
  }

  unmount(): void {
    this.interrupt();
    this.container = null;
  }
}

Key requirements:

  • mount() must resolve once the renderer is ready to display
  • speak() must resolve when audio playback finishes
  • interrupt() must stop playback immediately and resolve any pending speak() promise
  • unmount() must clean up all resources
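The trickiest of these requirements is the interplay between speak() and interrupt(): a pending speak() promise must settle when interrupt() is called, or the session will hang waiting for playback that never finishes. A stripped-down mock (no real audio; the method names mirror AvatarRenderer, but this is not avatarlayer code) isolates that contract:

```typescript
// Minimal mock exercising the speak()/interrupt() contract described above.
class MockRenderer {
  private resolve: (() => void) | null = null;

  speak(): Promise<void> {
    // Hold the resolver so interrupt() can settle the promise early.
    return new Promise((resolve) => {
      this.resolve = resolve;
    });
  }

  interrupt(): void {
    this.resolve?.();
    this.resolve = null;
  }
}

const r = new MockRenderer();
const speaking = r.speak();
r.interrupt(); // must settle the pending speak() promise
speaking.then(() => console.log("speak resolved")); // "speak resolved"
```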

Optional: speakText

If your renderer handles TTS internally (as HeyGen's does), implement speakText; AvatarSession will then skip the external TTS provider:

async speakText(text: string, signal?: AbortSignal): Promise<void> {
  // Send text to your service, wait for speech to complete
}
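The dispatch this enables can be sketched as a capability check: if the renderer exposes speakText, send it text directly; otherwise synthesize audio first. The `Renderer` type, `hasSpeakText` guard, and `say` function below are hypothetical illustrations of that logic, not avatarlayer internals:

```typescript
// Local stand-in shape for a renderer with optional built-in TTS (assumed).
type Renderer = {
  speak(audio: Blob): Promise<void>;
  speakText?(text: string, signal?: AbortSignal): Promise<void>;
};

// Narrow the type when speakText is present.
function hasSpeakText(
  r: Renderer,
): r is Renderer & { speakText(text: string, signal?: AbortSignal): Promise<void> } {
  return typeof r.speakText === "function";
}

// Prefer built-in TTS; fall back to an external synthesize step.
async function say(
  r: Renderer,
  text: string,
  synthesize: (t: string) => Promise<Blob>,
): Promise<void> {
  if (hasSpeakText(r)) return r.speakText(text);
  return r.speak(await synthesize(text));
}
```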

Using custom adapters

const session = new AvatarSession({
  llm: new MyLLM(),
  tts: new MyTTS(),
  renderer: new MyRenderer(),
});

Custom adapters are first-class citizens — they work with all session features including interruption, runtime swaps, and React bindings.