Renderers

HeyGen

Streaming avatars with built-in TTS via HeyGen's API.

HeyGenRenderer connects to HeyGen's Streaming Avatar API. Unlike other renderers, HeyGen handles TTS internally — you send text and HeyGen returns lip-synced video and audio. This means no separate TTS provider is needed.

Installation

The renderer receives video and audio through a LiveKit room, so livekit-client must be installed alongside avatarlayer:

npm install livekit-client

Usage

import { HeyGenRenderer } from "avatarlayer";

const renderer = new HeyGenRenderer({
  createSession: async () => {
    const resp = await fetch("/api/heygen", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ heygenApiKey: "..." }),
    });
    return resp.json();
  },
  sendTask: async (sessionId, text) => {
    const resp = await fetch("/api/heygen/task", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ heygenApiKey: "...", sessionId, text }),
    });
    return resp.json();
  },
  interruptSession: async (sessionId) => {
    await fetch("/api/heygen/interrupt", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ heygenApiKey: "...", sessionId }),
    });
  },
  closeSession: async (sessionId) => {
    await fetch(`/api/heygen/${sessionId}`, {
      method: "DELETE",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ heygenApiKey: "..." }),
    });
  },
});

Constructor options

| Option | Type | Description |
| --- | --- | --- |
| createSession | () => Promise<HeyGenSession> | Required. Creates a HeyGen streaming session and returns LiveKit connection info. |
| sendTask | (sessionId: string, text: string) => Promise<HeyGenTaskResult> | Required. Sends text to HeyGen for speech synthesis. |
| interruptSession | (sessionId: string) => Promise<void> | Optional. Interrupts the current speech task. |
| closeSession | (sessionId: string) => Promise<void> | Optional. Closes the session when done. |
| onVideoStream | (stream: MediaStream \| null) => void | Called when the video track is received or lost. |
| onAudioStream | (stream: MediaStream \| null) => void | Called when the audio track is received or lost. |
| onStateChange | (state: string) => void | Called on connection state changes. |

Session and task types

interface HeyGenSession {
  sessionId: string;
  livekitUrl: string;
  livekitToken: string;
}

interface HeyGenTaskResult {
  taskId?: string | null;
  durationMs?: number | null;
}
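Your createSession endpoint is responsible for translating HeyGen's raw session response into this shape. As a rough sketch, a mapper might look like the following; the raw field names (session_id, url, access_token) are assumptions about HeyGen's streaming API response and should be verified against HeyGen's current API reference:

```typescript
// HeyGenSession as defined above.
interface HeyGenSession {
  sessionId: string;
  livekitUrl: string;
  livekitToken: string;
}

// Assumed shape of HeyGen's raw session-creation response -- verify
// these field names against HeyGen's API documentation.
interface RawStreamingSessionResponse {
  data: {
    session_id: string;
    url: string;          // LiveKit server URL
    access_token: string; // LiveKit room token
  };
}

// Map the raw API response to the HeyGenSession the renderer expects.
function toHeyGenSession(raw: RawStreamingSessionResponse): HeyGenSession {
  return {
    sessionId: raw.data.session_id,
    livekitUrl: raw.data.url,
    livekitToken: raw.data.access_token,
  };
}
```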

How speakText works

Because HeyGen handles TTS internally, the renderer implements speakText() instead of relying on speak(audio):

  1. AvatarSession detects that the renderer has speakText and skips external TTS
  2. For each sentence, speakText(text) calls sendTask on your server endpoint
  3. HeyGen synthesizes speech and streams lip-synced video through the LiveKit room
  4. The promise resolves after an estimated speech duration (from durationMs or word count heuristic)
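The word-count fallback in step 4 can be sketched as follows. This is a hypothetical illustration, not the renderer's actual code, and the 150-words-per-minute rate is an assumed average speaking speed:

```typescript
// Assumed average speaking rate; the renderer's real heuristic may differ.
const WORDS_PER_MINUTE = 150;

// Estimate how long HeyGen will take to speak `text`, used to resolve
// the speakText() promise when the task result has no durationMs.
function estimateSpeechDurationMs(text: string): number {
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  return Math.round((words / WORDS_PER_MINUTE) * 60_000);
}
```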

Session configuration

When constructing the AvatarSession, the TTS provider can be omitted because HeyGen synthesizes speech itself:

const session = new AvatarSession({
  llm: new OpenAIAdapter({ apiKey: "..." }),
  // No TTS needed — HeyGen handles it
  renderer: heygenRenderer,
  systemPrompt: "You are a helpful assistant.",
});
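AvatarSession can skip external TTS because it probes the renderer for a speakText method, as described above. A minimal sketch of that capability check (the function name here is hypothetical, not part of avatarlayer's API):

```typescript
// Returns true when a renderer synthesizes speech itself (exposes
// speakText), meaning no external TTS provider is required.
function rendererHandlesTts(renderer: unknown): boolean {
  return typeof (renderer as { speakText?: unknown }).speakText === "function";
}
```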