Local ML

On-device TTS, STT, VAD, and embeddings via WebGPU and WASM.

The avatarlayer/local subpath exports adapters that run entirely in the browser — no API keys or server calls required. These use WebGPU, WASM, and ONNX Runtime for on-device inference.

import { KokoroTTSAdapter, SileroVADAdapter } from "avatarlayer/local";

WebGPU support

Check whether the browser supports WebGPU and detect the best inference device:

import { isWebGPUSupported, detectBestDevice } from "avatarlayer/local";

if (isWebGPUSupported()) {
  const device = await detectBestDevice(); // "webgpu" | "wasm" | "cpu"
}
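
A minimal fallback sketch using only the two helpers above: detect the best backend once and branch on the result. The branching policy here is illustrative, not part of the library.

```typescript
import { isWebGPUSupported, detectBestDevice } from "avatarlayer/local";

// Illustrative policy: prefer WebGPU; otherwise accept the slower
// WASM/CPU path (or hand off to a server-backed adapter instead).
const device = isWebGPUSupported() ? await detectBestDevice() : "cpu";

if (device === "webgpu") {
  // Fast path: heavier local models are practical here.
} else {
  // WASM/CPU path: expect noticeably higher inference latency.
}
```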

TTS adapters

KokoroTTSAdapter

High-quality on-device TTS using the Kokoro model.

import { KokoroTTSAdapter } from "avatarlayer/local";

const tts = new KokoroTTSAdapter({
  voice: "af_heart",     // optional
  speed: 1.0,             // optional
});
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| voice | string | — | Voice preset |
| speed | number | 1.0 | Speech speed multiplier |
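
This section doesn't show how a TTS adapter is handed to a session. As a sketch, assuming AvatarSession (imported from the core avatarlayer package; both the import path and the tts config key are assumptions, not confirmed here):

```typescript
import { AvatarSession } from "avatarlayer"; // core package (import path assumed)
import { KokoroTTSAdapter } from "avatarlayer/local";

// Sketch only: the `tts` option key is an assumption.
const session = new AvatarSession({
  tts: new KokoroTTSAdapter({ voice: "af_heart", speed: 1.0 }),
});
```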

KittenTTSAdapter

Lightweight on-device TTS using the Kitten model.

import { KittenTTSAdapter } from "avatarlayer/local";

const tts = new KittenTTSAdapter({
  model: "KittenML/kitten-tts-nano-0.8",  // optional
  voice: "Bella",                           // optional
  speed: 1.0,                               // optional
});
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | "KittenML/kitten-tts-nano-0.8" | Model identifier |
| voice | string | "Bella" | Voice preset |
| speed | number | 1.0 | Speech speed multiplier |

STT adapters

WhisperLocalAdapter

On-device Whisper STT using ONNX Runtime. Batch transcription only.

import { WhisperLocalAdapter } from "avatarlayer/local";

const stt = new WhisperLocalAdapter();
const text = await stt.transcribe(audioBlob);
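
transcribe() takes an audio Blob. One standard way to produce one in the browser is the MediaRecorder API; everything below except the transcribe() call is plain web-platform code:

```typescript
import { WhisperLocalAdapter } from "avatarlayer/local";

const stt = new WhisperLocalAdapter();

// Record ~5 seconds of microphone audio with the standard
// MediaRecorder API, then hand the resulting Blob to transcribe().
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream);
const chunks: BlobPart[] = [];

recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = async () => {
  const audioBlob = new Blob(chunks, { type: recorder.mimeType });
  console.log(await stt.transcribe(audioBlob));
  stream.getTracks().forEach((t) => t.stop()); // release the microphone
};

recorder.start();
setTimeout(() => recorder.stop(), 5000);
```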

WhisperTransformersAdapter

On-device Whisper STT using Transformers.js.

import { WhisperTransformersAdapter } from "avatarlayer/local";

const stt = new WhisperTransformersAdapter({
  model: "onnx-community/whisper-base",  // optional
});
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| model | string | — | Whisper model identifier |

Realtime STT (local)

Both Whisper adapters have realtime variants that pair with SileroVADAdapter for voice-activated transcription:

import {
  WhisperTransformersAdapter,
  WhisperTransformersRealtimeAdapter,
  SileroVADAdapter,
} from "avatarlayer/local";

const whisper = new WhisperTransformersAdapter();
const vad = new SileroVADAdapter();
await vad.init();

const realtimeSTT = new WhisperTransformersRealtimeAdapter(whisper, vad);

The realtime adapter uses VAD to detect speech segments, transcribes them with Whisper, and emits transcript events — fully compatible with startListening().

WhisperLocalRealtimeAdapter works the same way with WhisperLocalAdapter.
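
Mirroring the snippet above, the ONNX Runtime variant wires up the same way:

```typescript
import {
  WhisperLocalAdapter,
  WhisperLocalRealtimeAdapter,
  SileroVADAdapter,
} from "avatarlayer/local";

const vad = new SileroVADAdapter();
await vad.init();

// Same constructor shape: batch adapter + VAD.
const realtimeSTT = new WhisperLocalRealtimeAdapter(new WhisperLocalAdapter(), vad);
```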

VAD

SileroVADAdapter

Neural-network voice activity detection using the Silero VAD ONNX model. More accurate than AmplitudeVADAdapter, especially in noisy environments.

import { SileroVADAdapter } from "avatarlayer/local";

const vad = new SileroVADAdapter({
  positiveSpeechThreshold: 0.5,   // default
  negativeSpeechThreshold: 0.35,  // default
  minSpeechMs: 250,                // default
  silenceDurationMs: 500,          // default
});

await vad.init();
| Option | Type | Default | Description |
| --- | --- | --- | --- |
| positiveSpeechThreshold | number | 0.5 | Speech probability threshold to start speech |
| negativeSpeechThreshold | number | 0.35 | Threshold below which speech ends |
| minSpeechMs | number | 250 | Min speech duration before committing |
| silenceDurationMs | number | 500 | Silence duration before speech end |
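
The two thresholds form a hysteresis band: a frame's speech probability must reach positiveSpeechThreshold to enter the speaking state, but only has to stay above negativeSpeechThreshold to remain in it, which prevents flickering near the boundary. A self-contained sketch of that state update, using the documented defaults (illustrative only; the real logic lives inside SileroVADAdapter):

```typescript
const POSITIVE_THRESHOLD = 0.5;  // positiveSpeechThreshold default
const NEGATIVE_THRESHOLD = 0.35; // negativeSpeechThreshold default

// Given the current speaking state and this frame's speech
// probability, return the next speaking state.
function nextSpeakingState(speaking: boolean, speechProb: number): boolean {
  if (!speaking) return speechProb >= POSITIVE_THRESHOLD; // need a strong onset to start
  return speechProb >= NEGATIVE_THRESHOLD; // keep speaking until prob drops well below onset
}
```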

Embeddings

TransformersEmbeddingProvider

On-device text embeddings using Transformers.js. Use with vector thread providers for client-side semantic recall.

import { TransformersEmbeddingProvider } from "avatarlayer/local";

const embedder = new TransformersEmbeddingProvider({
  model: "Xenova/all-MiniLM-L6-v2",  // optional
});
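
Semantic recall ranks stored messages by vector similarity, and cosine similarity is the usual metric. The helper below is self-contained standard math; how you obtain vectors from the provider is not shown on this page, so any method name you'd call for that is an assumption.

```typescript
// Cosine similarity between two embedding vectors: dot product
// divided by the product of the vectors' magnitudes.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Ranking sketch: score each stored vector against the query vector
// and keep the topK highest, which is what semantic recall does.
```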

Use it with LocalStorageVectorThreadProvider for fully offline semantic memory:

import { AvatarSession } from "avatarlayer"; // core package (import path assumed)
import {
  LocalStorageVectorThreadProvider,
  TransformersEmbeddingProvider,
} from "avatarlayer/local";

const session = new AvatarSession({
  // ...config
  memory: {
    provider: new LocalStorageVectorThreadProvider(),
    semanticRecall: {
      embedder: new TransformersEmbeddingProvider(),
      topK: 5,
    },
  },
});