# Local ML
On-device TTS, STT, VAD, and embeddings via WebGPU and WASM.
The `avatarlayer/local` subpath exports adapters that run entirely in the browser — no API keys or server calls required. These use WebGPU, WASM, and ONNX Runtime for on-device inference.
```ts
import { KokoroTTSAdapter, SileroVADAdapter } from "avatarlayer/local";
```

## WebGPU support
Check whether the browser supports WebGPU and detect the best inference device:
```ts
import { isWebGPUSupported, detectBestDevice } from "avatarlayer/local";

if (isWebGPUSupported()) {
  const device = await detectBestDevice(); // "webgpu" | "wasm" | "cpu"
}
```
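Since Kokoro is the higher-quality model and Kitten the lightweight one (both documented below), one pattern this enables is picking a model for the detected hardware. A sketch; the pairing is illustrative, not prescribed by the library:

```ts
import {
  detectBestDevice,
  KokoroTTSAdapter,
  KittenTTSAdapter,
} from "avatarlayer/local";

// Illustrative pairing: prefer the high-quality Kokoro model on WebGPU,
// fall back to the lightweight Kitten model on wasm/cpu.
const device = await detectBestDevice();
const tts =
  device === "webgpu" ? new KokoroTTSAdapter() : new KittenTTSAdapter();
```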
## TTS adapters

### KokoroTTSAdapter
High-quality on-device TTS using the Kokoro model.
```ts
import { KokoroTTSAdapter } from "avatarlayer/local";

const tts = new KokoroTTSAdapter({
  voice: "af_heart", // optional
  speed: 1.0, // optional
});
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `voice` | `string` | — | Voice preset |
| `speed` | `number` | `1.0` | Speech speed multiplier |
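A TTS adapter is normally handed to the session rather than driven directly. A minimal wiring sketch, assuming `AvatarSession` is imported from the package root and accepts a `tts` option (both are assumptions; this page does not document the session config):

```ts
import { AvatarSession } from "avatarlayer"; // assumed import path
import { KokoroTTSAdapter } from "avatarlayer/local";

// Hypothetical wiring: the `tts` key is an assumption, not confirmed here.
const session = new AvatarSession({
  tts: new KokoroTTSAdapter({ voice: "af_heart" }),
});
```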
### KittenTTSAdapter
Lightweight on-device TTS using the Kitten model.
```ts
import { KittenTTSAdapter } from "avatarlayer/local";

const tts = new KittenTTSAdapter({
  model: "KittenML/kitten-tts-nano-0.8", // optional
  voice: "Bella", // optional
  speed: 1.0, // optional
});
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `string` | `"KittenML/kitten-tts-nano-0.8"` | Model identifier |
| `voice` | `string` | `"Bella"` | Voice preset |
| `speed` | `number` | `1.0` | Speech speed multiplier |
## STT adapters

### WhisperLocalAdapter
On-device Whisper STT using ONNX Runtime. Batch transcription only.
```ts
import { WhisperLocalAdapter } from "avatarlayer/local";

const stt = new WhisperLocalAdapter();
const text = await stt.transcribe(audioBlob);
```
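The `audioBlob` is any browser `Blob` of recorded audio. A minimal capture sketch using only standard browser APIs (nothing avatarlayer-specific), feeding the `stt` instance above:

```ts
// Record a short microphone clip, then run batch transcription on it.
const stream = await navigator.mediaDevices.getUserMedia({ audio: true });
const recorder = new MediaRecorder(stream);
const chunks: Blob[] = [];

recorder.ondataavailable = (e) => chunks.push(e.data);
recorder.onstop = async () => {
  const audioBlob = new Blob(chunks, { type: recorder.mimeType });
  console.log(await stt.transcribe(audioBlob));
};

recorder.start();
setTimeout(() => recorder.stop(), 5000); // stop after five seconds
```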
### WhisperTransformersAdapter

On-device Whisper STT using Transformers.js.
```ts
import { WhisperTransformersAdapter } from "avatarlayer/local";

const stt = new WhisperTransformersAdapter({
  model: "onnx-community/whisper-base", // optional
});
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `model` | `string` | — | Whisper model identifier |
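Assuming it shares `WhisperLocalAdapter`'s batch `transcribe(blob)` interface (an assumption; only `WhisperLocalAdapter`'s usage is shown on this page), and given an audio `Blob` recorded as in the example above:

```ts
// Assumption: same batch interface as WhisperLocalAdapter.
const text = await stt.transcribe(audioBlob);
```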
### Realtime STT (local)
Both Whisper adapters have realtime variants that pair with `SileroVADAdapter` for voice-activated transcription:
```ts
import {
  WhisperTransformersAdapter,
  WhisperTransformersRealtimeAdapter,
  SileroVADAdapter,
} from "avatarlayer/local";

const whisper = new WhisperTransformersAdapter();
const vad = new SileroVADAdapter();
await vad.init();

const realtimeSTT = new WhisperTransformersRealtimeAdapter(whisper, vad);
```

The realtime adapter uses VAD to detect speech segments, transcribes them with Whisper, and emits transcript events — fully compatible with `startListening()`.
`WhisperLocalRealtimeAdapter` works the same way with `WhisperLocalAdapter`.
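A wiring sketch for the `realtimeSTT` instance above, assuming `startListening()` lives on `AvatarSession` and the session accepts an `stt` option (both are assumptions; this page only states the adapter is compatible with `startListening()`):

```ts
import { AvatarSession } from "avatarlayer"; // assumed import path

// Hypothetical wiring: the `stt` key is an assumption, not confirmed here.
const session = new AvatarSession({
  stt: realtimeSTT,
});
await session.startListening();
```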
## VAD

### SileroVADAdapter
Neural-network voice activity detection using the Silero VAD ONNX model. More accurate than `AmplitudeVADAdapter`, especially in noisy environments.
```ts
import { SileroVADAdapter } from "avatarlayer/local";

const vad = new SileroVADAdapter({
  positiveSpeechThreshold: 0.5, // default
  negativeSpeechThreshold: 0.35, // default
  minSpeechMs: 250, // default
  silenceDurationMs: 500, // default
});
await vad.init();
```

| Option | Type | Default | Description |
| --- | --- | --- | --- |
| `positiveSpeechThreshold` | `number` | `0.5` | Speech probability threshold to start |
| `negativeSpeechThreshold` | `number` | `0.35` | Threshold to trigger speech end |
| `minSpeechMs` | `number` | `250` | Min speech duration before committing |
| `silenceDurationMs` | `number` | `500` | Silence duration before speech end |
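The two thresholds form a hysteresis band: speech starts only when the probability rises above the higher threshold and ends only when it falls below the lower one, so the state doesn't flicker around a single cutoff. A standalone sketch of the idea (illustrative only, not the adapter's internals):

```ts
// Illustrative hysteresis over per-frame speech probabilities.
// Not the adapter's actual implementation.
function segmentSpeech(
  probs: number[],
  start = 0.5, // positiveSpeechThreshold
  end = 0.35, // negativeSpeechThreshold
): boolean[] {
  let speaking = false;
  return probs.map((p) => {
    if (!speaking && p >= start) speaking = true;
    else if (speaking && p <= end) speaking = false;
    return speaking;
  });
}
```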
## Embeddings

### TransformersEmbeddingProvider
On-device text embeddings using Transformers.js. Use with vector thread providers for client-side semantic recall.
```ts
import { TransformersEmbeddingProvider } from "avatarlayer/local";

const embedder = new TransformersEmbeddingProvider({
  model: "Xenova/all-MiniLM-L6-v2", // optional
});
```

Use it with `LocalStorageVectorThreadProvider` for fully offline semantic memory:
```ts
import {
  LocalStorageVectorThreadProvider,
  TransformersEmbeddingProvider,
} from "avatarlayer/local";

const session = new AvatarSession({
  // ...config
  memory: {
    provider: new LocalStorageVectorThreadProvider(),
    semanticRecall: {
      embedder: new TransformersEmbeddingProvider(),
      topK: 5,
    },
  },
});
```
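Semantic recall of this kind ranks stored messages by embedding similarity and keeps the top `topK` matches. A conceptual sketch using cosine similarity (illustrative only, not the provider's implementation):

```ts
// Conceptual topK retrieval over stored embeddings; illustrative only.
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function recall(
  query: number[],
  stored: { text: string; vec: number[] }[],
  topK = 5,
) {
  return stored
    .map((s) => ({ text: s.text, score: cosine(query, s.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK);
}
```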