AvatarLayer — pluggable SDK for realtime conversational avatars

The VRMLocalRenderer renders a 3D VRM model in the browser using Three.js and @pixiv/three-vrm. It includes automatic blink animation, expression presets, and lip-sync (RMS-based or viseme-based).

Installation

npm install three @pixiv/three-vrm

Usage

import { VRMLocalRenderer } from "avatarlayer/renderers";

const renderer = new VRMLocalRenderer({
  modelUrl: "/models/avatar.vrm",
  idleAnimationUrl: "/animations/idle.fbx",  // optional
  visemeLipSync: true,                        // optional
  renderer: "webgpu",                         // optional, default "webgl"
});

Constructor options

Option	Type	Default	Description
`modelUrl`	`string`	required	URL to the .vrm model file
`idleAnimationUrl`	`string`	—	Optional URL to an idle animation (FBX/GLB)
`visemeLipSync`	`boolean`	—	Enable viseme-based lip-sync instead of RMS amplitude mapping
`renderer`	`'webgl' | 'webgpu'`	`'webgl'`	Rendering backend. When `'webgpu'`, uses Three.js WebGPURenderer with MToonNodeMaterial. Falls back to WebGL automatically when WebGPU is unavailable.

How it works

When mounted, the renderer:

1. Resolves the rendering backend (WebGL or WebGPU), falling back to WebGL if needed
2. Creates a Three.js scene with camera and lighting
3. Loads the VRM model via GLTFLoader with the VRM plugin (using MToonNodeMaterial for WebGPU)
4. Starts an automatic blink loop (random interval, 3-6 seconds)
5. Optionally loads an idle animation from the provided URL
6. Registers with a shared render pool that drives all VRM renderers from a single requestAnimationFrame loop
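The blink step can be sketched as a timer that re-arms itself with a fresh random delay after every blink. This is a minimal illustration, not the renderer's actual code: the helper names, the injectable random source, and the 120 ms closed-eye duration are assumptions. Only the 3-6 second interval comes from the description above.

```typescript
// Hypothetical helpers; only the 3-6 second interval is taken from the docs.
type Rng = () => number; // returns a value in [0, 1)

const BLINK_MIN_MS = 3000;
const BLINK_MAX_MS = 6000;

/** Pick the delay before the next blink, uniformly between 3 and 6 seconds. */
function nextBlinkDelayMs(rng: Rng = Math.random): number {
  return BLINK_MIN_MS + rng() * (BLINK_MAX_MS - BLINK_MIN_MS);
}

/**
 * Start a blink loop. `setBlink` receives 1 (eyes closed), then 0 (eyes open).
 * In a real renderer it might forward the weight to the VRM blink expression.
 * Returns a function that stops the loop.
 */
function startBlinkLoop(
  setBlink: (weight: number) => void,
  rng: Rng = Math.random,
  closedMs = 120, // assumed time the eyes stay closed
): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  let stopped = false;

  const schedule = (): void => {
    timer = setTimeout(() => {
      if (stopped) return;
      setBlink(1);
      timer = setTimeout(() => {
        if (stopped) return;
        setBlink(0);
        schedule(); // re-arm with a new random delay
      }, closedMs);
    }, nextBlinkDelayMs(rng));
  };

  schedule();
  return () => {
    stopped = true;
    if (timer !== undefined) clearTimeout(timer);
  };
}
```

Passing the random source as a parameter keeps the timing deterministic in tests while defaulting to Math.random in the browser.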

When speak(audio) is called:

1. The audio blob is played through an <audio> element
2. A LipSyncEngine analyzes the audio in realtime via an AnalyserNode
3. The analysis drives the VRM mouth expressions: RMS amplitude mapped to the Aa preset by default, or viseme weights when visemeLipSync is enabled
4. The promise resolves when the audio ends
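To make the RMS path concrete, here is a rough sketch of how time-domain samples (such as those filled by AnalyserNode.getFloatTimeDomainData()) could be turned into a mouth-open weight. The function names, the gain, the noise floor, and the smoothing factor are illustrative assumptions, not values taken from LipSyncEngine.

```typescript
/** Root-mean-square of time-domain samples in the range [-1, 1]. */
function rms(samples: Float32Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    sumSquares += samples[i] * samples[i];
  }
  return Math.sqrt(sumSquares / samples.length);
}

/**
 * Map an RMS level to a mouth-open weight in [0, 1].
 * `gain` and `noiseFloor` are assumed tuning values.
 */
function rmsToMouthOpen(level: number, gain = 4, noiseFloor = 0.01): number {
  if (level < noiseFloor) return 0; // treat very quiet input as silence
  return Math.min(1, level * gain);
}

/** Exponential smoothing so the mouth does not jitter from frame to frame. */
function smooth(previous: number, target: number, factor = 0.5): number {
  return previous + (target - previous) * factor;
}
```

On each frame the smoothed weight would be applied to the Aa preset. The viseme path would instead produce one weight per mouth shape.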

Avatar control

The VRM renderer responds to update() calls for fine-grained control of the face and emotion state, for example when driven through session.updateControl():

session.updateControl({
  avatar: {
    face: {
      mouth: { jawOpen: 0.5, smile: 0, mouthPucker: 0 },
      eyes: { blinkL: 0, blinkR: 0, gazeX: 0, gazeY: 0 },
    },
    emotion: {
      label: "happy",
      intensity: 0.8,
      valence: 0.7,
      arousal: 0.5,
    },
  },
});

Supported expression presets: happy, sad, angry, surprised, relaxed, neutral.
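One plausible way to turn an emotion label and intensity into preset weights is shown below. This is a sketch under assumptions: the function name, the clamping, and the fallback to neutral for unknown labels are not confirmed by the renderer's source. With @pixiv/three-vrm, the resulting weights could then be applied through the expression manager's setValue(name, weight).

```typescript
const PRESETS = ["happy", "sad", "angry", "surprised", "relaxed", "neutral"] as const;
type Preset = (typeof PRESETS)[number];
type PresetWeights = Record<Preset, number>;

/**
 * Build a weight per expression preset from an emotion label and intensity.
 * Exactly one preset is active; all others are reset to 0.
 * Assumption: unknown labels fall back to "neutral".
 */
function emotionToPresetWeights(label: string, intensity: number): PresetWeights {
  const weights: PresetWeights = {
    happy: 0, sad: 0, angry: 0, surprised: 0, relaxed: 0, neutral: 0,
  };
  const clamped = Math.min(1, Math.max(0, intensity)); // keep the weight in [0, 1]
  const known = (PRESETS as readonly string[]).indexOf(label) !== -1;
  const preset: Preset = known ? (label as Preset) : "neutral";
  weights[preset] = clamped;
  return weights;
}
```

Resetting the inactive presets to 0 avoids blending a stale expression into the new one.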

Resize handling

The renderer automatically responds to container resize via ResizeObserver, updating the camera aspect ratio and display canvas dimensions.
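The resize step amounts to two updates: the camera aspect ratio and the backing size of the display canvas. The sketch below uses structural stand-ins for the camera and canvas so it runs outside a browser. The helper name and the pixel-ratio handling are assumptions. A Three.js PerspectiveCamera and an HTMLCanvasElement satisfy the two interfaces.

```typescript
interface CameraLike {
  aspect: number;
  updateProjectionMatrix(): void;
}

interface CanvasLike {
  width: number;
  height: number;
}

/** Apply a new container size (in CSS pixels) to the camera and display canvas. */
function applyResize(
  camera: CameraLike,
  canvas: CanvasLike,
  cssWidth: number,
  cssHeight: number,
  pixelRatio = 1, // assumed to come from window.devicePixelRatio in a browser
): void {
  if (cssWidth <= 0 || cssHeight <= 0) return; // container hidden or collapsed
  camera.aspect = cssWidth / cssHeight;
  camera.updateProjectionMatrix(); // required after changing the aspect
  canvas.width = Math.round(cssWidth * pixelRatio);
  canvas.height = Math.round(cssHeight * pixelRatio);
}

// Browser wiring (not executed here):
// const observer = new ResizeObserver(([entry]) => {
//   const { width, height } = entry.contentRect;
//   applyResize(camera, displayCanvas, width, height, window.devicePixelRatio);
// });
// observer.observe(container);
```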

Rendering backends

The renderer supports both WebGL (default) and WebGPU rendering backends. When renderer: 'webgpu' is set and the browser supports WebGPU, the renderer uses Three.js WebGPURenderer with MToonNodeMaterial for VRM materials. When WebGPU is unavailable, it falls back to WebGL automatically.
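The fallback rule can be expressed as a small pure function. The sketch below is illustrative and its helper names are assumptions. It detects WebGPU by checking for navigator.gpu, which is a common first test. A stricter check would also await navigator.gpu.requestAdapter(), since that call can resolve to null even when the property exists.

```typescript
type Backend = "webgl" | "webgpu";

/** Feature-detect WebGPU on a global-like object (defaults to globalThis). */
function hasWebGPU(scope: unknown = globalThis): boolean {
  const nav = (scope as { navigator?: { gpu?: unknown } }).navigator;
  return nav != null && nav.gpu != null;
}

/**
 * Resolve the backend to use: WebGPU only when it was requested AND is
 * available; WebGL in every other case (including when nothing was requested).
 */
function resolveBackend(requested: Backend | undefined, webgpuAvailable: boolean): Backend {
  if (requested === "webgpu" && webgpuAvailable) return "webgpu";
  return "webgl";
}

// Typical call site: resolveBackend(options.renderer, hasWebGPU())
```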

Shared GPU context

When multiple VRMLocalRenderer instances are active simultaneously (for example, in an AvatarStage with several characters), they transparently share one GPU context per backend and a single render loop. Each renderer maintains its own Three.js scene, camera, and VRM model, but a module-level render pool drives all of them from one requestAnimationFrame callback. WebGL and WebGPU entries use separate shared renderers, so you can mix backends on the same page.

On each frame the pool renders each scene into the shared GPU canvas, then blits the result to that renderer's 2D display canvas via drawImage(). This collapses N GPU contexts into one, avoiding the resource limits and performance costs that multiple contexts cause on mobile devices.
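The pool's frame loop can be sketched as follows. This is a simplified model under assumptions, not the SDK's implementation: the class and method names are hypothetical, and the frame scheduler is injected so the loop can be exercised without a browser. In a page, the scheduler would wrap requestAnimationFrame and cancelAnimationFrame, and blitToDisplay would call drawImage() on the entry's 2D context.

```typescript
interface PoolEntry {
  renderToShared(): void; // draw this entry's scene into the shared GPU canvas
  blitToDisplay(): void;  // copy the shared canvas into this entry's 2D canvas
}

type Schedule = (callback: () => void) => unknown; // e.g. wraps requestAnimationFrame
type Cancel = (handle: unknown) => void;           // e.g. wraps cancelAnimationFrame

class RenderPool {
  private entries: PoolEntry[] = [];
  private handle: unknown = undefined;
  private running = false;

  constructor(private schedule: Schedule, private cancel: Cancel) {}

  /** Register an entry; the loop starts when the first entry is added. */
  add(entry: PoolEntry): void {
    if (this.entries.indexOf(entry) === -1) this.entries.push(entry);
    if (!this.running) {
      this.running = true;
      this.handle = this.schedule(this.tick);
    }
  }

  /** Unregister an entry; the loop stops when the pool becomes empty. */
  remove(entry: PoolEntry): void {
    const index = this.entries.indexOf(entry);
    if (index !== -1) this.entries.splice(index, 1);
    if (this.entries.length === 0 && this.running) {
      this.cancel(this.handle);
      this.running = false;
    }
  }

  /** One frame: render and blit every entry, then request the next frame. */
  private tick = (): void => {
    if (!this.running) return;
    const snapshot = this.entries.slice(); // tolerate removal during a frame
    for (let i = 0; i < snapshot.length; i++) {
      snapshot[i].renderToShared();
      snapshot[i].blitToDisplay();
    }
    this.handle = this.schedule(this.tick);
  };
}
```

Each entry is blitted right after it is rendered because the shared canvas is overwritten by the next entry's scene.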

This optimization is fully transparent — the AvatarRenderer interface is unchanged and client code does not need to opt in. A single renderer works identically; the pool activates automatically when any VRMLocalRenderer is mounted.
