Technical Implementation Notes

Internal development documentation and pipeline specifications


Conversational Video Pipeline (Option 2 + Smart Clip Router)

Goal: A real-time feel with high visual fidelity. We mix pre-rendered “hero” clips with live lip-synced talk loops.

States

IDLE → SUMMON → TALK → OUTRO (+ GLITCH when needed)

Clip Library (per persona)

Type             Count   Lengths (s)   Notes
Idle loops       2–3     20–30         Micro-motion only
Talk bases       6–10    4/6/8/12      Neutral face motion, bookend pose
Summons          2–3     2–3           Particles → head
Outros           2–3     1–2           Head → particles
Glitch masks     2–3     1 (loop)      For >2 s latency spikes
Accent bursts    2–3     0.5–1         Persona color/shape flourishes
FAQ hero clips   30–50   variable      Full animation + baked audio

Bookend pose: Every clip starts/ends on the same still frame (12–16 frames) for clean cuts.
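
One way to enforce the bookend rule during asset QA is to diff each clip’s first and last frames against the locked pose still. A minimal sketch using OpenCV; the pose-frame path and pixel tolerance are assumptions:

import cv2
import numpy as np

def frame_at(path, index):
    """Read a single frame from a video file as float32."""
    cap = cv2.VideoCapture(path)
    cap.set(cv2.CAP_PROP_POS_FRAMES, index)
    ok, frame = cap.read()
    cap.release()
    if not ok:
        raise IOError(f"could not read frame {index} of {path}")
    return frame.astype(np.float32)

def check_bookend(clip_path, pose_path="poses/bookend.png", tol=4.0):
    """True if the clip's first and last frames match the locked pose.
    Checks single frames only; tol is mean abs pixel diff (assumed value)."""
    pose = cv2.imread(pose_path).astype(np.float32)
    cap = cv2.VideoCapture(clip_path)
    last = int(cap.get(cv2.CAP_PROP_FRAME_COUNT)) - 1
    cap.release()
    return all(np.abs(frame_at(clip_path, i) - pose).mean() <= tol
               for i in (0, last))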

Router Logic (pseudo)

# Pseudocode — transcribe, embed, llm, tts, choose_loop, wav2lip,
# play_clip, cooldown_ok and clip_library are pipeline helpers/stubs.
q = transcribe(audio)                      # speech-to-text
vec = embed(q)                             # normalized sentence embedding
scores, ids = faiss_index.search(vec, 1)   # nearest FAQ entry (top_k=1)
hit = clip_library[int(ids[0][0])]

if scores[0][0] > 0.83 and cooldown_ok(hit.id):
    play_clip(hit.clip_path)               # prebaked FAQ hero clip
else:
    answer = llm(q, persona)               # live LLM answer
    wav = tts(answer)                      # synthesize speech
    base = choose_loop(len(wav))           # pick a 4/6/8/12 s talk base by audio length
    talk = wav2lip(base, wav, roi="mouth") # lipsync the mouth region only
    play_clip(talk)
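
The FAQ index behind faiss_index can be built once from the canonical questions. A minimal sketch with faiss and sentence-transformers; the model choice is an assumption, and the 0.83 threshold is read as cosine similarity (normalized inner product):

import faiss
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")        # assumed embedding model

faq_questions = [                                      # canonical phrasings from the FAQ list
    "What is the Oracle?",
    "How do I summon a persona?",
]
vecs = model.encode(faq_questions, normalize_embeddings=True).astype("float32")

faiss_index = faiss.IndexFlatIP(vecs.shape[1])         # inner product == cosine on unit vectors
faiss_index.add(vecs)

def match_faq(question, threshold=0.83):
    """Return the FAQ row index, or None if nothing clears the threshold."""
    q = model.encode([question], normalize_embeddings=True).astype("float32")
    scores, ids = faiss_index.search(q, 1)
    return int(ids[0][0]) if scores[0][0] > threshold else None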

File Format

Latency Budget

To-Do

  1. Draft FAQ list + canonical answers (30–50 per persona).
  2. Generate particle packs and all state clips (bookend pose locked).
  3. Build router microservice (embeddings + FAISS).
  4. Integrate Wav2Lip/GeneFace++ ROI pipeline.
  5. Wire TouchDesigner/Resolume state machine (OSC/HTTP triggers; see the OSC sketch after this list).
  6. Test end-to-end latency, add missing FAQs from logs.
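
A minimal sketch of the OSC triggers from item 5, using python-osc; the /oracle/* address patterns and port 7000 are assumptions to match to the patch:

from pythonosc.udp_client import SimpleUDPClient

osc = SimpleUDPClient("127.0.0.1", 7000)      # TouchDesigner/Resolume OSC-in port (assumed)

def set_state(state, clip_id=None):
    """Push a state-machine transition to the visual layer."""
    osc.send_message("/oracle/state", state)  # IDLE, SUMMON, TALK, OUTRO or GLITCH
    if clip_id is not None:
        osc.send_message("/oracle/clip", clip_id)

set_state("SUMMON")
set_state("TALK", clip_id="vonnegut_faq_012")  # hypothetical clip id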

Development Pipeline Notes

Content Production Workflow

Asset Generation Priority:

  1. Vonnegut persona pack (primary launch target)
  2. Default Oracle baseline interactions
  3. FAQ library expansion based on user logs

Technical Stack Integration:

Performance Optimization

GPU Requirements:

Fallback Strategies:


Integration Notes

This pipeline connects to the main Pepper’s Ghost installation described in the implementation strategy and technical specifications. The conversational system operates independently of the display technology choice, allowing flexibility in deployment scenarios.

Cross-Reference


Clip Library JSON Schema

Schema definition for the Oracle clip library system. This schema validates all clip metadata entries to ensure consistency across the content pipeline.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "OracleClip",
  "type": "object",
  "properties": {
    "id": { "type": "string" },
    "persona": { "type": "string" },
    "type": { "type": "string", "enum": ["idle","summon","talk_base","outro","glitch","accent","faq"] },
    "duration": { "type": "number" },
    "emotion": { "type": "string" },
    "path": { "type": "string" },
    "bookend": { "type": "boolean", "default": true },
    "cooldown": { "type": "number", "description": "seconds before reuse allowed" },
    "tags": { "type": "array", "items": { "type": "string" } }
  },
  "required": ["id","persona","type","duration","path"]
}

Schema Usage

This schema defines the structure for all video clips in the Oracle system:

The schema file is also available at /spec/clip_schema.json for direct integration with validation tools.
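
A minimal validation sketch using the jsonschema package; the clips/ metadata layout is an assumption:

import json
from pathlib import Path

from jsonschema import Draft202012Validator

schema = json.loads(Path("spec/clip_schema.json").read_text())
validator = Draft202012Validator(schema)

for meta_path in Path("clips").glob("**/*.json"):      # assumed metadata layout
    entry = json.loads(meta_path.read_text())
    for err in validator.iter_errors(entry):
        where = "/".join(str(p) for p in err.path) or "<root>"
        print(f"{meta_path}: {where}: {err.message}")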


Visual Pipeline Demo

Particle Consolidation Demo — Demonstration of particles consolidating to form a persona, then dispersing back to neutral/idle particle animation. This represents the SUMMON and OUTRO states in the pipeline.

This demo illustrates the core visual transformation of the Oracle Entity system:

Oracle Entity Introduction Demo

Project Introduction by Base Persona — Demonstration of the Oracle Entity introducing the Echoes of Indiana project, showing how the persona would welcome and orient visitors to the experience.


Based on recent technical discussions, here are key updates to consider for the implementation approach:

1. Simplified Technical Pipeline

States: IDLE → SUMMON → TALK → OUTRO (+ GLITCH when needed)

TALK Options:

Particle Animation Pipeline:

2. Technology Stack Additions

Add:

Downplay:

3. Budget Reallocation

Reallocate:

4. Updated Phase 1 Implementation

Months 1-2: Foundation

5. Particle Design Language

Each Oracle persona has a unique particle signature:

Base System

Persona Signatures

6. Artistic Rationale

“Why Particles?”

Oracle entities manifest as living constellations of light particles rather than photorealistic faces. This aesthetic choice:

7. Simplified Response Times

Response Times:

8. Updated Experience Description

Consider updating visitor-facing descriptions to:

“The Entity responds through directional speakers, their particle form pulsing and flowing with the rhythm of speech, occasionally coalescing into clearer features during profound moments”

9. Collaboration Opportunities

Seeking Technical Collaborator:

10. Simplified Technical Flow

Audio Input → Speech-to-Text → FAQ Matching/LLM
     ↓                              ↓
Particle Modulation ← Video Selection ← Response

Key Insight: Pre-rendered FAQ responses remove generation latency for an estimated 80% of interactions, enabling instant playback of high-quality responses. Only truly novel questions require live generation.

Implementation Notes

The main goal: Shift the narrative from “complex technical challenge” to “artistic choice that happens to be technically smart.” The particle approach isn’t a compromise—it’s a more magical solution than photorealism would be.


Real-Time Streaming Avatar Pipeline Research

Glass-to-glass latency under 1 second is achievable

The research confirms that sub-1-second latency is achievable using modern streaming technologies and optimized pipelines. The most promising approach combines HeyGen’s WebRTC-based streaming avatar API (150-250ms latency) with ElevenLabs streaming TTS (150-300ms) and efficient particle masking systems, achieving total glass-to-glass latency of 300-500ms on a single RTX 4090 workstation.
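
As a sanity check, the stage budgets cited in this section sum as follows; the stage split and the STT/display numbers are assumptions, not measured profiles:

# Glass-to-glass budget check (TTS and avatar numbers from this section)
stages = {
    "stt_partial":    (50, 100),   # streaming ASR partials (assumed)
    "tts_first_byte": (150, 300),  # ElevenLabs streaming TTS
    "avatar_stream":  (150, 250),  # HeyGen WebRTC avatar, overlaps TTS
    "display":        (20, 50),    # compositing + scan-out (assumed)
}
# TTS and avatar streaming overlap once audio chunks flow, so the
# effective path is roughly stt + max(tts, avatar) + display.
lo = stages["stt_partial"][0] + max(stages["tts_first_byte"][0], stages["avatar_stream"][0]) + stages["display"][0]
hi = stages["stt_partial"][1] + max(stages["tts_first_byte"][1], stages["avatar_stream"][1]) + stages["display"][1]
print(f"estimated glass-to-glass: {lo}-{hi} ms")  # 220-450 ms, in line with the 300-500 ms claim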

Avatar Animation Pipeline Recommendations

Primary recommendation: HeyGen Streaming Avatar API

HeyGen emerges as the optimal commercial solution for immediate deployment, offering 150-250ms glass-to-glass latency through WebRTC streaming with LiveKit integration. The API provides built-in audio-to-viseme mapping, alpha channel support for holographic displays, and production-ready TypeScript/JavaScript SDKs. At $0.20-$0.30 per minute, it balances cost with performance for rapid prototyping.

Secondary option: NVIDIA Audio2Face 2.0

For maximum performance and local control, NVIDIA’s Audio2Face achieves the lowest theoretical latency at ~50ms with RTX optimization. This on-premises solution requires Omniverse ecosystem setup but provides industry-leading facial blendshape generation and full data control. The RTX 4090’s architecture is specifically optimized for this workload.

Open-source alternative: GeneFace++

GeneFace++ offers real-time NeRF-based 3D talking face generation with no per-minute costs. While requiring significant development investment and ML expertise, it provides complete customization and can achieve real-time performance on RTX 4090 hardware.

Particle Systems and Masking Techniques

TouchDesigner leads for real-time particle effects

TouchDesigner proves optimal for particle-based avatar emergence effects, capable of handling 200,000+ particles at 60fps on RTX 4090. The platform uses texture-based particle systems where RGB textures encode particle positions, updated via feedback loops.
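
To illustrate the texture-encoded position idea outside TouchDesigner, a numpy sketch of one feedback pass (in TD this would be a GLSL TOP in a feedback loop; all constants are assumptions):

import numpy as np

H, W = 512, 512                    # 262,144 particles, one per texel
pos = np.random.rand(H, W, 3).astype(np.float32)   # RGB texture = xyz position
vel = np.zeros_like(pos)

def step(pos, vel, target, dt=1 / 60, pull=2.0, damp=0.92):
    """One feedback pass: damped pull of every particle toward its target texel."""
    vel = damp * vel + pull * (target - pos) * dt
    return pos + vel * dt, vel

face = np.random.rand(H, W, 3).astype(np.float32)  # stand-in for baked face positions
for _ in range(120):               # two seconds of SUMMON at 60 fps
    pos, vel = step(pos, vel, face)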

Unity VFX Graph for maximum particle count

Unity’s Visual Effect Graph supports 1 million+ particles with GPU simulation on RTX 4090, making it suitable for complex emergence effects. The platform offers mesh-based particle emission for face formation and WebRTC integration for real-time input.

Real-Time Pipeline Architecture

WebSocket/WebRTC migration strategy

Transitioning from REST to WebSocket cuts connection latency roughly tenfold, from ~500ms to ~50ms. The recommended implementation uses python-socketio or aiortc for WebRTC, with connection management including exponential backoff retry logic and 15-second heartbeat intervals. ElevenLabs’ WebSocket API with optimized chunk scheduling ([120, 160, 250, 290]) achieves 150-300ms time-to-first-byte.
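
A minimal sketch of the ElevenLabs stream-input WebSocket with that chunk schedule, using the websockets package; field names follow ElevenLabs’ published WebSocket API at the time of writing and should be verified against current docs:

import asyncio, base64, json, os
import websockets

VOICE_ID = "YOUR_VOICE_ID"   # placeholder
URI = (f"wss://api.elevenlabs.io/v1/text-to-speech/{VOICE_ID}"
       "/stream-input?model_id=eleven_turbo_v2")

async def stream_tts(text, on_chunk):
    async with websockets.connect(URI) as ws:
        await ws.send(json.dumps({                    # first message: auth + chunk schedule
            "text": " ",
            "xi_api_key": os.environ["ELEVENLABS_API_KEY"],
            "generation_config": {"chunk_length_schedule": [120, 160, 250, 290]},
        }))
        await ws.send(json.dumps({"text": text, "try_trigger_generation": True}))
        await ws.send(json.dumps({"text": ""}))       # signal end of input
        async for msg in ws:
            data = json.loads(msg)
            if data.get("audio"):
                on_chunk(base64.b64decode(data["audio"]))   # feed player/avatar
            if data.get("isFinal"):
                break

asyncio.run(stream_tts("Hello from the Oracle.", lambda chunk: None))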

Microservices architecture pattern

The optimal architecture separates speech processing, avatar animation, and delivery into distinct services communicating via Redis message queues. This enables horizontal scaling, with load balancing across multiple avatar instances. The pipeline maintains 150-300ms adaptive buffering with jitter compensation based on network conditions.
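
A minimal sketch of the Redis handoff between the speech and avatar services using redis-py; queue names and the render stub are assumptions:

import json
import redis

r = redis.Redis()                                  # assumed local broker

def submit_job(text, persona):
    """Speech service → avatar service handoff."""
    r.lpush("avatar:jobs", json.dumps({"text": text, "persona": persona}))

def render_avatar(job):
    """Stand-in for the animation stage (HeyGen / Audio2Face)."""
    return f"/renders/{job['persona']}_{abs(hash(job['text']))}.mp4"

def avatar_worker():
    """Avatar service: block on the queue, render, publish the result path."""
    while True:
        _, raw = r.brpop("avatar:jobs")
        job = json.loads(raw)
        r.publish("avatar:done", render_avatar(job))   # delivery service subscribes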

Cost Analysis and Deployment Strategy

Operational costs scale with usage

Basic deployments start at $0.13/minute using cloud resources, scaling to $0.52/minute for full production configurations. Monthly costs range from $500 for prototypes to $15,000+ for production systems. The break-even point for self-hosted versus API-based solutions typically occurs at 6-12 months with $3,000-$5,000 monthly usage.
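
A back-of-envelope helper for that break-even decision; all dollar inputs below are hypothetical placeholders, not quotes:

def breakeven_months(hardware_capex, self_hosted_monthly, api_monthly):
    """Months until a self-hosted rig beats per-minute API billing."""
    savings = api_monthly - self_hosted_monthly
    return float("inf") if savings <= 0 else hardware_capex / savings

# Hypothetical: $20k capex (hardware + integration labor), $1k/mo ops, $3.5k/mo API spend
print(breakeven_months(20_000, 1_000, 3_500))   # 8.0 months, inside the 6-12 month range above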

Recommended deployment approach

Start with HeyGen’s API for rapid prototyping at $99-$330/month, enabling immediate testing without infrastructure investment. Simultaneously develop the particle masking system using TouchDesigner on local RTX 4090 hardware. Once monthly costs exceed $3,000, transition to a hybrid approach with on-premise Audio2Face for avatar generation while maintaining cloud APIs for overflow capacity.

Technical Implementation Roadmap

Phase 1 (Week 1-2): Deploy HeyGen streaming avatar with basic WebSocket integration, achieving <500ms latency baseline. Implement TouchDesigner particle system prototype with alpha channel output.

Phase 2 (Week 3-4): Integrate ElevenLabs streaming TTS with alignment data. Develop adaptive buffering system targeting 150-300ms. Configure holographic display with appropriate codec support.

Phase 3 (Week 5-6): Implement particle-to-face morphing transitions. Optimize pipeline for consistent <1 second glass-to-glass latency. Add monitoring and failover systems.

Phase 4 (Week 7-8): Performance tune for production deployment. Implement caching strategies for common responses. Document API usage patterns for cost optimization.
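
For the Phase 4 caching strategy, a minimal sketch that keys rendered clips by normalized question text; the normalization and the in-memory store are assumptions (swap in Redis with a TTL for production):

import hashlib

def cache_key(question):
    """Hash whitespace/case-normalized text so trivial rephrasings collide."""
    return hashlib.sha256(" ".join(question.lower().split()).encode()).hexdigest()

clip_cache = {}   # key -> rendered clip path

def get_or_render(question, render_fn):
    """Serve a cached clip when possible; fall back to live generation."""
    key = cache_key(question)
    if key not in clip_cache:
        clip_cache[key] = render_fn(question)   # expensive live path
    return clip_cache[key]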

This architecture achieves the required <1 second latency while providing flexibility to scale from prototype to production deployment on a single RTX 4090 workstation.