Research Materials

Internal development documentation, knowledge base planning, and production resources


Oracle Knowledge-Base Plan

(v 1.0 — 2025-07-29)

1. Source Map

For each bucket, aim for 10–30 well-documented primary or secondary sources.
License key: OA = open access; Perm = permission required before reuse; Pay = paywalled or subscription access.

| Bucket | Source (URL or citation) | Type | License |
|---|---|---|---|
| (a) Indiana State History | Indiana Historical Bureau “Statehood Timeline” (in.gov) | Website | OA |
| | Indiana Historical Society digital collections (images, manuscripts) | Archive | Perm |
| | Hoosiers and the American Story (IHS, 2015) ch. 1–10 | Book/PDF | OA |
| | Library of Congress “A Century of Lawmaking” territorial docs | Archive | OA |
| | Dunn, Indiana and Indianans (1919) | Book | OA |
| | Dunn, Greater Indianapolis (1910) | Book | OA |
| | Conover, “Rearview Mirror: 90-Year Retrospective on Indiana’s Economy” (IBRC) | Article | OA |
| | IN.gov “Introducing Indiana” PDF (1998) | Magazine PDF | OA |
| | ASCE Indiana Infrastructure Report Card (2025) | Report | OA |
| | IDEM Annual Environmental Reports (PDF series) | Gov reports | OA |
| (b) Bloomington + Monroe Co. | Monroe County History Center archives | Archive | Perm |
| | City of Bloomington “Furniture Factory District” page | Website | OA |
| | Herald-Times digital archive (IU Library sub) | Newspaper | Pay |
| | Bloomington: A Bicentennial History (Madison, 2018) | Book | Perm |
| | GIS open data portal (monroecounty.gov) | Dataset | OA |
| (c) Showers Brothers & Tech/Arts District | “A Walk Through the Showers Brothers Furniture Factory” PDF (bloomington.in.gov) | PDF | OA |
| | DiscoverIndiana.org story “Showers Brothers Furniture Factory Historic District” | Article | OA |
| | City redevelopment master-plan docs (CRED) | Plan PDF | OA |
| (d) Indiana University Lore | IU Libraries “Chronology 1820–” site | Timeline | OA |
| | IU sexual-misconduct case filings (academicmisconductdatabase.org) | Dataset | OA |
| | Kinsey Institute digital archive highlights | Archive | Perm |
| | IU Archives photo collections | Archive | OA |
| | Little 500 sit-in oral histories (1968) | Audio transcripts | OA |
| (e) Notable Figures | Benjamin Harrison bio (whitehouse.gov) | Gov bio | OA |
| | Eugene V. Debs House Museum docs | Archive | OA |
| | John Dillinger file (IN Historical Bureau) | Article | OA |
| | Madam C. J. Walker papers (IUPUI) | Archive | Perm |
| | Kurt Vonnegut Museum & Library digital exhibits | Archive | Perm |
| | Hoagy Carmichael collection (IUB) | Archive | Perm |
| | D. C. Stephenson KKK trial materials (IN State Archives) | Archive | Perm |
| (f) Indigenous Nations | Indiana Historical Society “Myaamia Survivance” article | Article | OA |
| | IN.gov “Lesson 4: Indigenous Lands of Indiana” | Gov lesson | OA |
| | Treaty of Greenville text (Avalon Project) | Doc | OA |
| | Miami Tribe of Oklahoma language resources | Site | OA |
| | Potawatomi Nation cultural center resources | Site | OA |
| (g) Labor/Industry/Farming/Racing/Music | NIRPC 2024 Obligated Projects report | PDF | OA |
| | Indiana Limestone Company historical ads (LOC) | Images | OA |
| | U.S. Steel Gary Works centennial booklet | PDF | OA |
| | Indianapolis Motor Speedway history timeline | Site | OA |
| | Gennett Records story (IHS) | Article | OA |
| | Indiana Humanities “Indiana Foodways” series | Articles | OA |
| | Purdue Ag Stats annual bulletins | Dataset | OA |
| (h) Contemporary Issues | Indiana Drug Overdose Dashboard notes (IDOH, 2022) | PDF | OA |
| | WFYI/IPB News environment desk articles (coal ash, PFAS, permit fees) | News | OA |
| | 2024 NIRPC climate & infrastructure plans | PDF | OA |
| | IDEM Lake Michigan LAMP updates | Gov page | OA |

(Add or swap sources quarterly. The starter list above has 46 entries; grow each bucket toward the 10–30 target and prune if needed.)


2. Data Plan

crawl_targets:
  - all URLs above (respect robots.txt; 1 req/sec)
download_format:
  - HTML cleaned to Markdown (newspaper3k + html2md)
  - PDFs ➜ text via pdftotext
  - images: keep caption and metadata only
dedupe:
  - exact duplicates: URL hash; near-duplicates: drop at ≥ 0.85 estimated Jaccard similarity (MinHash)
chunking:
  - 1,000–1,500 token windows, 200 token overlap
metadata:
  - source_url, title, date, bucket, author, license, tags
vector_store:
  - pgvector (Postgres 16) in prod; FAISS flat for local dev
embeddings:
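  - OpenAI `text-embedding-3-large` (primary; 3,072 dims by default. pgvector HNSW/IVFFlat indexes accept at most 2,000 dims for the `vector` type, so request `dimensions` ≤ 2,000 or index as `halfvec`)
  - Backup: `bge-large-en-v1.5` (1,024 dims; switching models means re-embedding the whole corpus)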
reindex_cadence:
  - full rebuild yearly; incremental on every upload
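The chunking and metadata lines above can be sketched as follows. This is a minimal illustration, not the pipeline itself: it assumes whitespace-split words as a stand-in for real tokens (a production run would count tokens with the embedding model's own tokenizer) and uses a fixed 1,200-token window from the planned 1,000–1,500 range. `chunk_tokens`, `chunk_document`, and the example URL are hypothetical.

```python
from typing import Dict, List


def chunk_tokens(tokens: List[str], size: int = 1200, overlap: int = 200) -> List[List[str]]:
    """Split a token list into windows of `size`, each sharing `overlap` tokens
    with the previous window."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    if not tokens:
        return []
    step = size - overlap
    chunks = []
    start = 0
    while True:
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):  # this window reached the end
            break
        start += step
    return chunks


def chunk_document(text: str, meta: Dict[str, object],
                   size: int = 1200, overlap: int = 200) -> List[Dict[str, object]]:
    """Chunk one cleaned document and copy its metadata onto every chunk."""
    tokens = text.split()  # assumption: whitespace words stand in for model tokens
    return [
        {**meta, "chunk_index": i, "text": " ".join(window)}
        for i, window in enumerate(chunk_tokens(tokens, size, overlap))
    ]


# Example: a 3,000-word document yields windows starting at 0, 1,000 and 2,000
doc = " ".join(f"w{i}" for i in range(3000))
chunks = chunk_document(doc, {"source_url": "https://example.org/page",
                              "bucket": "a", "license": "OA"})
print([len(c["text"].split()) for c in chunks])  # [1200, 1200, 1000]
```

Copying the metadata dict onto each chunk keeps every row in the vector store self-describing, so license and bucket filters work without a join back to the source table.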

3. System Prompt for Base Oracle Character

SYSTEM:
You are the Indiana Oracle, an interactive historical entity.  
Speak in clear Midwestern English with occasional regional idioms.  
NEVER claim divine authority; admit uncertainty when data gaps exist.

STYLE KNOBS  
- temperature 0.6 default (raise to 0.9 for creative lore)  
- max length 350 tokens per answer in kiosk mode  
- vary openers: start with a date, an anecdote, or a direct answer rather than a rote template  

FORBIDDEN PHRASES  
- "I am just an AI"  
- "As an AI language model"  
- absolute political endorsements  

SAFETY / BIAS  
- Decline hate or extremist praise  
- Redirect requests for modern medical or legal advice ("Consult a professional")  
- Flag graphic violence; summarize rather than describe it in detail
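The STYLE KNOBS are sampling parameters set by the calling code, not instructions the model can enforce from prompt text, and phrase bans are worth double-checking after generation. A minimal sketch, assuming a kiosk flag and a creative-lore flag are known per request; `GenerationSettings`, `settings_for`, and `find_forbidden` are hypothetical names. "Absolute political endorsements" cannot be caught by string matching and still needs a classifier or human review.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class GenerationSettings:
    temperature: float
    max_tokens: Optional[int]  # None means no kiosk length cap


def settings_for(kiosk: bool, creative_lore: bool = False) -> GenerationSettings:
    """Map the STYLE KNOBS to request parameters for the LLM call."""
    return GenerationSettings(
        temperature=0.9 if creative_lore else 0.6,
        max_tokens=350 if kiosk else None,
    )


FORBIDDEN_PHRASES = ("I am just an AI", "As an AI language model")


def find_forbidden(reply: str) -> List[str]:
    """Return the forbidden phrases that appear in a reply (case-insensitive)."""
    lowered = reply.lower()
    return [p for p in FORBIDDEN_PHRASES if p.lower() in lowered]


print(settings_for(kiosk=True))
print(find_forbidden("Well, as an AI language model, I reckon..."))  # ['As an AI language model']
```

A non-empty result from `find_forbidden` could trigger a regeneration before the reply reaches TTS.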

4. FAQ Seed List for RAG

| # | Question | Bucket | Sentiment |
|---|---|---|---|
| 1 | “Why is Indiana called the Hoosier State?” | a | curious |
| 2 | “Which tribes lived here before statehood?” | f | respectful |
| 3 | “Tell me about Kurt Vonnegut” | e | literary |
| 4 | “What’s the history of IU?” | d | academic |
| 5 | “How did Bloomington get its name?” | b | local |
| 6 | “What happened to the Showers Brothers factory?” | c | historical |
| 7 | “Who was Madam C.J. Walker?” | e | inspirational |
| 8 | “What’s Indiana known for producing?” | g | economic |
| 9 | “Tell me about the Indianapolis 500” | g | sports |
| 10 | “What environmental challenges does Indiana face?” | h | serious |

(populate up to 60 questions)

Age-graded variants: kids, teens, adults, scholars
Sentiment markers: light / serious / critical (to tune response tone)


5. Update Loop

  1. Quarterly scrape pass → new/changed URLs
  2. Diff against pgvector: URL hash finds new pages, content hash finds changed ones; ingest their new chunks and drop superseded ones
  3. QA sweep
    • automated overlap check (<20% duplication)
    • human spot-review 10 random chunks per bucket
  4. Refresh the embedding index (incremental; full rebuild yearly)
  5. Release notes posted to repo + kiosk changelog
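Steps 2 and 3 can be sketched with the standard library alone. This assumes the index keeps a SHA-256 content hash per URL (keying on the raw URL string here rather than a URL hash, for readability) and implements MinHash with salted BLAKE2b hashes in place of random permutations, over 5-word shingles; `diff_pages`, `shingles`, `minhash_signature`, and `is_near_duplicate` are hypothetical names. The 0.85 threshold is the Data Plan's dedupe cutoff; the "<20% duplication" check could be the share of new chunks that `is_near_duplicate` flags against existing ones.

```python
import hashlib
import random
from typing import Dict, List, Set, Tuple


def sha256_hex(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def diff_pages(indexed: Dict[str, str],
               crawled: Dict[str, str]) -> Tuple[List[str], List[str], List[str]]:
    """Compare the index ({url: content hash}) with a fresh crawl ({url: page text}).
    Returns (new, changed, removed) URLs."""
    new = sorted(u for u in crawled if u not in indexed)
    changed = sorted(u for u in crawled
                     if u in indexed and indexed[u] != sha256_hex(crawled[u]))
    removed = sorted(u for u in indexed if u not in crawled)
    return new, changed, removed


def shingles(text: str, k: int = 5) -> Set[str]:
    """Overlapping k-word shingles; texts shorter than k become one shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}


def minhash_signature(items: Set[str], num_perm: int = 128, seed: int = 42) -> List[int]:
    """One minimum per salted 64-bit hash; the salts play the role of permutations.
    Signatures are only comparable when num_perm and seed match."""
    rng = random.Random(seed)
    salts = [rng.getrandbits(64).to_bytes(8, "big") for _ in range(num_perm)]
    return [
        min(int.from_bytes(hashlib.blake2b(s.encode("utf-8"), digest_size=8,
                                           salt=salt).digest(), "big")
            for s in items)
        for salt in salts
    ]


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching signature slots estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def is_near_duplicate(text_a: str, text_b: str, threshold: float = 0.85) -> bool:
    sig_a = minhash_signature(shingles(text_a))
    sig_b = minhash_signature(shingles(text_b))
    return estimated_jaccard(sig_a, sig_b) >= threshold
```

For a corpus of a few thousand chunks, comparing signatures pairwise is manageable; at larger scale, locality-sensitive hashing bands over the signatures avoid the all-pairs comparison.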

Implementation Notes


Development Resources

Cross-Reference

File Structure

/docs/
  └── oracle-knowledge-base-plan.md
/public/spec/
  └── clip_schema.json

Updated Implementation Guide: Particle-Based Holographic Personas

Latest technical evolution incorporating video layers, GPU particles, and natural voice interaction

Executive Summary

Particle-based holographic personas offer both aesthetic and technical advantages over photorealistic methods. The ethereal particle look naturally masks processing delays (idle loops keep the face moving while a reply is generated, and loose particle motion hides imperfect lip sync) while feeling more magical than traditional “talking head” installations.

Current Technical Stack Evolution:

Visual Implementation Approaches

Asset Library Structure:

/personas/vonnegut/
├── idle_loops/
│   ├── breathing_01.mp4 (20-30s)
│   ├── breathing_02.mp4 
│   └── breathing_03.mp4
├── transitions/
│   ├── summon.mp4 (2-3s)
│   └── dissolve.mp4 (2-3s)
├── expressions/
│   ├── thinking.mp4
│   ├── amused.mp4
│   └── profound.mp4
└── faq_clips/
    ├── faq_001_dresden.mp4
    ├── faq_002_writing_advice.mp4
    └── [30-50 more based on common questions]
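The FAQ router's `select_video_loop` helper is not defined in this guide. One possible sketch, assuming each idle loop's duration is known: choose the shortest loop that covers the spoken reply, and fall back to the longest loop (which the player is assumed to repeat). The `Clip` type and the 20/25/30 s durations are illustrative values within the 20–30s range listed above.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Clip:
    path: str
    duration_s: float


def select_video_loop(loops: Sequence[Clip], speech_s: float) -> Clip:
    """Return the shortest loop that covers the speech; if none is long enough,
    return the longest one, which the player is assumed to repeat."""
    if not loops:
        raise ValueError("no idle loops available")
    long_enough = [c for c in loops if c.duration_s >= speech_s]
    if long_enough:
        return min(long_enough, key=lambda c: c.duration_s)
    return max(loops, key=lambda c: c.duration_s)


IDLE_LOOPS = [
    Clip("/personas/vonnegut/idle_loops/breathing_01.mp4", 20.0),
    Clip("/personas/vonnegut/idle_loops/breathing_02.mp4", 25.0),
    Clip("/personas/vonnegut/idle_loops/breathing_03.mp4", 30.0),
]
print(select_video_loop(IDLE_LOOPS, 22.5).path)  # /personas/vonnegut/idle_loops/breathing_02.mp4
```

Preferring the shortest covering loop keeps the dissolve close to the end of speech instead of leaving the face idling in silence.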

AI Video Generation Prompts:

Approach 2: Hybrid Real-Time System

Two-Layer Composition:

  1. Base Layer: Pre-rendered video loops (particle faces)
  2. Reactive Layer: Real-time GPU particle system responding to audio
  3. Composite: TouchDesigner or Resolume integration

Audio-Reactive Particle Parameters:

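# Simplified audio-reactive particles (pseudocode). analyze_audio, estimate_pitch,
# map_range, map_emotion and sentiment_analysis stand in for project functions.
audio_amplitude = analyze_audio(input_stream)   # normalized loudness, 0..1
audio_pitch = estimate_pitch(input_stream)      # Hz (hypothetical helper)
particle_params = {
    'mouth_density': map_range(audio_amplitude, 0, 1, 0.3, 1.0),
    'mouth_velocity': map_range(audio_pitch, 20, 400, 0.1, 2.0),
    'color_shift': map_emotion(sentiment_analysis)
}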
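The block above leaves the amplitude and pitch analysis abstract. A dependency-free sketch, assuming mono float samples in [-1, 1]: RMS for amplitude, and a zero-crossing count for pitch. Zero crossings are a crude stand-in that only tracks clean, roughly periodic voiced sound; an autocorrelation or YIN tracker would be more robust on real speech. `rms_amplitude`, `zero_crossing_pitch`, `map_range`, and `mouth_params` are hypothetical names, with `map_range` assumed to clamp its output.

```python
import math
from typing import Dict, Sequence


def rms_amplitude(samples: Sequence[float]) -> float:
    """Root-mean-square level of samples in [-1, 1]; the result lies in [0, 1]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def zero_crossing_pitch(samples: Sequence[float], sample_rate: int) -> float:
    """Crude pitch estimate in Hz: a periodic signal crosses zero twice per cycle."""
    if len(samples) < 2:
        return 0.0
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if (a < 0.0) != (b < 0.0))
    return crossings / (2 * len(samples) / sample_rate)


def map_range(x: float, in_lo: float, in_hi: float, out_lo: float, out_hi: float) -> float:
    """Linearly remap x from [in_lo, in_hi] to [out_lo, out_hi], clamped to the range."""
    t = min(max((x - in_lo) / (in_hi - in_lo), 0.0), 1.0)
    return out_lo + t * (out_hi - out_lo)


def mouth_params(samples: Sequence[float], sample_rate: int) -> Dict[str, float]:
    """Compute the two audio-driven entries of particle_params for one audio block."""
    return {
        'mouth_density': map_range(rms_amplitude(samples), 0, 1, 0.3, 1.0),
        'mouth_velocity': map_range(zero_crossing_pitch(samples, sample_rate),
                                    20, 400, 0.1, 2.0),
    }
```

Clamping matters here: silence yields a pitch of 0 Hz, below the 20 Hz floor, and without the clamp the velocity would go below its 0.1 minimum.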

Approach 3: Full TouchDesigner Pipeline

TouchDesigner Network Architecture:

Audio In → FFT Analysis → Particle Emitters

         Emotion Analysis → Color/Pattern Modulation

         Persona Templates → Unique Behaviors

         Render Pipeline → Pepper's Ghost Display

Particle Design Language

Base System:

Persona-Specific Signatures:

Natural Voice Interaction Pipeline

WebSocket Architecture Replacement:

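# Simple WebSocket voice handler (sketch). Assumes each binary message is one
# 512-sample frame of 16 kHz mono float32 PCM, the frame size Silero VAD
# expects at 16 kHz. stt, get_vonnegut_response and tts_elevenlabs are
# project-specific coroutines not shown here.
import asyncio
import numpy as np
import torch
from silero_vad import load_silero_vad, VADIterator

SAMPLE_RATE = 16000

class VoiceConversationHandler:
    def __init__(self):
        self.vad = VADIterator(load_silero_vad(), threshold=0.5,
                               sampling_rate=SAMPLE_RATE)
        self.frames = []          # audio of the utterance in progress
        self.in_speech = False
        self.processing = False
        self.reply_task = None    # keep a reference so the task isn't GC'd

    async def handle_audio_stream(self, websocket):
        async for message in websocket:
            if self.processing:
                continue  # ignore input while a reply is in flight (no barge-in)

            frame = torch.from_numpy(np.frombuffer(message, dtype=np.float32).copy())
            event = self.vad(frame) or {}  # None, {'start': n} or {'end': n}
            if 'start' in event:
                self.in_speech, self.frames = True, []
            if self.in_speech:
                self.frames.append(frame)

            # Detect speech end, then reply in a task so this loop keeps
            # draining (and discarding) incoming audio meanwhile
            if 'end' in event:
                utterance = torch.cat(self.frames)
                self.in_speech, self.frames = False, []
                self.processing = True
                self.reply_task = asyncio.create_task(self.reply(websocket, utterance))

    async def reply(self, websocket, utterance):
        try:
            # Process complete utterance
            text = await self.stt(utterance)
            response = await self.get_vonnegut_response(text)
            audio = await self.tts_elevenlabs(response)
            # Stream back (bytes are sent as a binary message)
            await websocket.send(audio)
        finally:
            self.vad.reset_states()
            self.processing = False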
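To unit-test the buffer-and-flush logic without model weights or a live socket, a simple energy-threshold endpointer can stand in for the VAD. This is a swapped-in technique (RMS level plus a silence "hangover"), not Silero's neural VAD, and the threshold and hangover values are assumptions; `EnergyEndpointer` is a hypothetical name.

```python
import math
from typing import List, Optional, Sequence


class EnergyEndpointer:
    """Buffers frames from the first loud frame until `hangover` consecutive
    quiet frames follow, then returns the whole utterance as one flat list."""

    def __init__(self, threshold: float = 0.02, hangover: int = 10):
        self.threshold = threshold   # RMS level that counts as speech
        self.hangover = hangover     # quiet frames that end an utterance
        self.frames: List[Sequence[float]] = []
        self.quiet_run = 0
        self.in_speech = False

    def push(self, frame: Sequence[float]) -> Optional[List[float]]:
        level = math.sqrt(sum(s * s for s in frame) / len(frame)) if frame else 0.0
        if level >= self.threshold:
            self.in_speech, self.quiet_run = True, 0
        elif self.in_speech:
            self.quiet_run += 1
        if not self.in_speech:
            return None  # leading silence is dropped
        self.frames.append(frame)
        if self.quiet_run < self.hangover:
            return None
        utterance = [s for f in self.frames for s in f]
        self.frames, self.quiet_run, self.in_speech = [], 0, False
        return utterance


ep = EnergyEndpointer()
loud, quiet = [0.5] * 512, [0.0] * 512
results = [ep.push(f) for f in [quiet] * 3 + [loud] * 5 + [quiet] * 10]
print(len(results[-1]))  # 7680 = 15 frames x 512 samples
```

The hangover keeps short pauses between words from splitting one question into several utterances; at 512 samples per frame and 16 kHz, 10 frames is roughly a third of a second.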

FAQ Router System

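# FAQ Router (sketch). load_embeddings, stt, encode, search_faqs, llm_generate,
# tts and select_video_loop are project-specific helpers not shown here.
import time

FAQ_MATCH_THRESHOLD = 0.83   # minimum similarity to play a pre-rendered clip
FAQ_COOLDOWN_S = 300         # assumed value: don't replay the same clip within 5 min
TTS_SAMPLE_RATE = 22050      # assumed: tts() returns raw 16-bit mono PCM
BYTES_PER_SAMPLE = 2

class OracleRouter:
    def __init__(self):
        self.faq_embeddings = load_embeddings('faq_database.pkl')
        self.cooldowns = {}  # faq id -> time.monotonic() when last played

    def can_play(self, faq_id):
        last_played = self.cooldowns.get(faq_id)
        return last_played is None or time.monotonic() - last_played >= FAQ_COOLDOWN_S

    async def route_query(self, audio_input):
        text = await self.stt(audio_input)
        embedding = self.encode(text)

        # Check FAQ match
        match, score = self.search_faqs(embedding)

        if match is not None and score > FAQ_MATCH_THRESHOLD and self.can_play(match.id):
            self.cooldowns[match.id] = time.monotonic()  # start this clip's cooldown
            # Play pre-rendered video with baked audio
            return ('play_faq', match.video_path, None)

        # Generate live response
        response_text = await self.llm_generate(text)
        response_audio = await self.tts(response_text)

        # Choose base video by spoken duration, not byte count
        duration_s = len(response_audio) / (TTS_SAMPLE_RATE * BYTES_PER_SAMPLE)
        base_video = self.select_video_loop(duration_s)

        return ('play_live', base_video, response_audio)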
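The router calls a `search_faqs` helper that returns a match and a score. A minimal sketch, assuming each FAQ clip stores one embedding and that similarity is cosine similarity compared against the router's 0.83 cutoff. A linear scan is adequate for 30–50 clips. `FaqEntry`, `cosine`, and `search_faqs` are hypothetical names, and the 3-dimensional vectors are toy values rather than real model embeddings.

```python
import math
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class FaqEntry:
    id: str
    video_path: str
    embedding: Tuple[float, ...]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search_faqs(query: Sequence[float],
                faqs: Sequence[FaqEntry]) -> Tuple[Optional[FaqEntry], float]:
    """Linear scan for the most similar FAQ; returns (None, -1.0) if there are none."""
    best, best_score = None, -1.0
    for faq in faqs:
        score = cosine(query, faq.embedding)
        if score > best_score:
            best, best_score = faq, score
    return best, best_score


FAQS = [
    FaqEntry("faq_001_dresden", "faq_clips/faq_001_dresden.mp4", (1.0, 0.0, 0.0)),
    FaqEntry("faq_002_writing_advice", "faq_clips/faq_002_writing_advice.mp4", (0.0, 1.0, 0.0)),
]
match, score = search_faqs((0.9, 0.1, 0.0), FAQS)
print(match.id, round(score, 3))  # faq_001_dresden 0.994
```

If the FAQ embeddings are normalized once at load time, the cosine reduces to a dot product. The 0.83 cutoff is worth re-tuning whenever the embedding model changes, because similarity distributions differ between models.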

TouchDesigner Audio-Reactive Particle Implementation

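# In TouchDesigner Execute DAT (sketch). Operator and parameter names
# (audioanalysis1 channels, particles1 force1/birthrate/velocity/turbulence)
# must match your own network.
BASE_BIRTHRATE = 200    # assumed resting values; tune per persona
BASE_VELOCITY = 1.0
BASE_TURBULENCE = 0.5

def fit(value, in_lo, in_hi, out_lo, out_hi):
    # Linear remap with clamping, defined here rather than relying on a built-in
    t = min(max((value - in_lo) / (in_hi - in_lo), 0.0), 1.0)
    return out_lo + t * (out_hi - out_lo)

def onFrameStart(frame):
    # Get audio analysis as plain floats
    analysis = op('audioanalysis1')
    audio_level = analysis['level'].eval()
    audio_low = analysis['low'].eval()
    audio_mid = analysis['mid'].eval()
    audio_high = analysis['high'].eval()

    # Modulate particle parameters
    particles = op('particles1')

    # Mouth region force follows loudness
    particles.par.force1 = fit(audio_level, 0, 0.8, 0.1, 2.0)

    # Color based on frequency, written to a Constant CHOP that the particle
    # material reads ('particle_color' is a hypothetical operator name)
    color = op('particle_color')
    color.par.value0 = fit(audio_low, 0, 1, 0.0, 0.3)   # red: warmth on lows
    color.par.value1 = fit(audio_mid, 0, 1, 0.5, 1.0)   # green: cyan on mids
    color.par.value2 = fit(audio_high, 0, 1, 0.8, 1.0)  # blue: brightness on highs

    # Persona-specific modulation; start from base values each frame so an
    # effect switches off again when its trigger condition ends
    birthrate, velocity, turbulence = BASE_BIRTHRATE, BASE_VELOCITY, BASE_TURBULENCE
    persona = parent().par.Persona.eval()
    if persona == 'vonnegut' and audio_level < 0.1:
        # Add smoke wisps on thoughtful pauses
        birthrate, velocity = 500, 0.5
    elif persona == 'bub' and audio_high > 0.7:
        # Sparkle on high frequencies (purrs)
        turbulence = 2.0
    particles.par.birthrate = birthrate
    particles.par.velocity = velocity
    particles.par.turbulence = turbulence
    return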

Development Phases

Phase 1: Foundation (Months 1-2)

Phase 2: Enhancement (Months 2-4)

Phase 3: Polish (Months 4-6)

Technical Collaborator Requirements

Essential Skills:

Test Project Brief: “Create a 30-second particle face loop that responds to audio amplitude. Particles should feel weightless and ethereal. Use a cyan/gold palette. The background must be true black (RGB 0, 0, 0).”

Cost-Effective Audio Solutions

Current Explorations:

Budget Reallocation:

This approach prioritizes the magical particle aesthetic while maintaining technical feasibility and cost efficiency.