
Docker & AI Agent

The Docker stack runs the full Video Synth environment — including a local LLM — with a single command. No GPU required; the LLM runs on CPU via Ollama and the video synth uses Mesa software OpenGL rendering.


Stack Architecture

Three services are defined in docker-compose.yml:

Service       Container            Port    Description
ollama        video_synth_ollama   11434   Ollama local LLM inference server
video_synth   video_synth_app      8000    Video synth in headless API mode
agent         video_synth_agent    8001    AI agent web chat UI
┌─────────────────────────────────────────────────────────┐
│  Browser                                                │
│    http://localhost:8000/ui/    ←  Web control panel    │
│    http://localhost:8001/       ←  AI agent chat        │
└──────────────┬────────────────────────┬─────────────────┘
               │                        │
    ┌──────────▼──────────┐  ┌──────────▼──────────┐
    │   video_synth:8000  │  │   agent:8001        │
    │   FastAPI + Mesa GL │  │   FastAPI + OpenAI  │
    │   Xvfb virtual disp │  │   SDK → Ollama      │
    └─────────────────────┘  └──────────┬──────────┘
                             ┌──────────▼──────────┐
                             │   ollama:11434      │
                             │   llama3.2:3b (CPU) │
                             └─────────────────────┘

The agent service waits for both ollama and video_synth to pass their health checks before starting.
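
In Compose terms this ordering is expressed with health-check conditions on depends_on. A minimal sketch of the dependency block (the repo's actual docker-compose.yml may differ in detail):

agent:
  depends_on:
    ollama:
      condition: service_healthy
    video_synth:
      condition: service_healthy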


Quick Start

# Build and start all three services
docker compose up --build

# Run in the background
docker compose up --build -d

# View logs
docker compose logs -f

# Tear down (keeps ollama_data volume with downloaded model weights)
docker compose down

First-Run Note

On the first start, Ollama downloads the llama3.2:3b model (~2 GB). This happens once; subsequent starts use the ollama_data Docker volume as a cache. You can track progress with:

docker compose logs -f ollama
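
Once the pull completes, you can confirm the model is cached (ollama list is the standard Ollama CLI command for listing local models):

docker compose exec ollama ollama list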

Service URLs

Once all services are healthy:

URL                             Description
http://localhost:8000/ui/       React web control panel
http://localhost:8000/docs      Swagger interactive API explorer
http://localhost:8000/stream    MJPEG live video stream
http://localhost:8000/snapshot  Current frame as JPEG
http://localhost:8001/          AI agent web chat UI
http://localhost:8001/chat      Agent POST endpoint ({"message": "..."})
http://localhost:8001/docs      Agent API docs
http://localhost:11434          Ollama API (for direct model queries)
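
As a quick sanity check, you can query Ollama directly via its standard /api/generate endpoint, bypassing the agent entirely:

curl http://localhost:11434/api/generate \
  -d '{"model": "llama3.2:3b", "prompt": "Say hello", "stream": false}'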

Changing the LLM Model

The default model is llama3.2:3b (~2 GB, fast on CPU). To use a larger or different model, edit the OLLAMA_MODEL environment variable in docker-compose.yml:

agent:
  environment:
    - OLLAMA_MODEL=llama3.1:8b   # ~5 GB, better reasoning

Recommended models for this use case:

Model        Size    Notes
llama3.2:3b  ~2 GB   Default; fast, adequate for parameter control
llama3.1:8b  ~5 GB   Better at multi-step reasoning
qwen2.5:7b   ~5 GB   Strong tool-calling performance
minicpm-v    ~5 GB   Multimodal; can analyze /snapshot frames visually

After changing the model, rebuild:

docker compose down
docker compose up --build
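
To avoid waiting through the download on the next startup, you can pre-pull the new model into the running ollama container first (standard Ollama CLI; assumes the stack is still up):

docker compose exec ollama ollama pull llama3.1:8b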

Enabling Vision (Multimodal)

Set VISION_MODEL to a multimodal model to let the agent analyze the current visual output before deciding what to change:

agent:
  environment:
    - OLLAMA_MODEL=llama3.2:3b
    - VISION_MODEL=minicpm-v

When VISION_MODEL is set, the agent fetches /snapshot from the video synth and includes the frame in its context before responding.
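
The vision model must also be present in the ollama container. Depending on how the startup script handles VISION_MODEL, you may need to pull it manually:

docker compose exec ollama ollama pull minicpm-v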


AI Agent Chat UI

Open http://localhost:8001/ in a browser. Type natural-language commands:

"Make the plasma animation pulse with a slow sine wave on the speed parameter"

"Switch to Metaballs and crank up the glitch intensity"

"Reduce all warp parameters and create a smooth, calming blue wash"

The agent has access to two tools:

  • list_params — fetches the full parameter list from /params
  • set_param — calls PUT /params/{name} to update a value

At startup, the agent reads documentation/PARAMETERS.md to learn what every parameter does, so it can make informed decisions without fetching docs on every request.
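
These tools map directly onto the video synth's REST API, so you can reproduce the agent's actions by hand with curl. The exact PUT body shape is an assumption here, inferred from the tool_calls example below:

# What list_params does under the hood
curl http://localhost:8000/params

# What set_param does (body shape assumed)
curl -X PUT http://localhost:8000/params/glitch_intensity_max \
  -H "Content-Type: application/json" \
  -d '{"value": 70}'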

Chat API

You can also POST to the agent programmatically:

curl -X POST http://localhost:8001/chat \
  -H "Content-Type: application/json" \
  -d '{"message": "Set a random combination of glitch effects"}'

Response:

{
  "response": "I've enabled pixel shift, color splitting, and set glitch intensity to 70...",
  "tool_calls": [
    {"name": "set_param", "args": {"name": "enable_pixel_shift", "value": 1}},
    {"name": "set_param", "args": {"name": "enable_color_split", "value": 1}},
    {"name": "set_param", "args": {"name": "glitch_intensity_max", "value": 70}}
  ]
}
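
The same endpoint is easy to script. A minimal Python sketch using the requests library (illustrative, not part of the repo):

import requests

# Send a natural-language instruction to the agent and print its reply
resp = requests.post(
    "http://localhost:8001/chat",
    json={"message": "Set a random combination of glitch effects"},
    timeout=120,  # local CPU inference can take a while
)
resp.raise_for_status()
data = resp.json()
print(data["response"])
for call in data.get("tool_calls", []):
    print(f'{call["name"]}({call["args"]})')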

Customizing the Agent System Prompt

The agent's personality, constraints, and context are defined in the system prompt inside agent/agent.py. To change how the agent behaves — for example to restrict it to a specific animation style or give it a performance persona — edit the SYSTEM_PROMPT string in that file and rebuild:

docker compose up --build agent

You can also inject additional context via a mounted file. For example, add an agent/context.md file with patch descriptions and mount it:

agent:
  volumes:
    - ./agent/context.md:/app/context.md

Then reference it in the system prompt: Path('/app/context.md').read_text().
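
Put together, the relevant lines in agent/agent.py might look like the sketch below; the base prompt string and append logic are illustrative, and only the Path read is confirmed by the text above:

from pathlib import Path

# Stand-in for the real prompt defined in agent/agent.py
SYSTEM_PROMPT = "You control a live video synthesizer via its parameter API."

# Append the optional mounted context file if it exists
context_file = Path('/app/context.md')
if context_file.exists():
    SYSTEM_PROMPT += "\n\nAdditional context:\n" + context_file.read_text()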


Persisting Patches

Patches saved through the web UI or API are written to save/saved_values.yaml. The save/ directory is bind-mounted from the host, so your patches survive container restarts:

video_synth:
  volumes:
    - ./save:/app/save

To back up your patches, copy the save/ directory to a safe location.
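
For example, a dated tarball made from the host (plain shell; adjust the destination path to taste):

tar czf video-synth-patches-$(date +%Y%m%d).tar.gz save/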


Troubleshooting

Agent never becomes ready

The agent waits for Ollama's health check to pass and for the model pull to complete. Check:

docker compose logs ollama   # Is the model still downloading?
docker compose logs agent    # Any startup errors?
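
You can also ask Ollama which models have finished downloading; /api/tags is the standard endpoint for listing locally available models:

curl http://localhost:11434/api/tags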

Video synth shows black frames

Mesa software rendering requires the LIBGL_ALWAYS_SOFTWARE=1 and DISPLAY=:99 environment variables (set in docker-compose.yml). If you see GL errors, confirm these are present:

docker compose exec video_synth env | grep -E "DISPLAY|LIBGL"
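
If they are missing, the environment block on the video_synth service should look roughly like this sketch:

video_synth:
  environment:
    - LIBGL_ALWAYS_SOFTWARE=1
    - DISPLAY=:99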

Port conflicts

If ports 8000, 8001, or 11434 are already in use, change the host-side port mappings in docker-compose.yml:

ports:
  - "9000:8000"   # Access video synth at localhost:9000

Running on a machine with a GPU (Nvidia)

Add the following deploy section to the ollama service for GPU acceleration:

ollama:
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]
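
This requires the NVIDIA Container Toolkit on the host. If the passthrough works, the GPU should be visible from inside the container:

docker compose exec ollama nvidia-smi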