User Guide
Practical how-to for daily use, based on the current codebase.
Overview
This app is a local-first assistant. The browser UI is served by your own Flask server and talks to local models through Ollama. Speech features can run in the browser or on the server.
Chats, audio, and analytics stay on your machine by default. Caddy provides the HTTPS gateway so the UI can be used securely from a phone while Ollama stays bound to localhost.
What the app includes
This is the current scope of the product:
- Core stack: Flask backend + SPA frontend with streaming replies and a Stop control.
- Local inference: Ollama runs models on-device with per-mode selection and an allowlist.
- Sessions + metadata: Chats are saved to JSON with AI-generated title and summary.
- Authenticated access: All API routes require login tokens; the docs remain public.
- Role-based access: admin, normal users, and a read-only guest with muted actions.
- Server protection: per-user daily prompt caps, per-chat caps, max chat count, GPU guard, and polling limits.
- Speech pipeline: browser STT/TTS plus server neural STT/TTS with fallbacks and user restrictions.
- Analytics + stats: dashboard with filters, maps, charts, tables, and responsive layout adjustments.
- UI system: consistent headers, compact pills, unified toasts, light/dark themes, and responsive layout.
- Ops + deployment: scripts/run.sh, launchd helpers, Caddy TLS proxy, eco mode, logs.
- Docs set: architecture, installation, benchmarks, performance, audience, and narrative "about".
Getting Started
- Login: Use the splash screen to sign in. Tokens are stored in your browser and used for API calls.
- Sidebar: Toggle the sidebar from the top-left. On desktop you can resize it by dragging the right edge.
- Welcome tour: Run the tour from Settings -> Account & tools -> Welcome tour.
- Docs: Open the docs from Settings -> Account & tools -> Resources.
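Under the hood, the token stored at login is attached to every API call. A minimal Python sketch of what that looks like; the header name and bearer scheme are assumptions, not confirmed details of this backend:

```python
def auth_headers(token: str) -> dict:
    """Build the headers an authenticated API call would send.

    Hypothetical sketch: the client keeps the token from the splash
    login and attaches it to each request; the exact header name may
    differ in the actual codebase.
    """
    if not token:
        raise ValueError("no login token; sign in via the splash screen first")
    return {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
```

If a call comes back 401, the token has likely expired and the splash login should be repeated.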
Benchmark Workflow
Use benchmark pages as a three-step flow: baseline report, alternative interpretation, then live monitor when a run is active.
- Read the primary report: Open /docs/benchmark_guided.html for the canonical benchmark summary and model ranking.
- Compare with alternative analysis: Open /docs/benchmark_autonomous_claude.html to cross-check conclusions using a different KPI weighting and chart style.
- Track an active run live: Open /docs/benchmark_monitor.html for real-time progress, telemetry, and the latest executed task.
When to use which page:
- Need final decisions: Start with benchmark_guided.html.
- Need another lens: Compare with benchmark_autonomous_claude.html.
- Need operational visibility: Use benchmark_monitor.html during execution.
Access Levels and Limits
These limits keep the app responsive when many people connect at once. They protect the GPU and avoid long queues while still letting everyone explore the app safely.
Guest — a safe, read-only account for public access and demos.
- Settings shows a read-only guest badge.
- A warning pill at the top explains the demo limitations and suggests requesting a real account to test the app fully.
- Browse chats and open sessions only; no create, rename, pin, delete, or send.
- STT/TTS stay browser-only; server speech modes are locked.
Normal users — full chat access with guardrails to keep the server smooth.
- Prompt caps apply per user (daily total + per-chat length) and per-account chat count.
- Only Fast and Normal are available; Deep is disabled to avoid long runs.
- The model list is limited to deepseek-r1:8b, gemma3:4b, and magistral:24b; defaults are Fast=deepseek-r1:8b and Normal=gemma3:4b.
- When GPU utilization is high, Send is disabled and a short toast explains the limit.
Admins — unrestricted access for owners.
- Full access to modes, models, and settings with no prompt or chat caps.
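The guardrails above combine into a single send/deny decision. The sketch below illustrates that logic in Python; the function name, cap values, and GPU threshold are illustrative assumptions, not the app's real configuration:

```python
def can_send_prompt(role: str, prompts_today: int, daily_cap: int,
                    gpu_utilization: float, gpu_threshold: float = 0.9):
    """Decide whether Send is allowed, mirroring the guardrails above.

    Hypothetical sketch: guests are read-only, admins are uncapped,
    and normal users are checked against the daily cap and GPU guard.
    """
    if role == "guest":
        return False, "guests are read-only"
    if role == "admin":
        return True, "ok"
    if prompts_today >= daily_cap:
        return False, "daily prompt cap reached"
    if gpu_utilization >= gpu_threshold:
        return False, "GPU busy, try again shortly"
    return True, "ok"
```

When the check fails, the UI disables Send and surfaces the reason as a short toast.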
Conversation Basics
Type in the composer and press Send or Enter. Use Shift+Enter for line breaks. The composer grows with your text.
- Streaming: Replies arrive progressively; the Send button becomes Stop to cancel.
- Metrics line: After each response you see the model name, token counts, speed, elapsed time, and a speaker button that starts audio playback.
- Summary line: A short summary appears under the title once the server generates metadata.
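The speed figure in the metrics line is essentially tokens divided by elapsed time. A hedged sketch of that arithmetic; the function name and rounding are assumptions, since the real UI derives these values from the streaming response metadata:

```python
def tokens_per_second(token_count: int, elapsed_s: float) -> float:
    """Compute the generation speed shown in the metrics line.

    Illustrative only: guards against a zero elapsed time and rounds
    to one decimal place for display.
    """
    if elapsed_s <= 0:
        return 0.0
    return round(token_count / elapsed_s, 1)
```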
Modes and Models
Use the Fast / Normal / Deep buttons in the header to control how detailed the response should be.
- Per-mode models: Assign a model to each mode in Settings -> Models.
- Allowlist: The backend only accepts models in its allowlist; unsupported names are ignored.
- Tooltips: Hover a mode button to see which model is assigned.
- Non-admin limits: Some accounts only see Fast/Normal modes and a small model list.
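Allowlist enforcement can be pictured as a small lookup: a requested model outside the list is ignored and the per-mode default is used instead. The model names and defaults below come from this guide's non-admin limits; the function and variable names are assumptions for illustration:

```python
# Non-admin model list and per-mode defaults, as described in this guide.
ALLOWED_MODELS = {"deepseek-r1:8b", "gemma3:4b", "magistral:24b"}
DEFAULTS = {"fast": "deepseek-r1:8b", "normal": "gemma3:4b"}

def resolve_model(mode: str, requested):
    """Return the model to use for a mode.

    Hypothetical sketch: unsupported names are silently ignored and
    the mode's default is used, matching the allowlist behavior above.
    """
    if requested in ALLOWED_MODELS:
        return requested
    return DEFAULTS[mode]
```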
Sessions and History
- New chat: Click "+ New chat" to start a fresh session.
- Chat list: Select a title to load an older session.
- Options menu: Use the ⋮ menu on the active chat to pin/unpin, rename, or delete.
- Auto metadata: The server generates titles and summaries after the conversation is idle.
- Storage: Sessions are stored as JSON files under chats/<user>/.
- Guest read-only: Guests can browse sessions but cannot change them.
Speech Input (STT)
- Web mode: Uses the browser SpeechRecognition API. Click the mic to start/stop. A language toggle cycles EN/FR/ES when available.
- Server mode: Uses neural speech-to-text on the backend. Audio is recorded in the browser and sent to /api/stt.
- Modes: Switch STT mode in Settings -> Speech input.
- Guest access: Guests are limited to browser-only STT.
Speech Output (TTS)
- Speaker button: Click the speaker icon under a response to play audio.
- Web mode: Uses the browser SpeechSynthesis API.
- Server mode: Uses neural text-to-speech with chunked streaming for long replies.
- Modes: Switch TTS mode in Settings -> Speech output.
- Guest access: Guests are limited to browser-only TTS.
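Chunked streaming for long replies works by splitting the text into pieces the TTS engine can synthesize incrementally. A minimal sketch of sentence-based chunking, assuming a character budget per chunk; the splitting rule and size are illustrative, not the server's actual algorithm:

```python
import re

def chunk_for_tts(text: str, max_len: int = 200):
    """Split a long reply into chunks for streamed TTS playback.

    Illustrative: splits on sentence boundaries, then packs whole
    sentences into chunks of up to max_len characters.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + 1 + len(s) > max_len:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks
```

Synthesizing and sending each chunk as it becomes ready lets playback start before the full reply is converted.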
Appearance and Settings
- Theme: Light and Dark mode.
- Font: Choose between available font stacks.
- Text size: Small, medium, or large.
- Server mode: The "server mode" row shows eco or performance based on the backend config.
Admin and Debug Tools
- Debug panel: Toggle in Settings -> Account & tools -> Debug.
- Dashboard: Open /dashboard from Settings -> Account & tools -> Dashboard (requires admin access).
- Dashboard docs: Open /docs/dashboard.html for panel-by-panel explanations and filter workflow.
- Logs: The debug panel mirrors console output for troubleshooting.
Keyboard Shortcuts
- Enter: Send message.
- Shift+Enter: New line in the composer.
Remote Access Notes
If the app is exposed publicly, access still requires login for all API routes. The docs are public.
- HTTPS: Caddy handles TLS termination for secure mobile access.
- Multi-device: Any device can log in with valid credentials and see its own sessions.
Troubleshooting
- Mic unavailable: The browser may block microphone access or not support SpeechRecognition.
- Server STT/TTS errors: Ensure neural speech-to-text and neural text-to-speech dependencies are installed and configured.
- No GPU gauge: The GPU widget appears only when the server reports utilization.
- Benchmark monitor looks stale: Reload the page and verify the backend benchmark APIs are reachable.
- Report/monitor mismatch: The monitor is live state; reports are post-run analysis snapshots.
- Limit reached: When you hit a cap, a short toast explains the restriction and the chat is not updated.
- 401 responses: Log in again if tokens expire or the splash shows "Session expired".
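The 401 rule above can be stated as a tiny predicate: either the server rejects the token outright, or the splash screen reports an expired session. A hedged sketch; the function name and message check are assumptions about how a client might encode this:

```python
def should_relogin(status_code: int, splash_message: str = "") -> bool:
    """Return True when the user should be sent back to login.

    Illustrative sketch of the troubleshooting rule above: a 401
    response or an explicit "Session expired" splash means the
    stored tokens are stale.
    """
    return status_code == 401 or "Session expired" in splash_message
```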