Installation Guide
Set up the my-gpt stack on macOS (Ollama + Flask UI + HTTPS via Caddy + optional speech).
Overview
The app runs fully on your machine. The browser UI is served by Flask, and inference is handled by Ollama. Speech features can run in-browser or on the server.
- Local inference: Models run through Ollama on localhost.
- Single server: Flask serves the UI, docs, and API.
- HTTPS gateway: Caddy terminates TLS and exposes the app securely to mobile browsers.
- Local storage: Chat sessions and analytics are stored on disk.
Requirements (macOS)
- Python 3.11+: Used for the Flask server and speech modules.
- Ollama: Local LLM runtime (ollama.com).
- Homebrew: For installing system packages.
- caddy: Required for HTTPS.
- espeak-ng: Optional, for server TTS.
Quick Start (macOS)
brew install git python@3.11 caddy jq espeak-ng
git clone https://github.com/JoseviOliveira/my-gpt.git
cd my-gpt
python3.11 -m venv chat_env
source ./chat_env/bin/activate
./chat_env/bin/pip install --upgrade pip
./chat_env/bin/pip install -r requirements.txt
# confirm expected interpreter
python --version # should be 3.11.x
ollama serve &
ollama pull gpt-oss:20b
ollama pull gemma3:4b
ollama pull magistral:24b
./scripts/run.sh start
# open http://127.0.0.1:4200
If python3.11 is not available, install it with Homebrew first. Using Python 3.9 can break dependency installation.
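Before opening the UI, it can help to confirm that Ollama is actually answering. The sketch below (not part of the repo) polls the Ollama HTTP API on its default port 11434; /api/tags lists the models you have pulled.

```shell
# Poll the local Ollama API until it responds, or give up after N tries.
# Assumes Ollama's default port 11434.
wait_for() {
  url="$1"; tries="${2:-30}"; i=0
  while [ "$i" -lt "$tries" ]; do
    curl -fsS "$url" >/dev/null 2>&1 && return 0
    i=$((i + 1)); sleep 1
  done
  return 1
}

wait_for http://127.0.0.1:11434/api/tags \
  && echo "Ollama is ready" \
  || echo "Ollama not reachable yet" >&2
```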
Configure Users and Port
Create .chat.conf in the repo root (used by scripts/run.sh):
export APP_USER="yourname"
export APP_PASS="yourpassword"
# Optional multi-user list:
export APP_USERS="yourname:yourpassword,guest:guest"
export APP_GUEST_USER="guest"
export CHAT_PORT=4200
export MODEL="gpt-oss:20b"
export SUMMARY_MODEL="gpt-oss:20b"
export APP_LOG_LEVEL="INFO"
Optional non-admin limits (per user):
export NON_ADMIN_DAILY_PROMPT_LIMIT=100
export NON_ADMIN_CHAT_PROMPT_LIMIT=30
export NON_ADMIN_CHAT_LIMIT=10
export NON_ADMIN_ALLOWED_MODES="fast,normal"
export NON_ADMIN_MODEL_ALLOWLIST="deepseek-r1:8b,gemma3:4b,magistral:24b"
export NON_ADMIN_FAST_MODEL="deepseek-r1:8b"
export NON_ADMIN_NORMAL_MODEL="gemma3:4b"
CHAT_PORT defaults to 4200; change it in .chat.conf if that port is taken.
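A quick way to catch typos in .chat.conf is to source it in a subshell and print what the server will see. This is a sketch, not a repo script:

```shell
# Source .chat.conf in a subshell so the checks don't pollute your shell.
if [ -f .chat.conf ]; then
  ( . ./.chat.conf
    : "${APP_USER:?APP_USER is not set}"
    : "${APP_PASS:?APP_PASS is not set}"
    echo "config OK: user=$APP_USER port=${CHAT_PORT:-4200}" )
else
  echo ".chat.conf not found in $(pwd)" >&2
fi
```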
Run the Server
./scripts/run.sh start
./scripts/run.sh status
./scripts/run.sh stop
The script sources .chat.conf, then orchestrates the full stack: it ensures Ollama is running (and pulls models if missing), starts the Flask app, and brings Caddy up or down when enabled. Logs are written to log/server.out.log, with separate logs for Caddy and Ollama in log/.
- Start: Use start to launch in the background; it should return immediately.
- Status: status reports whether the service is running and which port is bound.
- Stop: stop cleanly shuts down the server process.
- Logs: Tail log/server.out.log for startup errors, model load time, or auth issues.
Open http://127.0.0.1:4200 and log in using your credentials.
Eco Mode (Optional)
Eco mode reduces GPU/CPU pressure by preferring smaller models and limiting Ollama concurrency.
- Run once: ./scripts/run.sh --eco start
- Always on: set ECO_MODE=1 in .chat.conf
- When eco is active, Settings shows the server mode as eco.
Models and Ollama
Ollama must be running and have the models you plan to use.
ollama serve &
ollama pull gpt-oss:20b
ollama pull gemma3:4b
ollama pull gemma3:12b
ollama pull magistral:24b
Optional environment variables:
export OLLAMA_URL="http://127.0.0.1:11434"
export MODEL="gpt-oss:20b"
export SUMMARY_MODEL="gpt-oss:20b"
export MODEL_EXTRA="gemma3:4b,gemma3:12b,magistral:24b"
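With those variables set, every configured model can be pulled in one loop. This is a convenience sketch, assuming the MODEL and MODEL_EXTRA values shown above:

```shell
# Pull the default model plus every comma-separated entry in MODEL_EXTRA.
# ollama pull is a no-op for models that are already downloaded.
MODEL="${MODEL:-gpt-oss:20b}"
MODEL_EXTRA="${MODEL_EXTRA:-gemma3:4b,gemma3:12b,magistral:24b}"
for m in "$MODEL" $(printf '%s' "$MODEL_EXTRA" | tr ',' ' '); do
  ollama pull "$m" || echo "could not pull $m" >&2
done
```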
Speech (Optional)
Speech features are configurable via environment variables:
export STT_MODE="browser" # or "whisper"
export TTS_MODE="browser" # or "coqui"
- Browser STT/TTS: Uses the browser APIs and requires no extra setup.
- Server STT: Neural speech-to-text runs inside the Python environment and is installed via requirements.txt.
- Server TTS: Neural text-to-speech runs in-process; espeak-ng helps with phonemizer support.
If you use an external helper script, set the following:
export COQUI_TTS_PY="$HOME/my-gpt/tts_env/bin/python"
export COQUI_TTS_SCRIPT="$HOME/my-gpt/scripts/tts_synthesize.py"
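If you point the app at an external helper, a mistyped path fails only at synthesis time. A quick existence check (a sketch using the two variables above) surfaces that earlier:

```shell
# Warn early if either helper path does not exist.
for p in "$COQUI_TTS_PY" "$COQUI_TTS_SCRIPT"; do
  if [ -e "$p" ]; then echo "found: $p"; else echo "missing: $p" >&2; fi
done
```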
HTTPS with Caddy (Required)
- Point your domain DNS to your public IP.
- Forward ports 80 and 443 to your Mac.
- Edit deploy/Caddyfile with your domain.
- Run Caddy:
cd ~/my-gpt
caddy run --config deploy/Caddyfile
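For orientation, a minimal Caddyfile for this setup might look like the sketch below. chat.example.com is a placeholder domain, and the repo ships its own deploy/Caddyfile, so edit that file rather than replacing it; the sketch writes to a separate .example file:

```shell
# Write an illustrative Caddyfile next to the real one (does not overwrite it).
mkdir -p deploy
cat > deploy/Caddyfile.example <<'EOF'
chat.example.com {
    reverse_proxy 127.0.0.1:4200
}
EOF
```

Caddy obtains and renews TLS certificates automatically for the named domain, and reverse_proxy forwards the decrypted traffic to Flask on port 4200.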
Auto-Start (Optional)
The helper deploy/launchd-run.sh is designed to be called by a LaunchAgent. Create a LaunchAgent that runs it at login.
cat > ~/Library/LaunchAgents/my-gpt.plist <<'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>my-gpt</string>
<key>ProgramArguments</key>
<array>
<string>/Users/USER/my-gpt/deploy/launchd-run.sh</string>
</array>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
PLIST
launchctl load ~/Library/LaunchAgents/my-gpt.plist
Replace USER with your macOS username.
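To confirm the agent actually loaded, you can query launchctl (macOS-only; my-gpt matches the Label in the plist above):

```shell
# List loaded jobs and look for the my-gpt label.
if command -v launchctl >/dev/null 2>&1; then
  launchctl list | grep my-gpt || echo "my-gpt agent not loaded" >&2
else
  echo "launchctl not found (this check is macOS-only)"
fi
```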
Verify Everything
- Open http://127.0.0.1:4200 and log in.
- Run a test prompt and confirm streaming output.
- Check the health endpoint:
curl -fsS -u yourname:yourpassword http://127.0.0.1:4200/health
- Open the docs at /docs/.
GPU KPI (macOS, Optional)
The header GPU KPI reads powermetrics on macOS. To allow the server to access it without prompting every time, add a sudoers rule:
sudo sh -c 'printf "%s\n" \
"# Allow my-gpt to read GPU usage without a password" \
"YOUR_USER ALL=(root) NOPASSWD: /usr/bin/powermetrics -n 1 --samplers gpu_power" \
> /etc/sudoers.d/my-gpt-powermetrics'
Replace YOUR_USER with your macOS username. Restart the server afterward.
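To verify the rule took effect, run powermetrics via sudo -n, which fails instead of prompting when passwordless access is not configured. This is a macOS-only sketch; on other systems it simply reports that the tool is absent.

```shell
# With the sudoers rule in place, this prints GPU samples without a prompt.
if command -v powermetrics >/dev/null 2>&1; then
  sudo -n /usr/bin/powermetrics -n 1 --samplers gpu_power | head -n 5
else
  echo "powermetrics not found (this check is macOS-only)"
fi
```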
Troubleshooting
- Ollama not ready: Check that ollama serve is running and that the models are pulled.
- 401 errors: Confirm your APP_USER/APP_PASS or APP_USERS values.
- Speech errors: Install espeak-ng and confirm the STT/TTS modes.
- Port in use: Change CHAT_PORT or stop the conflicting process.