Installation Guide

Set up the my-gpt stack on macOS (Ollama + Flask UI + HTTPS via Caddy + optional speech).

Overview

The app runs fully on your machine. The browser UI is served by Flask, and inference is handled by Ollama. Speech features can run in-browser or on the server.

Requirements (macOS)

  - macOS with Homebrew installed
  - Ollama (available via Homebrew or from ollama.com)
  - Python 3.11, Caddy, jq, and espeak-ng (installed via Homebrew below)
  - Enough free disk space for the models you pull (the larger ones are multiple gigabytes each)

Quick Start (macOS)

brew install git python@3.11 caddy jq espeak-ng ollama
git clone https://github.com/JoseviOliveira/my-gpt.git
cd my-gpt
python3.11 -m venv chat_env
source ./chat_env/bin/activate
./chat_env/bin/pip install --upgrade pip
./chat_env/bin/pip install -r requirements.txt

# confirm expected interpreter
python --version   # should be 3.11.x

ollama serve &
ollama pull gpt-oss:20b
ollama pull gemma3:4b
ollama pull magistral:24b

./scripts/run.sh start
# open http://127.0.0.1:4200

If python3.11 is not available, install it with Homebrew first. Using Python 3.9 can break dependency installation.
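
To confirm the interpreter before installing dependencies, a quick sanity check can be run inside the activated venv; this is a minimal sketch, and the ver_minor helper is illustrative, not part of the repo:

```shell
# Extract the minor version from `python --version` output (illustrative helper)
ver_minor() { printf '%s\n' "$1" | sed -n 's/^Python 3\.\([0-9][0-9]*\).*/\1/p'; }

# Inside the activated venv this should print nothing; otherwise it warns
v="$(python --version 2>&1 || true)"
[ "$(ver_minor "$v")" = "11" ] || echo "warning: expected Python 3.11, got: $v"
```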

Configure Users and Port

Create .chat.conf in the repo root (used by scripts/run.sh):

export APP_USER="yourname"
export APP_PASS="yourpassword"
# Optional multi-user list:
export APP_USERS="yourname:yourpassword,guest:guest"
export APP_GUEST_USER="guest"
export CHAT_PORT=4200

export MODEL="gpt-oss:20b"
export SUMMARY_MODEL="gpt-oss:20b"
export APP_LOG_LEVEL="INFO"
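
Since .chat.conf holds credentials, it may be worth creating it non-interactively and restricting its permissions; a minimal sketch (the values are placeholders — edit them before use):

```shell
# Write a minimal .chat.conf (placeholder values; edit before use)
cat > .chat.conf <<'EOF'
export APP_USER="yourname"
export APP_PASS="yourpassword"
export CHAT_PORT=4200
EOF
chmod 600 .chat.conf   # the file contains a password; keep it owner-readable only
```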

Optional non-admin limits (per user):

export NON_ADMIN_DAILY_PROMPT_LIMIT=100
export NON_ADMIN_CHAT_PROMPT_LIMIT=30
export NON_ADMIN_CHAT_LIMIT=10
export NON_ADMIN_ALLOWED_MODES="fast,normal"
export NON_ADMIN_MODEL_ALLOWLIST="deepseek-r1:8b,gemma3:4b,magistral:24b"
export NON_ADMIN_FAST_MODEL="deepseek-r1:8b"
export NON_ADMIN_NORMAL_MODEL="gemma3:4b"

The port defaults to 4200; change the CHAT_PORT value in .chat.conf if you want something else.

Run the Server

./scripts/run.sh start
./scripts/run.sh status
./scripts/run.sh stop

The script sources .chat.conf, then orchestrates the full stack: it ensures Ollama is running (and pulls models if missing), starts the Flask app, and brings Caddy up or down when enabled. Logs are written to log/server.out.log, with separate logs for Caddy and Ollama in log/.

Open http://127.0.0.1:4200 and log in using your credentials.

Eco Mode (Optional)

Eco mode reduces GPU/CPU pressure by preferring smaller models and limiting Ollama concurrency.

Models and Ollama

Ollama must be running and have the models you plan to use.

ollama serve &
ollama pull gpt-oss:20b
ollama pull gemma3:4b
ollama pull gemma3:12b
ollama pull magistral:24b

Optional environment variables:

export OLLAMA_URL="http://127.0.0.1:11434"
export MODEL="gpt-oss:20b"
export SUMMARY_MODEL="gpt-oss:20b"
export MODEL_EXTRA="gemma3:4b,gemma3:12b,magistral:24b"
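
To confirm the models above are actually available to the app, you can query Ollama's tags endpoint (the endpoint and JSON shape are Ollama's public REST API; jq was installed in the quick start):

```shell
# List the models Ollama has pulled locally
curl -s "${OLLAMA_URL:-http://127.0.0.1:11434}/api/tags" \
  | jq -r '.models[].name' || echo "ollama not reachable"
```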

Speech (Optional)

Speech features are configurable via environment variables:

export STT_MODE="browser"   # or "whisper"
export TTS_MODE="browser"   # or "coqui"

If TTS_MODE is "coqui" and synthesis runs through an external helper script, set the following:

export COQUI_TTS_PY="$HOME/my-gpt/tts_env/bin/python"
export COQUI_TTS_SCRIPT="$HOME/my-gpt/scripts/tts_synthesize.py"

HTTPS with Caddy (Required)

  1. Point your domain DNS to your public IP.
  2. Forward ports 80 and 443 on your router to your Mac.
  3. Edit deploy/Caddyfile with your domain.
  4. Run Caddy:
    cd ~/my-gpt
    caddy run --config deploy/Caddyfile
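
If deploy/Caddyfile needs adjusting, a minimal reverse-proxy config for this setup might look like the following sketch (your-domain.example is a placeholder, and the upstream port must match CHAT_PORT):

```
your-domain.example {
    reverse_proxy 127.0.0.1:4200
}
```

Caddy obtains and renews the TLS certificate automatically for domains it serves, provided ports 80 and 443 reach it.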

Auto-Start (Optional)

The helper deploy/launchd-run.sh is designed to be called by a LaunchAgent. Create a LaunchAgent that runs it at login.

cat > ~/Library/LaunchAgents/my-gpt.plist <<'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
 "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>my-gpt</string>
  <key>ProgramArguments</key>
  <array>
    <string>/Users/USER/my-gpt/deploy/launchd-run.sh</string>
  </array>
  <key>RunAtLoad</key>
  <true/>
</dict>
</plist>
PLIST

launchctl load ~/Library/LaunchAgents/my-gpt.plist
# on newer macOS versions, `launchctl load` is deprecated; the equivalent is:
# launchctl bootstrap gui/$(id -u) ~/Library/LaunchAgents/my-gpt.plist

Replace USER with your macOS username.

Verify Everything
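
After starting the stack, run ./scripts/run.sh status (see Run the Server), then probe both services. The up helper below is illustrative, not part of the repo; the ports come from the steps above:

```shell
# Illustrative helper: does an HTTP endpoint answer at all?
up() { curl -s -o /dev/null --max-time 3 "$1"; }

up "http://127.0.0.1:11434/api/tags" && echo "ollama: ok" || echo "ollama: down"
up "http://127.0.0.1:4200/"          && echo "app: ok"    || echo "app: down"
```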

GPU KPI (macOS, Optional)

The header GPU KPI reads GPU usage from powermetrics on macOS, which requires root. To let the server run it without a password prompt each time, add a sudoers rule:

sudo sh -c 'printf "%s\n" \
"# Allow my-gpt to read GPU usage without a password" \
"YOUR_USER ALL=(root) NOPASSWD: /usr/bin/powermetrics -n 1 --samplers gpu_power" \
> /etc/sudoers.d/my-gpt-powermetrics'

Replace YOUR_USER with your macOS username (the output of whoami). To confirm the rule works, run sudo -n /usr/bin/powermetrics -n 1 --samplers gpu_power; it should produce output without prompting for a password. Restart the server afterward.

Troubleshooting