Installation Guide
Set up the my-gpt stack on macOS (Ollama + Flask UI + HTTPS via Caddy + optional speech).
Overview
The app runs fully on your machine. The browser UI is served by Flask, and inference is handled by Ollama. Speech features can run in-browser or on the server.
- Local inference: Models run through Ollama on localhost.
- Single server: Flask serves the UI, docs, and API.
- HTTPS gateway: Caddy terminates TLS and exposes the app securely to mobile browsers.
- Local storage: Chat sessions and analytics are stored on disk.
Requirements (macOS)
- Python 3.11+: Used for the Flask server and speech modules.
- Ollama: Local LLM runtime (ollama.com).
- Homebrew: For installing system packages.
- caddy: Required for HTTPS.
- espeak-ng: Optional, for server TTS.
Quick Start (macOS)
brew install git python@3.11 caddy jq espeak-ng
git clone https://github.com/JoseviOliveira/my-gpt.git
cd my-gpt
python3.11 -m venv chat_env
source ./chat_env/bin/activate
./chat_env/bin/pip install --upgrade pip
./chat_env/bin/pip install -r requirements.txt
# confirm expected interpreter
python --version # should be 3.11.x
ollama serve &
ollama pull gpt-oss:20b
ollama pull gemma3:4b
ollama pull magistral:24b
./scripts/run.sh start
# open http://127.0.0.1:4200
If python3.11 is not available, install it with Homebrew first. Using Python 3.9 can break dependency installation.
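Before opening the UI, it can help to confirm that Ollama is actually answering. The sketch below (not part of the repo) polls the Ollama HTTP API on its default port 11434; /api/tags lists the models you have pulled.

```shell
# Poll the local Ollama API until it responds, or give up after N tries.
# Assumes Ollama's default port 11434.
wait_for() {
  url="$1"; tries="${2:-30}"; i=0
  while [ "$i" -lt "$tries" ]; do
    curl -fsS "$url" >/dev/null 2>&1 && return 0
    i=$((i + 1)); sleep 1
  done
  return 1
}

wait_for http://127.0.0.1:11434/api/tags \
  && echo "Ollama is ready" \
  || echo "Ollama not reachable yet" >&2
```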
Configure Users and Port
Create .chat.conf in the repo root (used by scripts/run.sh):
export APP_USER="yourname"
export APP_PASS="yourpassword"
# Optional multi-user list:
export APP_USERS="yourname:yourpassword,guest:guest"
export APP_GUEST_USER="guest"
export CHAT_PORT=4200
export MODEL="gpt-oss:20b"
export SUMMARY_MODEL="gpt-oss:20b"
export APP_LOG_LEVEL="INFO"
Optional non-admin limits (per user):
export NON_ADMIN_DAILY_PROMPT_LIMIT=100
export NON_ADMIN_CHAT_PROMPT_LIMIT=30
export NON_ADMIN_CHAT_LIMIT=10
export NON_ADMIN_ALLOWED_MODES="fast,normal"
export NON_ADMIN_MODEL_ALLOWLIST="deepseek-r1:8b,gemma3:4b,magistral:24b"
export NON_ADMIN_FAST_MODEL="deepseek-r1:8b"
export NON_ADMIN_NORMAL_MODEL="gemma3:4b"
CHAT_PORT defaults to 4200; change it in .chat.conf if that port is taken.
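A quick way to catch typos in .chat.conf is to source it in a subshell and print what the server will see. This is a sketch, not a repo script:

```shell
# Source .chat.conf in a subshell so the checks don't pollute your shell.
if [ -f .chat.conf ]; then
  ( . ./.chat.conf
    : "${APP_USER:?APP_USER is not set}"
    : "${APP_PASS:?APP_PASS is not set}"
    echo "config OK: user=$APP_USER port=${CHAT_PORT:-4200}" )
else
  echo ".chat.conf not found in $(pwd)" >&2
fi
```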
Run the Server
./scripts/run.sh start
./scripts/run.sh status
./scripts/run.sh stop
The script sources .chat.conf, then orchestrates the full stack: it ensures Ollama is running (and pulls models if missing), starts the Flask app, and brings Caddy up or down when enabled. Logs are written to log/server.out.log, with separate logs for Caddy and Ollama in log/.
- Start: Use start to launch in the background; it should return immediately.
- Status: status reports whether the service is running and which port is bound.
- Stop: stop cleanly shuts down the server process.
- Logs: Tail log/server.out.log for startup errors, model load time, or auth issues.
Open http://127.0.0.1:4200 and log in using your credentials.
Eco Mode (Optional)
Eco mode reduces GPU/CPU pressure by preferring smaller models and limiting Ollama concurrency.
- Run once: ./scripts/run.sh --eco start
- Always on: set ECO_MODE=1 in .chat.conf
- When eco is active, Settings shows the server mode as eco.
Models and Ollama
Ollama must be running and have the models you plan to use.
ollama serve &
ollama pull gpt-oss:20b
ollama pull gemma3:4b
ollama pull gemma3:12b
ollama pull magistral:24b
Optional environment variables:
export OLLAMA_URL="http://127.0.0.1:11434"
export MODEL="gpt-oss:20b"
export SUMMARY_MODEL="gpt-oss:20b"
export MODEL_EXTRA="gemma3:4b,gemma3:12b,magistral:24b"
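With those variables set, every configured model can be pulled in one loop. This is a convenience sketch, assuming the MODEL and MODEL_EXTRA values shown above:

```shell
# Pull the default model plus every comma-separated entry in MODEL_EXTRA.
# ollama pull is a no-op for models that are already downloaded.
MODEL="${MODEL:-gpt-oss:20b}"
MODEL_EXTRA="${MODEL_EXTRA:-gemma3:4b,gemma3:12b,magistral:24b}"
for m in "$MODEL" $(printf '%s' "$MODEL_EXTRA" | tr ',' ' '); do
  ollama pull "$m" || echo "could not pull $m" >&2
done
```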
Speech (Optional)
Speech features are configurable via environment variables:
export STT_MODE="browser" # or "whisper"
export TTS_MODE="browser" # or "coqui"
- Browser STT/TTS: Uses the browser APIs and requires no extra setup.
- Server STT: Neural speech-to-text runs inside the Python environment and is installed via requirements.txt.
- Server TTS: Neural text-to-speech runs in-process; espeak-ng helps with phonemizer support.
If you use an external helper script, set the following:
export COQUI_TTS_PY="$HOME/my-gpt/tts_env/bin/python"
export COQUI_TTS_SCRIPT="$HOME/my-gpt/scripts/tts_synthesize.py"
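If you point the app at an external helper, a mistyped path fails only at synthesis time. A quick existence check (a sketch using the two variables above) surfaces that earlier:

```shell
# Warn early if either helper path does not exist.
for p in "$COQUI_TTS_PY" "$COQUI_TTS_SCRIPT"; do
  if [ -e "$p" ]; then echo "found: $p"; else echo "missing: $p" >&2; fi
done
```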
HTTPS with Caddy (Required)
- Point your domain DNS to your public IP.
- Forward ports 80 and 443 to your Mac.
- Edit deploy/Caddyfile with your domain.
- Run Caddy:
cd ~/my-gpt
caddy run --config deploy/Caddyfile
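For orientation, a minimal Caddyfile for this setup might look like the sketch below. chat.example.com is a placeholder domain, and the repo ships its own deploy/Caddyfile, so edit that file rather than replacing it; the sketch writes to a separate .example file:

```shell
# Write an illustrative Caddyfile next to the real one (does not overwrite it).
mkdir -p deploy
cat > deploy/Caddyfile.example <<'EOF'
chat.example.com {
    reverse_proxy 127.0.0.1:4200
}
EOF
```

Caddy obtains and renews TLS certificates automatically for the named domain, and reverse_proxy forwards the decrypted traffic to Flask on port 4200.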
Auto-Start (Optional)
The helper deploy/launchd-run.sh is designed to be called by a LaunchAgent. Create a LaunchAgent that runs it at login.
cat > ~/Library/LaunchAgents/my-gpt.plist <<'PLIST'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN"
"http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
<key>Label</key>
<string>my-gpt</string>
<key>ProgramArguments</key>
<array>
<string>/Users/USER/my-gpt/deploy/launchd-run.sh</string>
</array>
<key>RunAtLoad</key>
<true/>
</dict>
</plist>
PLIST
launchctl load ~/Library/LaunchAgents/my-gpt.plist
Replace USER with your macOS username.
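To confirm the agent actually loaded, you can query launchctl (macOS-only; my-gpt matches the Label in the plist above):

```shell
# List loaded jobs and look for the my-gpt label.
if command -v launchctl >/dev/null 2>&1; then
  launchctl list | grep my-gpt || echo "my-gpt agent not loaded" >&2
else
  echo "launchctl not found (this check is macOS-only)"
fi
```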
Verify Everything
- Open http://127.0.0.1:4200 and log in.
- Run a test prompt and confirm streaming output.
- Check the health endpoint:
curl -fsS -u yourname:yourpassword http://127.0.0.1:4200/health
- Open the docs at /docs/.
GPU KPI (macOS, Optional)
The header GPU KPI reads powermetrics on macOS. To allow the server to access it without prompting every time, add a sudoers rule:
sudo sh -c 'printf "%s\n" \
"# Allow my-gpt to read GPU usage without a password" \
"YOUR_USER ALL=(root) NOPASSWD: /usr/bin/powermetrics -n 1 --samplers gpu_power" \
> /etc/sudoers.d/my-gpt-powermetrics'
Replace YOUR_USER with your macOS username. Restart the server afterward.
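To verify the rule took effect, run powermetrics via sudo -n, which fails instead of prompting when passwordless access is not configured. This is a macOS-only sketch; on other systems it simply reports that the tool is absent.

```shell
# With the sudoers rule in place, this prints GPU samples without a prompt.
if command -v powermetrics >/dev/null 2>&1; then
  sudo -n /usr/bin/powermetrics -n 1 --samplers gpu_power | head -n 5
else
  echo "powermetrics not found (this check is macOS-only)"
fi
```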
Troubleshooting
- Ollama not ready: Check that ollama serve is running and that the models are pulled.
- 401 errors: Confirm your APP_USER/APP_PASS or APP_USERS values.
- Speech errors: Install espeak-ng and confirm the STT/TTS modes.
- Port in use: Change CHAT_PORT or stop the conflicting process.