Engineering Principles
Architectural decisions and operational constraints governing local-chat.
Design Philosophy
Local-First Architecture
All data processing, inference, and storage occur exclusively on the local device. No external cloud APIs are used for any core functionality. This constraint is non-negotiable and applies to user conversations, logs, metrics, model execution, and system analytics.
Privacy-Centric Approach
User data is treated as strictly private: no external telemetry, no background reporting, and no implicit data collection. Local metrics are permitted only for operational visibility and debugging, and must never leave the device.
Simplicity and Robustness
Engineering decisions favor simple, explicit designs with small dependency trees and predictable failure modes. Complex abstractions are avoided unless they demonstrably improve robustness. If a solution appears "clever", it is assumed to be incorrect until proven otherwise.
Minimalism and KISS Principle
Minimalism is a foundational constraint of the project, not an aesthetic preference. All components of local-chat are expected to remain as simple as possible, but no simpler. Design choices must favor the smallest viable solution that satisfies functional and operational requirements.
The KISS principle is applied systematically:
- prefer fewer abstractions over layered indirection,
- prefer explicit code over generic frameworks,
- prefer readability and reasoning over optimization unless required,
- prefer removal over addition.
Complexity is treated as technical debt. Any increase in complexity must be justified by a clear, measurable improvement in robustness, reliability, or long-term operability.
Terminology and Naming
Role-Based Naming
System components and models are named according to their functional role (e.g., "speed-oriented", "quality-oriented") rather than false hierarchies or tiers.
Functional Clarity
Names must describe what a component does, not merely what it is. Ambiguous naming is discouraged.
System Architecture
Technology Stack
- Inference Engine: Ollama (local API)
- Backend: Python / Flask (lightweight HTTP server)
- Frontend: Vanilla JavaScript, HTML5, CSS3
- Persistence: SQLite (structured data), filesystem (logs/chats)
- Gateway: Caddy (HTTPS reverse proxy)
Architectural Patterns
- Thin Routes, Fat Services: API routes are limited to parsing and formatting. Business logic resides in dedicated service modules.
- No Build Step: The frontend runs directly in the browser with no bundling or transpilation. This improves debuggability and maintainability.
- Native ES Modules: Frontend modularity relies exclusively on standard ES import/export syntax.
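The "Thin Routes, Fat Services" split can be sketched as follows. This is an illustrative example, not the project's actual code: module paths, function names, and the JSON shape are assumptions. The route layer only parses input and formats output; all validation and business logic lives in the service function.

```python
import json

# --- service layer (hypothetical module: services/chat_service.py) ---
def create_message(conversation_id: int, text: str) -> dict:
    """Business logic lives here: validation, persistence, model calls."""
    if not text.strip():
        raise ValueError("message text must not be empty")
    # persistence and inference would happen here in the real service
    return {"conversation_id": conversation_id, "text": text.strip()}

# --- route layer (hypothetical module: routes/chat.py) ---
def handle_create_message(raw_body: str) -> tuple[int, str]:
    """Thin route: parse the request, delegate, format the response."""
    try:
        body = json.loads(raw_body)
        result = create_message(body["conversation_id"], body["text"])
    except (KeyError, ValueError, json.JSONDecodeError) as exc:
        return 400, json.dumps({"error": str(exc)})
    return 201, json.dumps(result)
```

Because the route contains no logic of its own, the service function can be tested directly, without an HTTP server in the loop.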
Model Policy
- Generation Constraint: Only modern-generation models (2025+) are exposed by default. Cross-generation comparisons are avoided.
- Selection Criteria: Models are selected based on recency, quality, throughput/latency, and resource consumption. Smaller models are preferred if quality is comparable.
- Explicit Non-Goals: Not competing with cloud SaaS, not a general coding copilot, not for multi-tenant deployments.
Coding Standards
Backend (Python)
- Prefer the Python standard library; justify external dependencies.
- Centralized logging is mandatory; no print() in production.
- Model identifiers must be lowercase.
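A minimal sketch of centralized logging using only the standard library. The module name and format string are assumptions, not the project's actual configuration; the point is that every module obtains its logger from one shared setup instead of calling print().

```python
import logging
import sys

# Hypothetical central logging module (e.g. a logging_setup.py shared by all
# backend modules); the real project layout may differ.
def get_logger(name: str) -> logging.Logger:
    """Return a logger wired to the shared configuration."""
    logger = logging.getLogger(name)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(logging.Formatter(
            "%(asctime)s %(levelname)s %(name)s: %(message)s"))
        logger.addHandler(handler)
        logger.setLevel(logging.INFO)
    return logger

log = get_logger("local_chat.models")
log.info("loaded model %s", "qwen3:8b")  # instead of print()
```

Keeping the handler and format in one place means log destinations can change (file, journal) without touching call sites.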
Frontend (JavaScript)
- No large frameworks (React, Vue, Svelte).
- Explicit DOM manipulation.
- File size discipline (< 600 lines).
CSS
- Theming via local CSS variables.
- Avoid utility-class-heavy approaches unless they are core to the design.
- Minimize visual noise.
User Experience Constraints
- Core Experience: The conversation is the primary focus. Interfaces are mobile-first and responsive, with touch targets of at least 44px.
- Interaction Feedback: System state (loading, streaming, disabled) must always be explicit. Silent failures are forbidden. Errors must be visible and actionable.
- Accessibility: High contrast ratios and keyboard navigation are required.
Resource Guardrails
- GPU & Compute: GPU usage must be bounded and observable. Inference must be interruptible.
- Role-Based Limits: Non-admin users cannot trigger heavy models or monopolize resources.
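A role-based limit can be enforced with a simple check before dispatching an inference request. The role names and model identifiers below are placeholders, not the project's actual configuration.

```python
# Assumed set of heavy-model identifiers (lowercase, per the naming standard);
# the real list would come from configuration, not a hard-coded constant.
HEAVY_MODELS = {"heavy-model:70b", "heavy-model:32b"}

def can_run_model(role: str, model: str) -> bool:
    """Non-admin users may not trigger heavy models."""
    if model in HEAVY_MODELS:
        return role == "admin"
    return True
```

Centralizing the check in one function keeps the guardrail auditable and makes it trivial to log or meter rejected requests.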
Operational Discipline
- Long-Running Operation: Components must tolerate long uptime without leaks.
- Service Lifecycle: Services must be script-controllable and recover cleanly.
- Browser Responsibility: Browsers must be closed when unattended to release GPU contexts.
Benchmark Engineering Principles
- Dedicated Runtime Condition: Benchmarks are executed on a server dedicated to benchmark activity; interactive chat usage is intentionally excluded during runs.
- Reproducibility First: Each benchmark run must preserve scope, model list, dataset composition, and run metadata so results remain comparable and auditable.
- Isolation Before Conclusions: When an error appears, analysis must move from broad system behavior to minimal isolated reproduction.
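The reproducibility requirement above amounts to persisting a metadata record alongside every run. A minimal sketch, with field names that are assumptions rather than the project's actual schema:

```python
import json
from datetime import datetime, timezone

def build_run_metadata(models, dataset_files, scope):
    """Record everything needed to make two runs comparable."""
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "scope": scope,                      # e.g. "full" or a subset name
        "models": sorted(models),            # exact model list for this run
        "dataset": sorted(dataset_files),    # dataset composition
    }

meta = build_run_metadata(["model-a:8b"], ["qa_set_v1.jsonl"], "full")
# json.dumps(meta) would be written next to the benchmark results,
# so any later run can be audited against the same scope and inputs.
```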
Error Isolation Workflow
- Task-focused script: Create dedicated scripts that run a specific failing task/case to isolate the issue with minimal moving parts.
- Bypass app APIs: Use dedicated scripts that call Ollama directly, removing app-layer routes/middleware from the path.
- Raw CLI inspection: If needed, run Ollama via CLI and inspect raw model output to understand behavior (for example, hidden reasoning/thinking patterns).
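The bypass step can be sketched as a standalone script that calls Ollama's local HTTP API directly, with no app-layer routes or middleware in the path. The endpoint and fields match Ollama's documented /api/generate API; the model name and prompt are placeholders for the failing case being isolated.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_payload(model: str, prompt: str) -> dict:
    """Minimal non-streaming request body for /api/generate."""
    return {"model": model, "prompt": prompt, "stream": False}

def run_case(model: str, prompt: str) -> str:
    """Call Ollama directly, bypassing the app's routes and middleware."""
    req = urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(build_payload(model, prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    # Reproduce one failing task with minimal moving parts.
    print(run_case("model-under-test:8b", "the exact failing prompt"))
```

Because the script uses only the standard library, it can be copied onto the server and run in isolation without installing the app's dependencies.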
Documentation Constraints
- Visual Consistency: Common structure and visual language across all pages.
- Tone and Style: Neutral, objective, instructional. No marketing language.
LLM Usage Philosophy
In practice, humans no longer hand-review every line of generated code. Humans define goals, constraints, and direction; LLMs propose solutions and generate code using the highest-probability approach given the prompt and available context. Failures are expected: prompts may be ambiguous, context windows may be incomplete, and generated code may be wrong or unstable. The human responsibility is to understand the technology, diagnose gaps, and iteratively guide the model toward the correct implementation.