
Non-Linear: Tech Stack & Architecture

Stack Overview

┌─────────────────────────────────────────────────────────┐
│                        FRONTEND                         │
│  Vue 3 + Tailwind + Headless UI + ECharts               │
│  Graph Viz: TBD (D3 vs Cytoscape — eval pending)        │
│  Command Palette: vue-command-palette / custom           │
│  Keybindings: VueUse useMagicKeys                       │
│  Icons: Lucide │  Font: Inter  │ Motion: @vueuse/motion │
│  State: Pinia  │  HTTP: ofetch │  WS: centrifuge-js     │
├─────────────────────────────────────────────────────────┤
│                     CROSS-PLATFORM                      │
│  Desktop: Tauri (thin wrapper, no offline) — post-alpha │
│  Mobile: Capacitor (responsive web first) — v0.2+       │
├─────────────────────────────────────────────────────────┤
│                        BACKEND                          │
│  FastAPI (Python)                                       │
│  Taskiq (async task queue: webhooks, imports, agents)   │
├─────────────────────────────────────────────────────────┤
│                      DATA LAYER                         │
│  Neo4j — graph topology (nodes, edges, status, labels)  │
│  Postgres — content & metadata (rich text, comments,    │
│             attachments meta, audit logs, project cfg)  │
│  Redis — caching, rate limiting                         │
│  Meilisearch — full-text search (post-alpha)            │
│  MinIO — S3-compatible file storage (post-alpha)        │
├─────────────────────────────────────────────────────────┤
│                      REAL-TIME                          │
│  Centrifugo — WebSocket server, live updates, push      │
├─────────────────────────────────────────────────────────┤
│                         AUTH                            │
│  Authentik — OIDC, API tokens, role mgmt, SSO-ready     │
├─────────────────────────────────────────────────────────┤
│                       INFRA/OPS                         │
│  Caddy (reverse proxy + TLS)                            │
│  Vault (secrets) — post-alpha                           │
│  Prometheus + Grafana — post-alpha                      │
│  Loki + Tempo + OpenTelemetry — post-alpha              │
├─────────────────────────────────────────────────────────┤
│                     DEPLOYMENT                          │
│  Docker Compose (dev + single-node production)          │
└─────────────────────────────────────────────────────────┘

Alpha infrastructure: Alpha ships with Neo4j, Postgres, Redis, Centrifugo, Caddy, Authentik, FastAPI, Taskiq worker, and Vue frontend (~10 containers). Meilisearch is replaced by Postgres tsvector. MinIO, Vault, and the observability stack (Prometheus/Grafana/Loki/Tempo) are introduced post-alpha. See 06a-ALPHA-SCOPE.md for the full alpha boundary.

Data Boundary

Neo4j — Graph Topology

Owns the decomposition tree and all overlay edges across the four data layers:

  • Node labels: Component (Layer 1), Issue (Layer 2), Artifact (Layer 4), Cycle, Project
  • Node identity (UUID), short ID
  • Lightweight properties: title, status, labels, assignee_id, created_at, updated_at
  • Layer 1 edges: HAS_CHILD between components (decomposition tree)
  • Layer 1→2 edges: HAS_CHILD from components to issues (work attachment)
  • Layer 2 edges: HAS_CHILD between issues (sub-tasks); BLOCKS, DUPLICATES, RELATES_TO between issues (work coordination)
  • Layer 3 edges: DEPENDS_ON, IMPORTS, CALLS_API, SHARES_DB between components (code connections)
  • Layer 4 edges: HAS_ARTIFACT from components/issues to artifacts
  • Project root references, cycle membership

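Each overlay edge type (BLOCKS, DEPENDS_ON, HAS_ARTIFACT, and the rest) is scoped to a single layer. HAS_CHILD forms the shared decomposition tree that spans Layers 1 and 2. Layer-filtered queries therefore reduce to filtering by relationship type. For "show me this subtree with only Layer 3 edges", the query walks HAS_CHILD to collect the subtree and then matches only DEPENDS_ON, IMPORTS, CALLS_API, and SHARES_DB edges.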

Why Neo4j over Postgres recursive CTEs: Queries like "find all unblocked leaves in this subtree," "critical path through blocks links," "everything 3 hops from this node" are what Cypher is built for. CTEs get painful with lateral links and variable-depth queries. The gap widens with Layer 3 code connections (multi-hop dependency chains) and in v0.2+ with cross-project edges.
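Below is a minimal sketch of how graph/queries.py might build both kinds of query. It assumes the official neo4j Python driver, which is still an open question below. Cypher cannot take relationship types as parameters, so the sketch draws them from a fixed per-layer whitelist and passes only node IDs as parameters. `LAYER_EDGE_TYPES`, `layer_subtree_query`, and `UNBLOCKED_LEAVES` are hypothetical names. The `"done"` status value is also an assumption, since the schema only shows `"todo"`.

```python
# Hypothetical sketch: layer-filtered Cypher built from a fixed whitelist.
# Relationship types come only from this map; user input travels as parameters.
LAYER_EDGE_TYPES: dict[int, tuple[str, ...]] = {
    2: ("BLOCKS", "DUPLICATES", "RELATES_TO"),
    3: ("DEPENDS_ON", "IMPORTS", "CALLS_API", "SHARES_DB"),
    4: ("HAS_ARTIFACT",),
}


def layer_subtree_query(root_id: str, layer: int) -> tuple[str, dict[str, str]]:
    """Return (cypher, params): the HAS_CHILD subtree plus one layer's overlay edges."""
    if layer not in LAYER_EDGE_TYPES:
        raise ValueError(f"unsupported overlay layer: {layer}")
    rel_types = "|".join(LAYER_EDGE_TYPES[layer])  # safe: never user-supplied
    cypher = (
        # In practice, add a label to root so the id index is used.
        "MATCH (root {id: $root_id})-[:HAS_CHILD*0..]->(n) "
        f"OPTIONAL MATCH (n)-[r:{rel_types}]->(m) "
        "RETURN n, r, m"
    )
    return cypher, {"root_id": root_id}


# "Find all unblocked leaves in this subtree": leaf issues with no open blocker.
UNBLOCKED_LEAVES = """
MATCH (root {id: $root_id})-[:HAS_CHILD*0..]->(leaf:Issue)
WHERE NOT (leaf)-[:HAS_CHILD]->()
  AND NOT EXISTS {
    MATCH (blocker:Issue)-[:BLOCKS]->(leaf)
    WHERE blocker.status <> 'done'
  }
RETURN leaf
"""
```

With the 5.x driver, either query could be run as `driver.execute_query(cypher, params)`. Layer 1 is absent from the map on purpose, because the HAS_CHILD walk is always part of the query.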

Postgres — Content & Metadata

  • Rich text content: issue and component descriptions (markdown)
  • Comment threads: body, author, parent_comment_id (threading), timestamps
  • Attachment metadata: filename, size, mime_type, s3_key, uploader_id, uploaded_at (inline attachments in comments/descriptions)
  • Artifact metadata (Layer 4): title, kind, url/file_ref, mime_type — rich metadata for external docs, designs, and uploaded files attached to nodes
  • User/agent accounts: profile data, preferences, notification settings
  • Project settings: configuration, member lists, default policies
  • Audit logs: who changed what, when, with before/after snapshots
  • Policy definitions: role templates, custom permission rules

Linked to Neo4j by UUID. Neo4j node stores id: "abc-123". Postgres stores full content keyed by same UUID. FastAPI joins them as needed. This applies to all node types: Components (Layer 1), Issues (Layer 2), and Artifacts (Layer 4).
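Here is one way FastAPI could do this join for a list of nodes, such as a node's children, without issuing one Postgres query per node. It fetches topology from Neo4j, collects the UUIDs, loads the matching NodeContent rows with a single `IN` query (for example `select(NodeContent).where(NodeContent.id.in_(ids))`), and then joins in memory. The sketch uses plain dicts in place of driver records and SQLModel rows. `join_by_uuid` is a hypothetical helper.

```python
from typing import Any


def join_by_uuid(
    graph_nodes: list[dict[str, Any]],
    contents: list[dict[str, Any]],
) -> list[dict[str, Any]]:
    """Attach Postgres content to Neo4j node properties, matched by UUID.

    Graph order is preserved (e.g. children as ordered by the Cypher query).
    Nodes with no content row get None fields; such nodes are candidates
    for the reconciliation job.
    """
    content_by_id = {row["id"]: row for row in contents}
    merged = []
    for node in graph_nodes:
        row = content_by_id.get(node["id"], {})
        merged.append({
            **node,
            "description": row.get("description"),
            "description_html": row.get("description_html"),
        })
    return merged
```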

Redis — Caching & Real-Time

  • Subtree query cache (TTL, invalidated on graph mutations)
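  • Not used for WebSocket pub/sub: real-time fan-out goes through Centrifugo, which the backend calls via its HTTP server API (see Real-Time Updates)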
  • Rate limiting for agent API
  • Authentik token validation cache

Meilisearch — Search Index (post-alpha)

Alpha: Full-text search uses Postgres tsvector. Meilisearch is introduced post-alpha for typo-tolerant, prefix-aware search across large datasets.

  • Indexes issue titles, descriptions, comments, labels
  • Fed from both Neo4j and Postgres
  • Powers command palette search (issues + commands in one result set)
  • Typo-tolerant, prefix search, filtering by label/status/assignee

MinIO — File Storage (post-alpha)

Alpha: No file uploads. Links in descriptions suffice for the 2-week validation. MinIO is introduced post-alpha alongside Artifacts (Layer 4).

  • S3-compatible API, self-hosted
  • Stores attachment files (images, docs)
  • Postgres stores metadata and S3 key; MinIO stores bytes
  • Migration path to AWS S3: no application code changes; only the endpoint and credentials in configuration change

Concrete Database Schemas

UUID Strategy

All entities use UUIDv7 (time-sortable). Generated application-side by FastAPI before writing to either database. The same UUID is used as the primary key in both Neo4j and Postgres, serving as the cross-database join key.

Neo4j Schema

Neo4j stores graph topology and lightweight node properties (including title). Rich content such as descriptions, comments, and attachment metadata lives in Postgres.

Node labels and properties:

// Layer 1: Component node
CREATE (c:Component {
  id: "uuidv7",
  short_id: "NL-C12",
  title: "auth-service",
  status: null,                    // components have no status
  labels: ["backend", "core"],
  owner_id: "uuidv7",
  assignee_id: null,
  repo_provider: "github",
  repo_url: "https://github.com/team/auth",
  repo_path: "/src/oauth",
  repo_branch: "main",
  created_at: datetime(),
  updated_at: datetime()
})

// Layer 2: Issue node
CREATE (i:Issue {
  id: "uuidv7",
  short_id: "NL-42",
  title: "implement refresh tokens",
  status: "todo",
  labels: ["feature", "p1"],
  assignee_id: "uuidv7",
  created_by: "uuidv7",
  cycle_id: "uuidv7",
  created_at: datetime(),
  updated_at: datetime()
})

// Layer 4: Artifact node — POST-ALPHA
CREATE (a:Artifact {
  id: "uuidv7",
  title: "Login flow mockup",
  kind: "link",                    // "link" | "file" | "embed"
  url: "https://figma.com/...",    // for links/embeds
  file_ref: null,                  // MinIO s3_key for uploaded files
  mime_type: null,
  size_bytes: null,
  created_by: "uuidv7",
  created_at: datetime()
})

// Project root (virtual node linking to decomposition tree root)
CREATE (p:Project {
  id: "uuidv7",
  workspace_id: "uuidv7",
  root_id: "uuidv7"
})

Relationships (organized by layer):

// Decomposition tree (parent → child) — Layer 1 + Layer 2
(component)-[:HAS_CHILD]->(component)    // Layer 1: structure nesting
(component)-[:HAS_CHILD]->(issue)        // Layer 1→2: work attached to structure
(issue)-[:HAS_CHILD]->(issue)            // Layer 2: sub-tasks

// Layer 2: Work coordination links (between issues)
(issue)-[:BLOCKS]->(issue)
(issue)-[:RELATES_TO]->(issue)
(issue)-[:DUPLICATES]->(issue)

// Layer 3: Code connection links (between components) — POST-ALPHA
(component)-[:DEPENDS_ON {source: "manual"}]->(component)
(component)-[:IMPORTS {source: "inferred"}]->(component)
(component)-[:CALLS_API {source: "inferred"}]->(component)
(component)-[:SHARES_DB {source: "manual"}]->(component)

// Layer 4: Artifact attachments — POST-ALPHA
(component)-[:HAS_ARTIFACT]->(artifact)
(issue)-[:HAS_ARTIFACT]->(artifact)

// Cycle membership
(issue)-[:IN_CYCLE]->(cycle:Cycle { id, name, start_date, end_date })

Layer 3 edges carry a source property ("manual" or "inferred") to distinguish human-declared dependencies from code-analysis results.

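Indexes and constraints (a uniqueness constraint also creates a backing index):

CREATE CONSTRAINT comp_id IF NOT EXISTS FOR (c:Component) REQUIRE c.id IS UNIQUE;
CREATE INDEX comp_short IF NOT EXISTS FOR (c:Component) ON (c.short_id);
CREATE CONSTRAINT issue_id IF NOT EXISTS FOR (i:Issue) REQUIRE i.id IS UNIQUE;
CREATE INDEX issue_short IF NOT EXISTS FOR (i:Issue) ON (i.short_id);
CREATE INDEX issue_status IF NOT EXISTS FOR (i:Issue) ON (i.status);
CREATE INDEX issue_assignee IF NOT EXISTS FOR (i:Issue) ON (i.assignee_id);
CREATE CONSTRAINT artifact_id IF NOT EXISTS FOR (a:Artifact) REQUIRE a.id IS UNIQUE;
CREATE CONSTRAINT project_id IF NOT EXISTS FOR (p:Project) REQUIRE p.id IS UNIQUE;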

Postgres Schema (SQLModel)

Postgres stores all content, metadata, and configuration. Managed via Alembic migrations.

import uuid
from datetime import datetime

from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field, SQLModel

try:
    from uuid import uuid7  # stdlib on Python 3.14+
except ImportError:  # older Pythons: assumes the third-party `uuid6` package
    from uuid6 import uuid7

class NodeContent(SQLModel, table=True):
    """Rich content for both components and issues."""
    id: uuid.UUID = Field(primary_key=True)  # matches Neo4j node id
    description: str | None = None           # markdown
    description_html: str | None = None      # pre-rendered, sanitized HTML

class Comment(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    node_id: uuid.UUID = Field(foreign_key="nodecontent.id", index=True)
    author_id: uuid.UUID = Field(foreign_key="actor.id")
    body: str                                # markdown
    body_html: str                           # pre-rendered, sanitized HTML
    created_at: datetime
    updated_at: datetime

class CommentReaction(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    comment_id: uuid.UUID = Field(foreign_key="comment.id", index=True)
    actor_id: uuid.UUID = Field(foreign_key="actor.id")
    emoji: str                               # e.g. "+1", "rocket"
    created_at: datetime

class Attachment(SQLModel, table=True):
    """File attached inline to a comment or description (e.g. pasted image)."""
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    node_id: uuid.UUID = Field(foreign_key="nodecontent.id", index=True)
    filename: str
    size_bytes: int
    mime_type: str
    s3_key: str                              # MinIO object key
    uploader_id: uuid.UUID = Field(foreign_key="actor.id")
    uploaded_at: datetime

class ArtifactContent(SQLModel, table=True):
    """Layer 4: external context attached to a component or issue.
    Topology (HAS_ARTIFACT edge) lives in Neo4j; rich metadata lives here.
    POST-ALPHA: introduced alongside Artifact nodes and MinIO."""
    id: uuid.UUID = Field(primary_key=True)  # matches Neo4j Artifact node id
    title: str
    kind: str                                # "link" | "file" | "embed"
    url: str | None = None                   # external URL (Figma, Docs, etc.)
    file_ref: str | None = None              # MinIO s3_key for uploaded files
    mime_type: str | None = None
    size_bytes: int | None = None
    node_id: uuid.UUID = Field(foreign_key="nodecontent.id", index=True)
    created_by: uuid.UUID = Field(foreign_key="actor.id")
    created_at: datetime

class Actor(SQLModel, table=True):
    """Human user or AI agent."""
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    type: str                                # "user" | "agent"
    name: str
    email: str | None = None
    authentik_uid: str | None = None         # OIDC subject claim
    preferences: dict = Field(default_factory=dict, sa_type=JSONB)  # JSON: theme, notifications, etc.
    created_at: datetime

class Workspace(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    name: str
    slug: str = Field(unique=True, index=True)
    created_at: datetime

class WorkspaceMember(SQLModel, table=True):
    workspace_id: uuid.UUID = Field(foreign_key="workspace.id", primary_key=True)
    actor_id: uuid.UUID = Field(foreign_key="actor.id", primary_key=True)
    role: str                                # workspace-level role
    joined_at: datetime

class ProjectConfig(SQLModel, table=True):
    id: uuid.UUID = Field(primary_key=True)  # matches Neo4j Project id
    workspace_id: uuid.UUID = Field(foreign_key="workspace.id", index=True)
    name: str
    settings: dict = Field(default_factory=dict, sa_type=JSONB)  # JSON: custom statuses, defaults
    created_at: datetime

class PolicyRule(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    project_id: uuid.UUID = Field(foreign_key="projectconfig.id", index=True)
    actor_id: uuid.UUID | None = Field(default=None)     # null = role-level
    role_name: str | None = None
    action: str                              # e.g. "read_node", "create_child", "*"
    resource_scope: str                      # "global" | "subtree:{node_id}" | "node:{node_id}"
    effect: str                              # "allow" | "deny"

class AuditLog(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    project_id: uuid.UUID = Field(foreign_key="projectconfig.id", index=True)
    actor_id: uuid.UUID = Field(foreign_key="actor.id")
    action: str                              # e.g. "status_changed", "reparented"
    node_id: uuid.UUID | None = None
    before: dict | None = Field(default=None, sa_type=JSONB)  # JSON snapshot
    after: dict | None = Field(default=None, sa_type=JSONB)   # JSON snapshot
    created_at: datetime = Field(index=True)

class WebhookConfig(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    project_id: uuid.UUID = Field(foreign_key="projectconfig.id", index=True)
    url: str
    secret_hash: str                         # hashed, never stored plaintext
    events: list[str] = Field(default_factory=list, sa_type=JSONB)  # event names as a JSON array
    active: bool = True
    consecutive_failures: int = 0
    created_at: datetime

Dual-Database Consistency

Neo4j and Postgres are not replicated — they own different data, linked by UUID. Both writes happen in the same API request. The consistency strategy for v0.1:

Write Order

  1. Postgres first. Open a SQLAlchemy transaction and write content/metadata. Do not commit yet.
  2. Neo4j second. Perform the graph mutation (create node, update properties, create edge) and commit it in its own Neo4j transaction.
  3. Commit Postgres. If the Postgres commit succeeds, the operation is complete.

Failure Handling

  • Neo4j write fails: Rollback the Postgres transaction (it hasn't committed). Clean failure, no orphans.
  • Postgres commit fails after Neo4j succeeds: Issue a compensating operation on Neo4j (delete the node/revert the property change). Log the incident for review.
  • Partial Neo4j failure (e.g., network timeout with unknown state): Flag the UUID for reconciliation review.
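The write order and failure handling above can be sketched as follows. The sketch is written against two small hypothetical interfaces: `ContentTxn` stands in for the open SQLAlchemy transaction and `GraphWriter` for Neo4j, so the control flow is visible without a running database. Three details are assumptions, not part of the design above: the `GraphStateUnknown` exception, the `flag_for_reconciliation` callback, and rolling back Postgres when the Neo4j outcome is unknown.

```python
import logging
from typing import Any, Callable, Protocol

log = logging.getLogger("dual_write")


class ContentTxn(Protocol):
    """Open Postgres transaction (hypothetical wrapper over a SQLAlchemy session)."""
    def write(self, node_id: str, content: dict[str, Any]) -> None: ...
    def commit(self) -> None: ...
    def rollback(self) -> None: ...


class GraphWriter(Protocol):
    """Neo4j mutations; each call runs and commits its own transaction."""
    def create_node(self, node_id: str, props: dict[str, Any]) -> None: ...
    def delete_node(self, node_id: str) -> None: ...


class GraphStateUnknown(Exception):
    """Neo4j outcome is unknown, e.g. a network timeout mid-write (hypothetical)."""


def create_node(
    pg: ContentTxn,
    graph: GraphWriter,
    node_id: str,
    props: dict[str, Any],
    content: dict[str, Any],
    flag_for_reconciliation: Callable[[str], None],
) -> None:
    pg.write(node_id, content)               # 1. Postgres first, not committed
    try:
        graph.create_node(node_id, props)    # 2. Neo4j second (commits itself)
    except GraphStateUnknown:
        pg.rollback()                        # assumption: abandon the content write
        flag_for_reconciliation(node_id)     # graph node may or may not exist
        raise
    except Exception:
        pg.rollback()                        # clean failure: nothing persisted
        raise
    try:
        pg.commit()                          # 3. commit Postgres
    except Exception:
        try:
            graph.delete_node(node_id)       # compensating operation on Neo4j
            log.error("postgres commit failed; removed neo4j node %s", node_id)
        except Exception:
            flag_for_reconciliation(node_id) # compensation failed as well
        raise
```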

Reconciliation Job

A periodic background task (Taskiq, runs every 15 minutes) checks for inconsistencies:

  • UUIDs present in Neo4j but missing from Postgres (orphan graph nodes)
  • UUIDs present in Postgres NodeContent but missing from Neo4j (orphan content)
  • Mismatched lightweight properties (status, assignee) between Neo4j and the most recent Postgres audit log entry for the node

Orphans are logged and surfaced in an admin dashboard. Auto-repair is deferred — manual review for v0.1.

What's Eventually Consistent

  • Meilisearch index (post-alpha): Updated asynchronously via Taskiq. A lag of a few seconds is acceptable.
  • Redis cache: Invalidated on mutation. TTL-based expiry as fallback.
  • Centrifugo events: Fire-and-forget publish. Missed events are recoverable by client re-fetch.

Backend Architecture

FastAPI Application Structure

non-linear-api/
├── app/
│   ├── main.py                 # App, middleware, startup/shutdown
│   ├── config.py               # Settings from env vars
│   ├── dependencies.py         # Shared deps (db sessions, auth, current_user)
│   ├── auth/                   # Authentik integration
│   │   ├── oidc.py             # Token validation, OIDC discovery
│   │   ├── permissions.py      # Policy engine evaluation
│   │   └── agent_tokens.py     # API token management for agents
│   ├── graph/                  # Neo4j layer
│   │   ├── connection.py       # Neo4j driver management
│   │   ├── queries.py          # Cypher query templates
│   │   ├── mutations.py        # Graph write operations
│   │   └── traversal.py        # Subtree, path, neighbor queries
│   ├── content/                # Postgres layer
│   │   ├── models.py           # SQLAlchemy/SQLModel models
│   │   ├── descriptions.py     # Rich text CRUD
│   │   ├── comments.py         # Comment thread CRUD
│   │   ├── attachments.py      # Inline attachment metadata + MinIO upload/download — POST-ALPHA
│   │   └── artifacts.py        # Layer 4: artifact CRUD (links, files, embeds) — POST-ALPHA
│   ├── connections/            # Layer 3: code connection analysis — POST-ALPHA
│   │   ├── inference.py        # Auto-infer dependencies from repo analysis
│   │   └── manual.py           # Manual code connection CRUD
│   ├── search/                 # Meilisearch integration
│   │   ├── indexer.py          # Index updates on mutations
│   │   └── search.py           # Query interface
│   ├── realtime/               # Centrifugo integration
│   │   ├── manager.py          # Centrifugo server API client (clients connect to Centrifugo directly)
│   │   └── events.py           # Event types and broadcasting
│   ├── tasks/                  # Taskiq background jobs
│   │   ├── webhooks.py         # Deliver webhooks to agent endpoints
│   │   ├── indexing.py         # Async search index updates
│   │   ├── notifications.py    # Notification delivery
│   │   └── connections.py      # Layer 3: periodic code connection inference — POST-ALPHA
│   └── api/v1/                 # Route handlers
│       ├── nodes.py            # CRUD + tree operations
│       ├── links.py            # Lateral link management (Layer 2 + Layer 3)
│       ├── projects.py         # Project CRUD
│       ├── comments.py         # Comment endpoints
│       ├── attachments.py      # Inline upload/download
│       ├── artifacts.py        # Layer 4: artifact endpoints — POST-ALPHA
│       ├── connections.py      # Layer 3: code connection endpoints — POST-ALPHA
│       ├── search.py           # Search endpoint
│       └── agent.py            # Agent-specific API surface
├── tests/
├── alembic/                    # Postgres migrations
├── docker-compose.yml
└── pyproject.toml

Request Flows

Typical read ("get node with full context"):

Client → FastAPI → Auth middleware (validate token via Authentik)
  → Policy engine (check permissions)
  → Neo4j: fetch node + parent + children + links
  → Postgres: fetch description, comments, attachment meta
  → Merge response → Client

Typical write ("change node status"):

Client → FastAPI → Auth → Policy engine
  → Postgres: open transaction, write audit log entry (not yet committed)
  → Neo4j: update node status
  → Postgres: commit
  → Redis: invalidate cache
  → Centrifugo: publish event via server API
  → Taskiq: queue webhook delivery, search index update
  → Response → Client

Sync Strategy (Neo4j ↔ Postgres)

The two databases are not replicated: they own different data and are linked by UUID. Both writes happen in the same API request, and the compensating-operation pattern described under Dual-Database Consistency keeps them aligned. Eventual consistency is acceptable only for the search index and cache.

Auth Architecture

┌──────────┐     OIDC token      ┌───────────┐
│  Vue App ├─────────────────────►│ Authentik │
└────┬─────┘     (login flow)    └─────┬─────┘
     │                                 │
     │ Bearer token                    │ Token introspection
     ▼                                 ▼
┌──────────┐◄────────────────────┌───────────┐
│ FastAPI  │  validate token     │ Authentik │
│ (resource│  check claims       │  (OIDC    │
│  server) │                     │  provider)│
└──────────┘                     └───────────┘
  • Human users: OIDC login flow. JWT access tokens.
  • AI agents: API tokens issued through Authentik, tied to agent actor accounts.
  • FastAPI: pure resource server. Validates tokens, reads claims, enforces policies.

API Error Contract

All error responses use a consistent envelope:

{
  "error": {
    "code": "validation_error",
    "message": "Human-readable description",
    "details": [
      { "field": "title", "message": "Field is required" }
    ]
  }
}

HTTP Status Codes

Code  Usage
400   Malformed request that cannot be parsed (e.g., invalid JSON body)
401   Missing, expired, or invalid access token
404   Resource not found, or the actor lacks permission to see it. Permission-denied nodes return 404 (not 403) so responses do not reveal whether a resource exists.
409   Conflict (e.g., duplicate short_id, stale update)
422   Validation error (missing or invalid fields). FastAPI/Pydantic validation errors are mapped into the envelope above, with one details entry per field.
429   Rate limited. Includes Retry-After header (seconds).
500   Internal server error. Logged with a correlation ID for debugging.
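FastAPI's default validation response has the shape `{"detail": [...]}`, which differs from the envelope above. Validation errors therefore need remapping. The sketch below converts Pydantic-style error dicts (with `loc` and `msg` keys) into the envelope. `validation_envelope` and the generic top-level message are hypothetical.

```python
from typing import Any, Sequence

# First element of a Pydantic "loc" that names the request part, not the field.
_LOCATION_PREFIXES = {"body", "query", "path", "header", "cookie"}


def validation_envelope(errors: Sequence[dict[str, Any]]) -> dict[str, Any]:
    """Map Pydantic-style errors ({"loc": (...), "msg": ...}) to the API envelope."""
    details = []
    for err in errors:
        loc = [str(part) for part in err.get("loc", ())]
        if loc and loc[0] in _LOCATION_PREFIXES:
            loc = loc[1:]                      # drop "body"/"query"/... prefix
        details.append({"field": ".".join(loc) or None, "message": err.get("msg", "")})
    return {
        "error": {
            "code": "validation_error",
            "message": "Request validation failed",
            "details": details,
        }
    }
```

In FastAPI, this could be registered with `@app.exception_handler(RequestValidationError)`, returning `JSONResponse(status_code=422, content=validation_envelope(exc.errors()))`.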

Rate Limiting

  • Agent API: token bucket per actor, configurable per role (default: 100 req/min).
  • Human API: higher limits (default: 300 req/min).
  • Enforced via Redis. 429 response includes Retry-After and X-RateLimit-Remaining headers.
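The bucket arithmetic for a single actor is sketched below in memory. In the deployed system the same logic would live in Redis, for example inside a Lua script so that the read-modify-write is atomic; that placement is an assumption. `TokenBucket` and its injectable `clock` are hypothetical. The returned `remaining` and `retry_after` values map to the X-RateLimit-Remaining and Retry-After headers.

```python
import math
import time
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class TokenBucket:
    """Per-actor token bucket: holds up to per_minute tokens, refilled continuously."""
    per_minute: int                                # e.g. 100 for agents, 300 for humans
    clock: Callable[[], float] = time.monotonic    # injectable for tests
    tokens: float = field(init=False)
    updated: float = field(init=False)

    def __post_init__(self) -> None:
        self.tokens = float(self.per_minute)       # start with a full bucket
        self.updated = self.clock()

    def take(self) -> tuple[bool, int, int]:
        """Consume one token. Returns (allowed, remaining, retry_after_seconds)."""
        now = self.clock()
        refill = (now - self.updated) * self.per_minute / 60.0
        self.tokens = min(float(self.per_minute), self.tokens + refill)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True, int(self.tokens), 0
        # Seconds until one whole token is available, rounded up for Retry-After.
        retry_after = math.ceil((1 - self.tokens) * 60 / self.per_minute)
        return False, 0, retry_after
```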

Security

Input Sanitization

  • Cypher injection: All Neo4j queries use parameterized Cypher exclusively. User-supplied values are never interpolated into query strings. The graph/queries.py module enforces this by accepting only typed parameters.
  • SQL injection: SQLModel/SQLAlchemy parameterized queries. No raw SQL with string formatting.
  • XSS prevention: All markdown content (descriptions, comments) is rendered to HTML and sanitized server-side with nh3 (a Rust-based HTML sanitizer) before storage. Both the raw markdown and the sanitized HTML are stored, and the frontend renders the pre-sanitized HTML.
  • File upload validation: MIME type validation against allowlist (images, PDFs, common doc formats). Size limit: 25 MB per file. Filename sanitization to prevent path traversal.
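The upload checks above are sketched below with only the standard library. The allowlist contents are hypothetical, since the exact "common doc formats" are not listed. The sketch reads "25 MB" as 25 MiB, and `validate_upload` and `UploadRejected` are hypothetical names. It also trusts the client-declared MIME type; checking the file's magic bytes would be stronger.

```python
import re
from pathlib import PurePosixPath

# Hypothetical allowlist; the real list of "common doc formats" is not specified.
ALLOWED_MIME = {"image/png", "image/jpeg", "image/gif", "image/webp", "application/pdf"}
MAX_BYTES = 25 * 1024 * 1024          # "25 MB", assumed to mean MiB

_UNSAFE_CHARS = re.compile(r"[^A-Za-z0-9._-]")


class UploadRejected(ValueError):
    pass


def validate_upload(filename: str, mime_type: str, size_bytes: int) -> str:
    """Check MIME type and size, then return a filename safe to store."""
    if mime_type not in ALLOWED_MIME:
        raise UploadRejected(f"mime type not allowed: {mime_type}")
    if size_bytes > MAX_BYTES:
        raise UploadRejected("file exceeds size limit")
    # Keep only the last path component; treat backslashes as separators too,
    # so "../../etc/passwd" and "..\\evil.png" cannot escape the upload prefix.
    name = PurePosixPath(filename.replace("\\", "/")).name
    name = _UNSAFE_CHARS.sub("_", name).lstrip(".")   # no hidden files, no odd chars
    if not name:
        raise UploadRejected("empty filename after sanitization")
    return name
```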

Transport & Headers

  • TLS: All traffic encrypted via Caddy reverse proxy (automatic Let's Encrypt certificates).
  • CSRF: SameSite=Lax cookies for browser sessions. Bearer token API calls are inherently CSRF-safe.
  • Content-Security-Policy: Strict CSP headers served by Caddy. script-src 'self', no inline scripts, no eval.
  • CORS: Allowlist of known origins (frontend domain). No wildcard in production.
  • Security headers: X-Content-Type-Options: nosniff, X-Frame-Options: DENY, Strict-Transport-Security.

Design Language

Targets Linear's aesthetic: minimal, fast, slightly dark-IDE feel.

  • Spacing: tight, no wasted space
  • Colors: muted base palette, high-contrast accents only for status/priority
  • Borders: almost none — separation via spacing and subtle background shifts
  • Dark mode: default, light mode secondary
  • Typography: Inter, small-but-readable sizes
  • Animations: subtle slides and fades, 100-150ms, nothing bouncy
  • Optimistic updates: every interaction feels instant, syncs in background

Real-Time Updates (Centrifugo)

Centrifugo handles both live UI updates and notification delivery over WebSocket. Redis is not used for WebSocket pub/sub: Centrifugo manages its own client connections, and the backend publishes events to it through Centrifugo's server API.

Channel Structure

Channel        Scope                          Subscribers
project:{id}   All mutations in a project     All connected project members
node:{id}      Mutations to a specific node   Clients viewing the focus widget for that node
user:{id}      Personal notifications         The user's own connected clients

Events Pushed

Event                Layer  Channel(s)                 Payload
node.status_changed  2      project:{id} + node:{id}   node_id, old_status, new_status, actor
node.created         1/2    project:{id}               node_id, parent_id, type, title, actor
node.deleted         1/2    project:{id} + node:{id}   node_id, actor
node.reparented      1/2    project:{id} + node:{id}   node_id, old_parent, new_parent, actor
comment.added        2      node:{id}                  comment_id, node_id, author, preview
link.changed         2/3    project:{id}               source_id, target_id, link_type, layer, action (created/removed)
assignment.changed   2      project:{id} + node:{id}   node_id, old_assignee, new_assignee
artifact.attached    4      project:{id} + node:{id}   artifact_id, node_id, title, kind, actor
artifact.removed     4      project:{id} + node:{id}   artifact_id, node_id, actor
connection.inferred  3      project:{id}               source_id, target_id, link_type, source: "inferred"
notification         n/a    user:{id}                  notification object

The layer field on link.changed events tells the client which layer the change affects, enabling clients to ignore events for inactive layers.

Backend Publish Flow

Mutation request → Postgres + Neo4j writes
  → Centrifugo server API: publish event to relevant channels
  → Taskiq: queue webhook delivery + search index update
  → Response to client

The backend publishes to Centrifugo via its HTTP server API (not through Redis pub/sub). This gives direct control over which channels receive which events.

Client-Side Handling

  • Pinia store: Incoming Centrifugo events are applied to the Pinia store. The graph view, focus widget, and list view all react to store changes.
  • Optimistic updates: The client applies mutations locally before the server responds. If the server rejects the mutation (4xx), the client reverts the optimistic change by re-fetching the affected node.
  • Conflict model: Last-write-wins for simple fields (status, assignee, labels). The server is the source of truth. When two clients modify the same field concurrently, both writes are broadcast through Centrifugo, and the stored value is whichever write was committed to Neo4j last.
  • Reconnection: On WebSocket disconnect, the client re-subscribes to channels and fetches the current state to catch up on missed events.

Cross-Platform

  • Tauri desktop: No offline support. Tauri wraps the Vue app as-is. When the network is unavailable, the app shows a connection-lost banner and retries. No local mutation queue.

Docker Compose

Alpha (~10 containers)

services:
  api:          # FastAPI (uvicorn)
  frontend:     # Vue 3 (Vite)
  worker:       # Taskiq worker (same codebase as api)
  neo4j:        # Graph database
  postgres:     # Relational database
  redis:        # Cache + rate limiting
  centrifugo:   # Real-time WebSocket server
  caddy:        # Reverse proxy + automatic TLS
  authentik:    # Identity provider (server + worker)
  authentik-db: # Authentik's own Postgres

~10 containers. Runs comfortably on 16GB RAM. Search via Postgres tsvector.

Development (v0.1-full)

services:
  api:          # FastAPI (uvicorn --reload)
  frontend:     # Vue 3 (vite dev server)
  worker:       # Taskiq worker (same codebase as api)
  neo4j:        # Graph database
  postgres:     # Relational database
  redis:        # Cache + rate limiting
  meilisearch:  # Search engine
  minio:        # Object storage
  centrifugo:   # Real-time WebSocket server
  caddy:        # Reverse proxy + automatic TLS
  authentik:    # Identity provider (server + worker)
  authentik-db: # Authentik's own Postgres

~13 containers. Runs comfortably on 16GB RAM.

Production (Single-Node, v0.1-full)

Same Docker Compose topology with production-grade additions:

services:
  # ... all of the above, plus:
  vault:        # Secrets management (HashiCorp Vault)
  prometheus:   # Metrics collection
  grafana:      # Dashboards + alerting
  loki:         # Log aggregation
  tempo:        # Distributed tracing

~18 containers total. Recommended: 32GB RAM, 4+ CPU cores for production.

Reverse Proxy (Caddy)

Caddy serves as the single entry point for all traffic:

  • Automatic TLS via Let's Encrypt (ACME). Zero-config HTTPS.
  • Routes: /api/* → FastAPI, /ws/* → Centrifugo, /* → Vue frontend (nginx or static files).
  • Security headers: CSP, HSTS, X-Frame-Options, X-Content-Type-Options injected at this layer.
  • Rate limiting: Basic connection-level rate limiting as a first defense layer (application-level rate limiting in FastAPI for finer control).

Secrets Management

Alpha: Docker secrets or environment variables. Vault is introduced post-alpha.

HashiCorp Vault (post-alpha)

  • All sensitive configuration (database passwords, Authentik client secrets, agent API token signing keys, webhook HMAC secrets, MinIO credentials) stored in Vault.
  • FastAPI reads secrets from Vault at startup via the hvac Python client and renews leased credentials while running.
  • Secret rotation is therefore supported without an application restart (e.g., Vault dynamic secrets for Postgres credentials).

Docker Secrets (Fallback)

For simpler deployments that don't want Vault overhead, Docker secrets via compose files are supported. Environment variables as the last resort.

Observability (post-alpha)

Alpha: Structured JSON logs via structlog + docker logs. The full observability stack below is introduced post-alpha.

Metrics (Prometheus + Grafana)

  • FastAPI: prometheus-fastapi-instrumentator exposes request latency, status codes, in-flight requests at /metrics.
  • Neo4j: Neo4j Prometheus plugin or neo4j-exporter for query latency, cache hit rates, transaction counts.
  • Postgres: postgres_exporter for connection pool, query stats, replication lag.
  • Redis: redis_exporter for memory, hit rate, connected clients.
  • Centrifugo: Built-in Prometheus metrics for connections, channels, messages.
  • Grafana dashboards: Pre-built dashboards for each service. Alerting rules for error rate spikes, high latency, container restarts.

Tracing (OpenTelemetry + Tempo)

  • OpenTelemetry SDK instrumented in FastAPI. Traces span the full request lifecycle: auth → policy check → Neo4j query → Postgres query → response.
  • Trace context propagated to Taskiq workers (webhook delivery, indexing).
  • Traces stored in Grafana Tempo, queryable from Grafana.

Logging (Structured JSON + Loki)

  • All services emit structured JSON logs (Python structlog for FastAPI).
  • Fields: timestamp, level, correlation_id, actor_id, action, duration_ms.
  • Collected by Grafana Loki via Docker logging driver or Promtail.
  • Correlation ID links logs across FastAPI → Taskiq → Centrifugo for a single request.

Health Checks

Every service has a health check wired into a Docker Compose healthcheck directive:

  • GET /health on FastAPI, Centrifugo
  • TCP checks for Neo4j, Postgres, Redis, Meilisearch, MinIO
  • Grafana alerts on health check failures.
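A TCP check only verifies that a port accepts connections. The standard-library sketch below shows the mechanism. It could run from the API or worker at startup, or from any container that has Python. A Compose healthcheck runs inside the target container, though, so images without Python would use their own tooling instead, for example `pg_isready` in the Postgres image, which also checks readiness more meaningfully. `tcp_check` is a hypothetical helper.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:               # refused, unreachable, DNS failure, timeout
        return False
```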

Database Migrations

Postgres (Alembic)

  • Alembic manages all Postgres schema migrations.
  • Migration files stored in alembic/versions/.
  • Auto-generated from SQLModel model changes (alembic revision --autogenerate).
  • Applied on deployment: alembic upgrade head runs before the API container starts.

Neo4j (Versioned Cypher Scripts)

  • Migration scripts stored in neo4j/migrations/ as numbered Cypher files (001_initial_schema.cypher, 002_add_cycle_nodes.cypher).
  • A lightweight migration runner (Python script) tracks applied migrations in a Neo4j :Migration node.
  • Applied on deployment before the API container starts.
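The runner's bookkeeping is sketched below: parse the numbered filenames, skip versions already recorded on `:Migration` nodes, and apply the rest in order. The regex, query strings, and helper names are hypothetical. The sketch assumes each file's statements are split on `;` and run one at a time. Neo4j does not allow schema changes and data writes in the same transaction, so the `:Migration` record would be written in its own transaction.

```python
import re
from pathlib import Path

MIGRATION_FILE = re.compile(r"^(\d{3})_[a-z0-9_]+\.cypher$")

APPLIED_VERSIONS = "MATCH (m:Migration) RETURN m.version AS version"
RECORD_MIGRATION = (
    "MERGE (m:Migration {version: $version}) "
    "SET m.name = $name, m.applied_at = datetime()"
)


def pending_migrations(filenames: list[str], applied: set[int]) -> list[tuple[int, str]]:
    """Return (version, filename) pairs not yet applied, in version order."""
    found: dict[int, str] = {}
    for name in filenames:
        match = MIGRATION_FILE.match(name)
        if not match:
            continue                        # ignore READMEs and stray files
        version = int(match.group(1))
        if version in found:
            raise ValueError(f"duplicate migration number {version:03d}")
        found[version] = name
    return [(v, found[v]) for v in sorted(found) if v not in applied]


def list_migration_files(directory: str = "neo4j/migrations") -> list[str]:
    return sorted(p.name for p in Path(directory).glob("*.cypher"))
```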

Testing Strategy

Integration Tests (Primary)

  • Framework: pytest with testcontainers.
  • Containers: Neo4j, Postgres, Redis, Meilisearch spun up per test session (shared across tests for speed, reset between test classes).
  • Scope: API endpoint tests hitting real databases. Policy engine tests with real Neo4j graph structures. Dual-DB consistency tests verifying write-order semantics.
  • Fixtures: Factory functions that create graph structures (components, issues, links) for test scenarios.

End-to-End Tests

  • Framework: Playwright against the full Docker Compose stack.
  • Scope: Critical user flows — create project, add components, navigate graph, triage inbox, agent API workflows.
  • Environment: Dedicated docker-compose.test.yml with ephemeral containers.

What's Not Mandated

Isolated unit tests are not required by convention. The dual-DB architecture makes mocking both databases brittle. Integration tests with real containers are the priority.

CI/CD Pipeline

push/MR → lint → test → build → push → deploy

Stage   Tools                                        Description
Lint    ruff (Python), eslint + prettier (Vue/TS)    Code style and static analysis
Test    pytest + testcontainers, Playwright          Integration + E2E tests
Build   Docker                                       Build API, frontend, and worker images
Push    Container registry                           Push tagged images to GitLab Container Registry
Deploy  SSH + docker compose pull                    Pull new images on the production server, then recreate updated services with docker compose up -d

CI runs on GitLab CI. Pipeline definition in .gitlab-ci.yml. Testcontainers require Docker-in-Docker or a privileged runner.

Open Technical Questions

  1. Graph viz library: D3 vs Cytoscape — prototype comparison pending
  2. Neo4j driver: official neo4j Python driver vs neomodel OGM
  3. Gantt implementation: custom or frappe-gantt as starting point