# Non-Linear: Tech Stack & Architecture
## Stack Overview
```
┌─────────────────────────────────────────────────────────┐
│ FRONTEND                                                │
│ Vue 3 + Tailwind + Headless UI + ECharts                │
│ Graph Viz: TBD (D3 vs Cytoscape, eval pending)          │
│ Command Palette: vue-command-palette / custom           │
│ Keybindings: VueUse useMagicKeys                        │
│ Icons: Lucide │ Font: Inter │ Motion: @vueuse/motion    │
│ State: Pinia │ HTTP: ofetch │ WS: centrifuge-js         │
├─────────────────────────────────────────────────────────┤
│ CROSS-PLATFORM                                          │
│ Desktop: Tauri (thin wrapper, no offline), v0.1         │
│ Mobile: Capacitor (responsive web first), v0.2+         │
├─────────────────────────────────────────────────────────┤
│ BACKEND                                                 │
│ FastAPI (Python)                                        │
│ Taskiq (async task queue: webhooks, imports, agents)    │
├─────────────────────────────────────────────────────────┤
│ DATA LAYER                                              │
│ Neo4j: graph topology (nodes, edges, status, labels)    │
│ Postgres: content & metadata (rich text, comments,      │
│           attachments meta, audit logs, project cfg)    │
│ Redis: caching, rate limiting                           │
│ Meilisearch: full-text search (issues, comments)        │
│ MinIO: S3-compatible file storage (attachments)         │
├─────────────────────────────────────────────────────────┤
│ REAL-TIME                                               │
│ Centrifugo: WebSocket server, live updates, push        │
├─────────────────────────────────────────────────────────┤
│ AUTH                                                    │
│ Authentik: OIDC, API tokens, role mgmt, SSO-ready       │
├─────────────────────────────────────────────────────────┤
│ INFRA/OPS                                               │
│ Caddy (reverse proxy + TLS) │ Vault (secrets)           │
│ Prometheus + Grafana (metrics + dashboards)             │
│ Loki (logs) │ Tempo (traces) │ OpenTelemetry (SDK)      │
├─────────────────────────────────────────────────────────┤
│ DEPLOYMENT                                              │
│ Docker Compose (dev + single-node production)           │
└─────────────────────────────────────────────────────────┘
```
## Data Boundary
### Neo4j — Graph Topology
Owns the decomposition tree and lateral links:
- Node identity (UUID), short ID
- Lightweight properties: status, labels, assignee_id, created_at, updated_at
- Parent → child edges (decomposition tree)
- Lateral link edges: blocks (blocked_by is the same edge traversed in reverse), relates_to, duplicates, depends_on (inter-component dependency)
- Project root references, cycle membership
**Why Neo4j over Postgres recursive CTEs:** Queries like "find all unblocked leaves in this subtree," "critical path through blocks links," "everything 3 hops from this node" are what Cypher is built for. CTEs get painful with lateral links and variable-depth queries. The gap widens in v0.2+ with cross-project edges.
### Postgres — Content & Metadata
- **Rich text content:** issue descriptions (markdown)
- **Comment threads:** body, author, parent_comment_id (threading), timestamps
- **Attachment metadata:** filename, size, mime_type, s3_key, uploader_id, uploaded_at
- **User/agent accounts:** profile data, preferences, notification settings
- **Project settings:** configuration, member lists, default policies
- **Audit logs:** who changed what, when, with before/after snapshots
- **Policy definitions:** role templates, custom permission rules
**Linked to Neo4j by UUID.** Neo4j node stores `id: "abc-123"`. Postgres stores full content keyed by same UUID. FastAPI joins them as needed.
### Redis — Caching & Rate Limiting
- Subtree query cache (TTL, invalidated on graph mutations)
- Rate limiting for the agent and human APIs
- Authentik token validation cache
- Not used for WebSocket fan-out: Centrifugo handles real-time delivery (see Real-Time Updates)
### Meilisearch — Search Index
- Indexes issue titles, descriptions, comments, labels
- Fed from both Neo4j and Postgres
- Powers command palette search (issues + commands in one result set)
- Typo-tolerant, prefix search, filtering by label/status/assignee
### MinIO — File Storage
- S3-compatible API, self-hosted
- Stores attachment files (images, docs)
- Postgres stores metadata and S3 key; MinIO stores bytes
- Migration path to AWS S3: no application code changes expected, only endpoint and credential configuration
## Concrete Database Schemas
### UUID Strategy
All entities use UUIDv7 (time-sortable). Generated application-side by FastAPI before writing to either database. The same UUID is used as the primary key in both Neo4j and Postgres, serving as the cross-database join key.
### Neo4j Schema
Neo4j stores graph topology and lightweight node properties. All content lives in Postgres.
**Node labels and properties:**
```cypher
// Component node
CREATE (c:Component {
id: "uuidv7",
short_id: "NL-C12",
title: "auth-service",
status: null, // components have no status (Neo4j does not store null properties)
labels: ["backend", "core"],
owner_id: "uuidv7",
assignee_id: null,
repo_provider: "github",
repo_url: "https://github.com/team/auth",
repo_path: "/src/oauth",
repo_branch: "main",
created_at: datetime(),
updated_at: datetime()
})
// Issue node
CREATE (i:Issue {
id: "uuidv7",
short_id: "NL-42",
title: "implement refresh tokens",
status: "todo",
labels: ["feature", "p1"],
assignee_id: "uuidv7",
created_by: "uuidv7",
cycle_id: "uuidv7",
created_at: datetime(),
updated_at: datetime()
})
// Project root (virtual node linking to decomposition tree root)
CREATE (p:Project {
id: "uuidv7",
workspace_id: "uuidv7",
root_id: "uuidv7"
})
```
**Relationships:**
```cypher
// Decomposition tree (parent → child)
(parent)-[:HAS_CHILD]->(child)
// Lateral links
(a)-[:BLOCKS]->(b)
(a)-[:RELATES_TO]->(b)
(a)-[:DUPLICATES]->(b)
(a)-[:DEPENDS_ON]->(b) // inter-component architectural dependency
// Cycle membership
(issue)-[:IN_CYCLE]->(cycle:Cycle { id, name, start_date, end_date })
```
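Link creation is where the parameterized-Cypher rule needs care: a relationship type is written into the pattern itself rather than bound as an ordinary `$parameter`. A sketch of how a link mutation (hypothetically in `graph/mutations.py`) could handle this: the type is checked against a fixed allowlist built from the relationships above, node IDs stay in the parameter map, and `blocked_by` is stored as a `BLOCKS` edge in the opposite direction. `LinkType` and `create_link_query` are illustrative names.

```python
from enum import Enum


class LinkType(str, Enum):
    """Allowlist of lateral relationship types (from the schema above)."""
    BLOCKS = "BLOCKS"
    RELATES_TO = "RELATES_TO"
    DUPLICATES = "DUPLICATES"
    DEPENDS_ON = "DEPENDS_ON"


def create_link_query(source_id: str, target_id: str, link_type: str) -> tuple[str, dict]:
    """Return (cypher, params) for a lateral link; raise ValueError for unknown types."""
    if link_type.lower() == "blocked_by":
        # blocked_by is not a separate edge: store it as BLOCKS pointing the other way.
        source_id, target_id, link_type = target_id, source_id, "blocks"
    rel = LinkType(link_type.upper())  # ValueError for anything outside the allowlist
    cypher = (
        "MATCH (a {id: $source_id}), (b {id: $target_id}) "
        f"MERGE (a)-[r:{rel.value}]->(b) "  # only the allowlisted type is interpolated
        "RETURN type(r) AS link_type"
    )
    return cypher, {"source_id": source_id, "target_id": target_id}
```

`MERGE` rather than `CREATE` keeps a repeated request from producing duplicate edges of the same type.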
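**Constraints and indexes:**
```cypher
// Uniqueness constraints (each also creates a backing index).
// Unique short_id lets Neo4j reject duplicates, surfaced to clients as 409.
CREATE CONSTRAINT comp_id IF NOT EXISTS FOR (c:Component) REQUIRE c.id IS UNIQUE;
CREATE CONSTRAINT comp_short IF NOT EXISTS FOR (c:Component) REQUIRE c.short_id IS UNIQUE;
CREATE CONSTRAINT issue_id IF NOT EXISTS FOR (i:Issue) REQUIRE i.id IS UNIQUE;
CREATE CONSTRAINT issue_short IF NOT EXISTS FOR (i:Issue) REQUIRE i.short_id IS UNIQUE;
CREATE CONSTRAINT project_id IF NOT EXISTS FOR (p:Project) REQUIRE p.id IS UNIQUE;
CREATE CONSTRAINT cycle_id IF NOT EXISTS FOR (c:Cycle) REQUIRE c.id IS UNIQUE;
// Lookup indexes for common filters
CREATE INDEX issue_status IF NOT EXISTS FOR (i:Issue) ON (i.status);
CREATE INDEX issue_assignee IF NOT EXISTS FOR (i:Issue) ON (i.assignee_id);
```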
### Postgres Schema (SQLModel)
Postgres stores all content, metadata, and configuration. Managed via Alembic migrations.
```python
import uuid
from datetime import datetime

from sqlalchemy import Column
from sqlalchemy.dialects.postgresql import JSONB
from sqlmodel import Field, SQLModel

try:
    from uuid import uuid7  # Python 3.14+
except ImportError:  # older interpreters: backport from the `uuid6` package
    from uuid6 import uuid7

# SQLModel cannot infer a column type from `dict` / `list`, so JSON-valued
# fields declare an explicit JSONB column via sa_column.


class NodeContent(SQLModel, table=True):
    """Rich content for both components and issues."""
    id: uuid.UUID = Field(primary_key=True)  # matches Neo4j node id
    description: str | None = None  # markdown
    description_html: str | None = None  # pre-rendered, sanitized HTML


class Comment(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    node_id: uuid.UUID = Field(foreign_key="nodecontent.id", index=True)
    author_id: uuid.UUID = Field(foreign_key="actor.id")
    body: str  # markdown
    body_html: str  # pre-rendered, sanitized HTML
    created_at: datetime
    updated_at: datetime


class CommentReaction(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    comment_id: uuid.UUID = Field(foreign_key="comment.id", index=True)
    actor_id: uuid.UUID = Field(foreign_key="actor.id")
    emoji: str  # e.g. "+1", "rocket"
    created_at: datetime


class Attachment(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    node_id: uuid.UUID = Field(foreign_key="nodecontent.id", index=True)
    filename: str
    size_bytes: int
    mime_type: str
    s3_key: str  # MinIO object key
    uploader_id: uuid.UUID = Field(foreign_key="actor.id")
    uploaded_at: datetime


class Actor(SQLModel, table=True):
    """Human user or AI agent."""
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    type: str  # "user" | "agent"
    name: str
    email: str | None = None
    authentik_uid: str | None = None  # OIDC subject claim
    # JSON: theme, notifications, etc.
    preferences: dict = Field(default_factory=dict, sa_column=Column(JSONB, nullable=False))
    created_at: datetime


class Workspace(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    name: str
    slug: str = Field(unique=True, index=True)
    created_at: datetime


class WorkspaceMember(SQLModel, table=True):
    workspace_id: uuid.UUID = Field(foreign_key="workspace.id", primary_key=True)
    actor_id: uuid.UUID = Field(foreign_key="actor.id", primary_key=True)
    role: str  # workspace-level role
    joined_at: datetime


class ProjectConfig(SQLModel, table=True):
    id: uuid.UUID = Field(primary_key=True)  # matches Neo4j Project id
    workspace_id: uuid.UUID = Field(foreign_key="workspace.id", index=True)
    name: str
    # JSON: custom statuses, defaults
    settings: dict = Field(default_factory=dict, sa_column=Column(JSONB, nullable=False))
    created_at: datetime


class PolicyRule(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    project_id: uuid.UUID = Field(foreign_key="projectconfig.id", index=True)
    actor_id: uuid.UUID | None = Field(default=None, foreign_key="actor.id")  # null = role-level
    role_name: str | None = None
    action: str  # e.g. "read_node", "create_child", "*"
    resource_scope: str  # "global" | "subtree:{node_id}" | "node:{node_id}"
    effect: str  # "allow" | "deny"


class AuditLog(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    project_id: uuid.UUID = Field(foreign_key="projectconfig.id", index=True)
    actor_id: uuid.UUID = Field(foreign_key="actor.id")
    action: str  # e.g. "status_changed", "reparented"
    node_id: uuid.UUID | None = None
    before: dict | None = Field(default=None, sa_column=Column(JSONB))  # JSON snapshot
    after: dict | None = Field(default=None, sa_column=Column(JSONB))  # JSON snapshot
    created_at: datetime = Field(index=True)


class WebhookConfig(SQLModel, table=True):
    id: uuid.UUID = Field(default_factory=uuid7, primary_key=True)
    project_id: uuid.UUID = Field(foreign_key="projectconfig.id", index=True)
    url: str
    # HMAC signing needs the plaintext secret at delivery time, so it cannot be
    # stored as a hash; the secret lives in Vault and only its path is kept here.
    secret_ref: str  # Vault path of the signing secret
    events: list[str] = Field(default_factory=list, sa_column=Column(JSONB, nullable=False))
    active: bool = True
    consecutive_failures: int = 0
    created_at: datetime
```
## Dual-Database Consistency
Neo4j and Postgres are **not replicated** — they own different data, linked by UUID. Both writes happen in the same API request. The consistency strategy for v0.1:
### Write Order
1. **Postgres first.** Open a SQLAlchemy transaction. Write content/metadata. Do not commit yet.
2. **Neo4j second.** Perform the graph mutation (create node, update properties, create edge).
3. **Commit Postgres.** If Postgres commit succeeds, the operation is complete.
### Failure Handling
- **Neo4j write fails:** Rollback the Postgres transaction (it hasn't committed). Clean failure, no orphans.
- **Postgres commit fails after Neo4j succeeds:** Issue a compensating operation on Neo4j (delete the node/revert the property change). Log the incident for review.
- **Partial Neo4j failure (e.g., network timeout with unknown state):** Roll back the Postgres transaction and flag the UUID for reconciliation review, since the node may or may not exist in Neo4j.
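The write order and failure rules above can be sketched as a small coordinator. The callables (`write_pg`, `write_graph`, `compensate_graph`, `flag_for_review`) and the `GraphUnknownState` exception are hypothetical stand-ins for the content and graph layers; the Postgres transaction only needs `commit()` and `rollback()`, which SQLAlchemy sessions provide.

```python
from typing import Callable, Protocol


class PgTransaction(Protocol):
    def commit(self) -> None: ...
    def rollback(self) -> None: ...


class GraphUnknownState(Exception):
    """Hypothetical: raised when a Neo4j call times out and the outcome is unknown."""


def dual_write(
    pg_tx: PgTransaction,
    write_pg: Callable[[], None],
    write_graph: Callable[[], None],
    compensate_graph: Callable[[], None],
    flag_for_review: Callable[[str], None],
    node_id: str,
) -> None:
    try:
        write_pg()         # 1. Postgres first, inside the open transaction
        write_graph()      # 2. Neo4j mutation (committed by its own transaction)
    except GraphUnknownState:
        pg_tx.rollback()
        flag_for_review(node_id)  # outcome unknown: reconciliation decides
        raise
    except Exception:
        pg_tx.rollback()   # clean failure: nothing committed anywhere
        raise
    try:
        pg_tx.commit()     # 3. commit Postgres
    except Exception:
        compensate_graph()  # undo the Neo4j change
        pg_tx.rollback()    # a failed commit still leaves the session needing rollback
        flag_for_review(node_id)  # log the incident for review
        raise
```

Keeping the coordinator free of driver types makes the failure paths easy to exercise with fakes, while the integration tests cover the real databases.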
### Reconciliation Job
A periodic background task (Taskiq, runs every 15 minutes) checks for inconsistencies:
- UUIDs present in Neo4j but missing from Postgres (orphan graph nodes)
- UUIDs present in Postgres `NodeContent` but missing from Neo4j (orphan content)
- Lightweight properties (status, assignee) in Neo4j that disagree with the latest `after` snapshot in the Postgres audit log
Orphans are logged and surfaced in an admin dashboard. Auto-repair is deferred — manual review for v0.1.
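The three checks reduce to set and dict comparisons once the data is loaded. A sketch, assuming hypothetical loaders (not shown) that return Neo4j properties per node, the set of `NodeContent` IDs, and the latest audit-log `after` snapshot per node; scheduling the job every 15 minutes with Taskiq's scheduler is also not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ReconciliationReport:
    orphan_graph_nodes: set[str] = field(default_factory=set)  # in Neo4j, missing in Postgres
    orphan_content: set[str] = field(default_factory=set)      # in Postgres, missing in Neo4j
    property_mismatches: dict[str, dict[str, tuple]] = field(default_factory=dict)

    @property
    def clean(self) -> bool:
        return not (self.orphan_graph_nodes or self.orphan_content or self.property_mismatches)


def reconcile(graph_props: dict[str, dict], content_ids: set[str],
              audit_props: dict[str, dict],
              fields: tuple[str, ...] = ("status", "assignee_id")) -> ReconciliationReport:
    """graph_props / audit_props: node id -> {field: value}; content_ids: NodeContent ids."""
    report = ReconciliationReport(
        orphan_graph_nodes=set(graph_props) - set(content_ids),
        orphan_content=set(content_ids) - set(graph_props),
    )
    for node_id, props in graph_props.items():
        expected = audit_props.get(node_id)
        if expected is None:
            continue  # no audit snapshot to compare against
        diffs = {f: (props.get(f), expected.get(f)) for f in fields
                 if f in expected and props.get(f) != expected.get(f)}
        if diffs:
            report.property_mismatches[node_id] = diffs  # (neo4j value, audit value)
    return report
```

The report is read-only by design, matching the manual-review policy for v0.1.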
### What's Eventually Consistent
- **Meilisearch index:** Updated asynchronously via Taskiq. Acceptable lag of seconds.
- **Redis cache:** Invalidated on mutation. TTL-based expiry as fallback.
- **Centrifugo events:** Fire-and-forget publish. Missed events are recoverable by client re-fetch.
## Backend Architecture
### FastAPI Application Structure
```
non-linear-api/
├── app/
│ ├── main.py # App, middleware, startup/shutdown
│ ├── config.py # Settings from env vars
│ ├── dependencies.py # Shared deps (db sessions, auth, current_user)
│ ├── auth/ # Authentik integration
│ │ ├── oidc.py # Token validation, OIDC discovery
│ │ ├── permissions.py # Policy engine evaluation
│ │ └── agent_tokens.py # API token management for agents
│ ├── graph/ # Neo4j layer
│ │ ├── connection.py # Neo4j driver management
│ │ ├── queries.py # Cypher query templates
│ │ ├── mutations.py # Graph write operations
│ │ └── traversal.py # Subtree, path, neighbor queries
│ ├── content/ # Postgres layer
│ │ ├── models.py # SQLAlchemy/SQLModel models
│ │ ├── descriptions.py # Rich text CRUD
│ │ ├── comments.py # Comment thread CRUD
│ │ └── attachments.py # Metadata + MinIO upload/download
│ ├── search/ # Meilisearch integration
│ │ ├── indexer.py # Index updates on mutations
│ │ └── search.py # Query interface
│ ├── realtime/ # Centrifugo integration
│ │ ├── manager.py # Centrifugo server API client
│ │ └── events.py # Event types and channel routing
│ ├── tasks/ # Taskiq background jobs
│ │ ├── webhooks.py # Deliver webhooks to agent endpoints
│ │ ├── indexing.py # Async search index updates
│ │ └── notifications.py # Notification delivery
│ └── api/v1/ # Route handlers
│ ├── nodes.py # CRUD + tree operations
│ ├── links.py # Lateral link management
│ ├── projects.py # Project CRUD
│ ├── comments.py # Comment endpoints
│ ├── attachments.py # Upload/download
│ ├── search.py # Search endpoint
│ └── agent.py # Agent-specific API surface
├── tests/
├── alembic/ # Postgres migrations
├── docker-compose.yml
└── pyproject.toml
```
### Request Flows
**Typical read ("get node with full context"):**
```
Client → FastAPI → Auth middleware (validate token via Authentik)
→ Policy engine (check permissions)
→ Neo4j: fetch node + parent + children + links
→ Postgres: fetch description, comments, attachment meta
→ Merge response → Client
```
**Typical write ("change node status"):**
```
Client → FastAPI → Auth → Policy engine
→ Postgres: open transaction, write audit log entry
→ Neo4j: update node status
→ Postgres: commit
→ Redis: invalidate cache
→ Centrifugo: publish event to project/node channels
→ Taskiq: queue webhook delivery, search index update
→ Response → Client
```
### Sync Strategy (Neo4j ↔ Postgres)
Not replicated: each database owns different data, linked by UUID, and both writes happen in the same API request. Consistency relies on the write order and compensating operations described in Dual-Database Consistency; only the search index, cache, and real-time events are eventually consistent.
## Auth Architecture
```
┌──────────┐   OIDC login flow   ┌─────────────────┐
│ Vue App  ├────────────────────►│ Authentik       │
│          │◄────────────────────┤ (OIDC provider) │
└────┬─────┘  access token (JWT) └────────┬────────┘
     │                                    │
     │ Bearer token                       │ token introspection
     ▼                                    │
┌───────────────────┐                     │
│ FastAPI           │◄────────────────────┘
│ (resource server) │  validate token, check claims
└───────────────────┘
```
- **Human users:** OIDC login flow. JWT access tokens.
- **AI agents:** API tokens issued through Authentik, tied to agent actor accounts.
- **FastAPI:** pure resource server. Validates tokens, reads claims, enforces policies.
## API Error Contract
All error responses use a consistent envelope:
```json
{
"error": {
"code": "validation_error",
"message": "Human-readable description",
"details": [
{ "field": "title", "message": "Field is required" }
]
}
}
```
### HTTP Status Codes
| Code | Usage |
|------|-------|
| `400` | Malformed request that cannot be checked field by field (e.g., unparseable JSON) |
| `401` | Missing, expired, or invalid token |
| `404` | Resource not found **or** actor lacks permission to see it. Permission-denied nodes return 404 (not 403) to prevent information leakage about resource existence. |
| `409` | Conflict (e.g., duplicate `short_id`, stale update) |
| `422` | Validation error with field-level detail (missing or invalid fields). FastAPI's default `{"detail": [...]}` body is mapped into the envelope above by a custom exception handler. |
| `429` | Rate limited. Includes `Retry-After` header (seconds). |
| `500` | Internal server error. Logged with correlation ID for debugging. |
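A sketch of that mapping: Pydantic-style validation errors (dicts with `loc` and `msg`) are converted into the envelope. It could be called from a FastAPI handler registered with `@app.exception_handler(RequestValidationError)`, passing `exc.errors()`; that wiring, the helper name `to_error_envelope`, and the default top-level message are assumptions.

```python
# Leading `loc` entries FastAPI uses to say where in the request a field came from.
REQUEST_PARTS = {"body", "query", "path", "header", "cookie"}


def to_error_envelope(errors: list[dict], code: str = "validation_error",
                      message: str = "Request validation failed") -> dict:
    """Convert Pydantic-style error dicts into the API error envelope."""
    details = []
    for err in errors:
        loc = list(err.get("loc", ()))
        if loc and loc[0] in REQUEST_PARTS:
            loc = loc[1:]  # drop only the leading part, so a field named "body" survives
        field = ".".join(str(p) for p in loc) or None  # nested paths -> "labels.0"
        details.append({"field": field, "message": err.get("msg", "")})
    return {"error": {"code": code, "message": message, "details": details}}
```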
### Rate Limiting
- Agent API: token bucket per actor, configurable per role (default: 100 req/min).
- Human API: higher limits (default: 300 req/min).
- Enforced via Redis. `429` response includes `Retry-After` and `X-RateLimit-Remaining` headers.
## Security
### Input Sanitization
- **Cypher injection:** All Neo4j queries use parameterized Cypher exclusively. User-supplied values are never interpolated into query strings. The `graph/queries.py` module enforces this by accepting only typed parameters.
- **SQL injection:** SQLModel/SQLAlchemy parameterized queries. No raw SQL with string formatting.
- **XSS prevention:** All markdown content (descriptions, comments) is rendered to HTML server-side and sanitized with `nh3` (Python bindings to the Rust `ammonia` sanitizer) before storage. Both raw markdown and pre-rendered sanitized HTML are stored. The frontend renders the pre-sanitized HTML.
- **File upload validation:** MIME type checked against an allowlist (images, PDFs, common doc formats) using the file's content, not only the client-supplied `Content-Type`. Size limit: 25 MB per file. Filename sanitization to prevent path traversal.
### Transport & Headers
- **TLS:** All traffic encrypted via Caddy reverse proxy (automatic Let's Encrypt certificates).
- **CSRF:** SameSite=Lax cookies for browser sessions. Bearer token API calls are inherently CSRF-safe.
- **Content-Security-Policy:** Strict CSP headers served by Caddy. `script-src 'self'`, no inline scripts, no `eval`.
- **CORS:** Allowlist of known origins (frontend domain). No wildcard in production.
- **Security headers:** `X-Content-Type-Options: nosniff`, `X-Frame-Options: DENY`, `Strict-Transport-Security`.
## Design Language
Targets Linear's aesthetic: minimal, fast, slightly dark-IDE feel.
- **Spacing:** tight, no wasted space
- **Colors:** muted base palette, high-contrast accents only for status/priority
- **Borders:** almost none — separation via spacing and subtle background shifts
- **Dark mode:** default, light mode secondary
- **Typography:** Inter, small-but-readable sizes
- **Animations:** subtle slides and fades, 100-150ms, nothing bouncy
- **Optimistic updates:** every interaction feels instant, syncs in background
## Real-Time Updates (Centrifugo)
Centrifugo handles both live UI updates and notification delivery over WebSocket. It manages client connections itself; the backend does not use Redis pub/sub for fan-out and instead pushes events to Centrifugo through its server HTTP API.
### Channel Structure
| Channel | Scope | Subscribers |
|---------|-------|-------------|
| `project:{id}` | All mutations in a project | All connected project members |
| `node:{id}` | Mutations to a specific node | Clients viewing the focus widget for that node |
| `user:{id}` | Personal notifications | Single user's connected clients |
### Events Pushed
| Event | Channel | Payload |
|-------|---------|---------|
| `node.status_changed` | `project:{id}` + `node:{id}` | node_id, old_status, new_status, actor |
| `node.created` | `project:{id}` | node_id, parent_id, type, title, actor |
| `node.deleted` | `project:{id}` + `node:{id}` | node_id, actor |
| `node.reparented` | `project:{id}` + `node:{id}` | node_id, old_parent, new_parent, actor |
| `comment.added` | `node:{id}` | comment_id, node_id, author, preview |
| `link.changed` | `project:{id}` | source_id, target_id, link_type, action (created/removed) |
| `assignment.changed` | `project:{id}` + `node:{id}` | node_id, old_assignee, new_assignee |
| `notification` | `user:{id}` | notification object |
### Backend Publish Flow
```
Mutation request → Postgres + Neo4j writes
→ Centrifugo server API: publish event to relevant channels
→ Taskiq: queue webhook delivery + search index update
→ Response to client
```
The backend publishes to Centrifugo via its HTTP server API (not through Redis pub/sub). This gives direct control over which channels receive which events.
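A sketch of the publish call, assuming Centrifugo v5 or later, whose HTTP server API exposes per-method endpoints such as `/api/publish` and `/api/broadcast` authenticated with an `X-API-Key` header. `broadcast` sends one payload to several channels, which fits events that go to both `project:{id}` and `node:{id}`. The service URL, port, and payload shape are assumptions; the request is only built here, and sending it (for example with `urllib.request.urlopen(req, timeout=2)`, or an async client in the API) is left to the caller.

```python
import json
import urllib.request


def build_broadcast_request(api_url: str, api_key: str, channels: list[str],
                            data: dict) -> urllib.request.Request:
    """Build a POST to Centrifugo's broadcast endpoint (not sent here)."""
    body = json.dumps({"channels": channels, "data": data}).encode()
    return urllib.request.Request(
        f"{api_url.rstrip('/')}/api/broadcast",
        data=body,
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
        method="POST",
    )
```

Because publishing is fire-and-forget, a failed call should be logged rather than fail the mutation; clients recover by re-fetching.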
### Client-Side Handling
- **Pinia store:** Incoming Centrifugo events are applied to the Pinia store. The graph view, focus widget, and list view all react to store changes.
- **Optimistic updates:** The client applies mutations locally before the server responds. If the server rejects the mutation (4xx), returns an error (5xx), or the request fails, the client reverts the optimistic change by re-fetching the affected node.
- **Conflict model:** Last-write-wins for simple fields (status, assignee, labels). The server is the source of truth: when two clients modify the same field concurrently, the write committed last to Neo4j wins, and clients converge on that value through the broadcast events (or a re-fetch).
- **Reconnection:** On WebSocket disconnect, the client re-subscribes to channels and fetches the current state to catch up on missed events.
### Cross-Platform
- **Tauri desktop:** No offline support. Tauri wraps the Vue app as-is. When the network is unavailable, the app shows a connection-lost banner and retries. No local mutation queue.
## Docker Compose
### Development
```yaml
services:
api: # FastAPI (uvicorn --reload)
frontend: # Vue 3 (vite dev server)
worker: # Taskiq worker (same codebase as api)
neo4j: # Graph database
postgres: # Relational database
redis: # Cache + rate limiting
meilisearch: # Search engine
minio: # Object storage
centrifugo: # Real-time WebSocket server
authentik: # Identity provider (server + worker)
authentik-db: # Authentik's own Postgres
```
~12 containers. Runs comfortably on 16GB RAM.
### Production (Single-Node)
Same Docker Compose topology with production-grade additions:
```yaml
services:
# ... all of the above, plus:
caddy: # Reverse proxy + automatic TLS
vault: # Secrets management (HashiCorp Vault)
prometheus: # Metrics collection
grafana: # Dashboards + alerting
loki: # Log aggregation
tempo: # Distributed tracing
```
~18 containers total. Recommended: 32GB RAM, 4+ CPU cores for production.
## Reverse Proxy (Caddy)
Caddy serves as the single entry point for all traffic:
- **Automatic TLS** via Let's Encrypt (ACME). Zero-config HTTPS.
- **Routes:** `/api/*` → FastAPI, `/ws/*` → Centrifugo, `/*` → Vue frontend (nginx or static files).
- **Security headers:** CSP, HSTS, X-Frame-Options, X-Content-Type-Options injected at this layer.
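- **Rate limiting:** Basic connection-level rate limiting as a first defense layer. Caddy's standard build does not include a rate limiter, so this needs a plugin such as `caddy-ratelimit`. Application-level rate limiting in FastAPI provides finer control.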
## Secrets Management
### HashiCorp Vault (Primary)
- All sensitive configuration (database passwords, Authentik client secrets, agent API token signing keys, webhook HMAC secrets, MinIO credentials) stored in Vault.
- FastAPI reads secrets from Vault at startup via the `hvac` Python client.
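- Secret rotation without application restart (Vault dynamic secrets for Postgres credentials) requires the API to renew its leases and pick up new credentials before the old ones expire; reading secrets only at startup is not enough.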
### Docker Secrets (Fallback)
For simpler deployments that don't want Vault overhead, Docker secrets via compose files are supported. Environment variables as the last resort.
## Observability
### Metrics (Prometheus + Grafana)
- **FastAPI:** `prometheus-fastapi-instrumentator` exposes request latency, status codes, in-flight requests at `/metrics`.
- **Neo4j:** Neo4j's built-in Prometheus endpoint (Enterprise Edition only) or `neo4j-exporter` for query latency, cache hit rates, transaction counts.
- **Postgres:** `postgres_exporter` for connection counts, query stats, replication lag.
- **Redis:** `redis_exporter` for memory, hit rate, connected clients.
- **Centrifugo:** Built-in Prometheus metrics for connections, channels, messages.
- **Grafana dashboards:** Pre-built dashboards for each service. Alerting rules for error rate spikes, high latency, container restarts.
### Tracing (OpenTelemetry + Tempo)
- OpenTelemetry SDK instrumented in FastAPI. Traces span the full request lifecycle: auth → policy check → Neo4j query → Postgres query → response.
- Trace context propagated to Taskiq workers (webhook delivery, indexing).
- Traces stored in Grafana Tempo, queryable from Grafana.
### Logging (Structured JSON + Loki)
- All services emit structured JSON logs (Python `structlog` for FastAPI).
- Fields: timestamp, level, correlation_id, actor_id, action, duration_ms.
- Collected by Grafana Loki via the Docker logging driver or a log shipper (Promtail, now deprecated in favor of Grafana Alloy).
- Correlation ID links logs across FastAPI → Taskiq → Centrifugo for a single request.
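The document names `structlog`, whose `structlog.contextvars.bind_contextvars` covers this; the stdlib sketch below shows the same idea with no dependencies: a request-scoped correlation ID held in a `contextvars.ContextVar` and merged into every JSON log line. The formatter and field handling are illustrative, and setting the ID in request middleware is assumed rather than shown.

```python
import contextvars
import json
import logging
import time

# Set once per request (e.g. in middleware); read by every log call in that context.
correlation_id = contextvars.ContextVar("correlation_id", default=None)


class JsonFormatter(logging.Formatter):
    """Emit one JSON object per log record with the agreed fields."""

    def format(self, record: logging.LogRecord) -> str:
        entry = {
            "timestamp": time.strftime("%Y-%m-%dT%H:%M:%S", time.gmtime(record.created))
            + f".{int(record.msecs):03d}Z",
            "level": record.levelname.lower(),
            "message": record.getMessage(),
            "correlation_id": correlation_id.get(),
        }
        # Optional structured fields, passed via logger.info(..., extra={...}).
        for key in ("actor_id", "action", "duration_ms"):
            if hasattr(record, key):
                entry[key] = getattr(record, key)
        return json.dumps(entry)
```

Because `ContextVar` values follow asyncio tasks, concurrent requests in one FastAPI process keep separate correlation IDs.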
### Health Checks
Every service has a health check, wired into Docker Compose `healthcheck` directives:
- `GET /health` on FastAPI, Centrifugo
- TCP checks for Neo4j, Postgres, Redis, Meilisearch, MinIO
- Grafana alerts on health check failures.
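A sketch of the TCP check, as the API might use it to report dependency reachability. A successful connect only proves the port accepts connections, not that the service is ready; images such as Postgres ship their own probes (`pg_isready`) that are better suited to Compose `healthcheck` directives. The hostnames (Compose service names) and the use of default ports are assumptions.

```python
import socket

# Hypothetical targets: Compose service names with each service's default port.
DEPENDENCIES = {
    "neo4j": ("neo4j", 7687),
    "postgres": ("postgres", 5432),
    "redis": ("redis", 6379),
    "meilisearch": ("meilisearch", 7700),
    "minio": ("minio", 9000),
}


def tcp_healthy(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, DNS failure, or timeout
        return False


def check_all(targets: dict[str, tuple[str, int]]) -> dict[str, bool]:
    return {name: tcp_healthy(host, port) for name, (host, port) in targets.items()}
```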
## Database Migrations
### Postgres (Alembic)
- Alembic manages all Postgres schema migrations.
- Migration files stored in `alembic/versions/`.
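- Auto-generated from SQLModel model changes (`alembic revision --autogenerate`) and reviewed before committing, since autogenerate does not detect every change (e.g., it sees a table or column rename as a drop plus an add).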
- Applied on deployment: `alembic upgrade head` runs before the API container starts.
### Neo4j (Versioned Cypher Scripts)
- Migration scripts stored in `neo4j/migrations/` as numbered Cypher files (`001_initial_schema.cypher`, `002_add_cycle_nodes.cypher`).
- A lightweight migration runner (Python script) tracks applied migrations in a Neo4j `:Migration` node.
- Applied on deployment before the API container starts.
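A sketch of the runner's core logic, assuming one `:Migration` node per applied file (the document only says applied migrations are tracked in a `:Migration` node). `run_cypher` is a hypothetical callable wrapping the Neo4j driver, and the statement splitter is naive: it breaks a file on `;` at line ends, so a `;` followed by a newline inside a string literal would confuse it.

```python
import re
from pathlib import Path
from typing import Callable

MIGRATION_RE = re.compile(r"^(\d{3,})_[\w-]+\.cypher$")  # e.g. 001_initial_schema.cypher
RECORD_APPLIED = "MERGE (m:Migration {name: $name}) ON CREATE SET m.applied_at = datetime()"


def pending_migrations(filenames: list[str], applied: set[str]) -> list[str]:
    """Numbered migration files not yet applied, in numeric order."""
    numbered = sorted(
        (int(m.group(1)), name) for name in filenames if (m := MIGRATION_RE.match(name))
    )
    numbers = [num for num, _ in numbered]
    if len(numbers) != len(set(numbers)):
        raise ValueError("duplicate migration number")
    return [name for _, name in numbered if name not in applied]


def split_statements(text: str) -> list[str]:
    """Naive split on ';' at end of line; the driver runs one statement per call."""
    return [s.strip() for s in re.split(r";\s*(?:\n|$)", text) if s.strip()]


def apply_pending(directory: Path, applied: set[str],
                  run_cypher: Callable[[str, dict], object]) -> list[str]:
    done = []
    for name in pending_migrations([p.name for p in directory.iterdir()], applied):
        for stmt in split_statements((directory / name).read_text(encoding="utf-8")):
            run_cypher(stmt, {})
        run_cypher(RECORD_APPLIED, {"name": name})  # record only after the file succeeds
        done.append(name)
    return done
```

Each statement runs separately because Neo4j does not allow schema changes (constraints, indexes) and data writes in the same transaction.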
## Testing Strategy
### Integration Tests (Primary)
- **Framework:** pytest with testcontainers.
- **Containers:** Neo4j, Postgres, Redis, Meilisearch spun up per test session (shared across tests for speed, reset between test classes).
- **Scope:** API endpoint tests hitting real databases. Policy engine tests with real Neo4j graph structures. Dual-DB consistency tests verifying write-order semantics.
- **Fixtures:** Factory functions that create graph structures (components, issues, links) for test scenarios.
### End-to-End Tests
- **Framework:** Playwright against the full Docker Compose stack.
- **Scope:** Critical user flows — create project, add components, navigate graph, triage inbox, agent API workflows.
- **Environment:** Dedicated `docker-compose.test.yml` with ephemeral containers.
### What's Not Mandated
Isolated unit tests are not required by convention. The dual-DB architecture makes mocking both databases brittle. Integration tests with real containers are the priority.
## CI/CD Pipeline
```
push/MR → lint → test → build → push → deploy
```
| Stage | Tools | Description |
|-------|-------|-------------|
| **Lint** | ruff (Python), eslint + prettier (Vue/TS) | Code style and static analysis |
| **Test** | pytest + testcontainers, Playwright | Integration + E2E tests |
| **Build** | Docker | Build API, frontend, worker images |
| **Push** | Container registry | Push tagged images to GitLab Container Registry |
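| **Deploy** | SSH + docker compose pull | Pull new images on the production server and recreate changed containers (`docker compose up -d`). Single-node Compose has no native rolling restart, so expect a brief interruption. |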
CI runs on GitLab CI. Pipeline definition in `.gitlab-ci.yml`. Testcontainers require Docker-in-Docker or a privileged runner.
## Open Technical Questions
1. **Graph viz library:** D3 vs Cytoscape — prototype comparison pending
2. **Neo4j driver:** official `neo4j` Python driver vs `neomodel` OGM
3. **Gantt implementation:** custom or frappe-gantt as starting point