non-linear-docs/10-ALPHA-DB-SCHEMA.md

20 KiB

Non-Linear: Alpha Database Schema (Postgres-Only)

Design Decisions

Why Postgres-Only

Alpha ships without Neo4j (~8 containers). The graph in alpha is small (<500 nodes per project) with limited lateral link types (blocks, relates_to). Postgres handles this comfortably via:

  • Adjacency list (parent_id) for the decomposition tree
  • ltree materialized path for fast subtree queries without recursive CTEs
  • A lateral_links table for typed edges between nodes

The dual-DB architecture (Neo4j for topology, Postgres for content) remains the long-term direction for Layer 3 code connections and cross-project edges. This schema is designed to migrate cleanly: the nodes table splits into Neo4j (topology + lightweight props) and Postgres (content) by extracting description, description_html into a node_content table keyed by the same UUID.

Unified Nodes Table

Components and issues share a single nodes table with a node_type discriminator. Rationale:

  • The decomposition tree mixes both types (a component's child can be an issue, an issue's child can be a sub-issue)
  • Tree queries (parent_id, path) operate uniformly across types
  • Issue-specific columns (status, assignee_id) are nullable and ignored for components
  • Avoids polymorphic joins for tree traversal

ltree for Subtree Queries

Each node stores a materialized path column of type ltree. Example: root.comp_abc.issue_def. This enables:

  • SELECT * FROM nodes WHERE path <@ 'root.comp_abc' — all descendants of a component
  • SELECT * FROM nodes WHERE path @> 'root.comp_abc.issue_def' — all ancestors
  • Index-backed, no recursion needed

The path uses node short-integer IDs (the seq value) as segments for compactness. Updated on reparent via a single UPDATE ... SET path = new_prefix || subpath(path, nlevel(old_prefix)) for the subtree.

Labels as Array

Labels are stored as text[] with a GIN index. For alpha's freeform tags this is simpler than a join table and supports queries like WHERE labels @> ARRAY['bug', 'p0']. A normalized labels table can be introduced post-alpha if label management (rename, merge, color) becomes necessary.

UUIDv7

All primary keys use UUIDv7 (time-sortable, generated application-side). Benefits:

  • Natural chronological ordering without a separate created_at sort
  • Safe for distributed ID generation (no coordination needed)
  • Same ID used across systems (future Neo4j migration, Centrifugo channels, webhook payloads)

Short IDs

Human-readable IDs like NL-42 are generated per-project using an atomic counter (next_short_id on the projects table). The prefix is configurable per project. Short IDs are unique within a project and immutable once assigned.


Extensions

CREATE EXTENSION IF NOT EXISTS ltree;
CREATE EXTENSION IF NOT EXISTS pgcrypto;  -- gen_random_uuid() fallback if app doesn't supply UUIDv7

Enums

CREATE TYPE node_type AS ENUM ('component', 'issue');

CREATE TYPE node_status AS ENUM (
    'backlog',
    'todo',
    'in_progress',
    'in_review',
    'done',
    'cancelled'
);

CREATE TYPE link_type AS ENUM ('blocks', 'relates_to');

CREATE TYPE actor_type AS ENUM ('user', 'agent');

CREATE TYPE workspace_role AS ENUM ('owner', 'member');

CREATE TYPE project_role AS ENUM ('owner', 'member', 'agent');

CREATE TYPE repo_provider AS ENUM ('github', 'gitlab');

CREATE TYPE audit_action AS ENUM (
    'node_created',
    'node_updated',
    'node_deleted',
    'node_reparented',
    'status_changed',
    'assignee_changed',
    'labels_changed',
    'link_created',
    'link_deleted',
    'comment_created',
    'comment_updated',
    'comment_deleted',
    'member_added',
    'member_removed',
    'member_role_changed',
    'repo_connected',
    'repo_disconnected',
    'webhook_created',
    'webhook_deleted'
);

Tables

workspaces

CREATE TABLE workspaces (
    id          UUID PRIMARY KEY,
    name        TEXT NOT NULL,
    slug        TEXT NOT NULL UNIQUE,
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE UNIQUE INDEX idx_workspaces_slug ON workspaces (slug);

actors

CREATE TABLE actors (
    id              UUID PRIMARY KEY,
    actor_type      actor_type NOT NULL,
    display_name    TEXT NOT NULL,
    email           TEXT,
    avatar_url      TEXT,
    authentik_uid   TEXT UNIQUE,       -- OIDC subject claim (users)
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_actors_email ON actors (email) WHERE email IS NOT NULL;
CREATE INDEX idx_actors_authentik ON actors (authentik_uid) WHERE authentik_uid IS NOT NULL;

workspace_members

CREATE TABLE workspace_members (
    workspace_id    UUID NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
    actor_id        UUID NOT NULL REFERENCES actors(id) ON DELETE CASCADE,
    role            workspace_role NOT NULL DEFAULT 'member',
    joined_at       TIMESTAMPTZ NOT NULL DEFAULT now(),

    PRIMARY KEY (workspace_id, actor_id)
);

CREATE INDEX idx_wm_actor ON workspace_members (actor_id);

projects

CREATE TABLE projects (
    id              UUID PRIMARY KEY,
    workspace_id    UUID NOT NULL REFERENCES workspaces(id) ON DELETE CASCADE,
    name            TEXT NOT NULL,
    slug            TEXT NOT NULL,
    short_id_prefix TEXT NOT NULL DEFAULT 'NL',  -- e.g. "NL" → NL-1, NL-2
    next_short_id   INTEGER NOT NULL DEFAULT 1,  -- atomically incremented
    root_node_id    UUID,                        -- set after root node creation
    settings        JSONB NOT NULL DEFAULT '{}', -- custom statuses, defaults (post-alpha)
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now(),

    UNIQUE (workspace_id, slug)
);

CREATE INDEX idx_projects_workspace ON projects (workspace_id);

project_members

CREATE TABLE project_members (
    project_id  UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    actor_id    UUID NOT NULL REFERENCES actors(id) ON DELETE CASCADE,
    role        project_role NOT NULL DEFAULT 'member',
    joined_at   TIMESTAMPTZ NOT NULL DEFAULT now(),

    PRIMARY KEY (project_id, actor_id)
);

CREATE INDEX idx_pm_actor ON project_members (actor_id);

nodes

The core table. Stores both components and issues in a single table with the decomposition tree structure.

CREATE TABLE nodes (
    id              UUID PRIMARY KEY,
    project_id      UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    node_type       node_type NOT NULL,
    short_id        INTEGER NOT NULL,            -- numeric part: 42 in "NL-42"
    title           TEXT NOT NULL,
    description     TEXT,                        -- markdown source
    description_html TEXT,                       -- pre-rendered, sanitized HTML

    -- Tree structure
    parent_id       UUID REFERENCES nodes(id) ON DELETE SET NULL,
    path            ltree NOT NULL,              -- materialized path for subtree queries

    -- Issue-specific (NULL for components)
    status          node_status,
    assignee_id     UUID REFERENCES actors(id) ON DELETE SET NULL,
    created_by      UUID NOT NULL REFERENCES actors(id),

    -- Shared
    labels          TEXT[] NOT NULL DEFAULT '{}',

    -- Timestamps
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at      TIMESTAMPTZ NOT NULL DEFAULT now(),

    -- Constraints
    UNIQUE (project_id, short_id),
    CONSTRAINT chk_issue_has_status CHECK (
        (node_type = 'issue' AND status IS NOT NULL)
        OR node_type = 'component'
    ),
    CONSTRAINT chk_component_no_status CHECK (
        (node_type = 'component' AND status IS NULL)
        OR node_type = 'issue'
    )
);

-- Tree traversal
CREATE INDEX idx_nodes_parent ON nodes (parent_id) WHERE parent_id IS NOT NULL;
CREATE INDEX idx_nodes_path ON nodes USING GIST (path);

-- Filtering
CREATE INDEX idx_nodes_project ON nodes (project_id);
CREATE INDEX idx_nodes_status ON nodes (project_id, status) WHERE status IS NOT NULL;
CREATE INDEX idx_nodes_assignee ON nodes (assignee_id) WHERE assignee_id IS NOT NULL;
CREATE INDEX idx_nodes_labels ON nodes USING GIN (labels);
CREATE INDEX idx_nodes_type ON nodes (project_id, node_type);

-- Full-text search
ALTER TABLE nodes ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
        setweight(to_tsvector('english', coalesce(description, '')), 'B')
    ) STORED;

CREATE INDEX idx_nodes_fts ON nodes USING GIN (search_vector);

-- Short ID lookup
CREATE INDEX idx_nodes_short_id ON nodes (project_id, short_id);

Typed edges between nodes (not part of the decomposition tree).

CREATE TABLE lateral_links (
    id          UUID PRIMARY KEY,
    project_id  UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    source_id   UUID NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
    target_id   UUID NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
    link_type   link_type NOT NULL,
    created_by  UUID NOT NULL REFERENCES actors(id),
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),

    -- No self-links, no duplicate links in same direction
    CONSTRAINT chk_no_self_link CHECK (source_id != target_id),
    UNIQUE (source_id, target_id, link_type)
);

CREATE INDEX idx_links_source ON lateral_links (source_id);
CREATE INDEX idx_links_target ON lateral_links (target_id);
CREATE INDEX idx_links_project ON lateral_links (project_id);

Semantics:

  • blocks: directed — source blocks target. Query "what blocks issue X" = WHERE target_id = X AND link_type = 'blocks'.
  • relates_to: undirected — stored once (lower UUID as source by convention). Query both directions.

comments

Flat comment stream per node.

CREATE TABLE comments (
    id          UUID PRIMARY KEY,
    node_id     UUID NOT NULL REFERENCES nodes(id) ON DELETE CASCADE,
    author_id   UUID NOT NULL REFERENCES actors(id),
    body        TEXT NOT NULL,                -- markdown source
    body_html   TEXT NOT NULL,               -- pre-rendered, sanitized HTML
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now(),
    updated_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_comments_node ON comments (node_id, created_at);
CREATE INDEX idx_comments_author ON comments (author_id);

-- Full-text search on comments
ALTER TABLE comments ADD COLUMN search_vector tsvector
    GENERATED ALWAYS AS (
        to_tsvector('english', coalesce(body, ''))
    ) STORED;

CREATE INDEX idx_comments_fts ON comments USING GIN (search_vector);

audit_events

Append-only change history. Every mutation is recorded.

CREATE TABLE audit_events (
    id          UUID PRIMARY KEY,
    project_id  UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    actor_id    UUID NOT NULL REFERENCES actors(id),
    action      audit_action NOT NULL,
    node_id     UUID,                        -- NULL for non-node events (member changes, etc.)
    before_data JSONB,                       -- snapshot of changed fields before mutation
    after_data  JSONB,                       -- snapshot of changed fields after mutation
    metadata    JSONB,                       -- additional context (e.g. commit SHA for linked changes)
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_audit_project_time ON audit_events (project_id, created_at DESC);
CREATE INDEX idx_audit_node ON audit_events (node_id, created_at DESC) WHERE node_id IS NOT NULL;
CREATE INDEX idx_audit_actor ON audit_events (actor_id, created_at DESC);

repo_connections

Repositories linked to a project. Components reference these via repo_connection_id.

CREATE TABLE repo_connections (
    id                  UUID PRIMARY KEY,
    project_id          UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    provider            repo_provider NOT NULL,
    repo_url            TEXT NOT NULL,
    default_branch      TEXT NOT NULL DEFAULT 'main',
    access_token_enc    TEXT,                -- encrypted OAuth token
    webhook_secret_hash TEXT,                -- hashed webhook secret for incoming pushes
    connected_by        UUID NOT NULL REFERENCES actors(id),
    connected_at        TIMESTAMPTZ NOT NULL DEFAULT now(),

    UNIQUE (project_id, repo_url)
);

CREATE INDEX idx_repos_project ON repo_connections (project_id);

Maps components to specific paths within connected repositories.

CREATE TABLE node_repo_links (
    node_id             UUID PRIMARY KEY REFERENCES nodes(id) ON DELETE CASCADE,
    repo_connection_id  UUID NOT NULL REFERENCES repo_connections(id) ON DELETE CASCADE,
    path                TEXT,                -- subdirectory within repo (NULL = repo root)
    branch              TEXT                 -- branch override (NULL = repo default)
);

CREATE INDEX idx_nrl_repo ON node_repo_links (repo_connection_id);

api_tokens

Bearer tokens for agent access.

CREATE TABLE api_tokens (
    id              UUID PRIMARY KEY,
    actor_id        UUID NOT NULL REFERENCES actors(id) ON DELETE CASCADE,
    token_hash      TEXT NOT NULL UNIQUE,    -- SHA-256 hash of the token (never store plaintext)
    name            TEXT NOT NULL,           -- human-readable label ("triage-agent-prod")
    last_used_at    TIMESTAMPTZ,
    expires_at      TIMESTAMPTZ,
    created_at      TIMESTAMPTZ NOT NULL DEFAULT now(),

    CONSTRAINT chk_expiry CHECK (expires_at IS NULL OR expires_at > created_at)
);

CREATE INDEX idx_tokens_actor ON api_tokens (actor_id);
CREATE INDEX idx_tokens_hash ON api_tokens (token_hash);

webhook_configs

Minimal webhook registration per project.

CREATE TABLE webhook_configs (
    id                      UUID PRIMARY KEY,
    project_id              UUID NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    url                     TEXT NOT NULL,
    events                  TEXT[] NOT NULL DEFAULT '{}',  -- e.g. {'node.status_changed', 'comment.added'}
    active                  BOOLEAN NOT NULL DEFAULT true,
    consecutive_failures    INTEGER NOT NULL DEFAULT 0,
    last_delivery_at        TIMESTAMPTZ,
    created_by              UUID NOT NULL REFERENCES actors(id),
    created_at              TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE INDEX idx_webhooks_project ON webhook_configs (project_id);

ltree Path Maintenance

Path Format

Each node's path is composed of segments representing the chain of short_id values from root to node, prefixed with the project's root identifier:

root.12.45.78
 │    │   │  └── this node (short_id=78)
 │    │   └── parent (short_id=45)
 │    └── grandparent (short_id=12)
 └── project root sentinel

Using integer short IDs as segments keeps paths compact and unique within a project.

On Node Creation

-- Pseudocode (application layer):
-- 1. Atomically claim a short_id:
UPDATE projects SET next_short_id = next_short_id + 1
WHERE id = :project_id
RETURNING next_short_id - 1 AS new_short_id;

-- 2. Compute path from parent:
-- If parent_id IS NULL (root node): path = 'root'
-- Otherwise: path = parent.path || '.' || new_short_id::text
INSERT INTO nodes (id, project_id, node_type, short_id, title, parent_id, path, ...)
VALUES (:id, :project_id, :type, :new_short_id, :title, :parent_id, :computed_path, ...);

On Reparent

When a node moves to a new parent, update the entire subtree's paths in one statement:

-- :old_path = current node's path (e.g. 'root.12.45')
-- :new_parent_path = new parent's path (e.g. 'root.99')
-- :node_short_id = the moved node's short_id segment

UPDATE nodes
SET
    path = :new_parent_path || '.' || :node_short_id::text
               || subpath(path, nlevel(:old_path)),
    parent_id = CASE WHEN id = :node_id THEN :new_parent_id ELSE parent_id END,
    updated_at = now()
WHERE path <@ :old_path;

This updates the moved node and all its descendants in a single indexed operation.


Short ID Generation

Short IDs are assigned atomically using UPDATE ... RETURNING:

-- Claim next short_id for a project (called from application layer)
UPDATE projects
SET next_short_id = next_short_id + 1
WHERE id = :project_id
RETURNING next_short_id - 1 AS short_id;

The full human-readable ID is {project.short_id_prefix}-{short_id}, e.g. NL-42. This is computed at read time, not stored as a string — only the integer is persisted on the node.


Search is powered by generated tsvector columns with GIN indexes on both nodes and comments.

Query Pattern

-- Search nodes in a project
SELECT id, short_id, title, node_type,
       ts_rank(search_vector, query) AS rank
FROM nodes, to_tsquery('english', :search_term) query
WHERE project_id = :project_id
  AND search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

-- Search comments in a project (via join)
SELECT c.id, c.node_id, c.body, c.author_id,
       ts_rank(c.search_vector, query) AS rank
FROM comments c
JOIN nodes n ON n.id = c.node_id, to_tsquery('english', :search_term) query
WHERE n.project_id = :project_id
  AND c.search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

The command palette performs a unified search across nodes (by title/description) with results ranked by relevance. The generated column approach means no trigger maintenance — the search_vector updates automatically on any title or description change.


Inbox Query

Issues without a parent (triage inbox):

SELECT * FROM nodes
WHERE project_id = :project_id
  AND node_type = 'issue'
  AND parent_id IS NULL
ORDER BY created_at DESC;

Note: the project's root node is the only component with parent_id IS NULL. Orphaned issues (inbox items) are distinguished by node_type = 'issue'.


Unblocked Issues Query

Issues that are not blocked by any open issue:

SELECT n.* FROM nodes n
WHERE n.project_id = :project_id
  AND n.node_type = 'issue'
  AND n.status IN ('todo', 'in_progress')
  AND NOT EXISTS (
      SELECT 1 FROM lateral_links ll
      JOIN nodes blocker ON blocker.id = ll.source_id
      WHERE ll.target_id = n.id
        AND ll.link_type = 'blocks'
        AND blocker.status NOT IN ('done', 'cancelled')
  );

Cascade and Deletion Behavior

FK Relationship On Delete
nodes.parent_id → nodes.id SET NULL (orphan to inbox, don't cascade-delete subtrees)
nodes.project_id → projects.id CASCADE (project deletion removes all nodes)
lateral_links → nodes CASCADE (removing a node removes its links)
comments → nodes CASCADE (removing a node removes its comments)
nodes.assignee_id → actors.id SET NULL (deleting an actor unassigns them)
workspace_members → workspaces/actors CASCADE
project_members → projects/actors CASCADE

Constraints Summary

Constraint Purpose
UNIQUE (project_id, short_id) Short IDs unique within project
UNIQUE (workspace_id, slug) on projects Project slugs unique per workspace
UNIQUE (source_id, target_id, link_type) on links No duplicate links
CHECK (source_id != target_id) on links No self-links
CHECK on nodes Issues must have status; components must not
UNIQUE (project_id, repo_url) on repo_connections No duplicate repo links

Migration Notes

  • Alembic manages all migrations. The initial migration creates extensions, enums, and all tables in dependency order.
  • ltree extension must be created by a superuser or a user with CREATE privilege on the database. The Alembic migration should run CREATE EXTENSION IF NOT EXISTS ltree in an op.execute() call.
  • Generated columns (search_vector) require Postgres 12+. Target minimum: Postgres 15.
  • UUIDv7 is generated application-side (Python uuid7 package). Postgres stores it as standard UUID type — no special extension needed.

Future Migration Path (Post-Alpha)

When introducing Neo4j for Layer 3 code connections:

  1. Extract graph topology from nodes → Neo4j nodes (id, short_id, title, status, labels, assignee_id, path)
  2. Move lateral_links → Neo4j relationships
  3. Keep nodes in Postgres but rename to node_content (description, description_html only)
  4. Add artifacts table and Neo4j Artifact label + HAS_ARTIFACT edge
  5. Add cycles table and Neo4j IN_CYCLE relationship

The schema is designed so this split is additive — the UUID primary key is the cross-database join key, and no structural changes are needed to the Postgres tables beyond extracting topology fields.