PostgreSQL Table Design

Core Rules

Define a PRIMARY KEY for reference tables (users, orders, etc.). Not always needed for time-series/event/log data. When used, prefer BIGINT GENERATED ALWAYS AS IDENTITY; use UUID only when global uniqueness/opacity is needed.
Normalize first (to 3NF) to eliminate data redundancy and update anomalies; denormalize only for measured, high-ROI reads where join performance is proven problematic. Premature denormalization creates maintenance burden.
Add NOT NULL everywhere it’s semantically required; use DEFAULTs for common values.
Create indexes for access paths you actually query: PK/unique (auto), FK columns (manual!), frequent filters/sorts, and join keys.
Prefer TIMESTAMPTZ for event time; NUMERIC for money; TEXT for strings; BIGINT for integer values, DOUBLE PRECISION for floats (or NUMERIC for exact decimal arithmetic).
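
The rules above can be sketched as a small schema. This is a minimal illustration, not a prescribed layout: the tables `customers` and `orders`, their columns, and the assumed query pattern (recent orders filtered by status) are all hypothetical.

```sql
-- Hypothetical schema; table and column names are illustrative only.
CREATE TABLE customers (
    customer_id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    email       TEXT NOT NULL UNIQUE,               -- UNIQUE creates its index automatically
    created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE TABLE orders (
    order_id     BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    customer_id  BIGINT NOT NULL REFERENCES customers (customer_id),
    total_amount NUMERIC(12,2) NOT NULL CHECK (total_amount >= 0),  -- money as NUMERIC, never float
    status       TEXT NOT NULL DEFAULT 'pending',
    placed_at    TIMESTAMPTZ NOT NULL DEFAULT now()
);

-- FK columns are not indexed automatically, so add the index by hand.
CREATE INDEX orders_customer_id_idx ON orders (customer_id);

-- Index an access path that is actually queried
-- (assumed here: recent orders filtered by status).
CREATE INDEX orders_status_placed_at_idx ON orders (status, placed_at DESC);
```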

PostgreSQL “Gotchas”

Identifiers: unquoted → lowercased. Avoid quoted/mixed-case names. Convention: use snake_case for table/column names.
Unique + NULLs: UNIQUE allows multiple NULLs because NULLs are treated as distinct. Use UNIQUE NULLS NOT DISTINCT (...) (PG15+) to treat NULLs as equal and so allow only one.
FK indexes: PostgreSQL does not auto-index FK columns. Add them.
No silent coercions: length/precision overflows error out (no truncation). Example: inserting 999 into NUMERIC(2,0) fails with error, unlike some databases that silently truncate or round.
Sequences/identity have gaps (normal; don't "fix"). Rollbacks, crashes, and concurrent transactions create gaps in ID sequences (1, 2, 5, 6...). This is expected behavior—don't try to make IDs consecutive.
Heap storage: no clustered PK by default (unlike SQL Server/MySQL InnoDB); CLUSTER is a one-off reorganization, not maintained on subsequent inserts. Physical row order is not guaranteed: new rows go wherever free space is available, so rely on ORDER BY, never on storage order.
MVCC: updates/deletes leave dead tuples; vacuum handles them—design to avoid hot wide-row churn.
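
A few of these gotchas are easy to see in a scratch database. The sketch below assumes PostgreSQL 15 or later (for NULLS NOT DISTINCT); the table `AccountLinks` and its columns are hypothetical names chosen for illustration.

```sql
-- Unquoted identifiers fold to lower case: this creates a table named accountlinks.
CREATE TABLE AccountLinks (
    account_id  BIGINT NOT NULL,
    provider    TEXT   NOT NULL,
    external_id TEXT,
    -- PG15+: NULLs compare as equal for this constraint, so at most one row
    -- per (account_id, provider) may have a NULL external_id.
    UNIQUE NULLS NOT DISTINCT (account_id, provider, external_id)
);

SELECT * FROM accountlinks;        -- works: the name was folded to lower case
-- SELECT * FROM "AccountLinks";   -- fails: the quoted mixed-case name does not exist

INSERT INTO accountlinks VALUES (1, 'github', NULL);
-- Repeating the same insert raises a unique violation under NULLS NOT DISTINCT;
-- with a plain UNIQUE constraint it would succeed.
-- INSERT INTO accountlinks VALUES (1, 'github', NULL);

-- Precision overflow is an error, not a silent truncation.
-- SELECT 999::NUMERIC(2,0);       -- fails with a numeric overflow error
```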

Data Types

IDs: BIGINT GENERATED ALWAYS AS IDENTITY preferred (GENERATED BY DEFAULT also fine); UUID when merging or federating data, in distributed systems, or for opaque IDs. Generate with uuidv7() (preferred on PG18+) or gen_random_uuid() (on older PG versions).
Integers: prefer BIGINT unless storage space is critical; INTEGER for smaller ranges; avoid SMALLINT unless constrained.
Floats: prefer DOUBLE PRECISION over REAL unless storage space is critical. Use NUMERIC for exact decimal arithmetic.
Strings: prefer TEXT; if length limits needed, use CHECK (LENGTH(col) <= n) instead of VARCHAR(n); avoid CHAR(n). Use BYTEA for binary data.
TOAST: when a row exceeds roughly 2 kB (the default threshold), large string/binary values are automatically compressed and/or moved out of line. Storage strategies: PLAIN (no TOAST), EXTENDED (compress + out-of-line), EXTERNAL (out-of-line, no compress), MAIN (compress, keep in-line if possible). Default EXTENDED usually optimal. Control with ALTER TABLE tbl ALTER COLUMN col SET STORAGE strategy and ALTER TABLE tbl SET (toast_tuple_target = 4096) for threshold.
Case-insensitive text: for locale/accent handling use non-deterministic collations; for plain ASCII use expression indexes on LOWER(col) (preferred unless column needs case-insensitive PK/FK/UNIQUE) or CITEXT.
Money: NUMERIC(p,s) (never float).
Time: TIMESTAMPTZ for timestamps; DATE for date-only; INTERVAL for durations. Avoid TIMESTAMP (without timezone). Use now() for transaction start time, clock_timestamp() for current wall-clock time.
Booleans: BOOLEAN with NOT NULL constraint unless tri-state values are required.
Enums: CREATE TYPE ... AS ENUM for small, stable sets (e.g. US states, days of week). For business-logic-driven and evolving values (e.g. order statuses) → use TEXT (or INT) + CHECK or lookup table.
Arrays: TEXT[], INTEGER[], etc. Use for ordered lists where you query elements. Index with GIN for containment (@>, <@) and overlap (&&) queries. Access: arr[1] (1-indexed), arr[1:3] (slicing). Good for tags, categories; avoid for relations—use junction tables instead. Literal syntax: '{val1,val2}' or ARRAY[val1,val2].
Range types: daterange, numrange, tstzrange for intervals. Support overlap (&&) and containment (@>) operators. Index with GiST. Good for scheduling, versioning, numeric ranges. Pick a bounds scheme and use it consistently; prefer [) (inclusive/exclusive) by default.
Network types: INET for IP addresses, CIDR for network ranges, MACADDR for MAC addresses. Support network operators (<<, >>, &&).
Geometric types: POINT, LINE, POLYGON, CIRCLE for simple 2D geometry. Index with GiST. Consider PostGIS for anything beyond basic spatial features.
Text search: TSVECTOR for full-text search documents, TSQUERY for search queries. Index tsvector with GIN. Always specify language: `to_t
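
Several of the type choices above can be combined in one table. The following is a sketch under stated assumptions: the table `room_bookings`, its columns, the tag value, and the sample time window are hypothetical, and gen_random_uuid() is used on the assumption of a pre-PG18 server.

```sql
-- Hypothetical table combining several of the recommended types.
CREATE TABLE room_bookings (
    booking_id   BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    public_id    UUID NOT NULL DEFAULT gen_random_uuid() UNIQUE,  -- opaque external ID
    room_name    TEXT NOT NULL CHECK (LENGTH(room_name) <= 100),  -- TEXT + CHECK, not VARCHAR(n)
    tags         TEXT[] NOT NULL DEFAULT '{}',
    during       TSTZRANGE NOT NULL,                              -- booked interval, [) bounds by convention
    price        NUMERIC(10,2) NOT NULL,
    is_confirmed BOOLEAN NOT NULL DEFAULT false
);

-- GIN for array containment/overlap, GiST for range overlap,
-- expression index for case-insensitive lookups on plain ASCII names.
CREATE INDEX room_bookings_tags_idx   ON room_bookings USING GIN (tags);
CREATE INDEX room_bookings_during_idx ON room_bookings USING GIST (during);
CREATE INDEX room_bookings_room_name_lower_idx ON room_bookings (LOWER(room_name));

-- Bookings tagged 'projector'.
SELECT booking_id FROM room_bookings WHERE tags @> ARRAY['projector'];

-- Bookings overlapping a given window.
SELECT booking_id
FROM room_bookings
WHERE during && tstzrange(TIMESTAMPTZ '2025-01-01 09:00+00',
                          TIMESTAMPTZ '2025-01-01 10:00+00', '[)');

-- Case-insensitive match that can use the expression index.
SELECT booking_id FROM room_bookings WHERE LOWER(room_name) = LOWER('Atrium');
```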

...
