Data Generation

seedcli uses intelligent heuristics to generate contextually appropriate fake data based on column names and types.

The Data Engine analyzes each column and applies generation strategies in this order:

  1. Custom Generators - User-registered generators (highest priority)
  2. Column Name Heuristics - Based on naming patterns (email, phone, etc.)
  3. Type-Based Generation - Based on column data type
  4. Default Fallback - Random strings or numbers
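The priority order above can be sketched as a simple lookup chain. This is a minimal illustration, not seedcli's actual implementation: the registries `CUSTOM`, `NAME_HINTS`, and `TYPE_GENERATORS`, and the function `resolve_generator`, are hypothetical names introduced here.

```python
import random
from typing import Callable

Generator = Callable[[random.Random], object]

# Hypothetical registries; seedcli's real internals are not shown in this document.
CUSTOM: dict[str, Generator] = {}
NAME_HINTS: dict[str, Generator] = {
    "email": lambda rng: f"user{rng.randint(1, 9999)}@example.com",
    "phone": lambda rng: f"+1-555-{rng.randint(100, 999)}-{rng.randint(1000, 9999)}",
}
TYPE_GENERATORS: dict[str, Generator] = {
    "BOOLEAN": lambda rng: rng.choice([True, False]),
    "INTEGER": lambda rng: rng.randint(0, 1_000_000),
}


def default_fallback(rng: random.Random) -> str:
    """Step 4: a random 8-letter lowercase string."""
    return "".join(rng.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(8))


def resolve_generator(column_name: str, column_type: str) -> Generator:
    name = column_name.lower()
    # 1. Custom generators win over everything else.
    if name in CUSTOM:
        return CUSTOM[name]
    # 2. Column name heuristics: substring match on the column name.
    for hint, generator in NAME_HINTS.items():
        if hint in name:
            return generator
    # 3. Type-based generation.
    generator = TYPE_GENERATORS.get(column_type.upper())
    if generator is not None:
        return generator
    # 4. Default fallback.
    return default_fallback
```

With these registries, `resolve_generator("contact_email", "VARCHAR")` resolves through the name heuristic, while `resolve_generator("is_active", "BOOLEAN")` falls through to the type-based generator.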

seedcli automatically detects common column naming patterns:

| Column Name Contains | Generated Data |
| --- | --- |
| email | john.doe@example.com |
| first_name, firstname | John |
| last_name, lastname | Doe |
| name | John Doe |
| username, user | johndoe42 |
| phone | +1-555-123-4567 |
| address | 123 Main St, Anytown, ST 12345 |
| city | New York |
| state | California |
| country | United States |
| zip, postal | 12345 |
| url, website | https://example.com |
| slug | my-awesome-post |
| title | Senior Software Engineer |
| company | Acme Corporation |
| description, bio | Lorem ipsum dolor sit amet... |
| uuid, guid | 550e8400-e29b-41d4-a716-446655440000 |
| price, amount | 99.99 |
| age | 32 |
| rating, score | 4 (1-5 range) |
| created_at, updated_at | 2025-03-15 10:30:00 |
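Because matching is by substring, some patterns overlap: `first_name` and `username` both contain `name`. The sketch below shows one way to resolve that, by checking more specific patterns first. The ordering rule is an assumption made for illustration (this page does not say how seedcli breaks such ties), and `PATTERNS` and `match_pattern` are hypothetical names.

```python
# Ordered (pattern, example value) pairs. More specific patterns come first so
# that "first_name" is not swallowed by the generic "name" pattern.
# Assumption: this ordering is illustrative, not seedcli's documented behavior.
PATTERNS = [
    ("first_name", "John"),
    ("firstname", "John"),
    ("last_name", "Doe"),
    ("lastname", "Doe"),
    ("username", "johndoe42"),
    ("email", "john.doe@example.com"),
    ("name", "John Doe"),
]


def match_pattern(column_name):
    """Return the first pattern contained in the column name, or None."""
    lowered = column_name.lower()
    for pattern, _example in PATTERNS:
        if pattern in lowered:
            return pattern
    return None
```

Here `match_pattern("display_name")` falls through to the generic `name` pattern, while `match_pattern("first_name")` stops at the more specific entry.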

When name heuristics don’t apply, seedcli generates based on column type:

| Column Type | Generated Data |
| --- | --- |
| BOOLEAN | true or false |
| INTEGER, SERIAL | Random integer |
| FLOAT, DOUBLE, NUMERIC | Random decimal |
| VARCHAR, TEXT, CHAR | Random sentence |
| DATE | Random date within 5 years |
| TIMESTAMP, DATETIME | Random datetime |
| TIME | 14:30:00 |
| UUID | Valid UUID v4 |
| JSON, JSONB | {"key": "value"} |
| BYTEA, BLOB | Random bytes |
| ARRAY | Array of appropriate type |

For ENUM columns, seedcli automatically selects from allowed values:

CREATE TABLE users (
  status user_status -- ENUM('active', 'inactive', 'pending')
);

Generated values will be one of: active, inactive, or pending.

seedcli tracks generated values to ensure uniqueness:

CREATE TABLE users (
  email VARCHAR(255) UNIQUE
);

Each generated email will be unique within the seeding session.
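One common way to implement this kind of tracking is to keep a set of values already used and retry on collision. The sketch below assumes that approach; `generate_unique` and its `max_attempts` limit are hypothetical and are not taken from seedcli.

```python
import random


def generate_unique(generate, seen, max_attempts=1000):
    """Call generate() until it returns a value not yet in `seen`."""
    for _ in range(max_attempts):
        value = generate()
        if value not in seen:
            seen.add(value)
            return value
    raise RuntimeError("could not produce a unique value; value space may be exhausted")


rng = random.Random(0)
seen_emails = set()  # one set per UNIQUE column, kept for the whole session
emails = [
    generate_unique(lambda: f"user{rng.randint(1, 50)}@example.com", seen_emails)
    for _ in range(20)
]
```

The retry limit matters: if a column needs more unique values than the generator can produce, the loop would otherwise never finish.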

Non-nullable columns always receive values:

CREATE TABLE users (
  email VARCHAR(255) NOT NULL -- Always gets a value
);

Nullable columns have a configurable probability of being NULL (default 30%):

data_generation:
  null_probability: 0.3 # 30% chance of NULL
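As a rough illustration of what `null_probability` means, the sketch below draws a random number for each nullable column and returns NULL (`None`) when it falls below the configured threshold. The helper `maybe_null` is a hypothetical name, and the logic is an assumption about the behavior described above.

```python
import random


def maybe_null(rng, generate, nullable, null_probability=0.3):
    """Return None with the given probability for nullable columns only."""
    if nullable and rng.random() < null_probability:
        return None
    return generate()


rng = random.Random(1)
values = [maybe_null(rng, lambda: "x", nullable=True) for _ in range(10_000)]
null_share = values.count(None) / len(values)  # close to 0.3 for large samples
```

For a `NOT NULL` column, `nullable=False` bypasses the random draw entirely, so a value is always generated.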

seedcli automatically handles foreign key relationships:

  1. Tables are sorted by dependencies (topological sort)
  2. Parent tables are seeded first
  3. Child tables reference actual inserted IDs

CREATE TABLE orders (
  user_id INTEGER REFERENCES users(id) -- Uses real user IDs
);
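The three steps above can be sketched with Python's standard `graphlib` module. The table names other than `users` and `orders`, and the `inserted_ids` bookkeeping, are made up for illustration; this is not seedcli's own code.

```python
import random
from graphlib import TopologicalSorter

# Map each table to the set of tables it references through foreign keys.
dependencies = {
    "users": set(),
    "products": set(),
    "orders": {"users"},
    "order_items": {"orders", "products"},
}

# Steps 1 and 2: a topological sort yields parent tables before their children.
seed_order = list(TopologicalSorter(dependencies).static_order())

# Step 3: child rows pick foreign keys from IDs that were actually inserted.
rng = random.Random(0)
inserted_ids = {"users": [1, 2, 3]}  # hypothetical IDs returned by earlier inserts
order_row = {"user_id": rng.choice(inserted_ids["users"])}
```

`TopologicalSorter` raises `graphlib.CycleError` when tables reference each other in a loop, which is the case a seeding tool has to handle separately.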

Override generation for specific columns in seedcli.yaml:

tables:
  users:
    columns:
      # Use specific values
      role:
        values:
          - admin
          - user
          - guest
      # Set numeric range
      age:
        min: 18
        max: 65
      # Use pattern
      employee_id:
        pattern: "EMP-{A-Z}{2}-{0-9}{4}"
        unique: true
      # Force specific generator
      avatar_url:
        generator: url

Use patterns for formatted strings:

| Pattern | Example Output |
| --- | --- |
| {A-Z} | Single uppercase letter |
| {a-z} | Single lowercase letter |
| {0-9} | Single digit |
| {A-Z}{3} | Three uppercase letters |
| SKU-{A-Z}{2}-{0-9}{4} | SKU-AB-1234 |
| {a-z}{5}@example.com | abcde@example.com |
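To make the pattern syntax concrete, here is a small expander that handles the forms in the table: a `{X-Y}` character range, an optional `{n}` repeat count, and literal text everywhere else. It is a sketch inferred from the examples above, not seedcli's parser, and `expand_pattern` is a hypothetical name.

```python
import random
import re

# A character range such as {A-Z}, optionally followed by a repeat count such as {3}.
TOKEN = re.compile(r"\{(\w)-(\w)\}(?:\{(\d+)\})?")


def expand_pattern(pattern, rng):
    """Replace each range token with random characters; keep literal text as-is."""

    def replace(match):
        start, end, count = match.group(1), match.group(2), match.group(3)
        repeat = int(count) if count else 1
        return "".join(chr(rng.randint(ord(start), ord(end))) for _ in range(repeat))

    return TOKEN.sub(replace, pattern)


rng = random.Random(7)
sku = expand_pattern("SKU-{A-Z}{2}-{0-9}{4}", rng)  # shaped like SKU-AB-1234
```

Combined with `unique: true`, a pattern like `EMP-{A-Z}{2}-{0-9}{4}` has 26 × 26 × 10,000 = 6,760,000 possible values, which bounds how many unique rows it can supply.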

Use the --seed flag for deterministic generation:

seedcli seed --all --seed 42

Running with the same seed produces identical data every time.
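Seeded generation works because a pseudo-random number generator started from the same seed yields the same sequence. The sketch below shows the principle with Python's `random.Random`; it does not describe which generator seedcli uses internally, and `generate_rows` is a hypothetical helper.

```python
import random


def generate_rows(seed, count=3):
    """Build rows from a generator seeded once, so output depends only on the seed."""
    rng = random.Random(seed)
    return [
        {"age": rng.randint(18, 65), "rating": rng.randint(1, 5)}
        for _ in range(count)
    ]


first_run = generate_rows(42)
second_run = generate_rows(42)
identical = first_run == second_run  # True: same seed, same rows
```

Determinism of this kind holds only when every random choice flows through the one seeded generator and the order of those choices stays the same between runs.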