Databuddy uses a dual-database architecture to optimize for both high-volume analytics and transactional metadata. Analytics data is stored in ClickHouse, while application metadata lives in PostgreSQL.

Database Architecture

ClickHouse

Analytics & Time-Series Data
  • Events (page views, custom events)
  • Web Vitals metrics
  • Error tracking
  • Revenue transactions
  • AI usage logs
  • Uptime monitoring

PostgreSQL

Application Metadata
  • Users & organizations
  • Websites & domains
  • API keys & authentication
  • Feature flags
  • Goals & funnels
  • Links & annotations

Why Two Databases?

This separation provides several benefits:
  1. Performance: ClickHouse excels at analytical queries on billions of events
  2. Scalability: Analytics data can grow independently from metadata
  3. Cost: ClickHouse compresses data 10-100x, reducing storage costs
  4. Query Speed: Columnar storage makes aggregations lightning-fast
  5. Isolation: Critical user data is separated from high-volume analytics

ClickHouse Schema

ClickHouse stores all time-series and event data across three databases:
  • analytics - Main analytics events and aggregations
  • uptime - Uptime monitoring data
  • observability - AI call spans and traces

Events Table

The main events table stores all page views and interactions:
-- From packages/db/src/clickhouse/schema.ts:9-78
CREATE TABLE analytics.events (
  id UUID,
  client_id String,
  event_name String,
  anonymous_id String,
  time DateTime64(3, 'UTC'),
  session_id String,
  
  event_type LowCardinality(String) DEFAULT 'track',
  event_id Nullable(String),
  session_start_time Nullable(DateTime64(3, 'UTC')),
  timestamp DateTime64(3, 'UTC') DEFAULT time,
  
  -- Page context
  referrer Nullable(String),
  url String,
  path String,
  title Nullable(String),
  
  -- User agent & geo (enriched server-side)
  ip String,
  user_agent String,
  browser_name Nullable(String),
  browser_version Nullable(String),
  os_name Nullable(String),
  os_version Nullable(String),
  device_type Nullable(String),
  device_brand Nullable(String),
  device_model Nullable(String),
  country Nullable(String),
  region Nullable(String),
  city Nullable(String),
  
  -- Browser info
  screen_resolution Nullable(String),
  viewport_size Nullable(String),
  language Nullable(String),
  timezone Nullable(String),
  
  -- Network quality
  connection_type Nullable(String),
  rtt Nullable(Int16),
  downlink Nullable(Float32),
  
  -- Engagement metrics
  time_on_page Nullable(Float32),
  scroll_depth Nullable(Float32),
  interaction_count Nullable(Int16),
  page_count UInt8 DEFAULT 1,
  
  -- UTM parameters
  utm_source Nullable(String),
  utm_medium Nullable(String),
  utm_campaign Nullable(String),
  utm_term Nullable(String),
  utm_content Nullable(String),
  
  -- Performance metrics
  load_time Nullable(Int32),
  dom_ready_time Nullable(Int32),
  dom_interactive Nullable(Int32),
  ttfb Nullable(Int32),
  connection_time Nullable(Int32),
  request_time Nullable(Int32),
  render_time Nullable(Int32),
  redirect_time Nullable(Int32),
  domain_lookup_time Nullable(Int32),
  
  -- Custom properties (JSON string)
  properties String,
  
  created_at DateTime64(3, 'UTC')
) ENGINE = MergeTree()
PARTITION BY toYYYYMM(time)
ORDER BY (client_id, time, id)
SETTINGS index_granularity = 8192;
The events table is partitioned by month and ordered by client_id and time for optimal query performance.
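The properties column stores custom event properties as a raw JSON string, which can be unpacked at query time with ClickHouse's JSON functions. A sketch (the plan property name is purely illustrative, not part of the schema):

```sql
-- Count events by a hypothetical custom property stored in the JSON string
SELECT
  JSONExtractString(properties, 'plan') AS plan,
  count() AS events
FROM analytics.events
WHERE client_id = 'your-client-id'
  AND time >= now() - INTERVAL 1 DAY
GROUP BY plan
ORDER BY events DESC;
```

Filtering by client_id and time first keeps the scan inside a few partitions, as described above.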

Web Vitals Tables

Web Vitals use a spans-oriented design for efficient storage:
-- From packages/db/src/clickhouse/schema.ts:162-180
CREATE TABLE analytics.web_vitals_spans (
  client_id String CODEC(ZSTD(1)),
  anonymous_id String CODEC(ZSTD(1)),
  session_id String CODEC(ZSTD(1)),
  
  timestamp DateTime64(3, 'UTC') CODEC(Delta(8), ZSTD(1)),
  path String CODEC(ZSTD(1)),
  
  metric_name LowCardinality(String) CODEC(ZSTD(1)),
  metric_value Float64 CODEC(Gorilla, ZSTD(1)),
  
  INDEX idx_session_id session_id TYPE bloom_filter(0.01) GRANULARITY 1,
  INDEX idx_metric_value metric_value TYPE minmax GRANULARITY 1
) ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (client_id, metric_name, path, timestamp)
SETTINGS index_granularity = 8192;
Supported metrics:
  • LCP (Largest Contentful Paint)
  • FCP (First Contentful Paint)
  • CLS (Cumulative Layout Shift)
  • INP (Interaction to Next Paint)
  • TTFB (Time to First Byte)
  • FPS (Frames Per Second)
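A percentile for a single metric can be computed directly from the spans table; this sketch assumes metric_name stores the short names listed above (e.g. 'LCP'):

```sql
-- p75 LCP over the last 24 hours for one site
SELECT quantile(0.75)(metric_value) AS p75_lcp
FROM analytics.web_vitals_spans
WHERE client_id = 'your-client-id'
  AND metric_name = 'LCP'
  AND timestamp >= now() - INTERVAL 1 DAY;
```

The ORDER BY key (client_id, metric_name, path, timestamp) means this filter reads only the relevant granules.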

Hourly Aggregations

Web Vitals are pre-aggregated hourly for fast dashboard queries:
-- From packages/db/src/clickhouse/schema.ts:185-203
CREATE TABLE analytics.web_vitals_hourly (
  client_id String CODEC(ZSTD(1)),
  path String CODEC(ZSTD(1)),
  metric_name LowCardinality(String) CODEC(ZSTD(1)),
  hour DateTime CODEC(Delta(4), ZSTD(1)),
  
  sample_count UInt64 CODEC(ZSTD(1)),
  p75 Float64 CODEC(ZSTD(1)),
  p50 Float64 CODEC(ZSTD(1)),
  avg_value Float64 CODEC(ZSTD(1)),
  min_value Float64 CODEC(ZSTD(1)),
  max_value Float64 CODEC(ZSTD(1))
) ENGINE = SummingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (client_id, metric_name, path, hour)
TTL toDateTime(hour) + INTERVAL 1 YEAR;
A materialized view automatically populates this table:
CREATE MATERIALIZED VIEW analytics.web_vitals_hourly_mv
TO analytics.web_vitals_hourly
AS SELECT
  client_id,
  path,
  metric_name,
  toStartOfHour(timestamp) AS hour,
  count() AS sample_count,
  quantile(0.75)(metric_value) AS p75,
  quantile(0.50)(metric_value) AS p50,
  avg(metric_value) AS avg_value,
  min(metric_value) AS min_value,
  max(metric_value) AS max_value
FROM analytics.web_vitals_spans
GROUP BY client_id, path, metric_name, hour;
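Because SummingMergeTree combines rows for the same key only during background merges, counts should still be summed at query time to cover not-yet-merged parts. A minimal sketch of reading the hourly table:

```sql
-- Hourly sample counts for one metric; sum() handles unmerged parts
SELECT
  hour,
  sum(sample_count) AS samples
FROM analytics.web_vitals_hourly
WHERE client_id = 'your-client-id'
  AND metric_name = 'LCP'
  AND hour >= now() - INTERVAL 7 DAY
GROUP BY hour
ORDER BY hour;
```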

Error Tracking Tables

Errors are stored in a lightweight spans table:
-- From packages/db/src/clickhouse/schema.ts:84-107
CREATE TABLE analytics.error_spans (
  client_id String CODEC(ZSTD(1)),
  anonymous_id String CODEC(ZSTD(1)),
  session_id String CODEC(ZSTD(1)),
  
  timestamp DateTime64(3, 'UTC') CODEC(Delta(8), ZSTD(1)),
  path String CODEC(ZSTD(1)),
  
  message String CODEC(ZSTD(1)),
  filename Nullable(String) CODEC(ZSTD(1)),
  lineno Nullable(Int32) CODEC(ZSTD(1)),
  colno Nullable(Int32) CODEC(ZSTD(1)),
  stack Nullable(String) CODEC(ZSTD(1)),
  error_type LowCardinality(String) CODEC(ZSTD(1)),
  
  INDEX idx_session_id session_id TYPE bloom_filter(0.01) GRANULARITY 1,
  INDEX idx_error_type error_type TYPE bloom_filter(0.01) GRANULARITY 1,
  INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 1
) ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (client_id, error_type, path, timestamp);
Similar to Web Vitals, errors are aggregated hourly:
CREATE TABLE analytics.error_hourly (
  client_id String,
  path String,
  error_type LowCardinality(String),
  message_hash UInt64,  -- Hash of error message
  hour DateTime,
  
  error_count UInt64,
  affected_users AggregateFunction(uniq, String),
  affected_sessions AggregateFunction(uniq, String),
  sample_message String
) ENGINE = AggregatingMergeTree
PARTITION BY toYYYYMM(hour)
ORDER BY (client_id, error_type, path, hour, message_hash);
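The AggregateFunction(uniq, String) columns store intermediate aggregation states, so they must be read with uniqMerge() rather than selected directly. For example:

```sql
-- Top error types by volume with merged unique-user counts
SELECT
  error_type,
  sum(error_count) AS errors,
  uniqMerge(affected_users) AS users,
  uniqMerge(affected_sessions) AS sessions
FROM analytics.error_hourly
WHERE client_id = 'your-client-id'
  AND hour >= now() - INTERVAL 7 DAY
GROUP BY error_type
ORDER BY errors DESC;
```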

Custom Events Table

Custom events tracked via the SDK are stored separately:
-- From packages/db/src/clickhouse/schema.ts:355-380
CREATE TABLE analytics.custom_events (
  owner_id String CODEC(ZSTD(1)),          -- Org ID from API key
  website_id Nullable(String) CODEC(ZSTD(1)),
  
  timestamp DateTime64(3, 'UTC') CODEC(Delta(8), ZSTD(1)),
  
  event_name LowCardinality(String) CODEC(ZSTD(1)),
  namespace LowCardinality(Nullable(String)) CODEC(ZSTD(1)),
  path Nullable(String) CODEC(ZSTD(1)),
  properties String CODEC(ZSTD(1)),
  
  anonymous_id Nullable(String) CODEC(ZSTD(1)),
  session_id Nullable(String) CODEC(ZSTD(1)),
  
  source LowCardinality(Nullable(String)) CODEC(ZSTD(1)),
  
  INDEX idx_event_name event_name TYPE bloom_filter(0.01) GRANULARITY 1,
  INDEX idx_namespace namespace TYPE bloom_filter(0.01) GRANULARITY 1,
  INDEX idx_website_id website_id TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (owner_id, event_name, timestamp);
Custom events are scoped to the organization (owner_id) rather than individual websites, allowing cross-site analytics.
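Cross-site counts then come from a single query grouped by website_id (the owner_id value here is a placeholder):

```sql
-- Event volume per site across an organization, last 30 days
SELECT
  website_id,
  event_name,
  count() AS occurrences
FROM analytics.custom_events
WHERE owner_id = 'your-org-id'
  AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY website_id, event_name
ORDER BY occurrences DESC;
```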

Revenue Tracking

Revenue transactions from Stripe/Paddle are stored for attribution:
-- From packages/db/src/clickhouse/schema.ts:492-529
CREATE TABLE analytics.revenue (
  owner_id String,
  website_id Nullable(String),
  
  transaction_id String,
  provider LowCardinality(String),  -- stripe, paddle
  type LowCardinality(String),      -- sale, refund, subscription
  status LowCardinality(String),
  
  amount Decimal64(4),
  original_amount Decimal64(4),
  original_currency LowCardinality(String),
  currency LowCardinality(String),
  
  anonymous_id Nullable(String),
  session_id Nullable(String),
  customer_id Nullable(String),
  
  product_id Nullable(String),
  product_name Nullable(String),
  
  metadata String DEFAULT '{}',
  
  created DateTime('UTC'),
  synced_at DateTime('UTC'),
  
  INDEX idx_anonymous_id anonymous_id TYPE bloom_filter(0.01) GRANULARITY 1
) ENGINE = ReplacingMergeTree(synced_at)
PARTITION BY toYYYYMM(created)
ORDER BY (owner_id, transaction_id);
The ReplacingMergeTree engine handles updates (e.g., refunds) by collapsing rows that share the same ORDER BY key (owner_id, transaction_id), keeping the version with the latest synced_at. Deduplication happens during background merges, so duplicates can exist temporarily; queries that need exactly one row per transaction should deduplicate at read time.
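One way to read deduplicated revenue is to force merge semantics with FINAL; this sketch assumes a status value such as 'completed' (the exact status values are not documented here):

```sql
-- FINAL deduplicates by (owner_id, transaction_id) at read time;
-- it is slower on large ranges, so keep the partition filter tight
SELECT
  toYYYYMM(created) AS month,
  sum(amount) AS revenue
FROM analytics.revenue FINAL
WHERE owner_id = 'your-org-id'
  AND status = 'completed'
GROUP BY month
ORDER BY month;
```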

AI Call Spans

For AI observability, Databuddy tracks LLM API calls:
-- From packages/db/src/clickhouse/schema.ts:410-453
CREATE TABLE observability.ai_call_spans (
  owner_id String,
  
  timestamp DateTime64(3, 'UTC'),
  
  type LowCardinality(String),       -- generate, stream
  model String,
  provider LowCardinality(String),   -- openai, anthropic, etc.
  finish_reason LowCardinality(Nullable(String)),
  
  input_tokens UInt32,
  output_tokens UInt32,
  total_tokens UInt32,
  cached_input_tokens Nullable(UInt32),
  cache_creation_input_tokens Nullable(UInt32),
  reasoning_tokens Nullable(UInt32),
  web_search_count Nullable(UInt16),
  
  input_token_cost_usd Nullable(Float64),
  output_token_cost_usd Nullable(Float64),
  total_token_cost_usd Nullable(Float64),
  
  tool_call_count UInt16,
  tool_result_count UInt16,
  tool_call_names Array(String),
  
  duration_ms UInt32,
  trace_id Nullable(String),
  http_status Nullable(UInt16),
  
  error_name LowCardinality(Nullable(String)),
  error_message Nullable(String),
  error_stack Nullable(String)
) ENGINE = MergeTree
PARTITION BY toDate(timestamp)
ORDER BY (owner_id, provider, model, timestamp);
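A typical cost breakdown per provider and model might look like this (column names match the schema above; the 30-day window is arbitrary):

```sql
-- Token usage, spend, and latency per model for one organization
SELECT
  provider,
  model,
  sum(total_tokens) AS tokens,
  sum(total_token_cost_usd) AS cost_usd,
  avg(duration_ms) AS avg_latency_ms
FROM observability.ai_call_spans
WHERE owner_id = 'your-org-id'
  AND timestamp >= now() - INTERVAL 30 DAY
GROUP BY provider, model
ORDER BY cost_usd DESC;
```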

PostgreSQL Schema

PostgreSQL stores application metadata using Drizzle ORM. Key tables include:

Users & Organizations

// From packages/db/src/drizzle/schema.ts:338-356
export const user = pgTable("user", {
  id: text().primaryKey().notNull(),
  name: text().notNull(),
  email: text().notNull(),
  emailVerified: boolean("email_verified").notNull(),
  image: text(),
  firstName: text(),
  lastName: text(),
  status: userStatus().default("ACTIVE").notNull(),
  createdAt: timestamp("created_at").notNull(),
  updatedAt: timestamp("updated_at").notNull(),
  deletedAt: timestamp({ precision: 3, mode: "string" }),
  role: role().default("USER").notNull(),
  twoFactorEnabled: boolean("two_factor_enabled"),
});

export const organization = pgTable("organization", {
  id: text().primaryKey().notNull(),
  name: text().notNull(),
  slug: text(),
  logo: text(),
  createdAt: timestamp("created_at").notNull(),
  metadata: text(),
});

Websites

// From packages/db/src/drizzle/schema.ts:310-336
export const websites = pgTable("websites", {
  id: text().primaryKey().notNull(),
  domain: text().notNull(),
  name: text(),
  status: websiteStatus().default("ACTIVE").notNull(),
  isPublic: boolean().default(false).notNull(),
  createdAt: timestamp({ precision: 3 }).defaultNow().notNull(),
  updatedAt: timestamp({ precision: 3 }).defaultNow().notNull(),
  deletedAt: timestamp({ precision: 3 }),
  organizationId: text("organization_id").notNull(),
  integrations: jsonb(),
  settings: jsonb(),
});

API Keys

// From packages/db/src/drizzle/schema.ts:493-531
export const apikey = pgTable("apikey", {
  id: text().primaryKey().notNull(),
  name: text().notNull(),
  prefix: text().notNull(),
  start: text().notNull(),
  keyHash: text("key_hash").notNull(),
  userId: text("user_id"),
  organizationId: text("organization_id"),
  type: apiKeyType("type").notNull().default("user"),
  scopes: apiScope("scopes").array().notNull().default([]),
  enabled: boolean("enabled").notNull().default(true),
  revokedAt: timestamp("revoked_at"),
  rateLimitEnabled: boolean("rate_limit_enabled").notNull().default(true),
  rateLimitTimeWindow: integer("rate_limit_time_window"),
  rateLimitMax: integer("rate_limit_max"),
  expiresAt: timestamp("expires_at", { mode: "string" }),
  metadata: jsonb("metadata").default({}),
  createdAt: timestamp("created_at").notNull().defaultNow(),
  updatedAt: timestamp("updated_at").notNull().defaultNow(),
});

Feature Flags

// From packages/db/src/drizzle/schema.ts:649-675
export const flags = pgTable("flags", {
  id: text().primaryKey().notNull(),
  key: text().notNull(),
  name: text(),
  description: text(),
  type: flagType().default("boolean").notNull(),
  status: flagStatus().default("active").notNull(),
  defaultValue: jsonb("default_value").default(false).notNull(),
  payload: jsonb("payload"),
  rules: jsonb("rules").default([]),
  persistAcrossAuth: boolean("persist_across_auth").default(false).notNull(),
  rolloutPercentage: integer("rollout_percentage").default(0),
  rolloutBy: text("rollout_by"),
  websiteId: text("website_id"),
  organizationId: text("organization_id"),
  userId: text("user_id"),
  createdBy: text("created_by").notNull(),
  variants: jsonb("variants").default([]),
  dependencies: text("dependencies").array(),
  targetGroupIds: text("target_group_ids").array(),
  environment: text("environment"),
  createdAt: timestamp("created_at").defaultNow().notNull(),
  updatedAt: timestamp("updated_at").defaultNow().notNull(),
  deletedAt: timestamp("deleted_at"),
});

Data Flow

  1. Event Collection: The tracker SDK collects events in the browser and batches them.
  2. Ingestion: Events are sent to the basket API service for validation.
  3. Enrichment: Server-side enrichment adds geo data, parses user agents, and extracts device info.
  4. ClickHouse Insert: Enriched events are bulk-inserted into ClickHouse tables.
  5. Materialized Views: Materialized views automatically create hourly aggregations.
  6. Query Layer: The API service queries ClickHouse for analytics and PostgreSQL for metadata.

Compression & Storage

ClickHouse uses aggressive compression:
  • ZSTD(1): General-purpose compression for strings
  • Delta encoding: Efficient for timestamps and sequential numbers
  • Gorilla codec: Optimized for floating-point metrics
  • LowCardinality: Dictionary encoding for enum-like columns
Typical compression ratios:
  • Events: 15-20x compression
  • Web Vitals: 30-40x compression
  • Errors: 10-15x compression
1 billion events typically use 50-100 GB of disk space after compression.
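Actual ratios on a running deployment can be checked against ClickHouse's system.parts table:

```sql
-- Per-table compressed vs. uncompressed size for the analytics database
SELECT
  table,
  formatReadableSize(sum(data_compressed_bytes)) AS compressed,
  formatReadableSize(sum(data_uncompressed_bytes)) AS uncompressed,
  round(sum(data_uncompressed_bytes) / sum(data_compressed_bytes), 1) AS ratio
FROM system.parts
WHERE database = 'analytics' AND active
GROUP BY table
ORDER BY sum(data_compressed_bytes) DESC;
```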

Partitioning Strategy

All ClickHouse tables are partitioned by time:
  • Monthly partitions for events: PARTITION BY toYYYYMM(time)
  • Daily partitions for spans: PARTITION BY toDate(timestamp)
This allows:
  • Efficient TTL deletion of old data
  • Fast queries on recent time ranges
  • Parallel query execution across partitions
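Dropping a whole partition is a cheap metadata operation, which is what makes time-based deletion efficient. For example:

```sql
-- Monthly partition IDs follow toYYYYMM, e.g. 202401 for January 2024
ALTER TABLE analytics.events DROP PARTITION 202401;

-- Daily partitions (toDate) are addressed by date
ALTER TABLE analytics.error_spans DROP PARTITION '2024-01-15';
```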

Indexing

ClickHouse uses several index types:

Primary key. The sorting key determines data layout on disk:
ORDER BY (client_id, time, id)
Queries filtering by client_id and time scan only the relevant granules, so they are extremely fast.

Bloom filters. Probabilistic indexes for equality filters on strings:
INDEX idx_session_id session_id TYPE bloom_filter(0.01) GRANULARITY 1
Used for filtering by session_id, event_name, and similar columns.

MinMax. Stores min/max values per granule for range queries:
INDEX idx_metric_value metric_value TYPE minmax GRANULARITY 1
Useful for filtering numeric ranges (e.g., LCP > 2500).

Token bloom filter. For text search in error messages:
INDEX idx_message message TYPE tokenbf_v1(10240, 3, 0) GRANULARITY 1
Enables fast token-based searches (whole words rather than arbitrary substrings).
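Searches that can use the tokenbf_v1 index go through hasToken(), which matches whole tokens in the message; for example:

```sql
-- Count recent errors whose message contains the token 'TypeError'
SELECT count() AS matches
FROM analytics.error_spans
WHERE client_id = 'your-client-id'
  AND timestamp >= now() - INTERVAL 1 DAY
  AND hasToken(message, 'TypeError');
```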

Query Examples

Top pages by views and unique visitors over the last 7 days:
SELECT
  path,
  count(*) as pageviews,
  uniq(anonymous_id) as unique_visitors
FROM analytics.events
WHERE client_id = 'your-client-id'
  AND event_name = 'screen_view'
  AND time >= now() - INTERVAL 7 DAY
GROUP BY path
ORDER BY pageviews DESC
LIMIT 10;

Data Retention

Configure TTL (Time To Live) policies:
-- Automatic deletion after 1 year
ALTER TABLE analytics.events
MODIFY TTL time + INTERVAL 1 YEAR;

-- Daily aggregates retained for 2 years
ALTER TABLE analytics.daily_pageviews
MODIFY TTL date + INTERVAL 2 YEAR;

Best Practices

Use Aggregates

Query hourly/daily aggregates instead of raw spans for date range queries.

Filter Early

Always filter by client_id and time range first:
WHERE client_id = 'xxx' AND time >= '2024-01-01'

Limit Results

Always use LIMIT on large tables:
SELECT * FROM analytics.events LIMIT 1000

Materialize Views

Create materialized views for frequently-run aggregations to pre-compute results.

Learn More

Privacy First

Understand data anonymization and privacy features

Event Tracking

Learn how events flow through the system

Sessions & Users

Explore user and session identification