Benchmarking ClickHouse at Scale: Reproducing Real-World OLAP Workloads
Practical, reproducible ClickHouse benchmarks for TPC-H, streaming ingestion and high-concurrency queries — scripts and observability tips for 2026.
Why your ClickHouse evaluation must include reproducible, real-world OLAP benchmarks
Teams evaluating ClickHouse in 2026 face a common, urgent problem: vendors publish headline throughput numbers, but your production workload — a mix of TPC-H-style analytical joins, sustained event ingestion, and hundreds of concurrent interactive dashboards — will behave differently. After ClickHouse's transformative funding and rapid product expansion, the stakes are higher: you need a rigorous, reproducible benchmarking playbook to make buying and architecture decisions you can trust.
Executive summary — what you'll get
- Blueprints to reproduce three representative OLAP workloads: TPC-H analytical queries, high-throughput event ingestion, and high-concurrency query workloads.
- Ready-to-run scripts (Docker and shell) you can clone and execute in your environment.
- Practical observability guidance: the key metrics and Grafana queries to trust results.
- Reference points informed by 2025–2026 trends: ClickHouse Keeper adoption, cloud-managed offerings, and tiered object-storage optimizations.
Context in 2026 — why this matters now
ClickHouse's late 2025 funding round accelerated feature rollouts and cloud integrations. Teams migrating from Snowflake or building analytics platforms now run ClickHouse in mixed modes: on-premise clusters for low-latency slices and cloud-native clusters for scale and cost. That combination creates evaluation complexity — you must measure not just raw throughput but how the system behaves with object-storage tiers, Kafka-style ingestion, and multi-tenant concurrency.
Quick fact: ClickHouse raised $400M in late 2025, increasing resources and roadmap velocity — more features means more configuration choices for benchmarking.
High-level approach: make benchmarks reproducible
Reproducibility is the difference between a one-off report and a procurement-grade evaluation. Follow these principles:
- Fix versions and seeds: record ClickHouse version, OS, container images, and random seeds for data generators.
- Infrastructure as code: publish Docker Compose or Kubernetes manifests so reviewers can recreate topology.
- Measure multiple dimensions: ingestion throughput, query latency (P50/P95/P99), resource saturation (CPU, memory, disk IO), and cluster-level metrics (replication lag, mutations).
- Automate collection: export metrics to Prometheus and archive results (JSON + raw logs).
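To make each run auditable, archive a small metadata record next to the raw results. A minimal sketch, assuming a flat JSON file per run (the file path and field names are illustrative, not a fixed schema):

```python
# Sketch: archive run metadata next to raw results so every run is reproducible.
import json
import platform
from datetime import datetime, timezone

def write_run_metadata(path, clickhouse_version, data_seed, notes=""):
    """Write a JSON metadata record for one benchmark run."""
    meta = {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "clickhouse_version": clickhouse_version,
        "os": platform.platform(),
        "data_seed": data_seed,
        "notes": notes,
    }
    with open(path, "w") as f:
        json.dump(meta, f, indent=2)
    return meta

meta = write_run_metadata("run-001.json", "24.8.1", data_seed=42)
print(meta["data_seed"])  # 42
```

Commit these records alongside the raw logs; a result without its metadata record is not reproducible.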
Test 1 — Reproducible TPC-H benchmark (analytics and joins)
Why TPC-H?
TPC-H-style workloads stress complex joins and aggregations at scale and are widely used for comparative analytics testing. They give a predictable query set and data skew typical in business intelligence workloads.
What you need
- dbgen (TPC-H generator) to create raw .tbl files
- ClickHouse cluster or single-node instance (for initial evaluation)
- Scripts to load data and run the standard 22 query set
Reproducible steps (simplified)
Use the following shell script to generate data with dbgen, import to ClickHouse, and run the standard query set with clickhouse-benchmark. Save as run-tpch.sh.
#!/usr/bin/env bash
# run-tpch.sh
set -e
# Requirements: docker, clickhouse-client available locally
SCALE=1 # change to 10 or 100 for larger runs
OUTDIR=./tpch_data
mkdir -p "$OUTDIR"
# 1) generate data with a tpch dbgen docker image; double quotes so $SCALE expands,
#    and move the .tbl files into the mounted volume so they survive the container
docker run --rm -v "$(pwd)/$OUTDIR:/tpch-data" ghcr.io/tpch/dbgen:latest /bin/sh -c "cd /tpch && ./dbgen -s $SCALE && mv ./*.tbl /tpch-data/"
# 2) create schema in ClickHouse
clickhouse-client --query 'CREATE DATABASE IF NOT EXISTS tpch'
# minimal example: create lineitem and orders (full schema in repo)
clickhouse-client --query '
CREATE TABLE IF NOT EXISTS tpch.lineitem (
l_orderkey Int32,
l_partkey Int32,
l_suppkey Int32,
l_linenumber Int32,
l_quantity Float32,
l_extendedprice Float32,
l_discount Float32,
l_tax Float32,
l_returnflag String,
l_linestatus String,
l_shipdate Date,
l_commitdate Date,
l_receiptdate Date,
l_shipinstruct String,
l_shipmode String,
l_comment String
) ENGINE = MergeTree() ORDER BY l_orderkey;'
# 3) import data (.tbl files are '|'-delimited with no header row, so plain CSV, not CSVWithNames;
#    dbgen also emits a trailing '|' on every row, stripped here with sed)
sed 's/|$//' "$OUTDIR/lineitem.tbl" | clickhouse-client --query 'INSERT INTO tpch.lineitem FORMAT CSV' --format_csv_delimiter='|'
# 4) run the TPC-H query set via clickhouse-benchmark (reads one query per line from stdin;
#    --iterations is the total query count, so 22 queries x 3 passes = 66)
clickhouse-benchmark --concurrency=4 --iterations=66 < tpch-queries.sql > tpch-results.txt
echo 'Finished. Results in tpch-results.txt'
Notes:
- Use the canonical TPC-H schema; the sample above illustrates the approach.
- Record the ClickHouse server version: clickhouse-client --version.
- Increase SCALE for larger datasets; use 10/100 for cluster testing.
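One detail worth automating before import: dbgen writes a trailing '|' at the end of every row, which a '|'-delimited CSV import reads as an extra empty column. A minimal Python preprocessing sketch (function names are illustrative, equivalent to the sed step in the script):

```python
# Sketch: strip dbgen's trailing '|' delimiter so CSV import with
# delimiter '|' does not see a spurious empty final column.
def clean_tbl_line(line: str) -> str:
    """Remove the trailing '|' delimiter from a dbgen .tbl row."""
    line = line.rstrip("\n")
    return line[:-1] if line.endswith("|") else line

def clean_tbl_file(src_path: str, dst_path: str) -> int:
    """Rewrite a .tbl file without trailing delimiters; return row count."""
    n = 0
    with open(src_path) as src, open(dst_path, "w") as dst:
        for line in src:
            dst.write(clean_tbl_line(line) + "\n")
            n += 1
    return n

print(clean_tbl_line("1|155190|7706|1|17|21168.23|\n"))  # 1|155190|7706|1|17|21168.23
```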
Test 2 — Event ingestion throughput (Kafka → ClickHouse)
Why this test?
Many analytics pipelines are based on streaming event ingestion. You must measure sustained rows/sec, peak bursts, and backpressure behavior. In 2026, common patterns use Kafka-compatible brokers (Kafka, Redpanda) and ClickHouse's Kafka table engine or native integrations.
Topology
- Producer simulator (generates JSON/Avro events)
- Kafka-compatible broker (we recommend Redpanda for fast single-node testing)
- ClickHouse table with ENGINE = Kafka plus a materialized view that pushes rows into a MergeTree table
Reproducible docker-compose snippet
# docker-compose.yml excerpt
version: '3.8'
services:
  redpanda:
    image: vectorized/redpanda:latest
    command: redpanda start --overprovisioned --smp 1 --memory 1G --reserve-memory 0M
    ports:
      - '9092:9092'
  clickhouse:
    image: clickhouse/clickhouse-server:latest
    ulimits:
      nofile:
        soft: 262144
        hard: 262144
    ports:
      - '9000:9000'
      - '8123:8123'
Schema and ingestion
# create Kafka table and materialized view
# (JSONAsString delivers each message as one String column, which the view then parses;
#  the producer sends ts as epoch seconds, so it is extracted as an integer)
clickhouse-client --multiquery --query "
CREATE TABLE IF NOT EXISTS events_kafka (
    value String
) ENGINE = Kafka SETTINGS kafka_broker_list = 'redpanda:9092', kafka_topic_list = 'events', kafka_group_name = 'ch-test', kafka_format = 'JSONAsString';
CREATE TABLE IF NOT EXISTS events (
    ts DateTime,
    user_id UInt64,
    event_type String,
    payload String
) ENGINE = MergeTree() ORDER BY (user_id, ts);
CREATE MATERIALIZED VIEW IF NOT EXISTS events_mv TO events AS
SELECT toDateTime(JSONExtractUInt(value, 'ts')) AS ts,
       JSONExtractUInt(value, 'user_id') AS user_id,
       JSONExtractString(value, 'event_type') AS event_type,
       JSONExtractString(value, 'payload') AS payload
FROM events_kafka;
"
Producer script (Python)
# produce-events.py
from time import time
import json

from confluent_kafka import Producer

p = Producer({'bootstrap.servers': 'localhost:9092'})
TOPIC = 'events'

def delivery(err, msg):
    if err:
        print('Delivery failed:', err)

n = 100000
start = time()
for i in range(n):
    evt = {
        'ts': int(time()),
        'user_id': i % 100000,
        'event_type': 'click',
        'payload': 'sample',
    }
    p.produce(TOPIC, json.dumps(evt).encode('utf-8'), callback=delivery)
    if i % 1000 == 0:
        p.poll(0)  # serve delivery callbacks without blocking
p.flush()  # wait for all outstanding messages before stopping the timer
end = time()
print('Produced', n, 'events in', end - start, 's')
How to measure ingest throughput
- Monitor ClickHouse's Kafka ProfileEvents in system.events (e.g., KafkaMessagesRead and KafkaRowsRead) alongside system.metrics to derive rows written per second.
- Use Prometheus and a Grafana panel plotting rows/sec and write latency.
- Vary producer rate to find sustainable throughput and point of backpressure (increased query latency, queue build-up).
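The simplest server-agnostic way to derive rows/sec is to periodically sample the destination table's cumulative row count (`SELECT count() FROM events`) and difference the samples. The sampling loop against a live server is assumed; this sketch shows only the throughput math:

```python
# Sketch: compute per-interval rows/sec from periodic
# (unix_timestamp, cumulative_row_count) samples.
def throughput_series(samples):
    """samples: list of (unix_ts, cumulative_row_count) pairs.
    Returns rows/sec for each interval between consecutive samples."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt > 0:
            rates.append((c1 - c0) / dt)
    return rates

samples = [(0, 0), (10, 250000), (20, 510000), (30, 740000)]
print(throughput_series(samples))  # [25000.0, 26000.0, 23000.0]
```

A flattening or declining series under a constant producer rate is your first backpressure signal.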
Test 3 — High-concurrency query testing (interactive dashboards)
Why concurrency matters
Interactive analytics users expect consistent latencies under concurrency. High concurrency exposes CPU scheduling, network contention, and limits in query parallelism and thread pools.
Concurrency test strategy
- Prepare a query set that reflects dashboard queries (mix of aggregations, group-by, and low-selectivity scans).
- Use a client-side harness that issues concurrent queries and records per-query latencies (P50/P95/P99) and error rates.
- Test with increasing concurrency ramps (10, 50, 100, 500) and identify knee points.
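The ramp strategy above needs a working definition of "knee point". One illustrative heuristic: flag the first ramp step where P95 latency grows faster than the added concurrency by some factor (the 1.5x threshold here is an assumption, not a standard — tune it to your latency budget):

```python
# Sketch: spot the concurrency "knee" -- the first ramp step where P95
# latency growth outpaces concurrency growth by `threshold`.
def find_knee(levels, p95_latencies, threshold=1.5):
    """levels: concurrency steps; p95_latencies: measured P95 per step (seconds).
    Returns the first concurrency level past the knee, or None."""
    for i in range(1, len(levels)):
        conc_growth = levels[i] / levels[i - 1]
        lat_growth = p95_latencies[i] / p95_latencies[i - 1]
        if lat_growth > threshold * conc_growth:
            return levels[i]
    return None

print(find_knee([10, 50, 100, 500], [0.12, 0.60, 0.66, 40.0]))  # 500
```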
Client harness (Python snippet using clickhouse-driver)
# concur-test.py
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

from clickhouse_driver import Client

QUERIES = [
    'SELECT count() FROM tpch.lineitem WHERE l_shipdate >= today() - 365',
    'SELECT l_partkey, sum(l_extendedprice) AS revenue FROM tpch.lineitem GROUP BY l_partkey ORDER BY revenue DESC LIMIT 10',
]

def run_query(q):
    # clickhouse_driver clients are not thread-safe; create one per task
    client = Client(host='localhost')
    t0 = time.time()
    client.execute(q)
    client.disconnect()
    return time.time() - t0

concurrency = 100
runtimes = []
with ThreadPoolExecutor(max_workers=concurrency) as ex:
    futures = [ex.submit(run_query, QUERIES[i % len(QUERIES)]) for i in range(1000)]
    for f in as_completed(futures):
        try:
            runtimes.append(f.result())
        except Exception as e:
            print('Query error', e)

runtimes.sort()
print('P50', runtimes[int(0.50 * len(runtimes))])
print('P95', runtimes[int(0.95 * len(runtimes))])
print('P99', runtimes[int(0.99 * len(runtimes))])
Tuning knobs to consider
- max_threads: global thread limit per server.
- max_concurrent_queries and max_memory_usage per query.
- Resource pooling: use query pools and settings per user to prioritize dashboard queries.
- Consider server-side query result caching for dashboards with repeated reads.
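Several of these knobs can also be applied per query via ClickHouse's SQL SETTINGS clause rather than server-wide config, which keeps benchmark runs self-describing. A sketch that renders a settings dict into a query suffix (the specific values are illustrative starting points, not recommendations):

```python
# Sketch: attach per-query limits via ClickHouse's SQL SETTINGS clause.
def with_settings(query: str, settings: dict) -> str:
    """Append a SETTINGS clause to a query (values rendered as-is)."""
    rendered = ", ".join(f"{k} = {v}" for k, v in settings.items())
    return f"{query.rstrip(';')} SETTINGS {rendered}"

# Illustrative limits for dashboard-style queries
dashboard_settings = {"max_threads": 4, "max_memory_usage": 8000000000}
q = with_settings("SELECT count() FROM tpch.lineitem", dashboard_settings)
print(q)
# SELECT count() FROM tpch.lineitem SETTINGS max_threads = 4, max_memory_usage = 8000000000
```

Embedding the settings in the query text means they are captured in system.query_log with each run.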
Observability and metrics — what to record
A benchmark is only meaningful with consistent metrics. Instrument the following:
- Query latency: P50/P95/P99 per query template.
- Throughput: rows/sec and MB/sec for ingestion and query scans.
- Resource utilization: CPU%, memory RSS, disk IO, network bytes/sec.
- ClickHouse internals: system.metrics, system.events (ProfileEvents), and system.asynchronous_metrics.
- Errors: failed queries, mutation failures, replication lag.
Use Prometheus to scrape ClickHouse's built-in Prometheus endpoint (enable the prometheus section in the server config) and create a Grafana dashboard with panels for these metrics. Archive raw CSV/JSON results for future comparison.
Interpreting results — practical advice
- Always compare trends, not single numbers. A 2x throughput improvement at low concurrency may collapse at 100 concurrent connections.
- Look for resource saturation signals: if CPU idle is low but latency spikes, the bottleneck might be I/O or network.
- When investigating performance regressions across versions, reproduce with the same dataset, same query plans (EXPLAIN), and same server settings.
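To compare trends across runs mechanically, a small sketch that flags percentiles where a candidate run is worse than the baseline beyond a tolerance (the 10% tolerance is an assumed noise floor — tune it to your environment):

```python
# Sketch: flag latency-percentile regressions between two benchmark runs.
def regressions(baseline: dict, candidate: dict, tolerance=0.10):
    """baseline/candidate: {'P50': seconds, 'P95': ..., 'P99': ...}.
    Returns the percentile names where candidate is worse by > tolerance."""
    return [pct for pct, base in baseline.items()
            if candidate[pct] > base * (1 + tolerance)]

base = {"P50": 0.08, "P95": 0.40, "P99": 0.90}
cand = {"P50": 0.08, "P95": 0.55, "P99": 0.95}
print(regressions(base, cand))  # ['P95']
```

Run this per query template rather than on aggregate numbers, so a regression in one dashboard query is not averaged away.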
Common pitfalls and how to avoid them
- Not fixing seeds for synthetic data: different data distribution changes join behavior.
- Ignoring cold-cache effects: measure both cold and warm cache runs.
- Failing to isolate background tasks: system merges and TTL deletions can distort results; schedule these outside measurement windows.
- Using non-representative query sets: use real dashboard queries where possible.
2026 trends to include in your evaluations
- Object-storage tiering: many deployments now use S3-backed storage for cold data. Measure query latencies against tiered data and quantify egress/latency costs.
- Cloud-managed ClickHouse: managed services add multi-tenant isolation and autoscaling — include a test for cluster autoscaling behavior if evaluating managed offerings.
- ClickHouse Keeper: if you run clusters, test with Keeper (the ZooKeeper alternative) for stability and replication consistency.
Deliverables — what to publish for stakeholders
- Repository with Docker Compose / k8s manifests, data-generation scripts, query sets, and client harnesses.
- Benchmark runbook: exact steps to reproduce each test, environment variables, and versions used.
- Raw output and a summary PDF with charts: ingestion throughput curves, latency percentiles, and resource saturation graphs.
Actionable checklist before you start
- Lock ClickHouse and client versions (e.g., clickhouse-server: 24.x or 25.x).
- Decide target scale (SCALE factor for TPC-H, rows/sec for ingestion).
- Provision monitoring (Prometheus + Grafana) and logging collection.
- Prepare the real query set from production dashboards where possible.
- Run warm-up passes, then cold-cache and warm-cache runs, and repeat three times for statistical confidence.
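For the "repeat three times" step, one way to decide whether the repeats agree well enough to report is the coefficient of variation (stddev/mean) of the metric across runs; the 5% cutoff below is an illustrative stability bar, not a standard:

```python
# Sketch: judge run-to-run stability via the coefficient of variation.
from statistics import mean, stdev

def is_stable(values, max_cv=0.05):
    """values: the same metric (e.g. P95 latency) from repeated runs.
    Returns True if stddev/mean is within max_cv."""
    m = mean(values)
    if m == 0:
        return True
    return stdev(values) / m <= max_cv

print(is_stable([0.41, 0.42, 0.40]))  # True
print(is_stable([0.41, 0.80, 0.40]))  # False
```

If a metric fails the bar, investigate background merges or cache effects before adding more repeats.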
Final thoughts — benchmark defensibly, decide confidently
ClickHouse's expanding ecosystem in 2025–2026 makes it a compelling OLAP choice, but rapid feature growth increases configuration complexity. The only defensible way to evaluate ClickHouse is with reproducible, automated benchmarks that reflect your real workloads: TPC-H joins, streaming ingestion, and interactive concurrency. Publish the scripts and results so peers and auditors can validate claims.
Call to action
Clone our reference repo, run the three benchmark pipelines, and open an issue with your findings. If you need a tailored benchmark for your data model, share a sample schema and query set — we’ll help convert it into a reproducible test. Start benchmarking now and make your ClickHouse decision data-driven.