Run Your Own Benchmarks

Everyone says you should. BenchBox makes it easy. Use it as a CLI, a Python library, or an MCP server for AI assistants. No Docker. No compilers. Just pip install.

CLI, Library, or MCP

# CLI - Quick benchmarking
benchbox run --platform duckdb --benchmark tpch

# MCP - AI assistant integration
"Run TPC-H on DuckDB" # Claude executes via MCP

# Library - Deep Python integration
from benchbox import TPCH

v0.1.1 • Python 3.10+ • No Docker Required

BenchBox - Database Benchmarking Toolkit

Supported Platforms

Single-Node Analytics Engines

In-process and local columnar databases

DuckDB

In-process analytical database. Built-in, no setup required.

Built-in • OLAP

ClickHouse

Column-oriented OLAP database for real-time analytics.

Local/Cloud • OLAP

DataFusion

Apache Arrow-native query engine for in-memory analytics.

In-process • Arrow

Row-Based Databases

Traditional relational databases

PostgreSQL

The world's most advanced open source relational database.

OLTP/OLAP • Relational

SQLite

Embedded transactional database for testing and CI/CD.

Built-in • OLTP

Cloud Data Platforms

Enterprise cloud data warehouses and lakehouses

Snowflake

Multi-cloud data warehouse with elastic compute.

Cloud • Warehouse

Databricks

Data Intelligence Platform with lakehouse architecture.

Cloud • Lakehouse

BigQuery

Google Cloud serverless data warehouse.

GCP • Serverless

ClickHouse Cloud

Managed ClickHouse with serverless and dedicated options.

Cloud • OLAP

Redshift

AWS cloud data warehouse (serverless or provisioned).

AWS • Warehouse

Amazon Athena

AWS serverless query service for S3 data lakes.

AWS • Serverless

Azure Synapse

Microsoft Azure analytics service.

Azure • Analytics

Microsoft Fabric

Unified analytics platform with OneLake.

Azure • Unified

Firebolt

Cloud data warehouse optimized for analytics.

Cloud • OLAP

MotherDuck

Serverless cloud DuckDB with hybrid local/cloud execution.

Cloud • DuckDB

Starburst

Enterprise Trino with managed cloud offering.

Cloud • Federation

Open Source Platforms

Distributed query engines for federated data

Trino

Distributed SQL query engine for federated data.

Distributed • Federation

PrestoDB

Distributed SQL query engine originally created at Facebook (now Meta).

Distributed • Federation

Apache Spark SQL

Distributed SQL engine for large-scale data processing.

Distributed • Spark

Managed Spark Services

Cloud-managed Spark for lakehouse and data lake analytics

Onehouse Quanton

Serverless Spark with Hudi, Iceberg, and Delta Lake support.

Serverless • Multi-format

AWS Glue

Serverless data integration service with Spark ETL.

AWS • ETL

Amazon EMR Serverless

Serverless Spark with automatic scaling.

AWS • Serverless

Athena for Apache Spark

Interactive Spark notebooks on AWS Athena.

AWS • Interactive

GCP Dataproc

Managed Spark and Hadoop clusters on Google Cloud.

GCP • Clusters

GCP Dataproc Serverless

Serverless Spark with no cluster management.

GCP • Serverless

Microsoft Fabric Spark

SaaS Spark with OneLake integration.

Azure • SaaS

Azure Synapse Spark

Enterprise Spark pools with ADLS Gen2.

Azure • Enterprise

Snowpark Connect

PySpark API compatibility on Snowflake.

Snowflake • PySpark API

Time Series Databases

Optimized for time-stamped data

TimescaleDB

Time-series database built on PostgreSQL.

Time-series • PostgreSQL

InfluxDB

Time-series database for metrics and events.

Time-series • IoT

DataFrame Platforms

Native DataFrame APIs instead of SQL

Polars

Fast Rust-based DataFrame library with lazy evaluation.

polars-df • Expression API

Pandas

Reference Python DataFrame implementation.

pandas-df • Pandas API

PySpark DataFrame

Apache Spark DataFrame API for distributed computing.

pyspark-df • Distributed

DataFusion DataFrame

Arrow-native DataFrame with lazy evaluation.

datafusion-df • Arrow

Modin

Distributed Pandas replacement (Ray/Dask backend).

modin-df • Pandas API

Dask

Parallel computing library with DataFrame API.

dask-df • Distributed

cuDF (RAPIDS)

GPU-accelerated DataFrame library from NVIDIA.

cudf-df • GPU

AI Assistant Integration

BenchBox includes an MCP (Model Context Protocol) server for Claude Code and other AI assistants. Run benchmarks with natural language instead of memorizing CLI flags.
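For context, MCP messages are JSON-RPC 2.0. When the assistant acts on a prompt such as "Run TPC-H on DuckDB at scale 0.1", the client sends a `tools/call` request to the BenchBox server, typically over stdio with one JSON message per line. The sketch below only builds such a request; the tool name `run_benchmark` and its argument keys are hypothetical placeholders, not BenchBox's documented tool schema.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a single-line JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # json.dumps emits no newlines by default, which suits line-delimited stdio.
    return json.dumps(message)


# Hypothetical tool name and arguments; a real MCP server advertises its
# actual tools (names and input schemas) in its `tools/list` response.
line = build_tool_call(
    1,
    "run_benchmark",
    {"platform": "duckdb", "benchmark": "tpch", "scale": 0.1},
)
print(line)
```

The assistant's job is the translation from natural language to a structured call like this; the server then runs the benchmark and returns results the assistant can summarize.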

Quick Setup

Add to Claude Code

claude mcp add benchbox -- uv run python -m benchbox.mcp

What You Can Do

Natural language commands for benchmarking workflows

Discover

"What benchmarks are available?"

"Which platforms support TPC-DS?"

Explore 18 benchmarks and 38 platforms without reading documentation.

Execute

"Run TPC-H on DuckDB at scale 0.1"

"Compare Polars and Pandas on SSB"

Run benchmarks without memorizing CLI syntax or options.

Analyze

"Which queries were slowest?"

"Compare results from my last two runs"

Get AI-powered analysis of performance patterns and regressions.
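Under the hood, comparing two runs comes down to matching per-query timings and flagging slowdowns. A minimal sketch, assuming each run is a plain dict mapping query ID to seconds (a hypothetical shape, not BenchBox's result format) and treating anything more than 10% slower as a regression:

```python
def compare_runs(
    baseline: dict[str, float],
    current: dict[str, float],
    threshold: float = 0.10,
) -> list[tuple[str, float, float, float]]:
    """Return (query, old_s, new_s, ratio) for queries slower by more than `threshold`.

    Only queries present in both runs are compared. Results are sorted
    worst-first so the largest regressions appear at the top.
    """
    regressions = []
    for query in baseline.keys() & current.keys():
        old, new = baseline[query], current[query]
        if old <= 0:
            continue  # skip unusable baseline timings to avoid dividing by zero
        ratio = new / old
        if ratio > 1 + threshold:
            regressions.append((query, old, new, ratio))
    return sorted(regressions, key=lambda row: row[3], reverse=True)


# Illustrative timings only, not real measurements.
last_week = {"Q1": 0.42, "Q3": 0.18, "Q6": 0.05}
today = {"Q1": 0.44, "Q3": 0.27, "Q6": 0.09}
for query, old, new, ratio in compare_runs(last_week, today):
    print(f"{query}: {old:.2f}s -> {new:.2f}s ({ratio:.2f}x)")
```

A fixed relative threshold is the simplest choice; for noisy environments, comparing medians over several runs per query is usually more reliable than single timings.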


Get Started

1. Install

uv add benchbox      # or: pip install benchbox

2. Run as CLI

# Quick TPC-H benchmark on DuckDB
benchbox run --platform duckdb --benchmark tpch --scale 0.1

# Preview on cloud before spending credits
benchbox run --platform databricks --benchmark tpch --dry-run ./preview
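For TPC-H, the `--scale` value is the TPC-H scale factor: scale 1 is roughly 1 GB of raw data, most tables grow linearly with it, and NATION and REGION stay fixed. The sketch below uses the base cardinalities from the TPC-H specification to estimate what `--scale 0.1` produces; it is an independent estimate, not output from BenchBox, and LINEITEM is approximate because each order gets a random number of line items.

```python
# Base row counts at scale factor 1, per the TPC-H specification.
# LINEITEM is approximate: each order has 1-7 line items (about 4 on average).
TPCH_BASE_ROWS = {
    "supplier": 10_000,
    "part": 200_000,
    "partsupp": 800_000,
    "customer": 150_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,
}

# These two tables have a fixed size regardless of scale factor.
TPCH_FIXED_ROWS = {"nation": 25, "region": 5}


def estimate_rows(scale_factor: float) -> dict[str, int]:
    """Estimate TPC-H table cardinalities for a given scale factor."""
    estimate = {
        table: round(rows * scale_factor) for table, rows in TPCH_BASE_ROWS.items()
    }
    estimate.update(TPCH_FIXED_ROWS)
    return estimate


for table, rows in estimate_rows(0.1).items():
    print(f"{table:>9}: {rows:,}")
```

Because size scales linearly, a quick local run at 0.1 and a cloud run at 100 exercise the same schema and queries, just with a thousandfold difference in data volume.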

3. Or Use as Library

from benchbox import TPCH

# Initialize and generate benchmark data
tpch = TPCH(scale_factor=0.1)
data_files = tpch.generate_data()

# Get schema and queries
create_sql = tpch.get_create_tables_sql()
query = tpch.get_query(1)  # Q1: Pricing Summary Report
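The library hands you the raw ingredients (data files, DDL, and query text); executing and timing them against your database is the part you control. A minimal timing sketch, assuming Python's built-in `sqlite3`, a tiny hand-made table standing in for TPC-H LINEITEM, and a simplified Q1-style aggregate rather than the full query text from `get_query(1)`. It discards warm-up runs and reports the median, a common way to damp caching and scheduling noise; the helper `time_query` is hypothetical, not part of BenchBox.

```python
import sqlite3
import statistics
import time


def time_query(conn: sqlite3.Connection, sql: str, warmup: int = 1, runs: int = 5) -> float:
    """Run `sql` repeatedly and return the median wall-clock time in seconds.

    Warm-up executions are discarded so cold caches do not skew the result.
    """
    for _ in range(warmup):
        conn.execute(sql).fetchall()
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql).fetchall()  # fetchall() forces the full result to be produced
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


# Toy stand-in for LINEITEM; a real run would load the generated data files.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE lineitem ("
    "l_returnflag TEXT, l_linestatus TEXT, l_quantity REAL, l_extendedprice REAL)"
)
conn.executemany(
    "INSERT INTO lineitem VALUES (?, ?, ?, ?)",
    [("AR"[i % 2], "FO"[i % 2], float(i % 50 + 1), (i % 1000) * 1.5) for i in range(10_000)],
)

# Simplified Q1-style pricing summary, grouped by return flag and line status.
q1_like = """
    SELECT l_returnflag, l_linestatus, SUM(l_quantity), AVG(l_extendedprice), COUNT(*)
    FROM lineitem
    GROUP BY l_returnflag, l_linestatus
    ORDER BY l_returnflag, l_linestatus
"""
print(f"median: {time_query(conn, q1_like) * 1000:.2f} ms")
```

Swapping `sqlite3.connect` for another DB-API driver keeps the same loop, which is the idea behind running one benchmark definition against many platforms.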

Requirements

Python 3.10 or higher
No external dependencies for data generation
Optional: database drivers for your target platform (e.g., duckdb; sqlite3 ships with Python)


What Makes It Simple

Python-Only Core

18 Benchmarks

SQL & DataFrame

One Benchmark, Run Anywhere

Generate Data at Any Scale

AI-Powered with MCP

Benchmarks by Category

TPC Standards

TPC-H

TPC-DS

TPC-DI

Academic Benchmarks

Star Schema Benchmark

AMPLab

Join Order Benchmark

Industry Benchmarks

ClickBench

H2ODB

NYC Taxi

TSBS DevOps

CoffeeShop

BenchBox Primitives

Read Primitives

Write Primitives

Transaction Primitives

BenchBox Experimental

TPC-Havoc

TPC-H Skew

TPC-DS-OBT

Data Vault