The AI-Native
Database.

SQL + vector search + local embeddings in one binary. No Pinecone. No OpenAI API. No data pipeline. Your existing psycopg2 code works unchanged.

$ curl -fsSL galaxdb.com/get | bash

// Declare an embedding column. GalaxDB generates vectors automatically on every INSERT.

CREATE TABLE docs (
  id INT PRIMARY KEY,
  body TEXT
    EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2'
    DIM 384
);

-- Embeddings computed automatically. No pipeline needed.
INSERT INTO docs (id, body) VALUES
  (1, 'machine learning and neural networks');
AuroraSQL
0.990 recall@10 (HNSW on SIFT-1M, ef=200)
258K write TPS (16 threads, 1M rows, NVMe)
4.49 GB/s scan throughput (PAX blocks + zone-map pruning)
740 tests passing (7 chaos scenarios in 10.9s)

// THE PROBLEM

Five systems for one product.
GalaxDB is the one.

Modern AI apps glue together a transactional DB, a vector store, a cache, blob storage, and an orchestration tool. Each one drifts. Each one breaks. GalaxDB collapses the stack into a single binary.

Before: the AI stack of 2024 (5 services)
PostgreSQL: transactional rows
Pinecone: vector index
Redis: hot cache + queue
S3 + DVC: blobs + versioning
Airflow: embedding pipeline
5 dashboards · 5 SLAs · 5 bills · data drift inevitable
After: GalaxDB (1 binary)
GalaxDB
v1.0 · 60 MB
OLTP rows
Vector index
Time-travel
Embeddings
Blobs
Feedback
1 dashboard · 1 SLA · 1 bill · strictly serializable

// CAPABILITIES

What you get out of the box

v1

Unified Data Atom

One row stores structured fields, JSON, full-text, dense embeddings, raw binaries, and lineage. No more fanning data across five systems.

CREATE TABLE documents (
  id INT PRIMARY KEY,
  body TEXT,
  meta JSONB,
  vec EMBEDDING MODEL 'mini-LM',
  raw BLOB
);
v1

Auto-Embedding Pipeline

EMBEDDING MODEL in DDL spawns a sidecar that handles inference, queueing, and back-pressure. You write SQL, the database does the ML plumbing.

INSERT INTO products (name, description)
VALUES ('Tent', 'Lightweight 2-person');

-- ✓ embedding generated
-- ✓ index updated
-- ✓ no Airflow, no Lambda
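
As a rough picture of the queueing and back-pressure described above, the sketch below puts a bounded queue between a writer and an embedding worker. It is not GalaxDB's sidecar: `fake_embed`, `run_pipeline`, the queue size, and the row format are all hypothetical stand-ins.

```python
import queue
import threading


def fake_embed(text: str) -> list[float]:
    # Hypothetical stand-in for model inference: NOT a real embedding.
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def run_pipeline(rows, max_pending=2):
    """Push rows through a bounded queue; a full queue blocks the producer."""
    pending = queue.Queue(maxsize=max_pending)  # the bound provides back-pressure
    results = {}

    def worker():
        while True:
            item = pending.get()
            if item is None:  # sentinel: no more rows
                break
            row_id, body = item
            results[row_id] = fake_embed(body)

    t = threading.Thread(target=worker)
    t.start()
    for row in rows:
        pending.put(row)  # blocks while max_pending rows are already waiting
    pending.put(None)
    t.join()
    return results


embedded = run_pipeline([(1, "Tent"), (2, "Lightweight 2-person"), (3, "Stove")])
print(sorted(embedded))
```

When the worker falls behind, `pending.put` stalls the writer instead of letting the backlog grow without limit, which is the essence of back-pressure.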
v1

Time-Travel Queries

Tag snapshots with CREATE VERSION TAG. Query historical data with AT VERSION. Every training run is reproducible. EU AI Act compliance built in.

-- Pin a training snapshot
CREATE VERSION TAG 'train-v1' FOR TRAINING;

-- Query data as it was at that point
SELECT * FROM docs
AT VERSION 'train-v1';
v1

Training Export

One SQL command exports a versioned Lance dataset. Load directly into PyTorch with zero-copy memory mapping. No Airflow, no S3 pipeline.

-- Export as Lance dataset
CREATE VERSION TAG 'train-v2'
  FOR TRAINING
  WITH TRAINING PRECISION 'sq8';

# Python: zero-copy PyTorch
path = db.training_dataset('train-v2')

// HOW IT WORKS

Six capabilities.
Familiar SQL.

Everything ships in v1. No cloud required. No external services.

schema.sql
-- One line replaces an entire embedding pipeline
CREATE TABLE docs (
  id INT PRIMARY KEY,
  body TEXT
    EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2'
    DIM 384
);

-- Embedding computed automatically
INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks');
1 row inserted
Embedding generated: 384 dims, 14ms

// BENCHMARKS

Real numbers. Real hardware.

Measured on AWS c6id.4xlarge (Intel Xeon Platinum 8375C, 16 vCPU, 32 GiB RAM, 884 GB NVMe), release build. Reproduction commands in BENCHMARKS.md.

0.990 recall@10 (HNSW on SIFT-1M, ef=200)
258K write TPS (16 threads, 1M rows, NVMe)
4.49 GB/s scan throughput (PAX blocks + zone-map pruning)
3 µs read p50 (warm cache, ART index)
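
For readers new to the metric: recall@10 is the fraction of the true 10 nearest neighbours that the approximate index actually returns. A minimal sketch of the calculation follows; the IDs are made up for illustration and `recall_at_k` is a hypothetical helper, not part of the benchmark harness.

```python
def recall_at_k(retrieved, ground_truth, k=10):
    """Fraction of the true top-k neighbours present in the retrieved top-k."""
    truth = set(ground_truth[:k])
    hits = sum(1 for item in retrieved[:k] if item in truth)
    return hits / len(truth)


# Illustrative IDs only: the index returns 9 of the 10 true neighbours.
exact = list(range(10))
approx = [0, 1, 2, 3, 4, 5, 6, 7, 8, 42]
print(recall_at_k(approx, exact))  # 0.9
```

A reported 0.990 therefore means that, averaged over the query set, 99% of the true neighbours were found.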

// COMPARISON

GalaxDB vs the alternatives

Full comparison with pricing →
Feature | PG + pgvector | Pinecone | Qdrant | LanceDB | GalaxDB
Full SQL queries
Vector search (HNSW)
Local embeddings (no API)
Time-travel (AT VERSION)
Training export (Lance)
Near-dedup (MinHash LSH)
Embedded mode (no server)
PostgreSQL wire protocol
Self-hosted
Single binary
Legend: Yes · Partial · No

// OPEN SOURCE

100% Apache 2.0. Join the community.

GalaxDB v1 is fully open source. Contribute, extend, and run it anywhere. No cloud lock-in. No feature gates.


Community support

Apache 2.0

Free forever

v1.0.0-beta.1

Public beta

Want to contribute? We welcome pull requests and issues.

// FOR DEVELOPERS

Familiar SQL.
AI-native primitives.

GalaxDB extends standard SQL with four new keywords: EMBEDDING MODEL, SEMANTIC_MATCH, AT VERSION, and WHERE NOT DUPLICATE. Everything else is standard SQL your tools already understand.
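
The comparison table lists near-dedup as MinHash LSH. Assuming WHERE NOT DUPLICATE is backed by that technique (the page does not spell it out), the core idea can be sketched as below. This shows plain MinHash similarity estimation only; LSH banding is omitted, and the shingle size, hash count, and helper names are hypothetical.

```python
import hashlib


def shingles(text, n=3):
    """Set of overlapping word n-grams."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash_signature(items, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_hashes):
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big"
            )
            for s in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Share of matching signature slots approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)


sig_a = minhash_signature(shingles("lightweight two person tent for backpacking trips"))
sig_b = minhash_signature(shingles("lightweight two person tent for hiking trips"))
sig_c = minhash_signature(shingles("stainless steel camping stove with windshield"))

print(estimated_jaccard(sig_a, sig_b))  # near-duplicate pair: higher estimate
print(estimated_jaccard(sig_a, sig_c))  # unrelated pair: estimate close to zero
```

A dedup filter would then drop rows whose estimated similarity to an earlier row exceeds some cutoff; where GalaxDB sets that cutoff is not stated here.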

Rust core

Storage engine, WAL, HNSW, and wire protocol all written in Rust. No GC pauses, no JVM overhead.

PostgreSQL wire protocol

Your existing psycopg2, SQLAlchemy, tokio-postgres, and JDBC code works unchanged.
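
In practice that would look like ordinary psycopg2 code. The sketch below is untested against GalaxDB, and the host, port, database name, and user are placeholders: the page does not state them. psycopg2 substitutes parameters client-side, so the server receives the same literal form of SEMANTIC_MATCH shown elsewhere on this page.

```python
# Hypothetical connection settings: the page does not state GalaxDB's SQL
# port, database name, or user, so adjust all of these.
CONN_PARAMS = {
    "host": "localhost",
    "port": 5432,
    "dbname": "postgres",
    "user": "postgres",
}

SEARCH_SQL = "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, %s, %s)"


def semantic_search(query_text, threshold=0.4, conn_params=CONN_PARAMS):
    """Run a SEMANTIC_MATCH query over the PostgreSQL wire protocol."""
    import psycopg2  # imported lazily so this module loads without the driver

    conn = psycopg2.connect(**conn_params)
    try:
        with conn.cursor() as cur:
            cur.execute(SEARCH_SQL, (query_text, threshold))
            return cur.fetchall()
    finally:
        conn.close()
```

With a server running and the `docs` table populated, `semantic_search("AI")` would be expected to return `(id, body)` tuples.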

Embedded or server

Use as a Python library with no server (like SQLite), or run as a standalone server.

Zero external deps

Single binary. No Redis, no Kafka, no Airflow. The sidecar for embeddings is optional.

import galaxdb

# Embedded mode -- no server needed
db = galaxdb.Database("./mydata")

# Create table with auto-embedding
db.execute("""
    CREATE TABLE docs (
        id INT PRIMARY KEY,
        body TEXT EMBEDDING MODEL
            'sentence-transformers/all-MiniLM-L6-v2' DIM 384
    )
""")

# Insert -- embeddings computed automatically
db.execute("INSERT INTO docs (id, body) VALUES (1, 'machine learning')")

# Semantic search
rows = db.execute(
    "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI', 0.4)"
)
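
The third argument to SEMANTIC_MATCH (0.4 in the query above) reads like a similarity threshold. Assuming it is a cosine-similarity cutoff, which the page does not confirm, the filter can be pictured as follows. The vectors are toy 3-dimensional values, not real model output, and both helper functions are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_match(rows, query_vec, threshold):
    """Keep IDs of rows whose embedding is at least `threshold` similar to the query."""
    return [
        row_id for row_id, vec in rows
        if cosine_similarity(vec, query_vec) >= threshold
    ]


rows = [(1, [1.0, 0.0, 0.0]), (2, [0.0, 1.0, 0.0]), (3, [0.6, 0.8, 0.0])]
print(semantic_match(rows, [1.0, 0.0, 0.0], 0.4))  # [1, 3]
```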
$ curl http://localhost:9091/health
{"status":"ok","version":"1.0.0-beta.1","subsystems":{
"sidecar_healthy":true,"connections_active":0}}

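A readiness probe could parse the health response shown above. The sketch below checks the two fields that appear in that sample; treating `sidecar_healthy` as mandatory is an assumption (the sidecar is described as optional), and `is_healthy` is a hypothetical helper.

```python
import json


def is_healthy(payload: str) -> bool:
    """True when status is "ok" and the embedding sidecar reports healthy."""
    status = json.loads(payload)
    subsystems = status.get("subsystems", {})
    return status.get("status") == "ok" and subsystems.get("sidecar_healthy") is True


# Sample body copied from the /health output shown on this page.
sample = (
    '{"status":"ok","version":"1.0.0-beta.1","subsystems":{'
    '"sidecar_healthy":true,"connections_active":0}}'
)
print(is_healthy(sample))  # True
```
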
One binary. All the AI primitives.

SQL + vector search + local embeddings + training export. No Pinecone. No OpenAI API. No data pipeline. Apache 2.0 open source.

Apache 2.0. Built with Rust.