The AI-Native
Database.

SQL + vector search + local embeddings in one binary. No Pinecone. No OpenAI API. No data pipeline. Your existing psycopg2 code works unchanged.


$ curl -fsSL galaxdb.com/get | bash

schema.sql

// Declare an embedding column. GalaxDB generates vectors automatically on every INSERT.

CREATE TABLE docs (
  id   INT PRIMARY KEY,
  body TEXT
    EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2'
    DIM 384
);

-- Embeddings computed automatically. No pipeline needed.
INSERT INTO docs (id, body) VALUES
  (1, 'machine learning and neural networks');


0.990 recall@10 (HNSW on SIFT-1M, ef=200)

258K write TPS (16 threads, 1M rows, NVMe)

4.49 GB/s scan throughput (PAX blocks + zone-map pruning)

740 tests passing (7 chaos scenarios in 10.9 s)
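The scan figure credits PAX blocks and zone-map pruning. As a rough illustration of the pruning half only, and not GalaxDB's storage code: a zone map records each block's min and max for a column, so a range scan can skip any block whose range cannot overlap the predicate. `Block`, `build_blocks`, and `range_scan` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A column block plus its zone-map entry (min/max of the values it holds)."""
    values: list[int]
    lo: int
    hi: int


def build_blocks(column: list[int], block_size: int) -> list[Block]:
    blocks = []
    for start in range(0, len(column), block_size):
        chunk = column[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def range_scan(blocks: list[Block], low: int, high: int) -> tuple[list[int], int]:
    """Return values in [low, high] and how many blocks were actually read."""
    out, blocks_read = [], 0
    for b in blocks:
        # Zone-map check: skip the block if [b.lo, b.hi] cannot overlap [low, high].
        if b.hi < low or b.lo > high:
            continue
        blocks_read += 1
        out.extend(v for v in b.values if low <= v <= high)
    return out, blocks_read


# Ordered data lets almost every block be pruned.
blocks = build_blocks(list(range(1000)), block_size=100)
hits, read = range_scan(blocks, 250, 260)
print(len(hits), read)  # 11 1
```

Pruning pays off when values are clustered (sorted or time-ordered data); on randomly ordered data most blocks overlap every range and little is skipped.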

// THE PROBLEM

Five systems for one product.
GalaxDB is the one.

Modern AI apps glue together a transactional DB, a vector store, a cache, blob storage, and an orchestration tool. Each one drifts. Each one breaks. GalaxDB collapses the stack into a single binary.

Before: the AI stack of 2024 (5 services)

PostgreSQL

Transactional rows

Pinecone

Vector index

Redis

Hot cache + queue

S3 + DVC

Blobs + versioning

Airflow

Embedding pipeline

5 dashboards · 5 SLAs · 5 bills · data drift inevitable

After: GalaxDB (1 binary)

GalaxDB

v1.0 · 60 MB

OLTP rows

Vector index

Time-travel

Embeddings

Blobs

Feedback

1 dashboard · 1 SLA · 1 bill · strictly serializable

// CAPABILITIES

What you get out of the box

Unified Data Atom

One row stores structured fields, JSON, full-text, dense embeddings, raw binaries, and lineage. No more fanning data across five systems.

CREATE TABLE documents (
  id INT PRIMARY KEY,
  body TEXT,
  meta JSONB,
  vec EMBEDDING MODEL 'mini-LM',
  raw BLOB
);
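Conceptually, a Unified Data Atom is one record that carries every facet of an item together. A minimal Python sketch of that shape, with field names mirroring the DDL above; `DocumentAtom` and its `lineage` field are hypothetical illustrations, since the source does not show how GalaxDB represents lineage internally.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DocumentAtom:
    id: int                    # structured field (INT PRIMARY KEY)
    body: str                  # text, also the full-text search target
    meta: dict[str, Any]       # JSONB-style document
    vec: list[float]           # dense embedding
    raw: bytes                 # BLOB payload
    lineage: list[str] = field(default_factory=list)  # hypothetical: source/version ids


atom = DocumentAtom(
    id=1,
    body="machine learning and neural networks",
    meta={"lang": "en"},
    vec=[0.1, 0.2, 0.3],
    raw=b"%PDF-1.7 ...",
    lineage=["import:batch-1"],
)
print(atom.meta["lang"], len(atom.vec))  # en 3
```

The design point is that text, metadata, vector, and blob share one key and one record, so there is no second system whose copy can fall out of step.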

Auto-Embedding Pipeline

Declaring EMBEDDING MODEL in DDL spawns a sidecar that handles inference, queueing, and back-pressure. You write SQL; the database does the ML plumbing.

INSERT INTO products (name, description)
VALUES ('Tent', 'Lightweight 2-person');

-- ✓ embedding generated
-- ✓ index updated
-- ✓ no Airflow, no Lambda
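The sidecar is said to handle inference, queueing, and back-pressure. A minimal sketch of that generic pattern using Python's standard `queue` and `threading` modules, assuming a bounded queue between writers and a single embedding worker. `fake_embed`, `insert`, and `worker` are hypothetical; GalaxDB's actual sidecar protocol is not described here.

```python
import queue
import threading

jobs: queue.Queue = queue.Queue(maxsize=4)   # bounded: a full queue is the back-pressure signal
index: dict[int, list[float]] = {}           # stands in for the vector index


def fake_embed(text: str) -> list[float]:
    # Hypothetical placeholder for model inference.
    return [float(len(text)), float(text.count(" "))]


def worker() -> None:
    while True:
        item = jobs.get()
        if item is None:          # sentinel: shut down
            jobs.task_done()
            break
        row_id, text = item
        index[row_id] = fake_embed(text)
        jobs.task_done()


def insert(row_id: int, text: str) -> None:
    # put() blocks while the queue is full, so writers slow down instead of
    # piling up unbounded work for the embedding worker.
    jobs.put((row_id, text))


t = threading.Thread(target=worker, daemon=True)
t.start()
for i in range(10):
    insert(i, f"document number {i}")
jobs.put(None)
t.join()
print(len(index))  # 10
```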

Time-Travel Queries

Tag snapshots with CREATE VERSION TAG. Query historical data with AT VERSION. Every training run is reproducible. EU AI Act compliance built in.

-- Pin a training snapshot
CREATE VERSION TAG 'train-v1' FOR TRAINING;

-- Query data as it was at that point
SELECT * FROM docs
AT VERSION 'train-v1';
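A rough model of what CREATE VERSION TAG and AT VERSION provide: every write is kept with a commit sequence number, a tag pins one sequence number, and a read returns the latest version of each row at or before it. This is a generic multi-version sketch under that assumption; `VersionedTable` is hypothetical and GalaxDB's storage format is not shown.

```python
from typing import Optional


class VersionedTable:
    """Toy multi-version table: old row versions are never overwritten."""

    def __init__(self) -> None:
        self.seq = 0
        self.history: dict[int, list[tuple[int, Optional[str]]]] = {}  # id -> [(seq, value)]
        self.tags: dict[str, int] = {}

    def write(self, row_id: int, value: Optional[str]) -> None:
        """Insert/update with a value, or delete with None."""
        self.seq += 1
        self.history.setdefault(row_id, []).append((self.seq, value))

    def create_tag(self, name: str) -> None:
        self.tags[name] = self.seq  # roughly: CREATE VERSION TAG 'name'

    def select_all(self, at_tag: Optional[str] = None) -> dict[int, str]:
        """Latest live value per row, optionally as of a tag (roughly: AT VERSION)."""
        limit = self.tags[at_tag] if at_tag is not None else self.seq
        result = {}
        for row_id, versions in self.history.items():
            visible = [v for s, v in versions if s <= limit]
            if visible and visible[-1] is not None:
                result[row_id] = visible[-1]
        return result


tbl = VersionedTable()
tbl.write(1, "v1 text")
tbl.create_tag("train-v1")
tbl.write(1, "edited text")
tbl.write(2, "new row")
print(tbl.select_all("train-v1"))  # {1: 'v1 text'}
print(tbl.select_all())            # {1: 'edited text', 2: 'new row'}
```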

Training Export

One SQL command exports a versioned Lance dataset. Load directly into PyTorch with zero-copy memory mapping. No Airflow, no S3 pipeline.

-- Export as Lance dataset
CREATE VERSION TAG 'train-v2'
  FOR TRAINING
  WITH TRAINING PRECISION 'sq8';

# Python: zero-copy PyTorch
import galaxdb
db = galaxdb.Database("./mydata")  # same embedded handle as in rag.py
path = db.training_dataset('train-v2')
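`WITH TRAINING PRECISION 'sq8'` suggests 8-bit scalar quantization of the exported vectors. Assuming the common per-dimension min/max scheme (an assumption; the source does not specify GalaxDB's exact encoding), a sketch of encoding floats to one-byte codes and decoding them back:

```python
def sq8_fit(vectors: list[list[float]]) -> tuple[list[float], list[float]]:
    """Per-dimension minimum and step size so each component maps onto 0..255."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    scale = [(h - l) / 255 if h > l else 1.0 for l, h in zip(lo, hi)]
    return lo, scale


def sq8_encode(vec: list[float], lo: list[float], scale: list[float]) -> bytes:
    # Round to the nearest code and clamp to the uint8 range.
    return bytes(min(255, max(0, round((x - l) / s))) for x, l, s in zip(vec, lo, scale))


def sq8_decode(codes: bytes, lo: list[float], scale: list[float]) -> list[float]:
    return [l + c * s for c, l, s in zip(codes, lo, scale)]


vectors = [[0.0, -1.0, 0.5], [1.0, 1.0, 0.5], [0.25, 0.5, 0.5]]
lo, scale = sq8_fit(vectors)
codes = sq8_encode(vectors[2], lo, scale)
print(list(codes))  # [64, 191, 0]
print(sq8_decode(codes, lo, scale))
```

Each component shrinks from 4 bytes (float32) to 1, at the cost of a reconstruction error of at most half a step per dimension.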

// HOW IT WORKS

Six capabilities.
Familiar SQL.

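Everything ships in v1. No cloud required. No external services.

Auto-Embedding on INSERT · Semantic Search in SQL · Time-Travel Queries · Training Export to PyTorch · Near-Duplicate Deduplication · Embedded or Server Mode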

schema.sql

-- One line replaces an entire embedding pipeline
CREATE TABLE docs (
  id   INT PRIMARY KEY,
  body TEXT
    EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2'  -- HuggingFace model, runs locally
    DIM 384
);

-- Embedding computed automatically
INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks');  -- vector stored, index updated

1 row inserted

Embedding generated: 384 dims, 14ms
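The effect of declaring `EMBEDDING MODEL ... DIM 384` is that the INSERT path itself computes and stores the vector next to the row. A toy in-memory sketch of that flow; the hashing embedder is a hypothetical stand-in for all-MiniLM-L6-v2 (it yields 384-dim unit vectors but none of the model's semantics), and `AutoEmbedTable` is not a GalaxDB API.

```python
import hashlib
import math

DIM = 384


def toy_embed(text: str, dim: int = DIM) -> list[float]:
    """Hashing 'embedding': bucket each token, then L2-normalize. Stand-in only."""
    vec = [0.0] * dim
    for token in text.lower().split():
        h = int.from_bytes(hashlib.sha256(token.encode()).digest()[:8], "big")
        vec[h % dim] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


class AutoEmbedTable:
    def __init__(self, embed=toy_embed):
        self.rows: dict[int, str] = {}
        self.vectors: dict[int, list[float]] = {}
        self.embed = embed

    def insert(self, row_id: int, body: str) -> None:
        # The caller supplies only (id, body); the vector is derived on write,
        # so a row and its embedding are always updated together.
        self.rows[row_id] = body
        self.vectors[row_id] = self.embed(body)


docs = AutoEmbedTable()
docs.insert(1, "machine learning and neural networks")
print(len(docs.vectors[1]))  # 384
```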

// BENCHMARKS

Real numbers. Real hardware.

Measured on AWS c6id.4xlarge (Intel Xeon Platinum 8375C, 16 vCPU, 32 GiB RAM, 884 GB NVMe), release build. Reproduction commands in BENCHMARKS.md.

0.990 recall@10 (HNSW on SIFT-1M, ef=200)

258K write TPS (16 threads, 1M rows, NVMe)

4.49 GB/s scan throughput (PAX blocks + zone-map pruning)

3 µs read p50 (warm cache, ART index)
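For reference, recall@10 is the fraction of the true 10 nearest neighbours (found by exact brute-force search) that the approximate HNSW search returns, averaged over all queries. A small sketch of the arithmetic, with made-up neighbour IDs rather than SIFT-1M data:

```python
def recall_at_k(approx: list[list[int]], exact: list[list[int]], k: int = 10) -> float:
    """Mean over queries of |approx top-k ∩ exact top-k| / k."""
    total = 0.0
    for a, e in zip(approx, exact):
        total += len(set(a[:k]) & set(e[:k])) / k
    return total / len(exact)


exact = [list(range(10)), list(range(10, 20))]
approx = [list(range(10)), list(range(10, 19)) + [99]]  # second query misses one true neighbour
print(round(recall_at_k(approx, exact), 3))  # 0.95
```

Raising HNSW's `ef` search parameter trades latency for recall, which is why the figure is quoted together with ef=200.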

// COMPARISON

GalaxDB vs the alternatives


Feature	PG + pgvector	Pinecone	Qdrant	LanceDB	GalaxDB
Full SQL queries
Vector search (HNSW)
Local embeddings (no API)
Time-travel (AT VERSION)
Training export (Lance)
Near-dedup (MinHash LSH)
Embedded mode (no server)
PostgreSQL wire protocol
Self-hosted
Single binary

Yes Partial No

// OPEN SOURCE

100% Apache 2.0. Join the community.

GalaxDB v1 is fully open source. Contribute, extend, and run it anywhere. No cloud lock-in. No feature gates.


Community support

Apache 2.0

Free forever

v1.0.0-beta.1

Public beta

Want to contribute? We welcome pull requests and issues.


// FOR DEVELOPERS

Familiar SQL.
AI-native primitives.

GalaxDB extends standard SQL with a small set of AI-specific constructs: EMBEDDING MODEL, SEMANTIC_MATCH, AT VERSION (paired with CREATE VERSION TAG), and WHERE NOT DUPLICATE. Everything else is standard SQL your tools already understand.
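The comparison table attributes near-dedup to MinHash LSH. Assuming WHERE NOT DUPLICATE rests on that technique, its core is that the fraction of matching MinHash slots between two signatures estimates the Jaccard similarity of their shingle sets. A self-contained sketch using seeded hashes in place of true permutations (a standard variant); the LSH banding that avoids comparing every pair is omitted, and the 0.8 threshold and all function names are illustrative assumptions.

```python
import hashlib

NUM_PERM = 128


def shingles(text: str, n: int = 3) -> set[str]:
    """Word n-grams; near-duplicate texts share most of them."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(1, len(words) - n + 1))}


def minhash(items: set[str], num_perm: int = NUM_PERM) -> list[int]:
    """For each seeded hash function, keep the minimum hash over the set."""
    return [
        min(
            int.from_bytes(hashlib.blake2b(f"{seed}:{s}".encode(), digest_size=8).digest(), "big")
            for s in items
        )
        for seed in range(num_perm)
    ]


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """Share of matching slots estimates Jaccard similarity of the underlying sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def is_near_duplicate(text: str, seen: list[list[int]], threshold: float = 0.8) -> bool:
    sig = minhash(shingles(text))
    return any(estimated_jaccard(sig, s) >= threshold for s in seen)


seen = [minhash(shingles("the quick brown fox jumps over the lazy dog"))]
print(is_near_duplicate("the quick brown fox jumps over the lazy dog", seen))     # True
print(is_near_duplicate("an entirely different sentence about databases", seen))  # False
```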

Rust core

Storage engine, WAL, HNSW, and wire protocol all written in Rust. No GC pauses, no JVM overhead.

PostgreSQL wire protocol

Your existing psycopg2, SQLAlchemy, tokio-postgres, and JDBC code works unchanged.

Embedded or server

Use as a Python library with no server (like SQLite), or run as a standalone server.

Zero external deps

Single binary. No Redis, no Kafka, no Airflow. The sidecar for embeddings is optional.


rag.py

import galaxdb

# Embedded mode -- no server needed
db = galaxdb.Database("./mydata")

# Create table with auto-embedding
db.execute("""
  CREATE TABLE docs (
    id   INT PRIMARY KEY,
    body TEXT EMBEDDING MODEL
      'sentence-transformers/all-MiniLM-L6-v2' DIM 384
  )
""")

# Insert -- embeddings computed automatically
db.execute("INSERT INTO docs (id, body) VALUES (1, 'machine learning')")

# Semantic search
rows = db.execute(
    "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'AI', 0.4)"
)
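The source does not define SEMANTIC_MATCH's third argument. If 0.4 is a minimum cosine-similarity cutoff (an assumption), the filter behaves like this brute-force sketch: embed the query with the column's model, then keep rows whose stored vector scores at or above the threshold. An HNSW index would answer the same question without scanning every row. The vectors are tiny hand-made examples and `semantic_match` is a hypothetical helper.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_match(rows: dict[int, list[float]], query_vec: list[float], threshold: float) -> list[int]:
    """IDs whose stored vector has cosine similarity >= threshold with the query."""
    return [rid for rid, vec in rows.items() if cosine(vec, query_vec) >= threshold]


stored = {1: [0.9, 0.1, 0.0], 2: [0.0, 0.2, 0.9]}
query = [1.0, 0.0, 0.0]  # hypothetical embedding of 'AI'
print(semantic_match(stored, query, 0.4))  # [1]
```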

$ curl http://localhost:9091/health

{"status":"ok","version":"1.0.0-beta.1","subsystems":{"sidecar_healthy":true,"connections_active":0}}

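A deployment readiness probe can gate on this endpoint. A minimal standard-library sketch, assuming the response shape shown above; `is_ready` and `probe` are hypothetical helpers, and requiring `sidecar_healthy` only makes sense if you use the optional embedding sidecar.

```python
import json
import urllib.request


def is_ready(body: str, require_sidecar: bool = True) -> bool:
    """Decide readiness from a /health JSON body like the one above."""
    payload = json.loads(body)
    if payload.get("status") != "ok":
        return False
    subsystems = payload.get("subsystems", {})
    return bool(subsystems.get("sidecar_healthy", False)) or not require_sidecar


def probe(url: str = "http://localhost:9091/health", timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return is_ready(resp.read().decode("utf-8"))
    except (OSError, ValueError):  # connection errors, bad JSON
        return False


sample = '{"status":"ok","version":"1.0.0-beta.1","subsystems":{"sidecar_healthy":true,"connections_active":0}}'
print(is_ready(sample))  # True
```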
One binary. All the AI primitives.

SQL + vector search + local embeddings + training export. No Pinecone. No OpenAI API. No data pipeline. Apache 2.0 open source.


Apache 2.0. Built with Rust.
