The AI-Native
Database.
SQL + vector search + local embeddings in one binary. No Pinecone. No OpenAI API. No data pipeline. Your existing psycopg2 code works unchanged.
// Declare an embedding column. GalaxDB generates vectors automatically on every INSERT.
// THE PROBLEM
Five systems for one product.
GalaxDB is the one.
Modern AI apps glue together a transactional DB, a vector store, a cache, blob storage, and an orchestration tool. Each one drifts. Each one breaks. GalaxDB collapses the stack into a single binary.
// CAPABILITIES
What you get out of the box
Unified Data Atom
One row stores structured fields, JSON, full-text, dense embeddings, raw binaries, and lineage. No more fanning data across five systems.
CREATE TABLE documents (
  id   INT PRIMARY KEY,
  body TEXT,
  meta JSONB,
  vec  EMBEDDING MODEL 'mini-LM',
  raw  BLOB
);
Auto-Embedding Pipeline
EMBEDDING MODEL in DDL spawns a sidecar that handles inference, queueing, and back-pressure. You write SQL, the database does the ML plumbing.
INSERT INTO products (name, description)
VALUES ('Tent', 'Lightweight 2-person');
-- ✓ embedding generated
-- ✓ index updated
-- ✓ no Airflow, no Lambda
Time-Travel Queries
Tag snapshots with CREATE VERSION TAG. Query historical data with AT VERSION. Every training run is reproducible. EU AI Act compliance built in.
-- Pin a training snapshot
CREATE VERSION TAG 'train-v1' FOR TRAINING;

-- Query data as it was at that point
SELECT * FROM docs AT VERSION 'train-v1';
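The same two statements can be issued from application code. The sketch below is illustrative, not part of GalaxDB: the helper names `pin_training_snapshot` and `read_snapshot` are hypothetical, and `cur` is assumed to be any DB-API cursor (for example one from psycopg2). The page does not say whether bind parameters are accepted in these positions, so the sketch validates the tag and table name and then inlines them.

```python
import re

# Conservative allow-lists (an assumption; GalaxDB may accept more characters).
_TAG_PATTERN = re.compile(r"[A-Za-z0-9_.-]+")
_IDENT_PATTERN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _quoted_tag(tag):
    """Return the tag as a quoted SQL literal, rejecting unsafe characters."""
    if not _TAG_PATTERN.fullmatch(tag):
        raise ValueError(f"unsupported version tag: {tag!r}")
    return f"'{tag}'"


def pin_training_snapshot(cur, tag):
    """Tag the current state of the data as a training snapshot."""
    cur.execute(f"CREATE VERSION TAG {_quoted_tag(tag)} FOR TRAINING")


def read_snapshot(cur, table, tag):
    """Read every row of `table` as it was when `tag` was created."""
    if not _IDENT_PATTERN.fullmatch(table):
        raise ValueError(f"unsupported table name: {table!r}")
    cur.execute(f"SELECT * FROM {table} AT VERSION {_quoted_tag(tag)}")
    return cur.fetchall()
```

Calling `pin_training_snapshot(cur, "train-v1")` before a training run and `read_snapshot(cur, "docs", "train-v1")` later sends exactly the statements shown above.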
Training Export
One SQL command exports a versioned Lance dataset. Load directly into PyTorch with zero-copy memory mapping. No Airflow, no S3 pipeline.
-- Export as Lance dataset
CREATE VERSION TAG 'train-v2' FOR TRAINING
  WITH TRAINING PRECISION 'sq8';

# Python: zero-copy PyTorch
path = db.training_dataset('train-v2')
// HOW IT WORKS
Six capabilities.
Familiar SQL.
Everything ships in v1. No cloud required. No external services.
// BENCHMARKS
Real numbers. Real hardware.
Measured on AWS c6id.4xlarge (Intel Xeon Platinum 8375C, 16 vCPU, 32 GiB RAM, 884 GiB NVMe), release build. Reproduction commands in BENCHMARKS.md.
// COMPARISON
GalaxDB vs the alternatives
| Feature | PG + pgvector | Pinecone | Qdrant | LanceDB | GalaxDB |
|---|---|---|---|---|---|
| Full SQL queries | |||||
| Vector search (HNSW) | |||||
| Local embeddings (no API) | |||||
| Time-travel (AT VERSION) | |||||
| Training export (Lance) | |||||
| Near-dedup (MinHash LSH) | |||||
| Embedded mode (no server) | |||||
| PostgreSQL wire protocol | |||||
| Self-hosted | |||||
| Single binary | |||||
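For readers unfamiliar with the "Near-dedup (MinHash LSH)" row, here is a generic, pure-Python sketch of the technique. It is not GalaxDB's implementation, which this page does not describe. The function names, the 3-word shingle size, and the 64-hash / 16-band settings are all illustrative assumptions.

```python
import hashlib
from collections import defaultdict


def shingles(text, k=3):
    """Split text into a set of lowercase k-word shingles."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """64-bit hash of `item`; each seed acts as a different hash function."""
    digest = hashlib.blake2b(
        item.encode("utf-8"), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items, num_hashes=64):
    """Keep the minimum hash value per hash function."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching positions, which estimates Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def lsh_candidate_pairs(signatures, bands=16):
    """Return pairs of ids whose signatures agree on at least one band."""
    sig_len = len(next(iter(signatures.values())))
    if sig_len % bands:
        raise ValueError("signature length must be divisible by bands")
    rows = sig_len // bands
    buckets = defaultdict(set)
    for doc_id, sig in signatures.items():
        for band in range(bands):
            key = (band, tuple(sig[band * rows:(band + 1) * rows]))
            buckets[key].add(doc_id)
    pairs = set()
    for ids in buckets.values():
        ordered = sorted(ids)
        for i in range(len(ordered)):
            for j in range(i + 1, len(ordered)):
                pairs.add((ordered[i], ordered[j]))
    return pairs
```

Banding means only documents that share a bucket are compared, so near-duplicates can be found without scoring every pair of rows.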
// OPEN SOURCE
100% Apache 2.0. Join the community.
GalaxDB v1 is fully open source. Contribute, extend, and run it anywhere. No cloud lock-in. No feature gates.
Community support
Free forever
Public beta
Want to contribute? We welcome pull requests and issues.
// FOR DEVELOPERS
Familiar SQL.
AI-native primitives.
GalaxDB extends standard SQL with four new keywords: EMBEDDING MODEL, SEMANTIC_MATCH, AT VERSION, and WHERE NOT DUPLICATE. Everything else is standard SQL your tools already understand.
Rust core
Storage engine, WAL, HNSW, and wire protocol all written in Rust. No GC pauses, no JVM overhead.
PostgreSQL wire protocol
Your existing psycopg2, SQLAlchemy, tokio-postgres, and JDBC code works unchanged.
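As a rough illustration of what "works unchanged" could look like, the sketch below runs the INSERT from this page through a stock psycopg2 connection. The host, port, database name, and user in `demo()` are placeholder assumptions, and the `products` table is assumed to exist already with an embedding column declared. `add_product` is a hypothetical helper that accepts any DB-API connection.

```python
def add_product(conn, name, description):
    """Insert one product row using ordinary DB-API parameter binding.

    Per the page, the database generates the embedding on INSERT, so no
    vector is sent from the client.
    """
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO products (name, description) VALUES (%s, %s)",
            (name, description),
        )
    conn.commit()


def demo():
    """Connect with psycopg2 and insert one row (connection details are assumed)."""
    import psycopg2  # third-party PostgreSQL driver

    conn = psycopg2.connect(
        host="localhost",  # assumption: a local GalaxDB server
        port=5432,         # assumption: the default PostgreSQL port
        dbname="galaxdb",  # assumption: placeholder database name
        user="app",        # assumption: placeholder user
    )
    try:
        add_product(conn, "Tent", "Lightweight 2-person")
    finally:
        conn.close()
```

Call `demo()` against a running server to try it. Nothing in `add_product` is GalaxDB-specific, which is the point of the wire-protocol compatibility.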
Embedded or server
Use as a Python library with no server (like SQLite), or run as a standalone server.
Zero external deps
Single binary. No Redis, no Kafka, no Airflow. The sidecar for embeddings is optional.
One binary. All the AI primitives.
SQL + vector search + local embeddings + training export. No Pinecone. No OpenAI API. No data pipeline. Apache 2.0 open source.
Apache 2.0. Built with Rust.