clojure
May 15, 2019

Introducing XTDB

Open Time Store

Jon Pither
CEO & Co-founder

September 2021 update: Crux is now XTDB. The official home for XTDB is https://xtdb.com.

We have just released Crux - an open-source bitemporal database.

What is Crux?

Crux is a document store that indexes documents for graph query. The indexes are bitemporal, meaning that you can query against both valid time (when a fact is true in your domain) and transaction time (when the database recorded it). The usefulness of this is covered in our previous post “the value of bitemporality”.
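To make the two time axes concrete, here is a minimal sketch in Python. It is not Crux code: the `Version` record and the `as_of` function are hypothetical names invented for illustration, and a real index would not scan a list. It only shows how the same question can give different answers depending on the transaction time you ask at.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Version:
    entity_id: str
    valid_time: datetime  # when the fact becomes true in the domain
    tx_time: datetime     # when the database recorded the fact
    doc: dict


def as_of(history, entity_id, valid_time, tx_time):
    """Return the document visible at (valid_time, tx_time), or None."""
    candidates = [
        v for v in history
        if v.entity_id == entity_id
        and v.valid_time <= valid_time
        and v.tx_time <= tx_time
    ]
    if not candidates:
        return None
    # The latest valid time wins; ties go to the most recent transaction.
    return max(candidates, key=lambda v: (v.valid_time, v.tx_time)).doc


history = [
    # Recorded on 1 January: Jon lives in London from 1 January.
    Version("jon", datetime(2019, 1, 1), datetime(2019, 1, 1), {"city": "London"}),
    # Recorded on 1 March: a correction, it was Leeds all along.
    Version("jon", datetime(2019, 1, 1), datetime(2019, 3, 1), {"city": "Leeds"}),
]

# What did we believe on 1 February about 1 February?
before = as_of(history, "jon", datetime(2019, 2, 1), datetime(2019, 2, 1))
# What do we believe on 1 April about 1 February?
after = as_of(history, "jon", datetime(2019, 2, 1), datetime(2019, 4, 1))
```

Here `before` is the London document and `after` is the Leeds document: the correction changes what we now believe about February without rewriting what we believed at the time.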

Unbundled

Crux is an unbundled database - to use Martin Kleppmann’s phrase - shipping as a connected set of pluggable parts. This means that users can swap out parts and contribute their own, and that Crux itself follows the Unix philosophy of each part doing one thing particularly well.

This pluggability allows Crux to scale with you as your scaling needs increase. You can start out using Crux with the transaction log being a local-disk based implementation, and then in future you could switch it out to Kafka, which offers much higher data throughput and retention guarantees.

With the open, unbundled architecture, it’s intended that Crux be extended and experimented with. The various parts in Crux are described by Clojure protocols, meaning that users can get in and provide their own implementations that would either fully replace or decorate the existing ones.
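The replace-or-decorate idea can be sketched as follows. Crux's parts are described by Clojure protocols; this sketch uses Python's `typing.Protocol` as a stand-in, and the `TxLog`, `InMemoryTxLog` and `CountingTxLog` names and their methods are hypothetical, not part of Crux.

```python
from typing import List, Protocol


class TxLog(Protocol):
    """Hypothetical contract for a pluggable transaction log."""

    def submit(self, tx: dict) -> int: ...

    def read_from(self, offset: int) -> List[dict]: ...


class InMemoryTxLog:
    """A full replacement: keeps every transaction in a Python list."""

    def __init__(self) -> None:
        self._entries: List[dict] = []

    def submit(self, tx: dict) -> int:
        self._entries.append(tx)
        return len(self._entries) - 1  # offset of the new entry

    def read_from(self, offset: int) -> List[dict]:
        return self._entries[offset:]


class CountingTxLog:
    """A decorator: wraps any TxLog and counts submissions."""

    def __init__(self, inner: TxLog) -> None:
        self._inner = inner
        self.submitted = 0

    def submit(self, tx: dict) -> int:
        self.submitted += 1
        return self._inner.submit(tx)

    def read_from(self, offset: int) -> List[dict]:
        return self._inner.read_from(offset)


log = CountingTxLog(InMemoryTxLog())
first_offset = log.submit({"op": "put", "id": "ivan"})
second_offset = log.submit({"op": "delete", "id": "ivan"})
```

Code that depends only on the contract cannot tell whether it has been handed the plain log or the decorated one, which is what makes swapping a local log for a Kafka-backed one a configuration change rather than a rewrite.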

How Crux Works


Crux is schemaless, with transactions being submitted through the Crux API. The data is then sent to two event-log topics for storage: the transaction topic and the document topic.

We use two topics because whilst the transaction topic is immutable, messages in the document topic can be permanently erased, forming the basis of Crux’s ground-up strategy to provide ease of content eviction for data privacy reasons, to align with compliance regimes such as GDPR.

Using a separate topic for the content documents also allows for compaction to remove duplicates, as the message ID is a content hash of the document. From a Kafka perspective, the transaction topic uses a single Kafka partition, but it is in our roadmap to shard the document topic to potentially use multiple partitions.
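A rough sketch of why content-addressed keys make compaction and eviction straightforward. Everything here is an assumption made for illustration: the canonical JSON encoding, the choice of SHA-256, and the `content_hash` and `compact` helpers are not taken from Crux. The `None` value imitates a tombstone on a compacted topic.

```python
import hashlib
import json


def content_hash(doc: dict) -> str:
    """Hash a canonical encoding so equal documents share one key."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def compact(messages):
    """Keep the last value per key; a None value erases the key."""
    latest = {}
    for key, value in messages:
        if value is None:
            latest.pop(key, None)  # tombstone: the content is gone
        else:
            latest[key] = value
    return latest


doc_a = {"name": "Ivan", "city": "London"}
doc_b = {"city": "London", "name": "Ivan"}  # same content, different key order

topic = [(content_hash(doc_a), doc_a), (content_hash(doc_b), doc_b)]
deduplicated = compact(topic)

# Evicting the document: append a tombstone under the same key.
topic.append((content_hash(doc_a), None))
after_eviction = compact(topic)
```

Because the transaction topic would refer to documents only by hash, erasing the content from the document topic leaves the immutable transaction history intact while the personal data itself is no longer retrievable.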

The event-log that Crux uses is the golden store of data, with Crux leveraging Kafka’s infinite retention capability.

Crux Nodes will then ingest the data from the event-log and index the transactions and documents locally into a local Key/Value store such as RocksDB or LMDB, which acts as the foundation for both a local document store and a set of bitemporal indexes that Crux maintains for graph query. RocksDB and LMDB use fundamentally different data structures and therefore present a choice of performance characteristics and trade-offs.

Crux currently supports both a Java and a Clojure API. See the JavaDocs.

Transacting and Querying

Crux supports an EDN Datalog format, similar to - though not the same as - Datomic’s. To get a feel for transacting to and querying against Crux, check out the query documentation and/or read Ivan Fedorov’s “a bitemporal tale”.

Crux supports four transaction operations:

  • PUT

  • DELETE

  • CAS

  • EVICT

PUT stores a document. DELETE removes it as of a given valid time, though earlier versions remain in Crux’s history. Use EVICT to get rid of data permanently, either for all of history or for a given valid time window. Use CAS (compare-and-swap) to check that the data in a document/entity is what you think it is before adding a new version; if it is not, the transaction is aborted.
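The semantics of the four operations can be illustrated with a toy in-memory model. This is not the Crux API: `TinyStore` and its methods are hypothetical, valid times are plain integers, each call stands in for a one-operation transaction, and eviction over a valid time window is left out.

```python
class TinyStore:
    """Toy model of PUT, DELETE, CAS and EVICT over valid time."""

    def __init__(self):
        # entity id -> list of (valid_time, doc or None), in write order
        self._history = {}

    def put(self, eid, doc, valid_time):
        self._history.setdefault(eid, []).append((valid_time, doc))

    def delete(self, eid, valid_time):
        # A delete is a tombstone; earlier versions stay in history.
        self._history.setdefault(eid, []).append((valid_time, None))

    def cas(self, eid, expected, new_doc, valid_time):
        # Write only if the current document matches what the caller expects.
        if self.entity(eid, valid_time) != expected:
            return False  # stands in for aborting the transaction
        self.put(eid, new_doc, valid_time)
        return True

    def evict(self, eid):
        # Unlike delete, evict removes every version permanently.
        self._history.pop(eid, None)

    def entity(self, eid, valid_time):
        """Return the document visible at valid_time, or None."""
        best = None
        for vt, doc in self._history.get(eid, []):
            # On equal valid times the later write wins.
            if vt <= valid_time and (best is None or vt >= best[0]):
                best = (vt, doc)
        return best[1] if best else None

    def history_length(self, eid):
        return len(self._history.get(eid, []))


store = TinyStore()
store.put("ivan", {"name": "Ivan"}, valid_time=1)
store.delete("ivan", valid_time=5)

visible_at_3 = store.entity("ivan", 3)  # the put is still visible
visible_at_6 = store.entity("ivan", 6)  # hidden by the delete
swapped = store.cas("ivan", {"name": "Ivan"}, {"name": "Ivan", "age": 40}, 3)
rejected = store.cas("ivan", {"name": "Wrong"}, {"name": "Oops"}, 3)
```

After these calls `swapped` is `True` and `rejected` is `False`, and the entity still has three versions in its history. Calling `store.evict("ivan")` is the only operation that makes that history disappear.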

Inside Crux we use a Worst-Case Optimal Join algorithm, which enables the query engine to lazily stream out results for an arbitrarily complex query with multiple join conditions and clauses. This, in combination with an external merge sort used for additional sorting, means that we avoid materialising intermediate results in memory.
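To give a flavour of what lazily streaming join results means, here is a small sketch of a lazy intersection over sorted streams. Intersecting sorted iterators is the kind of primitive that some worst-case optimal joins (leapfrog triejoin, for example) are built on, but this is not Crux's query engine: `lazy_intersect` is a hypothetical helper, it assumes each input is sorted with no duplicates, and it advances one element at a time where a real index would seek.

```python
from itertools import count, islice


def lazy_intersect(*sorted_iterables):
    """Yield values present in every sorted input, one at a time."""
    iters = [iter(s) for s in sorted_iterables]
    if not iters:
        return
    done = object()  # marks an exhausted input
    heads = [next(it, done) for it in iters]
    while all(h is not done for h in heads):
        target = max(heads)
        if all(h == target for h in heads):
            yield target  # every input agrees: emit a result
            heads = [next(it, done) for it in iters]
        else:
            # Advance each input that is behind the largest head.
            for i, it in enumerate(iters):
                while heads[i] is not done and heads[i] < target:
                    heads[i] = next(it, done)


# Both inputs are infinite, so the full result could never be built in
# memory; the generator only does the work needed for the values consumed.
evens = count(0, 2)
threes = count(0, 3)
first_three = list(islice(lazy_intersect(evens, threes), 3))

finite = list(lazy_intersect([1, 3, 5, 7], [3, 4, 5, 8], [0, 3, 5, 9]))
```

`first_three` is `[0, 6, 12]` and `finite` is `[3, 5]`. No intermediate list of candidate matches is ever built, which is the property the paragraph above describes.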

Deployment


Crux can be deployed as a JAR file within your application, or accessed through its HTTP server. You can use Crux in a standalone mode without Kafka (substituting in a local disk-based event-log), or you can deploy a cluster of Crux nodes that use Kafka.

Open

Crux is open source so that you can see the code, commit history, warts and all. You can see the GitHub issues where design decisions are made, and you can contribute to this process. You can fork Crux and send PRs our way. We encourage developers to try out Crux, to publish the patterns they find for using it, and to feed back their ideas and critiques.

Crux is a product that JUXT will offer various support models for, including enterprise support and managed hosting. If you have any questions about Crux or would like to talk to us about using it, please email us or visit our Zulip.

Have a play with XTDB - add the JAR to your project and scale up from there. Crux is Alpha. Please raise any issues on our GitHub.
