For the past two weeks, Oleg Andreev and I have spent most of our time working on something we really enjoyed: the StrokeDB project.
What is it?
StrokeDB is a lightweight approach to a document-oriented database, currently implemented in Ruby. The concept is pretty simple:
- each document is uniquely identified by a UUID
- each document has a set of slots, which are basically key/value pairs, where the key is a string and the value is a simple data structure (boolean, number, string, array, or hash, like in JSON)
- each time you update a document, its version is updated. The version is basically a hash of the document’s content.
- a reference to the previous version is automatically maintained by StrokeDB
- each document may reference one or more “meta documents”, which are documents that declaratively describe the essence of a particular document
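The points above can be pictured with a minimal sketch. This is not StrokeDB's actual code: the `SketchDocument` class is a hypothetical name, and the choice of SHA-256 over the key-sorted JSON form of the slots is an assumption made only to illustrate "version as a hash of the content".

```ruby
require 'securerandom'
require 'digest'
require 'json'

# Hypothetical illustration of the document model, not StrokeDB's API.
class SketchDocument
  attr_reader :uuid, :slots, :version, :previous_version

  def initialize(slots = {})
    @uuid = SecureRandom.uuid # each document is identified by a UUID
    @slots = {}
    @version = nil
    @previous_version = nil
    update(slots)
  end

  # Merge new slot values, remember the old version, and recompute the version.
  def update(new_slots)
    @slots = @slots.merge(new_slots.transform_keys(&:to_s))
    @previous_version = @version
    # Assumption: the version is SHA-256 over the slots serialized as JSON,
    # with top-level keys sorted so that key order does not change the hash.
    @version = Digest::SHA256.hexdigest(JSON.generate(@slots.sort.to_h))
    self
  end
end

doc = SketchDocument.new({ 'color' => 'green' })
first_version = doc.version
doc.update({ 'color' => 'red' })
puts doc.previous_version == first_version # prints "true"
```

Because the version in this sketch depends only on the content, two documents with identical slots get the same version, much like content-addressed objects in Git.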
One of the motivations for StrokeDB was my desire to decentralize some databases. Today databases are mostly centralized: with the SaaS you use, you basically host your data in some company’s data center. I believe that in some cases this is not the right way to manage your data. Centralization puts your data security at risk, requires the provider’s database software to be blazingly fast (because there are a lot of clients working with their data), and so on. What I really want is to have my data right where I am working with it (i.e. on my laptop), and to be able to share it with other parties in a secure way, back it up, etc.
So, yes, I just want to return some data to the client’s computer.
That’s how I came to StrokeDB, which was greatly inspired by Git and my previous experiments in metaframe databases.
Why another document database?
Why not CouchDB/ThruDB/SimpleDB? Well, we had a number of reasons to launch our own project:
- We want it to be really lightweight, and basically, embeddable. That’s how it is implemented now — it is just a Ruby library.
- We want to work around the natural limitations of the mentioned DBs. CouchDB does not support injecting code into the database core, indexes in particular (as PostgreSQL does). SimpleDB is hosted elsewhere, supports only very primitive queries, and is not extensible. ThruDB supports only a keyword-based search index (no special indexes). Also, partitioning and distribution are done via SimpleDB.
- We want to build a system on top of the concept of asynchronous operation. We do not rely on locking or synchronous conflict resolution (aka optimistic locking). A well-designed asynchronous workflow leads to several useful features: unlimited data distribution, offline work, replication-based load balancing, and data consistency, availability, and fast access all together.
Metadocuments?
Here is a simple example of metadocument usage:
Imagine you have a document that represents some concrete apple:
some_apple:
  weight: 3oz
  color: green
  price: $3
It could have three metadocuments that “describe” it: Apple, Fruit, and Product:
some_apple:
  __meta__: [Apple, Fruit, Product]
  weight: 3oz
  color: green
  price: $3
When this document is loaded, the Ruby object will be extended by three modules (Apple, Fruit, and Product).
For example, suppose you have them defined as:
Apple = Meta.new

Fruit = Meta.new do
  def green?
    color == 'green'
  end
end

Product = Meta.new do
  def sell!
    # ...
  end
end
So when you load that some_apple document (by finding it with slot-based search, or by its UUID), you will have an object that also responds to #green? and #sell! methods.
It will also respond positively to #is_a?(Apple), #is_a?(Fruit), and #is_a?(Product).
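To make the mechanism more concrete, here is a minimal sketch of how loading could extend an object with the modules named in `__meta__`. It is not how StrokeDB itself is implemented: `SketchMeta`, `SketchLoadedDocument`, `sketch_load`, and the registry hash are hypothetical names, and the sketch relies only on plain Ruby (`Module.new` with a block and `Object#extend`).

```ruby
# Hypothetical stand-in for Meta: a Module whose block defines instance methods.
class SketchMeta < Module; end

SketchApple = SketchMeta.new
SketchFruit = SketchMeta.new do
  def green?
    color == 'green'
  end
end
SketchProduct = SketchMeta.new do
  def sell!
    :sold # placeholder; the original example leaves this body unspecified
  end
end

# Assumed lookup table from metadocument names to modules.
SKETCH_REGISTRY = {
  'Apple' => SketchApple,
  'Fruit' => SketchFruit,
  'Product' => SketchProduct
}.freeze

# A loaded document that exposes each slot as a reader method (doc.color).
class SketchLoadedDocument
  def initialize(slots)
    @slots = slots
  end

  def method_missing(name, *args)
    key = name.to_s
    return @slots[key] if args.empty? && @slots.key?(key)
    super
  end

  def respond_to_missing?(name, include_private = false)
    @slots.key?(name.to_s) || super
  end
end

# "Loading": build the object, then extend it with every module in __meta__.
def sketch_load(slots, registry)
  doc = SketchLoadedDocument.new(slots)
  Array(slots['__meta__']).each { |meta_name| doc.extend(registry.fetch(meta_name)) }
  doc
end

apple = sketch_load(
  { '__meta__' => %w[Apple Fruit Product], 'weight' => '3oz', 'color' => 'green', 'price' => '$3' },
  SKETCH_REGISTRY
)
puts apple.green?             # prints "true"
puts apple.is_a?(SketchFruit) # prints "true"
```

Since `extend` adds the modules to the object's singleton class, `is_a?` returns true for each of them without any class hierarchy being involved.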
Some examples?
Here you go:
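config = StrokeDB::Config.new(true)

# Two chunk storages: one in memory, one file-backed under test/storages/test
config.add_storage :mem, :memory_chunk
config.add_storage :fs, :file_chunk, 'test/storages/test'

# Chain the memory storage to the file storage and make the latter authoritative
config.chain :mem, :fs
config[:mem].authoritative_source = config[:fs]

# An inverted-list index kept in its own file-backed storage
config.add_storage :index_storage, :inverted_list_file, 'test/storages/index'
config.add_index :default, :inverted_list, :index_storage

# The default store: a skiplist on top of the memory storage
config.add_store :default, :skiplist, :mem, :cut_level => 4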

User = StrokeDB::Meta.new

# Slot-based search: look for a User document with the given email
u = config.indexes[:default].find(:__meta__ => User.document, :email => "someemail@gmail.com").first
if u
  puts "We've found him!"
else
  puts "User not found, creating new user"
  u = User.new :email => "someemail@gmail.com"
  u.save!
end
puts u

# Sync the memory storage with the storages chained to it
config[:mem].sync_chained_storages!
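The slot-based `find` in the example above goes through an `:inverted_list` index. As a rough picture of the general idea (not StrokeDB's actual index code), an inverted list maps each slot/value pair to the set of document UUIDs that contain it, and a query intersects those sets. `SketchInvertedIndex` and its methods are hypothetical names used only for this illustration.

```ruby
require 'set'

# Hypothetical in-memory inverted index: [slot, value] => set of document UUIDs.
class SketchInvertedIndex
  def initialize
    @lists = Hash.new { |hash, key| hash[key] = Set.new }
  end

  # Register a document's slots under its UUID.
  def insert(uuid, slots)
    slots.each { |key, value| @lists[[key.to_s, value]] << uuid }
  end

  # Return the UUIDs of documents matching every slot/value pair in the query.
  def find(query)
    matches = query.map { |key, value| @lists.fetch([key.to_s, value], Set.new) }
    return [] if matches.empty?
    matches.reduce(:&).to_a
  end
end

index = SketchInvertedIndex.new
index.insert('uuid-1', { '__meta__' => 'User', 'email' => 'someemail@gmail.com' })
index.insert('uuid-2', { '__meta__' => 'User', 'email' => 'other@example.com' })
p index.find({ :__meta__ => 'User', :email => 'someemail@gmail.com' }) # prints ["uuid-1"]
```

A lookup like this only handles exact matches on slot values, which fits the find-or-create pattern used in the example.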
What are we still missing?
A lot:
- Transactions (though we have some building blocks ready to build them)
- Replication (but again, we have building blocks for streaming replication already)
- Efficient indexes
- Nice API (time cures this disease!)
But hey, it was only two weeks of hacking — so stuff is definitely coming.
Questions? Ideas?
Join our mailing list