Traccia: » ALCA in Redis-land
Non sei ancora collegato! Inserisci il tuo nome utente e la tua password per autenticarti. E' necessario che il tuo browser abbia i cookie abilitati.
Short summary of an adventurous journey in the NOSQL world with Redis.
As every other journey, the ours too is made of legs, so let’s write them down before going on:
To make a long story short, Redis is a sort of key-value database on steroids. Why steroids? Simply because Redis is extremely fast (all data are loaded in memory and saved on disk, if desired) and feature rich (it supports various datatypes along with many complex operations on them).
An important peculiarity of Redis is the fact that it’s not a database in the common meaning of the word; it is a database in the sense that it stores and keeps data for you, but it doesn’t provide any SQL dialect to play with them (as relational databases do). Don’t be afraid, Redis is not a store-and-lose database, it just doesn’t provide our beloved SQL and all its amenities, but it has a robust and tested protocol that lets you interact with it.
In Redis you’ll not deal with tables, selects, joins, views; there’re no integer or varchar fields and so on. You have to play with a very low level dataset structure and with more primitive datatypes.
Let’s give a more deep look at how this odd database work. As seen in the previous leg, Redis is based on a key-value store paradigm and we’ll focus for the moment on its "key" part.
Keys are simply strings (officially not binary safe. UPDATE: now, key strings are binary safe.) such as "username" or "password". You can’t use space in keys, but alphanumeric values such as ".", ":", "_" (and so on) work fine, so keys like "user_name", "user:123:age", "user:123:username" are perfectly legal ones.
We must put a lot of attention to keys because, unlike field names in the RDBMS world, they are fully engaged in the game. As we’ll see in the following sub-legs, Redis doesn’t knows about tables, so simple tasks such as "SELECT username from users WHERE user_id=123;" have to be approached in a different way. As an example, to obtain in Redis the same result of the previous SQL query, we’ll have to get the value of the key "user:123:username", so, as you can see, keys could bring "information" (such as user ids) with them and play a very different role than the field names of an SQL table.
Now that things about keys should be more clear, let’s take a tour of the available datatypes.
Redis strings are the most basic available datatype; they are normal, binary-safe strings of maximum dimension of 1 Gb.
You can set a key to a string value using the SET command and get it using GET. If you need to store a numeric value, such as counter, you can safely save it on a string and use INCR or DECR to increment or decrement it.
Redis lists are collections of Redis strings sorted by insertion order. You can think of a list as a chain, you can add a new link on the left extremum (the head) or on the right (the tail); you can also add a link in the center, but you have to broke another link before.
You can add elements to a list using the LPUSH and RPUSH commands (L for left and R for right), get an element (removing it) with LPOP and RPOP and get a range of elements (without removing any) with LRANGE. You can also add an element on a given point with LSET, but, as for the chain example, this operation is much more slower than a simple push.
Redis sets are just like the homonymous mathematical structure, collection of distinct objects; in this particular case the objects are Redis strings. As you could argue, sets are different from lists for two main reasons: elements are unordered and they are all distinct, you can’t have two equal elements.
You can add an element to a set with the command SADD, remove an element with SREM, get (and remove) a random element with SPOP and perform the set operations of union, intersection and difference with SUNION, SINTER and SDIFF respectively.
Redis ordered sets are quite like normal sets, with the exception that every set element has an associated weight used to take this element in order with others.
You can work with sorted sets as with normal ones, what changes is just the names of the commands: ZADD to add an element, ZREM to remove it and so on. Two different commands are ZINCR and ZSCORE, the former is used to increment the score of an element and the latter to get its score.
So, working with Redis data is really different from what we’re used to with SQL tables, you haven’t a language to query the server, you just have some commands to manipulate the keys stored in the database. Commands are datatype-sensitive, you can’t use a set command on a list, otherwise you’ll get an error. Commands could be issued via the redis-cli or using one of the many wrappers for your preferred programming language. In the following, we’ll be agnostic about the way you’ll query Redis, we’ll just write the commands and you’ll issue them as you prefer.
Let’s think about a simple SQL table where we could save users for some application:
We want now to save the same data in Redis, so we have to design our database to fit this situation. Maybe, it’s better to think about the problem from an application-centric point of view; in the SQL version, our application would get users’ data issuing a SELECT statement having as clause the users’ id; in other words, we need a way to distinguish between different data entries. Additionally, we need to get user’s data with the sole knowledge of an unique identifier. This situation could be solved solved if we use the user id as a part of the redis key, so the records of the previous table would translate as
Well, given the id of an user, we could now get all its data reading the keys user:id:username, user:id:password, user:id:name and user:id:surname.
Our data are suitable for a login process, so we have to provide a way to gets the user’s id, given the username; in other words, we have to establish a relation between usernames and user ids. This can be done if whe add another redis key to our data design: user:username:id.
So, if Mario Rossi will try to login to our application, we can get his id knowing its username and so we’ll get all its data.
Another problem is the uniqueness of our ids. In the SQL world we say, for example, "id int primary key auto_increment", now we have to implement a similar behavior to ensure that we have a different id for each user. As in the previous case, Redis has a solution: we can simply create another key "user:next_id" and use it as a counter that we manipulate via the INCR command whenever we add a new user.
The next problem is to get the user list. We could think that what we’ve already done is enough to get this list: we could get the current value "user:next_id" and get users’ data from 0 to this value in one or more steps. Well, now let’s think about an user that has cancelled it’s subscription (our next problem), when we’ll traverse all the integers from 0 to user:next_id we’ll find its old id that now should not has any data.
Althought this is not a serious problem, we wouldn’t waste our time trying to get the data of users that don’t exist anymore, so we’ll create a new key "user:list" of list (or set) type in wich we’ll add all new user ids, removing them when needed. We prefer using a list because this give us the possibility to do a sort of "pagination" (maybe you’re thinking about "LIMIT"?) using the LRANGE command.
The last problem is a matter of "data integrity"; what would happen when we delete a user? We sould remove anything that references to it. In other words, we should delete all the keys user:id:*, user:username:id and the id in "user:list".
As an example of what we learned in this journey, let’s try to design an application that will work as a virtual library that let us group our books by topics. This example will be slightly more complex than the user’s table, but we’ll teach us how to handle relations with Redis.
In our application, we should be able to collect our books, storing informations such as title, author(s), topic(s), pages, price, ISBN and a description. Obviously, we could have a book with more than one author and that cover more than one topic at the same time (for example, a book that is about programming ad ruby). Additionally, one author could have written more than one book and a topic could be dealed with in more than one book. From what we’ve written it arises clearly that there’re a many-to-many relation between authors and books and between topics and books.
First of all, let’s try to design the SQL tables (with some data) that model this situation, in order to see how works the translation in the Redis idiom:
|1||Programming Ruby: The Pragmatic Programmers’ Guide||829||$26||0974514055||The bible of the ruby programming language|
|2||Erlang Programming||496||$42||0596518188||An interesting introduction to erlang|
|1||programming||Books about programming|
|2||ruby||Books about ruby|
|3||erlang||Books about erlang|
It should be already clear that the trhee tables Books, Authors and Topics are not a problem; we’ve already learned how to save this data in Redis in the previous leg (see the next image). The problem arises when considering the link tables Books-Authors and Books-Topics that implement the two many-to-many relations. How to implement them? Let’s think about topics only, the problem for authors is merely a twin (with a different name) of the former and we’ll use the same strategy.
For every book we have to know the topics that belongs to it and, in the same way, for every topic we have to know every book that deals with it. In other words, for each book we need a collection of the related topic ids and for each topic a collection of the related book ids. This is a problem where sets fit perfectly! We’re going to create two sets, book:id:topic and topic:id:books, in the former we’ll save the topic ids that match the book and in the latter we’ll save the ids of the books that match this topic. For example, referring to the data shown in the previous sub-leg, the book "Programming Erlang" (book id 2) will have a key book:2:topics of set type, with elements (1, 3); while the topic programming will have a set topic:1:books with elements (1, 2).
As said early, the same works for authors, so we could say that a SQL’s many-to-many relation translates in Redis as two sets. This is really useful, because it give us the ability to obtain an important feature for free: we could easily find a book that covers more than one topic doing a set intersection between all topic:id:books sets related to the topics we are interested on. So, for example, the intersection between topic:1:books (books about programming) and topic:2:books (books about ruby) will return the set (1), i.e. only the book "Programming Ruby" with id 1.
A special care should be used whenever we remove something from our database. As shown in the previous figure, for example, topics have a reference to books and books have a reference to topics, so, what to remove? Well, surely we should remove every book:id:* keys, but before doing this we have to traverse all topics to remove the book id from the set topic:id:books. Additionally, we should remove the book id from the book:list list. If we want to remove a topic, we have to procede in a very similar way: before removing each topic:id:* key we have to traverse all books referenced by the topic we want to delete and remove the topic id from the set book:id:topics. The same apply for authors.
We like fun and programming, so to lear by doing consider taking a look to the companion code of our virtual library. It is implemented in ruby, using ezmobius’ redis-rb. Both the code and the look’n’feel are very rough, so feel free to make a pull request if you have improvements.
Our journey is reaching its end, we’re quite back home and so now it’s time to pack our luggages and buy some souvenirs.
In the luggage we have to put all we’ve learned about Redis: datatypes, commands, oddities and so on.
The souvenirs are the three patterns we’ve learned:
Our journey is completed and now we’re definitely at home; we just hope to don’t suffer of post-holidays stress, but in any case we’ve the right cure: having fun coding free software!
Domenico Delle Side (domenico.delleside AT alcacoop.it), AlcaCoop (info AT alcacoop.it)