Redis: Storing time-series data in a sorted set

Along with a regular Set data type, Redis also has a Sorted Set data type. Like a regular set, you can add unique items to the set using the ZADD and fetch them back again using various other commands.

The main difference with a sorted set is that when you add an item, you must also provide a “score” which is used to determine the item’s order in the set. The score can be any integer, and can be completely unrelated to both the key and value being stored.

It is important to note that a sorted set is still a set! If you add identical values multiple times, the item will only appear once using the last score assigned. For this reason, you can’t store arbitrary data in a set unless you can guarantee each value is unique (i.e., it is either a unique identifier or is a data blob containing a unique identifier).

In my case, I wanted to store time-series data in a sorted set, but, since I can be pretty sure that the values won’t be unique, I can’t use a plain sorted set (i.e., if the value on Monday is 10, and the value on Wednesday is 10, then the Monday value gets clobbered since a set only stores unique values). However, I did figure out a handy work-around using Redis’s built-in scripting capabilities.

Why use a sorted set?

The very handy thing about sorted sets is that you can use a single command to fetch all the values within a certain time frame. Using ZADD, you store a given value in a key with a score which represents its sort order. For time-series data, the key can be anything you like, the score should be the time of the event (as an integer), and the value should contain the data.

Once the data is loaded, the ZRANGEBYSCORE command can fetch a sorted list of values whose scores fall within a certain range.

But what about uniqueness?

The problem is that a sorted set is still a set, and it won’t sort non-unique values (even if they have different scores). Let’s take an example. Suppose you’re measure the temperature outside your home each day. Let’s further suppose that on Monday, the temperature was 52°. Your command would be:

ZADD temp:house 1421308800000 52

On Tuesday, perhaps it was a bit warmer:

ZADD temp:house 1421395200000 65

But, on Wednesday, it got colder again:

ZADD temp:house 1421481600000 52

Now, we’ve got problems. Since uniqueness only pertains to the value, regardless of the score, you just changed the timestamp of 52° to Wednesday, thus deleting the data recorded on Monday.

How to create uniqueness

To work around this issue of uniqueness, I decided to add the timestamp to the data as well as the score. One could certainly do this on the client side (e.g., by encoding both temperature and timestamp in JSON), but I wanted to be able to access that data from within Redis’s scripting environment for other reasons.

It turns out, the Redis scripting language was also the key to encoding the data as I needed. To start, I used the LOAD command to store a LUA script:

local value = cmsgpack.pack({ARGV[2], ARGV[1]})
redis.call('zadd', KEYS[1], ARGV[1], value)
return redis.status_reply('ok')

The result of the command is to load the script and return an identifier you can later use to call it (which happens to be the SHA of the script). This uses the cmsgpack built-in library to create a very efficient binary-encoded representation of the two pieces of data.

Now, instead of just using ZADD to store my values, I invoke my script with the EVALSHA command:

EVALSHA <sha> 1 temp:house 1421481600000 52

Now, instead of storing 52° as my value, I’ve got a binary record combines 52 and 1421481600000, and therefore is unique to that moment in time.

Getting the data back again

In my case, I wanted to use the data by performing calculations on it in another LUA script. Without getting into too much detail, I basically just used the LOAD and EVALSHA commands to execute a LUA script containing:

for k, v in ipairs(redis.call('zrangebyscore', KEYS[1], ARGV[1], ARGV[2])) do
    local value = tonumber(cmsgpack.unpack(v)[1])

In this example, ARGV represents the arguments passed to the EVALSHA command, KEYS is the list of keys passed to the command, redis.call is a function used to run regular Redis command from within the LUA script, and cmsgpack.unpack takes the value and returns an array of the fields it contains.

From there, my LUA script goes to work!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s