Streams in Node.js, part 2: Object Streams

In my first post on streams, I discussed the Readable, Writable and Transform classes and how you override them to create your own sources, sinks and filters.

However, where Node.js streams diverge from more classical models (e.g., pipes in the shell) is in object streams.  Each of the three types of stream can work with objects (instead of buffers of bytes) if you set the objectMode option to true in the options argument passed to the parent class constructor.  From that point on, the stream deals in individual objects (instead of groups of bytes) as the medium of the stream.

This has a few direct consequences:

  1. Readable subclasses are expected to call push once per object, and each pushed argument is treated as a new element in the stream.
  2. Writable subclasses receive a single object at a time as the first argument to their _write method, and the method is called once for each object in the stream.
  3. Transform subclasses see both of these changes, since they act as both a Readable and a Writable (see the sketch below).
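
To make this concrete, here’s a minimal sketch of a Readable running in object mode.  The ObjectSource class and the idea of feeding it an array are purely illustrative:

{Readable} = require 'stream'

class ObjectSource extends Readable
    constructor: (items)->
        super objectMode: true      # tell the parent class we're dealing in objects
        @items = items.slice()
    _read: ->
        if @items.length > 0
            @push @items.shift()    # each push adds exactly one object to the stream
        else
            @push null              # null still signals the end of the stream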

 

Application: Tax Calculations

At first glance, it may not be obvious why object streams are so useful.  Let me provide a few examples to show why.  For the first example, consider performing tax calculations for a meal in a restaurant.  There are a number of different steps, and the outcome for each step often depends upon the results of another.  The whole thing can get very complex.  Object streams can be used to break things down into manageable pieces.

Let’s simplify a bit and say the steps are:

  1. Apply item-level discounts (e.g., mark the price of a free dessert as $0)
  2. Compute the tax for each item
  3. Compute the subtotal by summing the price of each item
  4. Compute the tax total by summing the tax of each item
  5. Apply check-level discounts (e.g., a 10% discount for poor service)
  6. Add any automatic gratuity for a large party
  7. Compute the grand total by summing the subtotal, tax total, and auto-gratuity

Of course, bear in mind that I’m actually leaving out a lot of detail and subtlety here, but I’m sure you get the idea.

You could, of course, write all this in a single big function, but that would be some pretty complicated code, easy to get wrong, and hard to test.  Instead, let’s consider how you might do the same thing with object streams.

First, let’s say we have a Readable which knows how to read orders from a database.  Its constructor is given a connection object of some kind and the order ID.  The _read method, of course, uses these to build an object which represents the order in memory.  This object is then given as an argument to the push method.

Next, let’s say each of the calculation steps above is separated into its own Transform.  Each one will receive the object created by the Readable, and will modify it by adding on the extra data it’s responsible for.  So, for example, the second transform might look for an items array on the object, and then loop through it, adding a taxTotal property with the appropriate computed value to each item.  It would then call its own push method, passing along the primary object for the next Transform.
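
Here’s a hedged sketch of what that second step might look like.  The ItemTaxCalculator name, the flat taxRate option, and the order/item property names are all assumptions for the sake of illustration, not a real API:

{Transform} = require 'stream'

class ItemTaxCalculator extends Transform
    constructor: (options = {})->
        super objectMode: true
        @taxRate = options.taxRate ? 0.10       # hypothetical flat tax rate
    _transform: (order, encoding, callback)->
        # Annotate each line item in place; rounding rules are omitted here.
        for item in (order.items ? [])
            item.taxTotal = item.price * @taxRate
        @push order                             # hand the same order object to the next step
        callback()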

After having passed from one Transform to the next, the order object created by the Readable would wind up with all the proper computations having been tacked on, piece-by-piece, by each object.  Finally, the object would be passed to a Writable subclass which would store all the new data back into the database.
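
Wiring the whole thing up then becomes a simple chain of pipe calls.  All of the class names below (and the db and orderId variables) are hypothetical stand-ins for the pieces described above:

new OrderReader(db, orderId)                    # Readable: loads the order from the database
    .pipe new ItemDiscountCalculator()          # 1. item-level discounts
    .pipe new ItemTaxCalculator()               # 2. per-item tax
    .pipe new SubtotalCalculator()              # 3. subtotal
    .pipe new TaxTotalCalculator()              # 4. tax total
    .pipe new CheckDiscountCalculator()         # 5. check-level discounts
    .pipe new AutoGratuityCalculator()          # 6. automatic gratuity
    .pipe new GrandTotalCalculator()            # 7. grand total
    .pipe new OrderWriter(db)                   # Writable: stores the results back again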

Now that each step is nicely isolated with a very clear and simple interface (i.e., pass an object, get one back), it’s very easy to test each part of the calculation in isolation, or to add in new steps as needed.

Streams in Node.js, part 1: Basic concepts

When I started with Node.js, I came to it with the context of a lot of different programming environments, from Objective C to C# to Bash.  Each of these has a notion of processing large data sets by operating on little bits at a time, and I expected to find something similar in Node.  However, given Node’s way of embracing the asynchronous, I expected it to be something quite different.

What I found was actually more straightforward than I’d expected.  In a typical stream metaphor, you have sources which produce data, filters which modify data, and sinks which consume data.  In Node.js, these are represented by three classes from the stream module: Readable, Transform and Writable.  Each of them is very simple to override to create your own, and the result is a very nicely factored set of classes.

Overriding Readable

As the “source” part of the stream metaphor, Readable subclasses are expected to provide data.  Any Readable can have data pushed into it manually by calling the push method.  The addition of new data immediately triggers the appropriate events, which make the data trickle downstream to any listeners.

When making your own Readable, you override the pseudo-hidden _read(size) method.  This is called by the machinery of the stream module whenever it determines that more data is needed from your class.  You then do whatever you have to do to get the data, and end by calling the push method to make it available to the underlying stream machinery.

You don’t have to worry about pushing too much data (multiple calls to push are handled gracefully), and when you’re done, you just push null to end the stream.

Here’s a simple Readable (in CoffeeScript) which returns words from a given sentence:

{Readable} = require 'stream'

class Source extends Readable
    constructor: (sentence)->
        super()
        @words = sentence.split ' '
        @index = 0

    _read: ->
        if @index < @words.length
            @push @words[@index]    # push the next word downstream
            @index++
        else
            @push null              # pushing null signals the end of the stream

Overriding Writable

The Writable provides the “sink” part of the stream metaphor.  To create one of your own, you only need to override the _write(chunk, encoding, callback) method.  The chunk argument is the data itself (typically a Buffer with some bytes in it).  The encoding argument tells you the encoding of the bytes in the chunk argument if it was translated from a String.  Finally, you are expected to call callback when you’re finished (with an error if something went wrong).

Overriding Writable is about as easy as it gets.  Your _write method will be called whenever new data arrives, and you just need to deal with it as you like.  The only slight complexity is that, depending upon how you set up the stream, you may get a Buffer, a String, or a plain JavaScript object, and you may need to be ready to deal with multiple input types.  Here’s a simple example which accepts any type of data and writes it to the console:

{Writable} = require 'stream'

class Sink extends Writable

    _write: (chunk, encoding, callback)->
        # Normalize whatever arrives (Buffer, String, or object) into text
        if Buffer.isBuffer chunk
            text = chunk.toString()     # Buffers arrive with encoding set to 'buffer', so decode as UTF-8
        else if typeof(chunk) is 'string'
            text = chunk
        else
            text = chunk.toString()

        console.log text
        callback()

Overriding Transform

A Transform fits between a source and a sink, and allows you to transform the data in any way you like.  For example, you might have a stream of binary data flowing through a Transform which compresses the data, or you might have a text stream flowing through a Transform which capitalizes all the letters.

Transforms don’t actually have to output data each time they receive data, however.  So, you could have a Transform which breaks up an incoming binary stream into lines of text by buffering raw data until a full line has been received, and only at that point emitting the string as a result.  In fact, you could even have a Transform which merely counts the lines, and only emits a single integer when the end of the stream is reached.

Fortunately, creating your own Transform is nearly the same as writing a class which implements both Readable and Writable.  However, in this case instead of overriding the _write(chunk, encoding, callback) method, you override the _transform(chunk, encoding, callback) method.  And, instead of overriding the _read method to gather data in preparation for calling push, you simply call push from within your _transform method.

Here’s a small example of a transform which capitalizes letters:

{Transform} = require 'stream'

class Capitalizer extends Transform
    _transform: (chunk, encoding, callback)->
        text = chunk.toString()     # decode the incoming Buffer (or pass a string straight through)
        text = text.toUpperCase()
        @push text
        callback()
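
And, just to show how the three fit together, here’s a minimal sketch which pipes the Source from above through the Capitalizer and into the Sink:

source = new Source "the quick brown fox"
capitalizer = new Capitalizer()
sink = new Sink()

# pipe returns its destination, so the chain reads left to right
source.pipe(capitalizer).pipe(sink)     # prints the capitalized words (run together, since nothing re-adds the spaces)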

✧✧✧

All this is very interesting, but hardly unique to the Node.js platform. Where things get really interesting is when you start dealing with Object streams. I’ll talk more about those in a future post.

Gotchas: Searching across Redis keys

A teammate introduced me to Redis at a prior job, and ever since, I’ve been impressed with it. For those not familiar with it, it’s an in-memory NoSQL database which stores a variety of data types, from simple strings all the way to sorted sets and hashes, and which nevertheless has a solid replication and back-up story.

In any case, as you walk through the tutorial, you notice the convention is for the segments of a key to be separated by colons, for example: foo:bar:baz. You also see that nearly every command expects to be given a key to work with. If you want to set and then retrieve a value, you might use:

> SET foo:bar:baz "some value to store"
OK
> GET foo:bar:baz
"some value to store"

Great. At this point, you might want to fetch or delete all of the values pertaining to the whole “bar” sub-tree. Perhaps, you’d expect it to work something like this:

> GET foo:bar:*
> DEL foo:bar:*

Well… too bad; it doesn’t work like that. The only command which accepts a wildcard is the KEYS command, and it only returns the keys which match the given pattern: not the data. Without getting into too much detail, there are legitimate performance reasons for this limitation, but it was still something of a surprise to find out.
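
What you can do is fetch the matching keys first (with KEYS, or the cursor-based SCAN command if you’re worried about blocking a busy server), and then pass those keys to GET, MGET, or DEL yourself. For example (foo:bar:qux is just a hypothetical second key):

> KEYS foo:bar:*
1) "foo:bar:baz"
2) "foo:bar:qux"
> DEL foo:bar:baz foo:bar:qux
(integer) 2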

However, all is not lost. Redis does support a Hash data structure which lets you store multiple fields under a single key and access some or all of them at once. Along with this data structure come commands to manipulate individual fields as well as the entire hash.
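
For example, if the values in the “bar” sub-tree were stored as fields of a single hash rather than as separate keys, you could read or delete them individually or as a group (the field names here are purely illustrative):

> HSET foo:bar baz "some value to store"
(integer) 1
> HSET foo:bar qux "another value"
(integer) 1
> HGETALL foo:bar
1) "baz"
2) "some value to store"
3) "qux"
4) "another value"
> DEL foo:bar
(integer) 1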

Avoiding Us vs. Them in Problem Solving

I can’t count the number of times when I’ve seen two people trying to solve a technical problem where the real conflict is anything but technical.  Often, this starts when one person brings an idea to the table, and is trying to promote it to the group.  Then, someone else proposes a different idea.  Each goes back and forth trying to shoot down the other person’s idea and promote their own.  Perhaps there was a pre-existing rivalry, or some political maneuvering, or private agenda.  Whatever.  Sides form, and before long, the debate isn’t really about a technical solution anymore… it’s about who’s going to “win”.

Of course, no one “wins” such a contest.  It’s stressful to be part of.  It’s stressful to watch. And, worst of all, it kills the collaboration which could have produced a better solution than either “side” was proposing.  Fortunately, I’ve run across an excellent way to avoid the problem nearly 100% of the time.

The Technique in Theory

The key is to get everyone involved in the process to bring multiple ideas to the table.  This seems too simplistic to possibly work, but it does.  It could be that they come with these ideas ahead of time, or that they come up with them in a meeting.  That part doesn’t matter.  What matters is that each person comes up with more than one.  The reasons this works have a lot to do with how it forces people to approach the whole brainstorming / problem-solving process.

The Opened Mind

The first thing that happens if you insist on each person bringing multiple solutions is that it opens up each person’s mind to the possibility of there being multiple solutions.  If you, yourself, brought three different approaches to the table, it’s very hard to imagine there’s only one way to do it.  And, if you’ve already formed favorites among your own ideas, you’ve probably started to develop some criteria / principles to apply to making that judgement.  At that point, it becomes a lot easier to weigh the pros & cons of another idea you didn’t originate, and fit it into the proper place among the rest of the ideas.

Breaking Up False Dichotomies

The second useful trait of this approach is that the decision is no longer an either-or decision (a “dichotomy”).  Instead, you, yourself, have already thought up a few different approaches, and your thought partners probably came with a bunch of their own.  With that many potential solutions on the table (even if some of them are obviously very bad), it becomes a lot easier to see the complexities of the problem, and to see that there is a whole range of ways to solve it to a better or worse degree: or even to solve parts of it, to a better or worse degree, in a whole variety of combinations.

Cross-Breeding Solutions

Another handy aspect of having a lot of potential solutions, and seeing the many different aspects of the problem they tackle, is being able to mix and match.  Perhaps you grab a chunk of your solution, throw in a little bit of someone else’s, and then add in a piece which neither of you thought of ahead of time.  And again: but with a different mix.  And again… and again.  Before long, you’ve all moved past the original set of ideas, and have generated a whole bunch of new ones which are each better than anything in the original set.  At this point, the question of “us vs. them” is impossible to even identify clearly.  And… you’ve got a much better solution than anyone generated alone.

Goodwill Deposits

In my last post about “goodwill accounting”, I talked about how fun experiences strengthen relationships.  The process of brainstorming with a partner who isn’t defensive, and who is eager to help, is extremely exciting and fun.  This makes substantial “deposits” for the future.

 

The Technique in Action

In practice, the technique is dead simple.  Guide your problem solving with three questions:

  • How many ways could we do this?
  • What does a good solution look like?
  • How do these potential solutions stack up?

 

When I find myself in a brainstorming / problem solving context, I’ll often start out by literally saying: “Okay… so how many ways could we do this?”  Then, I’ll start to rattle off, as quickly as I can, every different solution I can imagine: even really obviously bad ones.  Sometimes, I’ve come in with a few I already thought of, but very often not.  Others soon join in, tossing out more ideas.  I keep pushing everyone (especially myself) until everyone has given 2–3 solutions.  At this point, we really do want all the possible ways: even the ludicrous or impossible ones.

Once everyone is mute & contemplative, I’ll echo back all the ideas.  At this point, the order is so jumbled, it’s impossible to tell who suggested which.  Then I’ll ask: “So, what does a good solution look like?”  Now that we’ve had a chance to see some proposed solutions, it’s a lot easier to figure out what we like about some and not others.  This rapidly crystallizes into a set of criteria we can use to rank the various options.

At this point, I start pointing out how various solutions match up with our criteria.  Some get crossed off right away, and others take some discussion and even research.  We might even loop back to re-consider our criteria.  The major difference is that everyone is focused on the problem, and not on the personalities or politics.

✧✧✧

I use this technique a lot.  Early in my career, I was (embarrassingly) often one of the pig-headed people at the table insisting on getting my own way.  Now, I jump in with my “how many ways…” question, and I’m almost never even tempted.  My own relationships with my teammates are better, and I’m often able to use this technique to help smooth over difficulties between other members of my team.  I find it invaluable, and I think you will, too.

Goodwill Accounting on Teams

A while ago, I was working at a very early stage start-up with two friends.  It was just the three of us, but both of them were in San Francisco, while I was living in Seattle.   Every month or so, I’d travel down to stay in San Francisco for a few days so that we could catch up and spend some real-world time together.  It was during this time that I noticed the phenomenon I call “goodwill accounting”.

Naturally, starting a company is stressful for anyone, and we were no different.  The odd thing I noticed was that the longer it had been since I was last down in SF, the more irritated I’d get with my teammates over little things.  Then, when I would go down to SF, I’d feel a lot closer to my friends again, and it would be a lot easier to deal with the myriad little disagreements you naturally have to sort out when building a company.

What I noticed was that it was really the time outside of work which made the improvement, and it started to occur to me that those times were filled with good feelings: often in strong contrast to the stress of the work.  We’d go out to eat together, chat about things we share outside of work, laugh, joke around, get out and enjoy the city… all sorts of things.  And those events would fill up a kind of emotional reservoir of friendship and trust between us.  Or, to put it another way, we’d make a “deposit” in a “goodwill account”.

Then, when we got back to work, the natural rigors of the work would start to “withdraw” from that account.  Each disagreement, though perfectly natural and necessary for the business, would count as a “withdrawal”.  So long as we kept making “deposits”, things would generally be fine, and we could sustain the stress of building a business together.  It was only when we let the account drain off that our “checks” would start to “bounce”, and we’d start to have unpleasant disagreements and ill-feeling.

As things went along, I started actively thinking about my relationships with my two friends along these lines quite explicitly, and I’d deliberately arrange to make “deposits” in the accounts I had with each of them as much as possible.  Sadly, sometimes the rigors of founding a company would still result in “bounced checks”, but it helped a lot to have this metaphor for understanding what was going on and a way to correct the situation.

✧✧✧

The lesson here is to keep in mind your “goodwill account” with the people around you.  This is part of the one-on-one relationship you have with each person you spend time with (whether a colleague, spouse, child, or friend).  Any time you do fun things, share something special, or enjoy time spent together, you make “deposits”.  Any time you disagree, get on each other’s nerves, or cause one another bad feelings, you make a “withdrawal”.  For some relationships, it’s easy to keep the books balanced, but for other relationships (especially work relationships), you have to be more deliberate.  When you notice a relationship starting to go south, consider whether you’re “in the red” with that person, and what you can do to add something back into the account.

Recipes: Chicken Tikka Masala

One of my other hobbies is cooking, and I’ve been collecting recipes for a while now.  As I try new recipes, I re-write the ones I like into my own online cookbook.  Naturally, being a software engineer, I created a little website to keep them organized.  Of course, having the source code on my hard drive, I was just one command away from having a version-controlled repository using git.  From there, it was an easy step to publish it to GitHub, and from there to activate the GitHub Pages feature to make it a public website.

Besides being intensely geeky, this has actually been super helpful!  I’ve found myself at a friend’s house where we decided we wanted to cook something together, or I’ve wanted to share a recipe with someone.  This makes it super easy to do so.  So… I’m sharing my latest addition with you!

http://andrewminer.github.io/recipes/chicken-tikka-masala

I didn’t actually make up the recipe (you can see the source at the bottom), but I have adjusted it a bit to suit my own taste, and I re-arranged it to fit into my cookbook’s design.  Enjoy!

Building a Board & Weekly Progress Reports

In starting out a self-funded company, I didn’t automatically get a board by virtue of taking on an investor, but I still found that I really wanted a way to get feedback and advice from more experienced people I know and trust.

To that end, I started asking friends and colleagues if they’d be willing to receive a weekly progress report from me, and occasionally field a few questions. Having worked with a number of generous and savvy people, I quickly got about two dozen people on my list.

Starting from my first week “on the job”, I sent an email to this group roughly¹ once a week detailing how things were going.  This email has three broad sections:

  • A narrative description of the big issue(s) I dealt with that week. This might be describing some engineering issue, a new feature, or how some recent customer interviews went.  If applicable, it contained pictures and/or videos of recent progress.
  • A bulleted list in three sections: what I accomplished, what I’m still working on, and what’s next
  • An evaluation of my own emotional state (optimistic? disheartened? frustrated? elated?) and a brief outline of why and how I got that way.

Including the middle section was especially helpful, as it forced me to get very clear about past/present/future each week.  It also gave me an impetus to really complete things so I could include them in the accomplishments section, which gave me a sense of accountability.

When I sent these emails out, I almost always got back a few responses, which ranged from actual solutions to issues I was having, to simple words of encouragement (which were still much appreciated!).


¹ except when nothing much had happened, and I was just working through the same plan as the previous week

Redis: Storing time-series data in a sorted set

Along with a regular Set data type, Redis also has a Sorted Set data type. Like a regular set, you can add unique items to the set using the ZADD command and fetch them back again using various other commands.

The main difference with a sorted set is that when you add an item, you must also provide a “score” which is used to determine the item’s order in the set. The score can be any number (Redis stores it as a double-precision float), and it can be completely unrelated to both the key and the value being stored.

It is important to note that a sorted set is still a set! If you add identical values multiple times, the item will only appear once, with the last score assigned. For this reason, you can’t store arbitrary data in a sorted set unless you can guarantee each value is unique (i.e., it is either a unique identifier or is a data blob containing a unique identifier).

In my case, I wanted to store time-series data in a sorted set, but, since I can be pretty sure that the values won’t be unique, I can’t use a plain sorted set (i.e., if the value on Monday is 10, and the value on Wednesday is 10, then the Monday value gets clobbered since a set only stores unique values). However, I did figure out a handy work-around using Redis’s built-in scripting capabilities.

Why use a sorted set?

The very handy thing about sorted sets is that you can use a single command to fetch all the values within a certain time frame. Using ZADD, you store a given value in a key with a score which represents its sort order. For time-series data, the key can be anything you like, the score should be the time of the event (as an integer), and the value should contain the data.

Once the data is loaded, the ZRANGEBYSCORE command can fetch a sorted list of values whose scores fall within a certain range.
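
For example, using the temp:house key from the example below, fetching every reading recorded between two timestamps looks like this (adding WITHSCORES to the end would also return the timestamps alongside the values):

ZRANGEBYSCORE temp:house 1421308800000 1421481600000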

But what about uniqueness?

The problem is that a sorted set is still a set, and it won’t store duplicate values (even if they’re added with different scores). Let’s take an example. Suppose you’re measuring the temperature outside your home each day. Let’s further suppose that on Monday, the temperature was 52°. Your command would be:

ZADD temp:house 1421308800000 52

On Tuesday, perhaps it was a bit warmer:

ZADD temp:house 1421395200000 65

But, on Wednesday, it got colder again:

ZADD temp:house 1421481600000 52

Now, we’ve got problems. Since uniqueness only pertains to the value, regardless of the score, you just changed the timestamp of 52° to Wednesday, thus deleting the data recorded on Monday.

How to create uniqueness

To work around this issue of uniqueness, I decided to add the timestamp to the data as well as the score. One could certainly do this on the client side (e.g., by encoding both temperature and timestamp in JSON), but I wanted to be able to access that data from within Redis’s scripting environment for other reasons.

It turns out, the Redis scripting language was also the key to encoding the data as I needed. To start, I used the SCRIPT LOAD command to store a Lua script:

-- ARGV[1] is the timestamp (used as the score); ARGV[2] is the measurement itself.
-- Packing the two together makes each member of the sorted set unique.
local value = cmsgpack.pack({ARGV[2], ARGV[1]})
redis.call('zadd', KEYS[1], ARGV[1], value)
return redis.status_reply('ok')

The result of the SCRIPT LOAD command is an identifier you can later use to call the script (which happens to be the SHA1 hash of the script). The script itself uses the built-in cmsgpack library to create a very compact binary-encoded representation of the two pieces of data.

Now, instead of just using ZADD to store my values, I invoke my script with the EVALSHA command:

EVALSHA <sha> 1 temp:house 1421481600000 52

Now, instead of storing 52° as my value, I’ve got a binary record which combines 52 and 1421481600000, and which is therefore unique to that moment in time.

Getting the data back again

In my case, I wanted to use the data by performing calculations on it in another Lua script. Without getting into too much detail, I basically just used the SCRIPT LOAD and EVALSHA commands to execute a Lua script containing:

for k, v in ipairs(redis.call('zrangebyscore', KEYS[1], ARGV[1], ARGV[2])) do
    local value = tonumber(cmsgpack.unpack(v)[1])
    -- ...the rest of the calculation works on each unpacked value...
end

In this example, ARGV represents the arguments passed to the EVALSHA command, KEYS is the list of keys passed to the command, redis.call is a function used to run regular Redis commands from within the Lua script, and cmsgpack.unpack takes the value and returns an array of the fields it contains.

From there, my Lua script goes to work!

Gathering early-stage product feedback

Early in the process of figuring out my product, I conducted a series of customer interviews. I started the business to build a product I personally wanted, so I had a pretty good idea about how I would use it, but I really wanted to know whether my experience was common with anyone else.

To that end, I did some brainstorming on the types of people who would be at all likely to use my product (not necessarily just those I was aiming for to start with), and then went through my LinkedIn connections for colleagues who fit into those categories. From an initial list of close to 30 people, I narrowed it down to six who gave me a widely diverse group in terms of professional training, day-to-day responsibilities, and industry.

Next, I wrote down a set of questions I wanted to ask each person. The overall structure was to start by asking questions to see if the person felt the pain my product attempts to solve, proceed into questions which try to discover how they solve it today, and finally to find out how well they like that solution.

At this point, I stopped with my questions and gave the pitch for my product. At the time, I didn’t have a live demo or even mock-ups, but I definitely would have used them if I’d had them. I let the interviewee ask as many questions as they liked, and then proceeded to the next batch of questions:

  • Does my product sound like something they could use?
  • What would be the biggest obstacle to adopting it?
  • What could I do to make adoption as easy as possible?
  • How much would they expect to have to pay for this product?

The information I received from these interviews was highly informative, and while it didn’t change my core vision for my product, it definitely changed a lot of my thinking about the details, and how to get it to my customers.

Building my own Workbench

When I first set up my wood-shop, I was using an old door thrown across two sawhorses as my work bench:

IMG_0829.JPG

It was wobbly, the door tended to slide off the horses, and you really couldn’t attach a vise (or even a clamp) to it to hold down a piece of work.  So, I decided I needed an upgrade.  After drawing up some plans, I got to work cutting out the initial stock for the legs, rails, and other components for the frame.

IMG_0800.JPG

Then came joining all the pieces together.  As a challenge for myself, I decided I wouldn’t use any metal fasteners anywhere in the whole table.  However, I’m still more-or-less a novice at woodworking, so I decided to go with a few simple joints, and use dowels as a surrogate for metal fasteners.

IMG_0803.JPG

For the table top, I picked some nice ¾” plywood I had from another project, and cut it to be just a little larger than the frame (to give myself somewhere to place clamps).  To fasten this to the frame, I cut holes at a 45° angle, and drove dowels in with some glue.  Since each set of dowels was driven in at 90° to the other, this prevents any movement of the surface in any direction.

IMG_0805.JPG

I also had some beautiful oak flooring left over from a different project, so I decided to put it to good use here.  The first step was just sorting through all the scraps to make sure I had enough pieces of sufficient size to actually cover the entire table.

IMG_0810.JPG

At this point, they’re only clamped into place to test the fit, so the next step was to actually glue everything down and trim the edges to match the plywood surface underneath.

IMG_0813.JPG

Next, I decided I wanted a super durable finish which would protect against dropped tools, heavy pieces of work, clamp marks, and the other abuses a workbench would suffer.  So, I did a bunch of research and decided to go with a liquid epoxy finish.

The stuff is fiendishly difficult to work with.  Following some advice I found on the internet, I allowed the first coat to drip over the sides of the table-top as I spread it out… big mistake.  Even with a drop-cloth, it made a huge mess, and was both slippery and sticky underfoot.  Even worse, I didn’t make enough in my first batch, so I wound up with lots of little “pockets” as it cooled and the surface tension pulled it away from those areas.

IMG_0818.JPG

It’s hard to actually see the effect from a distance (thankfully!), but here’s a closer view so you can really see what I’m talking about:

IMG_0819.JPG

For the second coat, I disregarded the earlier advice, and built up a little dam around the edges using some very sturdy duct tape.

IMG_0820.JPG

With the second coat on, things looked a lot better!  Still a few “pockets” and a few areas where the grain of the wood allowed air bubbles to continually be re-introduced to the epoxy, but these are very minor in the overall surface.

IMG_0823.JPG

At this point, all that remained was to apply some polyurethane to the frame.

IMG_0828.JPG

And, then, we have the final result.

IMG_0834.JPG