The Interview Schedule

This is part 3 of a series on interviewing.  Check here for part 1 on setting up an interviewing team.

✧✧✧

It’s important that every candidate be treated like you’d treat a VIP customer.  Even if you don’t wind up making an offer, they will have a deeper interaction with your company and your staff than the vast majority of the general public, and you want them to go away wishing they’d gotten an offer.  They are going to tell people about their experience with your company, and you want the story they tell to be about how awesome a place it seems to be, rather than how they dodged a bullet by not getting an offer!

There is a long series of interactions which happen prior to a candidate coming in for an interview loop, which I’ll talk about in other posts.  Here, I want to focus on the in-house interview.  The first step of that is to make sure the candidate has a full schedule of their visit.  This should be delivered to them along with all the initial travel arrangements.  Ideally, it should include:

  • an initial meet & greet / tour segment
  • a list of each person they’re going to meet along with their role and email
  • a schedule of when important events are happening throughout the day (e.g., each interview, lunch, happy hour, etc.)

The first part, the meet & greet, serves two important purposes.  First, it’s important to always bear in mind that most people find interviews extremely nerve-wracking.  A short tour of the office, and a chance to chit-chat with a few people helps the candidate unwind a bit.  Second, it gives you a time buffer to absorb any unexpected delays in the candidate’s travels.  Whether it’s traffic, parking, a flat tire, a late subway train… whatever.  It’s easy enough to just cut this period a bit short so the candidate can get started on time, and you can avoid messing up the rest of the day’s schedule.

The second point, giving a list of interviewers, deserves a bit more explanation.  For a mediocre candidate, this information won’t matter.  However, for an exceptional candidate, it’s an opportunity for them to show their enthusiasm and their diligence.  Really exceptional candidates will do some homework on their interviewers, and will often have some interesting question, anecdote, or topic to discuss with each interviewer.  Such candidates will also generally avail themselves of the opportunity to individually follow up with each interviewer to thank them for their time.

Finally, providing a schedule allows the candidate to mentally (and perhaps physically) prepare for the expected length and rigor of the full day.  I’ve had a number of candidates comment to me over the years on how unexpectedly rigorous or lengthy an interview was.  I’ve also had experiences as a candidate where I wasn’t able to make some other commitment because the full extent of my time commitment wasn’t clear.

✧✧✧

At the end of the day, you want your candidate to walk away thinking well of your company, the people they met, and how they were treated as a guest at your office.  One of the easiest things you can do to ensure that happens is to avoid surprising them and to give them a chance to do their homework up front.  And, you’ll be pleasantly surprised by your best candidates when they actually do.

Being bisexual in a straight marriage

I was 37 years old when I started to call myself bisexual.  In hindsight, it’s pretty clear I was always bisexual, but it took a very long time for me to put a concrete name to it.

At least early on, it had a lot to do with where and when I grew up.  Calling someone a “fag” was virtually a daily insult among my peers.  HIV/AIDS was still very recent, very scary, and very much a “gay” disease.  And, most importantly, none of the recent acceptance of homosexuality had even begun to surface in my world.  Life was super, super hard for gay people back then, and if you had those impulses you sure as hell didn’t act on them if you could manage not to.  Plus, I was definitely interested in women, so I couldn’t be gay, right?

The second reason it took so long is my wife, and for all the very best reasons.  She is the most extraordinary person I’ve ever met: of any gender.  She is the most joyful, nurturing, and loving person I’ve ever met.  She’s wonderfully smart and dedicated.  Most of all, she’s honest, calm, patient, and rational.  I cannot even imagine finding another person I’d be more completely content to share my life with.  It’s been over 20 years so far, and, happily, I see no change in sight.  And, of course, being a male who is thoroughly in love with a female means you’re straight, right?

Nope.

What does it mean to be bi?

I can only really answer for myself, but I think of it this way.  When my physical / emotional gut reaction considers whether a person is attractive or not, their gender simply doesn’t matter very much.  To me, seeing a person who is confident, cheerful, well-groomed, and reasonably fit immediately puts them in the “attractive” category.  If you and I were just sitting in a coffee shop people-watching together, I’d find more women attractive than men—just by the numbers—but I suspect that’s because women tend to look after their appearance more carefully.  Naturally, the exact physical traits I find attractive for each gender are different, but so long as the general traits I mentioned are present, the gender isn’t especially important.

When it comes to having sex, the same indifference to gender applies.  I find the prospect of being with a man and finding mutual pleasure as appealing, exciting, and arousing as being with a woman.  I also find the idea of being the passive partner in sex as appealing, exciting, and arousing as being the active one: regardless of the gender of my partner.  To use the term from gay circles, I just consider myself to be a little extra versatile.

What doesn’t it mean to be bi?

Most importantly, being bisexual does not mean that I want to have multiple sexual partners (i.e., that I’m polyamorous).  The interest / capability / potentiality of having a male partner is certainly there, and if I weren’t already in a relationship, I would be very much open to it.  However, I am extremely happy in my current relationship, and I have no desire to change it.

Being bisexual also does not mean that I don’t think of myself as male.  I definitely think of myself as male, and am not at all personally attracted by the idea of being transgender, transsexual, transvestite, etc.  To be clear, I don’t have a strong opinion on those things, and to be perfectly honest, I know very little about them.  I just know they don’t appeal to me, personally.

What difference does it make?

It… doesn’t?  I think?  I’m a man in a happy, life-long, monogamous relationship with a woman.  Therefore, it’s easy to say: “Are you even bisexual?  What difference does it make, anyway?”  Believe me, I’ve been asking myself those questions over and over for years now.  As I’ve pondered, I’ve arrived at a few important answers.

Definitely bisexual

Even though I’ve found joy in a traditionally straight relationship, the way I react sexually has always been a part of me, and is still an undeniable part of how I experience other people.  I very definitely find both men and women sexually attractive, and I definitely could be quite happy in a serious long-term relationship with either.

Know thyself

There’s a deep satisfaction in being truly honest with oneself, and in understanding the genesis of one’s emotions.  Understanding, accepting, and talking about being bisexual releases uncertainty, anxiety, and tension I didn’t even notice I’d been carrying my whole life.

Relating to others

Identifying myself as bisexual, and coming out to other people changes how those relationships work.  Fortunately for me, I haven’t yet had a negative reaction, but I also haven’t come out to very many people.  In those cases where I have come out to someone, the experience has generally deepened the connection and helped the other person be more sincere, honest, and open with me (especially with other LGBT folks).

Freedom to explore

The last big difference this has made to me is that it removes a whole set of inhibitions about sex, sexuality, and attraction to other people.  It’s a relief to feel like I can think, write, and talk about these experiences.  And, within the context of my existing relationship, I feel empowered to explore possibilities I wouldn’t have been open to before.

Why tell anyone about it?

Every single person I’ve talked to about this blog post has asked this question, so I want to try to explain why I don’t just do the easy thing and keep it to myself.

It has been over two years since I first “came out” to my wife, my son, my parents, and—most importantly—to myself.  Since then, I’ve been slowly coming out to family and close friends.  It’s been an astonishing sense of relief.  My whole life, I’ve known this truth about myself was there, but I’ve pushed it away.  At every hint, and with every impulse, I’ve felt confused, embarrassed, or ashamed, and then promptly buried those feelings.  Again, and again, and again.  It felt way easier to just ignore that side of my sexuality, and why not?  I’m a guy in love with a gal; why make things complicated?

Why?  Because it’s a lie to say that I’m straight, and because it’s deeply distressing to continually lie to yourself.  The really hard part for people in my situation is that our sexuality is, for all practical purposes, invisible.  To me at least, that makes this life-long journey to understand and accept myself feel incomplete.  Knowing that I’m bisexual and doing nothing feels exactly the same as the hiding and repressing I’ve always lived with—and I’m done with that.

So, first and foremost, this essay is me permanently rejecting the closet in the only way I can.  Being frankly, openly, explicitly bisexual rejects the seemingly easy path of hiding in plain sight, and forever shatters any possibility of continuing to repress that side of myself.

Second, I’m seeking to find community with others who have shared this or similar experiences.  I am grateful to have amazingly supportive friends and family—straight and gay.  But they, nevertheless, can’t entirely relate to my experiences.  I would love to connect with other people who can.

Third, I’m hoping that sharing my experience of coming out to myself will prove useful to others.  Specifically, I hope this helps people who, like me, have struggled with their sexuality for many years before figuring things out: most especially to other bisexuals (closeted or not) who have struggled as I have.

Finally, I also hope this helps everyone else who doesn’t really know what bisexuality is, and who thinks they don’t know anyone who is bisexual.  You probably do, and just don’t know it.

Short-Circuit Statements

What is a short-circuit statement? In this case, I’m not talking about the language feature related to boolean comparisons, but instead I’m talking about statements which cause a method to return as soon as some conclusion has definitely been reached. Here’s an example from a very simple compareTo method in Java:

public int compareTo(Foobar that) {
  if (that == null) return -1;
  if (this._value < that._value) return -1;
  if (this._value > that._value) return +1;
  return 0;
}

In this example, each line (except the last one) would qualify as a short-circuit statement; that is, they all return as soon as a definite answer is determined, thus leaving some code in the method un-run.  If we weren’t using short-circuit statements, then the code might look like this:

public int compareTo(Foobar that) {
  int result = 0;
  if (that == null) {
    result = -1;
  } else if (this._value != that._value) {
    if (this._value < that._value) {
      result = -1;
    } else {
      result = +1;
    }
  }
  return result;
}

For something this simple, there isn’t a huge difference in the complexity between the two functions, but it still demonstrates the point.  Many people ardently recommend always having a single return statement in any function, and would strongly advocate using the second example over the first.  However, I would argue that the first is superior because it better respects the unit economy of the reader.

Short-circuit statements allow a reader to take certain facts for granted for the remainder of a method.  In the first example, after reading the first line of the method, the reader knows that they will never have to worry about the that variable having a null value for the rest of the method.  In the second example, the reader has to carry the context of whether they are still reading code within the first if statement.  Every short-circuit statement removes one item from the set of things one must consider while reading the remainder of the method.

Naturally, this example is pretty simplistic, and it’s a stretch to claim that either method is more complicated than the other.  However, consider if this weren’t a simple example. If this were a formula to compute an amortization table for a home mortgage, then the first few lines may look like this:

public AmortizationSchedule computeSchedule(
    int principal, float rate, int term) {
  if (principal <= 0) throw new IllegalArgumentException(
      "Principal must be greater than zero");
  if (rate < 0.0) throw new IllegalArgumentException(
      "Rate cannot be less than zero");
  if (term <= 0) throw new IllegalArgumentException(
      "Term must be greater than zero");
  // Here is the 20 or so lines to compute the schedule...
}

In this case, there may be a substantial amount of code following this brief preamble, and none of it has to consider what may happen if these invariants are broken. This greatly simplifies the act of writing the code, the logic of the code itself, and the process of maintaining the code later on.

What flipped my understanding of “white privilege”

My White Friend Asked Me on Facebook to Explain White Privilege. I Decided to Be Honest
by Lori Lakin Hutcherson

✧✧✧

I read this all the way to the end, and found it extremely well-written and enlightening. Even if you don’t want to read the whole thing, you can get a lot of value from just the opening and closing few paragraphs.

I’ve always been troubled by the phrase “white privilege” for exactly the reasons stated in the opening of the piece by the author’s friend.  And her answer to him resonates with me.  To paraphrase her excellent formulation: white privilege isn’t a positive benefit that white people receive overtly, some kind of undeserved handout, or something white people need to feel guilty about.  Instead, it’s the absence of the kind of hostility, skepticism, and type-casting directed exclusively at people who aren’t white.

Another illuminating point comes out as she enumerates a small sample of racist experiences she’s personally encountered.  She’s not saying that all white people are racist… nor even that very many of them are.  But it’s common enough that the encounters are relentless, and that leaves every interaction with white people tinged with a subtle fear of yet another nasty encounter.

It’s fantastic how far we have come at eliminating the most horrific forms of institutional and official racism (largely due to the heroic efforts of black men and women of generations past).  But, accounts like Ms. Hutcherson’s make it clear that it’s not completely gone by a long shot.  I think this is an essential read to understanding the character of the racism that yet remains in our society.

Finally, her advice to people who want to fight this remaining racism also resonates with me.  It’s not enough to merely not be racist to help with the fight.  Good on you if you aren’t, but it’s not enough to move things forward.  It requires that special effort to keep an eye out for the little digs, skeptical remarks, and subtle insults and to challenge them.  Sometimes they’re not meant as hostile remarks, and the offender only needs a polite reminder.  Sometimes it’s not so benign.  But, especially if you happen to be white and see another white person doing these things, be the one to have the courage to call bullshit.  That’s what helps and can make a difference.

Breaking Out of Consensus Deadlock

Back when I was working in a very small start-up with only two other people, we would often have a great deal of difficulty coming to decisions.  We would have seemingly endless meetings where each person would express their viewpoint, then someone would rebut that viewpoint, then the first person would express it again in slightly different words.  On and on, and sometimes about things which wound up being pretty trivial.  However, since there were only three of us, it felt like we ought to be able to come up with some kind of consensus, but it was seldom that easy.

We had fallen into the trap of consensus deadlock.

I think of consensus deadlock as being any time a group of people gets stuck on making a decision because:

  1. there’s not enough data to make a clear decision obvious
  2. everyone feels that a unanimous decision is necessary
  3. no one is willing to cede the decision to someone else
  4. there’s no clear owner for the decision

Naturally, there are a bunch of techniques to break out of the deadlock, and they all involve removing one of the things in the list.  The best thing to do, when it’s possible, is to figure out what information would tip the scales, and then go get that data (#1).  Of course, that’s not always feasible.  In that case, you might stop to consider how strongly you each feel about the issue, and choose to go with the approach someone feels most strongly about (#2).  You might also try thinking over the reversibility of the decision, and allowing someone else to give their idea a shot (#3).

Those are all good ways to break the deadlock, but perhaps the best way is to assign the decision to someone, and then hold them accountable for making a sound decision.  Of course, that doesn’t mean letting them simply go do whatever they want.  In order for this to end well, owning a decision involves a lot of responsibilities.

The decision owner’s job is really to pull everything together, and then follow the data where it leads.  First, the owner needs to ensure that all the available data has been gathered and is available for consideration by the team.  Depending upon the decision, this may be more easily said than done.  Second, the owner needs to collect feedback from everyone affected by the decision and give full consideration to each point of view.  In particular, this means that they should be able to represent each point of view as well as they represent their own.  Finally, the owner must be able to weigh the pros and cons of all those positions, measure them against the available data, and then present a clearly reasoned rationale for choosing a particular course along with its strengths and weaknesses.

The job of each other team member is to help the decision owner reach a good decision.  This may be offering up what data they have (all of it… not just what leads to a certain conclusion they support).  This might be to pose questions which are critical to understanding the problem.  This might be to help brainstorm various possible options and their pros and cons.  At all times, though, each team member needs to remain supportive of the process and the decision owner’s authority over that particular decision.

✧✧✧

The only way this technique works is for every member of the team to act in a mature and responsible manner.  The decision owner has to be able to step back from a pet idea to consider all possibilities, and the rest of the team has to trust that the decision owner is honestly considering all points of view.  If, at any point, someone on the team feels this isn’t the case, a discussion becomes necessary about whether the process is being followed faithfully—not about the particular details of the decision at hand.  That makes it much easier for everyone to start from a place of agreement, and to get back on track.

Scales of Preference

I’m constantly in situations where I’m working with another person to try to make some decision.  This could be figuring out where to go out to eat with my wife, or which vendor to go with for a major purchase at work.  Either which way, when there are multiple people involved in the decision, it’s easy to find yourself at an impasse where everyone has a different preference.

One of the tricks I’ve learned for making these situations easier is to routinely give an indication of how strongly I feel about any particular option.  Of course, that can range from absolute certainty that something is a terrible idea to a positive and unshakable conviction that it’s the best thing ever.  So, when talking about how I feel about a certain option, I try to use language which gives a clear indication of where I am on the scale.  For example:

  • I’m vehemently opposed to __________.
  • I completely disagree with __________.
  • I don’t think __________ is right.
  • I’d prefer not to __________.
  • I’m not convinced that __________ is a good idea.
  • I’m not convinced that __________ is the best option.
  • I don’t really have a preference about __________.
  • I’m slightly inclined toward __________.
  • I think __________ is the best option on the table.
  • I think __________ is a good plan.
  • I really like the idea of doing __________.
  • I’m super excited about going with __________!
  • I think __________ is the perfect choice!

As you can tell, these are arranged to scale from strong disagreement to strong agreement.  And, of course, this is barely more than a starting point for the kind of language you can use to place yourself on the scale.  While there are certainly a whole lot of other excellent options, there are a few things these particular ones all have in common:

  1. They all start with “I…”, which makes it clear that I’m only expressing my own opinion of the idea: not passing judgement on the person who suggested it.
  2. They provide a wide range of shading on how much you like or don’t like the idea: not merely whether you’re in agreement or not.

These are both incredibly important when trying to come to a decision with other people.  The first one attempts to ensure that the conversation stays friendly.  It’s much harder to come to a win-win decision with another person when you’ve managed to get them pissed off at you.  The second allows you to each gauge whether there’s a large disparity in passion.  If one person is strongly in favor of an idea, while another person is mildly against it, the best course of action may well be to just go with it (so long as the decision is sufficiently reversible).  If one person is violently opposed, while the other person is so-so… it’s almost certainly best to give it a pass.

✧✧✧

I first thought of using this technique when I was in a start-up with just two other people.  It was incredibly helpful in unravelling decisions where we didn’t have anywhere near enough data for any of us to really convince the others objectively.  In those cases, it was often just each of us with our own intuition about how a certain course would turn out, and this tool made it a lot easier for us to express how strongly that “gut” feeling was.

Since then, I’ve used it quite a bit on software engineering teams when trying to figure out exactly how best to build various features or solve certain technical challenges.  Again, these were often cases where clear, objective answers were hard to come by (e.g., what would users think about X change to a feature?).  Using this technique allowed each person to weigh their own ideas against the others in a productive way.

Canvas vs. SVG

When I began my start-up, I knew the product was going to focus heavily on drawing high-quality graphs, so I spent quite a while looking at the various options.


Server Side Rendering

The first option, and the one that’s been around the longest, is to do all the rendering on the server and then send the resulting images back to the client.  This is essentially what Google Maps did (though it has added more and more client-rendered elements over the years).  To do this, you’ll need some kind of image rendering environment on the server (e.g., Java2D), and, of course, a server capable of serving up the images thus produced.

I decided to skip this option because it adds latency and dramatically cuts down on interactivity and the ability to animate transitions.  Both were very important to me, so this solution was clearly not a good fit.


Canvas

The second option was to use HTML5’s Canvas element.  This lets you designate a region of your webpage as a freely drawn canvas which supports all the typical 2D drawing operations.  The result is a raster image which is drawn completely using JavaScript operations.

While Canvas has the advantage of giving a lot more control on the client side, you’re still stuck with a raster image which needs to be redrawn from scratch for each change.  You also lose the structure of the scene (since it’s all reduced to pixels), and therefore lose the functionality provided by the DOM (e.g., CSS and event handlers).


SVG

The final option I considered (and therefore the one I chose) was Scalable Vector Graphics (SVG).  SVG is a mark-up language, much like HTML, which is specifically designed to represent vector graphics.  You get pretty much all the same primitives as with Canvas, except they are all represented as elements in the DOM, and remain accessible via JavaScript even after everything is rendered.  In particular, both CSS and JavaScript can be applied to the elements of an SVG document, making it much more suitable for the kind of interactive graphics I had in mind.
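To make the contrast with Canvas concrete, here’s a tiny hand-written SVG fragment (the class names and shapes are purely illustrative).  Each shape is an ordinary DOM element, so CSS rules and event handlers attach to it directly:

```xml
<svg width="200" height="100" xmlns="http://www.w3.org/2000/svg">
  <style>
    /* CSS applies to SVG elements just as it does to HTML */
    .point { fill: steelblue; }
    .point:hover { fill: orange; }
  </style>
  <!-- Each circle remains in the DOM after rendering, so JavaScript
       can find it with querySelector and attach event listeners. -->
  <circle class="point" cx="40" cy="50" r="8"/>
  <circle class="point" cx="100" cy="30" r="8"/>
  <circle class="point" cx="160" cy="70" r="8"/>
</svg>
```

Try that with Canvas and the three circles are just pixels: there’s nothing left for CSS or an event listener to hang onto.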

As an added incentive for using SVG, there is an excellent library called Data Driven Documents (D3) which provides an amazing resource for manipulating SVG documents with interactivity, animations, and a lot more.  More than anything else, the existence of D3 convinced me to use SVG for my custom graphics needs.

Reversibility and Fast Decision Making

There are a number of different circumstances when it’s important to distinguish between reversible and irreversible decisions.  First, though it may seem pretty obvious, let me be clear about what I mean by each of those.

A decision is only irreversible if there’s absolutely no way to take it back again.  The somewhat clichéd example is that you can’t un-ring a bell.  However, there are plenty of more consequential decisions which are also quite permanent.  Drive drunk?  You may not have an opportunity to repent of that decision.  Unprotected sex?  You might have no way to undo the damage you’ve done to your body.

Fortunately, most decisions aren’t completely irreversible: although they may be more or less difficult / expensive to fix.  Get married to the wrong person?  That could have massive consequences, but it’s possible to fix.  Paint your house a color which turns out to be ugly?  Probably less difficult to unwind, but still not consequence-free.  Get a bad haircut?  You’re out a little bit of money, but it will fix itself in time.

And, of course, there are a huge number of decisions we make every day which are completely and readily reversible.  Not enjoying the channel you’re watching on TV?  Just flip to the next one.

✧✧✧

One way to use this distinction is to guide you in how much effort to spend ahead of time trying to make a certain decision.  When I find myself faced with a decision, part of the process is to make exactly this evaluation.  If it’s a pretty reversible decision, I won’t let myself get too caught up in making it.  I choose the first option which seems pretty reasonable, and I move along.  On the other hand, I’ll spend quite a bit of time evaluating houses before buying one, and even more when considering changing jobs.

Another useful way to use this distinction is when you’re responsible for guiding another person (e.g., as a parent, guardian, coach, manager, executive, etc.).  If a decision is reversible, and the other person is set on making a choice you’re skeptical of, perhaps you let them go ahead anyway.  If they’re wrong, then they’ll learn something from the attempt in a much deeper way than they otherwise would have.  If they’re right, then you’ve learned something instead.  On the other hand, for a more irreversible decision, you may decide you need to intervene (e.g., a toddler climbing up on a coffee table vs. running out into the street).

Finally, this distinction can also be useful when trying to make a decision as a group.  As often happens, different people will offer up differing suggestions on how to proceed, and the best answer isn’t always clear.  In such cases, it can be very helpful to gauge how reversible the decision is.  When it’s pretty easy to back away from, it’s fine to just pick a solution (probably from the most insistent person in the group), and see how it works out.  On the other hand, if it’s a fairly irreversible decision, you may all want to slow down, gather more data, and try to be a lot more careful.

Streams in Node.js, part 2: Object Streams

In my first post on streams, I discussed the Readable, Writable and Transform classes and how you override them to create your own sources, sinks and filters.

However, where Node.js streams diverge from more classical models (e.g., from the shell) is in object streams.  Each of the three stream classes can work with objects (instead of buffers of bytes) by setting the objectMode property to true in the options argument passed to the parent class constructor.  From that point on, the stream will deal in individual objects (instead of groups of bytes) as the medium of the stream.

This has a few direct consequences:

  1. Readable objects are expected to call push once per object, and each argument is treated as a new element in the stream.
  2. Writable objects will receive a single object at a time as the first argument to their _write methods, and the method will be called once for each object in the stream.
  3. Transform objects see both of these changes: _transform receives a single object at a time, and each call to push passes along a single object.

 

Application: Tax Calculations

At first glance, it may not be obvious why object streams are so useful.  Let me provide a few examples to show why.  For the first example, consider performing tax calculations for a meal in a restaurant.  There are a number of different steps, and the outcome for each step often depends upon the results of another.  The whole thing can get very complex.  Object streams can be used to break things down into manageable pieces.

Let’s simplify a bit and say the steps are:

  1. Apply item-level discounts (e.g., mark the price of a free dessert as $0)
  2. Compute the tax for each item
  3. Compute the subtotal by summing the price of each item
  4. Compute the tax total by summing the tax of each item
  5. Apply check-level discounts (e.g., a 10% discount for poor service)
  6. Add any automatic gratuity for a large party
  7. Compute the grand total by summing the subtotal, tax total, and auto-gratuity

Of course, bear in mind that I’m actually leaving out a lot of detail and subtlety here, but I’m sure you get the idea.

You could, of course, write all this in a single big function, but that would be some pretty complicated code, easy to get wrong, and hard to test.  Instead, let’s consider how you might do the same thing with object streams.

First, let’s say we have a Readable which knows how to read orders from a database.  Its constructor is given a connection object of some kind, and the order ID.  The _read method, of course, uses these to build an object which represents the order in memory.  This object is then given as an argument to the push method.

Next, let’s say each of the calculation steps above is separated into its own Transform object.  Each one will receive the object created by the Readable, and will modify it by adding the extra data it’s responsible for.  So, for example, the second transform might look for an items array on the object, and then loop through it, adding a taxTotal property with the appropriate computed value for each item.  It would then call its own push method, passing along the primary object for the next Transform.

After having passed from one Transform to the next, the order object created by the Readable would wind up with all the proper computations having been tacked on, piece-by-piece, by each object.  Finally, the object would be passed to a Writable subclass which would store all the new data back into the database.

Now that each step is nicely isolated with a very clear and simple interface (i.e., pass an object, get one back), it’s very easy to test each part of the calculation in isolation, or to add in new steps as needed.

Streams in Node.js, part 1: Basic concepts

When I started with Node.js, I came to it with the context of a lot of different programming environments, from Objective-C to C# to Bash.  Each of these has a notion of processing large data sets by operating on little bits at a time, and I expected to find something similar in Node.  However, given Node’s way of embracing the asynchronous, I expected its version to be something quite different.

What I found was actually more straightforward than I’d expected.  In the typical stream metaphor, you have sources which produce data, filters which modify data, and sinks which consume data.  In Node.js, these are represented by three classes from the stream module: Readable, Transform, and Writable.  Each of them is very simple to subclass, and the result is a very nicely factored set of classes.

Overriding Readable

As the “source” part of the stream metaphor, Readable subclasses are expected to provide data.  Any Readable can have data pushed into it manually by calling the push method.  The addition of new data immediately triggers the appropriate events, which make the data trickle downstream to any listeners.

When making your own Readable, you override the pseudo-hidden _read(size) method.  This is called by the machinery of the stream module whenever it determines that more data is needed from your class.  You then do whatever you have to do to get the data, and end by calling the push method to make it available to the underlying stream machinery.

You don’t have to worry about pushing too much data (multiple calls to push are handled gracefully), and when you’re done, you just push null to end the stream.

Here’s a simple Readable (in CoffeeScript) which returns words from a given sentence:

{Readable} = require 'stream'

class Source extends Readable
    constructor: (sentence)->
        super()
        @words = sentence.split ' '
        @index = 0
    _read: ->
        if @index < @words.length
            # push the current word and advance to the next one
            @push @words[@index++]
        else
            @push null

Overriding Writable

The Writable provides the “sink” part of the stream metaphor.  To create one of your own, you only need to override the _write(chunk, encoding, callback) method.  The chunk argument is the data itself (typically a Buffer with some bytes in it).  The encoding argument tells you the encoding of the bytes in the chunk argument if it was translated from a String.  Finally, you are expected to call callback when you’re finished (with an error if something went wrong).

Overriding Writable is about as easy as it gets.  Your _write method will be called whenever new data arrives, and you just need to deal with it as you like.  The only slight complexity is that, depending upon how you set up the stream, you may get a Buffer, a String, or a plain JavaScript object, and you may need to be ready to deal with multiple input types.  Here’s a simple example which accepts any type of data and writes it to the console:

{Writable} = require 'stream'

class Sink extends Writable
    _write: (chunk, encoding, callback)->
        if Buffer.isBuffer chunk
            # encoding is 'buffer' for Buffer chunks, so decode as UTF-8
            text = chunk.toString()
        else if typeof(chunk) is 'string'
            text = chunk
        else
            text = chunk.toString()

        console.log text
        callback()

Overriding Transform

A Transform fits between a source and a sink, and allows you to transform the data in any way you like.  For example, you might have a stream of binary data flowing through a Transform which compresses the data, or you might have a text stream flowing through a Transform which capitalizes all the letters.

Transforms don’t actually have to output data each time they receive data, however.  So, you could have a Transform which breaks up an incoming binary stream into lines of text by buffering raw data until a full line is received, and only at that point emitting the string as a result.  In fact, you could even have a Transform which merely counts the lines, and emits a single integer when the end of the stream is reached.

Fortunately, creating your own Transform is nearly the same as writing a class which implements both Readable and Writable.  However, in this case instead of overriding the _write(chunk, encoding, callback) method, you override the _transform(chunk, encoding, callback) method.  And, instead of overriding the _read method to gather data in preparation for calling push, you simply call push from within your _transform method.

Here’s a small example of a transform which capitalizes letters:

{Transform} = require 'stream'

class Capitalizer extends Transform
    _transform: (chunk, encoding, callback)->
        text = chunk.toString()
        text = text.toUpperCase()
        @push text
        callback()

✧✧✧

All this is very interesting, but hardly unique to the Node.js platform. Where things get really interesting is when you start dealing with Object streams. I’ll talk more about those in a future post.