Forming a legal entity (part 1)

A while back, I founded a startup. I’ve since closed it and moved on, first to one company and then to another. However, I learned a lot in the process, and I thought it would be worth sharing.

One of the first things I had to learn when starting my company was how to actually start a company. There are a lot of choices which all have to be made before you can incorporate, and there’s a definite order in which you need to get things done. Here’s the sequence I followed:

  1. Find a lawyer who knows startups
  2. Choose a name for the company (but not the product)
  3. Purchase the company’s domain name
  4. File incorporation papers
  5. Open a business checking account
  6. Sign up for email / calendar services
  7. Sign up for other technical services (e.g., source control, hosting)
  8. Print business cards
  9. Get to work!

All in all, there was more waiting for things to happen than actual work, and a lot more reading about legal entity types, researching banks, and that sort of thing than I’d expected. I’ll write more about each step in separate posts.

How to stop hating to write tests

Pretty nearly every developer I’ve ever worked with either hates writing automated tests or doesn’t write them at all. And why shouldn’t they? After all, it’s a ton of tedious work which doesn’t impress anyone looking at the final product. Yeah, yeah, tests improve quality a bit, but still… they take so much time and effort to write in the first place, and even more effort to keep from breaking all the time. Right?

Of course not.

The problem is that most of us were never taught to write tests, and our testing frameworks tend to lead us in the wrong direction. For example, consider this made-up test, which follows a pattern I’ve seen all too often:

class MyObnoxiousUnitTest(TestBase):  # TestBase, @before, @test: any xUnit-style framework

    @before
    def setup(self):
        # Do a little work to set things up.  Maybe this is creating
        # a database connection, maybe clearing out a directory of
        # stale test results, etc.
        ...

    @test
    def test_something(self):
        # here's about 5-10 lines of code to set up some test data
        ...
        ...
        ...
        ...

        # and here's another 5-10 lines of code to verify the results
        ...
        ...
        ...
        ...

        # now let's have another 3-4 lines of code to tweak some
        # little thing
        ...

        # and now another one or two lines to verify that
        ...

Had enough? And that’s just one test… what about your next one?  I suspect it will look very much the same, and be documented just as well.  Except, you’ll copy-n-paste a little bit from the first setup block, and tweak it some so it looks similar without being quite the same.  The same for the next one… and the next…  And good luck if someone else wrote the tests in the first place.

Before long, you’ve got a test file which is hundreds of lines long with code which has been copy-pasted into existence, but none of which is documented or easy to follow.  So, now what happens when you want to add another test?  More copy-pasta?  Probably.  And the problem gets even worse.  No wonder everyone hates testing.

 

A Better Way

Fortunately, this isn’t the only way to write automated tests, and there are even a number of frameworks which can help (e.g., RSpec in Ruby, mamba in Python, or Mocha in JavaScript). This “better” style of testing grew out of a movement called Behavior-Driven Development (BDD).¹

Let’s start with a pretty typical example testing a hypothetical CSV reader class², and then pick it apart:

with description("with a CSV file full of valid data") as self:

    with before.each:
        self.csv_doc = CsvDoc("my-test-data.csv")

    with it("should contain a list of headers"):
        self.csv_doc.headers.should.equal(["alpha", "bravo"])

    with it("should contain only elements of the right form"):
        self.csv_doc.data.should.be.a(list)
        for element in self.csv_doc.data:
            element.should.be.a(dict)
            sorted(element.keys()).should.equal(["alpha", "bravo"])

    with description("when the data is modified and re-read"):

        with before.each:
            self.csv_doc.data.append({"alpha": "a", "bravo": "b"})
            self.csv_doc.save("saved-data.csv")

            self.csv_doc2 = CsvDoc("saved-data.csv")

        with it("should have the same contents as the first doc"):
            self.csv_doc2.should.equal(self.csv_doc)

The first thing you’ll notice is that this approach is a lot more structured. There isn’t just one big test function with a bunch of code in it.  Instead we have a definite pattern where we:

    1. Define, in English, what the state we’re testing is (i.e., with description)
    2. Write some code to make that state true (i.e., with before.each)
    3. State, in English, one specific thing which should be true now (i.e., with it)
    4. Write some code to verify that it really did happen (i.e., the body of with it).

You can see that exact pattern repeated several times in this example.  This makes the tests much easier to follow, and gives a great deal of built-in documentation as to exactly what conditions are being tested, and what the expected outcomes are—in plain English.

The second thing you’ll notice is that this pattern not only repeats, but becomes progressively more nested. Each level of nesting means that any setup in the outer layers also applies to the inner layers. So, in our final test case (i.e., the last with it block), both with before.each blocks run, outer one first, before the test itself. Because before.each runs again before every test, each test still starts from fresh state. This provides an exceptionally easy way to share setup between individual tests, sparing us the massively problematic copy-pasta of the more conventional approach.
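If mamba isn’t an option, the same structure can be approximated with Python’s built-in unittest module: each description becomes a TestCase class, before.each becomes setUp, each it becomes a test method with a single assertion, and nesting becomes subclassing with super().setUp() called first. This is a sketch rather than a drop-in equivalent. The CsvDoc class below is a hypothetical minimal implementation (the post never defines one), and the test data is written to a temporary directory. One real difference: a unittest subclass inherits the outer test methods, so those run again under the nested setup, which mamba does not do.

```python
import csv
import os
import tempfile
import unittest


class CsvDoc:
    """Hypothetical minimal CSV document: a header row plus a list of row dicts."""

    def __init__(self, path):
        with open(path, newline="") as f:
            reader = csv.DictReader(f)
            self.headers = list(reader.fieldnames or [])
            self.data = [dict(row) for row in reader]

    def save(self, path):
        with open(path, "w", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=self.headers)
            writer.writeheader()
            writer.writerows(self.data)

    def __eq__(self, other):
        return (isinstance(other, CsvDoc)
                and self.headers == other.headers
                and self.data == other.data)


class WithACsvFileFullOfValidData(unittest.TestCase):
    """description: with a CSV file full of valid data"""

    def setUp(self):
        # before.each: runs before every test here and in every subclass.
        self.tmpdir = tempfile.TemporaryDirectory()
        self.addCleanup(self.tmpdir.cleanup)
        path = os.path.join(self.tmpdir.name, "my-test-data.csv")
        with open(path, "w", newline="") as f:
            f.write("alpha,bravo\n1,2\n3,4\n")
        self.csv_doc = CsvDoc(path)

    def test_should_contain_a_list_of_headers(self):
        self.assertEqual(self.csv_doc.headers, ["alpha", "bravo"])

    def test_should_contain_only_elements_of_the_right_form(self):
        self.assertIsInstance(self.csv_doc.data, list)
        for element in self.csv_doc.data:
            self.assertIsInstance(element, dict)
            self.assertEqual(sorted(element.keys()), ["alpha", "bravo"])


class WhenTheDataIsModifiedAndReRead(WithACsvFileFullOfValidData):
    """description: when the data is modified and re-read (nested)"""

    def setUp(self):
        super().setUp()  # the outer before.each runs first
        self.csv_doc.data.append({"alpha": "a", "bravo": "b"})
        saved_path = os.path.join(self.tmpdir.name, "saved-data.csv")
        self.csv_doc.save(saved_path)
        self.csv_doc2 = CsvDoc(saved_path)

    def test_should_have_the_same_contents_as_the_first_doc(self):
        self.assertEqual(self.csv_doc2, self.csv_doc)
```

Run it with python -m unittest followed by the module name; the outer class reports two tests and the nested class three (its own plus the two it inherits).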

Finally, the third thing you’ll probably notice is that each with it block is super short.  Since each one only has assertions, and each one only asserts a single condition (described with an English sentence), it really doesn’t need much code.  This makes the tests both extremely well-documented, and very easy to modify.  Stop for a moment, and think how eager you would be to add a missing test case at 16:45 on a Friday with each approach…

✧✧✧

The important take-away here is that the architecture of your tests matters.  We’re often fed the line that test code is throw-away code, and therefore the same rules don’t apply as when writing “real” code.  This is a colossal mistake.  Written badly, your test code will massively slow down a development team, and be a major source of conflict among its members.  Written to the same standards as any other code, it can be fast to write, easy to change, and save you a ton of time and trouble.

 


¹ While BDD gets the credit for originating this mode of testing, it recommends going way, way beyond what I personally do or would recommend.

² I’m using Python along with the mamba and sure libraries in this example only because that’s what I happen to be working in these days.

On the Naming of Methods

Methods are functions which are bound to a particular object. The exact mechanism varies by language, but the general idea is that you never call a method without knowing which object it belongs to, and that the code inside it has access to that object.

Conceptually, this “binding” means that calling a method can more accurately be thought of as making a request of the object.  Imagine a class like this one:

class Animal:
    """Minimal base class so the example runs on its own."""


class Dog(Animal):
    def __init__(self, name):
        self.name = name

    def speak(self):
        print("Woof!")

fido = Dog("Fido")
fido.speak()

In this example, you have Fido, a dog, whom you have asked to speak. Fido goes about that in a fashion typical of dogs. The method, speak, therefore should be phrased like a request or command so that we mentally hear the echo of the actual phrase we might use with a real dog: “Fido, speak!”
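Python makes the “binding” described above directly visible. The sketch below is a small variation on the Dog class (speak returns its text instead of printing it, and the Animal base class is dropped, both only so the result is easy to check): looking up fido.speak without calling it yields a bound method that already carries fido along with it.

```python
class Dog:
    def __init__(self, name):
        self.name = name

    def speak(self):
        # The method reaches the object it is bound to through `self`.
        return f"{self.name}: Woof!"


fido = Dog("Fido")
speak = fido.speak              # a bound method: the object travels with it
print(speak())                  # Fido: Woof!
print(speak.__self__ is fido)   # True
print(Dog.speak(fido))          # the same request, with the object passed explicitly
```

Even when the method is passed around on its own, there is never any doubt about which dog is being asked to speak.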

However, real methods often get much more complicated. First, we often have a lot more of them, even on a single object. Second, we often have many similar methods that we need to be able to tell apart. To that end, method names should take the form of short, imperative sentences addressed to the (programming) object the method is bound to, just as “speak” is addressed to Fido. In practice, the desire to keep the sentences as simple as possible tends to lead to this progression of sentence forms:

  • verb (render)
  • verbNoun (renderPlot)
  • verbAdverb (renderGradually)
  • verbAdjectiveNoun (renderTopAxis)
  • verbPrepositionNoun (renderWithBorder)
  • verbPrepositionAdjectiveNoun (renderWithGreenBorder)

So, when faced with picking a name for a given method, I try to start at the top of this list, and then work my way down until I can find a method name which meets these criteria:

  • unambiguous in the context of this object
  • re-uses common words and phrases already used in related code
  • fits the pattern of other methods in the object and related code
  • uses language common in the real-world domain the code is about

Roughly half the time, the first form works just fine, as it did for our dog example. If that name would be ambiguous within the class, then adding a single word will often do the trick. For example, if a class already has a render method and I want to refactor it into multiple parts, I might add methods named renderAxis and renderPlot to hold the pieces of the original method.
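The refactoring described above might look like the following in Python, where the idiomatic snake_case spellings of renderAxis and renderPlot are render_axis and render_plot. The Chart class and its text-only “rendering” are hypothetical, chosen only to keep the sketch self-contained.

```python
class Chart:
    """Hypothetical chart whose render() has been split into named parts."""

    def __init__(self, points):
        self.points = points

    def render(self):
        # The original entry point keeps its one-word name and now reads
        # as a summary of the steps it delegates to.
        return self.render_axis() + self.render_plot()

    def render_axis(self):
        # verbNoun: one extra word distinguishes this part from render().
        return ["axis: 0.." + str(max(self.points, default=0))]

    def render_plot(self):
        return ["point: " + str(point) for point in self.points]
```

Callers keep asking the chart to render; the two new names only matter to someone reading or changing the pieces.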

In all of this, I always use fully and correctly spelled words. The shorter, simpler, and more common the words, the better. That way, I take advantage of the common agreement on spelling to help prevent errors from inconsistent misspellings and abbreviations. The only exception is when an abbreviation is more common than the long form (e.g., HTML is far more common than HyperText Markup Language).

✧✧✧

The crucial thing I try to keep in mind when choosing method names is that I’m writing for the sake of other humans, and that methods should be tiny, imperative sentences directed at the object containing the method. The next reader of the code may be a new developer learning the code for the first time, a colleague who needs to make a bug fix in this area, or even myself a few months down the road when the details aren’t so fresh. Following a definite pattern aimed at keeping things both simple and clear goes a long way towards making your code its own documentation.

Why do I need GTD?

In case you haven’t heard of it, Getting Things Done (most often just called GTD) is a personal organization system developed by David Allen. He first published the book of the same name in 2002, and I first ran across it in 2007. It may be a cliché to say it (though that doesn’t make it any less true), but this book changed my life.

The central premise of GTD is to avoid keeping things in your head. What things? Everything. The most obvious are to-do items, but just as important are long-term goals, reference material, reminders of future events, quotes for a future article… whatever. Because our brains have a limited ability to keep track of many things at once (see: Crow Epistemology), we need a way to track things that doesn’t rely on our brains to do the job. Or, as David Allen said:

“Your brain is a great place to have ideas, but a terrible place to manage them.” — David Allen

So, the core principle of GTD is to get everything out of your head and into a trusted system where everything is written down and organized according to a specific set of principles. Each of these principles plugs a potential hole in the funnel from when you first notice something interesting to when you’ve completed whatever action came from it. GTD also includes principles to help you organize the plethora of individual activities into a coherent plan for your life as a whole.

◰ ◱ ◲ ◳

To me, the most compelling observation Allen makes has to do with lists.  Think back to some time when you were feeling totally overwhelmed with too many things to do.  Remember that feeling of being stressed, worried that you’d forget something, panicked that you wouldn’t have enough time to get it all done.  And then you made a list.

Just that simple act of writing out what needed to be done brought a huge sense of relief. But why? It’s a little crazy when you think about it. Not only did you not get anything done, you spent some of your precious time making the list! So… why the relief?

It’s because you were experiencing the relief of no longer burning up your mental resources remembering, sifting, sorting, and obsessing over the stuff that went on the list. Now it’s all there in a permanent form, so your mind can relinquish the task of keeping hold of it all, and you’ve got clear space in your head to actually think and do. Just imagine what it would be like to live that way all the time, and you get an idea of how I feel now that I’ve got GTD in my mental toolkit.


For further reading:

  • Allen, David. “Getting Things Done: The Art of Stress-Free Productivity” (Amazon)

Designing my Own Workbench

For a long while now, I’ve been dealing with less-than-ideal work surfaces in my wood shop. I’ve got a blanket over my table saw so I can work there (when I’m not using it). I’ve also got a tiny table built up against one wall, and, finally, I’ve got an old door thrown over two sawhorses. None of these is really adequate. So, I decided my next project in the shop is going to be making myself a proper work table.

This is the design for the basic table itself. After having watched a bunch of furniture-makers on YouTube (especially Matt Estlea and Matthias Wandel), I’ve got a bunch of ideas for extras it should have (e.g., clamps and such), but I think it will be a nice upgrade to start with.


In case you’re looking for a nice CAD package for hobbyist-level work, Autodesk Fusion 360 has an extremely robust feature set and is free for hobbyists and start-ups.

Diving Deep on Coupling

Last time, I described how Cohesion applies to everything from writing a single line of code all the way up to designing a remote service. Now, let’s consider the same thing for Coupling.

Recall that Coupling is the mental load required to understand how a particular component relates to another component. If we take a line of code as a single component, then what defines how it is connected to the lines around it? For a start: the local variables it uses, methods it calls, conditional statements it is part of, the method it is contained in, and exceptions it catches or throws. The more of these things a single line of code involves, the more coupled it is to the rest of the system.

As an example, consider a line of code which uses a few local variables to call a method and store the result. This could be more or less coupled depending upon a number of factors. How many local variables are needed? Are any of the variables static or global variables? Is the method call private to the class, a public method on another class, or a static method defined somewhere? Is the result being stored in a local variable, an instance variable, or a static/global variable? Depending upon the answers to these questions, that one line may be more or less coupled to the other lines around it.

The implication of having coupling which is too tight for a single line of code is that you have to understand a lot of other lines in order to understand that one. If it uses global variables, then you have to also understand what other code modifies the state of those variables. If it uses many local variables, then you have to understand the code which sets their values. If it calls a method on another object, then you have to understand what impact that method call will have. All of these things increase the amount of information you need to keep in mind to understand that line of code.

Now, consider what coupling would mean for a remote service which is part of a large distributed system (e.g., Amazon.com). The connections such a service has are defined by the API it offers, the other services it consumes, and how their APIs are defined. For the service’s own API, consider the following: does the API respond to many service calls or just a few? Do the service calls require a lot of structured data to be passed in? How easy is it for a caller to obtain all the necessary information? How closely is the service’s internal implementation tied to the API it presents? How common is the communication protocol clients must implement? For the other services it consumes, consider: how many other services does it use? How are their APIs defined (considering the questions above)? Just as with a single line of code, the answers to these questions will define how tightly coupled a service is to the rest of the system around it.

Having coupling which is too tight for a remote service carries troubles, too. Changes to downstream systems may force the service to be updated. Any change to the API may require upstream services to change as well. It may be impossible to change the service’s implementation if it is too tightly coupled to its own API. Finally, it may be difficult to break the service into separate services as it grows in scope. Allowing too much coupling between services in a distributed environment can be a costly and painful mistake down the road.

Diving Deep on Cohesion

The concepts of coupling and cohesion apply at all levels of programming from writing a single method all the way up to planning the architecture of Amazon.com. As you build each piece (an individual line of code, a method, an object, or an entire remote service), you have to make sure it has strong cohesion and loose coupling. Does the component do exactly one thing which is easy to describe and conceptualize? Does it have relatively few, easy-to-understand connections to the other components around it?

Consider what it means to have strong cohesion for a single line of code. To have good cohesion, it would need to produce a single clear outcome. On the other hand, a line of code with poor cohesion will tend to have multiple side effects, or calculate many values at once:

int balance = priorBalances[balanceIndex++] - withdrawals[withdrawalIndex++];

float gravitation = UNIVERSAL_G *
    (bodyA.mass * KG_PER_LB) *
    (bodyB.mass * KG_PER_LB) /
    ((bodyA.position.x - bodyB.position.x) *
     (bodyA.position.x - bodyB.position.x) +
     (bodyA.position.y - bodyB.position.y) *
     (bodyA.position.y - bodyB.position.y));

In both of these cases, the code is doing multiple things at once, and in order to understand what is going on, you have to mentally pull it apart, understand each piece, and then integrate them back together. Both of these lines can easily be re-written as several lines which each demonstrate much better cohesion:

int balance = priorBalances[balanceIndex] - withdrawals[withdrawalIndex];
balanceIndex++;
withdrawalIndex++;

float massA = bodyA.mass * KG_PER_LB;
float massB = bodyB.mass * KG_PER_LB;
float xRadiusPart = bodyA.position.x - bodyB.position.x;
float yRadiusPart = bodyA.position.y - bodyB.position.y;
float radiusSquared = xRadiusPart * xRadiusPart + yRadiusPart * yRadiusPart;
float gravitation = UNIVERSAL_G * massA * massB / radiusSquared;

Each of the re-written examples has statements which are simpler, easier to understand, and clearly accomplish a single result.
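Splitting the expression also makes a refactor like this easy to check: compute both forms and compare. A Python sketch follows; the Body and Point types are hypothetical, positions are assumed to be in meters and masses in pounds (to match the KG_PER_LB conversion), and the constants are the standard value of G in SI units and the exact pound-to-kilogram factor. A check like this would catch a slip such as subtracting an x coordinate from a y coordinate.

```python
from dataclasses import dataclass

UNIVERSAL_G = 6.674e-11   # m^3 kg^-1 s^-2
KG_PER_LB = 0.45359237    # exact, by definition of the pound


@dataclass
class Point:
    x: float  # meters (assumed)
    y: float


@dataclass
class Body:
    mass: float  # pounds (assumed, hence the KG_PER_LB conversion)
    position: Point


def gravitation_one_line(a, b):
    return (UNIVERSAL_G * (a.mass * KG_PER_LB) * (b.mass * KG_PER_LB)
            / ((a.position.x - b.position.x) * (a.position.x - b.position.x)
               + (a.position.y - b.position.y) * (a.position.y - b.position.y)))


def gravitation_split(a, b):
    # Each statement produces one clearly named intermediate result.
    mass_a = a.mass * KG_PER_LB
    mass_b = b.mass * KG_PER_LB
    x_radius_part = a.position.x - b.position.x
    y_radius_part = a.position.y - b.position.y
    radius_squared = x_radius_part * x_radius_part + y_radius_part * y_radius_part
    return UNIVERSAL_G * mass_a * mass_b / radius_squared
```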

At the far other end of the size spectrum, consider what strong cohesion means for a single service in a massively distributed system (e.g., Amazon.com). In Amazon’s earliest days, there was a single, central piece of software, called Obidos, which was responsible for everything from presenting HTML to calculating the cost of an order, to contacting the UPS server to find out where a package was. This ultimately resulted in a single program which constantly broke down, was impossible to understand fully, and actually took over a day to compile. The crux of the problem is that Obidos tried to do too much, and wound up with terrible cohesion. There was no way anyone could get their head around the essential functions it performed without dropping all kinds of important information.

That was many years ago, and since then, Amazon has considerably improved its situation. As an example, there is now a single service whose sole purpose is to compose the totals and subtotals for an order. It communicates with other services which each compute individual charges (e.g., shipping charges, tax, etc.), and all it does is put them together in the right order. This new service is much easier to understand, far easier to describe, and much, much easier to work with on a daily basis.
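The shape of that arrangement can be sketched in a few lines of Python. This is purely illustrative and assumes nothing about Amazon’s actual implementation: each hypothetical charge calculator stands in for a call to another single-purpose service, and the composer’s only job is to collect the results in order and add them up.

```python
from typing import Callable, List, Tuple

# Each calculator stands in for a remote call to a single-purpose service.
ChargeCalculator = Callable[[dict], int]


def compose_order_totals(
    order: dict, calculators: List[Tuple[str, ChargeCalculator]]
) -> List[Tuple[str, int]]:
    """Collect each labeled charge in order, then append the grand total."""
    lines = []
    total = 0
    for label, calculate in calculators:
        amount = calculate(order)  # amounts in cents, to avoid float rounding
        lines.append((label, amount))
        total += amount
    lines.append(("Total", total))
    return lines


def subtotal(order: dict) -> int:
    return sum(order["item_cents"])


def shipping(order: dict) -> int:
    return 499  # hypothetical flat rate


def tax(order: dict) -> int:
    return subtotal(order) * 8 // 100  # hypothetical 8% rate, rounded down
```

For example, compose_order_totals({"item_cents": [1000, 500]}, [("Subtotal", subtotal), ("Shipping", shipping), ("Tax", tax)]) returns [("Subtotal", 1500), ("Shipping", 499), ("Tax", 120), ("Total", 2119)]. The composer can be described in one sentence, which is exactly the cohesion test from earlier.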

 


“Thank you” to Adam M. for pointing out an error in the code example!  It’s been fixed up now.

Coupling & Cohesion

In my previous post, I discussed how the mind is naturally limited in the number of things it can consider at once, and how we create abstractions to increase the range of our thinking. By using abstractions, we can hide the details of how something works, thereby allowing ourselves to handle more information and still only have to keep in mind a small number of discrete items. This concept is called unit economy.

A consequence of this limitation is that we naturally design complex systems by breaking them down into simpler pieces. If any one piece is still too complex to build, then we break that piece down even further. The act of breaking a system into pieces serves the same function in engineering that creating abstractions does in thinking. Both allow us to ignore the details of how a part of the system works, and just keep in mind the overall notion of what it does.

In order for this decomposition to work, however, we must follow two principles: loose coupling and strong cohesion.

Coupling is the extent to which two components are interconnected. This connection can be defined in terms of actual connections in the final design, but, for our purposes, consider it in terms of how much one has to know about one component in order to understand the function of the other. The crucial point is that coupling describes the mental load required to understand the relationship between the two components.

To take some examples in the physical world, consider a toaster and a gas stove. A toaster is loosely coupled to the rest of the kitchen. It has a single plug, which is an industry standard, and which is shared by nearly every other electrical appliance in the kitchen. On the other hand, a gas stove is tightly coupled to the rest of the kitchen. It requires a gas main, a vent to be installed above it, an exhaust pipe, and it must be mounted flush with the rest of the cabinetry. When installing a toaster, you simply have to find a flat surface near a plug. When installing a gas stove, you need to understand quite a bit about the structure of the whole kitchen. The mental effort required to understand how a toaster is connected to the rest of the kitchen is far less than that required for the stove.

Cohesion is the extent to which all the parts of a component serve a unified purpose. For the purposes of computing mental load, we measure this by how easily we can come up with a single sentence which describes the essence of what the component does, and by whether each part of the component is needed to accomplish that task. The crucial point in terms of unit economy is that we are able to come up with a simple abstraction for the component which allows us to ignore the details of how the component works.

For some examples of strong and weak cohesion, consider a television set and a Swiss Army knife. In the television set, the description of what it does is pretty simple: “A television set converts a TV signal into a visible picture”. On the other hand, describing a Swiss Army knife isn’t nearly so simple. Attempting to come up with a similar statement gets pretty awkward: “A Swiss Army knife is a multi-function device which provides the ability to conveniently store and reveal tools to: cut things in a variety of ways, drive screws of various kinds, etc.” When considering building some kind of system with these things, it’s much easier to keep in mind a simple definition (like the TV set) than a rambling, complex one (like the Swiss Army knife).


For further reading:

  • McConnell, Steve. “Code Complete: A Practical Handbook of Software Construction, 2nd Edition”. Chapter 2 (Amazon)

Crow Epistemology

Our brain is an amazing organ, capable of truly astounding feats of abstraction and generalization, particularly when compared to a computer. On the other hand, it measures up pretty poorly when it comes to managing a lot of information at once. Ayn Rand, a 20th-century philosopher, described this phenomenon as Crow Epistemology with the following story (paraphrased).

Imagine there are a bunch of crows sitting in the treetops at the edge of a forest. A pair of hunters pass them on their way into the forest, and all the crows get really quiet. One hunter comes out alone, but the crows stay quiet because they know there’s still another one in the forest. A little while later, the second hunter comes out, and as soon as he’s out of sight, the crows relax and start cawing again.

Now, imagine that a group of 20 hunters goes into the forest. The crows get all quiet, just like before. After a while, 15 hunters come out again. As soon as they’re out of sight, the crows start up with their cawing again. The crows could keep track of two hunters, but 20 was just too many for them to track by sight.

Of course, humans have the same problem. To address this, we create abstractions (like numbers) which allow us to group things together and keep track of them as a single unit. In our example, a boy sitting at the edge of the forest could simply count the hunters, and just remember one thing (the number 20). That way, he could easily know whether all the hunters had left the woods or not.

It turns out programming is a lot harder than counting, and to do it, we need to keep all kinds of information stuffed in our heads. Naturally, the same limits apply, so we, as programmers, use lots of abstractions to keep everything straight. No one could possibly think about all the electrical signals racing around in a computer while designing a game. Even for a simple game, we need to break the program up into pieces, complete each piece one at a time, and then stitch the parts together.

This process of using higher and higher abstractions to manage complexity is known as unit economy.  By grouping complex things together into a single unit, we can forget about how it works, and just remember what it does.  You don’t have to remember how a transistor works to understand what it does, just as you don’t need to remember how to implement a hash table to understand what it does.

The concept of unit economy is behind everything we do in our daily work as programmers.  Not too surprisingly, abusing a reader’s unit economy is the foremost way to make your code unreadable. I’ll have more to say on actually applying this principle in future posts.


For further reading:

  • Leroy, Charles Georges. “Letter VII. On the Instinct of Animals. The Intelligence and Perfectibility of Animals from a Philosophic Point of View: With a Few Letters on Man”. pgs 125-126. (Google Books)
  • Miller, George. “Magical Number Seven, Plus or Minus Two…”  (Wikipedia)
  • Rand, Ayn. “Introduction to Objectivist Epistemology” (Amazon)

What am I doing here?

You’re likely here because you know me, and I shared this site with you. Chances are, if you’re not one of my friends or family, we work together, and I mentioned that I occasionally like to write about things which occupy my mind. Mostly that has to do with my work, but it will occasionally stray to my hobbies as well.

One of the main things which fascinates me is the relationship between epistemology (how the mind comes to know things) and how to write better code. I firmly believe that poor coding style, consistently applied, is better than a free-for-all. However, I even more firmly believe that we can do a lot better than that. Deliberately matching how we write code to how the human mind works can yield objective answers to what are generally thought of as subjective preferences in coding style. You can expect many posts exploring that subject here.

I’m also fascinated by improving the processes in our daily lives. This can be anything from brushing your teeth to managing a team. Of course, the very word “process” carries a stigma: many assume it’s guaranteed to make simple things complicated. I couldn’t disagree more. Process is merely the way something gets done, and a good process is one which minimizes the effort required. This has led to my strong passion for Getting Things Done (GTD), and I’m sure that will come up often as well.

Finally, there are a bunch of other things which occupy my time and my thoughts: being a good husband & father, listening to and playing jazz & classical music, woodworking, and playing Minecraft.  You may well see posts on all of those subjects as well.

So, if any of that appeals to you, bookmark, subscribe, follow, or otherwise just keep an eye on this space.