I spent a few hours today trying to track down why a change password form I created wasn’t working. Ultimately, it turned out to be a surprising behavior from the way Vue.js (v2.6.14) computed properties are refreshed.
First, let’s have a quick reminder about how computed properties work. Each computed property runs some code and returns a result as the current value of that property. Vue will cache the value, and automatically re-compute it only when some data it depends upon changes. Normally, this works perfectly, and is completely transparent to the end user. Somehow, Vue just knows which other properties it depends upon.
So, on to my problematic example where it doesn’t just know. Let’s start with a simplified version of the component’s code:
Seems pretty straight-forward, no? Well, it doesn’t work.
I have an event handler (not shown) which changes this.password and this.passwordConfirm each time the user types in an input element. For each letter the user types in the “password” box, you see the console statement. However, no matter what the user types into the “confirm” box, the console statement never appears.
The problem, it turns out, was on line 3:
result = this.password && this.passwordConfirm;
Since this appears inside a computed property, behind the scenes Vue.js is trying to work out what other properties are being used so that it can figure out when this property will need to be refreshed. You would expect it would have have something like this in its dependency graph:
canSubmit -> password, passwordConfirm
Except, it doesn’t. For some reason, when both this.password and this.passwordConfirm appear on the same line, Vue.js fails to recognize the dependency on this.passwordConfirm, and produces only this dependency graph:
canSubmit -> password
Therefore, you see the console statements only when this.password changes, and not when this.passwordConfirm changes.
The fix is pretty easy. Just change canSubmit to this:
canSubmit: function() {
p = this.password
c = this.passwordConfirm
return p && c
}
Having the references to this.password and this.passwordConfirm on separate lines appears to do the trick, and causes Vue.js to produce the expected dependency graph.
Why is this happening? To be honest, I don’t know. Here are a few theories…
The method Vue.js uses to detect dependencies:
doesn’t work with multiple properties on the same line
gets confused by the similarity of the two names
get confused by the logical operators
To be honest, I have a hard time accepting that any of these are true, but they’re the only plausible explanations I can come up with. If anyone has a better theory, please leave it in the comments below!
I’m currently working on a feature for runbooks.app which allows users to upload images for their runbooks. I’m using the Python boto3 library to make a PutObject API requests. Simply provide the bytes, the target bucket, and object key, and you should be all set. However, to my considerable frustration, I spent most of the morning trying to figure out why I was getting this error:
botocore.exceptions.ClientError:
An error occurred (AccessDenied)
when calling the PutObject operation:
Access Denied
Not super helpful, as there are, of course, a whole host of things which could be wrong.
Access Keys
I wanted my bucket to only be available to a specific IAM user I set up for my application code. This user should only be given permissions for the specific API operations I want my code to perform: and nothing else. When you set up the user, you’re given an Access Key and a Secret Access Key. The former is a jumble of letter which identifies the account, and the latter is a shared secret so AWS can be sure the request comes from a trusted source.
I’m using Heroku, so I went to my application’s settings page to verify that my Config Vars contained the correct values. They did. So, that wasn’t the problem.
User Policy
Each new user you create in IAM has a section which specifically lists what things that user is permitted to do. There are many ways to assign permissions. You can create a separate policy and associate it with that user directly. You could make such a policy, associate it with a group, and then add the user to the group. You could assign the policy to a role, and then have that user don that role temporarily to do its work. You can even create an inline policy which is merely attached directly to that one user.
It turns out, you don’t need any of that if the user is specifically called out in the access policy for the object being accessed. So, in the end, I could simply remove all policies, groups, and roles from the application’s user so long as the S3 bucket call it out specifically.
Bucket Policy
Each S3 bucket can have its own security policy which specifically lists what each user (group, role, etc.) is permitted to do. As I before, I wanted to limit this user’s access to just those functions I knew my code was going to try to perform. So, I created a bucket policy which looked like this:
I knew I was only ever calling the PutObject API, so I didn’t want to grant any more permissions than that. However, no matter what I did, I kept getting the error message at the top of this post stating that I didn’t have permissions to do exactly the action listed!
After much fiddling about and reading of StackOverflow, I found the solution. I needed to grant more permissions!
Even though I was only calling PutObject myself, the implementation of that API endpoint was both trying to set up the Access Control List (ACL) for the newly uploaded object, and, since I included some tags in my PutObject request, it was trying to set those as well. It was failing at both because I hadn’t set up those permissions.
The thing which threw me off the scent was the error message. It was merely saying that it couldn’t perform the PutObject API request because some permission was lacking. If it had clearly stated which permission was actually lacking, I would have been past this whole thing in a second. Instead, I erroneously assumed that the “PutObject” part of the message was referring to what I need to know in order to solve the problem (i.e., the name of the missing permission), not what I already knew (i.e., the name of API endpoint)! Since they have the same exact name, I thought telling me that it was lacking the one permission it actually did have!
So, a few lessons learned:
User policies aren’t actually needed if the user is specifically mentioned in the bucket policy.
If you get an access-denied message from AWS, the error will only mention the API which it couldn’t perform, not the actual permission it is lacking.
API calls may require several permissions beyond simply the one which shares the name of the API call. Some of these are conditional, depending upon which additional parameters you provide in your call.
AWS continues to have the most byzantine APIs and documentation in the business.
In all the coding books I’ve read, I don’t recall any of them specifically talk about organizing the different members of a class. However, in all the code I’ve read, the lack of any standard or order to how members are organized has been the single biggest obstacle to feeling at home with a new codebase.
The reason this matters so much has to do with our brain’s limited ability to keep track of a lot of things at once. If we can only keep a few things in our head at a time, then it becomes hugely important for the readers of our code that we provide a hierarchy which doesn’t demand they understand more than a half-dozen or so sections at a time. Moreover, it’s super important that these be immediately recognizable, and that they be in a consistent order so that they can be easily found.
Using Dividers to Create Hierarchy
The way I create an immediately recognizable sense of order is to create visual markers which clearly mark each section of a file. So, for example, let’s consider just the bare framework of a module in Python which contains two classes:
from some_package import a_module
############################################################
KNIGHTS_SAY = "Ni"
EXPECT_SPANISH_INQUISITION = False
############################################################
class TheMainObject(object):
def __init__(self):
# do constructor stuff
# Properties #################################
@property
def alpha(self):
return self._alpha
@alpha.setter
def alpha(self, value):
self._alpha = value
# Public Methods #############################
def calculate(self):
# calculate a value here
# Private Methods ############################
def _private_calculation(self):
# do some private calculation
############################################################
class SomeHelperObject(object):
# etc, ...
The first section (lines 1–3) contains all the imports. One of my favorite things about Python is that this gives you a complete listing of external dependencies, and an exact listing of where they are all coming from. Someone reading module can, in an instant, scan this very first section to see everything that’s coming in from the outside.
Next, I put in a full-width marker line with no label. As this indicates the top level of your hierarchy, it should be the full line width your team allows. That way, it’s immediately clear that you’re moving from one top-level part of the hierarchy to another when you see one of these lines.
The next section contains a few constants. These are clearly to be used across all the major sections of this module, and possibly even outside it.
Next, we have the class initialization section for the first class (lines 11–16). This is where the actual class declaration lives, along with any constructors. The semantic meaning of this section, while not explicitly labeled, it pretty clear: we’re setting up the class here.
Next, we have a subdivider labeled “Properties” followed by all of this class’s property declarations. If we had multiple properties here, they would each be listed in alphabetical order (more on this later). It’s important to note that this is a sub-divider, and is therefore shorter than the primary dividers since it represents the second level of the hierarchy. It’s important that it be enough shorter (15–20%) than the primary dividers that it be easy to see the difference even at a quick glance.
Next are the public methods, followed by private methods. As additional members of the second layer of the hierarchy, they use the same dividers. This pretty much covered your bases for most languages.
Sequencing and Sorting
With the visual hierarchy in place, then next thing I consider is the ordering of each element of the hierarchy. Since code is read far more than it is modified, my general approach here revolves around attempting to make the code as easy to read as possible. To that end, it’s important to keep the various levels of the hierarchy in as predictable and clear an order as possible.
For the top level of the hierarchy, we’re talking about sections for defining classes, constants, imports, and (depending upon your language) free-standing functions. In many languages, you’re required to list imports first, plus, that ordering has the advantage of making it clear what this code depends upon, so it’s useful to keep them up front. Next, I list the constants which are going to be used throughout this file. Next comes the primary class in the file (if there is one). Finally, come the sections for helper classes. In most cases, the part of the file you can readily see without scrolling down contains the imports, constants, and the start of the primary class section. So, in a single glance, the reader of the code can easily see what is the primary purpose of this file and what its dependencies are.
In the second level of the hierarchy, at least for a class, we’re mostly looking at various groupings of methods. I arrange these sections in order from most public to least. The reasoning is that a reader of the code is most likely to want to use the class, and then to inherit from it, and finally to actually change its implementation. This ordering addresses those various needs in order.
The third (and final) layer of the hierarchy consists of the members themselves. In 99.99% of cases, I simply sort these in alphabetical order. It is extremely tempting to try to order these in some “logical” order, but I have found that almost always leads to chaos. The reason is pretty simple. At first, the “logic” is pretty clear. The original author knows what the rational is, and puts things in that order. However, when the next author comes along, perhaps the logic of the ordering isn’t super clear. Maybe the next author is a little lazy. Perhaps they need to add a member which doesn’t fit into the logical scheme the other members follow. Inevitably, though, members start to be added wherever it “feels” like they seem to fit: or merely at the end of the list. Sooner or later, the only organizational scheme a new author can discern is: random.
Alphabetizing the list of members solves all these problems. It’s obvious from very brief inspection what the scheme is. That makes it obvious where to find an existing member, and where new ones should be added as there is only one correct place for each member. Moreover, it helps keep code diffs very easy to read and understand as there is a lot less movement of code from one change to another. Finally, it makes it extremely easy to tell whether a certain member is present or not (e.g., whether a certain method has been overridden).
✧✧✧
I have worked in at least a dozen different languages from PowerPC Assembly to C++ to Python to Go. In each one, I apply the principles of establishing visual hierarchy and creating an objectively correct ordering of elements in the file to ensure that the ultimate goal of making the code obvious and readable are satisfied. It doesn’t matter what the language is, whether it’s Object-Oriented, or what other conventions the language encourages. I find applying this framework makes my code immediately recognizable and earns praise from other developers for being among the most orderly and rational code they’ve ever seen.
✧✧✧
Want to see these techniques in practice? Check out some of my code at github.com/andrewminer.
One occasionally hears of some programming zealot who swears up and down that methods should be kept to n lines or less. The actual value may vary, but the arbitrary nature of the claim remains. However, there is a kernel of truth there, and it has to do with preserving unit economy.
We all know that the longer a method is, the more we have to keep in our minds to understand it. There are likely to be more local variables, more conditional statements, more exceptions caught and thrown, and more side-effects of all those lines of code. Furthermore, the problem grows faster and faster as there are more lines of code since they all potentially can have an impact on one another. Keeping methods short has a disproportionately large benefit.
Of course, claiming that there’s some absolute “correct” number is clearly nonsensical. The same number of lines in C, Lisp, Java, Assembler, or Ruby will accomplish radically different things, and even how one legitimately counts lines will change dramatically. What does not change, though, is the need for the reader (and author) of the code to understand it as a whole. To this end, one should strive to keep the number of discrete tasks a method accomplishes to within the range of what people generally can remember at once: between one and six.
Each task within a method may have several lines of code of its own; how many tends to vary widely. Consider the process of reading a series of rows from a database. There may be a task to establish a database connection, another to create the query, another to read the values, and perhaps one more to close everything down. Each of these may be composed of anywhere from one to many lines of code.
Tasks may even have subtasks. Consider the example of building a login dialog. At some point, there is likely to be some code which creates a variety of controls, and places them on the screen (e.g. an image control for the company logo, a text field to capture the user name, etc). In the method which does this, one may consider the process of creating the components a single task which has a number of subtasks: one for each component.
In both cases, the important consideration is how organizing the method into tasks and subtasks helps preserve unit economy. By creating tasks which have strong cohesion (i.e. you can name what that group of code does) and loose coupling (i.e. you can actually separate that group of lines from the others), you give the reader ready-made abstractions within your method. In the first example, the reader can readily tell that there’s a section for setting up the connection, and be able to mentally file that away as one unit without the need to remember each line of code in it. In the latter example, the reader can categorize the series of UI element creation subtasks as a single “build all the UI components” task, and again be able to abstract the entire thing away under a single unit. Even if there are a dozen or more individual components, it still can be considered a single task, that is, a single mental unit.
This ability to create abstractions within a single method is why there is no absolute “right” size for a method. Since grouping like things into tasks and subtasks preserves the reader’s (and author’s) unit economy, it is quite possible to have a method which is very long in absolute terms, and still quite comprehensible. It also implies that a fairly short method can be “too long” if it fails to provide this kind of mental structure. The proper length will always be determined by the amount of units (tasks) which one has to keep in mind, and the complexity of how those tasks are interrelated.
Before Object-Oriented programming (OOP), error conditions were commonly reported via a return code, an OS signal, or even by setting a global variable. One of the most useful notions introduced by OOP is that of an ‘exception’ because they drastically reduce the mental load of handling error cases.
In the old style of error handling, any time a function was called which could result in an error, the programmer had to remember to check whether an error occurred. If he forgot, the program would probably fail in some mysterious way at some point later down the line. There was no mechanism built into the language to aid the programmer in managing all the possible error conditions. This often meant that error handling was forgotten.
The situation is much improved with Exceptions, primarily because they offer a fool-proof way to ensure that error handling code is invoked (even if it is code for reporting an unhandled exception). This both makes it unnecessary to remember to check for errors, and it increases the cohesion of such code (i.e. it can be gathered into “catch” blocks instead of mixed in with the logic of the function). Both of these help preserve the unit economy of the author and reader of the code.
Unfortunately, despite being such a useful innovation, exceptions are often abused. We’ve all seen situations where one must catch three different exceptions and do the same thing for each. We’ve all seen situations where only a single exception is thrown no matter what goes wrong, and it doesn’t tell us anything about the problem. Both ends of the spectrum reflect a failure to use exceptions with the end user of the code in mind.
When throwing an exception, one should always keep two questions in mind: “Who is going to catch this?” and “What will they want to do with it?”. With this in mind, here are a number of best practices I’ve seen:
Each library should have a superclass for its exceptions.
Very frequently, users of a library aren’t going to be interested in what specific problem occured within the library; all they’re going to want to know is that the library either did or didn’t do its job. In the latter case, they will want the process of error handling to be as simple as possible. Having all exceptions in the library inherit from the same superclass makes that much easier.
Create a new subclass for each distinct outcome.
Most often, exception subclasses are created for each distinct problem which can arise. This makes a lot of sense to the author, but it usually doesn’t match what the user of the code needs. Instead of creating an exception subclass for each problem, create one for each possible solution. This may mean having exceptions to represent: permanent errors, temporary errors, errors in input, etc. Try to consider what possible users of the component will want to do with the exception, not what the problem originally was.
Remember that exceptions can hold data.
In most languages, exceptions are full-fledged classes, and your subclasses can extend them like any other parent class. This means that you can add your own data to them. Whether it is an error code for the specific problem, the name of the resource which was missing, or a localization key for the error message, including specific data in the exception object itself often is an invaluable means for communicating data which would otherwise be inaccessible from the ‘catch’ block where the exception is handled.
Exceptions should be self-describing in logs.
In most applications, when an exception is finally caught (i.e., not to be re-thrown or wrapped in another exception), it should be logged. The output produced should be as descriptive as possible, including:
This post is part of a Git 201 series on keeping your commit history clean. The series assumes some prior knowledge of Git, so you may want to start here if you’re new to git.
✧✧✧
The rebase tool in git is extremely powerful, and therefore also rather dangerous. In fact, I’ve known engineers (usually those new to git) who won’t touch it at all. I hope this will convince you that you really can use it safely, and it’s actually very beneficial.
What is rebase?
To start, I’ll very briefly touch on what a rebase actually is. So, very briefly then, a rebase is a way to rewrite the history of a branch. Let’s assume you’re starting out with a standard working branch from master:
At this point, let’s say you want to update your branch to contain the new commits on master (i.e., D and F), but you don’t want to create a merge commit in the middle of your work. You could rebase instead:
git rebase master
This command will rewrite your current branch as though it had originally been created starting from F (the tip of master). In order to that, though, it will need to re-create each of the commits on your branch (C, E, and G). Remember that a commit is the difference applied to some prior commit. In our example, C is the changes applied to B. E contains changes applied to B, and G contains changes applied to E. Rebasing will be that we need to change things around so that C actually has F as a parent instead of B.
The problem is that git can’t just change C’s parent because there’s no guarantee that the changes represented by C will result in the same codebase when applied to F instead of B. It might be that you’d wind up with some completely different code if you did that. So, git needs to figure out what result C creates, and then figure out what changes to apply to F in order to create the same result. That will yield a completely new commit which we’ll call CC. Since E was based upon C, which has been replaced, git will need to create a new commit using the same process, which we’ll call EE. And, since E has been removed, that means we’ll need to replace G with GG. Once all of the commits have been created, git moves the branch pointer to the end of the newest commit:
While all this seems complicated, it’s all hidden inside of git, and you don’t really have to deal with any of it. In the end, using rebase instead of merge just means changing a single command, and your commit history is simpler because it appears as though you created your branch from the right place all along. If you’d like a much fuller tutorial with loads of depth, I’d recommend you head over here.
Rebasing across repos
If you’re working on a branch which only exists locally, then rebasing is pretty straight-forward to work with. It’s really when you’re working across multiple clones of a repo (e.g., your local clone, and the one up on GitHub) that things become a little more complicated.
Let’s say you’ve been working on a branch for a while, and somewhere along the way, you pushed the branch back to the origin (e.g., GitHub). Later on, though, you decide you want to rebase to pick up some changes from master. That leaves you in the state we see in this diagram:
If you were to pull right now, git would freak out just a bit. Your local version of the branch seems to have three new commits on it (CC, EE, and GG) while it’s missing three others (C, E, and G). Then, when git checks for merge conflicts, there’s all sorts of things which seem to conflict (C conflicts with CC, E conflicts with EE, etc.). It’s a complete mess.
So, the normal thing to do here is to force git to push your local version of the branch back to origin:
git push -f
This is telling git to disregard any weirdness between the local version of the branch and origin’s version of the branch, and just make origin look like your local copy. If you’re the only one making changes to the branch, this works just fine. The origin gets your new branch, and you can move right along. But… what if you aren’t the only one making changes?
Where rebasing goes wrong
Imagine if someone else noticed your branch, and decided to help you out by fixing a bug. They clone the repository, checkout your branch, add a commit, and push. Now the origin has all your changes as well as the bug fix from your friend. Except, in the meantime, you decided to rebase. That would mean you’re in a situation like this:
Now you’re stuck. If you pull in order to get commit H, you’re going to have all sorts of nasty conflicts. However, if you force push your branch back to origin (to avoid the conflicts), you’re going to lose commit H since you’re telling git to disregard the version of the branch on origin. And, if your friend neglected to tell you about the bug fix, you might do exactly that and never even realize.
Solution 1: Communicate
The best way to fix the problem is to avoid it in the first place. Communicate clearly with your teammates that this branch is a working branch, and that they shouldn’t push commits onto it. It’s a good idea for teams to adopt some clear conventions around this to make this kind of mistake hard to make (e.g., any branch stating with a username should only be changed by that user, branches with “shared”, “team” or some other prefix are expected to have multiple contributors).
If you can’t be sure you’re only one working on a branch, the next best thing is, before starting the rebase, talk with anyone who might be working with the branch. Say that you’re going to rebase it, and what they should expect. If anyone speaks up that they’re working on changes to that branch, then you know to hold off.
Once everyone has pushed up any outstanding changes, pull down the latest version of the branch, rebase, and then push everything back up as soon as possible. That looks like this:
Once you’ve finished, you’ll want to tell the other people working on the branch that they need to get the fresh version of the branch for themselves. That looks like:
If you find yourself having just rebased and only then learn there are upstream changes you’re missing, the simplest way out of this difficulty is to simply ditch your rebase. Go back, and pull down the changes from the origin, and start over (after referring to solution 1). That would look something like this:
This will switch back to master (1), so that you can delete your local copy of the branch (2), and then grab the most recent version from the origin (3). Now, you can re-apply your rebase (4), and then push up the rebased branch before anyone else has a chance to mess things up again (5).
Solution 3: Start a new branch
If you find that you want to rebase right away, and don’t want to wait to coordinate with others who might be sharing your branch, a good plan is to isolate yourself from the potentially shared branch first, and then do your rebase.
git checkout mybranch
git checkout -b mybranch-2
At this point, you’ve got a brand new branch which only exists on your local machine, so no one else could possibly have done anything to it. That means you can go ahead and rebase all you like. When you push the branch back up to origin (e.g., GitHub), it will be the first time that particular branch has been pushed.
Of course, if someone else has added a commit to the old branch, it will still be stuck over there, and not on your new branch. If you want to to get their commit on your new branch, use git’s cherry pick feature:
git cherry-pick <hash>
This will create a new commit on your branch which will have the exact same effect on your branch as it did on the old one. Once you’ve rescued any errant commits, you can delete the old branch and continue from the new one.
✧✧✧
I’m hope this makes rebasing less scary, and helps you get a sense of when you’d use it and when not. And, of course, should things go wrong, I hope this gives you a good sense of how to recover.
Two last bits of advice… First, before rebasing, create a new branch from the head of the branch you’re going to rebase. That way, should things go completely wrong, you can just delete the rebased branch, and use your backup. And, finally, if you’re in the middle of a rebase which seems to be going a little nuts, you can always bail out by using:
Estimating most projects is necessarily an imprecise exercise. The goal of this post is to share some tools I’ve learned to remove those sources of error. Not all of those tools will apply to every project, though, so use this more as a reminder of things to consider when estimating, rather than a strict checklist of things you must do for every project. As always, you are the expert doing the estimating, so it is up to your own best judgement.
Break things into small pieces
When estimating, error is generally reduced by dividing tasks into more and smaller pieces of work. As the tasks get smaller, several beneficial things result:
Smaller tasks are generally better understood, and it is easier to compare the task to one of known duration (e.g., some prior piece of work).
The error on a smaller task is generally smaller than the error on a small task. That is, if you’re off by 50% on an 8 hour task, you’re off by 4 hours. If you’re off by 50% on an 8 day task, you’re off by 4 days.
You’re more likely to forget to account for some part of work in a longer task than a shorter one.
As a general rule, it’s a good idea to break a project down into tasks of less than 2 days duration, but your project may be different. Pick a standard which makes sense for the size of project and level of accuracy you need.
Count what can be counted
When estimating a large project, it is often the case that it is made up of many similar parts. Perhaps it’s an activity which is repeated a number of times, or perhaps there’s some symmetry to the overall structure of the thing being created. Whichever way, try to figure out if there’s something you already know which is countable, and then try to work out how much time each one requires. You may even be able to time yourself doing one of those repeated items so your estimate is that much more accurate.
Establish a range
When estimating individual tasks (i.e., those which can’t be further subdivided), it is often beneficial to start out by figuring out the range of possible durations. Start by asking yourself: “If everything went perfectly, what is the shortest time I could imagine this taking?” Then, turn it around: “If everything went completely pear-shaped, what shortest duration I’d be willing to bet my life on?” This gives you a best/worse-case scenario. Now, with all the ways it could go wrong in mind, make a guess about how long you really think it will take.
Get a second opinion
It’s often helpful to get multiple people to estimate the same project, but you can lose a lot of the value in doing so if the different people influence each other prematurely. To avoid that, consider using planning poker. With this technique, each estimator comes up with their own estimate without revealing it to the others. Then, once everyone is finished, they all compare estimates.
Naturally, there are going to be some differences from one person to the next. When these are small, taking an average of all the estimates is fine. However, when the differences are large, it’s often a sign that there’s some disagreement about the scope of the project, what work is required to complete it, or the risks involved in doing so. At this point, it’s good for everyone to talk about how they arrived at their own estimates, and then do another round of private estimates. The tendency is for the numbers to converge pretty rapidly with only a few rounds.
Perform a reality check
Oftentimes, one is asked to estimate a project which is at least similar to a project one has already completed. However, when coming up with a quick estimate, it’s easy to just trust to one’s intuition about how long things will take rather than really examining specific knowledge of particular past projects to see what you can learn. Here’s a set of questions you can ask yourself to try to dredge up that knowledge:
The last time you did this, how long was it from when you started to when you actually moved on to another project?
What is the riskiest part of this project? What is the worst-case scenario for how long that might take?
The last time you did this, what parts took longer than expected?
The last time you did this, what did you forget to include in your estimate?
How many times have you done this before? How much “learning time” will you need this time around?
Do already you have all the tools you need to start? Do you already know how to use them all?
There are loads of other questions you might ask yourself along these lines, and the really good ones will be those which force you to remember why that similar project you’re thinking of was harder / took longer / was more expensive than you expected it to be.
Create an estimation checklist
If you are planning to do a lot of estimating, it can be immensely helpful to cultivate an estimation checklist. This is a list of all the “parts” of the projects you’ve done before. Naturally, this will vary considerably from one kind of project to the next, and not every item in the checklist will apply to every new project, but they can be immensely valuable in helping you not forget things. In my personal experience, I’ve seen more projects be late from the things which were never in the plan, than from things which took longer than expected.
✧✧✧
Estimation is super hard, and there’s really no getting around that. You’re always going to have some error bars around your estimates, and, depending upon the part of the project you’re estimating, perhaps some considerably large ones. Fortunately, a lot of people have been thinking about this for a long while, and there are a lot tricks you can use, and a lot of books on the subject you can read, if you’d like to get better. Here’s one I found particularly useful which describes a lot of what I’ve just talked about, and more:
This post is part of a Git 201 series on keeping your commit history clean. The series assumes some prior knowledge of Git, so you may want to start here if you’re new to git.
✧✧✧
As I work on a branch, adding commit after commit, I’ll sometimes find that there’s a piece of work I forgot to do which really should have been part of some prior commit. The first thing to do—before fixing anything—is to save the work I’m doing. This might mean using a work-in-progress commit as described previously, or simply using the stash. From there, I have a few options, but in this post I’m going to focus on using the “fixup” command within git’s interactive rebase tool.
The essence of the process is to make a new commit, and then use git to combine it with the commit I want to fix. The first step is to make my fix, and commit it. The message here really doesn’t matter since it’s going to get replaced anyway.
> git add -A
> git commit -m "fix errors"
This will add a new commit to the end of the branch, but that’s not actually what I wanted. Now I need to squash it into the older commit I want to fix. To do this, I’m going to use an “interactive rebase” command:
> git rebase -i master
This is telling git that I want to edit the history of commits back to where my branch diverged from master (if you originally created your branch from somewhere else, you’ll want to specify that instead). In response to this request, git is going to create a temporary file on disk somewhere and open up my editor (the same one used for commit messages) with that file loaded. It will wind up looking something like this:
pick 7e70c43 Add Chicken Tikka Masala
pick e8cc090 Remove low-carb flag from BBQ ribs
pick 9ade3d6 Fix spelling error in BBQ ribs
pick b857991 fix errors
# Rebase 1222f97..b857991 onto 1222f97 ( 5 TODO item(s))
#
# Commands:
# p, pick = use commit
# r, reword = use commit, but edit the commit message
# e, edit = use commit, but stop for amending
# s, squash = use commit, but meld into previous commit
# f, fixup = like "squash", but discard this commit's log message
# x, exec = run command (the rest of the line) using shell
#
# These lines can be re-ordered; they are executed from top to
# bottom.
#
# If you remove a line here THAT COMMIT WILL BE LOST.
#
# However, if you remove everything, the rebase will be aborted.
#
# Note that empty commits are commented out
All of these commits are those which I’ve made since I cut my branch: ordered from oldest (on the top) to the newest (on the bottom). The commented out parts give the instructions for what you can do in this “interactive” portion of the rebase. Assuming that the fix is for the Chicken Tikka Masala recipe, I’d want to edit the file to look like this:
pick 7e70c43 Add Chicken Tikka Masala
fixup b857991 fix errors
pick e8cc090 Remove low-carb flag from BBQ ribs
pick 9ade3d6 Fix spelling error in BBQ ribs
When I save the file and quit my editor, git is going to rebuild the branch from scratch according to these instructions. The first line tells git to simply keep commit 7e70c43 as-is. The next line tells git to remove the prior commit, and replace it with one which is the combination of the prior commit and my fix-up commit, b857991. The other two commands tell git to create two new commits which result in the same end state as each of the old commits, e8cc090 and 9ad3d6.
As a bit of an aside… Why does git have to create new commits for the last two commands? Remember that commits are immutable once created, and that part of the data which makes up the commit is the parent commit it was created from. Since I’ve asked git to replace the parent commit, it will now need to create new commits for everything which follows on that same branch since each one now has to have a new parent: all the way back to the commit I replaced.
At the end of all this, if we were to inspect the log, we’d see:
> git log --oneline master..
6a829bc3 Add Chicken Tikka Masala
29dd3231 Remove low-carb flag from BBQ ribs
0efc5692 Fix spelling error in BBQ ribs
In essence, everything appears to be the same, except that the original commit includes my fix. However, looking a little closer, you can see that each commit has a different hash. The first one is different because I modified the “diff” portion of the commit (i.e., I added in my fix) and therefore a new commit was needed (commits are immutable, so adding the fix required making a new one). The other two needed to be re-created because their parent commit disappeared, and therefore new commits were needed to preserve the effect of those two commits, starting from my fixed-up commit as the new parent.
✧✧✧
There is one caveat I have to warn you about when using this method. Any time you rebase a branch, you’ve changed the commit history of the branch. That is, you’ve thrown out a whole bunch of commits and replaced them with a completely new set. This is going to mean loads of difficulties and possibly lost work if you’re sharing that same branch with other people, so only use this technique on your own private working branches!
This series is going to get very technical. If you’re not already familiar with git, you may want to start over here instead.
✧✧✧
Most people I’ve worked with treat git somewhat like they might treat an armed bomb with a faulty count-down timer. They generally want to stay away from it, and if forced to interact with it, they only do exactly what some expert told them to do. What I hope to accomplish here is to convince you that git is really more like an editor for the history of your code, and not a dark god requiring specific incantations to keep it from eating your code.
As I’m thinking of this as a Git 201 series, I’m going to assume you are familiar with all bunch of 101-level terms and concepts: repository, index, commit, branch, merge, remote, origin, stash. If you aren’t follow this link to some 101-level content. Come on back when you’re comfortable with all that. I’m also going to assume you’ve already read my last post about creating a solid commit.
As you might expect, there’s a lot to be said on this topic, so I’m going to break this up into multiple commits… err posts. To make things easy, I’m going to keep updating the list at the end of this post as I add new ones in the series. So, feel free to jump around between each of them, or just read them straight through (each will link to the next post, and back here).
This post is part of a Git 201 series on keeping your commit history clean. The series assumes some prior knowledge of Git, so you may want to start here if you’re new to git.
✧✧✧
My work day doesn’t naturally divide itself cleanly into commits. I’m often interrupted to work on something new when I’m half-way done with my current task, or I just need to head home in the evening without quite having finished that few feature. This post talks about one technique I use in those situations.
To rewind a bit though, when I’m starting a new piece of work (whether a new feature, bug fix, etc.), I first check out the source branch (almost always master), and run:
> git pull --rebase
This grabs the latest changes from the origin without adding an unnecessary merge commit. I then create myself a new branch which shows my username along with a brief reminder of what the change is for:
> git checkout -b aminer/performance-tuning
At this point, I’m ready to start working. So, I’ll crack open my favorite editor and make my changes. As I’m going along, I’ll often find that I need to stop mid-thought and move on to a different task (e.g., fix something else, go home for the day, etc.). It may also be that I’ve arrived at a point where I’ve just figured out something tricky, but I’m not through with the overall change yet. So, I’ll create a work-in-progress commit:
> git add -A > git commit -n -m wip
This is really more of a “checkpoint” than an actual commit since I very likely haven’t fully finished anything, probably don’t have tests, etc. The -n here tells git not to run tests or linters. This is fine for now because this isn’t a “finished” commit, and I’m going to get rid of it later. For the time-being, though, I keep on working. If I reach another checkpoint before I’m ready to make a real commit, I’ll just add on to the checkpoint commit:
> git add -A > git commit -n --amend
The first command pulls all my new work into the index while the second removes the prior work-in-progress commit, and replaces it with a new one which contains both the already-committed changes along with my new ones.
When I’ve actually finished a complete unit of work, I remove the work-in-progress commit:
> git reset HEAD~1
This will take all the changes which were in the prior commit, and add them back to my working directory as though they were there all along. The commit history on my branch looks just as though the commit had never been there.