Getting the most from your log file

Think back on the last time you urgently needed to figure out what a live, production piece of software was doing. It’s very likely that nearly the first thing you did was to pull up it’s log. It’s also very likely that the majority of the content was either useless, unintelligible, or flat-out misleading.

The reason most logs fall into this state is that they are a piece-meal accumulation of statements which were thrown in over time. This is a problem because the only possible reason to waste the computer’s resources producing them is to be as correct, intelligible, and useful as possible. In short, log files should be explicitly written for consumption by people.

As with anything else, clear writing requires the author to carefully select the content, and to create it with the reader in mind. In the case of log files, we know other engineers (or scripts they write) are going to read them, and they are going to want to know how the server has been behaving. Moreover, no one reads log files unless something is going wrong. Therefore, log files should be written to make it as easy as possible to debug why a server isn’t behaving as it should. To that end, here are a few specific guidelines I’ve found most useful:

Make it easy to track down errors

It should go without saying, but it is imperative to make it as easy as possible to identify where – in the code – an error occurred. In Java this may be printing out the stack trace; in C/C++/Objective-C, this may be using the __FILE__ and __LINE__ macros. Most languages have some similar feature. Along these same lines, try to avoid having logging statements which look too much alike. This makes it easy to misinterpret the log, and completely identical statements make it impossible to know exactly where an error occurred.

Log all inputs and outputs

The fastest way to track down the reasons for strange output is to know whether the input was strange, too. This is especially true for a server (i.e. a malformed request), but it applies to other programs as well, provided you have a sufficiently broad view of “inputs” and “outputs”.

I consider an input/output to be any data which enters (or leaves) the program’s runtime context. This includes: user input (mouse/keyboard events and the like), files read/written to disk, data sent/received on a socket, or requests/results from a database query. I recommend at least logging something every time such an input or output occurs, and if reasonable, the actual content itself.

Logging inputs and outputs provides a number of major benefits. First, you know exactly what the system was asked to do. This allows you to immediately know whether a problem is in your program or outside of it. Second, it gives you a pretty clear indication of what will reproduce a problem (if there is one). Finally, it allows you to report errors to the owners of the “other” system in case you received an unexpected input.

Record how long it takes

Errors are frequently just that a system is taking too long to respond. Being able to quantify the time required for each request is an extremely valuable way to isolate the problem. However, it is not sufficient to simply record how long your system took. You must also record timings for each request you make outside of your application’s runtime context. That way, you’ll know right away if your program appears slow because it’s waiting for something else: whether it’s a web service call or reading a file from disk.

Don’t log the expected case

If something is supposed to happen, you generally don’t need to know about in the log (unless it’s one of the things mentioned above). Bear in mind that no one is interested in reading a program’s log unless the program is misbehaving. At that point, the person wants to read as little of the log as possible. Including a lot of “business as usual” statements in the log only makes the process more difficult.

Use log levels to indicate action required

Since a log file’s purpose is to assist in figuring out why a program isn’t running correctly, it’s extremely valuable to use log levels to indicate how “badly” things are going. I recommend thinking of the action required as a guide to what log level to use:

  • FATAL: page someone immediately
  • ERROR: send an email immediately
  • WARN: send in a daily report
  • INFO: include for later debugging purposes

Depending upon your logging system, the various levels often have different names, but these should map to whatever you use. Even if you don’t actually have an automated system to page people based upon your log file, it’s extremely clarifying to think about each level in these terms.

Make it easy to parse

Finally, bear in mind that most uses of a log file involve searching or parsing them using automated tools. Be sure to include standard fields in each line so that these tools can easily extract information from them. Here are some examples of what fields you may want to include:

  • time (down to milliseconds and time zone)
  • the number of the line in the log statement (to ensure each line is unique)
  • the current thread
  • the current user ID / IP address / other client-related identifier
  • the log level
  • the file/class

There’s a lot more to be said on logging, but if every log I’ve read implemented these few principles, my life would have been a lot easier. If there’s some piece of wisdom that’s too good to be missed, feel free to add it in the comments!

The Right Size for a Method

One occasionally hears of some programming zealot who swears up and down that methods should be kept to n lines or less. The actual value may vary, but the arbitrary nature of the claim remains. However, there is a kernel of truth there, and it has to do with preserving unit economy.

We all know that the longer a method is, the more we have to keep in our minds to understand it. There are likely to be more local variables, more conditional statements, more exceptions caught and thrown, and more side-effects of all those lines of code. Furthermore, the problem grows faster and faster as there are more lines of code since they all potentially can have an impact on one another. Keeping methods short has a disproportionately large benefit.

Of course, claiming that there’s some absolute “correct” number is clearly nonsensical. The same number of lines in C, Lisp, Java, Assembler, or Ruby will accomplish radically different things, and even how one legitimately counts lines will change dramatically. What does not change, though, is the need for the reader (and author) of the code to understand it as a whole. To this end, one should strive to keep the number of discrete tasks a method accomplishes to within the range of what people generally can remember at once: between one and six.

Each task within a method may have several lines of code of its own; how many tends to vary widely. Consider the process of reading a series of rows from a database. There may be a task to establish a database connection, another to create the query, another to read the values, and perhaps one more to close everything down. Each of these may be composed of anywhere from one to many lines of code.

Tasks may even have subtasks. Consider the example of building a login dialog. At some point, there is likely to be some code which creates a variety of controls, and places them on the screen (e.g. an image control for the company logo, a text field to capture the user name, etc). In the method which does this, one may consider the process of creating the components a single task which has a number of subtasks: one for each component.

In both cases, the important consideration is how organizing the method into tasks and subtasks helps preserve unit economy. By creating tasks which have strong cohesion (i.e. you can name what that group of code does) and loose coupling (i.e. you can actually separate that group of lines from the others), you give the reader ready-made abstractions within your method. In the first example, the reader can readily tell that there’s a section for setting up the connection, and be able to mentally file that away as one unit without the need to remember each line of code in it. In the latter example, the reader can categorize the series of UI element creation subtasks as a single “build all the UI components” task, and again be able to abstract the entire thing away under a single unit. Even if there are a dozen or more individual components, it still can be considered a single task, that is, a single mental unit.

This ability to create abstractions within a single method is why there is no absolute “right” size for a method. Since grouping like things into tasks and subtasks preserves the reader’s (and author’s) unit economy, it is quite possible to have a method which is very long in absolute terms, and still quite comprehensible. It also implies that a fairly short method can be “too long” if it fails to provide this kind of mental structure. The proper length will always be determined by the amount of units (tasks) which one has to keep in mind, and the complexity of how those tasks are interrelated.

Response to “Diversity and Justice”

A day or so after publishing Diversity and Justice, I received the following comment from a female acquaintance:

I appreciate you offering this patch! Some code comments from a more senior developer in the area:


Good start with “there are traits which matter and traits which don’t”! Your error is that you’re assuming the only variables that matter are the ones in your Engineering function, but you’re using classes that are defined globally. Banana-gorilla-jungle stuff. 


That’s why you’ve got the common misconception “since ‘female’ doesn’t directly come into play in ‘Engineer’, an object descended from both ‘female’ and ‘Engineer’ is the same as an object descended from both ‘male’ and ‘Engineer’.” In fact, because female is a trait that matters globally, anyone in the female Engineer group has had to pass through a filtering function “stand_up_to_assumption_of_incompetence” that requires that engineer be confident in their knowledge, or patient, or stubborn, or lucky, or sociopathic, or possessing some other extra trait; most of which are favorable for Engineering prowess.

If you have two pools of engineers and one is pre-screened, it makes logical sense to exhaust the pre-screened pool before turning to the random pool!

Also, most people who work in this field use a fork of the Justice framework that comes with a bunch of open source tools specifically for working with these filtering functions; like Feminism or Anti-Racism.

Again, thanks for weighing in on the project!

To be honest, my first reaction was to feel annoyed and patronized. I was tempted to ignore the comment and just move along with my day. But I couldn’t, in good conscious, let it go so easily.

First, while my first reaction was to view this as a “negative” comment, I’m not 100% sure that was the intention. I’m perfectly willing to accept a well-reasoned and friendly rebuttal to my ideas. In fact, that’s one of the reasons I publish things publicly, instead of just keeping them to myself. However, I’m a lot less interested in debating a point with someone who is already hostile. But, knowing the person who posted the comment (I attended her wedding!), and being good friends with her husband, I am much more willing to give her the benefit of the doubt as to her intentions.

Second, if you look past the tone of the message to its actual argument, I think there’s validity to her point. That is, any woman who is still in the software field after any length of time has had to put up with a certain amount of bullshit which gives her a certain talent and toughness above the average candidate. That being the case, you could consider female candidates even more attractive than the average male candidate. I think there’s some truth there, and I was even considering various similar ideas when I wrote the original article. But, to keep things concise, I decided to exclude them.

However, the reason I ultimately decided to respond with a new post is because I think this comment actually demonstrates the third part of the trap, as I outlined in my original post:

Finally, it’s a trap for everyone trying to correct the problem. By framing the situation as one group oppressing the other, it creates a powerful “us vs. them” mentality which is hard to overcome. People in the minority group are tempted to identify all people in the majority group as oppressors. Allies in the majority group feel unfairly accused, and are less likely to want to remain allies. Progress in correcting the situation is impeded to no one’s benefit.

Andrew Miner

Whether it was the commenter’s intention or not, my reaction to the comment was exactly what I described: an ally who was being unfairly attacked. To put it more bluntly, I felt I was effectively being told: “Get out of here; you don’t know what you’re talking about.” with a little pat on the head for at least trying not to be a sexist asshole.

✧✧✧

While I wish the comment were more unambiguously friendly in tone, I appreciate the time my acquaintance took to write it, and I think she makes a good point. I would prefer, though, if we all were to follow Lori Lakin Hutcherson’s admirable example when she (a black woman) responded to her friend (a white man) who was struggling with the idea of “white privilege”:

I truly thank you for wanting to understand what you are having a hard time understanding… because I realized many of my friends—especially the white ones—have no idea what I’ve experienced/dealt with…

Lori Lakin Hutcherson

Lori embraced her friend’s attempt to figure things out and be an ally. She was willing to accept his professed good intention (despite his ignorance), and help him understand her own point of view in a friendly, sincere, and very powerfully honest way. I think we should all aspire to follow her example.

Diversity and Justice

I have been thinking about diversity a good deal lately. In fact, at work, I just attended the first meeting of a group to support and promote women in the workplace. I fully support the effort, but it got me thinking… what is the deeper issue here?

At my current employer, we’re trying to build airplanes. That requires considerable diversity: most especially in talent and experience. We need aerodynamicists, structural analysts, inlet designers, electrical engineers, program managers… on and on. We also need people at all levels of experience in their professions. But do we need people with blue eyes? Honestly… does eye color matter when building an airplane? Couldn’t we build a plane without anyone who has blue eyes? What about people with freckles? Do we actually need people to have uteruses? What about dark skin? Do we really need to have a man on staff who’s attracted to other men?

For those first few categories, I can hear you thinking… “Well, I guess not… but, what difference does it make?” But, by the end of that list, I would hope that you were starting to get offended. I think this is the really important point. While it’s likely that you could build an airplane without people having a certain eye color, complexion, gender, race, or sexuality, we should still feel outraged to exclude people based on those traits.

I think the deeper issue here is one of justice. When trying to assemble a group of people for whatever purpose, there are traits which matter and traits which don’t: based upon the purpose at hand. It is manifestly unjust, therefore, to exclude someone based on inessential traits: most especially when they do have the traits necessary for the endeavor.

I think focusing on diversity of race, gender, etc. instead of justice is a trap which bites in three ways.

First, it’s a trap for the company. Imagine a field where 80% of its members are men, and 20% women. Assuming an even distribution of talent, if you were trying to hire the top 10 people in the field, you’d get 8 men, and 2 women. However, if you are obliged to hire 50/50, you’d get 5 men and 2 women from the group of the 10 best people in the field, and then have to hire 3 more women who were not in the top 10. This is clearly not the best outcome for the company.

Second, it’s a trap for the people being hired. First, there are the 3 men who were excluded. It’s unfair, and we definitely sympathize with them. However, it’s the women who actually face the nastier part of the trap. First, the women who were hired won’t know whether they were hired for their talent, or to fill a quota. This is demoralizing all on its own. Even worse, no one else in the company will know either. One can easily imagine the nasty bias and mistreatment which will ensue from people who were already skeptical. With this combination of factors, a reasonable person would feel extremely unwelcome and move along rather quickly.

Finally, it’s a trap for everyone trying to correct the problem. My framing the situation as one group oppressing the other, it creates a powerful “us vs. them” mentality which is hard to overcome. People in the minority group are tempted to identify all people in the majority group as oppressors. Allies in the majority group feel unfairly accused, and are less likely to want to remain allies. Progress in correcting the situation is impeded to no one’s benefit.

I think the better approach is to focus on eliminating injustice: for any group which experiences it, and in whatever form it takes.

This approach requires and supports a great many of the actions one would take with a pro-diversity agenda. Unbalanced representation of women speakers in a conference you’re planning? It is just and pro-diversity to look a little harder to find more qualified speakers who happen to be female. Engineering staff looking a little too white? It is both just and pro-diversity to set up a recruiting trip to universities whose color balance isn’t so pale to hire talented newcomers who might face prejudice elsewhere. Notice though, that it is only just in that you’re selecting people who are qualified through their skills, talents, and accomplishments: not because of some inessential trait. The difference is that you’re putting in the extra effort required to avoid and/or unwind unjust biases against those people to promote justice towards them.

Paradoxically, I think this approach is also more inclusive. Promoting justice is a cause which anyone can support, and for any other person. Instead of creating innumerable “us vs. them” pairings, holding justice as the primary creates only one: those in favor of justice vs those who are not.

Using Exceptions Robustly

Before Object-Oriented programming (OOP), error conditions were commonly reported via a return code, an OS signal, or even by setting a global variable. One of the most useful notions introduced by OOP is that of an ‘exception’ because they drastically reduce the mental load of handling error cases.

In the old style of error handling, any time a function was called which could result in an error, the programmer had to remember to check whether an error occurred. If he forgot, the program would probably fail in some mysterious way at some point later down the line. There was no mechanism built into the language to aid the programmer in managing all the possible error conditions. This often meant that error handling was forgotten.

The situation is much improved with Exceptions, primarily because they offer a fool-proof way to ensure that error handling code is invoked (even if it is code for reporting an unhandled exception). This both makes it unnecessary to remember to check for errors, and it increases the cohesion of such code (i.e. it can be gathered into “catch” blocks instead of mixed in with the logic of the function). Both of these help preserve the unit economy of the author and reader of the code.

Unfortunately, despite being such a useful innovation, exceptions are often abused. We’ve all seen situations where one must catch three different exceptions and do the same thing for each. We’ve all seen situations where only a single exception is thrown no matter what goes wrong, and it doesn’t tell us anything about the problem. Both ends of the spectrum reflect a failure to use exceptions with the end user of the code in mind.

When throwing an exception, one should always keep two questions in mind: “Who is going to catch this?” and “What will they want to do with it?”. With this in mind, here are a number of best practices I’ve seen:

Each library should have a superclass for its exceptions.

Very frequently, users of a library aren’t going to be interested in what specific problem occured within the library; all they’re going to want to know is that the library either did or didn’t do its job. In the latter case, they will want the process of error handling to be as simple as possible. Having all exceptions in the library inherit from the same superclass makes that much easier.

Create a new subclass for each distinct outcome.

Most often, exception subclasses are created for each distinct problem which can arise. This makes a lot of sense to the author, but it usually doesn’t match what the user of the code needs. Instead of creating an exception subclass for each problem, create one for each possible solution. This may mean having exceptions to represent: permanent errors, temporary errors, errors in input, etc. Try to consider what possible users of the component will want to do with the exception, not what the problem originally was.

Remember that exceptions can hold data.

In most languages, exceptions are full-fledged classes, and your subclasses can extend them like any other parent class. This means that you can add your own data to them. Whether it is an error code for the specific problem, the name of the resource which was missing, or a localization key for the error message, including specific data in the exception object itself often is an invaluable means for communicating data which would otherwise be inaccessible from the ‘catch’ block where the exception is handled.

Exceptions should be self-describing in logs.

In most applications, when an exception is finally caught (i.e., not to be re-thrown or wrapped in another exception), it should be logged. The output produced should be as descriptive as possible, including:

  • a plain-English description of what happened
  • the state of any relevant variables in play
  • a full stack trace of where the error occurred