ABC - Always Be Coding: July 2011

I recently rewrote some code that calculates the distance between 2 latitude/longitude coordinates. I was using a method that used Pythagoras, which assumes Earth is flat. Apparently this works well for distances under 20 km, but my app needs to deal with all of Ontario, which is larger than 20 km. I came across the haversine formula, which assumes a spherical Earth. Still not 100% accurate (the Earth is an ellipsoid, and has mountains and valleys) but much closer.
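A flat-Earth calculation of this kind might look like the sketch below. It is not the original code from this post, which isn't shown. The function name, the mean Earth radius of 6371 km and the cos(latitude) scaling of the east-west leg are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius, not a value from the post


def flat_distance_km(lat1, lon1, lat2, lon2):
    """Pythagorean distance between two points, treating the area as flat.

    Inputs are in degrees. The east-west leg is scaled by the cosine of the
    mean latitude because meridians converge toward the poles.
    """
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)  # east-west leg (radians)
    dy = math.radians(lat2 - lat1)                       # north-south leg (radians)
    return EARTH_RADIUS_KM * math.hypot(dx, dy)


# Usage: two points one degree of latitude apart on the same meridian.
print(flat_distance_km(43.0, -79.0, 44.0, -79.0))
```

The two legs are treated as the sides of a right triangle, which is reasonable over a small patch of ground but ignores the curvature that matters across a whole province.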

Some of the example implementations I found in JavaScript (about 1 screen down) and C# were oddly similar. Same variable names, no code comments that indicate an understanding of the algorithm.

I sat with the Wikipedia article open in one window and the JavaScript version in the other until I understood what was going on. (Basically, haversin(x) = (sin(x/2))^2 so you can use a language’s sin function to calculate it.) Though, I still don’t understand why atan2 is used to calculate the inverse of haversin and not arcsin (or even what atan2 does). My inverse implementation uses arcsin, but I wrote a parallel one that uses atan2 and gives the same result for the few unit tests I wrote. Given that these are trig functions I’m pretty confident this isn’t a coincidence.
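As a minimal sketch of that idea (again not the post's actual implementation), here is the haversine distance in Python with both inverse variants side by side. The names `haversin` and `haversine_km`, the `use_atan2` flag and the 6371 km radius are assumptions for illustration. The Toronto and Ottawa coordinates are approximate.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversin(x):
    """haversin(x) = sin^2(x / 2), with x in radians."""
    return math.sin(x / 2) ** 2


def haversine_km(lat1, lon1, lat2, lon2, use_atan2=False):
    """Great-circle distance on a sphere; inputs are in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlam = math.radians(lon2 - lon1)
    h = haversin(phi2 - phi1) + math.cos(phi1) * math.cos(phi2) * haversin(dlam)
    h = min(1.0, max(0.0, h))  # guard against rounding pushing h outside [0, 1]
    if use_atan2:
        # Inverse haversine written with atan2, as in the examples found online.
        angle = 2 * math.atan2(math.sqrt(h), math.sqrt(1 - h))
    else:
        # Inverse haversine written with arcsin.
        angle = 2 * math.asin(math.sqrt(h))
    return EARTH_RADIUS_KM * angle


# Usage: Toronto to Ottawa (approximate coordinates), both variants.
print(haversine_km(43.6532, -79.3832, 45.4215, -75.6972))
print(haversine_km(43.6532, -79.3832, 45.4215, -75.6972, use_atan2=True))
```

Both calls should agree to within floating-point rounding, which is the behaviour the unit tests mentioned above were checking.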

I was happy with my implementation, but a few things still bothered me. What’s the relationship between atan2 and arcsin? What’s the derivation of the haversine formula? It should be high school or first year trig, but I don’t get it. I could probably keep Googling and refresh my knowledge at Khan Academy, but I’ve spent enough time for now. I have a house and family to tend to too. (This was a personal project so no employer time was harmed in the writing of this article.)

But it got me wondering about the thinking behind the C# example I found vs. what I did. I think you could tell the difference between 4 types of programmers with this interview question:

You’re writing an app that needs a mathematical algorithm. Nothing too fancy, but the math is just a little beyond you because it’s been 10 to 15 years since you really understood trig/calculus/algebra/stats. You Google for it and find the following:

Info on the theory behind the algorithm on, say, Wikipedia
Some code example in your programming language on various websites

How do you proceed?

This shouldn't be presented as multiple choice. You’re looking for answers of the following types:

A. Copy the code examples into your code and move on.
B. Use the code examples as inspiration for your own version, perhaps with a little refactoring and variable renaming, and move on.
C. Use the Wikipedia article to write your own implementation from scratch, adding comments to document what each complex bit does, and move on.
D. Do C and spend some time trying to understand how the algorithm was derived and why it works.

If you answered A you now have 2 problems. You now have code that you can’t support because you don’t understand it, and it’s probably a copyright violation. SCO will find you. Next applicant please.

If you answered B you’ve avoided the copyright problem but you still don’t understand the algorithm. Junior programmer material.

If you answered C, nice work. You have code that you understand and that others can understand. It’s a well-known algorithm so it’s not vital that you can explain why it works. Just keep the URLs to your research handy, or add them to your comments for justification. Welcome, new senior developer.

If you answered D there is a follow-up question: how much time did you spend? An hour or two, or did you get lost down a rabbit hole and spend a whole afternoon (as I did)? If you spent just a couple of hours you get the team lead job, whether you stopped because you figured things out or you knew enough to not spend any more time. If you spent a whole afternoon, you still get the senior developer position, but you need to learn some hardcore time management skills before advancing.

Diving deep into a problem is pretty typical of good coders but the very best keep this tendency in check. This is where a personal system of organization can save you from rabbit holes. Or at least let you get your code out the door and look into the lingering details at an appropriate time.

PS:

I now understand the relationship between arcsin and atan2, and why the JavaScript examples are written the way they are.

First, the inverse of haversin(x) is

haversin^-1(x) = 2sin^-1(√x)

Second, the relationship between arcsin and arctan is

sin^-1(x) = tan^-1(x/√(1-x²))

So,

haversin^-1(x)= 2tan^-1(√x/√(1-x))

When its second argument is positive, the atan2(y, x) function in most languages gives you (here x and y are atan2’s own arguments, not the x above)

tan^-1(y/x)

So,

haversin^-1(x) = 2*atan2(Math.Sqrt(x), Math.Sqrt(1-x))

I suppose this is a handy definition for those languages that don’t have an inverse sine function but do have inverse tan.
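The chain of identities above can be checked numerically. The sketch below is in Python and its three function names are hypothetical, chosen only for this illustration. It also shows one practical difference: the plain arctan form divides by zero at x = 1 (antipodal points), while the two-argument atan2 form handles that case.

```python
import math


def inv_haversin_asin(x):
    """Inverse haversine via arcsin: 2 * asin(sqrt(x))."""
    return 2 * math.asin(math.sqrt(x))


def inv_haversin_atan(x):
    """Inverse haversine via one-argument arctan; undefined at x == 1."""
    return 2 * math.atan(math.sqrt(x) / math.sqrt(1 - x))


def inv_haversin_atan2(x):
    """Inverse haversine via two-argument atan2; also valid at x == 1."""
    return 2 * math.atan2(math.sqrt(x), math.sqrt(1 - x))


# Compare the three forms on a grid over [0, 1).
for i in range(100):
    x = i / 100
    a = inv_haversin_asin(x)
    assert math.isclose(a, inv_haversin_atan(x), rel_tol=1e-9, abs_tol=1e-12)
    assert math.isclose(a, inv_haversin_atan2(x), rel_tol=1e-9, abs_tol=1e-12)

# At x == 1 the atan2 form still works, where the plain arctan form would
# raise ZeroDivisionError in Python.
print(inv_haversin_atan2(1.0), inv_haversin_asin(1.0))
```

This works because atan2 takes the numerator and denominator separately, so a zero denominator is treated as a right angle and nothing is divided.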

I still don’t understand why the haversine function works, but that’s fine. I’m now confident my implementation that uses inverse sine is equivalent and I’m moving on.

All project documentation can be given one of the following classifications:

Bridesmaid dress
Christmas tree
Monument

The first 2 were inspired by a line from Fight Club (about 1/4 of the way down). Here’s how to tell which type you’re working on, with examples and how much time you should spend on each.

Bridesmaid Dress

A bridesmaid dress is important only to the bride, costs serious money and is thrown away after 1 wearing. A bridesmaid dress document is just as expensive to create as any other document, exists solely to fulfill a process requirement, and is read only once, if at all.

Like any other project document someone has to spend time filling in the control sections (e.g. version history, links to other docs, approver list, distribution list), in addition to crafting the real content. The control sections take longer to fill out than the real content. The content usually comes from a source to which project members already have access and could look up themselves if they really cared. Project members must attend a meeting to review it. Real time is spent, not just 1 person’s afternoon.

It is important only to the project manager because she is the one responsible for championing the process. Mind you, it’s not her fault that it has to be completed, she’s just doing her job. No one else really cares about it, though.

Only a few lines are relevant; many other pages won’t be read. Even the relevant sections will be read only once, and only by the people for whom it really is relevant. It might require 5 people to sign off, but only 1 will actually care about the contents. Bridesmaid dress documents could easily be replaced with a maximum 10 line email, but no one creates template emails for projects. Only a 10+ page Word template with a ton of control content will do.

An example is a test stage exit report. Only the page with the defect status count summary is read by those interested.

As little time as possible should be spent crafting such a doc. If you are starting from a template leave as many sections “N/A” as you can. Only fill them out if someone asks, and even then with the bare minimum.

Christmas Tree

A Christmas tree costs time and money to assemble. But the cost is borne because it is vital to those celebrating Christmas, and everyone benefits. It’s enjoyed for weeks at a time, but after Christmas is thrown away or put back in the box for another year.

As with any project documentation, it costs real time and money to produce. The template has all the same control sections as any other doc, like the bridesmaid dress doc, but the content takes much longer to produce than the control sections. Team members review it, make changes that make it better and really read it before signing off. People spend serious time on the doc but it is well justified.

A document of this type is a real necessity for the project. Without it you couldn’t have the project. People look at multiple sections multiple times throughout the duration of a project because it has useful, important information that project members must reference. This doc is the first time in which this content appears – it’s not just a summary of other content. The information feeds people’s efforts. But its context is still limited to the project itself.

At the end of the project, though, the document itself is forgotten. The content itself should be incorporated into a monument document. Failure to do so will cause many future headaches because information is spread across multiple documents that no one can find.

An example is project business requirements for an application or system. There’s nothing for a project to do without requirements. They’re specific to the project, though, so they only express the changes, not the whole application or system. By themselves they’re pretty useless to other projects, especially after the same requirement has been changed in a few projects. They need to be added to a monument document to make sense across projects.

It’s totally appropriate to spend time on a Christmas tree document. Everyone must remember, though, that it will be tossed aside or put away after the project and that the content must be put into a monument document to stay relevant. Link to as many other project and monument documents as possible – don’t copy and paste content. This is a waste of your readers’ time, and you’ll just have to spend time updating your copy when the source content changes.

Monument

A monument costs a lot of money, is relevant to many people and stays in use for a long time. Think war memorials, gravestones, the pyramids.

A monument document template has all the control sections of the other two, but they are a tiny fraction of the total content. The cost of producing the information may have been paid in previous projects. Indeed, if you’re doing things right the content should already exist in Christmas tree documents, short of a complete rewrite of an existing monument document. You’ll pay some time incorporating and reviewing new information, especially if the new info supersedes old info. It’s well worth the cost, though, because people will be able to reference this one true source of information years later.

And people will reference it. This will be the authoritative source of knowledge about what a system should do, how it does it, how to test it or how to use it. It will be your starting point for creating project Christmas trees and bridesmaid dresses. Folks will actually enjoy reading it because it will be the one doc that others really care about and want to ensure its usefulness. When you update it you’ll feel like you’re making a real contribution to something that will last longer than your employment at the company.

Which leads into the lifespan of a monument document. It will be around as long as the thing it describes. This could span a decade, even in technology. I personally have worked on live code that is literally 10+ years old. If a design doc had been written when the code was written people would still be updating it today.

Examples of monument documents are application or system design, or system/application requirements. It doesn’t have to literally be a document, either. A database of test cases or requirements would count too. Really, anything that aggregates project-to-project changes and is kept up to date.

This is where anyone should spend serious time. This is the most valuable type of document a company can own. It will cut down on new hires’ ramp-up time. It will be invaluable when you rewrite an application 5 years from now. It will save you from mistakenly asking for a change that will affect changes made last year. Such documents ARE your business.

This has been a summary of the three types of project document. Hopefully it will help you spot the differences so that you spend the right amount of time on each one.

Sunday, July 10, 2011

Four Types Of Programmers

Friday, July 1, 2011

The Three Types of Project Documents