Friday, August 28, 2015

Why it's important to plan a project properly

My last post was about the commonly expounded phenomenon of shaving Yaks : http://cogito.jonesmz.com/2015/08/yak-shaving.html

That post used, as an illustrative example, a project that I'm working on professionally.  Please understand that this post isn't intended to be critical of any particular person, project, company, or event, but is instead a gestalt of my experiences all throughout my career so far. Every company can have absolutely amazing people, customers, projects, and culture, and every company can have awful people, customers, projects, and culture. Frankly, a lot of companies have a mix of both good and bad, just like everything else in life. So please don't think I'm trying to criticize any particular organization, this post, and most of the posts that I have on my blog, are intended to be philosophical waxing, not criticism.

Even farther back in the past, I wrote a piece on how the industry that my father is in, residential construction, is remarkably, even eerily, similar to the field of Software Engineering. : http://cogito.jonesmz.com/2014/06/nails-and-sawdust-on-your-floor.html

I had a conversation with my father about the yak shaving activities I was participating in (and not enjoying) at work, while he had no idea what i meant by yak shaving, as soon as I explained the whole deal around a task that has to be completed so that a task can be completed, so that a task can be completed, so that...... so that my actual assignment can move forward, he instantly recognized what I was talking about and was able to match my own frustrated rant with one of barely contained intensity born of 30+ years of shaving his own yaks!

An example of said yak shaving, (which involves significant amounts of me getting the details mixed up, and adding embellishments :) ).

He's at a customer site, building, say, a porch or something similar. Turns out that the customer's door frame is rotting out, and the customer agrees it needs to be replaced before the construction of the porch can continue. So dad calls up the local hardware store and asks about a specific model door frame, the guy on the phone assures him they have it. So he goes the half hour drive out to the store, only to find out they didn't have it! Next store closer is another half hour drive, luckily they have it in stock, but half way there he runs into road construction and has to take a detour. Yada yada yada, ultimately almost an entire day is blown on this project of building a porch before a single nail can be hammered into a single board!

Clearly yak shaving is common in lots of different fields.

We got to talking about mitigation strategies, and we were able to delve into the realm of timeline estimation, foreseeing problems, standard operations that helped prevent problems ahead of time, and so on.

On the subject of time estimates, clearly the answer is "it'll take as long as it'll take!", which is the howl of a hell hound for people trying to timebox a project, or a component of a project. My own time estimation is so notoriously bad that even doubling what my gut tells me is consistently an underestimate.

Over the years, my dad's developed a set of procedures for how to undertake operations, either big or small. Going into the details of all the procedures would be kind of hard to do, there are so many of them, and they're all gut knowledge that would probably take decades for him to convey to me, but the general gist is that he seems to build in risk avoidance at every turn. This consistently adds more time to the project compared to the best-case timeline, but that's not what concerns him. The biggest risk for a construction project, a risk that could see the death of his business, and the endless nightmare of pissed off customers and angry lawyers, is always the worst-case timeline of a project. The kind of worst-case involving a house collapsing while building it or similar disaster.

The first step, obviously, is proper planning. Assessing what the end result needs to be, visualizing it, and modeling it. Getting feedback from relevant stakeholders, such as the customer, an architect, possibly a designer or realator, and asking the relevant subcontractors what various options in the plan will do to the bottom line. On a project that will take up to a year of boots-on-the-ground time, this first phase of planning and review can easily take 3 months, or longer, especially if local government has additional concerns to throw into the mix.

For a software project, where the "agile" methodologies movement has been trying to push release dates from years to days, saying that you want to spend over a quarter of the total project timeline doing planning sounds like a sure way to get canned! Without proper project planning, you can certainly reduce your best-case timeline, but at the cost of dramatically increasing your worst-case. That's the reason for doing things this way, and it's to stave off the worst-case scenario of the project not working at all. In the software world, the cost to fix a defect in the software project drastically increases the later on in the project it's discovered. (http://www.isixsigma.com/industries/software-it/defect-prevention-reducing-costs-and-enhancing-quality/)

What's worse, if I remember my schooling properly, the defect injection rate of a project is at it's absolute highest at the beginning of a project, decreasing over time until the last phase of the project, upgrade / replace / discontinue. (What, you thought the last phase was delivery to customer? Ha!). So, the conclusion that I draw from this is that, since the most defects are introduced into the project when it's the least expensive to fix, and that the longer these defects stay in the project, the more costly they become, you should put as much effort into fixing defects in the early phases of the project as you possible can. In this scenario, a dollar of prevention can be worth a grand of cure!

The best, and cheapest, phase to correct defects is the project inception and planning phase!


Back to the point of shaving yaks, I've recently run into quite a few situations where poor project planning has cost a drastic amount of time late in the project. I went over some of the specifics about what's going on in the project that's causing so much yak shaving, but I really want to put emphasis on my feeling that the delays and yaks are, fundamentally, caused by a lack of defect correction early in the development process.

Of course, everything in the software world is interconnected, just like it is with the construction world. My dad recently did a basement remodel, and while working on it, over and over and over again, ran into huge delays that had nothing to do with the original scope of work. For example, a gas line needs to be moved out of the way, but to do that some electrical work also needs to be done, however doing that properly needs a new circuit put into the breaker and holes town in part of the ceiling that wasn't part of the original plan to route around an unexpected waterline. Get the point? Yaks.

Now, with a construction project at least you can tell just by looking that things are happening, but it can be really difficult to tell when a software project is stalled. "Boy, I haven't heard from my engineer in a couple days. What's he (or she) doing that I'm not seeing?" Well, it could be that your engineer just went off on a vacation without mentioning it, or it could be that he's spent 16 or more hours a day for an entire work week refactoring a delicate piece of complex C++ template code that's running up against compiler bugs that aren't documented very well and he just can't figure it out. Or it could be that he's had to rewrite the unit test framework for the project to account for an unanticipated incompatibility with the relevant standard, or it could be any number of things.

The real question is, knowing that these kinds of unexpected delays and "side tracks" are going to happen, what can you, as a project leader, do to either mitigate or prevent these issues. My opinion is that planning and documentation is the only proven method to handle it. If you're implementing a specific internet standard, for example, frankly the first step is going to need to be translating the incomprehensibly poorly written RFC documents into an actionable requirements document. Next step, involves the design of a project architecture, and spending at least several weeks working on that, and reviewing it, chewing on it, implementing small prototypes of the stuff in that architecture, and just in general ruminating. Why? Because, again, the early stages of a project are the source of the vast majority of project defects!

You wouldn't allow a home to be built without a blueprint. You wouldn't allow a blueprint that hasn't been reviewed by an architect. You wouldn't dig a foundation, ground well, or septic system without properly surveying the ground to ensure that you know whether to add special considerations or not. You certainly wouldn't start framing a house before seeing the blueprint, and you damn sure aren't going to start wiring the house for electricity without consulting with the customer and designer on how the rooms are designed. So why do we accept software projects that are started with no design, no specific details about what it needs to do, no review, no sign off, and components that depend on other components are started before the component they depend on are even built?

I don't know, I really don't.

But I do know this. That defect that managed to hide in your code, back when you forgot to do the inception and planning phase? That defect had it's teen and college years back in the initial development phase, it partied HARD with the other defects that you managed to kill while developing with whiskey and pizza, giving birth to it's own baby defects and puking all over your code at the same time, but this isn't a little defect anymore. It's a giant behemoth of a defect. A bureaucrat of the defect world, grown fat and lazy, but all the more powerful (like Jabba the Hutt), by the lack of attention that you've paid it over it's life.

This defect is your Yak. As soon as you hit the integration phase, this Yak is going to walk right up to you and lick your face. And it's going to keep doing that, invisible to your boss, but staring you in the face from all of 2 inches away for months, until you shave it. Try as you might, you're going to eventually shave this Yak, because it'll either drive you so crazy that you do it yourself over a weekend or 20, or it'll completely prevent progress on your project until it's been satisfied with how thoroughly you shave it.

Shave that Yak!



Wednesday, August 26, 2015

I shaved a yak, but I didn't like it

It seems to be a long standing tradition of every computer science professional to eventually write about the concept of Yak Shaving.

Before I get too far into this, I should point out that while I happen to be using a recent project that I'm working on professionally as an example, it's intended to be only for illustrative purposes. Please understand that this post isn't intended to be critical of any particular person, project, company, or event, but is instead a gestalt of my experiences all throughout my career so far. Every company can have absolutely amazing people, customers, projects, and culture, and every company can have awful people, customers, projects, and culture. Frankly, a lot of companies have a mix of both good and bad, just like everything else in life. So please don't think I'm trying to criticize any particular organization, this post, and most of the posts that I have on my blog, are intended to be philosophical waxing, not criticism.

That being said, instead of writing my own rendition of the definition of this prodigious term, I'll simply point you at another writer who already did a pretty good job describing the phenomenon (though I don't know if I agree with his conclusions): http://sethgodin.typepad.com/seths_blog/2005/03/dont_shave_that.html

Another take from Urban Dictionary: https://en.wiktionary.org/wiki/yak_shaving

And an excellent clip from the show Malcolm in the Middle: http://imgur.com/gallery/D0daEJW

Suffice to say that so many times while conducting a project, an engineer, ultimately, needs to go shave a yak. It's tedious, frustrating, and *very* hard to properly account for up front.

I want to expound a bit on a particular yak that I've had the distinct displeasure of shaving recently.

I'm building an Interactive Connectivity Establishment (RFC 5245) protocol implementation, in library form using C++. This particular implementation has some very specific performance constraints which make the process quite challenging.

I don't want to make this post about the performance constraints, but I will point out the biggest one, which explicitly forbids dynamic memory allocation or deallocation inside the library. This is something that a lot of C++ programmers take for granted. Every STL container can allocate memory dynamically internally, smartpointers, inherently, are about managing dynamic memory, and countless frameworks out there expect the ability to dynamically allocate memory whenever they want. It doesn't seem like it, but, this can become very hard to do, especially when combined with other performance considerations.

So anyway, what's the Yak?

This ICE library isn't my creation from start to finish. I was handed a 75% baked implementation, and asked to finish it. Originally, we thought this would only take a couple months, but it's been on-going for a year now. The most recent challenge involves the dynamic memory allocation issue. The library, as per the ICE standard, notifies it's API consumer whenever it discovers a new connection pathway. The specific mechanism that it was using was to allocate a chunk of memory that was around 5 kilobytes in size. In a normal environment, this would be no big deal, but I'm only able to get pre-allocated (and thus safe to use) chunks no larger than 128 bytes!

Skipping over a lot of details related to the HOW, the What ended up requiring *massive* changes to the library, and the code that uses the library. First, I had to redesign the struct containing the information on this communication path to only include information on one path at a time, instead of four at a time. Then redesign the communication flow between the client code and the ICE library, such as requiring the client to generate authentication information and pass that in at constructor time instead of having the ICE library generate that information and send it to the client, and various other changes. Then, had to create some new datastructures to play nicely with the code that uses this library, and rewrite the code that generates and/or uses these information structures on both sides of the API.

At each step of the way, not only did the change have to be made to the library itself, but the unit testing framework needed to be adjusted, the code that uses the library needed to be adjusted, and hundreds of lines of documentation needed adjustment for each item.

To sum up, since each change required 4 different chunks of code/docs be updated, at a minimum (and, like I said, a few times required changes to client code framework datastructures !), this particular Yak of making my library not allocate memory dynamically turned into a HUGE undertaking. Just adding up the very broad steps undertaken above, there were, at least, 4*4 = 16 individual large refactorings that had to happen to do what, conceptually, could have been a very simple undertaking. A few of them were difficult enough to work on that I got stuck on them for a week or more at a time. Nothing can demoralize a software engineer than a Yak that refuses to be shaved!

Consider also that, in general, each of these changes took well over a day of work, and very quickly, a simple task of "Go do this change" can explode into a month or more of engineering time.

This isn't a cautionary tale of not telling your engineers to refactor their code. Our performance requirements are VERY strict, and there's no way that we could use this code without having made these changes. This is more an illustration of how quickly the list of "high level" or "manager view" items that need to be addressed for business and technical reasons can magnify into dozens or hundreds of individual items that an engineer needs to address.

I shaved that yak, but I didn't like it!