Wednesday, August 26, 2015

I shaved a yak, but I didn't like it

It seems to be a long standing tradition of every computer science professional to eventually write about the concept of Yak Shaving.

Before I get too far into this, I should point out that while I happen to be using a recent project that I'm working on professionally as an example, it's intended to be only for illustrative purposes. Please understand that this post isn't intended to be critical of any particular person, project, company, or event, but is instead a gestalt of my experiences all throughout my career so far. Every company can have absolutely amazing people, customers, projects, and culture, and every company can have awful people, customers, projects, and culture. Frankly, a lot of companies have a mix of both good and bad, just like everything else in life. So please don't think I'm trying to criticize any particular organization, this post, and most of the posts that I have on my blog, are intended to be philosophical waxing, not criticism.

That being said, instead of writing my own rendition of the definition of this prodigious term, I'll simply point you at another writer who already did a pretty good job describing the phenomenon (though I don't know if I agree with his conclusions): http://sethgodin.typepad.com/seths_blog/2005/03/dont_shave_that.html

Another take from Urban Dictionary: https://en.wiktionary.org/wiki/yak_shaving

And an excellent clip from the show Malcolm in the Middle: http://imgur.com/gallery/D0daEJW

Suffice to say that so many times while conducting a project, an engineer, ultimately, needs to go shave a yak. It's tedious, frustrating, and *very* hard to properly account for up front.

I want to expound a bit on a particular yak that I've had the distinct displeasure of shaving recently.

I'm building an Interactive Connectivity Establishment (RFC 5245) protocol implementation, in library form using C++. This particular implementation has some very specific performance constraints which make the process quite challenging.

I don't want to make this post about the performance constraints, but I will point out the biggest one, which explicitly forbids dynamic memory allocation or deallocation inside the library. This is something that a lot of C++ programmers take for granted. Every STL container can allocate memory dynamically internally, smartpointers, inherently, are about managing dynamic memory, and countless frameworks out there expect the ability to dynamically allocate memory whenever they want. It doesn't seem like it, but, this can become very hard to do, especially when combined with other performance considerations.

So anyway, what's the Yak?

This ICE library isn't my creation from start to finish. I was handed a 75% baked implementation, and asked to finish it. Originally, we thought this would only take a couple months, but it's been on-going for a year now. The most recent challenge involves the dynamic memory allocation issue. The library, as per the ICE standard, notifies it's API consumer whenever it discovers a new connection pathway. The specific mechanism that it was using was to allocate a chunk of memory that was around 5 kilobytes in size. In a normal environment, this would be no big deal, but I'm only able to get pre-allocated (and thus safe to use) chunks no larger than 128 bytes!

Skipping over a lot of details related to the HOW, the What ended up requiring *massive* changes to the library, and the code that uses the library. First, I had to redesign the struct containing the information on this communication path to only include information on one path at a time, instead of four at a time. Then redesign the communication flow between the client code and the ICE library, such as requiring the client to generate authentication information and pass that in at constructor time instead of having the ICE library generate that information and send it to the client, and various other changes. Then, had to create some new datastructures to play nicely with the code that uses this library, and rewrite the code that generates and/or uses these information structures on both sides of the API.

At each step of the way, not only did the change have to be made to the library itself, but the unit testing framework needed to be adjusted, the code that uses the library needed to be adjusted, and hundreds of lines of documentation needed adjustment for each item.

To sum up, since each change required 4 different chunks of code/docs be updated, at a minimum (and, like I said, a few times required changes to client code framework datastructures !), this particular Yak of making my library not allocate memory dynamically turned into a HUGE undertaking. Just adding up the very broad steps undertaken above, there were, at least, 4*4 = 16 individual large refactorings that had to happen to do what, conceptually, could have been a very simple undertaking. A few of them were difficult enough to work on that I got stuck on them for a week or more at a time. Nothing can demoralize a software engineer than a Yak that refuses to be shaved!

Consider also that, in general, each of these changes took well over a day of work, and very quickly, a simple task of "Go do this change" can explode into a month or more of engineering time.

This isn't a cautionary tale of not telling your engineers to refactor their code. Our performance requirements are VERY strict, and there's no way that we could use this code without having made these changes. This is more an illustration of how quickly the list of "high level" or "manager view" items that need to be addressed for business and technical reasons can magnify into dozens or hundreds of individual items that an engineer needs to address.

I shaved that yak, but I didn't like it!

No comments:

Post a Comment