Thursday, September 10, 2015

Why C++ Lambdas and std::algorithm are better than a hand-written for-loop.


A conversation at work recently sparked an investigation into whether C++11 lambdas, combined with standard-library functions like std::for_each and std::accumulate, are better than hand-tuned for-loops. Note, however, that this post completely ignores range-based for loops, which can be significantly more readable than either the old-style loops or the lambda version that I'm endorsing in this post. Your mileage may vary.

I'm personally on the side of using C++ lambdas whenever and wherever practical. Obviously don't go and rewrite every single line of code in your codebase, but if you're already going to be making changes to a function, and that function has a few hand-written for-loops, it'll probably be worth it to convert a couple of them to use std::for_each or similar.

I'll try to touch on some major categories of reasons why I think this way.

Performance

The biggest concern a lot of people will have is that of performance.

An example usage of std::for_each.
void foo(const std::vector<int> & vect)
{
    std::for_each(vect.begin(), vect.end(), [](const int val) -> void { printf("%d", val); });
}
I know that an example this simple could be easily accomplished with std::bind as well, but I'm just trying to explain the general concept at this point :-).

Consider the STLPort (a very popular cross-platform STL implementation) implementation of std::for_each:
// for_each.  Apply a function to every element of a range.
template <class _InputIter, class _Function>
_STLP_INLINE_LOOP _Function
for_each(_InputIter __first, _InputIter __last, _Function __f) {
  for ( ; __first != __last; ++__first)
    __f(*__first);
  return __f;
}
Leaving aside investigations of what the _STLP_INLINE_LOOP macro will do to the resulting code, we can already see that this is a template function with template parameters for the iterator type, and the function object type.

Anywhere that this template function is used, the compiler will have 100% perfect knowledge of exactly what code is involved. Nothing is hidden in another translation unit, and there's no uncertainty about how the function is implemented at all. Your compiler should be able to inline this loop, in its entirety, every single time it's used, even without link-time optimization or similar fanciness. In fact, assuming you're not using an experimental or poorly implemented compiler, if it doesn't inline, it's likely because of specific compilation options you've chosen, or because the compiler has decided that not inlining will somehow be faster.

With our example, the inlined version should look something like this.
void foo(const std::vector<int> & vect)
{
    std::vector<int>::const_iterator begin = vect.begin();
    const std::vector<int>::const_iterator end = vect.end();
    auto function = [](const int val) -> void { printf("%d", val); };
    for ( ; begin != end; ++begin)
    {
        function(*begin);
    }
}
We're getting pretty close to what most programmers would write for a for-loop already.

What about the "auto function = ..." line?

Essentially, a lambda function is simply a syntax shortcut in the language for defining a new "functor" object, which is a struct that implements operator(). Unwinding the C++ syntax for lambdas, we get code looking roughly like this.
struct randomly_generated_name_here
{
    void operator()(const int val) const
    {
        printf("%d", val);
    }
};
void foo(const std::vector<int> & vect)
{
    std::vector<int>::const_iterator begin = vect.begin();
    const std::vector<int>::const_iterator end = vect.end();
    randomly_generated_name_here function;
    for ( ; begin != end; ++begin)
    {
        function(*begin);
    }
}
Clearly the compiler knows all of the implementation details of the "randomly_generated_name_here" object, and it might decide to inline the code for that object's operator() into the places where it's called. Additionally, a lambda's type name is hidden from the programmer, and the compiler guarantees that there will never be name collisions. Unless you've captured the lambda in a variable, as I did in the examples above, or captured its type using the decltype keyword or a template function's type deduction, then once the scope that contains the lambda ends, that's it: as far as your program is concerned, the lambda never existed.

This is a pretty compelling reason to believe that, for a lambda function that only gets used once, the compiler will simply inline it and be done with it. If you compile your program in debug mode, the binary will almost certainly include various symbols related to lambdas that have been inlined this way, but in a release build, with optimizations turned on, you might not have any symbols for the lambda in your program at all. It may ultimately be inlined out of existence. In fact, I just compiled a small test program: the lambda shows up in the symbol table in a few places in the debug version, but is completely gone in the release version.

Here's a slightly more inlined version of our function
void foo(const std::vector<int> & vect)
{
    std::vector<int>::const_iterator begin = vect.begin();
    const std::vector<int>::const_iterator end = vect.end();
    for ( ; begin != end; ++begin)
    {
        printf("%d", *begin);
    }
}
One might argue that they would write the function like this
void foo(const std::vector<int> & vect)
{
    for(std::vector<int>::const_iterator begin = vect.begin();
        begin != vect.end();
        ++begin)
    {
        printf("%d", *begin);
    }
}

But I think that for almost all situations, this version of the function is essentially identical to the other.

In fact, one might be able to make an argument that this second version is slightly worse, depending on the behavior of std::vector<int>::end(). For std::vector the two should result in the same object code, but for a custom data structure the compiler might not be able to determine that end() has no side effects, and so it will call end() on every iteration through the loop, possibly at large computational expense. Using such a structure, one with an expensive end() function, would actually make the fake-manual-compiler-inlined version of the function measurably faster than the version a human would typically write by hand.

Now, you might argue that the resulting assembly code might be, or will be different for the two versions. There's every possibility. I can't guarantee what the compiler does with the code you hand it, obviously. However, even if the assembly output is different for the two versions, I would argue that in almost all situations, a properly optimizing compiler, doing optimizations just like what we did here by hand, should be able to produce code that is every bit as performant as a hand-tuned loop for almost any lambda.

I've written a few test programs and looked at the resulting assembly for them, and so far I haven't seen anything that makes me question the comparable performance of the two versions.

Composability

C++ standard library algorithms are designed to be composable. That is to say that the functions in <algorithm> and <numeric> are intended to be combined with each other to produce more interesting behavior. 

For example, consider the "erase-remove" pattern for a data structure.
void remove_if_greater_than(std::vector<int> & vec, const int comp)
{
    using namespace std::placeholders;
    vec.erase(std::remove_if(vec.begin(), vec.end(), std::bind(std::greater<int>(), _1, comp)), vec.end());
}
Or with a lambda
void remove_if_greater_than(std::vector<int> & vec, const int comp)
{
    vec.erase(std::remove_if(vec.begin(), vec.end(),
        [comp](const int val)
        {
            return val > comp;
        }), vec.end());
}
The std::remove_if function shifts the items to be kept to the front of the collection, and returns an iterator to the start of the leftover tail (whose contents are unspecified). std::vector<int>::erase has two overloads, one for erasing only a single item, and another for erasing a range of items.

Basically, we rearrange the contents of the vector so that things to be removed are grouped at the end, and then we simply chop the end of the vector off. Fairly simple. 

Or perhaps we want to remove anything above the average?
void remove_above_average(std::vector<int> & vec)
{
    using namespace std::placeholders;
    vec.erase(std::remove_if(vec.begin(), vec.end(),
        std::bind(std::greater<int>(), _1,
                  std::accumulate(vec.begin(), vec.end(), 0) / static_cast<int>(vec.size()))),
        vec.end());
}
Or with a lambda
void remove_above_average(std::vector<int> & vec)
{
    vec.erase(std::remove_if(vec.begin(), vec.end(),
        [comp = std::accumulate(vec.begin(), vec.end(), 0) / static_cast<int>(vec.size())] (const int val) -> bool
        {
            return val > comp;
        }), vec.end());
}

Now, clearly lambdas aren't everything, given the comparative readability of the lambda code versus the std::bind version, but this post isn't about extolling the virtues of lambdas over various built-in function objects, and this section is only intended to demonstrate the composability of STL algorithms.

Consider a problem I was facing not too long ago, where I needed to compare two sequences. If sequence A had a member that sequence B did not, I needed to call function 1 with the member from sequence A as parameter. If sequence B had a member sequence A didn't, then I needed to call function 2 with the member from sequence B as parameter.

Essentially, I'm replacing sequence A with sequence B, and I need to "unhook" the items that are going away, and "hook up" the items that are incoming.

That could look something like this in pseudocode.
collection_t A;
collection_t B;
for each item in A, see if there is an equivalent item in B. If there is not, call function 1.
for each item in B, see if there is an equivalent item in A. If there is not, call function 2.

If I had to do this kind of comparison repeatedly, writing the same for-loop logic several times for slightly different data types could start to get annoying and error-prone.

So I wrote a simple template function to do the work for me.
    /**
     * \brief Reconciles two sequences, calling a transform on the unmatched items of each.
     * @param beginOne, endOne [in] The range of the first collection to reconcile.
     * @param beginTwo, endTwo [in] The range of the second collection to reconcile.
     * @param pred    [in] A binary function object that accepts, as the first argument, items from collection one, and as the second argument, items from collection two.
     * @param tranOne [in] A unary function object that accepts items from collection one. It is called on any item from collection one for which pred matches NONE of the items from collection two.
     * @param tranTwo [in] A unary function object that accepts items from collection two. It is called on any item from collection two for which pred matches NONE of the items from collection one.
     */
    template<typename ITERATOR_ONE_T, typename ITERATOR_TWO_T, typename COMPARATOR_T,
             typename TRANSFORM_ONE_T, typename TRANSFORM_TWO_T>
    void reconcile_sequences(const ITERATOR_ONE_T & beginOne, const ITERATOR_ONE_T & endOne,
                             const ITERATOR_TWO_T & beginTwo, const ITERATOR_TWO_T & endTwo,
                             COMPARATOR_T pred, TRANSFORM_ONE_T tranOne, TRANSFORM_TWO_T tranTwo)
    {
        using namespace std::placeholders;
        std::for_each(beginOne, endOne,
                 [pred, tranOne, beginTwo, endTwo](const decltype(*beginOne) & one) -> void
                 {
                    if(std::none_of(beginTwo, endTwo, std::bind(pred, one, _1)))
                    {
                        tranOne(one);
                    }
                 });
        std::for_each(beginTwo, endTwo,
                 [pred, tranTwo, beginOne, endOne](const decltype(*beginTwo) & two) -> void
                 {
                    if(std::none_of(beginOne, endOne, std::bind(pred, _1, two)))
                    {
                        tranTwo(two);
                    }
                 });
    }

Now, for any situation where I need to reconcile two different sequences, of any type, I can just do this.

collection_t A;
collection_t B;
reconcile_sequences(A.begin(), A.end(),
                    B.begin(), B.end(),
                    [](const collection_t::value_type & aval, const collection_t::value_type & bval) -> bool { /* comparison code */ },
                    [](const collection_t::value_type & aval) { /* A not found in B. Do the work on A here. */ },
                    [](const collection_t::value_type & bval) { /* B not found in A. Do the work on B here. */ });
Clean, concise, to the point. The details of how we iterate over the sequences are left to the imagination, but we only need to rely on a single loop implementation somewhere that takes function objects, and we're golden. 

One can imagine all sorts of simple algorithms that can either build off of existing std::algorithm functions to create more complex behavior, or that work well in conjunction with the existing ones. 

While none of this is, strictly speaking, a direct part of the lambda function concept, the ability to use these pre-built loops can make the life of a programmer significantly easier. Further, as my simple examples have shown, these functions are drastically enhanced by the flexibility of lambda functions to fit themselves into the nooks and crannies of any existing interface.

It's like the Reese's peanut butter cup commercials from many years ago. A lot of people enjoy peanut butter, and a lot of people enjoy chocolate. I'd say that a lot of people think the two together are just downright awesome.

A lot of people enjoy std::algorithm functions, and a lot of people enjoy lambda functions. I'd say that a lot of people think the two together are just downright awesome!

Evolvability

Consider the reconcile_sequences function that I wrote in the previous section. If you had a slew of hand-written for-loops that effectively accomplished the same task as my template function, and your manager came to you and said that the reconciliation process needed to be rewritten to automatically use all available processing cores, how would you proceed?

Would you write some kind of shared function to centralize the multi-threading parts of the task into one place? Would you implement the same copy-pasted code in each place instead?

I'd assume that most programmers would end up writing a function that implemented all of the shared multi-threading logic in one place. The function may even have a signature very similar to my template function above. But to get to that point, they'd have to go back through all of their hand-written loops and replace them with calls to such a function. If you didn't have access to the std::algorithm collection, depending on what you're trying to do, this could be somewhat hard to pull off.

Now, assuming that you're using a relatively new version of GCC (one that supports the -fopenmp command-line option), pretend that you simply wrote a template function like I did, and used lambdas to do all of your calculations. All you'd need to do is ensure that your predicates had no side effects, and that your transform functions were thread-safe. If so, the work to switch from single-threaded to multi-threaded (though, admittedly, not perfectly multi-threaded) could be as simple as copying the function (let's name it "parallel_reconcile_sequences") and, in the copy, replacing all calls to "std::for_each" with "__gnu_parallel::for_each" and "std::none_of" with "__gnu_parallel::none_of". Suddenly your reconciliation algorithm is parallel!

Obviously this is slightly contrived, but the point that I'm trying to make is that the more "building block"-ish your code is, the easier it becomes to swap out parts of it later. This lets you evolve the implementation of various algorithms that your code uses without having to actually change the code that uses those algorithms. 

Or consider that you might change the type of the collection your lambda works against so that you can iterate over it more efficiently: you can create a specialized version of for_each, or similar, to go with your new iterator, without having to modify the contents of your lambda at all.

But it gets better than just the example above.

So you've got a lambda function, say one of the first examples, that prints every int given to it. Again, a very simple lambda, but it provides a simple representative example to work with.
void foo(const std::vector<int> & vect)
{
    std::for_each(vect.begin(), vect.end(), [](const int val) -> void { printf("%d", val); });
}
Now how would you approach needing to do the same operation on multiple for-loops? If you were writing hand-tuned loops, you'd simply copy-paste and change the variable names. But with lambda functions it can be considerably easier.
void foo(const std::vector<int> & vectOne, const std::vector<int> & vectTwo)
{
    auto lamb = [](const int val) -> void { printf("%d", val); };
    std::for_each(vectOne.begin(), vectOne.end(), lamb);
    std::for_each(vectTwo.begin(), vectTwo.end(), lamb);
}
Simply by storing the lambda in a variable, you can write your loop body once, but use it many times. The lambdas can also be stored in member variables of objects, or returned from functions, or otherwise reused over the lifetime of a program.

Remember that, behind the scenes, the compiler is effectively just making a struct that has an operator() function. Anything you could normally do with a normal struct, you can do with a lambda, including copies, pointers / references to, calling functions with it passed as a parameter, and so on.

I'm sure you'll see that there are significant uses for being able to quickly and easily define new function objects and pass them around.

Understandability

I think this is the big one, and I can make the case without too many pages of text.

Consider two loops

void print_all(const std::vector<int> & foo)
{
    for(std::vector<int>::const_iterator it = foo.begin(); it != foo.end(); ++it)
    {
        printf("%d", *it);
    }
}

void print_all(const std::vector<int> & foo)
{
    std::for_each(foo.begin(), foo.end(), [](const int val){ printf("%d", val); });
}

Which one is easier to understand in its entirety?

There are two situations to consider here: people who are experts on this code, and people who are not.

At a glance, an expert will be able to nearly instantly understand both versions of this particular loop without any problem, but a person who isn't an expert on this code will be able to understand the second version much faster.

Why?

A non-expert, someone who isn't familiar with the section of code in question, or who isn't on solid ground with C++ yet, will be able to read the second version and, just two tokens into the loop, at "std::for_each", know instantly that whatever comes next is applied equally to the entire range. There's no possibility of short-circuiting, stopping early, rewinding, etc. Now, obviously the function applied to each item can act selectively using an if statement or similar, but that function cannot affect the iteration itself, only the action taken on each item.

Imagine that we're working with a much more complex loop body: an algorithm that spans more than 200 lines of code and has many returns, breaks, or continues in it. Even worse, a goto. How would you work out what this loop does? To answer a question as simple as "does this loop apply the body to every item?" you'd have to analyze each line of code very carefully.

Now take that same loop body, and translate it into a function suitable for use with std::for_each. Once you see std::for_each, it doesn't matter what is in the loop body: you know for certain that, barring code that never returns, throws an exception, or exits the program or thread, every single item in the collection will have the function applied to it. The same (or equivalent) loop bodies, but one version could take hours to answer the question of whether every item is visited, and the other version takes seconds.

Given that the algorithm (and numeric!) libraries offer lots of very useful functions, such as copy, copy_backward, transform, for_each, none_of, find_if, and more, judicious use of them can drastically increase the ability of newcomers to understand what your program is supposed to be doing.

I'd love to see any other rationale from readers, or any arguments toward one or the other kind of loop. Let me know what you think!

Friday, August 28, 2015

Why it's important to plan a project properly

My last post was about the commonly expounded phenomenon of shaving Yaks : http://cogito.jonesmz.com/2015/08/yak-shaving.html

That post used, as an illustrative example, a project that I'm working on professionally.  Please understand that this post isn't intended to be critical of any particular person, project, company, or event, but is instead a gestalt of my experiences all throughout my career so far. Every company can have absolutely amazing people, customers, projects, and culture, and every company can have awful people, customers, projects, and culture. Frankly, a lot of companies have a mix of both good and bad, just like everything else in life. So please don't think I'm trying to criticize any particular organization, this post, and most of the posts that I have on my blog, are intended to be philosophical waxing, not criticism.

Even farther back in the past, I wrote a piece on how the industry that my father is in, residential construction, is remarkably, even eerily, similar to the field of Software Engineering. : http://cogito.jonesmz.com/2014/06/nails-and-sawdust-on-your-floor.html

I had a conversation with my father about the yak-shaving activities I was participating in (and not enjoying) at work. While he had no idea what I meant by "yak shaving", as soon as I explained the whole deal, a task that has to be completed so that a task can be completed, so that a task can be completed, so that... my actual assignment can move forward, he instantly recognized what I was talking about and matched my frustrated rant with one of barely contained intensity, born of 30+ years of shaving his own yaks!

An example of said yak shaving (which involves significant amounts of me getting the details mixed up, and adding embellishments :) ).

He's at a customer site, building, say, a porch or something similar. It turns out that the customer's door frame is rotting out, and the customer agrees it needs to be replaced before construction of the porch can continue. So Dad calls up the local hardware store and asks about a specific model of door frame, and the guy on the phone assures him they have it. So he makes the half-hour drive out to the store, only to find out they didn't have it! The next-closest store is another half-hour drive; luckily they have it in stock, but halfway there he runs into road construction and has to take a detour. Yada yada yada, ultimately almost an entire day is blown on this porch project before a single nail can be hammered into a single board!

Clearly yak shaving is common in lots of different fields.

We got to talking about mitigation strategies, and we were able to delve into the realm of timeline estimation, foreseeing problems, standard operations that helped prevent problems ahead of time, and so on.

On the subject of time estimates, clearly the answer is "it'll take as long as it'll take!", which is the howl of a hell hound for people trying to timebox a project, or a component of a project. My own time estimation is so notoriously bad that even doubling what my gut tells me is consistently an underestimate.

Over the years, my dad's developed a set of procedures for how to undertake operations, either big or small. Going into the details of all the procedures would be kind of hard to do, there are so many of them, and they're all gut knowledge that would probably take decades for him to convey to me, but the general gist is that he seems to build in risk avoidance at every turn. This consistently adds more time to the project compared to the best-case timeline, but that's not what concerns him. The biggest risk for a construction project, a risk that could see the death of his business, and the endless nightmare of pissed off customers and angry lawyers, is always the worst-case timeline of a project. The kind of worst-case involving a house collapsing while building it or similar disaster.

The first step, obviously, is proper planning. Assessing what the end result needs to be, visualizing it, and modeling it. Getting feedback from relevant stakeholders, such as the customer, an architect, possibly a designer or realtor, and asking the relevant subcontractors what various options in the plan will do to the bottom line. On a project that will take up to a year of boots-on-the-ground time, this first phase of planning and review can easily take 3 months or longer, especially if local government has additional concerns to throw into the mix.

For a software project, where the "agile" methodologies movement has been trying to push release cycles from years down to days, saying that you want to spend over a quarter of the total project timeline on planning sounds like a sure way to get canned! But without proper project planning, you can certainly reduce your best-case timeline, at the cost of dramatically increasing your worst-case. That's the reason for doing things this way: to stave off the worst-case scenario of the project not working at all. In the software world, the cost to fix a defect drastically increases the later in the project it's discovered. (http://www.isixsigma.com/industries/software-it/defect-prevention-reducing-costs-and-enhancing-quality/)

What's worse, if I remember my schooling properly, the defect injection rate of a project is at its absolute highest at the beginning of a project, decreasing over time until the last phase of the project: upgrade / replace / discontinue. (What, you thought the last phase was delivery to the customer? Ha!) So the conclusion I draw is that, since the most defects are introduced into the project when they are the least expensive to fix, and since the longer these defects stay in the project the more costly they become, you should put as much effort into fixing defects in the early phases of the project as you possibly can. In this scenario, a dollar of prevention can be worth a grand of cure!

The best, and cheapest, phase to correct defects is the project inception and planning phase!


Back to the point of shaving yaks, I've recently run into quite a few situations where poor project planning has cost a drastic amount of time late in the project. I went over some of the specifics about what's going on in the project that's causing so much yak shaving, but I really want to put emphasis on my feeling that the delays and yaks are, fundamentally, caused by a lack of defect correction early in the development process.

Of course, everything in the software world is interconnected, just like it is in the construction world. My dad recently did a basement remodel, and while working on it, over and over and over again, he ran into huge delays that had nothing to do with the original scope of work. For example, a gas line needs to be moved out of the way, but to do that some electrical work also needs to be done; doing that properly needs a new circuit put into the breaker box, and holes torn in part of the ceiling that wasn't in the original plan, to route around an unexpected water line. Get the point? Yaks.

Now, with a construction project at least you can tell just by looking that things are happening, but it can be really difficult to tell when a software project is stalled. "Boy, I haven't heard from my engineer in a couple days. What's he (or she) doing that I'm not seeing?" Well, it could be that your engineer just went off on a vacation without mentioning it, or it could be that he's spent 16 or more hours a day for an entire work week refactoring a delicate piece of complex C++ template code that's running up against compiler bugs that aren't documented very well and he just can't figure it out. Or it could be that he's had to rewrite the unit test framework for the project to account for an unanticipated incompatibility with the relevant standard, or it could be any number of things.

The real question is: knowing that these kinds of unexpected delays and "side tracks" are going to happen, what can you, as a project leader, do to either mitigate or prevent them? My opinion is that planning and documentation are the only proven way to handle it. If you're implementing a specific internet standard, for example, frankly the first step is going to be translating the incomprehensibly poorly written RFC documents into an actionable requirements document. The next step involves designing a project architecture, and spending at least several weeks working on it, reviewing it, chewing on it, implementing small prototypes of the pieces in that architecture, and in general ruminating. Why? Because, again, the early stages of a project are the source of the vast majority of project defects!

You wouldn't allow a home to be built without a blueprint. You wouldn't allow a blueprint that hasn't been reviewed by an architect. You wouldn't dig a foundation, ground well, or septic system without properly surveying the ground to ensure that you know whether to add special considerations or not. You certainly wouldn't start framing a house before seeing the blueprint, and you damn sure aren't going to start wiring the house for electricity without consulting with the customer and designer on how the rooms are designed. So why do we accept software projects that are started with no design, no specific details about what they need to do, no review, no sign-off, and components that depend on other components being started before the components they depend on are even built?

I don't know, I really don't.

But I do know this. That defect that managed to hide in your code, back when you forgot to do the inception and planning phase? That defect had its teen and college years back in the initial development phase. It partied HARD with the other defects that you managed to kill while developing with whiskey and pizza, giving birth to its own baby defects and puking all over your code at the same time. But this isn't a little defect anymore. It's a giant behemoth of a defect: a bureaucrat of the defect world, grown fat and lazy, but all the more powerful (like Jabba the Hutt) for the lack of attention you've paid it over its life.

This defect is your Yak. As soon as you hit the integration phase, this Yak is going to walk right up to you and lick your face. And it's going to keep doing that, invisible to your boss, but staring you in the face from all of 2 inches away for months, until you shave it. Try as you might, you're going to eventually shave this Yak, because it'll either drive you so crazy that you do it yourself over a weekend or 20, or it'll completely prevent progress on your project until it's been satisfied with how thoroughly you shave it.

Shave that Yak!



Wednesday, August 26, 2015

I shaved a yak, but I didn't like it

It seems to be a long standing tradition of every computer science professional to eventually write about the concept of Yak Shaving.

Before I get too far into this, I should point out that while I happen to be using a recent project that I'm working on professionally as an example, it's intended to be only for illustrative purposes. Please understand that this post isn't intended to be critical of any particular person, project, company, or event, but is instead a gestalt of my experiences all throughout my career so far. Every company can have absolutely amazing people, customers, projects, and culture, and every company can have awful people, customers, projects, and culture. Frankly, a lot of companies have a mix of both good and bad, just like everything else in life. So please don't think I'm trying to criticize any particular organization, this post, and most of the posts that I have on my blog, are intended to be philosophical waxing, not criticism.

That being said, instead of writing my own rendition of the definition of this prodigious term, I'll simply point you at another writer who already did a pretty good job describing the phenomenon (though I don't know if I agree with his conclusions): http://sethgodin.typepad.com/seths_blog/2005/03/dont_shave_that.html

Another take from Wiktionary: https://en.wiktionary.org/wiki/yak_shaving

And an excellent clip from the show Malcolm in the Middle: http://imgur.com/gallery/D0daEJW

Suffice it to say that many times over the course of a project, an engineer, ultimately, needs to go shave a yak. It's tedious, frustrating, and *very* hard to properly account for up front.

I want to expound a bit on a particular yak that I've had the distinct displeasure of shaving recently.

I'm building an Interactive Connectivity Establishment (RFC 5245) protocol implementation, in library form using C++. This particular implementation has some very specific performance constraints which make the process quite challenging.

I don't want to make this post about the performance constraints, but I will point out the biggest one, which explicitly forbids dynamic memory allocation or deallocation inside the library. This is something that a lot of C++ programmers take for granted. Every STL container can allocate memory dynamically internally, smart pointers are inherently about managing dynamic memory, and countless frameworks out there expect the ability to dynamically allocate memory whenever they want. It doesn't seem like it, but this can become very hard to do, especially when combined with other performance considerations.
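The usual way to satisfy a "no dynamic allocation" rule is to carve all memory out of pre-allocated pools. Here's a minimal sketch of that idea, not the actual library's allocator; the names and sizes (BlockPool, 128-byte blocks) are illustrative assumptions:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Illustrative sketch only: a fixed-size block pool that hands out
// pre-allocated 128-byte chunks, so the code it serves never calls
// new/delete at runtime. Names and sizes are hypothetical.
class BlockPool {
public:
    static constexpr std::size_t kBlockSize = 128;
    static constexpr std::size_t kBlockCount = 64;

    BlockPool() {
        // Thread every block onto the free list up front.
        for (std::size_t i = 0; i < kBlockCount; ++i)
            release(m_storage.data() + i * kBlockSize);
    }

    // Hand out one block, or nullptr when exhausted: fail, don't malloc.
    void* acquire() {
        if (!m_free) return nullptr;
        Node* n = m_free;
        m_free = n->next;
        return n;
    }

    // Return a block to the front of the free list.
    void release(void* p) {
        Node* n = static_cast<Node*>(p);
        n->next = m_free;
        m_free = n;
    }

private:
    struct Node { Node* next; };
    Node* m_free = nullptr;
    alignas(std::max_align_t)
    std::array<unsigned char, kBlockSize * kBlockCount> m_storage;
};
```

The interesting property is the failure mode: when the pool runs dry you get a nullptr back at a well-defined point, instead of an allocation happening behind your back somewhere deep inside a container.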

So anyway, what's the Yak?

This ICE library isn't my creation from start to finish. I was handed a 75% baked implementation, and asked to finish it. Originally, we thought this would only take a couple months, but it's been ongoing for a year now. The most recent challenge involves the dynamic memory allocation issue. The library, as per the ICE standard, notifies its API consumer whenever it discovers a new connection pathway. The specific mechanism that it was using was to allocate a chunk of memory that was around 5 kilobytes in size. In a normal environment, this would be no big deal, but I'm only able to get pre-allocated (and thus safe to use) chunks no larger than 128 bytes!

Skipping over a lot of details related to the HOW, the WHAT ended up requiring *massive* changes to the library, and to the code that uses the library. First, I had to redesign the struct containing the information on this communication path to only include information on one path at a time, instead of four at a time. Then I redesigned the communication flow between the client code and the ICE library, such as requiring the client to generate authentication information and pass it in at constructor time, instead of having the ICE library generate that information and send it to the client, along with various other changes. Then I had to create some new data structures to play nicely with the code that uses this library, and rewrite the code that generates and/or uses these information structures on both sides of the API.

At each step of the way, not only did the change have to be made to the library itself, but the unit testing framework needed to be adjusted, the code that uses the library needed to be adjusted, and hundreds of lines of documentation needed adjustment for each item.

To sum up, since each change required 4 different chunks of code/docs to be updated, at a minimum (and, like I said, a few times required changes to client code framework data structures!), this particular Yak of making my library not allocate memory dynamically turned into a HUGE undertaking. Just adding up the very broad steps above, there were at least 4*4 = 16 individual large refactorings that had to happen to accomplish what, conceptually, could have been a very simple undertaking. A few of them were difficult enough that I got stuck on them for a week or more at a time. Nothing can demoralize a software engineer more than a Yak that refuses to be shaved!

Consider also that, in general, each of these changes took well over a day of work, and very quickly, a simple task of "Go do this change" can explode into a month or more of engineering time.

This isn't a cautionary tale of not telling your engineers to refactor their code. Our performance requirements are VERY strict, and there's no way that we could use this code without having made these changes. This is more an illustration of how quickly the list of "high level" or "manager view" items that need to be addressed for business and technical reasons can magnify into dozens or hundreds of individual items that an engineer needs to address.

I shaved that yak, but I didn't like it!

Monday, June 8, 2015

Newly Remodeled Home Office!

I had the joy of starting the setup on my newly rebuilt home office over the last weekend. :-)

Hopefully I'll have it properly setup before too long. For now I just threw things in as quick as I could. Here's a few pictures of how it's laid out.



Wednesday, May 27, 2015

It's crazy how easily bugs can stay hidden

Sometimes it's really surprising how little a bug can actually affect the functionality of an application.

A situation that I ran into today involved a stack object inside of a C++ application.

Consider:

class Item;

class Stack
{
public:
    typedef Item* pointer;
    pointer clear(void);
    pointer pop(void);
    void push(pointer);
private:
    pointer m_top;
};
This is a stack of pointers. Each Item has an Item::next() function, which points to the next item on the stack.
Push sets pointer->m_next = this->m_top, and then m_top = pointer;
Pop does the equivalent of m_top = m_top->next(); return previous m_top;
Clear returns m_top and sets m_top to nullptr.

Pretty simple stuff.
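The three operations described above can be rendered in plain, non-atomic C++ like this. To be clear, the real implementation is hand-written assembly, so this is purely illustrative, and the Item layout here is my assumption:

```cpp
#include <cassert>

// Illustrative sketch of the semantics described above. The real thing
// is hand-written assembly; this plain C++ version is not thread safe.
struct Item {
    Item* m_next = nullptr;
    Item* next() const { return m_next; }
};

class Stack {
public:
    typedef Item* pointer;

    // Detach the whole chain: return the old top, leave the stack empty.
    pointer clear() { pointer old = m_top; m_top = nullptr; return old; }

    // Unlink and return the current top (or nullptr if empty).
    pointer pop() {
        pointer old = m_top;
        if (old) m_top = old->next();
        return old;
    }

    // Link the new item in front of the current top.
    void push(pointer p) { p->m_next = m_top; m_top = p; }

private:
    pointer m_top = nullptr;
};
```

Note that the stack never allocates anything; it only links Items that the caller already owns, which is also why clear() can hand the whole chain back as a single pointer.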

Consider though that the actual implementation of this stack is hand-written assembly, for two reasons. 1) To squeeze every ounce of performance out of the application. 2) To get atomic thread safe operation.

Even better, Windows offers something called an SList : https://msdn.microsoft.com/en-us/library/windows/desktop/ms684121%28v=vs.85%29.aspx


Anyway, suffice it to say, the details of this data structure are anything but trivial.


Here's where the bug comes in.

Our implementation of Stack::clear(), depending on which platform was being compiled for, actually had the same implementation as pop(). The obvious consequence of this is that clear() didn't ACTUALLY empty the stack. It only pulled off the first item!
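Here's a hypothetical reconstruction of the defect, using a stripped-down list so it stands alone: the buggy clear() has the body of pop(), so it detaches only the top item and silently leaves the rest of the chain in place.

```cpp
#include <cassert>

// Hypothetical minimal reconstruction of the defect, not the real code.
struct Node { Node* next = nullptr; };

struct List {
    Node* top = nullptr;
    void push(Node* n) { n->next = top; top = n; }
    // Correct clear: detach the whole chain.
    Node* clear() { Node* old = top; top = nullptr; return old; }
    // The bug: identical to pop(), so everything below the top survives.
    Node* buggyClear() { Node* old = top; if (old) top = old->next; return old; }
};
```

With two items pushed, buggyClear() returns the top one but leaves the second sitting on the list, which is exactly the "didn't ACTUALLY empty" behavior described above.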

This code had been in production use for over ten years, and no one ever noticed a problem with it. How crazy is that?

Talk about an obvious bug having almost no effect :-)

Thursday, April 23, 2015

Gotta love new places to live

From summer 2012 through to summer 2013, my parents undertook construction on a large addition onto their home to allow for my mother's parents to move into the new space to be with family in their twilight years. This expansion included a new in-law suite, including a small kitchen, full bathroom, closet space, and living area, as well as an expansion of the former attic into a guest bedroom, and a new basement area under the new space.

In the summer of 2014, I made an agreement with my parents to move back into their home to help them with their living expenses. This arrangement included the agreement that I would help fund the remodeling of the basement into a fully self-contained apartment. Essentially, the breakdown of the costs is that I fund the labor, while my parents pay for materials. Roughly speaking, we'll each be paying in about $15,000, for a total remodeling cost of around $30,000.

I moved out of the condo that I lived in from April 2013 until April 2015 this last weekend. Unfortunately, the work on the basement remodel isn't finished, so I'm currently staying in the guest bedroom. However, there's been a huge amount of progress.

Pictures!!!

What'll eventually be the bedroom:



Bedroom closet 1:

Bedroom closet 2 combo hallway to office:



Private office interior. In this first picture you can also see the interior of the server room through the wall.




Hallway between bathroom, laundry, and bedroom:


 Linen closet in hallway.

Bathroom



Laundry





Kitchen






Living Room






Spare room






Saturday, February 21, 2015

Horsepower!!

My new Dell C6100 has arrived.

This is a 4-blade blade-server type setup. The four blades operate as completely independent computers with some shared hardware, like the hard-drive backplane, shared power supplies, and the cooling infrastructure.

Each blade has 6 2.5-inch hard-drive bays, for a total of 24.
Each blade has 2 Intel Xeon processors. Each processor has 6 full cores supporting 12 simultaneous hyper-threads, for a total of 12 cores and 24 hardware threads per blade.
Each blade supports a maximum of 96 GB of RAM, with this particular server populated with 48 GB of RAM per blade via 6x 8GB DIMMs.
Each blade supports 1 small form-factor PCI expansion card. Not sure what I'll be adding.

Front view.
 Top view

 Side view

 Back
Interestingly, each of the four blades can be individually ejected, even while the others are powered on and in use. Same with the power supplies, allowing for continuity of service.

First view of the inside.
 Popped out one of the power supplies. Upside down view of the label.
 The connector supplies power to the motherboard.
 Power cable goes here! They provide a nice little handle that you can use to pop it out.
 Each blade can be pulled out individually, and here's how it looks.
 Two x two blades
 Here's the connectors for the power supplies.
Power supplies just hanging out.
Motherboard connection to the shared power bus.
Cleaned up the cable management a bit after I took the other pictures.