Sunday, July 27, 2008

The Allure of Code Reuse

The desire to reuse is well ingrained into the software development process.

The idea of code reuse suffers from a poor choice of wording. I have not seen great success with code reuse. I have, however, seen significant productivity gains from using object files, libraries, services, and frameworks.

The bad kind of code reuse often involves what programmers know as ifdefs, custom environment variables, shims, wrappers, and, most commonly, the copying and pasting of code. Code reuse is white box: it reaches inside the box. Libraries are black boxes.

The "ifdef" type changes have an insidious nature. Firstly those involved work under the assumption that just a few little changes will ultimately be harmless to the code base. Secondly the thought of reusing a large amount of code seems enticing but the very thought beguiles the user with the thought that reuse is always cheaper than rewriting software.

The trickery is the idea that your only two choices are to reuse the code or to rewrite it.

"We can take Kim's code and just tweak it a bit to handle our needs!"

This thought is made as if in a vacuum, in absolute defiance of how code comes to take the shape that it takes.

I have designed many clean domain models, object models, and system architectures in the pure and clean world of theoretical ideas. Sometimes these designs were based on well-understood programming paradigms and the resulting code was very much the embodiment of the design. Sometimes. More often, development runs into issues, and those issues force the design to change.

I recently developed a 2D graphing/charting package. This is nothing new to me; I have done a few. What was new to me was the system it had to be built upon. Fortunately, developing 2D graphics in a GUI-based OS has not changed much since the early days of Macintosh, Amiga, OS/2, and Windows development. The fundamental rules still apply. For instance, if you want something to be drawn or refreshed, you call Invalidate on the window or control.
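In the Win32 API, for example, that rule looks roughly like this. This is a minimal sketch; the chart drawing itself is elided:

```c
#include <windows.h>

/* The fundamental refresh rule: you do not draw immediately. You mark
   the window as stale and let the OS deliver a paint message when it
   is ready. */
void chart_data_changed(HWND hwnd)
{
    /* NULL = the whole client area is stale; TRUE = erase background. */
    InvalidateRect(hwnd, NULL, TRUE);
}

/* Later, the window procedure receives WM_PAINT and draws: */
LRESULT handle_paint(HWND hwnd)
{
    PAINTSTRUCT ps;
    HDC hdc = BeginPaint(hwnd, &ps);
    /* ... redraw the chart into hdc ... */
    EndPaint(hwnd, &ps);
    return 0;
}
```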

Because the fundamental rules apply, my design for this system was "mostly" correct. As I coded, I soon discovered that the OS had limitations I did not expect. These limitations forced me to make "in place" design changes to get around the weaknesses of the OS. Such in-place changes mutate the design of the overall system, making it difficult to remember or explain what the code does and why it does it that way. Obviously one uses all the tricks of the trade to capture the intent of the code, but I often hear people reading such code and saying, "That looks weird. Why did they do it that way when all you have to do is blah blah blah?"

I myself have forgotten why I did something one way, put in the more obvious solution, and only then remembered, "Oh yeah, that doesn't work. That is why I had to do it that way."
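This is why the bypass and its reason have to be recorded right next to the code. A hypothetical sketch; the toolkit quirk and all the names are invented:

```c
/* Hypothetical chart control with invented setters. */
typedef struct { int width, height; } Chart;

static void set_width(Chart *c, int w)  { c->width  = w; }
static void set_height(Chart *c, int h) { c->height = h; }

void resize_chart(Chart *c, int w, int h)
{
    /* NOTE: do not collapse these into a single set_size(w, h) call.
       That is the obvious solution, and it does not work: the
       (hypothetical) toolkit only recalculates its layout when the two
       dimensions change in separate events. */
    set_width(c, w);
    set_height(c, h);
}
```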

One of the most common shortcomings is performance. There may be a call in a provided library that does what you need, but does it too slowly. Performance is a requirement and it must be met. Another issue may be memory usage. Any of these issues causes the code to deviate from the theoretical design in order to meet the demands of reality.
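Sticking with 2D graphics for a concrete flavor of this: the obvious per-segment drawing calls may be correct but too slow for a large series, and the fix pulls a batching detail into the design. A sketch using the Win32 GDI calls:

```c
#include <windows.h>

/* The clean design: draw the series one segment at a time. Correct,
   but one GDI call per segment. */
void draw_series_simple(HDC hdc, const POINT *pts, int n)
{
    if (n < 2) return;
    MoveToEx(hdc, pts[0].x, pts[0].y, NULL);
    for (int i = 1; i < n; i++)
        LineTo(hdc, pts[i].x, pts[i].y);
}

/* The deviation reality may demand: hand the whole series to the OS in
   a single call. Same picture, far fewer crossings into the OS, and a
   batching concern the theoretical design never asked for. */
void draw_series_batched(HDC hdc, const POINT *pts, int n)
{
    if (n < 2) return;
    Polyline(hdc, pts, n);
}
```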

Forgetting or ignoring that code is filled with such special-case handling is one of the traps of code reuse.

Code often takes the path of least resistance. I have seen developers patch code at the point of failure because of a deficiency elsewhere in their own code. Sometimes it is expedient to just fix a problem where it is encountered instead of drilling in and finding the real problem. Some developers do this because they do not consider that their own code might be flawed; others know the flaws but justify the patch as the most expedient solution to the problem.

Regardless of the reasons, code is filled with little "bypasses" around bad or clogged veins of code.
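A hypothetical example of such a bypass, with the real flaw living somewhere else entirely:

```c
/* Hypothetical: the real bug is that a (not shown) parser sometimes
   reports a count one past the end of the row. The expedient fix lands
   here, where the crash was observed, not where the flaw lives. */
double sum_row(const double *values, int count, int capacity)
{
    /* Bypass: clamp rather than fix the parser. The clogged vein stays
       clogged; traffic just routes around it. */
    if (count > capacity)
        count = capacity;

    double total = 0.0;
    for (int i = 0; i < count; i++)
        total += values[i];
    return total;
}
```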

Now back to the topic: code reuse.

As an argument in favor of code reuse you will hear it said, "This code does almost all we need already."

If I may, I would like to point out that 80% is "almost." I pick that value not as an absolute but as a common figure programmers use when describing code. It does not mean exactly 80%; it means "mostly."

I have heard, and witnessed, that 80% of the code can be developed in about 20% of the time it takes to complete a software feature. It is the last 20% that is difficult. Again, 20% doesn't mean exactly 20%; it means "the devil is in the details."

Please remember that code is shot through with bypasses around deficiencies. With that in mind, consider that the last 20% of the code takes 80% of the time to develop. That 20% is the very code that will make the whole difficult to reuse.

So, if you can write the theoretical ideal of the code (the first 80%) easily, then why exclude that option when considering code reuse? Remember that the choice is usually framed as reuse or rewrite.

I have ported code from Macintosh OS 7, 8, and 9 to Unix and Windows. I know from years of experience on large systems that it is better to first port the design, the theoretical model, the ideal, than to reuse code filled with bypasses. ("Well, our code is not filled with bypasses." Yeah, right.)

Instead of taking code that is almost correct and filling it with ifdefs and conditions and bypasses around the new problem, I suggest developing the model from the experience gained with the previous solution.

With this design reuse in place, let the code take its natural path of bypasses and conditions to handle the deficiencies of the new problem space.

Even if the old 80/20 axiom is pure myth, I still recommend rewriting based on a clean design.

Now, many will hear "rewrite" and equate it with expense. In my experience, the reuse scenario drives the code into such a state of confusion that it is no longer maintainable or extensible; it goes from satisfying one task sufficiently to failing to satisfy two tasks sufficiently, and that forces a rewrite. A forced rewrite due to code collapse is especially expensive because the collapse usually happens at an inopportune moment, when the system is under new loads and stresses.

I once worked on some code that was reused by another team. I encountered a performance flaw that needed to be addressed, and fixing it meant changing the parameters to several methods. Making those changes would break the other team, and they did not have time to change their code to supply the new parameters. I was stuck. The code began to do two tasks poorly, and it only compounds from that day forward.
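A hypothetical reconstruction of that bind, with every name invented; only the shape of the problem is real:

```c
/* Opaque types, owned by the graphics code. */
struct Canvas;
struct Series;
struct ScratchBuf;

/* The entry point the other team compiles against: */
void render_chart(struct Canvas *canvas, const struct Series *data);

/* The performance fix needs a caller-supplied scratch buffer so the
   routine stops allocating on every frame: */
void render_chart_v2(struct Canvas *canvas, const struct Series *data,
                     struct ScratchBuf *scratch);
```

Changing render_chart in place breaks the other team's build; adding render_chart_v2 beside it means the old path never gets the fix. Without versioned releases of the library, the code settles into doing both tasks poorly.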

There is another topic not considered here: the development of frameworks and code units that are meant to be used by many teams. I have developed such systems before, and their reuse has been beneficial. But that reuse happens at a higher level than code reuse. These are libraries and services being reused, and their internal code is a black box to the users.

In summary, don't limit yourself to the two choices of reuse or rewrite. There is more: there are designs to be reused and experiences to build upon. Code is filled with bypasses around the deficiencies of its environment, and that makes reusing it difficult and filled with pitfalls.

3 comments:

Anonymous said...

Amen to that.

Dan said...

Geoff-

I think you make some good points -- the main point being that certain types of "reuse" cause problems. You mention "shims" and "wrappers" and list these with the "bad kind of reuse." Why is that? Can you explain what you mean?

I agree that the "copying and pasting of code" is the most common form of bad reuse. I think you've heard me say before that this is the number one way that bugs get put into the code -- because of the bugs that existed in the copied code.

One of the common problems I've seen with copying and pasting code is when comments that describe a method and its parameters are copied and pasted yet are never updated to reflect the changes made to the code after it is pasted.

I believe that a "good form" of code reuse is the creation of libraries or services that are intended to be used by multiple consumers. I think that your example of not being able to alter the parameters on methods because another team was using your code could have benefitted from a versioned release of a library. Version 1.x would have had the parameters as they started. When the parameters changed, version 2.x would have been used. The other team could have continued using version 1.x until they were ready to move to version 2.x. Of course, a mechanism would have been needed to manage the versions which probably was not available at that time. It seems to me that those kinds of mechanisms are more readily available today and have probably been created because of problems like the example you gave.

I'm interested to hear what you think of these ideas.
-Dan

Geoff Slinker said...

Dan,

A better place to have a discussion would be at the Digerati Illuminatus users group.

http://tech.groups.yahoo.com/group/digerati-illuminatus/

Let's take it there. I invite others to participate as well.

Geoff