Wednesday, 26 August 2009

I don't believe in "Business Value" (sort of)

Ok, let's define terms. I'm not against value-oriented thinking. In fact, I'm all about the customer. I think whatever can add value for the customer, or improve the output the customer wants (presumably software, in my case), the better. But this term "Business Value" has crept into the Agile community as a bastardization of the Lean concept of "Customer Value-Added activities", and it worries me. I'm concerned with terms here, and the term is used in two different ways. I'm increasingly concerned about its sloppy use and the impacts of that sloppiness.

In Lean, specifically in Value Stream Mapping, there are two kinds of activities - those which add value for the customer, and those which do not. You analyze a process (say, software development from feature request to delivery into production), identify the time each activity takes, and determine whether that activity added or failed to add value for the customer. Writing code, writing tests, designing, working out clarifications of the requirements - these are customer-value-added activities. Waiting for a server to be deployed, waiting for three VPs to sign off on a design, etc. - these are non-value-added activities. I like this kind of cycle-time measurement, because it forces the observer to be in the mind of the customer. If there's a business process which slows things down - in Lean terms, lengthens the cycle-time - it's an inefficiency, and you try to find a way to remove it, circumvent it, etc.

At a fairly large financial services client, I saw a rather crazy thing occur. The coaches and suits were doing a value stream mapping. There were business activities whose performance served a business need, but not the needs of the customer. This might be a strategic steering committee meeting which a Team Lead was required to attend, or a process step that existed to satisfy a regulatory requirement imposed on the business. These weren't really "customer value added" activities, but the coaches and business were unwilling to see them as non-value-added activities, or "waste". This led to an interesting compromise - the concept of "business value added" activities.

Now, I don't mind if there's a piece of waste in your cycle, and you acknowledge it, but then decide, for strategic trade-off reasons, to keep it. That's a sane business decision, even if it rankles my personal aesthetic. It is the prerogative of a company's decision makers to weigh the interests of their customer against the interests of the organization and of the shareholders, and come to whatever balance they feel best serves the principles and values and mission of their company. But by calling it "business value added", what they did, in effect, was move it out of the non-customer-value-added column, which effectively stopped people from considering whether to reduce or eliminate it. It became synonymous with "customer value".

On the other hand, when people in the Agile community use the term "business value", they often mean the customer value of a product feature, and use it to help focus prioritization of work/features on the backlog (worklist, for the uninitiated). And I get it. In that context, you need product managers prioritizing based on value, not cost, for reasons that Arlo Belshee can explain better than I can. But the term gets mixed up across these contexts, and I have heard consultants who, two years earlier, would have called that committee review of a design "waste" simply brush it off as a "business value added" activity. I think the sloppy language around BV and CV and NVA is at the root of this phenomenon. In other words, hard-nosed management consultants have stopped calling out the Emperor's nakedness. Since these external parties are among the few empowered to call bullshit on a client, less bullshit gets called, to the detriment of our industry.

So yes, this is an argument about definitions and semantics. Yadda yadda yadda. But for my money, I want crisp meaning - especially where sloppy terms allow the "cloak of truthiness" to descend and sow confusion about what's important. And if delighting your customers is critical to your long-term business survival, then anything not in that "value conversation" is waste (sorry, but it is - deal with it). You might have to live with some waste you can't get rid of yet, but you should never stop calling it what it is. Otherwise you begin to sink back into the very state from which you used value stream mapping to escape. And as far as I'm concerned, serving, nay, delighting the customer is the only way to be in business for the long haul.

Saturday, 20 June 2009

Make Fake/Stub objects simpler with proxies.

I recently wrote a convenience component. It's a Java dynamic proxy creator with a default handler that simply delegates any methods called on it to a provided delegate, throwing an UnsupportedOperationException if such a method isn't implemented. Why? Because I was sick of writing huge Fake implementations of medium-to-huge interfaces when I needed a fake, not a mock, and I was frustrated with using EasyMock as a convenience to create Fakes and Stubs. This delegator allowed me to implement only the parts of the interface's API that I actually needed, and just not bother with the rest. This was useful for such things as factories or other "lookup" components which I was going to feed (in my tests) from some static hash-map.

This gets back to my new motto: I hate boilerplate. If I have to implement every interface method in a fake object, then I'm going to go crazy implementing public String getFoo() { return null; } all over the place. Especially since that method will never get called in my test, and in fact if it does, I want it to fail noisily. So I could write public String getFoo() { throw new UnsupportedOperationException("Test should not call this method"); }. That's great, but if I have to do that for a component that's got thirty methods, my test classes are going to be littered with this. Instead, I can do:

import static org.junit.Assert.assertEquals;

import java.util.HashMap;
import java.util.Map;
import org.junit.BeforeClass;
import org.junit.Test;

public class MyTestClass {

  // Static, so the static FooDelegate class below can read it.
  private static Map<Integer, Bar> BAR_CACHE;

  @BeforeClass  // JUnit requires this to be a public static method.
  public static void classInit() {
    BAR_CACHE = new HashMap<Integer, Bar>();
    BAR_CACHE.put(1, new Bar(1, true));
    BAR_CACHE.put(2, new Bar(2, false));
    BAR_CACHE.put(3, new Bar(3, true));
    BAR_CACHE.put(4, new Bar(4, true));
  }

  @Test
  public void testFooValidityCount() {
    BarFactory fooDep = Delegator
        .createProxy(BarFactory.class, new FooDelegate());
    Foo foo = new Foo(fooDep);
    foo.processBars();
    assertEquals(3, foo.getCountOfValidBars());
  }

  // ... more test methods that need to look stuff up ...

  public static class FooDelegate {
    public void init() { /* ignore */ }
    public Bar getBarWithId(int id) { return BAR_CACHE.get(id); }
  }
}

If we assume that BarFactory is an insanely long interface, then this allows me to write a very clean Fake implementation with only a few bits of real behaviour. Otherwise, a FakeBarFactory could be longer than all the test code in this testing class. The other thing I like about this is that - especially if you're testing code that uses dependencies you're not intimately familiar with, and don't have source access to - you can quickly implement an empty delegate and let the exceptions in the test guide you towards the leanest implementation. You'll get an exception any time your delegate is missing a method that your System Under Test calls on the dependency. Handy.

Essentially, the Delegator is simply a pre-fab proxy InvocationHandler, with some reflection jazz in the invoke() method to look up methods on the delegate, or throw... but it saves a bunch of extra boilerplate. I've had fake classes so large THEY needed tests. This is a much nicer approach for non-conversational dependencies (for conversational ones, mocks would be better). I tried to do this with EasyMock, but the implementations were really awkward due (ironically) to the fluent-interface style. This is just a rare case where it's cleaner to have a minimal real implementation.

Unfortunately, the code I wrote is closed-source for my employer, but it's simple enough that everything I've done above should let you implement your own.
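For the curious, here's a minimal sketch of what such a Delegator might look like, built on java.lang.reflect.Proxy. The class and method names match my example above, but the body is a reconstruction under stated assumptions, not my employer's code:

import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public final class Delegator {
  private Delegator() {}

  @SuppressWarnings("unchecked")
  public static <T> T createProxy(Class<T> iface, final Object delegate) {
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(),
        new Class<?>[] { iface },
        new InvocationHandler() {
          public Object invoke(Object proxy, Method method, Object[] args)
              throws Throwable {
            Method target;
            try {
              // Find a public method on the delegate with a matching signature.
              target = delegate.getClass()
                  .getMethod(method.getName(), method.getParameterTypes());
            } catch (NoSuchMethodException e) {
              // Fail noisily: the test touched a part of the API we didn't fake.
              throw new UnsupportedOperationException(
                  "Fake does not implement: " + method);
            }
            try {
              return target.invoke(delegate, args);
            } catch (InvocationTargetException e) {
              throw e.getCause(); // re-throw whatever the delegate actually threw
            }
          }
        });
  }
}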

... and remember, kids... Mocks, Stubs, and Fakes are all Test Doubles, but they're not all the same thing.

Sunday, 12 April 2009

Re-thinking Object-Relational Mapping in a Distributed Key-Store world

JDO, JPA, Hibernate, TopLink, Cayenne, Enterprise Objects Framework, etc., are workable object-to-datastore mapping mechanisms. However, you still have to optimize your approach to the underlying data system, which historically has, in most cases, been an RDBMS. Hence these systems, their habits, and their idioms are all quite strongly tied to Object-Relational Mapping. Since Google App Engine released a Java app-hosting service this week, with a datastore wrapped in JDO or JPA via DataNucleus, people have been playing, and the difficulties of these very habits have become clearer. It's easy, when using JDO or JPA with BigTable, to design as if we're mapping to an RDBMS, which a distributed column-oriented DBMS like BigTable is not. There are some good articles on the net about how to think about this sort of data store:

http://www.mibgames.co.uk/2008/04/15/google-appengine-bigtable-and-why-rdbms-mentality-is-harmful/

http://torrez.us/archives/2005/10/24/407/

http://highscalability.com/how-i-learned-stop-worrying-and-love-using-lot-disk-space-scale

I'm struggling with this myself, being a long-time (longer than most) user of object-relational mapping frameworks. For example, with BigTable one cannot join across relationships to pre-fetch child object data - a common optimization. Key-stores are bad at joins, because they're sparse and inconsistently shaped: the contents of each "row" may not have the same "columns" as the others, so building indexes to join against is difficult. We actually need to re-think normalization, because it's not the same kind of store at all.
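To make that concrete, here's a sketch of the kind of denormalization I mean. Order and Customer are hypothetical entities of my own invention, and the details (JPA annotations aside) are illustrative, not a recommendation: rather than joining to Customer at read time, we copy the field we need onto Order when it's written.

import java.util.Date;
import javax.persistence.Entity;
import javax.persistence.Id;

@Entity
public class Order {
  @Id
  private Long id;

  // A reference we keep, but never join on at query time.
  private Long customerId;

  // Deliberately duplicated from Customer at write time, so that
  // rendering an order never requires a second fetch or a join.
  private String customerName;

  private Date placedAt;

  // The duplication has a cost: if the customer changes their name,
  // every one of their orders must be re-written (or left stale).
}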

Interestingly, OO and RDBMS actually DIDN'T have an impedance mis-match in one key area: Entity-Relationship models bore a striking structural resemblance to object-composition or class diagrams, and the two could be mapped. Both RDBMS schemata and object models were supposed to be somewhat normalized, except where they were explicitly de-normalized for performance reasons. With distributed key-stores, we're potentially better off storing duplicate objects, or possibly building massive numbers of crazy indices. At this point, I don't know what the right answer is, habit-wise, and I work for the darned company that made this thing. It's a different paradigm, and optimizing for performance on a sparse key-store is a different skill and knack. And since we're mapping objects to it, we'll have to get that knack, then work out what mapping habits might be appropriate to that different store.

Eventually we will work out the paradigms, with BigTable and its open-source cousins (HBase and HyperTable) creating more parallelism and horizontal scaling on a "cloud-y web", as I've seen it called. The forums - especially the google-appserver-java group and the other forums where objects and key-stores have to meet - will grow the skills and habits within the community. But we have to re-think it, lest we just turn around, say "that's crap", and throw the baby (performance/scaling) out with the bathwater (no joins/set-theory).

Saturday, 28 March 2009

Hiatus... should be over

So, apologies that I haven't posted anything - I never intended to be away from the blog this long, so soon after starting it. I have recently been in transition, and am now working for Google as an internal software development coach. Having disclosed that, I should mention that this blog isn't Google-specific. Mostly I'm speaking about things I have seen elsewhere - fodder from discussions with other Agile community members and practitioners. It may include aspects of my current job, but not specifically, and usually blended in with the former. In other words, I won't be leaking Google's secrets on this page. ;)

Having said that, it's so far quite a fun place to work, with a wide variety of development cultures. There's a great spirit of exploration and experimentation in my new firm, which allows different teams to try different things. I leave it as an exercise for the reader to see how this can be challenging, organizationally, but it certainly allows G to innovate - as is nicely shown in the marketplace.

Anyway, I have four or so articles partially written from before that I hope to publish in the next couple of weeks. It's been an adjustment (I haven't been an actual employee for quite some time), but I'm starting to get a stable enough schedule that I can pay attention to this sort of thing (blogging).

Cheers,

Christian - the Geek in a Suit

Oh, P.S. I wore a suit to my first meeting with the team I'll be working with, and got soundly ribbed for it. lol. It's ok... I'm a big boy. I can take the heat. -cg.

Friday, 23 January 2009

Are mocks just stubs by another name, or something more?

[An older article that I published internally to a client community, reprinted with changes to remove client references]

An opinion that I've run into among some of my clients is that mock objects are just what we've always called stubs, and that it's an old approach. It's actually a quite common perspective - one I held myself for part of my career. In fact, early mock object approaches were very much like "fake implementations", but modern mocks are different. While both can be considered "test doubles" or "test stand-ins", stubs are stubbed-out interfaces that provide expected data to the system under test, whereas mock objects provide behavioural expectations. These terminologies can be confusing, but we can sort that out.

jMock and EasyMock are two examples of mock object frameworks which allow for the specification of behaviour. jMock uses a domain-specific language (DSL) such that you code the expectations in a fairly verbal syntax (often called a fluent interface). Something along the lines of:

myMock.expects(methodCall(SOME_METHOD)
.which().returns(A_VALUE))
.then().expects(methodCall(OTHER_METHOD))

... which should read vaguely like English to the initiated. EasyMock, on the other hand, uses a "proxy/record/replay" approach, which some find easier. The point is that both define a set of expected interactions, rather than just a starting state and an expected end state.
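For a flavour of the record/replay style, here's a minimal EasyMock sketch. Warehouse and Order are illustrative types (borrowed in spirit from the Fowler article quoted below), not from any real codebase:

import static org.easymock.EasyMock.createMock;
import static org.easymock.EasyMock.expect;
import static org.easymock.EasyMock.replay;
import static org.easymock.EasyMock.verify;

import org.junit.Test;

public class OrderTest {
  @Test
  public void fillingAnOrderChecksInventory() {
    // Record phase: state the calls we expect, and their answers.
    Warehouse warehouse = createMock(Warehouse.class);
    expect(warehouse.hasInventory("Talisker", 50)).andReturn(true);

    // Replay phase: switch the mock from recording to verifying.
    replay(warehouse);

    // Exercise the system under test against the mock.
    new Order("Talisker", 50).fill(warehouse);

    // Assert that the expected interactions actually happened.
    verify(warehouse);
  }
}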

Around the middle of his article "Mocks Aren't Stubs", after describing the fake and mock approaches in more detail, Martin Fowler starts to use a clarifying terminology which I like:

"In the two styles of testing I've shown above, the first case uses a real warehouse object and the second case uses a mock warehouse, which of course isn't a real warehouse object. Using mocks is one way to not use a real warehouse in the test, but there are other forms of unreal objects used in testing like this.

"The vocabulary for talking about this soon gets messy - all sorts of words are used: stub, mock, fake, dummy. For this article I'm going to follow the vocabulary of Gerard Meszaros's book. It's not what everyone uses, but I think it's a good vocabulary and since it's my essay I get to pick which words to use.

"Meszaros uses the term Test Double as the generic term for any kind of pretend object used in place of a real object for testing purposes. The name comes from the notion of a Stunt Double in movies. (One of his aims was to avoid using any name that was already widely used.) Meszaros then defined four particular kinds of double:

  • Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists.
  • Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an in memory database is a good example).
  • Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test. Stubs may also record information about calls, such as an email gateway stub that remembers the messages it 'sent', or maybe only how many messages it 'sent'.
  • Mocks are what we are talking about here: objects pre-programmed with expectations which form a specification of the calls they are expected to receive.

"Of these kinds of doubles, only mocks insist upon behavior verification. The other doubles can, and usually do, use state verification. Mocks actually do behave like other doubles during the exercise phase, as they need to make the SUT believe it's talking with its real collaborators - but mocks differ in the setup and the verification phases."

This use of behavioural expectations for "Test Double" objects is quite handy, especially in systems where you have components which make use of heavy service objects or (gasp) singletons which may do more than a simple "call-and-answer" method call. Deep call chains may require stronger behavioural testing. Being able to provide an object to the system under test that expects to be called a certain way, in a certain order, can create a much more precise test, and reduce the amount of code you have to write in a "fake" object. Otherwise, to test fully and precisely, one ends up with a severely large number of Test Dummies and ballooning "Fake Objects", which themselves can contain errors and often have to be tested. Having a system like jMock or EasyMock can reduce the size of your testing code, removing code that could easily fall out of sync with the system under test and either introduce false-positive errors or become insufficient and therefore meaningless. So: less code, less maintenance, fewer syncing problems, and tests that are more precise at the same time.

Another approach is to radically re-factor your code into smaller components, each of which is substantially simpler to test. If a component does one thing and does it well, and components interact and collaborate, then you can often use simpler mock behaviours, or simple fakes, without as much testing effort. Great resources for writing testable code are available here.

Mocks aren't always immediately intuitive for someone used to building fakes (they weren't for me, and they still occasionally trip me up), but once you're comfortable with the approach, it can be much crisper. This is especially true as people try to get better coverage in isolated unit tests, or try to test-drive their software. The linked Fowler article is a good one, and certainly worth reading for anyone trying to figure out how to meaningfully test components without having to start up external servers or simulators.

Friday, 2 January 2009

Why I hate the java.util collection library.

<rant>

It's very simple. While it was a vast improvement over the Vector and Dictionary of its day, the Collections library as of Java 1.2 did what all new Java APIs from Sun seem to do: require lots of code for simple uses, and allow the user to do invalid things.

I won't spend a lot of time on the former, since it could be the topic of a whole other blog post, and I should preface all of this by saying that Java is my most proficient language. So this is not an anti-Java bigot speaking... just a frustrated user who wishes Sun and the community wouldn't keep heaping bad APIs on top of bad APIs.

Sorry... back to the point.

Immutability - it's all backwards... what's with that?

The big issue I have with the Collections API is about allowing the user to do wrong things. This amounts to Sun having inverted mutability vs. immutability in the class hierarchy. Any language that wants to guarantee semantics, ease concurrency and resource contention, and otherwise clean things up should have immutability as the default. Something shouldn't be volatile or mutable unless specified as such, and the semantics should enforce this. But with the Collections API, we have immutability as an optional characteristic of collections. To use a simplistic example, you can do this:

Collection c = new ArrayList();
c.add(foo);
c.add(bar);

Collection has an add() method. This means that, by definition, Collection is mutable. "But wait!" you cry, you can obtain an unmodifiable view of this collection by calling Collections.unmodifiableCollection(c). Sure. At which point you have something that conforms to the contract of Collection, but which may throw exceptions when part of that contract is relied upon. In other words, you should not have access to methods that allow you to break the contract. To allow this is to have written a bad contract, with ambiguity. Now, to properly guard against the possibility of stray immutable collections, you may be forced to check mutability before executing the contract (the add() method), or you may have to guard against the exceptions with try-catch logic and exception handling. You can see some of this in the concurrent library's implementations of concurrent collections. It's not bad, but it could be simpler with an immutable collection interface.
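Here's the trap in miniature - the wrapper compiles against the full Collection contract, but honours only part of it:

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

public class UnmodifiableTrap {
  public static void main(String[] args) {
    Collection<String> names = new ArrayList<String>();
    names.add("alice");

    Collection<String> view = Collections.unmodifiableCollection(names);

    // Compiles fine - add() is right there on the Collection interface -
    // but throws UnsupportedOperationException at runtime.
    view.add("bob");
  }
}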

Additionally, consider how hard it is to create anonymous one-off implementations of Collection. You have to implement not only size(), contains(), and iterator(), but all the storage logic. If you're wrapping an existing data structure and merely want a read-only view on it, you're forced to implement all that extra API purely to satisfy the optional contract provided in the Collection definition.

The point here is that an immutable Collection is a sub-set of the functionality of a MutableCollection. That should be obvious, but apparently wasn't to Sun. Consider if Sun had used the model from the NeXTSTEP (and now Apple's Cocoa) APIs. Collection would have been defined as (simplified):

public interface Collection<T> extends Iterable<T> {
    public Iterator<T> iterator();
    public int getSize();
    public boolean isEmpty();
    public boolean contains(T object);
    public boolean containsAll(Collection<T> objects);
    public T[] toArray(T[] array);
}

and

public interface MutableCollection<T> extends Collection<T> {
    public boolean add(T object);
    public boolean addAll(Collection<T> objects);
    public boolean remove(T object);
    public boolean removeAll(Collection<T> objects);
    public boolean retainAll(Collection<T> objects);
    public void clear();
}

This would, ultimately, mean that a Collection instance, typed as a Collection, would not have any mutating methods available to invoke, let alone ones needing guards against stray invocations on immutable instances. There would then be two strategies for guaranteeing immutability. One: cast the stupid thing as Collection, and nothing that has access to the cast reference can get at MutableCollection's methods (except by explicit reflection). Alternately, add a getImmutableCopy() method to the MutableCollection interface that creates a shallow copy which is NOT an implementor of MutableCollection - merely of Collection. Then you have a safe "snapshot" of the mutable object that can be freely passed around without worry that something else will modify it.
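In use, it would read something like this. ArrayListCollection is a hypothetical implementor of the MutableCollection interface sketched above, and getImmutableCopy() is the hypothetical method just described:

MutableCollection<String> names = new ArrayListCollection<String>();
names.add("alice");                     // fine: we hold the mutable type

// Strategy one: upcast. The read-only type simply has no mutators.
Collection<String> readOnly = names;
// readOnly.add("bob");                 // does not compile - Collection declares no add()

// Strategy two: a snapshot that implements Collection but not MutableCollection.
Collection<String> snapshot = names.getImmutableCopy();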

Ok, why is this such a big deal? It's about having code that means what it says. If I have to guess about the run-time state, or the more concrete type, of an instance to know whether it's OK to invoke one of its methods, then I'm working with an implicit contract, and that's murky territory. Java, by making an immutable collection a special implementation of Collection, has inverted the hierarchy of contract, and exposed methods that are not truly available on all implementors. Optional interfaces are fine, but you don't expose the optional interface above the level where it holds true. It's a basic piece of encapsulation that the Java folks just seemed to forget.

Now, this little rant is a decade too late. Truth is, I made it when I worked at Sun, but I was quite the junior contracting peon, and had no real voice. Now I have a blog, and am free to whine and be annoyed in public. :) But I hope to make a more general point here about contract. Optional contracts (APIs) need to be handled very carefully, and in a way such that an unfamiliar programmer can understand what you meant from how the contract reads. Look at your interfaces from the perspective that they should not offer what they (potentially) cannot satisfy. Polymorphism doesn't require "kitchen-sinkism" in an API. Just careful, thought-out layers.

The issues of testability also arise here, insofar as a class that has to implement optional APIs must, therefore, have more tests to satisfy what is, in essence, boilerplate. If I make a quick-and-dirty implementation of Collection as a wrapper around an already-existing, immutable data structure, then I have to implement all of those methods and test them, when in fact half of them will throw an exception. This means (to me) that they probably shouldn't even exist. Code with a lot of boilerplate (lots of extra getters and setters, or over-wrought interfaces) tends to be hard to test, and one of my big annoyances in life (these days) is testing boilerplate code. Ick.

</rant>