Kotan Code 枯淡コード

In search of simple, elegant code

Menu Close

Using the Visitor Pattern to Compare Object Graphs

Recently I found myself in a position where I needed to write some unit testable code that would perform a ‘diff’ against two fairly complicated object graphs. By object graphs, what I really mean are domain objects (POJOs, POCOs, whatever you want to call them) that have plenty of attributes as well as child objects and collections of child objects.

The goal of the exercise was to traverse the entire object graph of both objects and produce a collection of POJOs that represented the list of all things that changed relative to the source. So, if I changed an attribute, I should get an object that tells me the name of the attribute that changed and the object on which the attribute changed. If I add some object to an object in the destination (or remove it from the source), I should get a result object indicating the name of the new object and the object to which it was added. Conversely, if I add something to the source (or remove it from destination), it should appear in my results as a ‘removed’ object.

There are all kinds of academic papers illustrating the best way to write differencing algorithms using all kinds of math that quite frankly just makes my head hurt. The consumers of this code will be writing HTML that displays differences on a well-known shape. In other words, I’m not doing a blind line-by-line text comparison, I am comparing two things of a known shape, which means my UX can be far more intuitive to the user than something like an SVN compare or a Windiff.

I split this problem up into a couple of chunks. The first chunk was traversal. I wanted a way to guarantee that I would zip through the entire object graph. Again, I know the shape of these objects so I do not need a generic algorithm here. The Visitor pattern seemed ideal for this.

I created a simple interface called ObjectGraphVisitable (not really, I have changed the names to protect the innocent). Each  class in my graph will implement this interface which really just defines an accept method. Each object in the hierarchy has an accept method that, in turn, invokes the visit method on the visitor. If you’ve done any Objective-C/Cocoa programming then the Visitor pattern’s form of indirection should already be very familiar to users of the delegate pattern (e.g. the visitor is just a special kind of delegate).

Here’s the ObjectGraphVisitable interface:

public interface ObjectGraphVisitable {
    void accept(GraphVisitor visitor);
}

And here’s the Visitor interface:

public interface GraphVisitor {
    void visitPerson(Person p);
    void visitAddress(Address a);
    void visitRegion(Region r);
    void visitHighScore(HighScore score);
}

At this point we have the interfaces that typically make up the visitor pattern. The idea here is that when the visitor starts at the top level of an object graph, we call accept(this) where this is the visitor instance (you’ll see that next). It is then up to the root-level object in the hierarchy to recursively (or iteratively, whatever you like) invoke accept on the children, which in turn invoke accept on their children, and so on down the line until every object in the graph has called visitXXX on the visitor. We can then create a class that implements the visitor interface called ObjectGraphComparisonVisitor (again, name sanitized, your name should be descriptive and self-documenting) and contains the actual “diff” comparison logic.

Here’s a sample root level domain object that can be visited (accepts visits from a visitor):

public class RootLevelDomainObject implements ObjectGraphVisitable {
// other class stuff here...

    public void accept(GraphVisitor visitor) {
      visitor.visitRootNode(this);
      for (ChildNode child : this.children) {
        child.accept(visitor);
      }

    }
}

The key aspect of this pattern that often takes a little getting used to is that the visitor is passive and not in control of what gets visited. It calls accept on the root node and then sits back while all of its visitXXX() methods are invoked. This feels kind of like SAX XML parsing where instead of actively walking the graph you are passively fed XML nodes as they are parsed.

So now that we have the object graph set up to be traverse-able we need to take that traversal and use it for comparison. To do that, we can create a class that implements GraphVisitor. This class, called something like ObjectComparisonVisitor, needs to do 3 things:

  • Find all objects in the destination graph that do not occur in source
  • Find all attributes that have changed on objects that occur in both source and destination graphs.
  • Find all objects in source graph that do not occur in destination
I won’t show all the code here, I’ll leave that as a fun exercise for the reader. But the basic psuedo-code is as follows:
  1. For each object type in the graph, maintain a hash table that stores the items in source (source is what you visit)
  2. As each object type is visited, throw it into the hash table (note you need some kind of uniqueness test for each of your domain objects… you should have that anyway)
  3. As each object type is visited, check to see if it’s in the destination graph as well. If it is, do an attribute-by-attribute comparison.
  4. When the entire source tree has been visited (your call to accept is single-threaded, so it won’t return until every node has been visited), traverse the “source” hash tables looking for visited items that do not exist in the destination.

As each of these tasks are being done, you’re progressively building up a collection of ModelComparisonResult objects, which really just contain an enum for the difference type, the name of the object, the name of the attribute, etc.

When all is said and done, the comparison visitor object can provide rich difference results in a format that is both unit testable and amenable to display via command-line or GUI. There are other variations of this that include visiting both the source and the destination trees and then doing a final comparison afterward. This second variation is typically used when you don’t have discrete uniqueness tests on every object in your graph, so you keep track of divergence points in the visitation pattern to detect graph changes while attribute changes are still easy to detect (you could even use Reflection to check for those).

Anyway, this was really the first time I’ve ever found a practical use for the Visitor pattern outside of academics and job interviews, so I thought I would share.