Kotan Code 枯淡コード

In search of simple, elegant code

Menu Close

Tag: patterns

Avoiding Graph Modeling Antipatterns

In my previous blog post, which I just posted this morning, it turns out that I made a pretty large modeling no-no. In terms of consequences to the application and the survival of the zombie apocalypse, I think the issue was pretty minor. However, it does illustrate that no matter how awesome a graph database product is, you still need to put some thought into how you model the database.

In the previous model, I could make assertions like user-[:SPOTTED]-zombie and there were some attributes on the spotted relationship such as the date the sighting took place. As it turns out, this is an antipattern. There are a couple of things I did wrong here.

Firstly, the attributes on the relationship have nothing to do with the weighting or nature of the relationship. The attributes I put on that relationship were informational and arguably belonged on a separate entity. In other words, the sighting was a noun that I had elided from my relationship sentence “a user spotted a zombie”

In the forthcoming Graph Databases book being published by O’Reilly, there is a best practice in graph modeling that says that when the interaction of two nodes produces something of significance, that interaction should be modeled as its own entity rather than modeled as properties on the relationship between those nodes.  In my sample zombie domain, this means that a user produces a sighting when they spotzombie. The sighting has metadata about the actual sighting, which can include the latitude and longitude (I know that neo4j has its own built-in geospatial support, which I will ignore for this example) of where the sighting was, and other information about the sighting itself, such as the user’s confidence level that they saw what they think they saw (aka the “bigfoot factor”).

Now I can go back and remodel my previous database using the following graph model created via Cypher:

	CREATE user1={_label: 'user', firstname: 'Kevin', lastname: 'Hoffman'},
		   user2={_label: 'user', firstname: 'Bob', lastname: 'Bobberson'},
	       alvin={_label: 'zombie', name: 'Alvin', power:20},
	       simon={_label: 'zombie', name: 'Simon', power:30},
	       theo={_label: 'zombie', name: 'Theodore', power:40},
	       sighting1={_label: 'sighting', date:'01/01/2013', lat: 42.12, long: 21.03},
	       sighting2={_label: 'sighting', date:'01/02/2013', lat: 42.00, long: 25.00},
	       sighting3={_label: 'sighting', date:'01/03/2013', lat: 47.32, long: 30.21},
	       sighting4={_label: 'sighting', date:'01/01/2013', lat: 40.22, long: 27.03},
	       sighting5={_label: 'sighting', date:'01/02/2013', lat: 43.00, long: 21.12},
	       user1-[:SPOTTED {confidence: 10}]->sighting1-[:TARGET]->alvin,
	       user1-[:SPOTTED {confidence: 12}]->sighting2-[:TARGET]->simon,
	       user1-[:SPOTTED {confidence: 27}]->sighting3-[:TARGET]->theo,
	       user2-[:SPOTTED {confidence: 50}]->sighting4-[:TARGET]->alvin,
	       user2-[:SPOTTED {confidence: 80}]->sighting5-[:TARGET]->simon

Now I’ve added a confidence factor to the relationship itself, which allows me to do a weighted traversal over the relationship nodes. This is a far better user for attributes belonging to a relationship. Also note that the non-traversal information is now more appropriately located as attributes on an entity. This should all actually seem pretty familiar to you if you’ve done any work with Domain-Driven Design.

With this new model in place, I can now visualize my graph like this:

Modified Zombie Sighting Graph

Modified Zombie Sighting Graph

Rest assured, I can still make effective queries against the graph, they just involve another hop to get the information that I used to get before. So, I can write the query to get the list of zombies that have two “spotted” arrows pointing to them (two different users spotted the same zombie), I just need to extend the shape of the subgraph I’m looking for to traverse over the “sighting” node. In the query below, I don’t care about the information on the node, so I can use double-parens for an anonymous node:

START u=node:node_auto_index(_label='user')
MATCH u-[:SPOTTED]->()-[:TARGET]->(zombie)<-[:TARGET]-()<-[:SPOTTED]-(somebodyelse)
RETURN DISTINCT zombie.name

This matches all zombies that have a spotted->(sighting)->target path from a user and also have the reverse <-target-(sighting)<-spotted relationship coming from another user. As expected, the above query returns "Alvin" and "Simon" but not "Theodore". I dare you to try and jump two join tables going one way and then jump the same two join tables going back out without ripping a hole in your RDBMS and/or brain. Moral of the story - even amazing tools like Neo4j require thoughtful care of the data model, though Neo4j can certainly handle refactoring and upgrading much more easily than a regular SQL database.

Implementing an Asynchronous Repository with Akka Actors

I’ve been fortunate enough to be able to work on an enterprise LOB application that uses the Typesafe stack – Play Framework using Scala and Akka. One of the things that I’ve noticed with application development in Scala is that every morning when I take a look at my code I scan it for opportunities to refactor. I want to know how I can reduce the number of lines of code, which reduces the number of ways it can fail, I want to know how I can make my code cleaner, more elegant, faster, and more scalable.

I was sifting through my code last week and I noticed this little nugget:

ZombieRepository.findByName( zombie.name ).map { ... some stuff ... }

My first thought was that this code looked really old fashioned. Making synchronous method calls like that is just so last year. I immediately saw an opportunity to add a point of scalability and durability by putting my zombie repository (it’s not actually zombies, I’m just using that as an example…) behind a nice abstraction point as well as allow for all kinds of things like load balancing of requests, distributed calls, etc. I decided to implement a repository as an Actor.

The first thing I did, which is part of one of my favorite practices of contract-first design, was build the Actor Protocol.

object ZombieRepositoryActorProtocol {
    case class Insert(zombie: Zombie)
    case class Update(zombie: Zombie)
    case class Delete(zombieId: String)
    case class QueryByName(zombieName: String)
}

At the moment I don’t need to create case classes for the replies (but I could easily do this) because these messages will either be one-way messages or the replies will be things like Vector[Zombie] or Option[Zombie].

In situations where I want to unit test things that rely upon this repository, I need to be able to give the repository a reference to a different actor than the default one we typically reply to (sender), so I can modify the protocol as follows to allow references to the Akka test actor to be passed along and, if it isn’t passed along, then we just reply to sender:

object ZombieRepositoryActorProtocol {
    case class Insert(zombie: Zombie)
    case class Update(zombie: Zombie)
    case class Delete(zombieId: String)
    case class QueryByName(zombieName: String, requester: Option[ActorRef])
}

I could also make the protocol a little more complicated and reply to some of the other operations with status codes or something to allow clients to confirm that an operation completed, but I am keeping it simple for now.

In my own code, the backing store is MongoDB but, another benefit of this type of encapsulation around the repository is that the only thing anybody else knows is how to send messages to the repo which take case classes as arguments. Not only is this a great decoupling away from a persistence medium, but it also allows for all the benefits you get from running Akka actors – you can put the repository behind a round-robin or load-balancing router, you can distribute the repository actors across a grid, and, my personal favorite, you get all the benefit of supervisory control so if your repository fails and crashes, another one can be started up, etc.

So now I can just handle messages like this:

import ZombieRepositoryActorProtocol._

def receive = {
    case Insert(zombie) => ...
    case Update(zombie) => ...
    case Delete(zombieName) => ...
    case QueryByName(zombie, requester) => requester.map { _ }.getOrElse { sender } ! FetchByName(zombie)  
}

I honestly don’t know how I got any large-scale, asynchronous work done before without the use of Akka.

Bundling Actor Messages into a Protocol

If you have done any programming with Actors (in my case Akka actors) then you know that the vast majority of the messages that you typically end up sending to those actors are case classes. Sure, you can send any type of data you like, but as a practice, most people like using case classes to make the pattern matching of the messages much easier to read. Sometimes you’ll see people send tuples or Vectors but only when it’s really clear that the actor doesn’t receive many other types of messages.

In situations where I have a ton of actors, one phenomenon that I’ve noticed is that I end up polluting package space with bucketloads of case classes. For example, if I put the StarportPlanetStarbase, and Ship actors in the com.kotancode.space package, I’ll likely end up with the messages for those actors all floating around within the package namespace. It might not seem like a problem, but let’s say I have an import com.kotancode.space._ line at the top of one of my actor files, now I’ve included all of those messages even though I’m just defining one actor.

I haven’t been able to find sufficient evidence to see who started the trend of actor protocols but I certainly cannot claim any credit, I’m just borrowing something I’ve seen. The idea is to wrap all of the case classes for a particular actor in a protocol object as a nice way of isolating those case classes in their own sub-scope, even if all the actors belong to the same package.

For example, instead of defining all my messages for Planet at the top of the Planet.scala file (like I normally would do), I can instead define them in an actor protocol like this:

object PlanetProtocol {
    case class Land(player:ActorRef)
    case class Depart(player:ActorRef)
    case class HarvestResources(player:ActorRef, ... , ... )
    case class ...
}

Now, I can define a receive method that looks like this, which keeps the case classes from cluttering up the namespace and, in my opinion, is just much cleaner and more elegant than scattering case classes to the winds at random.

def receive = {
    import PlanetProtocol._

    case Land(player) => ...
    case HarvestResources(player, ... , ...) =>
}

So, it might not seem like this is much of an earth-shattering thing, but every chance I can get to refactor my code to gain more elegance, more cleanliness, more expressiveness, and less ceremony – I’ll take it. Hopefully you find this tip as useful as I did.

Using the Visitor Pattern to Compare Object Graphs

Recently I found myself in a position where I needed to write some unit testable code that would perform a ‘diff’ against two fairly complicated object graphs. By object graphs, what I really mean are domain objects (POJOs, POCOs, whatever you want to call them) that have plenty of attributes as well as child objects and collections of child objects.

The goal of the exercise was to traverse the entire object graph of both objects and produce a collection of POJOs that represented the list of all things that changed relative to the source. So, if I changed an attribute, I should get an object that tells me the name of the attribute that changed and the object on which the attribute changed. If I add some object to an object in the destination (or remove it from the source), I should get a result object indicating the name of the new object and the object to which it was added. Conversely, if I add something to the source (or remove it from destination), it should appear in my results as a ‘removed’ object.

There are all kinds of academic papers illustrating the best way to write differencing algorithms using all kinds of math that quite frankly just makes my head hurt. The consumers of this code will be writing HTML that displays differences on a well-known shape. In other words, I’m not doing a blind line-by-line text comparison, I am comparing two things of a known shape, which means my UX can be far more intuitive to the user than something like an SVN compare or a Windiff.

I split this problem up into a couple of chunks. The first chunk was traversal. I wanted a way to guarantee that I would zip through the entire object graph. Again, I know the shape of these objects so I do not need a generic algorithm here. The Visitor pattern seemed ideal for this.

I created a simple interface called ObjectGraphVisitable (not really, I have changed the names to protect the innocent). Each  class in my graph will implement this interface which really just defines an accept method. Each object in the hierarchy has an accept method that, in turn, invokes the visit method on the visitor. If you’ve done any Objective-C/Cocoa programming then the Visitor pattern’s form of indirection should already be very familiar to users of the delegate pattern (e.g. the visitor is just a special kind of delegate).

Here’s the ObjectGraphVisitable interface:

public interface ObjectGraphVisitable {
    void accept(GraphVisitor visitor);
}

And here’s the Visitor interface:

public interface GraphVisitor {
    void visitPerson(Person p);
    void visitAddress(Address a);
    void visitRegion(Region r);
    void visitHighScore(HighScore score);
}

At this point we have the interfaces that typically make up the visitor pattern. The idea here is that when the visitor starts at the top level of an object graph, we call accept(this) where this is the visitor instance (you’ll see that next). It is then up to the root-level object in the hierarchy to recursively (or iteratively, whatever you like) invoke accept on the children, which in turn invoke accept on their children, and so on down the line until every object in the graph has called visitXXX on the visitor. We can then create a class that implements the visitor interface called ObjectGraphComparisonVisitor (again, name sanitized, your name should be descriptive and self-documenting) and contains the actual “diff” comparison logic.

Here’s a sample root level domain object that can be visited (accepts visits from a visitor):

public class RootLevelDomainObject implements ObjectGraphVisitable {
// other class stuff here...

    public void accept(GraphVisitor visitor) {
      visitor.visitRootNode(this);
      for (ChildNode child : this.children) {
        child.accept(visitor);
      }

    }
}

The key aspect of this pattern that often takes a little getting used to is that the visitor is passive and not in control of what gets visited. It calls accept on the root node and then sits back while all of its visitXXX() methods are invoked. This feels kind of like SAX XML parsing where instead of actively walking the graph you are passively fed XML nodes as they are parsed.

So now that we have the object graph set up to be traverse-able we need to take that traversal and use it for comparison. To do that, we can create a class that implements GraphVisitor. This class, called something like ObjectComparisonVisitor, needs to do 3 things:

  • Find all objects in the destination graph that do not occur in source
  • Find all attributes that have changed on objects that occur in both source and destination graphs.
  • Find all objects in source graph that do not occur in destination
I won’t show all the code here, I’ll leave that as a fun exercise for the reader. But the basic psuedo-code is as follows:
  1. For each object type in the graph, maintain a hash table that stores the items in source (source is what you visit)
  2. As each object type is visited, throw it into the hash table (note you need some kind of uniqueness test for each of your domain objects… you should have that anyway)
  3. As each object type is visited, check to see if it’s in the destination graph as well. If it is, do an attribute-by-attribute comparison.
  4. When the entire source tree has been visited (your call to accept is single-threaded, so it won’t return until every node has been visited), traverse the “source” hash tables looking for visited items that do not exist in the destination.

As each of these tasks are being done, you’re progressively building up a collection of ModelComparisonResult objects, which really just contain an enum for the difference type, the name of the object, the name of the attribute, etc.

When all is said and done, the comparison visitor object can provide rich difference results in a format that is both unit testable and amenable to display via command-line or GUI. There are other variations of this that include visiting both the source and the destination trees and then doing a final comparison afterward. This second variation is typically used when you don’t have discrete uniqueness tests on every object in your graph, so you keep track of divergence points in the visitation pattern to detect graph changes while attribute changes are still easy to detect (you could even use Reflection to check for those).

Anyway, this was really the first time I’ve ever found a practical use for the Visitor pattern outside of academics and job interviews, so I thought I would share.

Scala Companion Objects

In a single sentence, a Scala companion object is an object with the same name as a companion class in the same file that has access to all of that class’ members, including the private ones.

At first, the power and utility of these companion objects wasn’t really all that clear to me but then I started finding myself using them more and more to the point where I am pretty sure I couldn’t write a Scala app of any decent size without using them.

Let’s take a look at one of the most common uses of a companion object as a place to put factory methods using the apply method:

class BucketOSlime private (val slimeColor:String) {
    // Stuff goes here...
}
object BucketOSlime {
    def apply(color:String):BucketOSlime = new BucketOSlime(color)
}

With this code in place you can now create colorful buckets of slime with a constructor shortcut syntax that lets you leave off the word “new” entirely:

val blueSlime = BucketOSlime("blue")
val whiteSlime = BucketOSlime("white")

The fun (and power) doesn’t stop here. You can use multiple apply methods or a single apply method with pattern matching to return different instances of the same abstract object:

object BucketOSlime {
    def apply(color:String):BucketOSlime = {
        if (color == "blue")
              new BlueBucketOSlime
        else
              new RegularBucketOSlime(color)
    }
}

There are a ton of other ways you can use companion objects. Another common one is to wrap implicits to upgrade or enrich a class (like I’ve done with my MUD recently) so that the implicit doesn’t need to be imported directly.

The more I use Scala, the more I thoroughly enjoy it. I still utterly despise when I see people overloading symbols everywhere so that Scala’s complexity approaches “write once confuse everywhere” status… but with discipline I still think developers can keep Scala syntax clean, concise, and eminently readable.

Adding State to Actors with the Pimp My Class Pattern

In my last blog post, I talked about how the new Akka 2.0 actor system hides an enforces the inaccessibility of the actual class of the underlying actor. For example, if you have a class called Zombie that is of type Actor, then when you use the actor system to start that actor, what you get back is an ActorRef, and (without cheating), there is no way to typecast or otherwise gain access to the Zombie type from an instance of ActorRef. While it felt limiting, once I embraced the design principle that necessitated that enforcement, I understood.

That said, I am building a MUD here and I’m going to need state. My rooms need names, descriptions, and exits. My NPCs are going to need combat stats and inventories, my players are going to need names, descriptions, races, genders, etc. This is all basic stuff but – if I can’t get to a game object reference from an ActorRef, how am I going to get at their internal state?

Easy! Don’t use internal state.

Remembering back to my first post where I upgraded my old Scala MUD to use Akka actors, I noted that every actor in a system has a unique path string that denotes its position within a hierarchy of supervisory actors much like the way supervisors and processes exist in an Erlang system. That gave me the idea to store the state for my actors in a big in-memory hash (which I could then upgrade to some kind of distributed or CouchDB-type store later).

With that problem solved, the only real problem now was that my ActorRef instances didn’t have concrete properties. One of my favorite features of Scala, which is also used in building DSLs, is the pimp my class or pimp my library pattern. This works very much like class extensions in C#. I can create a class that encapsulates or enriches an underlying class type by providing methods that operate on or with that class. In my case, I wanted strongly typed accessors like name and description that actually deferred the backing store to this singleton, in-memory hash.

Here’s my scala file that contains an Implicits object that enriches the ActorRef type with the wrapper class and the wrapper class provides a getter and setter for name.

package com.kotancode.scalamud.core

import akka.actor.ActorRef
import scala.collection.mutable.{Map,
    SynchronizedMap, HashMap}

object Implicits {
	implicit def enrichActorRef(value: ActorRef) = new WrappedActorRef(value)
}

/*
 * Singleton containing a hash map keyed on the Actor Ref path
 * which contains additional hash maps.
 * There is one hash map for every actorref that needs state
 */
object StateMap {

   // Creates a thread-safe hash map
   def makeMap: Map[String, HashMap[String,String]] = {
       new HashMap[String, HashMap[String,String]] with
           SynchronizedMap[String, HashMap[String,String]]
   }

   private var stateMap = makeMap

   def getState(key:String):HashMap[String,String] = {
   	 if (!stateMap.contains(key))
	     stateMap.put(key, new HashMap[String,String])
	 stateMap(key)
   }

   def setState(key:String, propertyName:String, value:String) = {
		var state: HashMap[String,String] = null
		val result = stateMap.get(key)
		result match {
			case Some(x) => state = x
			case None => state = new HashMap[String,String]
		}
		state.put(propertyName, value)
		stateMap.put(key, state)
   }
}

class WrappedActorRef(val actorRef: ActorRef) {

	def name:String = StateMap.getState(actorRef.path.toString)("name")
	def name_=(value:String) { StateMap.setState(actorRef.path.toString, "name", value) }
}

An obvious refactoring opportunity here would be to move the StateMap class out to its own separate Scala file but I’ve always been a huge fan of the agile process of getting something working first and refactoring afterward.

Now, to use my newly enhanced ActorRef (with state) class all I have to do is add the following import

import com.kotancode.scalamud.core.Implicits._

To the top of my Scala file and below anywhere the type ActorRef is known, I can read and write the name property, as shown here:

	private def handlePlayerLogin(player:ActorRef) = {
		println("Player logged in: " + player)
		allPlayers ::= player
		for (p: ActorRef <- allPlayers) {
			p ! TextMessage(player.name + " logged in.")
	    }

Because ActorRef has been pimped to use the external state provider, I can use player.name in my code and it works!