As you may have noticed, I’ve recently been doing a little bit of talking about Neo4j and graph databases. A database on its own is a fantastic thing, but it doesn’t do anybody much good if the information contained within stays there, or if you can’t put information into it from some application. In my own proof of concept, I wanted to see how easy it would be to access Neo4j from my favorite web application framework, Play.
Turns out it was a little annoying, because I started out by looking for frameworks to talk to Neo4j. First I looked at AnormCypher, a library that provides an “anorm-like” API for Cypher. I couldn’t use this one because it failed with multiple runtime errors when I attempted to bring it into my Play application. Next, I tried another Cypher wrapper that I found on GitHub and that one failed miserably, too: the version of the JSON parser it used conflicted with the one my Play application was using.
Then I figured, why even bother? It’s just a REST API. So, I decided to try accessing Neo4j using nothing but Play Framework’s own web services API, which is basically the WS object. This turned out to be stupidly easy and, when I figured out future chaining in combination with that WS API, amazing stuff started to happen. You know, like, productivity.
First, I created a teeny little wrapper around the Neo4j REST API URL and just called it NeoService:
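// Imports as they were laid out in Play 2.1
import play.api.libs.json._
import play.api.libs.ws.{Response, WS}
import play.api.libs.concurrent.Execution.Implicits._
import scala.concurrent.Future
class NeoService(rootUrl: String) {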
def this() = this("http://default/neo/URL/location/db/data")
val stdHeaders = Seq( ("Accept", "application/json"), ("Content-Type", "application/json") )
def executeCypher(query: String, params: JsObject) : Future[Response] = {
WS.url(rootUrl + "/cypher").withHeaders(stdHeaders: _*).post(Json.obj(
"query" -> query,
"params" -> params
))
}
// Convert a Future[Response] into a Future[Option[Int]]!
def findNodeIdByKindAndName(kind:String, name:String) : Future[Option[Int]] = {
val cypher = """
START n=node:node_auto_index(kind={theKind})
WHERE n.name = {theName}
RETURN id(n) as id
""".stripMargin
val params = Json.obj( "theName" -> name, "theKind" -> kind )
for (r <- executeCypher(cypher, params)) yield {
val theData = (r.json \ "data").as[JsArray]
if (theData.value.size == 0)
None
else
// Json: "data" : [ [ ### ] ]
Some(theData.value(0).as[JsArray].value(0).as[Int])
}
}
}
So, in this tiny class I’ve got a function that I can use to execute Cypher queries. Because I know that I have an auto-indexed node property called kind and another property called name, I can attempt to find a node’s ID based on the kind and name properties using a Cypher query. Instead of finding them synchronously, I can just convert my Future[Response] into a Future[Option[Int]] where the Option[Int] is the result of looking through the Neo4j database for that data. I’ve just converted one future into another future, and being able to do so is pretty freaking awesome.
Not only can I do that, but I can chain these method calls from other code, like this:
val neo = new utils.NeoService()
for {
nodeId <- neo.findNodeIdByKindAndName("zombie", "bob")
zombieId <- otherObject.tweakZombie(nodeId)
} yield {
zombieId.map { id => println("I got a zombie!") }.getOrElse { println("Zombie doesn't exist!") }
}
In the code here, tweakZombie can be written to be aware of the Option[] so that if it’s empty, it doesn’t tweak anything, allowing me to chain call after call after call and not worry about slapping a crapload of if statements in there – a judicious use of map, and for, and options and futures gives me a ridiculous amount of power.
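Here is a minimal, self-contained sketch of that chaining idea using nothing but the Scala standard library. The in-memory lookup table, the node ID 42, and the body of tweakZombie are all made up for illustration; a real version would go through the NeoService and Neo4j instead.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ZombieChain {
  // Hypothetical stand-in for NeoService.findNodeIdByKindAndName:
  // an in-memory lookup wrapped in a Future instead of a Cypher call.
  private val knownNodes = Map(("zombie", "bob") -> 42)

  def findNodeId(kind: String, name: String): Future[Option[Int]] =
    Future(knownNodes.get((kind, name)))

  // Hypothetical tweakZombie: it is Option-aware, so an empty lookup
  // flows straight through and nothing gets tweaked.
  def tweakZombie(nodeId: Option[Int]): Future[Option[Int]] =
    Future(nodeId.map(id => id)) // a real version would update the node here

  // The second call only starts once the first future has completed.
  def describe(kind: String, name: String): Future[String] =
    for {
      nodeId   <- findNodeId(kind, name)
      zombieId <- tweakZombie(nodeId)
    } yield zombieId.map(id => s"I got zombie $id!").getOrElse("Zombie doesn't exist!")

  // Blocking is only here so the demo can return a plain String.
  def demo(kind: String, name: String): String =
    Await.result(describe(kind, name), 2.seconds)
}
```

Calling ZombieChain.demo("zombie", "bob") walks the happy path, while an unknown name falls through to the getOrElse branch without a single if statement.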
All of this is made possible by the fact that the Play Framework Web Services API is based on Futures. I was originally skeptical of futures because the old code I had seen before was more confusing than single-threaded, synchronous programming. With the new Futures syntax and well-coded libraries like the Play WS object, you can do ridiculous things in a very small number of lines of code.
Today I was messing around with Neo4j, as I am wont to do, and I ran across a modeling scenario that I hadn’t expected. My previous experience with NoSQL databases has been with MongoDB. I’ve used Mongo for multiple projects, including a MUD and an MMORPG server both written in Akka and Scala.
When using MongoDB as a backing store for my enterprise application, I’ve been storing multi-level objects in there without much concern. I’ve got a root object that contains an array of nested objects which, in turn, actually contain even more nested objects. The structure is a few levels deep, but techniques like Play Framework’s JSON inception make serializing and de-serializing to and from Scala case classes pretty easy. All in all, it’s been working out quite well.
When looking at how to store information that I might otherwise put in a NoSQL store in Neo4j to take advantage of its graph query and traversal capabilities, I noticed that Neo4j, while “speaking JSON”, doesn’t do so at quite the robust level that MongoDB does. In short, you can’t store nested objects, or arrays of them, in a Neo4j node or relationship (a property value can be a primitive, a string, or an array of those, but nothing deeper). Now, before you complain about this, there are a pile of good reasons for it that I could only scratch the surface of if I tried to explain them. Bottom line is that nodes and relationships in Neo4j have properties, which are name-value pairs, not recursive name-value maps like you find in Mongo.
So, with that restriction in place, how do we model complex, nested objects and arrays in Neo4j?
First we can take a look at why we need an array. In many cases in my model, the arrays of nested objects on a root object were actually modeling a parent-child type relationship where the children had their own set of properties. So, let’s say you want a node in your graph to be a WellArmedZombie and that particular zombie needs an array of weapons. If that zombie owns or contains or is responsible for those weapons, you can split that single node into many: one node for the core WellArmedZombie, one node for each of its weapons, and a relationship (probably -[:WEAPON]->) between the zombie and each weapon.
In my own exercise attempting to remodel a MongoDB database as a Neo4j database, nested objects were very easy to convert. In fact, in almost all cases, I was able to take the field name that referred to the nested item and turn it into a relationship pointing to a related node containing the properties that used to belong to the nested item.
Let’s say this same WellArmedZombie node, in Mongo, had a property called armor which was a nested JSON object containing properties like strength, material, and color. It is pretty easy to switch that into Neo4j where we have a relationship that looks like (zombie)-[:ARMOR]->(armor).
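To make that splitting concrete, here is a rough sketch in plain Scala of turning a document-shaped zombie into flat nodes plus relationships. The case classes, the damage field, and the way IDs are handed out are all assumptions for illustration; none of this uses a Neo4j API, it only shows the shape of the conversion.

```scala
object ZombieGraph {
  // Document-style shape, roughly as it might sit in MongoDB (hypothetical fields).
  case class Weapon(name: String, damage: Int)
  case class Armor(strength: Int, material: String, color: String)
  case class WellArmedZombie(name: String, weapons: List[Weapon], armor: Option[Armor])

  // Graph-style shape: flat property maps plus typed relationships.
  case class Node(id: Int, props: Map[String, Any])
  case class Rel(from: Int, relType: String, to: Int)

  def toGraph(z: WellArmedZombie): (List[Node], List[Rel]) = {
    val root = Node(0, Map("kind" -> "zombie", "name" -> z.name))

    // Each element of the nested array becomes its own node.
    val weaponNodes = z.weapons.zipWithIndex.map { case (w, i) =>
      Node(i + 1, Map("kind" -> "weapon", "name" -> w.name, "damage" -> w.damage))
    }

    // The nested object becomes a node too; its field name becomes the relationship type.
    val armorNodes = z.armor.toList.map { a =>
      Node(weaponNodes.size + 1,
        Map("kind" -> "armor", "strength" -> a.strength, "material" -> a.material, "color" -> a.color))
    }

    val rels =
      weaponNodes.map(n => Rel(root.id, "WEAPON", n.id)) ++
      armorNodes.map(n => Rel(root.id, "ARMOR", n.id))

    (root :: weaponNodes ::: armorNodes, rels)
  }
}
```

A zombie with two weapons and one set of armor comes out as four nodes and three relationships, and every node's properties are simple name-value pairs.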
So, the one lesson I learned today was really that thinking relationally requires some unlearning before you can really take advantage of a NoSQL database model. Further, thinking in a graph requires you to unlearn some pretty standard conventions and best practices that are a part of thinking in documents rather than nodes and edges. I realize that as we discover new tools, we often feel like the shiny new tool will fix all problems. It doesn’t, and I’ve got a couple models/domains I work with that I don’t think are good candidates for Neo4j. I have another model that I’ve used where I think a hybrid Neo4j/Mongo approach might be appropriate – put all the information you need for queries, traversals, and quick summary display into the Neo4j database and put the large, bulky (heavily nested and composed) data into Mongo for supplementary query.
Regardless, I am pretty damn excited to be living and working in a time where the kind of computing power that Neo4j offers is pretty much ubiquitous, free, and readily available on my machine and in the cloud. It’s a great time to be a developer.
In my last blog post, I offered up a sample zombie domain as an area where refactoring a property-laden relationship into an entity on its own might be beneficial. Unfortunately, I think some of the context was lost in the discussion about zombies. The main point that I thought was important to the refactor is that if two entities interact with each other in a meaningful way at some point in time, that interaction is a likely candidate for producing a third entity as opposed to simply putting information about the interaction on the relationship.
The biggest motivator for this is when one entity can interact with another entity multiple times and the name of the relationship describing that interaction is the same each time. In that case you absolutely need a third entity in the middle; you simply can’t model it with a simple A->B relationship. In the comments section of my previous blog post, I discussed a possible wildlife monitoring scenario. In this scenario, multiple researchers interact with animals and take vital statistics (I used temperature because I know nothing about wildlife conservation and figured body temp might be useful…). The key here is that the same researcher can interact with the same animal for the same purpose (recording different metadata) multiple times.
Here’s a visualization of a really simple graph that has three researchers, two animals, and multiple recorded “check ups”:
Here’s the Cypher that created this graph:
CREATE joe={_label:'researcher', firstname:'Joe', lastname:'Doe'},
sally={_label:'researcher', firstname:'Sally', lastname:'McSmart'},
ann={_label:'researcher', firstname:'Ann', lastname:'DuScience'},
bubba={_label:'animal', name:'Bubba'},
kong={_label:'animal', name:'Kong'},
dp1={_label: 'checkpoint', date:'04/13/2013', temp:98, lat:40.7657, long:73.9856},
(joe)-[:CHECKED]->(dp1)-[:ANIMAL]->(bubba),
dp2={_label: 'checkpoint', date:'04/14/2013', temp:99, lat:40.7657, long:73.9856},
(sally)-[:CHECKED]->(dp2)-[:ANIMAL]->(bubba),
dp3={_label: 'checkpoint', date:'04/14/2013', temp:102, lat:40.7657, long:73.9856},
(ann)-[:CHECKED]->(dp3)-[:ANIMAL]->(kong),
dp4={_label: 'checkpoint', date:'04/15/2013', temp:103, lat:40.7657, long:73.9856},
(ann)-[:CHECKED]->(dp4)-[:ANIMAL]->(kong),
dp5={_label: 'checkpoint', date:'04/15/2013', temp:99, lat:40.7657, long:73.9856},
(ann)-[:CHECKED]->(dp5)-[:ANIMAL]->(bubba)
From the graph (and from the Cypher if you’re good at inferring from that) we can see that Ann checked in with the same animal twice and got two different body temperature measurements. Some queries that I can run on this simple dataset:
Bubba’s average body temperature:
START bubba=node:node_auto_index(name='Bubba') MATCH bubba<-[:ANIMAL]-(checkpoint) RETURN avg(checkpoint.temp)
Checkins by researcher and the average body temp taken by said researcher (note the lack of a “Group by” statement here):
START animal=node:node_auto_index(_label = 'animal') MATCH animal<-[:ANIMAL]-(checkpoint)<-[:CHECKED]-(researcher) RETURN researcher.firstname, count(researcher) as checkCount, avg(checkpoint.temp) ORDER BY checkCount DESC
Animals, the number of times they were checked, and the highest temperature recorded:
START animal=node:node_auto_index(_label = 'animal') MATCH animal<-[:ANIMAL]-(checkpoint)<-[:CHECKED]-(researcher) RETURN animal.name, count(checkpoint), max(checkpoint.temp)
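If the implicit grouping in those queries feels like magic, this plain-Scala sketch spells out what they compute over the same sample data. The Checkup case class and the function names are mine, invented for illustration; the rows are copied from the CREATE statement above, one per researcher-checkpoint-animal path.

```scala
object Checkups {
  // One row per (researcher)-[:CHECKED]->(checkpoint)-[:ANIMAL]->(animal) path.
  case class Checkup(researcher: String, animal: String, temp: Int)

  val checkups = List(
    Checkup("Joe", "Bubba", 98),
    Checkup("Sally", "Bubba", 99),
    Checkup("Ann", "Kong", 102),
    Checkup("Ann", "Kong", 103),
    Checkup("Ann", "Bubba", 99)
  )

  private def avg(temps: List[Int]): Double = temps.sum.toDouble / temps.size

  // First query: average temperature for a single animal.
  def averageTempFor(animal: String): Double =
    avg(checkups.filter(_.animal == animal).map(_.temp))

  // Second query: the non-aggregated return column (the researcher) acts as
  // the grouping key. Yields (researcher, check count, average temp), busiest first.
  def byResearcher: List[(String, Int, Double)] =
    checkups.groupBy(_.researcher).toList
      .map { case (r, cs) => (r, cs.size, avg(cs.map(_.temp))) }
      .sortBy { case (_, count, _) => -count }

  // Third query: animal -> (times checked, highest temperature recorded).
  def byAnimal: Map[String, (Int, Int)] =
    checkups.groupBy(_.animal).map { case (a, cs) => (a, (cs.size, cs.map(_.temp).max)) }
}
```

The point is that Cypher treats whatever you return alongside an aggregate as the grouping key, which is exactly what the explicit groupBy calls do here.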
If I were better at graph databases, and my dataset was richer, I might be able to write queries that show me clusters, perhaps showing me that a particular researcher is recording much higher than normal temps, and so there may be a correlation between their locations and sick animals, etc.
No matter which way you want to query this particular type of graph, the one thing that I do know is that this one truly does require the kind of refactoring I mentioned in the previous blog post, where we have to take a direct relationship between two entities and put a new entity in the middle.
Avoiding Graph Modeling Antipatterns
In my previous blog post, which I just posted this morning, it turns out that I made a pretty large modeling no-no. In terms of consequences to the application and the survival of the zombie apocalypse, I think the issue was pretty minor. However, it does illustrate that no matter how awesome a graph database product is, you still need to put some thought into how you model the database.
In the previous model, I could make assertions like user-[:SPOTTED]->zombie and there were some attributes on the spotted relationship such as the date the sighting took place. As it turns out, this is an antipattern. There are a couple of things I did wrong here.
Firstly, the attributes on the relationship have nothing to do with the weighting or nature of the relationship. The attributes I put on that relationship were informational and arguably belonged on a separate entity. In other words, the sighting was a noun that I had elided from my relationship sentence “a user spotted a zombie”.
In the forthcoming Graph Databases book being published by O’Reilly, there is a best practice in graph modeling that says that when the interaction of two nodes produces something of significance, that interaction should be modeled as its own entity rather than modeled as properties on the relationship between those nodes. In my sample zombie domain, this means that a user produces a sighting when they spot a zombie. The sighting has metadata about the actual sighting, which can include the latitude and longitude (I know that Neo4j has its own built-in geospatial support, which I will ignore for this example) of where the sighting was, and other information about the sighting itself, such as the user’s confidence level that they saw what they think they saw (aka the “bigfoot factor”).
Now I can go back and remodel my previous database using the following graph model created via Cypher:
CREATE user1={_label: 'user', firstname: 'Kevin', lastname: 'Hoffman'},
user2={_label: 'user', firstname: 'Bob', lastname: 'Bobberson'},
alvin={_label: 'zombie', name: 'Alvin', power:20},
simon={_label: 'zombie', name: 'Simon', power:30},
theo={_label: 'zombie', name: 'Theodore', power:40},
sighting1={_label: 'sighting', date:'01/01/2013', lat: 42.12, long: 21.03},
sighting2={_label: 'sighting', date:'01/02/2013', lat: 42.00, long: 25.00},
sighting3={_label: 'sighting', date:'01/03/2013', lat: 47.32, long: 30.21},
sighting4={_label: 'sighting', date:'01/01/2013', lat: 40.22, long: 27.03},
sighting5={_label: 'sighting', date:'01/02/2013', lat: 43.00, long: 21.12},
user1-[:SPOTTED {confidence: 10}]->sighting1-[:TARGET]->alvin,
user1-[:SPOTTED {confidence: 12}]->sighting2-[:TARGET]->simon,
user1-[:SPOTTED {confidence: 27}]->sighting3-[:TARGET]->theo,
user2-[:SPOTTED {confidence: 50}]->sighting4-[:TARGET]->alvin,
user2-[:SPOTTED {confidence: 80}]->sighting5-[:TARGET]->simon
Now I’ve added a confidence factor to the relationship itself, which allows me to do a weighted traversal over the relationships. This is a far better use for attributes belonging to a relationship. Also note that the non-traversal information is now more appropriately located as attributes on an entity. This should all actually seem pretty familiar to you if you’ve done any work with Domain-Driven Design.
With this new model in place, I can now visualize my graph like this:
Rest assured, I can still make effective queries against the graph; they just involve another hop to get the information that I used to get before. So, I can write the query to get the list of zombies that have two “spotted” arrows pointing to them (two different users spotted the same zombie); I just need to extend the shape of the subgraph I’m looking for to traverse over the “sighting” node. In the query below, I don’t care about the information on the sighting node, so I can use empty parentheses, (), for an anonymous node:
START u=node:node_auto_index(_label='user') MATCH u-[:SPOTTED]->()-[:TARGET]->(zombie)<-[:TARGET]-()<-[:SPOTTED]-(somebodyelse) RETURN DISTINCT zombie.name
This matches all zombies that have a spotted->(sighting)->target path from a user and also have the reverse <-target-(sighting)<-spotted relationship coming from another user. As expected, the above query returns "Alvin" and "Simon" but not "Theodore". I dare you to try and jump two join tables going one way and then jump the same two join tables going back out without ripping a hole in your RDBMS and/or brain.
Moral of the story - even amazing tools like Neo4j require thoughtful care of the data model, though Neo4j can certainly handle refactoring and upgrading much more easily than a regular SQL database.
Last week I was talking to a colleague about when a graph database might be overkill versus when it might be the right solution. That got me to thinking, and so I spent several days taking a look at various data models that I used on a regular basis and re-imagining those models as graph models. The results were pretty interesting in that it was the rare case where I felt an application could not be improved by the use of a graph model.
So this got me interested in messing around with a graph database for my favorite sample domain: zombies. In many previous posts and sample applications, I have used a sample domain of an application server that receives zombie sighting messages and in turn sends out messages to aid and inform those attempting to survive the zombie apocalypse.
In my domain, users (human survivors of the zombie apocalypse) can spot zombies. A zombie sighting is a record of vital information about a particular zombie so that zombie can be tracked. Ideally, some higher-level analysis can be done about clusters of zombies (geospatial analysis … also ideally suited to graph databases) and we might also want to do some other analysis on zombies, like correlations between sightings or, better yet, if two people keep reporting the same zombie in similar areas, then perhaps those people should team up.
First thing we need to do is create some sample data. In a graph database, sample sets can be ridiculously big so I’m going to keep it simple and create a “diamond shape” (you’ll see that in a bit) as well as another relationship dangling off the side. This sample set has two users who are reporting sightings of three different zombies: Alvin, Simon, and Theodore. To get this data into the database I’ll use a Cypher (Neo4j’s query language) statement. It’s worth noting how the description of relationships is very much like an ASCII art depiction of what the relationship would look like if you drew it on a whiteboard. This is a huge benefit and one of the things that drew me to Neo4j to start with:
CREATE user1 = { label: 'user', firstname: 'Kevin', lastname: 'Hoffman' },
user2 = { label: 'user', firstname: 'Bob', lastname: 'Bobberson' },
alvin = { label: 'zombie', name: 'Alvin', power: 20 },
simon = { label: 'zombie', name: 'Simon', power: 50 },
theodore = { label: 'zombie', name: 'Theodore', power:80 },
(user1)-[:SPOTTED {date: '01/02/2013'}]->(alvin),
(user1)-[:SPOTTED {date: '01/03/2013'}]->(simon),
(user1)-[:SPOTTED {date: '01/02/2013'}]->(theodore),
(user2)-[:SPOTTED {date: '01/02/2013'}]->(alvin),
(user2)-[:SPOTTED {date: '01/03/2013'}]->(simon)
So here we’re creating two users and three zombies, and we can see that the Kevin user spotted all three of the zombies while the Bob user spotted only Alvin and Simon. We can use Neo4j’s built-in data visualizer to see what this graph looks like:
There are a couple of really important points in this graph. The first is that the connections between the nodes are directional. The other is that there is data associated with the relationships as well as with the nodes themselves, which is one area where the use of a graph database truly shines over a traditional RDBMS with “join tables”.
The next thing to notice is that the left side of this graph forms a very familiar “diamond” pattern. The diamond pattern, in social networking circles, can be used to determine mutual friends. In our case, we don’t really have mutual friends but we do have mutual zombie sightings. To get a list of the zombies which have been spotted by multiple users, we can issue the following query:
START u=node:node_auto_index(label='user') MATCH u-[:SPOTTED]->(zombie)<-[:SPOTTED]-(somebodyelse) RETURN DISTINCT zombie
Here we are matching all user nodes with the zombies they spotted, which have also been spotted by another user. This could potentially return multiple duplicates of the zombie (Bob and Kevin both spotted Alvin) so we remove duplicate nodes with the DISTINCT clause.
To be more specific, we can also run a query to list off the zombies that were spotted by both Bob and Kevin (i.e. the “mutual friends” query):
START bob=node:node_auto_index(firstname='Bob'),
kevin=node:node_auto_index(firstname='Kevin')
MATCH (kevin)-[:SPOTTED]->(zombie)<-[:SPOTTED]-(bob)
RETURN zombie.name
When I execute this code, I get the results “Alvin” and “Simon” (the diamond pattern) but I don’t get “Theodore” because only Kevin saw Theodore, not Bob.
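For anyone who wants to sanity-check that result without a running database, the diamond pattern boils down to a set intersection. This is only an in-memory sketch in plain Scala: the pairs are copied from the CREATE statement above, and the object and function names are made up for illustration.

```scala
object MutualSightings {
  // (user, zombie) pairs, one per SPOTTED relationship in the CREATE statement.
  val spotted = List(
    "Kevin" -> "Alvin", "Kevin" -> "Simon", "Kevin" -> "Theodore",
    "Bob" -> "Alvin", "Bob" -> "Simon"
  )

  def zombiesSpottedBy(user: String): Set[String] =
    spotted.filter(_._1 == user).map(_._2).toSet

  // In-memory equivalent of (a)-[:SPOTTED]->(zombie)<-[:SPOTTED]-(b)
  def mutual(a: String, b: String): Set[String] =
    zombiesSpottedBy(a) intersect zombiesSpottedBy(b)

  // In-memory equivalent of the earlier DISTINCT query:
  // zombies spotted by at least two different users.
  def spottedByMany: Set[String] =
    spotted.groupBy(_._2)
      .collect { case (zombie, pairs) if pairs.map(_._1).distinct.size > 1 => zombie }
      .toSet
}
```

The difference, of course, is that the graph database does this by walking relationships instead of scanning every pair, which is what makes it hold up once the data set stops being five rows.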
You can see that with unbelievably simple, and remarkably readable syntax like (kevin)-[:SPOTTED]->(zombie)<-[:SPOTTED]-(bob), you can describe intricate relationships and perform absolutely mindblowing queries against a graph database. This is how Facebook gives you the information it gives you, it’s how Amazon gives you product recommendations, and it’s how LinkedIn knows that you have “2nd” and “3rd” degree contacts in your professional network (it’s counting hops across the “connected to” relationships in a graph traversal).
This is just the tip of the tip of the iceberg, barely scratching the surface of the power of graph databases. They can be used for all kinds of amazing things, most notably social networking and post-apocalyptic zombie sighting management. What could you use one for on your projects?
The other day I was lamenting the fact that every time I make a tiny little change to the case classes that I use for reading/writing Ajax requests from the JavaScript client code for my web application, I have to go and manually modify my JSON combinators that convert between the Scala case classes and the JSON representations.
There are some undocumented (Play Framework’s own website doesn’t make any mention of this, even on the page for Json combinators!) Scala 2.10 macros that actually allow for auto-generation of this conversion code … I wish I had coined the term myself, but someone else appropriately refers to this activity as JSON inception.
The basic idea behind Play’s JSON combinators is that they let you use a natural, fluid syntax to convert to/from JSON. For example, you might have the following implicit writes that lets you “jsonify” a zombie sighting case class:
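import play.api.libs.json._
import play.api.libs.functional.syntax._
// GpsCoordinate's writes is defined first so it is in implicit scope
// when the ZombieSighting writes needs it for the location field
implicit val gpsCoordinateWrites = (
( __ \ "long").write[Double] and
( __ \ "lat").write[Double] and
( __ \ "altitude").write[Double]
)(unlift(GpsCoordinate.unapply))
implicit val zombieSightingWrites = (
( __ \ "name").write[String] and
( __ \ "timestamp").write[Int] and
( __ \ "location").write[GpsCoordinate]
)(unlift(ZombieSighting.unapply))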
It doesn’t look like all that much code to maintain, but let’s say my application deals in about 20 different kinds of individual case classes that can be sent or received from Ajax/web service calls. Certainly in the middle of development, making changes to this is going to be annoying and while doing it, I couldn’t shake the feeling that this could be cleaner, more elegant, more kotan.
The first thing I did was wrap all my implicit reads and writes up into a single Scala object so I could just do an import JsonReadsWrites._ and then all my JSON conversion code is in a single place. That felt a little better, but I still thought it could be easier. The above sample is overly simplistic; my real case classes are filled with values of type Option[T], and dealing with those manually in the unapply/apply combinators you normally write for Play makes maintenance even more tedious.
Enter Scala 2.10 macros…
As of Scala 2.10, macros are available (as an experimental feature). A macro is basically a compile-time code generation pass. If you flag a method as a macro method, it is executed at compile time and its return value is an AST (abstract syntax tree) that replaces the call. So, what Play Framework has are macro methods called Json.writes and Json.reads. These methods are executed at compile time and they replace the writes and reads code that you see in the IDE with a syntax tree that constructs your JSON combinators for case class conversion for you automatically.
To be honest, when I first looked at how this is done, it looked like it was some form of black magic, or that there was no way it would be possible without the use of some magic fairy dust. I read and re-read the documentation on Scala macros and after a while, it started to sink in. Reflection is available to the code you write in your macro, so, at compile time, your code can introspect the types of information passed to your macro via generics, and can then use that information to figure out how to construct a Json reader or writer.
So now, the code I wrote above can be re-written as:
implicit val gpsCoordinateWrites = Json.writes[GpsCoordinate]
implicit val zombieSightingWrites = Json.writes[ZombieSighting]
Now I can make changes to the case classes and the macro-generated code will automatically compensate for those changes, and it handles arrays and nested case classes (which Salat doesn’t even do for case class conversion for MongoDB…).
I would be hard pressed to find a better use of Scala macros than this.
The Futures Have Arrived
As you may know, I’ve been working on building a fairly complex enterprise LOB application using Play, Scala, and Akka. When I started, I had a rudimentary understanding of Akka and Play, and I’d done some playing around with Akka before, but my experience was a far cry from the type of experience a grizzled, production-deployed veteran might have.
My website, without giving away any details, allows me to do a keyword search across multiple different types of entities. Let’s say I’ve built a zombie apocalypse preparedness social networking site (who doesn’t need one of those?!?) and I want to be able to enter a keyword and have it look through the repository containing identified zombie classifications as well as possible friends and even the names of weapons.
I built this in a way that I thought was pretty decent: I created a Search actor that takes a KeywordSearch case class message. This Search actor then asks the Akka actor system for references to the different repositories, which are also actors. In this case, I might need a reference to the ZombieIdentificationRepository actor, the WeaponRepository actor, and the UsersRepository actor.
This problem is screaming for parallelism. I want the search actor to send fire-and-sorta-forget messages to the different repositories and, when it gets answers from all three repositories, send a message back to the actor that invoked the search containing all of the results. Further, I want the web request itself to not block on waiting for the results, so I want to use Play’s asynchronous controller pattern.
When I first built the search, I had code that looked kind of like this:
def index(keyword: String) = Action { implicit request =>
...
val resFuture = ActorProvider.Searcher ? SearchActorProtocol.KeywordSearch(keyword)
val results = Await.result(resFuture, timeout.duration).asInstanceOf[Array[SearchResult]]
Ok(views.html.search.index(results))
}
So, while under the hood I was using an Actor, I’m still performing an explicit block. Worse, the search actor was also performing an explicit block as it was doing an Await.result for each of the different repository queries. I had heard that Future composition was possible, so I thought I’d give it a shot.
Here’s how I refactored the Search actor’s search method to combine multiple searches performed in parallel:
def performKeywordSearch(seeker: ActorRef, keyword: String) {
implicit val timeout = Timeout(3 seconds)
val zombieFuture = ActorProvider.ZombieRepository ? KeywordSearch(keyword)
val weaponFuture = ActorProvider.WeaponRepository ? KeywordSearch(keyword)
val userFuture = ActorProvider.UserRepository ? KeywordSearch(keyword)
val combFuture = for {
z <- zombieFuture.mapTo[Array[SearchResult]]
w <- weaponFuture.mapTo[Array[SearchResult]]
u <- userFuture.mapTo[Array[SearchResult]]
} yield seeker ! ( z ++ w ++ u).sortBy( r => r.name )
}
Note that there isn’t a single line of blocking code in the preceding sample. Everything happens when it’s done and there’s no explicit “sit and wait” code. I can even then modify the search controller’s method to remove the Await.result:
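The same composition trick works with plain standard-library futures, which makes it easy to try outside of Akka. In this sketch the three repositories are faked with a hypothetical lookup helper and hard-coded name lists, so treat it as an illustration of the pattern and not as my actual search code.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelSearch {
  case class SearchResult(name: String, kind: String)

  // Hypothetical repository lookup; stands in for `repository ? KeywordSearch(keyword)`.
  private def lookup(kind: String, names: List[String], keyword: String): Future[List[SearchResult]] =
    Future(names.filter(_.contains(keyword)).map(n => SearchResult(n, kind)))

  def keywordSearch(keyword: String): Future[List[SearchResult]] = {
    // Start all three futures BEFORE the for-comprehension so they run in
    // parallel; creating them inside it would run them one after another.
    val zombieFuture = lookup("zombie", List("crawler", "walker"), keyword)
    val weaponFuture = lookup("weapon", List("crowbar", "axe"), keyword)
    val userFuture   = lookup("user", List("crowley", "bob"), keyword)

    for {
      z <- zombieFuture
      w <- weaponFuture
      u <- userFuture
    } yield (z ++ w ++ u).sortBy(_.name)
  }

  // Blocks only so the demo can hand back a plain value.
  def searchBlocking(keyword: String): List[String] =
    Await.result(keywordSearch(keyword), 2.seconds).map(_.name)
}
```

The detail that matters is where the futures get created: the for-comprehension only combines results, so the parallelism comes from kicking off all three lookups up front.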
...
val searchFuture = ActorProvider.Searcher ? SearchActorProtocol.KeywordSearch(keyword)
val timeoutFuture = play.api.libs.concurrent.Promise.timeout("Failed to finish search", 5 seconds)
Async {
Future.firstCompletedOf(Seq(searchFuture, timeoutFuture)).map {
case searchResults: Array[SearchResult] =>
render {
case Accepts.Html() => Ok(views.html.search.index(searchResults))
case Accepts.Json() => Ok(Json.toJson(searchResults))
}
case t:String => InternalServerError(t)
}
}
And now I’ve removed a chain of blocking calls and replaced all of it with completely asynchronous, non-blocking code. The only thing that ever sits and waits is Play itself, as it awaits the response from the Actor.
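The race between the search and the timeout can also be sketched with just the standard library. Here timeoutAfter is a hypothetical stand-in for Play's Promise.timeout, built on a daemon java.util.Timer, and the "200"/"500" strings are placeholders for real Play results.

```scala
import java.util.{Timer, TimerTask}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object TimeoutRace {
  private val timer = new Timer(true) // daemon thread, so it never keeps the JVM alive

  // Completes with `message` after `delay` without parking a thread in the meantime.
  def timeoutAfter(message: String, delay: FiniteDuration): Future[String] = {
    val promise = Promise[String]()
    timer.schedule(new TimerTask {
      def run(): Unit = { promise.trySuccess(message); () }
    }, delay.toMillis)
    promise.future
  }

  // Whichever future finishes first decides the response.
  def respond(search: Future[List[String]], limit: FiniteDuration): Future[String] =
    Future.firstCompletedOf(Seq[Future[Any]](search, timeoutAfter("Failed to finish search", limit))).map {
      case results: List[_] => "200 " + results.mkString(",")
      case message: String  => "500 " + message
      case other            => "500 unexpected " + other
    }

  // Blocks only so the demo can return a plain String.
  def respondBlocking(search: Future[List[String]], limitMillis: Long): String =
    Await.result(respond(search, limitMillis.millis), (limitMillis + 2000).millis)
}
```

Because the two futures carry different types, the pattern match on the winner's value is what tells a real result apart from the timeout message, the same trick the controller code uses.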
Coding with Futures and in a completely asynchronous fashion with Actors is difficult, and it requires a lot of up front effort and takes time to understand what’s really going on, but, as you can see from the code above, the clean and simple non-blocking elegance that you get as a reward is well worth the effort.
I’ve been fortunate enough to be able to work on an enterprise LOB application that uses the Typesafe stack – Play Framework using Scala and Akka. One of the things that I’ve noticed with application development in Scala is that every morning when I take a look at my code I scan it for opportunities to refactor. I want to know how I can reduce the number of lines of code, which reduces the number of ways it can fail, and how I can make my code cleaner, more elegant, faster, and more scalable.
I was sifting through my code last week and I noticed this little nugget:
ZombieRepository.findByName( zombie.name ).map { ... some stuff ... }
My first thought was that this code looked really old fashioned. Making synchronous method calls like that is just so last year. I immediately saw an opportunity to add a point of scalability and durability by putting my zombie repository (it’s not actually zombies, I’m just using that as an example…) behind a nice abstraction point that would also allow for all kinds of things like load balancing of requests, distributed calls, etc. I decided to implement a repository as an Actor.
The first thing I did, which is part of one of my favorite practices of contract-first design, was build the Actor Protocol.
object ZombieRepositoryActorProtocol {
case class Insert(zombie: Zombie)
case class Update(zombie: Zombie)
case class Delete(zombieId: String)
case class QueryByName(zombieName: String)
}
At the moment I don’t need to create case classes for the replies (but I could easily do this) because these messages will either be one-way messages or the replies will be things like Vector[Zombie] or Option[Zombie].
In situations where I want to unit test things that rely upon this repository, I need to be able to give the repository a reference to a different actor than the default one we typically reply to (sender), so I can modify the protocol as follows to allow references to the Akka test actor to be passed along and, if it isn’t passed along, then we just reply to sender:
object ZombieRepositoryActorProtocol {
case class Insert(zombie: Zombie)
case class Update(zombie: Zombie)
case class Delete(zombieId: String)
case class QueryByName(zombieName: String, requester: Option[ActorRef] = None)
}
I could also make the protocol a little more complicated and reply to some of the other operations with status codes or something to allow clients to confirm that an operation completed, but I am keeping it simple for now.
In my own code, the backing store is MongoDB, but another benefit of this type of encapsulation is that the only thing anybody else knows about the repository is how to send it messages, and those messages are just case classes. Not only is this a great decoupling away from a persistence medium, but it also allows for all the benefits you get from running Akka actors: you can put the repository behind a round-robin or load-balancing router, you can distribute the repository actors across a grid, and, my personal favorite, you get all the benefit of supervisory control, so if your repository fails and crashes, another one can be started up.
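To make the router and supervision points concrete, here is a minimal sketch rather than my production setup. It assumes Akka 2.4 or later (RoundRobinPool, sender(), system.terminate()) with akka-testkit on the classpath. RepositoryWorker, RouterDemo, and the "who" message are hypothetical names invented for illustration; the worker just replies with its own name so you can see requests being spread across two routees.

```scala
import akka.actor.{Actor, ActorSystem, OneForOneStrategy, Props, SupervisorStrategy}
import akka.routing.RoundRobinPool
import akka.testkit.TestProbe
import scala.concurrent.duration._

// Hypothetical stand-in for a repository actor: it replies with its own name
// so we can see which routee handled each request.
class RepositoryWorker extends Actor {
  def receive = {
    case "who" => sender() ! self.path.name
  }
}

object RouterDemo {
  // Restart a routee whose message handling throws an exception.
  val restartOnFailure = OneForOneStrategy(maxNrOfRetries = 3, withinTimeRange = 1.minute) {
    case _: Exception => SupervisorStrategy.Restart
  }

  // Sends four requests through a pool of two routees and returns the names
  // of the routees that answered.
  def run(): Set[String] = {
    implicit val system: ActorSystem = ActorSystem("router-demo")
    try {
      val router = system.actorOf(
        RoundRobinPool(2, supervisorStrategy = restartOnFailure)
          .props(Props(new RepositoryWorker)),
        "zombie-repositories")
      val probe = TestProbe()
      (1 to 4).foreach(_ => router.tell("who", probe.ref))
      probe.receiveN(4, 3.seconds).map(_.toString).toSet
    } finally {
      system.terminate()
    }
  }
}
```

Callers only ever see the router's ActorRef, so swapping a single repository actor for a pool like this does not change any of the code that sends messages.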
So now I can just handle messages like this:
import ZombieRepositoryActorProtocol._
def receive = {
case Insert(zombie) => ...
case Update(zombie) => ...
case Delete(zombieId) => ...
case QueryByName(zombieName, requester) => requester.getOrElse(sender) ! FetchByName(zombieName)
}
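For anyone who wants to see the whole thing run end to end, here is a rough, self-contained sketch under a few assumptions of my own: the Zombie fields are made up, an in-memory Map stands in for MongoDB, requester gets a default of None, and the reply is simply a Vector[Zombie]. It assumes Akka 2.4 or later plus akka-testkit, and uses a TestProbe as the explicit requester the way a unit test would.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import akka.testkit.TestProbe
import scala.concurrent.duration._

// Hypothetical minimal Zombie; the real class is not shown in this post.
case class Zombie(id: String, name: String)

object ZombieRepositoryActorProtocol {
  case class Insert(zombie: Zombie)
  case class Update(zombie: Zombie)
  case class Delete(zombieId: String)
  case class QueryByName(zombieName: String, requester: Option[ActorRef] = None)
}

// In-memory stand-in for the MongoDB-backed repository.
class ZombieRepositoryActor extends Actor {
  import ZombieRepositoryActorProtocol._

  private var zombies = Map.empty[String, Zombie]

  def receive = {
    case Insert(zombie)   => zombies += (zombie.id -> zombie)
    case Update(zombie)   => zombies += (zombie.id -> zombie)
    case Delete(zombieId) => zombies -= zombieId
    case QueryByName(zombieName, requester) =>
      val matches = zombies.values.filter(_.name == zombieName).toVector
      // Reply to the explicit requester if there is one, otherwise to the sender.
      requester.getOrElse(sender()) ! matches
  }
}

object RequesterDemo {
  import ZombieRepositoryActorProtocol._

  // Inserts one zombie, queries it back through a probe, and returns the reply.
  def run(): Vector[Zombie] = {
    implicit val system: ActorSystem = ActorSystem("zombie-demo")
    try {
      val repo  = system.actorOf(Props(new ZombieRepositoryActor), "zombies")
      val probe = TestProbe()
      repo ! Insert(Zombie("z1", "Bob"))
      repo ! QueryByName("Bob", Some(probe.ref))
      probe.expectMsgType[Vector[Zombie]](3.seconds)
    } finally {
      system.terminate()
    }
  }
}
```

Because the probe is passed in as the requester, the reply lands in the probe's mailbox no matter who actually sent the query, which is exactly what makes the repository easy to test from the outside.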
I honestly don’t know how I got any large-scale, asynchronous work done before without the use of Akka.
If you have done any programming with Actors (in my case Akka actors) then you know that the vast majority of the messages that you typically end up sending to those actors are case classes. Sure, you can send any type of data you like, but as a practice, most people like using case classes to make the pattern matching of the messages much easier to read. Sometimes you’ll see people send tuples or Vectors but only when it’s really clear that the actor doesn’t receive many other types of messages.
In situations where I have a ton of actors, one phenomenon that I’ve noticed is that I end up polluting package space with bucketloads of case classes. For example, if I put the Starport, Planet, Starbase, and Ship actors in the com.kotancode.space package, I’ll likely end up with the messages for those actors all floating around within the package namespace. It might not seem like a problem, but let’s say I have an import com.kotancode.space._ line at the top of one of my actor files. Now I’ve included all of those messages even though I’m just defining one actor.
I haven’t been able to find out who started the trend of actor protocols, but I certainly cannot claim any credit; I’m just borrowing something I’ve seen. The idea is to wrap all of the case classes for a particular actor in a protocol object as a nice way of isolating those case classes in their own sub-scope, even if all the actors belong to the same package.
For example, instead of defining all my messages for Planet at the top of the Planet.scala file (like I normally would do), I can instead define them in an actor protocol like this:
object PlanetProtocol {
case class Land(player:ActorRef)
case class Depart(player:ActorRef)
case class HarvestResources(player:ActorRef, ... , ... )
case class ...
}
Now, I can define a receive method that looks like this, which keeps the case classes from cluttering up the namespace and, in my opinion, is just much cleaner and more elegant than scattering case classes to the winds at random.
// inside the Planet actor class
import PlanetProtocol._
def receive = {
case Land(player) => ...
case HarvestResources(player, ... , ...) => ...
}
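If you want to convince yourself that the protocol objects really do keep same-named messages apart, here is a small sketch you can run without Akka on the classpath. PlayerRef is a stand-in I made up for ActorRef, StarbaseProtocol is a hypothetical second protocol, and a plain PartialFunction plays the role of the actor’s receive block.

```scala
// Stand-in for akka.actor.ActorRef so the sketch runs without Akka.
case class PlayerRef(name: String)

object PlanetProtocol {
  case class Land(player: PlayerRef)
  case class Depart(player: PlayerRef)
}

// Hypothetical second protocol in the same package.
object StarbaseProtocol {
  // Same message name as in PlanetProtocol; the enclosing objects keep them apart.
  case class Land(player: PlayerRef)
}

object Planet {
  import PlanetProtocol._ // only Planet's messages are in scope here

  // Plays the role of an actor's receive block.
  val receive: PartialFunction[Any, String] = {
    case Land(player)   => s"${player.name} landed"
    case Depart(player) => s"${player.name} departed"
  }
}
```

Planet.receive handles PlanetProtocol.Land but is not defined for StarbaseProtocol.Land, even though both messages are called Land and live in the same package.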
So, it might not seem like this is much of an earth-shattering thing, but every chance I can get to refactor my code to gain more elegance, more cleanliness, more expressiveness, and less ceremony – I’ll take it. Hopefully you find this tip as useful as I did.
When you’re sitting at home using your Mac or your Windows computer and you’re the master of your domain, everything is good. You control your firewall rules, you control which ports are blocked and if you want to access publicly available resources then you can do so with impunity. The world is your oyster.
Until you get into the office.
Then you notice that you have blocked ports, you can’t reach GitHub because it’s classified as “social networking”, and you are required to have a username and a password just to get traffic through the corporate proxy to the outside world. And even then, after you’ve managed to get outside, you notice that your organization has blocked access to the Maven Central repository, the source for pretty much any framework, library, or tool that can be packaged as a JAR. Including your application dependencies.
I had this problem and it was causing me no end of grief. So I thought I would just add a reference to my company’s private Nexus repository and all would be fine. This seemed like it should work, and it appeared to be the fix for the vast majority of the similar problems I found on Stack Overflow:
resolvers += "Corporate Nexus" at "http://corpserver.corp.com/nexus/content/groups/public"
This actually didn’t work. The problem? Play (or SBT underneath Play, actually) stopped trying to resolve dependencies after it failed to connect to the public repository. So what I really needed to do was to convince Play (and SBT) to never even try to go to the public one. To do that, I added the following code to my Build.scala:
externalResolvers <<= resolvers.map { rs =>
Resolver.withDefaultResolvers(rs, mavenCentral = false)
}
Now my build works just fine. I can declare dependencies on internal JARs (stuff other teams within my company have deployed, as well as my own internal products) and on public stuff, and I don’t have to worry about Play/SBT giving up after they get error messages from Maven Central, because the resolution process now bypasses Maven Central entirely.
I realize this may be a small piece of information, but it took me hours and hours of searching to find it. Finally, one of the many ridiculously helpful people at Typesafe linked me to some information on sbt that held the answer.
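If you are on a build that uses the newer := and .value syntax that arrived with sbt 0.13, the same setting can, as far as I can tell, be sketched like this. This assumes sbt 0.13.x specifically, and the repository URL is the same placeholder as above, not a real server.

```scala
// build.sbt (assumes sbt 0.13.x)
resolvers += "Corporate Nexus" at "http://corpserver.corp.com/nexus/content/groups/public"

// Rebuild the resolver chain from our own resolvers and leave Maven Central out of it.
externalResolvers := Resolver.withDefaultResolvers(resolvers.value, mavenCentral = false)
```

Either way the idea is the same: externalResolvers is what sbt actually consults, so replacing it is what stops sbt from ever trying to reach the public repository.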
Hopefully this little tidbit helps someone else who is coding behind a corporate firewall and can’t figure out why dependency resolution is failing so miserably.



