Kotan Code 枯淡コード

In search of simple, elegant code


Regular Expressions and Parsing in Scala

Last night I was messing around with some code and I wanted to rip a couple of pieces of information out of a free-form string. This is typically where people reach for regular expressions. Not me, though.

No, my experience with regular expression support and parsing in other languages like C# and Java has left a terrible taste in my mouth. I hate it. When I want to parse stuff, I want to use pattern matching and rich syntax like I can get from languages like F# or Scala.

This made me wonder whether Scala makes regex parsing any easier. Turns out, it makes it brain-dead simple through clever use of the unapply concept. The same capability that makes Scala so good at constructing, zipping, and traversing list structures also makes it just as good at cracking structures apart into their component parts.
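To make the unapply idea concrete, here is a minimal sketch of a hand-written extractor. The `KeyValue` and `ExtractorDemo` names and the "key=value" format are hypothetical, invented only for illustration. Any object with an `unapply` method can appear in a pattern, and Scala calls that method to crack the value apart.

```scala
// Hypothetical extractor (not from the post): splits "key=value" into its two parts.
object KeyValue {
  // Returning Some(...) means "the pattern matched"; None means "no match".
  def unapply(s: String): Option[(String, String)] = {
    val idx = s.indexOf('=')
    if (idx > 0) Some((s.substring(0, idx), s.substring(idx + 1))) else None
  }
}

object ExtractorDemo {
  // The pattern KeyValue(k, v) invokes KeyValue.unapply behind the scenes.
  def describe(s: String): String = s match {
    case KeyValue(k, v) => s"key=$k, value=$v"
    case _              => "not a pair"
  }
}
```

Scala's regex class provides the same kind of extractor, which is why a regex can sit on the left-hand side of a pattern.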

As a sample, I created a simple string in Scala and then put the “.r” postfix on it, turning it into a regular expression:

scala> val zombieLog = """(\d+) zombies spotted in (\w+)\.""".r

This created a regular expression with two groups. If you’re used to doing this kind of thing in Java or C#, you’re probably thinking that you now need to write 10 lines of code to compile the regex, feed the string into a matcher, and then pull the groups out by numeric index.
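For contrast, this is roughly what that more ceremonious style looks like when written against `java.util.regex` from Scala. It is only a sketch: the `JavaStyleParsing` object and its `parse` method are hypothetical names, and the pattern assumes the log line is a run of digits, the fixed text, a single word, and a closing period.

```scala
import java.util.regex.Pattern

// Hypothetical helper showing the compile / match / group-by-index routine.
object JavaStyleParsing {
  def parse(text: String): Option[(String, String)] = {
    val pattern = Pattern.compile("""(\d+) zombies spotted in (\w+)\.""")
    val matcher = pattern.matcher(text)
    // matches() requires the whole input to fit the pattern.
    if (matcher.matches()) {
      // group(0) is the entire match; the capture groups start at index 1.
      Some((matcher.group(1), matcher.group(2)))
    } else {
      None
    }
  }
}
```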

Fortunately, this is Scala and more often than not, things suck less in Scalaville.

First, I’ll create an input string that I want to crack apart with my regex:

scala> val inputText = "200 zombies spotted in Mexico."

Now I’ll use unapply and syntactic sugar and Scala goodness to pull out the right values:

scala> val zombieLog(zombieCount, location) = inputText // UNAPPLY magic
zombieCount: String = 200
location: String = Mexico
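One caveat with the `val` form is that it throws a `scala.MatchError` when the input does not fit the pattern. A minimal sketch of a more defensive variant uses the same extractor inside a `match` expression. The `ZombieLogParser` object and `parse` method are hypothetical names, and the pattern assumes the intended regex is digits, the fixed text, one word, and a literal period.

```scala
// Hypothetical wrapper: returns None instead of throwing on non-matching input.
object ZombieLogParser {
  private val zombieLog = """(\d+) zombies spotted in (\w+)\.""".r

  def parse(text: String): Option[(String, String)] = text match {
    // A Regex pattern only matches when the whole string fits the expression.
    case zombieLog(zombieCount, location) => Some((zombieCount, location))
    case _                                => None
  }
}
```

Calling `ZombieLogParser.parse("200 zombies spotted in Mexico.")` yields `Some(("200", "Mexico"))`, while a line that does not fit the pattern yields `None`.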

And that’s it, Bob’s your uncle. With the plethora of online regular expression builders available, it should only take you a few minutes to create the regex you need and slap it into your Scala code.