In case you’ve been unplugged and haven’t heard, there’s yet another scandal a-brewing: the NSA obtained, through a special warrant, the ability to pull phone records from a Verizon subsidiary, individually and in bulk. By phone records I mean a list of phone numbers, the numbers those numbers called, and the duration of each call. No subscriber names or other personal information are included in this dump.

For those unfamiliar with this type of operation, it’s data mining 101. More specifically, they’re looking for patterns and, at least during pattern discovery, they don’t particularly care about the names of the pattern participants, only about the shape of those patterns. If you look at the type of information the NSA requested, it becomes obvious to anyone who has worked with graph databases that they intend to build a network and then run analysis on it.

When you make a phone call, you establish a link between two participants, the caller and the callee. In graph database parlance this is called a directed relationship. You can think of it as a little picture that looks like this:

867-5309 --CALLED--> 555-1212

Since we know the duration of the call, we can use that duration to weight the relationship between the two phone numbers, assuming that a longer duration call implies a “heavier” weight on the CALLED relationship:

867-5309 --CALLED(5 minutes)--> 555-1212
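To make the weighted, directed relationship concrete, here is a minimal Python sketch. The record layout (caller, callee, minutes) and the choice to add up the durations of repeated calls are my own assumptions for illustration; nothing here reflects how the NSA actually stores or weights anything.

```python
from collections import defaultdict

# Hypothetical call records: (caller, callee, duration in minutes).
calls = [
    ("867-5309", "555-1212", 5),
    ("867-5309", "555-1212", 2),
    ("555-1212", "555-0100", 1),
]

def build_call_graph(records):
    """Return {caller: {callee: total_minutes}}, a weighted directed graph."""
    graph = defaultdict(lambda: defaultdict(int))
    for caller, callee, minutes in records:
        # Assumption: repeated calls make the CALLED relationship heavier.
        graph[caller][callee] += minutes
    return graph

graph = build_call_graph(calls)
print(graph["867-5309"]["555-1212"])  # 7
```

Note that the relationship only runs one way: 867-5309 called 555-1212, but the graph records nothing in the opposite direction unless 555-1212 calls back.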

So now that we have this massive database filled with links between seemingly anonymous phone numbers, what would the NSA do with it? The first thing they can do is network detection. They can take a look at the numbers and see if particular shapes show up, the simplest of which is a “ring”:

A --CALLED--> B --CALLED--> C --CALLED--> A

This could be a completely innocent ring, or it could be terrorists passing information so that the inbound call comes from someone who was never in direct outbound contact with the original caller. If you pulled the phone records for only a single target phone number, you would never be able to tell that a seemingly innocent outbound call to B resulted in an inbound call from C (unless, of course, the bad guys were dumb enough to repeat the pattern so often that someone noticed the same inbound call always arriving 10 minutes after the same outbound call).
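Finding that three-number ring is easy once the calls are stored as directed links. The sketch below is a brute-force illustration in plain Python; the function name `find_rings` and the sample edges are made up for this example, and a real graph database would express the same idea as a pattern query rather than nested loops.

```python
def find_rings(edges):
    """Return each directed ring A->B->C->A once, starting at its smallest member."""
    successors = {}
    for caller, callee in edges:
        successors.setdefault(caller, set()).add(callee)

    rings = set()
    for a, a_called in successors.items():
        for b in a_called:
            for c in successors.get(b, ()):
                closes_ring = a in successors.get(c, ())
                if len({a, b, c}) == 3 and closes_ring:
                    # The same ring shows up three times (once per starting
                    # point), so keep only the rotation that starts lowest.
                    if a == min(a, b, c):
                        rings.add((a, b, c))
    return rings

# Hypothetical links: A, B and C form a ring; D is just a bystander.
edges = [("A", "B"), ("B", "C"), ("C", "A"), ("A", "D"), ("D", "B")]
print(find_rings(edges))  # {('A', 'B', 'C')}
```

Looking only at A's own records you would see an outbound call to B and an inbound call from C, with nothing tying them together. The B to C link is what closes the ring, and it only exists in the bulk data.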

So, the NSA can detect rings of phone numbers. They can also detect a bunch of other well-known network/subnetwork shapes like diamonds, triangles, etc. Of course, the real power of putting this information in a true graph database is that they can detect networks of unknown shapes. You can ask the database to find recurring patterns over time of a shape that you’ve never seen before. Regular communication patterns would start to appear.
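True discovery of unknown shapes is a hard problem (usually called frequent subgraph mining), and I am not going to attempt it here. As a much simpler stand-in, the sketch below keeps only the links that recur on several distinct days and then groups whatever is still connected, whatever shape it happens to have. The record layout (day, caller, callee), the `min_days` threshold and the function name are all assumptions of mine.

```python
from collections import defaultdict

def recurring_networks(records, min_days=3):
    """Group numbers joined by links seen on at least min_days distinct days."""
    days_seen = defaultdict(set)
    for day, caller, callee in records:
        days_seen[(caller, callee)].add(day)

    # Keep only the regular links; ignore direction when grouping.
    neighbors = defaultdict(set)
    for (caller, callee), days in days_seen.items():
        if len(days) >= min_days:
            neighbors[caller].add(callee)
            neighbors[callee].add(caller)

    # Each connected component of regular links is a candidate network.
    networks, visited = [], set()
    for start in list(neighbors):
        if start in visited:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(neighbors[node] - component)
        visited |= component
        networks.append(component)
    return networks

# Hypothetical data: A->B and B->C happen daily, X->Y happens once.
records = [(d, "A", "B") for d in (1, 2, 3)]
records += [(d, "B", "C") for d in (1, 2, 3)]
records += [(1, "X", "Y")]
print([sorted(n) for n in recurring_networks(records)])  # [['A', 'B', 'C']]
```

The one-off X to Y call drops out, while the daily A, B, C chain survives as a candidate network without anyone having told the code to look for a chain.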

This is just one phase in a data mining process. Once you’ve got a list of potential networks (innocent or criminal doesn’t matter at this point; they’re just identifying shapes), you can run it against the list of calls to or from foreign countries. Now you can see which networks have regular communication with foreign phone numbers. Then you can see if any of those foreign phone numbers are in hot spots or potentially owned or used by the bad guys.
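That cross-referencing step might look something like the following sketch. Everything specific here is hypothetical: the `is_foreign` rule (a leading "+" marks a foreign number), the `min_calls` threshold for what counts as "regular", and the sample numbers are placeholders I made up to show the idea.

```python
from collections import Counter

def networks_with_foreign_contact(networks, calls, is_foreign, min_calls=2):
    """Return [(network, {foreign_number: call_count})] for regular contacts."""
    results = []
    for network in networks:
        counts = Counter()
        for caller, callee in calls:
            # Count calls in either direction between the network and abroad.
            if caller in network and is_foreign(callee):
                counts[callee] += 1
            elif callee in network and is_foreign(caller):
                counts[caller] += 1
        regular = {num: n for num, n in counts.items() if n >= min_calls}
        if regular:
            results.append((network, regular))
    return results

# Hypothetical convention for this example only: foreign numbers start with "+".
def is_foreign(number):
    return number.startswith("+")

networks = [{"A", "B", "C"}, {"X", "Y"}]
calls = [
    ("A", "+44-20-5550-0100"),
    ("+44-20-5550-0100", "B"),
    ("X", "+33-1-5550-0199"),
]
for network, regular in networks_with_foreign_contact(networks, calls, is_foreign):
    print(sorted(network), regular)  # ['A', 'B', 'C'] {'+44-20-5550-0100': 2}
```

The final step from the paragraph above, checking those foreign numbers against hot spots or known bad guys, would be one more lookup against whatever watchlist you have, keyed on the numbers in `regular`.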

The list of information the NSA can mine from this data goes on and on. (Remember that information and data are not the same thing: data is just data until meaning is gleaned from it, and that meaning is the information.) When you think about the other resources and databases the NSA has at its disposal to cross-reference against this operation, the conclusions or insights they might gain are staggering.

Should you be upset that your rights have been violated if your phone happens to be on this Verizon subsidiary? I’m not going to tell you whether or not to feel violated; that’s not the point of this blog post. You should, however, feel pretty confident that nobody at the NSA cares whether you are making regular calls to your mistress, or about anything else that might be in your “network”. If, on the other hand, you make regular phone calls to a guy who receives regular phone calls from a terrorist, then you might expect a phone call or a visit from men in suits 😉

Regardless of how you feel about the means by which this information is gathered, it seems obvious to me that potential terrorist acts might be prevented by analyzing this information. Whether our rights were violated in the execution of this data mining operation is a topic for another blog post entirely.