Ticket to Graph: All Aboard!

What’s going on here?

If you’re lost about what I’m trying to do here, the beginning can be found here: Ticket to Graph: modeling a board game in Neo4j. The TL;DR is that Ticket to Ride is a popular board game that asks players to build the best rail network across North America by buying routes to connect cities. I’m building out a model that will eventually be able to track the state of the game and help suggest the best routes and most valuable cards to pick up. Above is a list that includes all the posts in this series in order.

The Map is Full!

I spent a decent amount of time looking at a JPEG of the board and writing JSON to represent the rest of the relationships between cities and routes, and after loading them with the scripts I completed last entry, now I have a complete map of the base game represented within Neo4j!

These 36 city nodes and 98 route nodes represent all the important aspects of the board. In the future, we’re going to add some more features to these nodes and relationships to keep track of individual game states: things like who owns what route, what moves are legal, etc. But now that we have a representation of the board, it’s enough to take a little break from modeling and see what we can learn about the design of the board itself.

Data Analysis Depot

match (c:City)-[:CONNECTED_BY]->(r:Route)<-[:CONNECTED_BY]-(c2:City)
WITH c, count(r) as Connections, c2
WHERE Connections > 2
return c.name, Connections, c2.name
;

This is the first query I wrote to do a quick sanity check of my model, since the JSON typing was a little bit monotonous, and I had messed up a few of the relationships. I cleaned up some of those issues and continued.

Obviously, this model will be most useful when we’re tracking the current state of the game, at this stage we really only have data about the structure of the board itself, but maybe if we dig a little bit, we can find some interesting insights that might begin to shed light on potential strategies to use in the game.

Most Connected Cities

I want to look at the cities on the map and try to understand what cities are easier to connect to in general. Cities with a large number of connected neighbors are going to be easier to connect to and thus destination cards with them as end points should be more attractive when selecting which cards to keep. Here is my first pass at this query:

match(c:City)-[:CONNECTED_BY]->(r)<-[:CONNECTED_BY]-(c2:City)
RETURN c.name, count(c2)
ORDER BY count(c2) desc

This is going to crawl our model from every city and find all the cities that are one route away from one another. Here’s one city to illustrate roughly what’s being counted:

match(c:City)-[:CONNECTED_BY]->(r)<-[:CONNECTED_BY]-(c2:City)
WHERE c.name = 'Pittsburgh'
RETURN c, r, c2

For Pittsburgh, our first draft of the query is counting 9, while we actually only are connecting to 7 cities. This is because we’re counting distinct paths that can be traversed from Pittsburgh. In play, this probably is important, since multiple routes are available, and one can play either in order to achieve the same thing.

To be completely frank, when I realized my mistake, I got a little deflated and I think this could potentially be difficult to work with the model as I’ve initially designed it. I’m going to put a pin in that and try a few more queries. Neo4j training stresses that taking an agile approach to designing your graph database is essential. If I find that my design is suboptimal in any way, I can just work the model until it works better. I’m also just slightly rusty in Cypher since I trained almost a year ago, so we press on. If it doesn’t get better as the feeling of writing these queries returns to me, I’ll consider what improvements I can make.

In order to actually return the number of cities connected to, I revised the query into the following:

match(c:City)-[:CONNECTED_BY]->(r)<-[:CONNECTED_BY]-(c2:City)
RETURN c.name, count(distinct c2.code) as Neighbor_Cities
ORDER BY Neighbor_Cities DESC
;
City NameNeighbor_Cities
Pittsburgh7
Denver7
Helena7
Oklahoma City6
[abridged][abridged]
Boston2
Las Vegas2
Vancouver2
Cities with the most and least neighbors.

So as we can see, there is a pretty wide range of neighbor cities. Several cities have as many as 7 neighbors, but cities at the bottom of the stack have as little as 2.

For anyone who has ever played the game before, Miami has a reputation for heartbreak. Miami has a total of three cities connecting it, all of which are long single routes of at least 4 cars. As you can see, in terms of connectedness, Miami is not the worst on the board, but there’s a lot of risk in trying to make a play there. Many players I talk with avoid needing to go there altogether if they can.

With that in mind, I think we should look at the reasons that Miami is more memorable. I think all the single routes and how long they are is at play. Let’s look into each of those. Maybe the number of routes is more important than I thought. At the same time, I think the direct relationship between the cities is more important than I thought. And I figured out a quick way to add that to the model.

match(c:City)-->(r)<--(c2:City)
merge (c)-[:CONNECTED]->(c2)
merge (c2)-[:CONNECTED]->(c)
Bidirectional relationships between all the cities — sans route data.

And as easy as that, I now have a way to look at just the cities and not the routes if we don’t want. This makes it way easier to look at how many skips are needed to connect different cities. I’m not sold on the relationship type name “connected” but I will put a pin in that. Now the earlier query about the connected cities is a little simpler to write, and faster too.

MATCH (c:City)-->(c2:City)
return c.name, count(c2) AS Neighbors
ORDER by Neighbors desc

This makes it a lot easier to conceptualize how connected cities are. We can do things like see each city and how many cities are within two or fewer jumps of it.

MATCH (c:City)-[*..2]->(c2:City)
RETURN c.name, count(distinct c2.code) as SecondNeighbors
ORDER BY SecondNeighbors desc

When I sat down to work on this, I assumed that I was going to be just writing a bunch of queries, but instead I uncovered a shortcoming in my model and resolved it. I hope this comes off as charming and casual rather than unplanned and erratic, because for these posts I really am writing along my process as I work rather than planning anything out ahead of time. If you’re reading this, thanks for joining me for the ride!

Next time, I’ll try and get a little further with the analysis, but it’s fun looking at the layout of the board from a graph database perspective and seeing it substantiate my experience of playing the game.

If you’d like me to casually amble through your organization’s data, please reach out and we can figure out how to work together!

Leave a Reply

Your email address will not be published. Required fields are marked *