
Mapping County Health Rankings: Mapnik, Node.js and PostgreSQL

The third launch of County Health Rankings has been the most exciting of the three, thanks to the mapping component that now serves as the primary way to access the rankings.

Having a map on the home page isn't new for County Health Rankings; each version of the site has had one. But where those were built in Flash and used only for navigation, the new one is completely interactive. Some of the challenges that prompted the change were:

  • Portability: Flash doesn't work well on a large number of mobile devices.
  • Difficulty switching between contexts: each state and county was a separate page, often more than one level deep, so moving from the rankings for Virginia to the rankings for Maryland took several clicks and page reloads.

When our Information Architects came up with the idea of an interactive map, it was an exciting challenge. The first question was which components we were going to use. Thankfully this was a pretty easy decision, as the choices are limited. We could have gone with an ESRI product, I'm sure, but we like to work with open source when we can, so Mapnik was chosen to generate the map tiles. Since we were combining geospatial information with data, settling on PostgreSQL and PostGIS was another easy decision. That left the tile server itself: the component that takes the request and handles caching and calling Mapnik. With Mapnik our choice of programming languages is also a little limited: you can use C++ (been way too long), Python (sorry, Google, I just don't like it) or Node.js. We chose Node.js. Node had the advantage of being a familiar language and of benefiting from a lot of work already done by our friends over at Mapbox on TileMill that we could learn from.

Let's walk through a little bit of how this whole thing works together. When you load the County Health Rankings site, your browser loads two JavaScript libraries called Modest Maps and Wax. These let us show the map as a series of small images called tiles and track the map as you zoom and pan around. Each tile request carries what you're viewing, along with the zoom level and the tile coordinates relative to the origin of the map, which happens to be where the international date line meets the North Pole. So if you were to click on Virginia...

what you're seeing is a map composed of a bunch of images with URLs that look something like "http://maps4.countyhealthrankings.org/tile/categories/7/36/49.png?year=2012&state=51&cid=1" and so on. If we separate out all the tokens, the pattern is "http://maps4.countyhealthrankings.org/tile/{map}/{z}/{x}/{y}.png?year={year}&state={state}&cid={id}". Each request goes to an Amazon Elastic Load Balancer, which distributes it across the mapping servers. Once it hits a server we cache it for a time using Varnish (because Varnish is awesome). If Varnish doesn't have the tile cached, it passes the request to our custom tile server written in Node.js, which checks whether it has the tile cached locally. If it's a new request, we take the tokens provided in the request, build our custom SQL query for PostgreSQL, pass it to Mapnik to generate the tile, cache the result and then return it.
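
To make that flow a bit more concrete, here is a heavily simplified sketch of what a tile endpoint along those lines could look like. It isn't the actual County Health Rankings server (the port, cache directory, stylesheet path and SQL fragment are invented for illustration), and a production version would need the dynamic PostGIS query wired into the stylesheet plus proper error handling. The node-mapnik calls themselves (Map, load, zoomToBox, Image, render, encode) are part of that library's public API.

    var http = require('http');
    var fs = require('fs');
    var url = require('url');
    var mapnik = require('mapnik');

    mapnik.register_default_input_plugins();   // lets Mapnik find its datasource plugins

    // Convert z/x/y into a Web Mercator bounding box. Tile (0, 0) is the
    // top-left of the projected world -- the date line at the North Pole --
    // which is why y counts downward. Tile 7/36/49 from the URL above works
    // out to a box whose north-west corner is roughly lon -78.75, lat 38.8,
    // i.e. over Virginia.
    function tileToBBox(z, x, y) {
      var max = 20037508.342789244;             // half the Web Mercator world width, in metres
      var size = (2 * max) / Math.pow(2, z);    // width of one tile at this zoom
      return [-max + x * size, max - (y + 1) * size,
              -max + (x + 1) * size, max - y * size];
    }

    http.createServer(function (req, res) {
      // Expected shape: /tile/{map}/{z}/{x}/{y}.png?year=...&state=...&cid=...
      var parsed = url.parse(req.url, true);
      var parts = parsed.pathname.split('/');   // ['', 'tile', map, z, x, 'y.png']
      var z = +parts[3], x = +parts[4], y = parseInt(parts[5], 10);  // parseInt ignores '.png'
      var q = parsed.query;

      // Local disk cache keyed by every token that changes what gets drawn.
      var cachePath = '/var/cache/tiles/' +
        [parts[2], q.year, q.state, q.cid, z, x, y].join('-') + '.png';

      fs.readFile(cachePath, function (err, cached) {
        if (!err) {   // cache hit: serve the stored PNG
          res.writeHead(200, { 'Content-Type': 'image/png' });
          return res.end(cached);
        }

        // Cache miss: build the tile with Mapnik. The stylesheet's PostGIS
        // layer would be pointed at a query built from the same tokens,
        // something like "... WHERE state_fips = $state AND measure_id = $cid
        // AND year = $year" (parameterised in real code).
        var map = new mapnik.Map(256, 256);
        map.load('/etc/tiles/rankings.xml', function (err, map) {
          if (err) { res.writeHead(500); return res.end(); }
          map.zoomToBox(tileToBBox(z, x, y));
          map.render(new mapnik.Image(256, 256), function (err, im) {
            if (err) { res.writeHead(500); return res.end(); }
            im.encode('png', function (err, png) {
              if (err) { res.writeHead(500); return res.end(); }
              fs.writeFile(cachePath, png, function () {});   // write back to the local cache
              res.writeHead(200, { 'Content-Type': 'image/png' });
              res.end(png);
            });
          });
        });
      });
    }).listen(8000);   // Varnish sits in front of this port on each box

With this layering, Varnish absorbs most of the repeat traffic, the local file cache catches much of the rest, and Mapnik and PostgreSQL are only touched for combinations nobody has asked for before.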

While we normally use Rackspace for our hosting needs, we chose Amazon EC2 for a couple of reasons. One was the ease of creating server images and standing them up quickly if we needed to add capacity. Despite fairly detailed statistics from previous years and load testing, we weren't entirely sure how many servers we would need to support the launch, so we had to be able to bring up as many servers as necessary in a short period of time without fiddling with a lot of configuration. In fact we ended up creating an image with the entire mapping stack on it, completely independent of any other server: each server has its own instance of Varnish, the tile server, PostgreSQL with the data, and Mapnik, and shares no resources with any other server. Not the most efficient setup, but the simplest and the easiest to scale. The other reason, and the primary one, was that Amazon EC2 allowed us to increase the CPU of a server without increasing its memory. This matters because generating map tiles requires very little memory but is very CPU intensive, so running it on Rackspace would have cost much more: to scale the CPU to meet our needs we would also have been paying for memory we didn't need.

One reason this project has so many pieces is that we have to render the tiles on demand instead of generating them all ahead of time. One of the main requirements for displaying these rankings is that one state cannot be compared to another; the way the data is gathered and analyzed makes that sort of comparison impossible. So each state needs to be shown in isolation, and there are potentially more than 125 separate pieces of data that can be visualized for each state. Those sorts of numbers add up quickly:

Description                              Number
Tiles for zoom levels 1-9 in the US      approximately 42,000
States                                   50
Data elements                            125
Total possible tiles                     262,500,000

And that's just zoom levels 1-9; it gets a lot worse the further down you go. And since it's possible for the data to change, re-rendering every possible combination would have been an extremely time-consuming process.
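
As a quick sanity check on that table, the total is just the product of the three factors, and because each extra zoom level roughly quadruples the number of tiles covering the same area, the figure grows very fast past zoom 9. A back-of-the-envelope check in Node (plain arithmetic, no project code):

    var tiles = 42000;      // approximate US tiles for zoom levels 1-9 (from the table)
    var states = 50;        // each state is rendered in isolation
    var measures = 125;     // data elements that can be visualized per state
    console.log(tiles * states * measures);   // 262500000

    // Each additional zoom level has roughly 4x the tiles of the previous one,
    // so pre-rendering every combination below zoom 9 quickly becomes untenable.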

There were a number of challenges we faced in getting this project launched. One of the largest was the scale of the traffic and the sheer number of possible ways of viewing the data. In the first 24 hours after the report went out, the website handled just shy of one million page views, which in turn generated approximately 80 million requests to the mapping servers. That's not Google Maps territory, but I'd like to think it's a pretty good number. Another challenge was the complexity of integrating these various components, several of which are still under rapid development.

Thankfully we were able to get a lot of support from the community, both pointing us in the right direction and fixing a couple of issues that cropped up. And, of course, many parts of this project were simply new to us. Having done it once, I'm looking forward to seeing what other opportunities there are.