One of the aspects shown on the statistics page of a publication on Genealogie Online is the “Distribution within the Benelux”. This map shows (approximately) where in Belgium, the Netherlands and Luxembourg the genealogical events in the data collection took place. The larger the circle, the more events. This works reasonably well when the numbers are small (see image below on the left) but breaks down with big numbers (see image below on the right).
What I wanted was a way to aggregate the data into a coarser representation suitable for display. Enter hexagonal binning. Rather than displaying a scatter plot with tens of thousands of points/circles, you bin the points into a grid of hexagons and then display the distribution using colour and/or area.
The data which gets binned is constructed by Genealogie Online from the user's GEDCOM file. Each place name, from individual events like birth and death and from family events like marriage, is collected. The dataset of the geographical database Geonames.org is used to look up the latitude and longitude of each place. The quality of this fuzzy matching process depends on the quality of the user's data. In general, over 80% of place names are recognized. Lastly, the events are counted per coordinate pair. This results in tab-separated lines (latitude, longitude, count) like “52.29616 4.57822 25”, which translates to “in this publication 25 genealogical events were found which took place in Hillegom (Netherlands)”.
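To make the input format concrete, here is a minimal sketch of how such lines could be read in plain JavaScript. The function name parseEventLines and the field names lat, lon and count are my own assumptions for illustration, not what Genealogie Online actually uses.

```javascript
// Hypothetical helper: parse "latitude<TAB>longitude<TAB>count" lines into objects.
// Blank lines and lines that do not contain three numbers are skipped.
function parseEventLines(text) {
  return text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => {
      const [lat, lon, count] = line.split("\t");
      return { lat: Number(lat), lon: Number(lon), count: Number(count) };
    })
    .filter(
      (d) =>
        Number.isFinite(d.lat) &&
        Number.isFinite(d.lon) &&
        Number.isFinite(d.count)
    );
}

// The Hillegom line from the text, followed by a blank line that gets dropped.
const sampleEvents = parseEventLines("52.29616\t4.57822\t25\n\n");
console.log(sampleEvents);
```

Each resulting object still has to be projected to screen coordinates before it can be binned.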
The hexbin map draws a hexagon for a specific area. The colour is determined by the number of events within this area, which could consist of multiple places. The d3.hexbin plugin takes care of the heavy lifting: it takes the TopoJSON and the TSV and constructs the hexbin maps.
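For readers curious what the binning step boils down to, the following is a dependency-free sketch of pointy-topped hexagonal binning that sums the event counts per hexagon. It is not the d3.hexbin API. The function name binIntoHexagons is hypothetical, and I assume the points have already been projected to pixel coordinates (x, y).

```javascript
// Sketch: assign weighted points to pointy-topped hexagons and sum their counts.
// Assumption: x and y are already projected to pixel coordinates.
function binIntoHexagons(points, radius) {
  const dx = radius * Math.sqrt(3); // horizontal distance between hexagon centres
  const dy = radius * 1.5; // vertical distance between hexagon rows
  const bins = new Map();

  for (const { x, y, count } of points) {
    // First guess: nearest row, then nearest column (odd rows are shifted half a cell).
    const py = y / dy;
    let row = Math.round(py);
    const px = x / dx - (row & 1) / 2;
    let col = Math.round(px);

    // Near the top or bottom of a row, the point may belong to a hexagon
    // in the neighbouring row, so compare against that candidate as well.
    const py1 = py - row;
    if (Math.abs(py1) * 3 > 1) {
      const px1 = px - col;
      const col2 = col + (px < col ? -1 : 1) / 2;
      const row2 = row + (py < row ? -1 : 1);
      const px2 = px - col2;
      const py2 = py - row2;
      if (px1 * px1 + py1 * py1 > px2 * px2 + py2 * py2) {
        col = col2 + (row & 1 ? 1 : -1) / 2;
        row = row2;
      }
    }

    // Accumulate the count in the bin for this hexagon, creating it if needed.
    const key = `${col},${row}`;
    let bin = bins.get(key);
    if (!bin) {
      bin = { x: (col + (row & 1) / 2) * dx, y: row * dy, total: 0 };
      bins.set(key, bin);
    }
    bin.total += count;
  }
  return bins;
}

// Made-up points: the first two are close together and share a hexagon.
const exampleBins = binIntoHexagons(
  [
    { x: 0, y: 0, count: 5 },
    { x: 1, y: 1, count: 3 },
    { x: 100, y: 90, count: 2 },
  ],
  10
);
console.log([...exampleBins.values()]);
```

Each bin carries the centre of its hexagon and the summed count, which is all that is needed to draw and colour it.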
The result is cleaner, but maybe less impressive for the large numbers. To counter this a little, a legend was added to show what a hexagon represents (waarden = values). Still, the lightest colour in the left image could represent values 1-70, whereas the same colour in the right image could represent 1-5000. More tweaking can be done by using another colour scale (it is currently linear) and by changing the size of the hexagons.
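To illustrate what a different colour scale could change, here is a small sketch comparing a linear and a logarithmic assignment of values to colour classes. The five classes and the maximum of 25000 are assumptions chosen for illustration, not the settings of the actual maps, and both function names are hypothetical.

```javascript
// Sketch: map a value to one of `classes` colour classes (0 = lightest).
// Assumptions: value >= 1 and max > 1.
function linearBucket(value, max, classes) {
  const t = value / max;
  return Math.min(classes - 1, Math.floor(t * classes));
}

function logBucket(value, max, classes) {
  const t = Math.log(value) / Math.log(max);
  return Math.min(classes - 1, Math.floor(t * classes));
}

// With a linear scale both small and fairly large values land in the lightest class;
// the logarithmic scale spreads them over different classes.
for (const value of [1, 70, 4999, 25000]) {
  console.log(value, linearBucket(value, 25000, 5), logBucket(value, 25000, 5));
}
```

A logarithmic scale trades exact proportionality for more visible contrast between sparsely and densely populated hexagons, so the legend becomes even more important.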
Work in progress
At the moment, the hexbin maps are only available as a proof of concept. Looking at the maps raised new questions. Should the hexagons have tooltips? Should they be clickable, triggering a geographical search within the publication for people with events in that area? Why limit the map to the Benelux? Why not make the map zoomable and pannable? Or should I just opt for the marker cluster solution I use on the Stamboom Forum (see image below, click it to see the dynamic version)?