## Tuesday, July 11, 2017

### Path Labels (Part Six)

Back in Part 4 of this series on path labels, I started constructing polygonal bounding boxes for path labels and using them during simulated annealing to detect if a path label was overlapping a feature or another label:
This has the advantage of greatly improving path label placement, but it also has a major disadvantage:  It takes a lot more computation.

When all the labels are represented as axis-aligned bounding boxes, it is fairly easy to calculate whether they are overlapping.  But now that the path labels are represented by an arbitrary polygon, that problem has gotten a lot more difficult.  The problem of detecting whether two arbitrary polygons overlap is very difficult.  Even if you avoid things like polygons with holes in them, the calculations are still complex.

In the end I settled on using a library based upon the Greiner-Horman algorithm. This library was small, self-contained and fairly speedy.  I used this for all path label comparisons, and for comparing path labels to features.  (For the other types of labels I can still use the fast bounding-box overlap.)

But it still added a lot of computation.  Run time for a map was 140 to 200 seconds. Ouch.

Most of this time is taken up in checking for collisions between a path label (a polygon) and a fixed feature on the map (a bounding box).  It occurred to me eventually that there was probably a faster algorithm for the special case of clipping a polygon to a bounding box.  It turns out there are several, depending upon how complex the polygons are, but a good one for my case is the Sutherland-Hodgman algorithm. The good folks at Mapbox have done a Javascript implementation.

So I decided to switch the label-feature collision checks from the Greiner-Horman algorithm to the Sutherland-Hodgman algorithm.  The way I did this was to implement the new algorithm, and then run both algorithms in parallel.  If the new algorithm is correct, it should always agree with the old algorithm.   Much to my surprise, I found lots of cases where the algorithms disagreed.

One problem turned out to be this bug in the implementation of Greiner-Horman I am using.  The code gets confused if the polygon repeats its start point at the end.  I was doing this for ease of drawing, but I was able to remove the duplicate end point, and that addressed many of the disagreements I was seeing between the two algorithms.

But not all of them.

The other disagreements seem to be caused by this bug in the Sutherland-Hodgman implementation. What I found was that if the polygon is just touching the bounding box at one of the corners, the code identifies that as an overlap and (incorrectly) returns one entire side of the bounding box as the intersection.  Since there's zero overlap area when two polygons touch at only one point, I don't think this should be considered an intersection.  (And Griener-Horman does not consider this an intersection.)  But this may be a limitation of the algorithm; we'll see what the Mapbox folks say about it.  In some ways this is a fortuitous bug: reporting this as an intersection just means that Dragons Abound will avoid placing labels that touch each other, which is good behavior for my purposes anyway!

Switching to the Sutherland-Hodgman for the path label to feature collision checks cut the run-time dramatically.  A typical map went from 149 seconds to 59 seconds.  The collision checks are only about 4% of that run-time after the switch.

In profiling the code to perform this optimization, I also noticed that findLocAt() was using a lot of time (about 6% of the overall program time).  This is a function that finds a location in the world closest to some (x,y) coordinates.  It wasn't used much when I first wrote it, so I implemented a naive algorithm that just walks through every location in the world and keeps track of the one closest to the given point.  Since then it has become more used in the program and started to be a time sink.

Back when I implemented some initial optimizations for label placement, I talked about putting the features into a quadtree for efficiency.  A quadtree is a data structure that organizes its contents by spatial coordinates.  It's intended to make it quick to search for data based upon location.  It made a big difference in efficiency for label placement, and findLocAt() seems like another good place to use a quadtree.

The change is pretty straightforward.  When I initially create the world's locations, I add them all to a quadtree.  I use the D3 implementation of quadtree, so findLocAt() just becomes quadtree.find(). With this substitution, findLocAt() goes from taking up about 6% of the processing time to almost unmeasurable -- 4672 ms to 2.7 ms on my sample map.  So pay attention in your Data Structures course :-).

Amusing code snippet:

// Check *this* out! :-)
world.loc[index].index = index;

After one of the previous postings on Path Labels, /r/Azgarr suggested adding a metric that would try to place the label for a river near the midpoint of the river.  In truth this isn't an issue for most river labels on my maps, but I happened to run across a map with an example of a label being placed at the end of the river:
Here you can see how 'R.Lawtewayiyt' is placed at the end of the river.  (It may look like the lower river label is at the end too, but that's actually about midway on a longer river.)  Since I had a good example handy, I thought I'd try implementing Azgarr's suggestion.

This is fairly straightforward to add; I just calculate the center of the label, measure the distance from there to the midpoint of the river (path) and apply that (with some weighting) as a penalty to the label placement.  Here is an initial result:
You can see this has pulled the river label closer to the center of the river.  I have to be careful with the weighting here, because I'd rather be on a straight part of the river that's further from the center than on a sinuous part of the river right near the center.  The stronger I make the weighting, the more the label is pulled toward the center:
With this weighting, the label is near the midpoint of the river.  In this case the river is pretty straight, but you can see that the end of the label has become slightly distorted to follow the river.

This same map had another problem:
In the center bottom of the map, the river label is on top of a lake.  The reason for this poor placement is clearer once I turn on label boxes:
The green boxes here are areas to avoid.  The edges of the lake are boxed but the lake itself is not. This makes some sense; it would be okay to have the label for a city on a lake entirely in the lake itself.  On the other hand, I might someday add names to the lakes and I'll presumably want those inside the lake.  So it would be better to make the whole lake something to avoid.  I can do that by using the bounding box of the lake as the feature to avoid:
Now the green box encompasses the entire lake and with this constraint in place the algorithm can't find a good spot for the river label and drops it entirely.  Interestingly, the city label 'Toynewnmoyt' was previously in an awkward spot far from its anchor, and now it has moved down to overlap the lake much closer to its anchor.