Thursday, June 29, 2017

Path Labels (Part Four)

Last time I left off with path labels in this state:
Let me catalog some of the problems and work on addressing them.
  1. The label for "R. Zhochuchuchu" is on a nice straight part of the river, but it's all the way to the end, which looks awkward since there's clearly room for it to be more centered on the river.
  2. The labels for "R. Chutfkur" and "R. Torzhozhida" are overlapping on the rivers they label.
  3. The labels for "R. Torzhozhida" and "R. Kurha" are upside-down.
  4. The labels on concave paths (e.g., "R. Chutfkur" and "R. Torzhozhida") look awkward.
  5. The placement for "R. Torzhozhida" is probably too curved for reasonable labeling.
So, plenty of room for improvement!  :-)

Centering labels on the river should be fairly straightforward.  I need to add a new criteria to the simulated annealing that prefers labels to be close to the middle of the path they are labeling.  This should be strong enough to force the label onto a (somewhat) more curvy path, but not strong enough to force a label to overlap with another feature.
If you compare this map with the one above, you'll see that "R. Zhochuchuchu" is much more centered.  The other labels only shift slightly because they were already pretty close to the middle of the river paths.  However, "R. Zhochuchuchu" now overlaps another river.  :-(  Well, that was already issue #2...

There are a couple of reasons why a path label might overlap another feature.  One possibility is that the bounding box approximation I use when placing labels isn't accurate -- that is, it doesn't overlap the feature while the actual label does.  I can check that by turning on the diagnostic display of that box (it's the purple):
I could tell by visual inspection that the "R. Zhochuchuchu" bounding box was overlapping, but you can also see that "R.Chultkur" is overlapping some green boxes as well.

So the overlap is real.  It could be that the algorithm is not detecting the overlaps correctly, or it could be that it detects the overlaps but still thinks that's the best placement.  Giving very high weights to the overlaps (so that the algorithm tries much hard to avoid overlaps) will tell us which is which:
So this has fixed the "R. Zhochuchuchu" overlap problem, but it looks like "R.Chultkur" might still be overlapping.  Hard to say.  One nice thing about SVG is that zoom actually works:
You can see that simulated annealing has actually found a placement where the purple box just barely *doesn't* overlap any of the green boxes.  It's kind of impressive, really.

However, because the purple bounding box is only an approximation, the final label can end up overlapping the river anyway:
Which is unfortunate.

I could deal with this by padding the offset between the label and the river, but there's actually another more serious problem with the bounding box approximation.  Suppose we consider putting the "R. Torzhozhidal" label on the other side of the river.  I've drawn in the approximate bounding box that would result:
You can see that because the river is convex between the two anchor spots for the bounding box, the bounding box will inevitably overlap the river, unless the offset from the river is very large:
And that placement will get a largely penalty for being so far away from the river.  This means that labels are rarely placed on the convex side of a river.  The solution to these problems is to use some better approximation of the bounding box.

So how do we do that?

First, it's obvious that we can't actually use a box -- we need a shape that is close to the curve to which the final text will be fit, so that it will bend around obstacles appropriately.  Let's assume I want to fit text to the portion of the path below between the white markers.
It seems like I could start by duplicating the path between the two points and offset it to where I want to place the text.
Assuming I want the new path to be X units away from the existing path, how should I offset it?  The right answer is probably complex, but reasonable approach might be to establish a normal to the line between the two endpoints on the original path and offset the new path in that direction X units.
Here the black arrows represent an X unit distance along the normal to the dotted line.  Next, I create another copy of the path and offset it in the same direction the maximum height of the text.
Finally, I connect the endpoints of the two copied paths.
And that is the bounding box for text placed along the river at that location with X offset.  (With some allowances for descenders in the text and so on.)

This isn't exactly right, because SVG seems to place each glyph (letter) in the text at the normal to the path at the point where the letter is drawn, but it will at any rate be a much better approximation than the rectangular bounding box.  After fixing a few dumb bugs, here's what I have:
(I've added the green dot with the purple outline to indicate the midpoint of the river.)

It doesn't look like the labels are being placed correctly, but the purple bounding looks pretty good.  The labels are all on the bottom line of the bounding box, which means I need to offset the text by the measured size (as I figured out back in Part Three).
So now the text is at least centered in the bounding box, although it looks like some other problems are back.  One step forward, two steps back. :-)

One problem is that the center of the bounding box -- which is used to gravitate the label towards the center of the river -- is no longer being computed correctly.  The center of a curved arbitrary bounding box is probably a problematic notion, but at any rate the spot I want to use is halfway between the middle points of the top and bottom sides of the bounding box.
The (semi-visible) purple dot in the middle of the bounding box marks the center of the bounding box; you can see that the top label has been pulled closer to the middle of the river (the green dot with the purple border).  However, the labels are now overlapping the river and other features again.  Well, one step forward, two steps back... again.  :-)

The problem here turns out to be somewhat unexpected:  due to a bug in the code that places the labels, they weren't allowed to be very far from the river (you can see in the above screenshot that the purple box always lies almost exactly on the red path).  Loosening that up improves things:
Here you can see how the bounding box is no longer overlapping the green boxes -- although it is just barely kissing in spots. Now let's look at reversing labels like "R. Kurha" so that they run the other direction.  This turns out to be a little more difficult when I'm laying text along a path.

When my path labels were just regular text, I could flip them around to the other orientation by rotating the text 180 degrees and then moving it to the other end of the bounding box, something like this:
Unfortunately, that won't work for path labels.  When you lay out text on a path, the direction of the text matches the direction of the path.  If you want the text to run the other direction, you need a path that runs in the other direction.
(In either case, you can move the text to the other side of the path by using an offset but that doesn't chance the direction or orientation of the text.)

It's possible to reverse an SVG path, but the easier solution in this case is just to create paths in both directions for the river so that I can switch labels from one to the other as necessary.  However, the starting point isn't the same; when I reverse the path I have swap the starting point to the other end of the bounding polygon.  I also have to adjust the offset to pull the label back onto the same side as the bounding polygon.

Here you can see that "R. Kurha" and "R.Torzhozhidal" have been reversed to read the opposite direction.  Also, you can see that the long label starts outside of the bounding box -- I think that's because SVG sets text using the normal to the path at each character, and in this case that's quite a bit different from the normal I'm using to construct the bounding box.  Fixing that would be painful, so I'm going to let it go unless I see it causing significant problems down the line.

The last problem I want to address is when -- despite the best efforts of the algorithm -- a label ends up on an overly curved portion of a river, as with "R.Torzhozhidal" above.  This almost always happens because there simply isn't a less curvy spot available.  Here's another example from a different part of the same map:
"R.Zhochumottor" is in the best spot for labeling, but the river is just too curvy for the label to look good.

There are several ways this could be addressed.  Probably the best is to somehow simplify the river path and reduce the curve.  Or rather than trying to follow the path, labels could use generic shallow arcs situated to best follow the path.  For now, I'm going to take an easier way out:  I'll simply drop any labels that are too curvy.  Since rivers are often unlabeled anyway, this should look fine.

So let's take a step back and see where we are:

Overall, this label placement looks very good. Almost every label is in a near-optimal position. I see only one obvious problem:  That "R. Chukor" and "R.Tertitdul" fall on mountains and are hard to read as a result. I'll address that next time.

Friday, June 16, 2017

Path Labels (Part Three)

I left off last time with the path labels placed on the rivers but not yet curved to match the rivers.  I also noticed a problem with the bounding boxes for my labels:
You can see that the city and river labels are positioned on the bottom of the purple bounding boxes instead of being centered.  And labels that have descenders (like "Sottolkgol") stick out below the bottom of the bounding box.  There problems are less obvious in the curved region labels, but they are also not centered in their bounding boxes.

I wrote previously about some bounding box problems related to using web fonts, but this problem persisted even when I switched to a built-in font, so it's not related to that problem.  After some investigation, I realized that I had some confusion over the coordinates for the text item and the bounding box.  There are three different coordinate systems I need to track:
The black circle represents the coordinates of where I drew the text on the screen.  With left-justified text, this is the left-most point on the text baseline.  The green circle represents the origin of the bounding box around the text.  This is below the black circle because (1) the text has a descender that goes below the baseline, and (2) SVG seems to add some margin on the top and the bottom of text (but not on the left and right?).  Finally, the red circle represents the center point of the bounding box.  I use this as the location of the label.

The only way to figure out the origin (green circle) and size of the bounding box is to draw the text on the screen and use the browser to measure the text.  So when I first place a label on the screen I have to measure the bounding box in order to calculate the relationship between the black, green and red circles.  Fortunately this works the same for both regular text and text which has been placed on a path (which is how I created the arced region labels).

It took me way too long to wrap my head around the various coordinate systems and get the math right.  (It doesn't help that the Y axis in SVG is upside-down.)  It was even worse for the path labels which are rotated from horizontal.  (And I had a bug in the river label code that took a day to find.)

But eventually I got all the math worked out and the correct bounding boxes for the text.  Along the way I uncovered a couple of errors in the handling of the labels by the simulated annealing so I fixed those as well.
You can see that the labels are now properly within their bounding boxes and that the red circle is at the center of the bounding box.  The label placement is also much better as a result.

In retrospect this seems like a bit of dumb problem, but you guys seem to like to see my mistakes as much as my successes :-).

Before I work on curving the river labels to match the river, I'll mention a couple of unique challenges in doing the river labels.

First, typically not all rivers on maps are labeled:
In this example, the "Darkmere" river is labeled in the upper left, but the small river near the center of the map and the river through the hills on the right of the map are not labeled.  That's true on real maps as well, as shown in this excerpt of a map of Mississippi:
In fact, if you happen across a map where every river is labeled, it looks very odd:
I suppose that cartographers pick which rivers to label based on various factors -- the larger rivers, the longer rivers, or rivers that are important for other reasons.  For Dragons Abound I'm choosing to label only the rivers that are substantially longer than the river name.  "Substantially" is a matter of taste, but in practice a number in the range 1.25x to 1.75x seems to give the right proportion to my eye.  (One odd thing is that the place name generator from Martin O'Leary -- which I still use -- tends to generate a lot of long names.)  Given the way the drainage model works, these are also usually the biggest rivers:
You can see a number of unlabeled rivers in this example.

Another interesting aspect of river labels (and path labels in general) is that they're often repeated on long paths.  Imhof also suggests repeating river labels above and below major forks, where the river name may change, as in this example from his paper:
Here the "Albula" label is repeated downriver because otherwise it would be unclear what the name of the river there was.

It turns out the map scale in Dragons Abound makes rivers long enough to require multiple labels pretty rare.  At a simple level, adding multiple labels isn't terribly difficult -- I just put two labels on the river, and because the simulated annealing algorithm tries to avoid overlapping labels, they get pushed apart:
But as this example shows, just avoiding overlap doesn't lead to a pleasing arrangement of labels.  A better implementation would add a criteria that tries to keep the labels farther apart.  Or a more sophisticated approach would try to keep the labels on different sides of river junctions.  At any rate, for the moment I'm using a single label.  When I get that working well I'll look for maps with long rivers and see if it is worthwhile to implement multiple labels.

Now let me turn to the problem of curving the river labels to match the rivers.  It's a little harder to find good examples of this in amateur maps, because it requires laying out text on a path, which some graphics tools don't support and is at any rate difficult to make look professional.  You can see one example in how "Landwasser" is written in the Imhof example above, and here's another example from this map:
Laying out text along a path in SVG is actually straightforward; you create a piece of text as usual, and then you attach it to a path via a TextPath element.
(Blogger seems to have some problems with embedded SVG, so I used an image instead.)

You can control where the text falls along the path using the "startOffset" parameter, which works pretty much exactly how you'd imagine.  So it's pretty simple to drop text on a path:

But as you can see there are some problems with that approach.  The most obvious being that even though I've made some effort to select "straight" parts of the river for labeling, it still doesn't work very well.  The way to fix that is to smooth out the river (or really, the path along which I am placing the label).

I have two ways to smooth out a path.  One method works by dropping some of the points in the path, and the other works by average each point with it's neighbors.  Both of these help, but dropping points is the most important factor -- here's an example showing the smoothed rivers in red:
This does a pretty good job of creating smooth paths, but dropping arbitrary points can sometimes lead to poor results.  The "R. Kurha" river on the left edge of the map illustrates the problem -- between the label and the city of Holdal the red line is just about the inverse of the blue line.

The problem of reducing the number of points in a path while trying to keep the correct shape is called line simplification.  There are a number of algorithms for doing this.  I'm going to look at one called Visvalingam’s algorithm, primarily because Mike Bostock uses it on this page, and I can use his Javascript to help understand how to implement it.

If you look at Mike's page, there's a good explanation of how Visvalingam's algorithm works.  To steal an illustration from his page, you can visualize the areas between successive pieces of the line as triangles:
The algorithm simplifies the line by repeatedly removing the middle point of the smallest triangles.  This has the effect of creating least visible impact at each step.

It's pretty amazing how effective this algorithm is at reducing the number of points in the path while still maintaining nearly the original shape.  Here's a map that compares the original rivers to rivers with 50% of the points removed:
You can see some differences in the paths, but it's pretty minimal.  To get to a path that's more suitable for drawing labels, I have to remove about 85% of the points:
Averaging each point with it's neighbors also helps smooth out the paths:
These paths look fairly reasonable for laying out text, so let's try dropping in the labels:
There are still some problems here, but you can see that this is starting to look like a reasonable approach for labeling paths.  I'll continue next time addressing some of the obvious (and not so obvious) problems.

Friday, June 9, 2017

Path Labels (Part Two)

In the previous post, I talked about adding path labels to Dragons Abound and described the first part of that -- the part of the simulated annealing algorithm that found new candidate positions for path labels.  In this posting I'll talk about the second part, which is how to evaluate a candidate position.

For point labels -- like the name of a city -- I used the following criteria to evaluate a candidate position:
  1. Going outside the map area. 
  2. Distance of the label from its anchor point.
  3. Overlap between labels.
  4. Overlap between labels and map features. 
  5. Label placement penalty.
#2 and #5 don't make sense for path labels, but otherwise this is a good start.  Other criteria I think I should consider include how far a label is offset from the path (closer is better) and how curvy the path is where the label is placed (smoother is better).
  1. Going outside the map area. 
  2. Overlap between labels.
  3. Overlap between labels and map features. 
  4. Offset distance.
  5. Curviness of path.
Implementing #1 is the same as with the other label types so I'll skip talking about that.  #2 and #3 are a little different.

Previously, all the labels and features in Dragons Abound were horizontally aligned:
This makes detecting overlaps fairly trivial.  However, path labels are not aligned horizontally:
So the simple intersection is not going to work.

Determining whether two arbitrary polygons intersect is a non-trivial problem.  Certainly one I don't want to solve myself, but no need, there are plenty of implementations to choose from.  I ended up using an implementation of Greiner-Horman from Alexander Milevski.  I chose this implementation because it was fast, small and did what I needed, but he also has a more full-feature implementation of Martinez-Rueda polygon clipping that I can switch to if necessary.  (And I might, because that supports polygon offsets, which might have some uses in map creation.)

So to implement the overlap conditions for path labels, I have to replace the simple intersection with the more complex Greiner-Horman intersection.  This is pretty straightforward.  Wiring it all up to work with the simulated annealing algorithm is a little more work (mostly because I decide to refactor the whole thing about halfway through), but eventually I have that portion of the evaluation working, and it produces this:
Here you can see the upper label has been moved off the river (where it overlaps with the features comprising the river).

The next step is to assess the offset distance.  Initially I thought that trying to minimize the offset distance was the right idea, but it turns out that jams all the labels right up against the river.  It's better to reward a label that's close but not too close -- say a half label height or so away from the river.  That results in something like this:
Here I've shown two red spots where the label is anchored to the river, and you can see that the labels are about a half a label height away from those anchors.  But notice that the path (river) for the upper label has a bend so that even though the anchors are properly positioned, the label almost touches the river near the "Z".  That will hopefully be fixed when I curve the label to match the river.

Speaking of curves, the last criteria I want to implement is the curviness of the path, so that labels prefer to be on straight sections of the path.  How do you measure curviness?  I chose to use a simple metric:  the ratio of the length of the path between the two red dots to the straight-line distance between the two red dots.  That turns out to be sinuosity.  So I'll turn that on:
You can see the upper label has shifted quite a bit upstream to get on a nearly flat part of the river.  The lower label move is more subtle, but by shifting just a bit downstream it found a much flatter part of the river.

The last thing I'm going to implement in this posting is label flipping.  You'll notice that the lower label in the above screen shots looks awkward because the angle puts the text upside down to the reader.  The solution is to notice when this happens and flip the label end-for-end so that it reads the other direction:
You can see the label is now easier to read.  Unfortunately, this reveals a couple of issues.  First, the label is overlapping another river, which it shouldn't do.  More subtly, the top of the bounding box is now towards the anchor point instead of the bottom.  The problem is that I also flipped the bounding box when I flipped the label.  If you'll forgive a crude cut & paste job, this shows the original label and the flipped label:
The original bounding box didn't intersect with anything and was a good distance from the river.  What I want to do is flip the label within the bounding box:
Well, this sort of worked.  The label is now a reasonable distance from the river.  But for some reason it is now overlapping with the region name.

To make a long (debugging) story short, there are a number of problems here.  The most relevant is that I hadn't gone back to the code for the area and point labels and updated them to check for overlaps with the path labels.  So the "helchza" area label was ignoring the "R. Kurha" river label.  Oops.  Then to make matters worse, I had a silly error in how I was constructing the polygons for that check, so I was essentially only checking the diagonal rather than the whole polygon.  Those things are easily fixed.

I also noticed a problem with the bounding boxes for rivers.  To make the labels avoid overlapping the rivers, the rivers are turned into a series of fixed bounding boxes that are checked for overlap.  But it turns out the bounding box for the very end of the river was missing:
This is a type of "fencepost error" where you don't handle the start or end of a sequence correctly.  It's a very common error where you have to use both the "i"th element and the "i+1"th element to process the sequence.  And indeed, to turn a path into a sequence of bounding boxes I operate two elements at a time.  Or at least I did at one point -- a change in the path data structure made that unnecessary, but the code hadn't been updated.  That was quickly fixed:
With all those fixes in place, the river label placement looks pretty good:
You can see the labels are on reasonably straight sections of river and placed out of the way of other labels and features, and "R. Kurha" has been flipped to read correctly.

It looks like I still have some problems with the bounding boxes on my labels and I still need to curve the river labels to match the rivers.  Those things next time.