Tuesday, November 1, 2016

Profiling and Optimizing

Ugh.

I  had written a fairly long and detailed posting on profiling and optimizing the map generation program, but somehow during editing it I managed to delete everything.  I closed without saving so that I could go back to the previous version, but apparently Blogger had auto-saved in the interim.  What follows will be a somewhat shorter version of the original article.  Unfortunately, I lost the original profiling results, so the numbers below are only approximations.

A few days ago a comment on Reddit asked about the time required for various parts of the map generation.  Generally speaking I believe in writing code as simply as possible, and only worry about efficiency and optimization when it becomes a significant problem.  I find that much of my code development is very ephemeral -- for example, I wrote three or four different versions of erosion code before I settled on the current solution.  It would waste a lot of time to optimize code that might get replaced.  But the comment at least made me interested enough to turn on the Chrome profiler and see where the program spends the bulk of its time.  I generated this map:


This generation took about 65 seconds on my desktop machine.  Here are the routines that used at least 10% of the total execution time:

(39%) pickNextTerritoryLoc:  Territories (such as Kunapwknek in the map above) are created by placing a capitol city at a good location and then growing the territory outward.  This routine is responsible for picking the next location.  It scans the map looking for locations adjacent to the territory, evaluates each location for desirability, and returns the best location it found.

(20%) drawLabels:  This is the routine responsible for labeling the map.  The map generator uses a method called "force layout" to place the labels.  Most of the time is spent in a function that finds an open spot for territory labels, and in a function that detects when bounding boxes collide as labels are moved around.

(20%) neighborsWithin:  This takes a location and finds all the neighbors within a given distance.  On a grid map this would be straightforward, but on a Voronoi grid it is more challenging because each location is irregularly sized and can have different numbers of neighbors.

(15%) windModel:  The wind model is responsible for propagating trade winds across the map.  Because winds can be affected by the rising and falling land, and because the Voronoi grid is irregular, this is a complex iterative process.
For different reasons, I'm not interested in trying to optimize drawLabels or windModel right now.  I may replace drawLabels at some point with a simulated annealing algorithm, so it may not make much sense to sink a lot of time into optimizing it.  And I've already spent some time making the windModel faster, so there's probably not a lot of easier improvement available there.  But pickNextTerritoryLoc and neighborsWithin are worth examining to see if there are any easy gains to be had.

Examining the code for pickNextTerritoryLoc reveals two possible avenues for optimizing.  The first opportunity is the function for evaluating locations.  It is called every time a location is considered, but most of the evaluation is static and could be computed once before the territories are grown.  The second opportunity is to improve how the routine identifies candidate locations.  Currently the map is scanned to identify all the territory's neighbors.  This is inefficient when the routine is called many times.

(As a side note, when you're implementing optimizations, it's best to work in small steps and check after every step that the program behavior hasn't changed.  This guards against introducing errors and gives you a chance to monitor the function efficiency.  Sometimes an early change saves so much time that doing the rest is unnecessary.  In this case, I began by splitting the evaluation function into static and dynamic components.  In doing so, I discovered a bug in the original function that I had to correct before attempting the optimization.)

Splitting the evaluation function into static and dynamic pieces, and only calling the static portion once yielded about a 2% gain.  As it happens, the most time-consuming part of the evaluation had to remain in the dynamic piece, and that limited the improvement.  The second optimization was to have each territory maintain a list of neighboring locations and incrementally update that list as the territory grew, rather than repeatedly scanning the map.  This was much more fruitful, and dropped the percentage of total time taken by this function from 37% to 5%.  Good enough!

Moving on to neighborsWithin, an examination of the code didn't reveal any obvious optimizations.  I attempted an optimization that involved switching from arrays of neighbors to a hash table, but partway through rewriting the code realized that this approach wouldn't work and backed out the changes.  I also thought about using a quadTree representation to speed up the search.  This would probably work (and is on my To-Do list, because it would be potentially useful in many places) but it would be more work than I was willing to spend at the moment.

In the end, I fell back on the old optimization trick of trading off space for time.  The real problem with neighborsWithin is that it gets called many times for each location on the map.  A simple optimization strategy in this situation is to cache the function result.  The first time the function is asked for the neighbors within some distance of a location it calculates the result, but it also saves the result.  The next time the function is asked the same question, it uses the stored result rather than calculate it again.  (Obviously this only works if the answer doesn't change, or if you can tell whether it has changed and needs re-calculation.)  This change dropped neighborsWithin from 20% to ~4.5% of the computing time.  Again, that was a nice gain for very little effort!

These two optimizations dropped the overall map generation time from 65 seconds to 44 seconds.  After all that, here's the approximate times spent in different phases of the map generation and display:
(27%) Generating the World (physical geography)
  (17%) Wind Model
  (4%) Add a Central Island
  (4%) Creating Biomes
(4%) Generating the Culture (countries, etc.)
(57%) Drawing the Map
  (55%) Drawing the Labels
     (40%) Finding an Open Spot for Territory Labels
     (16%) Executing the Force Layout
The glaring problem here is that 40% of the total execution time is now spent on the relatively minor task of finding an open spot for the territory labels!  As I mentioned above, I'm reluctant to spend too much time on optimizing the label layout because I'm already half-convinced that I should switch to a simulated annealing approach, but this may be worth at least a preliminary look.

2 comments:

  1. I don't have any comments specific to profiling but I wanted to thank you for writing all these details on your blog!

    ReplyDelete