Saturday, July 28, 2018

The Naming of Places (Part 10): Towns (continued)

In the last posting, I covered a number of ways to generate town names:
  1. An invented name plus a word for “village," e.g., “Schen Village"
  2. A descriptive natural feature, e.g., “Blue Hill" or “Apple Woods"
  3. An invented name plus a suffix for “village," e.g., “Schenley"
  4. A city base name plus a suffix, e.g., “Woodford," “Hollyville"
That produced a pretty good mix of names, but there are a few other variants I want to add.

Many towns are named after people.  Variant #3 above is a version of this, where the name is invented.  But it would also be nice to have a scattering of “real names" or at least some typical medieval English names.  In medieval times most people were identified by first name, and towns reflected this, e.g., James-town, Louis-ville, etc.  I put together a list of about 500 common medieval first names (both female and male) and made a rule to combine them with the town and city suffixes.  This gives me names like this:
Some of these might seem a little unfamiliar, but for the most part they seem like reasonable town names.
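
As a rough illustration of this rule, here is a minimal sketch.  The name and suffix lists below are made-up placeholders (the real list has about 500 names), and the function names are hypothetical rather than the generator's actual code:

```javascript
// Hypothetical sample data, standing in for the real name and suffix lists.
const firstNames = ["Alice", "Edmund", "Matilda", "Walter"];
const suffixes = ["ton", "ville", "ford", "bury"];

// Pick a random element.  The rng is injectable so runs can be reproduced.
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Combine a first name with a town or city suffix.
function personTownName(rng = Math.random) {
  return pick(firstNames, rng) + pick(suffixes, rng);
}

console.log(personTownName());
```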

The next variant is to take a descriptive natural feature name and run the words together, e.g., “Blue Hill" becomes “Bluehill".  This is just a matter of leaving the space between the words out and forcing the second word to be lower case:
About half of these are not too bad -- “Wetneck," “Earlyspur," “Bestmarsh" and a few others.  For some, the words don't match up very well, e.g., “Beguilingquarry".  “Desertedwasteland" is actually a good match, but just seems too long and labored to really be a town name.
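
Running the words together is simple enough to show in a few lines.  This is only a sketch, and the function name joinWords is an assumption:

```javascript
// Drop the space and lower-case everything after the first word.
function joinWords(name) {
  const [first, ...rest] = name.split(" ");
  return first + rest.join("").toLowerCase();
}

console.log(joinWords("Blue Hill"));
console.log(joinWords("Deserted Wasteland"));
```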

However, I won't normally be using these words directly.  Instead, I will feed the two-word names (e.g., “Early Spur") into a process of linguistic drift that will mimic the sort of shortening and word changes that could turn a descriptive name like “Pearl Moorland" into “Pearlmoorland" into “Pearlmoor" into “Pearlmore" and finally into “Perlmore."  I'm sure there are many sophisticated models of linguistic drift, but my approach will be rather more ad hoc :-).

The first step will be to reduce the longer word combinations by dropping some syllables.  To do that, I need some code to break words into syllables.  This is a little bit of a problem.  There are a number of JavaScript packages to hyphenate words, but since you can't hyphenate at every syllable (e.g., you can't hyphenate the word “ivory" between the first and second syllables) that doesn't help much.  There are also a few packages to count syllables that seem to work fairly well, but they don't actually break the word down into syllables as they count, so they aren't useful here either.  The best I found was the fairly simple and short nlp-syllables, although it makes mistakes:
Safe Mink (Safe-Mink)
Normal Bull (Nor-mal-Bull)
Lucky Hollow (Luc-ky-Hol-low)
Twin Hill (Twin-Hill)
Remote Plateau (Re-mote-Pla-teau)
Loose Ditch (Loose-Ditch)
Garnet Onion (Gar-net-Onion)
Busy Spur (Bu-sy-Spur)
Hidden Hedgerow (Hid-den-Hed-ge-row)
Weary Lagoon (Wea-ry-La-goon)
The idea will be to trim any combinations that are longer than 3 or 4 syllables.  I will do this by removing syllables from the middle, i.e., from the end of the first word or the beginning of the second word.  So trimming “Lucky Hollow" to three syllables and combining the words would give me either “Luckylow" or “Luchollow."  “Remote Plateau" would become “Replateau" or “Remoteteau."  It also works pretty well to sometimes trim from 3 to 2 syllables:  “Normal Bull" to “Norbull", “Busy Spur" to “Buspur", etc.
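
Here is a minimal sketch of trimming from the middle.  It assumes the two words have already been split into syllable arrays (by nlp-syllables or anything else), and it assumes each word should keep at least one syllable.  The function name and the 50/50 choice between the two words are my own guesses for illustration:

```javascript
// Remove syllables from the end of the first word or the start of the
// second word until the combined name has at most `target` syllables.
function trimMiddle(first, second, target, rng = Math.random) {
  const a = [...first];
  const b = [...second];
  while (a.length + b.length > target) {
    const canTrimA = a.length > 1; // keep at least one syllable per word
    const canTrimB = b.length > 1;
    if (!canTrimA && !canTrimB) break;
    if (canTrimA && (!canTrimB || rng() < 0.5)) {
      a.pop();   // drop the last syllable of the first word
    } else {
      b.shift(); // drop the first syllable of the second word
    }
  }
  // Run the words together with a single leading capital.
  const joined = (a.join("") + b.join("")).toLowerCase();
  return joined.charAt(0).toUpperCase() + joined.slice(1);
}

console.log(trimMiddle(["Luc", "ky"], ["Hol", "low"], 3));
console.log(trimMiddle(["Nor", "mal"], ["Bull"], 2));
```

With “Lucky Hollow" and a target of three syllables this yields either “Luchollow" or “Luckylow", depending on which side the random choice trims.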

My current setting trims phrases to 2 syllables 25% of the time, 3 syllables 50% of the time, and 4 syllables 25% of the time:
Stony Bog (Sto-ny Bog)
Merry Landslide (Mer-ry Lan-dsli-de)
Paltry Magnolia (Pal-try Mag-no-li-a)
Key Quarry (Key Qu-a-rry)
Jasper Foothill (Jas-per Foo-thill)
Tourmaline Peak (Tour-ma-line Peak)
Magenta Snake (Magen-ta Sna-ke)
Burgundy Volcano (Bur-gun-dy Vol-ca-no)
The results are interesting.  It's not exactly what I expected, but the names have a good sound to them.
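
The 25/50/25 split can be expressed as a small weighted table.  This is a sketch with hypothetical names, not the actual setting code:

```javascript
// Weights for the target syllable count (they sum to 100).
const targetWeights = [
  { syllables: 2, weight: 25 },
  { syllables: 3, weight: 50 },
  { syllables: 4, weight: 25 },
];

// Weighted random choice of how many syllables to trim a phrase down to.
function pickTargetSyllables(rng = Math.random) {
  const total = targetWeights.reduce((sum, t) => sum + t.weight, 0);
  let r = rng() * total;
  for (const t of targetWeights) {
    if (r < t.weight) return t.syllables;
    r -= t.weight;
  }
  // Fallback in case of floating-point rounding at the upper edge.
  return targetWeights[targetWeights.length - 1].syllables;
}

console.log(pickTargetSyllables());
```

Keeping the weights in a table makes it easy to adjust the mix later without touching the selection logic.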

I sometimes get long single-word city names as well, usually when the fantasy language generator spits one out.  I've done something similar to reduce the length of these names:
Shortened Reslonley to Resley
Shortened Gundreawood to Gunwood
Shortened Branwyneton to Branton
Shortened Bramblside to Bramde
Shortened Wrongfile to Wronle
Shortened Regsetken to Regken
Shortened Memkenfield to Memfield
Shortened Setslenti to Setti
Shortened Sutanken to Suken
The algorithm keeps the first and last syllables and randomly removes syllables from the middle of the word.  The result is often quite different from the original word, but if you squint you can often imagine the name getting shortened that way.
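
A minimal sketch of that shortening step, assuming the word has already been broken into syllables.  The syllable splits in the examples are my own guesses, and the function name is hypothetical:

```javascript
// Keep the first and last syllables and remove random middle syllables
// until the word has at most `target` syllables (never fewer than two).
function shortenWord(syllables, target, rng = Math.random) {
  const parts = [...syllables];
  while (parts.length > Math.max(target, 2)) {
    // Index 0 and the last index are never chosen.
    const i = 1 + Math.floor(rng() * (parts.length - 2));
    parts.splice(i, 1);
  }
  return parts.join("");
}

console.log(shortenWord(["Res", "lon", "ley"], 2));
console.log(shortenWord(["Bran", "wy", "ne", "ton"], 2));
```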

Sometimes shortening words results in some wrong or unlikely spelling, like having three of the same letter in a row.  To address these sorts of problems, I created a small dictionary of regular expressions that are used to fix these problems.  Each entry is a regular expression to match in the word, and then a substitution for the match.  The entry for triple letters looks like this:

// No tripled letters
[ /([a-z])\1\1/, '$1$1' ],

This is a list in which the first element (enclosed in slashes) is a regular expression to match in the word, and the second element (enclosed in single quotes) is the string to substitute.  I won't provide a tutorial on regular expressions here, but the regular expression above matches any letter that appears three times consecutively, and then the string replaces that with the matched letter two times in a row.   Here are some examples of these rules being applied:
Applied rule to change Timrardan Castle to Timardan Castle
Applied rule to change Raenmeton to Ranmeton
Applied rule to change Plebmetittown to Plebmetitown
The first rule gets rid of hard-to-pronounce r-vowel-r patterns, in this case, the “rar" in the middle of the name.  The second one replaces “ae" with “a", and the last one gets rid of the awkward doubled t caused by adding -town to a word ending in a t.  Some of these are obviously a matter of taste, but I've picked a small set of rules that I think improves the readability of the city names.
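
To show how a rule list like this can be applied, here is a sketch.  Only the tripled-letter rule comes from the actual dictionary; the other three patterns are my guesses at rules that would produce the changes shown above, and applyFixes is a hypothetical name:

```javascript
const fixRules = [
  // No tripled letters (the rule quoted above)
  [ /([a-z])\1\1/, '$1$1' ],
  // The rules below are guesses for illustration only.
  // Drop the first r of an r-vowel-r pattern
  [ /r([aeiou])r/, '$1r' ],
  // Replace "ae" with "a"
  [ /ae/, 'a' ],
  // Collapse the doubled t created by adding -town
  [ /ttown\b/, 'town' ],
];

// Apply each rule once, in order.  Without the g flag, replace() only
// changes the first match of each pattern.
function applyFixes(name) {
  return fixRules.reduce(
    (word, [pattern, replacement]) => word.replace(pattern, replacement),
    name
  );
}

console.log(applyFixes("Timrardan Castle"));
console.log(applyFixes("Plebmetittown"));
```

Because the rules run in order, an earlier rule can change what a later rule sees, so the ordering of the list matters.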

The last set of modifications makes more substantive changes to the names.  I only do this to 25% of the names.  Some of these modifications try to give a “medieval" feel to the word, such as by adding an “-e" to the end of a word, e.g., “Memfield" to “Memfielde".  Other modifications randomly swap sounds, e.g., “Bududu City" to “Buwudu City."   Here are some examples:
Modified Tiunchin Manor to Tiufin Manor
Modified Shamsland to Shamslande
Modified Oranfore to Ozanphore
Modified Rare Cut to Rale Cutte
In the first name, “f" has been substituted for “nch".  (When substituting vowels or consonants, the rules always replace a string of vowels or consonants.  This prevents a substitution into the middle of a sound, so you never get something like “nch" to “njh".)  The second example has had an “-e" added at the end.  The next two examples have had multiple modifications.
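
One way to guarantee that substitutions only replace a whole string of vowels or consonants is to split the word into runs first.  This is a sketch under my own assumptions (for instance, “y" is treated as a consonant), with hypothetical function names:

```javascript
// Split a word into alternating runs of vowels and non-vowels,
// e.g. "Tiunchin" becomes ["T", "iu", "nch", "i", "n"].
function splitRuns(word) {
  return word.match(/[aeiou]+|[^aeiou]+/gi) || [];
}

// Replace the first run that exactly equals `from`.  A partial match
// inside a longer run is ignored, so "nch" can never become "njh".
function substituteRun(word, from, to) {
  let done = false;
  return splitRuns(word)
    .map((run) => {
      if (!done && run.toLowerCase() === from) {
        done = true;
        return to;
      }
      return run;
    })
    .join("");
}

console.log(substituteRun("Tiunchin", "nch", "f"));
console.log(substituteRun("Tiunchin", "ch", "j"));
```

The second call leaves the name unchanged, because “ch" is only part of the “nch" run.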

At this point I have about 20 rules that fix names, and about 50 rules that do linguistic drift of one sort or another.  I'll use these for a while and see how I feel about the results.

The next thing I want to tackle is related city names, such as “Hampshire" and “New Hampshire" or “Virginia" and “West Virginia."  My approach is pretty simple.  When I'm naming a city, there's a small chance I'll name it after a nearby city.  If that happens, I'll pick the nearest city as a namesake and then create either a directional name or “New/Old".  The only slightly tricky aspect is that the namesake city might not actually be named yet.  To avoid that problem, I'll actually do this check as a second step -- I'll name all the cities as normal and then go through to see if I want to create any namesakes.

This happened during an early test run:
The city on “ke Te Island" got named after a town that had been named after a town...  I realized that I don't want to have multiple namesakes from one town (I don't want “North Goton" and “Old Goton" as well) and I don't want to create a namesake from a namesake.

Another useful constraint is to set a maximum distance between a city and its namesake.  It seems odd to have “North Goton" be on the opposite side of the map from “Goton."  A distance of about 1/5 of the map seems to work fairly well as a limit.
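
Putting the namesake constraints together, a second pass over the named cities might look like the sketch below.  The city record shape ({ name, x, y }), the 10% chance, the even split between directional and “New/Old" names, and the assumption that y grows downward on the map are all mine, chosen just for illustration:

```javascript
// Choose a directional prefix or "New"/"Old" for a city named after `namesake`.
// Assumes screen coordinates: a smaller y is further north.
function pickPrefix(city, namesake, rng) {
  if (rng() < 0.5) {
    const dx = city.x - namesake.x;
    const dy = city.y - namesake.y;
    if (Math.abs(dx) > Math.abs(dy)) return dx > 0 ? "East" : "West";
    return dy < 0 ? "North" : "South";
  }
  return rng() < 0.5 ? "New" : "Old";
}

// Second pass: run this only after every city already has a name.
function assignNamesakes(cities, mapSize, chance = 0.1, rng = Math.random) {
  const maxDistance = mapSize / 5; // limit of about 1/5 of the map
  for (const city of cities) {
    // Skip cities that are already part of a namesake pair.
    if (city.namesakeOf || city.hasNamesake) continue;
    if (rng() >= chance) continue;

    // Find the nearest other city.
    let nearest = null;
    let best = Infinity;
    for (const other of cities) {
      if (other === city) continue;
      const d = Math.hypot(city.x - other.x, city.y - other.y);
      if (d < best) {
        best = d;
        nearest = other;
      }
    }
    if (!nearest || best > maxDistance) continue;
    // No namesakes of namesakes, and only one namesake per town.
    if (nearest.namesakeOf || nearest.hasNamesake) continue;

    city.name = pickPrefix(city, nearest, rng) + " " + nearest.name;
    city.namesakeOf = nearest;
    nearest.hasNamesake = true;
  }
  return cities;
}
```

Marking both cities in the pair is what prevents chains: a renamed city can't become a namesake source, and a source can't be reused or renamed.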

Finally, another version of related names is to name a city on an island after the island, so that you get “Gonlo City" on “Gonlo Island."  I've added a rule for that as well.


  1. Great work. I like the way you've done regionalisation. Do you have any plans to include towns with names including on, upon, over? Normally combined with a sea or river.

  2. Awesome writeup! Your explanation of the linguistic drift model you've implemented was easy to follow, and fascinating too! Very cool.