Tuesday, June 26, 2018

The Naming of Places (Part 5): At Bay

Now that I have a grammar tool more-or-less working and have retrofitted the Lost Coast names to it, I'm going to tackle another type of place name: bays.
This picture illustrates how Dragons Abound currently handles bays:  It identifies a stretch of coastline that creates a “pocket" and then attaches a label to the center of the pocket.  Currently the place name is just an invented name from Martin O'Leary's generator plus one of a few synonyms for “bay," such as “basin" and “harbor."  Let's see if I can do better.

I'm going to start out by looking at how bays are named in the real world.  As I did previously with mountains, I'm going to analyze real world naming data to see what kinds of words are used in bay place names.  In my previous effort, I analyzed place names data from the USGS.  Since then, I've discovered another source for real world place names:  geonames.org.   This organization provides a worldwide geographical naming database under a Creative Commons license.  Like the USGS database, names are labeled with a feature class that I can use to extract out all the “bays".  I can then run some lexical analysis on these names to help guide how I create my fantasy place names.

For a first step, I'll run all the names through and look at the last word, which is usually “Bay" or a synonym.  Here are the top 20 results:
1. Bay: 9482
2. Cove: 6674
3. Harbor: 1280
4. Creek: 904
5. Bayou: 850
6. Inlet: 404
7. Pond: 363
8. Lake: 302
9. Arm: 297
10. Sound: 291
11. Lagoon: 281
12. Hole: 228
13. Canal: 216
14. Bight: 204
15. Pass: 169
16. Basin: 167
17. Slough: 134
18. Anchorage: 90
19. River: 84
20. Eddy: 59
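Here's a minimal sketch of the kind of last-word count described above. It assumes the bay names have already been extracted into an array of strings; the function name and the sample names are only for illustration and are not the actual analysis code.

```javascript
// Count how often each final word appears in a list of place names.
// `names` is a hypothetical array of strings already filtered to bay features.
function countLastWords(names) {
  const counts = new Map();
  for (const name of names) {
    const words = name.trim().split(/\s+/);
    const last = words[words.length - 1];
    if (!last) continue; // skip empty entries
    counts.set(last, (counts.get(last) || 0) + 1);
  }
  // Sort by descending count so the most common terms come first.
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

// A tiny illustrative sample, not the real data set.
const sample = ['Half Moon Bay', 'Smugglers Cove', 'Tomales Bay', 'Boston Harbor'];
console.log(countLastWords(sample));
```

Running the same function over the words before the last one (after lowercasing) would give the second kind of tally, the nouns used inside bay names.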
There is some duplication here because many of the same features appear in both the USGS data and the Geonames data.  Some notes:
• “Bay" and “Cove" are by far the most common names.
• “Harbor" is an interesting synonym, because it presumably applies only to bays where people have created a harbor.  For Dragons Abound purposes, this suggests only using it where a city is on the bay.  “Anchorage" also implies a human use to the area, but in that case it doesn't imply a city.
• “Creek” and “River” are odd synonyms to apply to a bay.  Looking at some examples in Google Maps suggests that these are places where a creek or river empties into a larger body of water and the area has been identified as a bay, which I think means I can safely ignore these names for my purposes.
• Something similar is happening with “Pond" and “Lake"; these seem to be mostly ponds and lakes connected to some bigger body of water.  I'll ignore these too.
In the end, I have about 40 synonyms for “bay", most of which will be used very infrequently.

Next, let me look at the nouns that have been used in bay names, but not as the last word of the name:
1. island: 269
2. creek: 215
3. lake: 181
4. port: 180
5. point: 159
6. saint: 118
7. arm: 111
8. sand: 91
9. river: 85
10. goose: 81
11. rock: 77
12. mud: 75
13. mill: 70
14. harbor: 70
15. oyster: 64
16. spring: 59
18. hill: 48
19. house: 46
20. beach: 45
You can see some interesting trends here.  There are a lot of bays named after other (presumably nearby) geographical features: islands, creeks, lakes, points, arms, rivers, springs, heads, hills and beaches.  Harbor and its synonym port show up again.  Terms for earth are common:  sand, rock, and mud.  Animals, too:  goose and oyster.  An interesting term is Saint -- many geographical features are named after saints.

I looked at about the hundred most common terms and tried to break them down by category:

Category: Terms
Geo Features: island, creek, lake, point, arm, river, spring(s), head, hill, branch, pond, hole, brook, fork, finger, key, neck, wash, sea, bluff, isle, mountain
Manmade Features: port, mill, harbor, house, shelter, camp, boat, garden, yacht, castle, fort, town, sawmill, schooner
Person: saint, squaw, king
Natural Items: sand, rock, mud, salt, water, stone, boulder, granite
Animals: goose, oyster, eagle, fish, deer, turtle, bass, horse, seal, fox, otter, alligator, cow, clam, beaver, mosquito, herring, swan, buck, bird, bull, salmon, pigeon, sheep
Plants: willow, oak, pine, tree, hickory, stump
Misc: hammock, devils, mile, greens, basin, drum, echo, sunset, tar, winter, paradise, dollar, crescent

Almost all of the terms fall into just six categories.  Of course, as you go further down the list into less common terms they become more diverse, but I can take this as a starting point to elaborate a more complete set of possible nouns to use in naming bays.

It would be nice if there was a simple automated system for elaborating these categories, but there's not.  It's mostly a matter of grunt work to research and capture terms, and to try to decide along the way how much categorization is worthwhile.  For example, to elaborate the “animals" category, I have to go to various websites with lists of animals and decide which ones to include, and whether it is worthwhile to categorize them as land versus sea (yes), whether a “pelican" should go into land, or sea or air (maybe both sea and air?) and whether insects should be included (no).  Eventually I end up with a list of land animals:

 Aardvark Bushbaby Cougar Gila Monster Jaguar Mule Deer Python Tapir Anaconda Bushpig Cow Giraffe Kangaroo Muskox Rabbit Tiger Anteater Caiman Coyote Glider Koala Muskrat Raccoon Tortoise Antelope Calf Crow Gnu Lamb Ocelot Rat Turtle Ape Camel Deer Goat Langur Opossum Rattlesnake Viper Armadillo Capybara Dingo Gopher Lemming Oryx Rhinoceros Vole Aye-Aye Caracal Dog Gorilla Lion Otter Salamander Wallaby Baboon Caribou Donkey Grizzly Lizard Ox Scorpion Warthog Badger Cat Dormouse Hare Lynx Peccary Sea Lion Water Moccasin Bandicoot Cattle Duiker Hartebeest Macaque Pig Sheep Weasel Bear Chameleon Elephant Hedgehog Marmot Pika Skink Whale Bison Cheetah Elk Hippopotamus Marten Platypus Skunk Wildcat Boa Constrictor Chicken Emu Hog Meerkat Polecat Sloth Wildebeest Boar Chimpanzee Ermine Honey Badger Mink Pony Snail Wolf Bobcat Chipmunk Ferret Horse Mole Porcupine Snake Wolverine Box Turtle Chuckwalla Fox Hyena Mongoose Porpoise Spider Wombat Buck Civet Gazelle Iguana Monkey Possum Spoonbill Woodchuck Buffalo Cobra Gecko Impala Moose Prairie Dog Squirrel Woodrat Bull Constrictor Gemsbok Jackal Mouse Pronghorn Stallion Yak Cottonmouth Jackrabbit Puma Zebra

After completing this, it occurred to me that I might like to have a separate list of polar animals, so that I can use more appropriate names when adding bays in cold climes.  So I went back and broke out the list of polar animals.

 penguin tern moose northern pike whale wolf muskox salmon seal caribou narwhal smelt lobster reindeer ptarmigan herring polar bear ermine puffin cod snow fox stoat snow goose whiting snowshoe hare lemming walrus flounder

So now in a snowy clime I'll get “Penguin Cove" instead of “Camel Cove."
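As a rough sketch of how the two lists might be used, the snippet below picks from the polar list when a bay is flagged as cold. The `isCold` flag and the function names are assumptions for illustration; how the map decides that a bay is in a cold climate is not covered here.

```javascript
// Small excerpts of the two lists from the post; the real lists are much longer.
const landAnimals = ['Camel', 'Cougar', 'Jaguar', 'Tapir'];
const polarAnimals = ['Penguin', 'Walrus', 'Narwhal', 'Puffin'];

// `isCold` is a hypothetical flag supplied by the caller.
function animalsFor(isCold) {
  return isCold ? polarAnimals : landAnimals;
}

// `rng` returns a number in [0, 1); it defaults to Math.random.
function animalCoveName(isCold, rng = Math.random) {
  const list = animalsFor(isCold);
  const animal = list[Math.floor(rng() * list.length)];
  return `${animal} Cove`;
}

console.log(animalCoveName(true));  // e.g. "Penguin Cove"
console.log(animalCoveName(false)); // e.g. "Camel Cove"
```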

If this sounds like a lot of work, it is.  But for something like this, a shortcut is not going to produce a good result.  In fact, I'm probably not doing *enough* work on this; a really good solution would capture more semantic information about the words and use that to construct the place names.

An initial pass generates a few thousand possible nouns to use in bay names, and generates names like these:
1. Swirl Cove
3. Annihilation Sound
4. Illuminator Bay
5. Fool Bay
6. Silk-Dresser Cove
7. Monastery Bay
8. Moneyer Bay
9. Beryl Bay
10. Table Bay
You can see that there are some archaic terms in the list that you might not recognize.

A simple extension is to flip a name like “Swirl Cove" into “Cove of Swirls."  Note that in most cases I have to pluralize the noun for this to sound right -- but not mass nouns like “mud" or abstract nouns like “annihilation."
1. Shale Bay
2. Cove of Chamberlains
3. Bay of Furriers
4. Gum Bay
5. Pocket of Nakerers
6. Rogue Cove
7. Bushbaby Hole
8. Mud Cove
9. Brewer Bay
10. Orca Cove
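The flip can be sketched roughly as follows. The pluralizer here is deliberately naive, and the two small sets of mass and abstract nouns are illustrative stand-ins; a real vocabulary would need its own annotations (or a library pluralizer).

```javascript
// Nouns that should not be pluralized in the "X of Y" form. These two small
// sets are illustrative only.
const massNouns = new Set(['mud', 'shale', 'gum', 'sand']);
const abstractNouns = new Set(['annihilation', 'damnation']);

// A deliberately naive English pluralizer; irregular plurals are not handled.
function pluralize(noun) {
  if (/(s|x|z|ch|sh)$/i.test(noun)) return noun + 'es';
  if (/[^aeiou]y$/i.test(noun)) return noun.slice(0, -1) + 'ies';
  return noun + 's';
}

// Turn ("Swirl", "Cove") into "Cove of Swirls", leaving mass and abstract nouns alone.
function flipName(noun, bayWord) {
  const key = noun.toLowerCase();
  const keepSingular = massNouns.has(key) || abstractNouns.has(key);
  return `${bayWord} of ${keepSingular ? noun : pluralize(noun)}`;
}

console.log(flipName('Swirl', 'Cove'));         // Cove of Swirls
console.log(flipName('Mud', 'Cove'));           // Cove of Mud
console.log(flipName('Annihilation', 'Sound')); // Sound of Annihilation
```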
Another variation for nouns that represent individual people is to use the possessive, e.g., “Rogue's Cove" or “The Rogue's Cove."  I can also use an invented name for the individual, and I can add a title to that individual as well (as I did when naming the Lost Coast).
1. Cartier Bay
2. Oilskin Gully
3. Viper Cove
4. Steward Bay
5. Dolphin's Bay
6. Purse Maker Bay
7. Vicar Daedoe's Prong
8. Catchpole's Bay
9. Reverend Beedae's Cove
10. Duednoeb's Cove
Interestingly, it sounds fine to use animal names where individual names work, which is why I get “Dolphin's Bay" above.

Another way to name a bay is with an adjective, such as San Francisco's “East Bay” or “Long Bay” in Myrtle Beach.  To get an idea of what sort of adjectives I can use to name bays this way, I'll go back to my corpus of US and UK bay names and pull out all the adjectives that are used this way (as best I can).  Here are the top twenty:
1. North
2. West
3. Long
4. Big
5. East
6. Little
7. Grand
8. Middle
9. Hollow
10. Indian
11. Sandy
12. Horseshoe
13. Browns
14. Great
15. Round
16. Blind
17. Flat
18. Twin
19. Finger
20. Blue
Directions are very popular.  (South is missing, but that seems to be a problem with my Parts of Speech classifier not thinking it is an adjective.)  Sizes are also popular: Long, big, little, grand, great.  A little further down are shapes:  Hollow, horseshoe, round, flat, twin, finger.  “Browns” is a little odd; I think that's a possessive with a missing apostrophe:  “Brown's Bay.”  At number 20 we get an actual color -- a bit of a surprise to me because I thought colors would be more popular.

One note about direction and size adjectives -- I probably don't want to use these randomly.  A “northern" bay ought to be to the north in some way, and “great" bays should be larger than average.  I'll talk about implementing that next time, but for the moment I'll just collect these adjectives and set them aside.
1. Admiral dem Gen's Bay
2. Chokeweed Bay
3. Indigo Pass
5. Deserted Bay
6. Pufferfish's Bay
7. Damnation Bay
8. Catfish Cove
9. Quiet Bay
10. Glassy Cove
I've turned up the frequency to make the adjective names more common, and you can see “indigo,” “paradise,” “deserted,” “quiet,” and “glassy” here as examples.

The next option is to use both an adjective and a noun to name the bay, such as “Blue Dolphin Bay."  In this case, the adjective modifies the noun, not the bay, so a different set of adjectives is needed.  In theory, the adjective should be something that makes sense with the noun, but in practice the human mind is so good at conjecturing connections between an adjective and a noun that it's almost difficult to come up with an example that seems totally wrong.  For example, “Virgin Dolphin Bay," “False Dolphin Bay," and “Barbarian Dolphin Bay" are all cases where the adjective doesn't really fit with the noun, but seem like acceptable place names.  I'm sure it's possible to come up with combinations that totally don't work, but my point is that you don't have to be as careful about this as you might imagine.  However, you do have to avoid abstract nouns.  Something like “Blue Infidelity Bay" is nonsensical enough to trigger even the most forgiving reader.

An initial cut at adding these sorts of names gives me (again, with tweaked probabilities) this list:
1. Sore Brigantine Bay
2. Imashosh's Bay
3. Dead Raft Bay
4. Lone Cedar Bay
5. Dukdush's Cove
6. Broom-Dasher Bay
7. Cove of Hermits
8. Dukuklen's Bay
10. Parson Prong
Here “Sore Brigantine" doesn't make much sense (a brigantine is a two-masted sailing vessel) and “Dead Raft" isn't much better, but I think they would pass on a map.  (“Lone Cedar" on the other hand is nearly perfect.)

That's it for the first part of naming bays.  The lists of nouns and adjectives can be expanded near endlessly, but I already have over 3000 in the vocabulary, which makes for plenty of different names.  In the next posting I'll get into some of the less random naming strategies.

Obligatory map shot:
“Limner's Bay,” “Elk Bay,” and “Penny Firth.”

Wednesday, June 20, 2018

The Naming of Places (Part 4): Using a Tool

One thing I learned in working on “Lost Coast" names in the last posting was that treating words as lists of strings is very limiting.  For example, I added some common monster names to the words I was using, and I could use those in various ways:  “The Vampire Coast," “The Coast of Vampires," “The Vampire's Graveyard," and so on.  But I ended up just duplicating those names in different templates.  It would be nice if I had a way to say “Include all the monster words here."  Likewise, I sometimes need the plural form of a word, and I had to add that in manually.  So it seems like it's time to consider using a language generation tool.

One of the main features I need is to pick a word randomly from a class of equivalent words, as for example when I select a word that means “coast."  It would be nice to have a tool that provides built-in word categories that would solve all my needs.  Some tools (like WordNet) provide word classification into various categories, and also provide lists of synonyms.  Unfortunately, these tools usually categorize words by parts of speech (POS), e.g., noun, adjective, verb, etc., which isn't useful for me.

Some tools also provide synonyms, but they're not usable without manual editing.  For example, WordNet synonyms for coast include “seashore" and “lakeshore", neither of which work well in labeling a coastline.  And there's no way for WordNet to know that in this case, “cliff" is an acceptable synonym for coast, even though those words do not mean at all the same thing.  There are shades of meaning in words that are difficult to capture and use in a language library.

So I don't think I'm going to find a tool with built-in categories and synonyms that I can use without modification.  Instead, I'll need a tool that supports creating my own categories, e.g., to be able to say “words to use in labeling a coastline are coast, shore, strand, bank, ... etc."

I also want to be able to weight the choices in these synonym lists to indicate that some choices are more common than others.  Quite a few of the tools I've looked at only support choosing randomly (with equal probability) from a list.  I can work around that limitation by repeating the same word multiple times in the list, but if I want a word to be 100 times as common as another word, that requires a lot of repeats.
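To make the idea concrete, here is a minimal sketch of weighted selection that avoids repeating entries in the list. The function name, the pair format, and the particular weights are assumptions for illustration and don't come from any of the tools discussed.

```javascript
// Pick one entry from [word, weight] pairs, with probability proportional to weight.
// `rng` returns a number in [0, 1); it defaults to Math.random.
function weightedPick(choices, rng = Math.random) {
  const total = choices.reduce((sum, [, weight]) => sum + weight, 0);
  let target = rng() * total;
  for (const [word, weight] of choices) {
    if (target < weight) return word;
    target -= weight;
  }
  return choices[choices.length - 1][0]; // guard against floating-point drift
}

// "coast" is 100 times as likely as "bracks" without listing it 100 times.
const coastWords = [['coast', 100], ['shore', 20], ['bracks', 1]];
console.log(weightedPick(coastWords));
```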

Another feature I'd like is for the tool to be able to pluralize (and conversely, singularize) words for me, so that I don't have to enter each word as both a singular and a plural.  I can use this to switch between “The Ogre Coast" and “The Coast of Ogres" easily.  I think there will still be some cases where I have to explicitly indicate a plural or a singular but in many cases I could rely on the tool to create the appropriate plural / singular.

There are also a number of common features I don't need.  I'm not building interactive fiction (IF) so I don't need any facilities for interacting with the user.  Some tools have ways to remember a choice and use the same choice later, so that you can (for example) pick the name “Pete" in one place and then use it throughout a long text.  I can think of some cases where I might want to do this, but in general it's a feature I can live without.  I also don't need to parse text, assign parts of speech to words, or similar input-oriented tasks.

With all that in mind, I think my best choices come down to Tracery and RiTa.

Tracery is focused on text generation, and has a simple, clean format for expressing a grammar and some nice built-in features like capitalization.  On the negative side, it lacks any way to do choice weighting.  (There exists a fork of Tracery that adds choice weighting (and much more!) but it is written in Swift rather than Javascript.)  In fact, Tracery does a “shuffled deck" selection when making choices, meaning it runs through all the choices (randomly) before repeating itself.  Since I want some place name choices to repeat (e.g., I want to use “coast" much more often than “bracks"), this won't work for me.
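To illustrate why a "shuffled deck" is a problem here, this is a small sketch of that selection style. It only illustrates the idea as described above and is not Tracery's actual code; the function names are made up for the example.

```javascript
// Fisher-Yates shuffle on a copy of the array.
function shuffle(items, rng = Math.random) {
  const deck = items.slice();
  for (let i = deck.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [deck[i], deck[j]] = [deck[j], deck[i]];
  }
  return deck;
}

// Returns a function that deals every item once, in random order, before reshuffling.
function makeShuffledDeck(items, rng = Math.random) {
  let deck = [];
  return function draw() {
    if (deck.length === 0) deck = shuffle(items, rng);
    return deck.pop();
  };
}

const draw = makeShuffledDeck(['coast', 'shore', 'bracks']);
// Three draws always yield each word exactly once, so "bracks" shows up
// as often as "coast" no matter how many names are generated.
console.log([draw(), draw(), draw()]);
```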

RiTa is a more general toolkit that provides features like analyzing text, conjugation, stemming, Markov chains, and more.  The grammar format is similar to Tracery, but lacks a number of features (like capitalization) that Tracery provides.  However, it does provide choice weighting.  On the negative side, it's much bigger (about 10x) than Tracery, but it has some reduced versions and I suspect I can make do with the smallest.

For the moment, at least, I'm going to work with RiTa.

Both Tracery and RiTa do generation with context-free grammars.  If you aren't familiar with context-free grammars, they consist of rules that look something like this:
<coast> => coast | shore | banks
This rule means “Wherever you see the symbol <coast>, replace it with the word coast, or shore or banks.”  You can chain these rules:
<lost> => lost | forgotten | accursed
<name> => The <lost> <coast>
Together, these rules say that you generate a <name> by generating the word “The" followed by whatever the <lost> symbol generates and then whatever the <coast> symbol generates.

(The “context-free" part just means that you can only have one symbol on the left side of a rule.)

As you may remember, I can also name a lost coast after a monster, e.g., “The Zombie Coast".  With a grammar, I can now separate out the monster names:
<coast> => coast | shore | banks
<lost> => lost | forgotten | accursed
<monster> => Zombie | Kobold | Orc
<adj> => <lost> | <monster>
<name> => The <adj> <coast>
The <adj> symbol can now expand into one of the synonyms for lost, or a monster name.
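For readers who want to see the mechanics, here is a toy expander for rules written in this style. It is only a sketch of how a context-free grammar gets expanded, not how RiTa or Tracery are implemented, and the object layout and function name are assumptions.

```javascript
// Rules map a symbol to its alternatives, written as in the post with "|" separators.
const rules = {
  '<coast>': 'coast | shore | banks',
  '<lost>': 'lost | forgotten | accursed',
  '<monster>': 'Zombie | Kobold | Orc',
  '<adj>': '<lost> | <monster>',
  '<name>': 'The <adj> <coast>',
};

// Expand a symbol by choosing one alternative at random, then recursively
// expanding any symbols inside it. `rng` returns a number in [0, 1).
function expand(symbol, grammar, rng = Math.random) {
  const alternatives = grammar[symbol].split('|').map((s) => s.trim());
  const chosen = alternatives[Math.floor(rng() * alternatives.length)];
  return chosen.replace(/<[^<>]+>/g, (inner) => expand(inner, grammar, rng));
}

console.log(expand('<name>', rules)); // e.g. "The forgotten shore"
```

Note that this toy version does no capitalization and will throw if a rule mentions a symbol that has no definition.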

Whenever the grammar engine has to make a choice, it chooses randomly among the options.  So in this case, “coast", “shore" and “banks" are all equally likely names.  If we want “coast" to occur more frequently than the other choices we can (in RiTa) add a frequency to the choice:
<coast> => coast[5] | shore | banks
which says (in this case) to pick “coast" five times out of seven.
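The arithmetic behind "five times out of seven" can be checked with a short sketch. The parser below only handles the bracketed-weight notation shown above and is an illustration, not RiTa's own parser.

```javascript
// Parse "coast[5] | shore | banks" into [word, weight] pairs (default weight 1).
function parseWeighted(rule) {
  return rule.split('|').map((part) => {
    const match = part.trim().match(/^(.*?)\s*(?:\[(\d+)\])?$/);
    return [match[1], match[2] ? Number(match[2]) : 1];
  });
}

// Probability of each alternative is its weight divided by the total weight.
function probabilities(rule) {
  const pairs = parseWeighted(rule);
  const total = pairs.reduce((sum, [, w]) => sum + w, 0);
  return pairs.map(([word, w]) => [word, w / total]);
}

console.log(probabilities('coast[5] | shore | banks'));
// coast: 5/7, shore: 1/7, banks: 1/7
```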

That works well to intentionally adjust word frequencies, but random choice can cause a more subtle problem.  Consider, for example, this rule:
<adj> => <lost> | <monster>
50% of the time this rule will use a “lost" adjective, and 50% of the time it will use a monster name.  But in my case, I have 325 synonyms for lost, and only 33 monster names!  I really want to pick an adjective equally from that whole pool.  I can fix this by adding a frequency to this rule:
<adj> => <lost>[10] | <monster>
but this requires me to count the choices in each category, calculate the ratios, and keep all this up-to-date as I add new terms and options.  That's error-prone, and gets complicated when there are multiple levels of rules.

These sorts of rules represent composition rather than choice -- we'd really like to have some syntax like
<adj> => <lost> & <monster>
to indicate that the grammar engine should compose the two lists together before choosing.  Neither RiTa nor Tracery seems to have this capability.  Maybe I'll add that, but in the meantime I'll use a workaround of defining <lost> and <monster> in Javascript and combining them when I create the rule:
let lost = 'lost | forgotten | accursed';
let monster = 'Zombie | Kobold | Orc';
[...]
<adj> => lost + ' | ' + monster
Apologies for the pseudocode mish-mash, but I hope you understand what I mean.
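A runnable version of the same workaround might look like the sketch below. The helper name `composeChoices` and the use of arrays rather than strings are assumptions for illustration; the point is only that the lists are flattened into one alternation before the rule is handed to the grammar.

```javascript
const lost = ['lost', 'forgotten', 'accursed'];
const monster = ['Zombie', 'Kobold', 'Orc'];

// Join several word lists into a single flat alternation so that every
// word is equally likely, regardless of which list it came from.
function composeChoices(...lists) {
  return lists.flat().join(' | ');
}

const grammarRules = {
  '<coast>': 'coast | shore | banks',
  '<adj>': composeChoices(lost, monster),
  '<name>': 'The <adj> <coast>',
};

console.log(grammarRules['<adj>']);
// lost | forgotten | accursed | Zombie | Kobold | Orc
```

Because the lists are combined at the time the rule is built, adding a word to either list keeps the proportions right without any recounting.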

Another thing I need to do in name generation is to insert the result of a Javascript function call.  For example, I can name a lost coast after some (imaginary) person:
Mesh's Boneyard
In these cases, the name of the imaginary person is generated by Martin O'Leary's place name generator, with a call that looks like this:
Language.makeName(world.lang, 'person')
So if I want to be able to generate a name like this, I need a way to tell the grammar “Hey, at this point go off and execute this Javascript and use the result."  This is called a “callback", and in RiTa this works by enclosing it in backticks:
`Language.makeName(world.lang, 'person')`'s <coast>
There's a lot of quoting going on there, but the important part is that the call to Language.makeName() is inside backticks.  When the grammar engine evaluates this rule, it knows to pull out that bit of code, run it, and put the result back into the rule.

It turns out that this isn't as straightforward as it looks.  Without getting too technical, every bit of code executes in a context that represents all the other code and definitions around it.  In this case, code was written in one context but gets executed in a different context.  This creates no end of problems.  For example, in the rule above, “Language.makeName" isn't defined in the context where the code actually gets executed, and so the callback fails.
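The context problem can be reproduced in a few lines. The sketch below is not RiTa's mechanism or my patch; it uses `new Function` as a stand-in for "run this text as code somewhere else", and `nameGen` is a made-up substitute for the real name generator.

```javascript
// Evaluate a snippet of code with an explicit set of names in scope.
// Code run through `new Function` cannot see the caller's local variables,
// so anything the snippet needs has to be passed in via `context`.
function runCallback(code, context = {}) {
  const names = Object.keys(context);
  const values = Object.values(context);
  return new Function(...names, `return (${code});`)(...values);
}

// Replace each backtick-delimited segment of a rule with the result of running it.
function expandCallbacks(rule, context = {}) {
  return rule.replace(/`([^`]*)`/g, (_, code) => String(runCallback(code, context)));
}

function demo() {
  // A stand-in for the real name generator; local to this function.
  const nameGen = { makeName: (kind) => 'Mesh' };
  const rule = "`nameGen.makeName('person')`'s Boneyard";

  let failed = false;
  try {
    expandCallbacks(rule); // no context: nameGen is not visible to the snippet
  } catch (err) {
    failed = err instanceof ReferenceError;
  }
  return { failed, name: expandCallbacks(rule, { nameGen }) };
}

console.log(demo()); // { failed: true, name: "Mesh's Boneyard" }
```

Passing the needed objects in explicitly is one way around the problem; it keeps the callback text unchanged and makes the dependency visible at the call site.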

RiTa has a solution for this problem, but it isn't very good.  I patched my copy of RiTa with a better solution (and provided that back to the RiTa authors) so my callbacks work as I expect.

RiTa has a number of ways to actually write a grammar, but I'm using a JSON format.  Here's what the core of the Lost Coast naming rules looks like:

     // The Lost Coast
'<lc1>': 'The <lost> <coast>',
'<lc2>': 'The <coast> of <lost2>',
'<lc3>': "The <sailor>'s <negcoast>",
'<lc4>': "`Language.makeName(world.lang, 'person')`'s <negcoast>",
'<lc5>': "<noble> `Language.makeName(world.lang, 'person')`'s <negcoast>",
'<lc6>': "<admiral> `Language.makeName(world.lang, 'person')`'s <negcoast>",
'<lc>': '<lc1>[10] | <lc2>[5] | <lc3>[3] | <lc4>[3] | <lc5> | <lc6>',

<lc1> through <lc6> are the basic patterns for different Lost Coast names (as described in the previous posting).  In <lc4>, <lc5> and <lc6> you can see callbacks to Martin O'Leary's name generator.  The last rule sets the proportions to use the various forms, “The Lost Coast" being the most common and forms like “Admiral Dyg's Boneyard" being fairly uncommon.

There are a couple of functions I think I might eventually want to use that aren't in RiTa or Tracery.

One is the capability to remember and reuse a name across different runs of the grammar.  For example, if I create “Admiral Dyg's Boneyard,” I might want to name a nearby rocky point “Admiral Dyg's Folly.”  Or if I name a mountain “Black Rock Peak,” I might want to name a nearby city “Black Rock Town.”  I'm not exactly sure of the best way to do this yet.

Another is the capability to adjust the distribution of some terms on a per-map basis.  For example, I might have a distribution for the names of bays:
'<bays>': 'Bay [10] | Cove [5] | Basin [3] | Bight | Estuary'
Mostly I want to use “Bays" and “Coves" and rarely “Bight" or “Estuary".  This reflects the relative proportions of those names on today's maps.  But maybe on this map, bays are mostly called “Bights" and only occasionally other names.  Right now I don't think there's a way to write a “meta-rule" to figure out the distribution within another rule.
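One possible workaround, in the same spirit as building rules in Javascript before handing them to the grammar, is to generate the rule text from a table of weights and let each map override some of them. This is only a sketch of that idea; the function names and the override values are hypothetical.

```javascript
// Build a weighted rule string such as "Bay[10] | Cove[5] | Bight" from a
// table of weights. A weight of 1 is written without brackets.
function weightedRule(weights) {
  return Object.entries(weights)
    .map(([word, weight]) => (weight === 1 ? word : `${word}[${weight}]`))
    .join(' | ');
}

const defaultBays = { Bay: 10, Cove: 5, Basin: 3, Bight: 1, Estuary: 1 };

// Per-map overrides are merged over the defaults before the rule is built.
function baysRuleForMap(overrides = {}) {
  return weightedRule({ ...defaultBays, ...overrides });
}

console.log(baysRuleForMap());
// Bay[10] | Cove[5] | Basin[3] | Bight | Estuary
console.log(baysRuleForMap({ Bight: 20, Bay: 2 }));
// Bay[2] | Cove[5] | Basin[3] | Bight[20] | Estuary
```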

Next time I'll start to work on bay names.

Thursday, June 14, 2018

The Naming of Places (Part 3): The Lost Coast

I'm still not entirely sure about my approach to generating place names, but I'll start off by working on place names for “The Lost Coast."  The idea with the Lost Coast was to create a stretch of flat, empty coastline on the map and then give it an evocative name to create a little mystery.  So far I've just worked on creating and labeling a spot, but I've just been using “The Lost Coast" label for all of them as a placeholder.
I'd like to generate a name that indicates that this stretch of coast is empty for some mysterious or negative reason.  It's filled with poisonous vapors, perhaps, or frequented by pirates, or ... well, use your imagination.  Place names tend to be short and pithy -- names like “Pirate Island" and not like “The Small Island of Cimmerian Pirates and a Mountain With Mysterious Caves," so to start with I'll just try to create names in the form “The <Adjective> Coast," where the adjective phrase is just a single word, e.g., “The Lost Coast," “The Empty Coast", etc.  (And I'll elaborate further as I go.)

This is a little more focused than naming something like a mountain range, because in this case I know I want a name that conveys a specific meaning or mood, so the choice of names is much more constrained and manageable.

The first step is to collect appropriate adjectives.  To do this, I start with “lost," “empty" and some other appropriate adjectives that occur to me and then I use Thesaurus.com to find appropriate synonyms.  And I repeat that with all the synonyms I find, and so on.  Altogether, that takes about four hours.  I also add in a list of common fantasy monsters (e.g., “The Troll Coast").  I don't go looking for archaic terms because at this point I have almost 350 terms:

 forgotten bitter trackless cursed heartless burning damned bloodthirsty flaming doomed demonic smoldering accursed malevolent festering blasted monstrous putrefying infernal craggy rotting unlucky jagged bubbling dead fatal reeking unfortunate perilous stinking empty treacherous malodorous barren menacing fetid arid savage rancid deserted vicious mephitic desolate barbarian leaden abandoned primitive afflicted forsaken savage grieving godforsaken feral futile destitute uninhabited sterile forlorn lethal impassable lonely toxic merciless solitary pestilent unforgiving dreary grisly cutthroat lonesome terrible harsh ruined hideous acrid friendless uncharted hard dismal unmapped adamantine backward undiscovered callous gloomy unexplored ruthless miserable untraveled killing wicked unseen slaying stony secret diabolic desperate enigmatic nefarious tragic cryptic criminal wretched arcane outlaw hidden unknowable bandit bleak delphic raider somber sybilline buccaneer windy augural corsair black fatidic picaroon funeral vatical rogue melancholy incomprehensible blackguard mournful untold charlatan weary uncounted trickster ravaged masked freebooter ghastly cloaked highwayman murky clandestine privateer dark concealed robber doleful enshrouded fugitive dolorous tainted marauder hopeless infected berserker shadowy defiled brigand sepulchral spoiled pariah caliginous emaciated leper wailing harsh pirate abominable weather-beaten rover nefarious weatherworn satanic profane lightless sinister worthless unlit foreboding cruel frigid unlucky rocky frozen ominous dangerous icy horrible violent algific fiendish wild shivering teratoid deadly wintry bloody dire brumal blood-soaked grim hiemal bloodstained unknown torrid blood-spattered shrouded blistering suicidal veiled scorching malefic blighted sweltering louring gaunt searing rabid windswept austral ancient tenebrous blazing primeval stormy broiling primordial starless scalding baleful stygian steaming mephitic devastated 
recalescent abhorrent pillaged desecrated frightful plundered looted horrid shattered stolen false spoiled purloined deceptive wasted poacher's Basilisk ashen broken Centaur pallid blasted Chimera uncanny cracked Cockatrice foggy fractured Djinni misty crippled Dragon glowering splintered Dwarven impenetrable corrupted Elvish smoky gray Gargoyle nubilous cinereal Gnoll cimmerian drab Gnome sunless muddy Goblin dolent dusty Gorgon woebegone sallow Griffon impossible ashy Hippogriff deathly colorless Hobgoblin weeping faded Hydra howling eerie Kobold groaning unexplained Manticore growling unnamed Medusae shrieking nameless Minotaur keening mysterious Mummy bellowing occult Ogre whimpering mystical Orc crying orphic Skeleton screaming acroamatic Spectre barking unearthly Ent dreadful ethereal Troll godless haunted Vampire unhallowed tormented Wight pagan vaporous Wraith heathen impassable Wyvern unholy pathless Zombie

These aren't in any particular order, but they're all “reasons you wouldn't want to be here" in some sense or another.  Some are very specific, such as “deathly" or “blood-spattered" but others only hint at why the coast is uninhabited:  “eerie", “shrieking" and so on.  It was a lot of work to put this list together, and I can see that it's going to be a lot of work if I have to do this for every type of place name.  On the other hand, I'm not sure I have any better approach available.  For cases where I have a list of example place names (e.g., a list of river names in the US), I can take advantage of that to generate a list of candidate names.  But I don't have a good list of example place names for “Lonely Coasts" and I found in going through the synonyms that I had to apply a lot of judgement to pick and choose among the synonyms.  So maybe this is the best approach available.

It's also apparent to me that if these words were annotated with meaning, it would make it easier to re-use the lists.  For example, suppose I want to create a “Lonely Mountain" as in The Hobbit -- a single mountain in the middle of a plain -- and give it a place name.  The words in the Lost Coast list that have some semantic connection to the concept of “lonely" could be reused to name the mountain -- “The Abandoned Mountain," “The Forgotten Mountain," and so on.  An initial approach might just be to sort the words into categories of this sort.  But some of the words that have nothing to do with loneliness seem suitable as well -- “The Haunted Mountain," for example.  So perhaps it's just going to require a lot of work and a custom approach for each place name.

I also want some variation in the “Coast" part of the place name, so I go through a similar process to collect synonyms for coast.  There are far fewer of these -- really, in common usage the only synonyms are “coast" and “shore", although “strand" and “bank" are probably familiar to many people.  For some variety, I can throw in archaic forms and words for “cliffs" -- even though on my maps these areas are not usually cliffs.  That gets me this list:

 coast foreshore bracks shore tidewater bluffs shores seaside scarps strand shoreside glint bank seaboard ledra banks cliffs staithe slakes cleo brink outland cleeves rivage warth cloughs verge wash heughs

Some of these words are fairly obscure, but context on the map should make the meaning obvious.  Note that in this list, I have to manually decide whether using the singular, the plural or either makes sense.  For example, “shore" can be either (e.g., “The Lost Shore" or “The Lost Shores") but “coast" only works in the singular, and “bluffs" only works in the plural.

Now I can create a new label by selecting randomly from each list:
The Wraith Coast
The Violent Verge
The Grim Coast
The Bandit Bank
The Baleful Coast
The Delphic Bank
The Deceptive Slakes
The Primitive Warth
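A minimal sketch of this selection step is shown below, using short excerpts of the two lists. Recording the allowed singular and plural forms per coast word is one way to handle the manual decision mentioned earlier; the data layout and function names are assumptions for illustration.

```javascript
const adjectives = ['wraith', 'violent', 'grim', 'bandit', 'baleful'];

// Each coast word records which forms read well: "shore" works either way,
// "coast" only in the singular, and "bluffs" only in the plural.
const coastForms = [
  { forms: ['coast'] },
  { forms: ['shore', 'shores'] },
  { forms: ['bluffs'] },
  { forms: ['bank', 'banks'] },
];

// `rng` returns a number in [0, 1); it defaults to Math.random.
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

function capitalize(word) {
  return word.charAt(0).toUpperCase() + word.slice(1);
}

// Build "The <Adjective> <Coast>" by picking independently from each list.
function lostCoastName(rng = Math.random) {
  const adjective = pick(adjectives, rng);
  const coast = pick(pick(coastForms, rng).forms, rng);
  return `The ${capitalize(adjective)} ${capitalize(coast)}`;
}

console.log(lostCoastName()); // e.g. "The Grim Coast"
```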
If the modifier in the Lost Coast name is (or can be) a noun, then an alternate form of naming is to say “The Coast of <noun>."  For example, another way to say “The Wraith Coast" is “The Coast of Wraiths."  This variant adds some more variety to the naming, but notice that you cannot just swap the words around between the two forms.  For that reason, I've set up a second list of adjectives specifically for this form that (mostly) don't overlap with the first list of adjectives.
The Kobold Strand
The Dark Rivage
The Strand of Blasphemy
The Tidewater of Lost Hopes
The Seaside of Chimeras
The Bank of Monsters
The Brumal Strand
The Shore of Manticores
This tends to produce too many obscure terms for “coast".  I need to weight the “coast" terms so that the more common usages get picked more frequently than the obscure ones.  Because of the size and diversity of the adjectives list, it's not really a problem to pick among all those alternatives equally.  But the “coast" list is much smaller and some of the synonyms are quite obscure so I probably don't want to use them too often.  The weights may take some tweaking, but here's a weighted sample:
The Sweltering Coast
The Bank of Lost Hope
The Louring Shore
The Devastated Shore
The Pagan Coast
The Unholy Shores
The Coast of Griffons
The Cleeves of Kobolds
The Bluffs of Centaurs
The Coast of Orcs
The list is now much heavier on the common terms like “coast" and “shore."
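One simple way to do this kind of weighting is to give each term a number and pick in proportion to it.  The sketch below assumes that approach; the specific weights and the `weightedPick` helper are made up for illustration and are not the values actually used in Dragons Abound.

```javascript
// Hypothetical weights: a larger number means the term is picked more often.
const weightedCoastTerms = [
  ["Coast", 10],
  ["Shore", 8],
  ["Shores", 4],
  ["Bank", 2],
  ["Cleeves", 1],
];

// Pick a term with probability proportional to its weight.
// `rng` is any function returning a number in [0, 1).
function weightedPick(entries, rng = Math.random) {
  const total = entries.reduce((sum, [, weight]) => sum + weight, 0);
  let roll = rng() * total;
  for (const [term, weight] of entries) {
    if (roll < weight) return term;
    roll -= weight;
  }
  // Only reached through floating-point rounding; fall back to the last term.
  return entries[entries.length - 1][0];
}

console.log(`The Louring ${weightedPick(weightedCoastTerms)}`);
```

With these example weights, “Coast" comes up in 10 out of every 25 picks on average and “Cleeves" in only 1 out of 25, so tweaking the feel of the output is just a matter of adjusting the numbers.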

The Coast of Vampires, in ti Numru Bay, Zomle Zo

Normally, a generic geographical feature might be labeled with someone's name, like “Jason's Point" or “Frederick Mountain."  That's not something that's too commonly done with coastlines (in truth, coastlines are rarely named per se), but at any rate it's not something I can do here because a name like “Jason's Coast" doesn't convey any mystery or explanation about why the coast is deserted.  I'm relying upon the adjective portion of the place name to convey that information.

However, it occurs to me that perhaps I can use the noun portion of the place name to provide this meaning.  For example, I could call the coastline “Jason's Graveyard" and provide some of the same atmosphere as names like “The Forgotten Shores."  The noun I use has to have a negative connotation and designate a place, but it doesn't have to be explicitly linked to the coast, because the placement of the label should indicate to the reader that it is the coastline that is being labeled.  So something like “Graveyard" should work.  I could probably get a bit further afield with phrases like “Jason's Calamity" but that starts to sound too much like I'm labeling a specific event that happened in that location.

For the proper name part of this place name I use the existing fantasy name generator, to get names like “zum Nassir's Graveyard."  I can use the proper name by itself, but I can also add a title to the name (like “Commodore") to give it more of a naval feel, or I can use a noble's title (like “Baron") to imply that this area is famous and named because of a bad thing that happened to an important man here.
Obviously these names can get quite lengthy, so I'm going to have to go back and add some logic to shrink the size of the label when the name gets long.

But I don't have to use a proper name as the first part of this place name.  I can use a generic name, like “The Sailor's Graveyard."  (Note that I have to add “the" to the start of the place name when I'm using a generic noun.)  Here I'll want to use names related to sailing, such as “sailor," “mariner," “pirate," and so on to reinforce the image I'm trying to convey.
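The possessive forms described above might be assembled something like this.  This is only a sketch: the word lists are tiny placeholders, the proper name is passed in as a plain string standing in for the output of the fantasy name generator, and `possessiveName` is a hypothetical helper.

```javascript
// Hypothetical vocabulary for illustration only.
const titles = ["Commodore", "Commander", "Baron"];
const sailors = ["Sailor", "Mariner", "Pirate"];
const grimPlaces = ["Graveyard", "Necropolis"];

// `rng` is any function returning a number in [0, 1).
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// properName: a generated proper name, or null to use a generic sailing noun.
// useTitle: whether to put a naval or noble title in front of the proper name.
function possessiveName(properName, useTitle, rng = Math.random) {
  const place = pick(grimPlaces, rng);
  if (properName === null) {
    // A generic owner needs a leading "The".
    return `The ${pick(sailors, rng)}'s ${place}`;
  }
  const owner = useTitle ? `${pick(titles, rng)} ${properName}` : properName;
  return `${owner}'s ${place}`;
}

console.log(possessiveName("zum Nassir", false));
console.log(possessiveName("Lozchem", true));
console.log(possessiveName(null, false));
```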

Here's a representative sample of the possible names:
The Colorless Coast
Retgres's Litten
The Uncounted Coast
The Cutthroat Coast
The Monstrous Bank
Commander Lozchem's Necropolis
The Weatherworn Coast
The Cryptic Bank
The Shipwrake Shore
The Hobgoblin Cliffs
I think that makes a nice variety of names, but I'll continue to be on the lookout for new possibilities.  (And if you have any ideas, please suggest them in the comments!)

Friday, June 8, 2018

The Naming of Places (Part 2): Some Resources

I've been contemplating the problem of naming places for some time and have gathered some Internet resources that I will document here since they may prove useful for other developers.

Previously I wrote about exploring the names of U.S. mountains using data provided by the US Board on Geographic Names.  The Board is a federal agency founded in 1890 to keep track of the official names of geographic features within the United States, and it makes that data publicly available.  So that's a good source for finding out about how things are currently named (at least in the US).  I used that data to develop, for example, a list of the most common synonyms for mountains:
1. Hills: 1617
2. Mountains: 1239
3. Range: 479
4. Buttes: 432
5. Peaks: 319
Another resource I talked about in that posting was the Natural Language Toolkit.  This is a Python library that provides a number of handy natural-language processing functions, including classification by part of speech.  This is useful when you're gathering names so that you know whether something is a noun or an adjective or something else.

Procedural language generation is something a lot of people dabble with, so you can find lots of examples and code with a little Googling.  It also shows up on /r/proceduralgeneration occasionally. There are also lots of name generators intended for use with fantasy role-playing games, although the quality is hit-or-miss and you may have to dig into the web page source code to figure out how the names are being generated.

A common approach for name generation is to use a Markov chain.  The idea is to look at a large set of existing names (say, English town names) and determine how often syllables (or letters) follow other syllables (or letters).  For example, in “Birmingham," the syllable “Bir" starts the name, it is followed by “ming" and then “ham" ends the word.  You'd find in English town names that “ham" is a quite frequent ending.  Once you've gathered up all these statistics you can then use them to create new town names that follow the same statistical patterns:
1. Barkingham
2. Basingham
3. Birkenham
4. Bebingham
5. Bollingham
and so on.  Code is available to do this in many (computer) languages.
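To make the idea concrete, here is a minimal letter-based sketch (letters rather than syllables, since splitting names into syllables is harder).  The six-name corpus is a toy stand-in for a real list of town names, and `buildModel` and `generateName` are names invented for this example.

```javascript
// Tiny illustrative corpus; a real one would contain hundreds of names.
const corpus = ["Birmingham", "Nottingham", "Buckingham", "Barking", "Basingstoke", "Bolton"];

const ORDER = 2;   // number of preceding letters used as context
const START = "^"; // padding marker for the start of a name
const END = "$";   // marker for the end of a name

// Map each context (the previous ORDER letters) to the list of letters seen
// after it.  Repeats are kept so that frequent followers are picked more often.
function buildModel(names, order = ORDER) {
  const model = new Map();
  for (const name of names) {
    const padded = START.repeat(order) + name.toLowerCase() + END;
    for (let i = order; i < padded.length; i++) {
      const context = padded.slice(i - order, i);
      if (!model.has(context)) model.set(context, []);
      model.get(context).push(padded[i]);
    }
  }
  return model;
}

// Walk the table from the start context until the end marker is drawn
// (or the name reaches maxLength).  `rng` returns a number in [0, 1).
function generateName(model, order = ORDER, rng = Math.random, maxLength = 20) {
  let context = START.repeat(order);
  let name = "";
  while (name.length < maxLength) {
    const followers = model.get(context);
    const next = followers[Math.floor(rng() * followers.length)];
    if (next === END) break;
    name += next;
    context = (context + next).slice(-order);
  }
  return name.charAt(0).toUpperCase() + name.slice(1);
}

const model = buildModel(corpus);
for (let i = 0; i < 5; i++) console.log(generateName(model));
```

With a corpus this small the output will mostly reproduce or lightly remix the input names; the statistics only become interesting with a much larger list.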

Markov chains are somewhat old school; the new approach is to train a neural network.  And of course someone has done that.  (Actually, several people have done that.)  The results sound a lot like the names generated by the Markov chains, at least to me.

For these approaches, you need a corpus of existing names.  For English town names, you can find some lists online.  For a more comprehensive list, you can consult the Dictionary of British Place Names, although that might require you to have an account, and is not conveniently organized for name generation purposes.  Darius Kazemi has a GitHub project that collects corpora (lists of words) and it has a list of English town names.  If you want to do names that sound “Chinese" or “fantasy," you're going to have to find a bunch of examples to feed the Markov chain -- the more the better.

For me, these approaches aren't that useful, because I already have Martin O'Leary's code to invent names.  However, it's something to keep in mind if I ever want to generate names that specifically sound like other names, e.g., if I want to generate fantasy maps with names that sound like English towns.

One thing I will need to do is to find synonyms and similar words.  When I'm naming a mountain range, I'd like to know all my options for saying “mountain."  One good source for this is a thesaurus, of which there are several online.  Another good source is WordNet, which is a lexical database of words that has a great deal of knowledge about the semantics of the words.  WordNet can give you direct synonyms, related synonyms and even sister terms.  ConceptNet is similar to WordNet but perhaps a bit higher-level, with a graphical interface to explore concepts like mountain.

Those resources are good for finding modern synonyms, but for fantasy maps it's nice to be able to sprinkle in some archaic and medieval terms for flavor.  These types of synonyms are much harder to find.  One resource is the Oxford Historical Thesaurus.  This is a companion volume to the massive Oxford English Dictionary (the OED) which provides a historical timeline of synonyms.  Using this, you can determine, for example, that around 885, the word “barrow" meant mountain, even though today's dictionaries will tell you it means a grave mound.  The Oxford Historical Thesaurus online requires a login, but your library card may get you access (mine did).

It might be that I can get by with synonyms and some simple patterns, but if not, there are some procedural text generation packages available for JavaScript that offer additional capability.  Some that look promising are rant.js (which is a JavaScript port of the Rant library), RiTa.js and Tracery, which started as a 2014 Procedural Jam entry.  These libraries let you specify text as patterns (grammars) and then process the patterns to create text, as in this example (from the rant.js page):
var rant = require("rantjs");
var sentence=rant('<firstname male> likes to <verb-transitive> <noun.plural> with <pron poss male> pet <noun-animal> on <timenoun dayofweek plural>.');

console.log(sentence); // 'Sean likes to chop parrots with his pet cat on Saturdays.'
Note that with these sorts of tools you can specify various features of the placeholders and the engine will pick a word that fits those features, e.g., in the above example where the author uses “<firstname male>" to pick from only male names.  These typically also include some useful features like pluralizing words.  Each package has different capabilities, so the choice might depend upon which capabilities you need most.
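For a feel of what these engines do under the hood, here is a hand-rolled sketch of pattern expansion.  It does not use the API of rant.js, RiTa.js or Tracery; the `#symbol#` placeholder convention, the `grammar` object and the `expand` function are all assumptions made up for this example.

```javascript
// Hypothetical grammar: each symbol maps to a list of alternative expansions.
// A #symbol# inside an expansion is a placeholder to be expanded in turn.
const grammar = {
  name: ["The #modifier# #coast#", "The #coast# of #creature#"],
  modifier: ["Lost", "Grim", "Baleful"],
  coast: ["Coast", "Shore", "Strand"],
  creature: ["Wraiths", "Kobolds", "Orcs"],
};

// Choose one alternative for `symbol` and recursively expand its placeholders.
// `rng` is any function returning a number in [0, 1).
function expand(symbol, rules, rng = Math.random) {
  const options = rules[symbol];
  const choice = options[Math.floor(rng() * options.length)];
  return choice.replace(/#(\w+)#/g, (match, inner) => expand(inner, rules, rng));
}

console.log(expand("name", grammar));
```

The real packages add a great deal on top of this (pluralization, filtering placeholders by features, and so on), which is the main reason to use one instead of rolling your own.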

There are also a number of packages and tools for writing interactive fiction that combine generative text with user inputs or choices, such as Inkle's Ink.  I haven't focused much on these because I don't (at least at the moment) need to make my place name generation interactive.

Another good source for text generation resources is the set of resource threads for National Novel Generation Month (NaNoGenMo).  You can start in the thread for 2017, and it has pointers to the threads from previous years.  Most of the resources discussed there are probably overkill or not relevant for place name generation, but might be useful for folks doing more in-depth language generation.

Saturday, June 2, 2018

The Naming of Places (Part 1)

“Ged saw that in this dusty and fathomless matter of learning the true name of every place, thing, and being, the power he wanted lay like a jewel at the bottom of a dry well.  For magic consists in this, the true naming of a thing." -- A Wizard of Earthsea, Ursula K. Le Guin
There is no magic in this world -- or at least so far as I know -- but the naming of places on a map holds a kind of magic.  Take a featureless stretch of map and give it an evocative name:
And it magically becomes an interesting feature.

In this series of postings I'm going to be working on a naming system for Dragons Abound.  Right now, Dragons Abound uses the naming system that Martin O'Leary developed for the fantasy map generator that inspired Dragons Abound.  This generates place names in invented languages, and many of them certainly sound like something you might find in a fantasy novel -- “on Genum" on the map above being one example.  There's a nice logic to this -- certainly we wouldn't expect some random fantasy world to speak English, so we wouldn't expect their maps to have nice English names.  Using all invented names helps maintain the consistency of the map.

On the other hand, there are some drawbacks to that approach.  On the little map excerpt above, I've added the English word “Point" to the name “on Genum" and that helps the viewer (you!) understand that the name belongs to that mass of rocks sticking out of the coast.  I could certainly have generated a word in the invented language to mean Point, e.g., “Nam" and used that.  On a big enough map with enough points you might even figure out that “Nam" means “Point".  But that's making the viewer do a lot of work.  And even on modern maps we don't generally do this.  For example, US maps label the lake on the border of Switzerland and France “Lake Geneva" even though its name in the local language is “le Léman."

And because the invented names don't mean anything, their ability to evoke interest is limited.  If I translate “The Lost Coast" into an imaginary language -- “Em Usu Eno" for example -- it may sound more “fantasy" but it also doesn't bring to mind the picture of a deserted shoreline.  If I use an English name and pick it correctly, I can be more descriptive and atmospheric.

Another reason to use English names is strictly personal -- I'm an English speaker and grew up reading the great English fantasy classics and playing Dungeons & Dragons.  And while those books had their share of invented names, they also had a backbone of vaguely medieval-English names that still mean “fantasy" to me.  That said, there's still room for a layer of invented fantasy names to lend an air of mystery and history to the map.

One advantage of an invented naming language is that it only has to appear consistent; it doesn't have to actually make sense.  In an invented language, I can call one mountain range “The Na Ri Casto" and another one “The Lansa Casto" and you get the notion that “Casto" means something like mountains and “Na Ri" and “Lansa" are adjectives or descriptors or specific names.  But if I'm naming mountain ranges using English I can't just throw together a random adjective and random noun and call a mountain range “The Fallacious Arithmetic."  The words need to fit together and be an appropriate name for the map feature.  Even if I know enough to use “Mountains" as my noun, the sensible adjectives are not entirely obvious.  “The Hazy Mountains" seems reasonable, but “The Distinct Mountains" does not.  For some reason, “The Blue Cow Mountains" seems plausible, even though “Blue Cow" itself is nonsensical.  The need to create sensible names makes naming much harder in English than in some imaginary language.

My academic background is in artificial intelligence and specifically in natural language processing, so I'm well aware of how difficult it is to name things the “right way".  That would involve having some sort of model of the things to be named (mountains, rivers, islands, bays, points, etc.) as well as a deep understanding of how things get names (e.g., The Blue Cow Mountains named after a blue-gray rock formation that looks something like a cow when viewed from the pass to the Arid Plains).  Thinking about this, I am reminded of a stream near my home named “The Ten Mile Run."  It's named this way not because it is ten miles long, but because at one time it had ten mills along its length.  Years later someone copied a map and wrote “Ten Miles" instead of “Ten Mills" and the typo somehow became the official name of the stream.

There are many other complications in naming things.  Many things are named after people or other place names, such as “New York" or “Smith's Landing."  These are generally “proper names" -- names that refer to a specific thing but have no meaning beyond that.  Names get borrowed as well, as for example with the many Native American names in the US.  And although we treat these as proper names, they often meant something in their native tongue, e.g., “Massachusetts" meant something like “Near the Great Blue Hill".

My hope is that by carefully picking patterns and vocabulary I can get good names most of the time.  I'll also rely on people's apophenia to see meaning where none exists and come up with an explanation for names that don't make obvious sense -- the way I did for “The Blue Cow Mountains."  There's a lot of room for creativity in naming.  I'll indulge some of that, but my goal for this first attempt is to generate a wide variety of acceptable names.  I can always come back in the future and explore some of the more challenging ideas.