Wednesday, February 23, 2022

Map Compasses (Part 13): Lodestone

Welcome back to my blog series on implementing procedurally-generated map compasses!  In the last posting, I started implementation of Lodestone, a tool to execute procedural generation grammars.  So far I have designed the rule language and implemented a parser that outputs a dictionary of rules.  In this posting I'll develop the code to execute the rules and create procedural output.

First, let me quickly review how a procedural generation grammar works and outline the algorithm for executing the rules.  A grammar is a collection of rules which have non-terminals on their left-hand sides and an expansion on their right-hand sides:

<name> => <lost> <coast>
<coast> => coast | shore | banks
<lost> => lost | forgotten | accursed

So in this example, if we start from “<name>" the first rule would expand that into “<lost> <coast>" and the next two rules would expand <lost> and <coast> into a word choice, so we might end up with "Accursed Shore."

That explanation provides a general notion of the rule expansion algorithm, but let me describe it in more detailed terms before getting to the code.

The algorithm begins with the start non-terminal in a stack.  The (basic) algorithm then loops repeatedly over the following steps until the stack is empty:

  1. Pop the top element of the stack.
  2. If it's a non-terminal, find a rule that matches and push the right-hand side of the rule onto the stack.
  3. If it's something else, add it to the end of the output.
You can try this out in your head with the example above.  Note that this algorithm works from left to right, always expanding the left-most non-terminal and building the output from left to right as well.
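
To make that concrete, here's a toy version of the basic loop in Javascript (just a sketch for illustration -- not the actual Lodestone code, and with only a single right-hand side per rule):
function toyExpand(start, rules) {
    // Rules map a non-terminal to a single right-hand side (an array).
    let stack = [start];
    let output = '';
    while (stack.length > 0) {
	const current = stack.pop();
	if (current in rules) {
	    // Non-terminal: push the right-hand side (reversed, so the
	    // left-most element is on top of the stack and pops first).
	    stack = stack.concat(rules[current].slice().reverse());
	} else {
	    // Anything else goes onto the end of the output.
	    output = output.concat(current);
	};
    };
    return output;
};

toyExpand('<name>', {'<name>': ['<lost>', ' ', '<coast>'],
		     '<lost>': ['lost'], '<coast>': ['coast']});
// returns "lost coast"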

The way the Lodestone dictionary is constructed, there is always only one rule that will match a non-terminal.  But the rule can have multiple choices on the right-hand side, so we need a step to select the right-hand side choice.

  1. Pop the top element of the stack.
  2. If it's a non-terminal, find a rule that matches and: 
    • Randomly select one of the right-hand side choices for this rule
    • Push the selected choice onto the stack
  3. If it's something else, add it to the end of the output.
By the time the rules are being executed, there are only two other kinds of elements besides non-terminals:  strings and embedded Javascript.  For a string, the action is simply to add it to the end of the output.  For a Javascript element, we need to execute it:
  1. Pop the top element of the stack.
  2. If it's a non-terminal, find a rule that matches and: 
    • Randomly select one of the right-hand side choices for this rule
    • Push the selected choice onto the stack
  3. If it's embedded Javascript, then:
    • Recursively expand the element
    • Execute the expanded element in Javascript
    • Add the return value to the end of the output
  4. If it's a string, add it to the end of the output.
Note that an embedded Javascript element can itself look like the right-hand side of a rule, using non-terminals and choices and so on, so we need to recursively run it through the same algorithm to get it down to a string that can be executed.

There are a couple of other nuances.  First, one-time non-terminals should only be expanded once.  We'll do this before starting the algorithm, and then use that value when we encounter a one-time non-terminal:

Pre: Iterate through the dictionary, expand all one-time non-terminals and save their values. 
  1. Pop the top element of the stack.
  2. If it's a one-time non-terminal, add its value to the end of the output.
  3. If it's a non-terminal, find a rule that matches and: 
    • Randomly select one of the right-hand side choices for this rule
    • Push the selected choice onto the stack
  4. If it's embedded Javascript, then:
    • Recursively expand the element
    • Execute the expanded element in Javascript
    • Add the return value to the end of the output
  5. If it's a string, add it to the end of the output.
The last nuance has to do with selecting a right-hand side choice.  This is done according to the relative weights of the choices.  But weights (like embedded Javascript) can have values that look like the right-hand side of a rule.  So to evaluate weights, we'll have to recursively run them through the expansion algorithm just as we do for Javascript.

Pre: Iterate through the dictionary, expand all one-time non-terminals and save their values. 
  1. Pop the top element of the stack.
  2. If it's a one-time non-terminal, add its value to the end of the output.
  3. If it's a non-terminal, find a rule that matches and: 
    • Recursively expand all the choice weights
    • Randomly select one of the right-hand side choices for this rule
    • Push the selected choice onto the stack
  4. If it's embedded Javascript, then:
    • Recursively expand the element
    • Execute the expanded element in Javascript
    • Add the return value to the end of the output
  5. If it's a string, add it to the end of the output.
The code itself is somewhat lengthy, so I'll walk through it piece by piece.
function executeRules(start, rules, context, otValues=null) {
    // For convenience, if a single string gets passed in we'll convert it to
    // a non-terminal.  Otherwise the start should be an initial stack of elements.
    let stack = Array.isArray(start) ? [...start] : [{type: "nterm", value: start}];
    let output = "";
    otValues = otValues || {};
    // Expand any unexpanded one-time rules
    for (let nt in rules) {
	// Is this an otnt with no value?
	if (nt[1] == "$" && !(nt in otValues)) {
	    // Put a placeholder in otValues so that we don't try to
	    // recursively evaluate this otnt.  Real values for otnts
	    // will be strings, so any non-string number will do.
	    otValues[nt] = NaN;
	    // Now recursively evaluate nt; the NaN placeholder above keeps
	    // the recursive call from trying to expand this otnt again.
	    let val = executeRules([{type: "otnterm", value:nt}], rules, context, otValues);
	    otValues[nt] = val;
	};
    };
The function takes an initial stack (or starting element), the rules dictionary, a context (which is used to evaluate embedded Javascript) and optionally a dictionary of values of the one-time non-terminals.  The first step is to evaluate any unevaluated one-time non-terminals.  To do this, I iterate through the rules dictionary looking for rules with a one-time non-terminal on the left-hand side that doesn't already have a value in the otValues dictionary.  Since I'm going to call executeRules recursively to evaluate the right-hand side of this rule, I need to give the one-time non-terminal a value first, otherwise the first thing the recursive call will try to do is evaluate the same one-time non-terminal, and we'd recurse forever.  The value from the recursive call becomes the value of the one-time non-terminal.
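
For example, with the little test grammar at the end of this post (which defines <$pi>), otValues would look something like this once the pre-expansion loop has run:
{ '<$pi>': '3.141592653589793' }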

The next step is to start iterating through the stack.
    // Iterate the stack until it is empty or we hit a limit
    let count = 0;
    while (stack.length > 0 && count < 1000) {
	count++;
	const current = stack.pop();
	// If it's a one-time non-terminal that has been defined, add
	// its value to the end of the output.  Otherwise it will fall
	// through and get evaluated.
	if (current.type == "otnterm" && otValues[current.value] &&
	                                  !Number.isNaN(otValues[current.value])) {
	    output = output.concat(otValues[current.value]);
	    continue;
	};
I've implemented a “guard" on this iteration that gives up after 1000 iterations.  It's easy to accidentally write rules that will never terminate, so this provides a little safety if that occurs.

The top element in the stack is popped off and then dealt with based upon its type.  The first type to be handled is a one-time non-terminal.  Assuming this has a value, that value will be pulled from the otValues dictionary and added to the output.  If it doesn't have a value or has the NaN placeholder value, it falls through to get evaluated as if it was a regular non-terminal.  (This is how one-time non-terminals get defined in the recursive call from earlier.)
	// If it's a non-terminal, find a rule that matches, select a choice,
	// and push the choice on the stack.
	if (current.type == "nterm" || current.type == "otnterm") {
	    // Select a choice from the available rules
	    let choice = getChoice(current.value, rules, context, otValues);
	    if (choice) {
		// Push choice on the stack.
		stack = stack.concat(choice.slice().reverse());
	    } else {
		// No choice was found, so treat the nterm as a string
		output = output.concat(current.value);
	    };
	    continue;
	};
The next part deals with non-terminals (and unevaluated one-time non-terminals).  The getChoice function selects a right-hand side from among the choices for this non-terminal.  The right-hand side is then pushed onto the top of the stack.  Note that since stacks in Javascript actually push and pop from the end of the array, we need to reverse the right-hand side before adding it to the stack.  We also copy the array with .slice() so that any changes we make don't inadvertently change the rule this choice was taken from.  If for some reason we encounter a non-terminal that doesn't have a rule, we just treat it like a string and add it to the output.

Before I go on to the rest of the engine, let's look at how getChoice works:
function getChoice(key, rules, context, preValues) {
    // If there's no rule for this key or no rhs, give up.
    if (!rules[key] || ! rules[key].rhs) return null;
    return selectFromRHS(rules[key].rhs, rules, context, preValues);
};
getChoice itself is just a front-end to selectFromRHS that checks to see that a rule actually exists for this non-terminal.  selectFromRHS is more complex:
function selectFromRHS (rhs, rules, context, otValues) {
    // If there's only one choice, return that.
    if (rhs.length == 1) {
	return rhs[0].elements;
    };
    // Otherwise we need to iterate through the possible choices
    // and add up all the weights to decide which to pick.
    let weights = [];
    let totalWeight = 0;
    for (let choice of rhs) {
	let weight;
	// Simplification, weight might just be a string or a number
	if (typeof choice.weight == 'string') {
	    weight = parseFloat(executeRules(choice.weight, rules, context, otValues));
	    // If weight isn't a number
	    if (Number.isNaN(weight)) {
		console.log('Weight is NaN: '+weight);
		weight = 1;
	    };
	} else { ...
To start, for the common case where there's only one choice for the right-hand side of a rule, we return that choice.  Otherwise we iterate through all the choices, evaluating the weight for each choice and keeping track of the total as we go.  The weight might just be a string like “1" and in that case we can simply turn it into a number.  But the value of the weight might itself be the right-hand side of a rule:
	} else {
	    // Weight is itself a RHS, so we need to select from it and then
	    // expand the selection
	    const select = selectFromRHS(choice.weight, rules, context, otValues);
	    if (select) {
		const expanded = executeRules(select.slice().reverse(), rules, context, otValues);
		if (typeof expanded === 'string' || expanded instanceof String) {
		    // The string should parse to a number
		    weight = parseFloat(expanded);
		    // If weight isn't a number
		    if (Number.isNaN(weight)) {
			console.log('Weight evaluated to NaN: '+choice.weight);
			weight = 1;
		    };
		} else {
		    console.log('Expansion did not return a string for '+choice.weight);
		    weight = 1;
		};
	    } else {
		// selectFromRHS failed
		console.log('Unable to select from '+choice.weight);
		weight = 1;
	    };
	};
In that case we need to recursively select one of the choices, execute the choice, and then turn it into a number.  If any of those steps goes wrong, we just set the weight to 1 and continue on.

In any case, once we have a number for the weight of this choice, we save it and add it to the total:
	// Save weight so that we don't evaluate it again during selection.
	weights.push(weight);
	totalWeight += weight;
    };
Now that we have all the weights and the total of the weights, we can randomly select one of the choices based upon the weights and return that:
    // Now figure out our selection.
    let selection = Utils.rand(totalWeight);
    for (let i=0;i<weights.length;i++) {
	selection -= weights[i];
	if (selection <= 0) return rhs[i].elements;
    };
    // Something likely went wrong
    console.log('Fell out of selection loop in getChoice');
    return rhs[rhs.length-1].elements;
};
If for some reason that didn't work, we'll return the last choice.
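
As a concrete example (assuming Utils.rand(n) returns a uniform random value in [0, n)): with weights [1, 198] the total weight is 199, so a draw of, say, 42.7 becomes 41.7 after subtracting the first weight (still positive, so keep going) and -156.3 after subtracting the second, which selects the second choice.  Only draws at or below 1 select the first choice, so it comes up about one time in 199.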

Now back to the main execution loop.  We've covered how non-terminals are evaluated.  Now we'll cover embedded Javascript:
	// If it's embedded Javascript, then:
	//    Recursively expand the element
	//    Execute the expanded element in Javascript
	//    Add the return value to the end of the output
	if (current.type == "jscript") {
	    // The value of the jscript is a right-hand side, so make
	    // a choice from there.
	    let choice = selectFromRHS(current.value, rules, context, otValues);
	    // Copy the value before reversing it onto the stack.
	    // Reuse otValues so we continue to use the one-time values.
	    const expanded = executeRules(choice.slice().reverse(), rules, context, otValues);
	    if (typeof expanded === 'string' || expanded instanceof String) {
		// Execute in the provided context.
		const execVal = String(context(expanded));
		if (typeof execVal === 'string' || execVal instanceof String) {
		    output = output.concat(execVal);
		} else {
		    console.log('Unable to coerce context value to String for '+expanded);
		};
	    } else {
		console.log('Expansion did not return a string for '+current.value);
	    };
	    continue;
	};
The value of the embedded Javascript is a right-hand side, so this looks much like the handling above for a weight.  We select a choice from the right-hand side, expand it to a string, then we evaluate it using the provided context function.  Finally we force the result of the evaluation to be a string and append it to the output.
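
To make the context function concrete: in the test code below, context is just s => eval(s).  So if an element like `Math.PI/<numSides>` (borrowing the example from the last post) expands to the string "Math.PI/4", the engine ends up doing the equivalent of this:
const context = s => eval(s);
console.log(String(context('Math.PI/4')));   // prints "0.7853981633974483"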

The last possibility is a string element:
	// If it's a string, add it to the end of the output.
	if (current.type == "string") {
	    output = output.concat(current.value);
	    continue;
	};
In that case we simply add the string to the output.

If something falls through these cases (although nothing should!) then we'll simply treat it as a string and add it to the output.
	// Else complain but treat as a string
	console.log('Unknown element type "'+current.type+'" in executeRules.');
	output = output.concat(String(current.value));
    };
    if (count >= 1000) {
	console.log('executeRules hit the iteration limit.');
    };
    return output;
};
That's the end of the iteration loop.  When the stack has been emptied (or we've reached the 1000 iterations limit), we return the output.

Here's a simple rule set that tests all the capabilities:
<$pi> => `Math.PI`;
<start> => <first> [1] | <second> [`99+99`];
<first> => I am surprised this was selected!;
<second> => The value of pi is <$pi>;
Running this almost always produces (the weight on <second> evaluates to 198, versus the default weight of 1 on <first>):
The value of pi is 3.141592653589793
If you run this in the code, you'll have to look for this output in the Javascript console.  Speaking of which, the code to run this looks like this:
function test(svg) {
    // Create a Parser object from our grammar.
    const parser = new nearley.Parser(nearley.Grammar.fromCompiled(Lodestone));
    // Parse text
    parser.feed("<$pi> => `Math.PI`;\n\
                 <start> => <first> [1] | <second> [`99+99`];\n\
                 <first> => I am surprised this was selected!;\n\
                 <second> => The value of pi is <$pi>;");
    console.log('Parse is '+parser.results[0]);
    let rules = processRules(parser.results[0]);
    console.log('rules is '+rules);
    let result = executeRules('<start>', rules, s => eval(s));
    console.log('result is '+result);
};
It's rather irritating to have to escape all the newlines to enter a rule set, so next time I'll work on some code to read in the rules from a separate rules file.

Lodestone is now complete, and you can download the Part 13 branch from the Procedural Map Compasses repository on Github to get all of this code. Part 1 has instructions, but the short version (on Windows) is to double-click mongoose-free-6.9.exe to start up a web server in the Part 13 directory, and then select the test.html link.  Obviously you can browse the code and other files, and make changes of your own.  You can also try out this code on the web at https://dragonsabound-part13.netlify.app/test.html although you'll only be able to run the code, not modify it. 


Suggestions to Explore
  • It's not currently possible to use a single quote in a Lodestone rule; the parser will always expect that to be the start of a quoted string.  Likewise for a < sign.  How would you modify the Lodestone parser to allow those characters? 

Thursday, February 10, 2022

Map Compasses (Part 12): An Excursion into Tool Building

Welcome back to my blog series on implementing procedurally-generated map compasses! After implementing the Compass Design Language (CDL) that will be used to draw the compasses, and ruminating last time on procedural generation, I'm now ready to move to implementation of the procedural generation part of map compasses. The procedural generation's output will be descriptions of compasses in CDL, which I'll then pass to my CDL interpreter to draw:

For the generation of the CDL I will be using a generative grammar. Generative grammars look a lot like the grammar I wrote to parse CDL, but run “backwards" so that they generate output rather than parse input. You can think of generative grammars as replacement or expansion engines.  Typically you start with a single token that represents what you want to generate, and the grammar expands that into the final product.  For example, suppose you start with an input of <name> and have this set of rules:

<name> => <lost> <coast>
<coast> => coast | shore | banks
<lost> => lost | forgotten | accursed

The grammar tries to find an expansion for “<name>" and finding the first rule, replaces <name> with “<lost> <coast>." The process then repeats. The vertical pipe symbol | in rules indicates random choices. So when the grammar tries to expand <lost> using the third rule, it chooses randomly from the three choices and (let's say) replaces <lost> with “forgotten" and similarly replaces <coast> with “banks" and at that point no further expansions are possible so the result is “forgotten banks."

Dragons Abound uses generative grammars in several places, such as generating place names as suggested above. I've written quite a bit about generative grammars previously, initially when I wrote about generating place names and then in more detail later when I wrote about generating map borders. If you're not familiar with generative grammars those posts will give you a more thorough introduction.

In Dragons Abound I've been using a tool called RiTa for generative grammars. I selected RiTa because it had several features I needed and I found it easy to add some that were missing. Since I've started using RiTa, version 2.0 has been released with some major new features. (Including some that I added to my version.) It's probably much better, but I have a considerable investment in my modified version, so I'm going to use that for map compasses.

UPDATE:  Or at least that was my intention. I've actually already written the next four blog posts, in which I develop 50 rules to generate two-layer compasses like these:

But as I went along I found that I needed to add some additional features to my version of RiTa. First I wrote a parser that would take rules in a convenient text form (like the examples above) and turn them into RiTa rules. Then I added some capabilities that required rewriting a good chunk of the RiTa rules engine. But these fixes feel like they're “bolted on" and are just trying to compensate for a tool that isn't really fit for my purpose. I began to wonder if I wouldn't be better off re-writing the RiTa engine from scratch. I could incorporate my text-form parser and the modifications I've made as well as some additional features to make life a bit easier. In the end it seemed like a fun and intriguing side-project so in keeping with my philosophy, I'm going to make a bit of a right-turn here and write a procedural generation tool. Hopefully you'll find it fun and interesting as well.

Oh, and I need a name for the tool. Since I'm using it for procedural generation of map compasses, I'll call it Lodestone.

(Incidentally, this whole “I'm going to build my own tool because the available tools aren't exactly right for me" is a nasty trap that has killed more than one project.  If this were anything but a Forever Project I would not be doing this!  But it is and so I am.  And you'll end up with the code to a tool you can repurpose, so maybe it's a good thing from your viewpoint!  But as they say on TV, don't try this at home.)

The first step of this sub-project will be to define the format of the procedural generation rules and build a parser for them. A basic rule looks like this:
<start> => <first> <second> <third>;
It has a single non-terminal enclosed in angle brackets on the left-hand side of the rule, an arrow operator =>, and then a sequence of non-terminals (or other elements) on the right-hand side of the rule, terminated by a semi-colon.

There are a couple of differences here from RiTa and other tools. First, I'll require non-terminals to be enclosed in angle brackets. RiTa allows any sort of non-terminal, but delineating them this way makes parsing easier and (in my opinion) the rules easier to read. At any rate, I always write non-terminals this way so I'll require that in my tool. Second, I'll terminate a rule explicitly with a semi-colon. Rules tend to get long and complicated, and this will make it easy for me to split rules across lines and format them for easier reading.

A special kind of non-terminal has a name that starts with a $. This indicates a one-time non-terminal:
<$otr> => <fourth> <fifth>;
Normally a non-terminal is evaluated every time it is encountered, and if it can take on multiple values, they'll be randomly chosen each time. But a one-time non-terminal is only evaluated once, and then has that same value whenever it is used.

The right-hand side of the rule is a sequence of elements. In addition to non-terminals, the elements can include strings:
<first> => <second> and a string;
Note that the string is not quoted. Lodestone will generally do what you expect in picking out the string, but you can also use a quoted string if necessary (e.g., you want the string to include something that Lodestone might interpret as a non-terminal):
<first> => <second> 'and a quoted <string>';
Note the use of single quotes. Double quotes can be used without any special meaning.

An element can also be a Javascript expression enclosed in backticks:
<second> => `Math.PI/2`;
The Javascript expression is evaluated and the result goes into the output. Importantly, the Javascript expression can contain other Lodestone elements like non-terminals:
<second> => `Math.PI/<numSides>`;
And this does what you'd expect -- divides PI by the value of the non-terminal <numSides>. You can also set the values of one-time non-terminals in Javascript:
<second> => `<$angle> = Math.PI/<numSides>`;
You cannot do this with a regular non-terminal; since they're evaluated every time they are referenced, giving them a value won't do anything.

If you want to evaluate multiple statements in the embedded Javascript you can do that by separating the statements with a semi-colon or the comma operator:
<second> => `<$divisor> = <numSides>/2; <$angle> = Math.PI/<$divisor>`;
The value of the embedded Javascript will be the value of the last statement.
The right-hand side of a rule can also have multiple choices separated by the vertical bar character:
<fourth> => 1 | 2 | 3 | 4;
In this case, the value of the rule is one of the choices selected at random. Every time <fourth> is used, this rule could pick a different number. But if the left-hand side were a one-time non-terminal like this:
<$otr> => 1 | 2 | 3 | 4;
Then a random choice would be selected only the first time, and <$otr> would have that value every time it was referenced. (Although as noted above, you can change this value in Javascript.)

Normally when a rule has multiple choices, they're all equally likely. To change that, a weight can be provided in square brackets, like this:
<fifth> => cat [10] | dog;
Any choice that doesn't have a weight is given the default weight of 1. So in this case, the “cat" choice will be selected ten times as often as the “dog" choice. As with embedded Javascript, the value of the weight can be any Lodestone expression, so you could have these sorts of weights:
<fifth> => cat [<catweight>] | dog [`<catweight>/2`];
Lastly, a comment is any line that starts with a #. The entire line will be ignored.
# Ignored completely
A comment does not have to end with a semi-colon.

One last note: On the right-hand side of a Lodestone rule, whitespace is significant. For example:
<start> => <first> <second>;
<first> => cat;
<second> => dog;
These rules produce the string “cat dog" because the space between <first> and <second> is significant. On the other hand, these rules:
<start> => <first><second>;
<first> => cat;
<second> => dog;
Produce the string “catdog".

Whitespace is significant, but all whitespace outside of a quoted string gets reduced down to a single space character. So you could write the first rules as:
<start> => <first>
 <second>;
<first> => cat;
<second> => dog;
And the output would still be “cat dog". To get more whitespace, use a quoted string. But be careful! This rule with three spaces in the quoted string:
<start> => <first> '   ' <second>;
will actually produce five spaces between cat and dog, because there is whitespace on either side of the quoted string. This rule:
<start> => <first> ' ' <second>;
Would result in the expected three spaces between cat and dog.
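
To pull all of this together, here's a small example grammar of my own (not one from Dragons Abound) that uses most of these features -- a comment, a weighted choice, a one-time non-terminal (used twice, so it expands the same way both times), embedded Javascript, and a quoted string to preserve extra spaces:
# The pet is chosen once; cat is ten times as likely as dog
<$pet> => cat [10] | dog;
<start> => My <$pet> is `1 + 2` years old, and my <$pet> 'loves   naps';
Starting from <start>, this would produce something like “My cat is 3 years old, and my cat loves   naps" (with “dog" in place of “cat" about one time in eleven).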

And that's it for the definition of the rule language! So now I'll work on writing a parser.

Just as I did for CDL, I'll use Nearley for this parser. I'll also use a lexer. A lexer breaks a stream of characters into a stream of tokens, so that inside Nearley we won't have to do so much character-by-character processing. With careful definition of tokens, using a lexer can make a parser much easier to write and less likely to be ambiguous. (And in fact, I've defined the language with this in mind.) The Nearley guys recommend Moo, which looks straightforward, so I'll use that.

Much of the work of parsing will actually be done in the lexer. Moo uses regular expressions to define tokens, so the main work here is to craft regular expressions for all of the Lodestone tokens. I'll start off with a simple example, the token for the arrow operator:
 //
 // The arrow that separates lhs and rhs
 //
 arrow: /=>/,
Moo tokens are defined in a Javascript object. The key name is the type of the token, and the value is a regular expression. I'm not going to attempt to explain Javascript regular expressions here, but they're defined inside slashes, and in this case the regular expression just matches the “=>" characters.

A slightly more complex token is the end-of-rule token:
 //
 // End of rule
 //
 eor: /;\s*\r\n/,
There are two things to note here. First, the “\s" is a regular expression shorthand for “any whitespace character." Second, the asterisk after anything means “match zero or more" of the preceding thing. \r and \n match a carriage return and a newline, which is how the end of a line is defined on Windows. So in this case, the regular expression matches a semi-colon followed by any amount of whitespace followed by the end of the line.

Here's the regular expression for a one-time non-terminal:
 //
 // One-time non-terminal in the form <$test>
 //
 otnterm: /<\$[^>]+>/,
There are a couple of new things here. First of all, I have to put a backslash before the $ in this pattern because the $ has a special meaning in regular expressions (it means the end of a line) and so if I want to match an actual $ character, I have to escape it with a backslash to let Javascript know that I don't mean the end of line. After that is a pattern in square brackets. The square brackets are called a character class, and match any character inside the brackets. But the caret (^) at the beginning of the list of characters negates the list. So [^>] means “any character except a right angle bracket." The plus immediately following this is like the asterisk but means “match one or more times." And finally, there is a right angle bracket. So in total, this pattern says “match a string that starts with a left angle bracket, followed by a dollar sign, followed by any number of non-right angle brackets, and ends with a right angle bracket."
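
A quick way to convince yourself the pattern does what I claimed (this is just an illustration, not part of the lexer):
const otnterm = /<\$[^>]+>/;
console.log(otnterm.test('<$numSides>'));   // true
console.log(otnterm.test('<numSides>'));    // false -- no dollar sign
console.log(otnterm.test('<$>'));           // false -- nothing between the $ and the >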

One more example that introduces a useful Moo feature:
 //
 // Embedded Javascript enclosed in backticks
 //
 jscript: {match:/`[^`]+`/, value: s => s.slice(1, -1)},
Instead of just a regular expression, the value of a token type can also be an object with various parts. The “match" key is the regular expression. (This one matches everything between two backticks.) The “value" key is a function that will process the matched text to produce a value. In this case, the value function slices off the first and last characters of the matched text. (This removes the backticks.) 

Here's the complete code for the lexer:
//
// Order matters in the lexer.  Earlier
// rules take precedence.
// 
const lexer = moo.compile({
  //
  // Comment is an entire line starting with #
  // 
  comment: {match: /^\s*\#.*\r\n/, lineBreaks: true},
  //
  // Or
  //
  or: /\s+\|\s+/,
  // 
  // White space, including newlines
  // 
  ws: { match: /\s+/, lineBreaks: true, value: s => ' '},
  // 
  // One-time non-terminal reference in the form <@test>
  //
  otntermref: /<\@[^>]+>/,
  // 
  // One-time non-terminal in the form <$test>
  //
  otnterm: /<\$[^>]+>/,
  // 
  // Non-terminal in the form <test>
  //
  nterm: /<[^\$][^>]*>/,
  // 
  // Embedded Javascript enclosed in backticks
  //
  jscript: {match:/`[^`]+`/, value: s => s.slice(1, -1)},
  //
  // A weight in the form [50]
  // 
  weight: {match:/\[[^\]]+\]/, value: s => s.slice(1, -1)},
  //
  // The arrow that separates lhs and rhs
  //
  arrow: /=>/,
  //
  // End of rule
  //
  eor: {match:/;\s*\r\n/, lineBreaks: true},
  //
  // A quoted string
  //
  qstring: {match: /'[^']*?'/, value: s => s.slice(1, -1), type: s => "string"},
  //
  // A string of characters
  //
  string: /[^`\'\<\[\]; \t\n\r]+/,
});
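Before moving on to the parser, here's a quick (hypothetical) smoke test of the lexer -- give it one rule and print the tokens it produces:
// Moo lexers are iterable once they've been given input with reset().
lexer.reset("<fifth> => cat [10] | dog;\r\n");
for (const token of lexer) {
    console.log(token.type, JSON.stringify(token.value));
}
// Prints something like:
//   nterm "<fifth>", ws " ", arrow "=>", ws " ", string "cat", ws " ",
//   weight "10", or " | ", string "dog", eor ";\r\n"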
The easiest way to use the lexer in parsing is to match against the token type. This is done by using a % and the token name.  For example, in a Nearley parser rule, “%comment" would match a token with the comment type. Here's a simplified version of the Lodestone parser:
rule -> ws:? (%nterm | %otnterm) ws:? %arrow ws:? rhs ws:? %eor
rhs -> element | element %ws:? rhs
element -> %nterm | %otnterm | %string | %qstring | %jscript | %weight | %or
In Nearley, the “:?" suffix means to match zero or one. So this grammar says that a rule is possibly some initial whitespace, followed by a non-terminal or one-time non-terminal, the arrow, and then the right-hand side of the rule followed by the end-of-rule. (With optional whitespace in various spots.) The right-hand side is then either a single element or an element followed by optional whitespace and a right-hand side. This is a typical recursive grammar rule to describe one or more elements possibly separated by whitespace. Finally, an element is one of the tokens defined in the lexer -- the non-terminals, strings, quoted strings, embedded Javascript, weights, or the vertical bar.

The actual grammar is slightly more complex because it includes some post-processing so that the output of the parser is similar to the output of the CDL parser -- an array of structures with an opcode and a value. Here's an example of a simple rule and the result of parsing the rule:
<$test> => <rule>;
{
 op: 'rule',
 lhs: {type: 'otnterm',value: '<$test>'},
 rhs: [{type: 'nterm', value: '<rule>'}]
}
The rule is turned into a Javascript structure with an opcode of 'rule' to indicate that it is a rule, and then structures or lists representing the left and right-hand sides.  The objects with type and value are coming straight from the lexer. The parse of a whole grammar is simply an array of these rule structures.

When the parse is complete there is still some additional processing required to get the rules ready for use in a rules engine.

First, any Javascript and weight elements need to be further parsed. I'll do Javascript first since that's the simpler case. After the initial parsing above, the value of a Javascript or weight element is a string:
{type: 'jscript', value: '5'}
That's okay if the value is just a number as above. But the value of a Javascript can be anything that could be a right-hand side, as in this example:
{type: 'jscript', value: '<$totalweight>/3'}
Here the value of the Javascript element contains a non-terminal. Since the value of a Javascript element can be anything that would be on the right-hand side of a rule, I'll use the same rules to parse the value, by calling the parser again.  This shows an example rule and the result of parsing the rule and then parsing the value of the Javascript element:
<$test> => <rule> `50`;
{
 type: 'rule'
 lhs: {type: 'otnterm', value: '<$test>'...}
 rhs:
 {type: 'nterm', value: '<rule>', text: '<rule>', offset: 32, toString: …}
 {type: 'string', value: ' '}
 {type: "jscript" value: {type: 'string', value: '50' ...}
} 
The relevant portion is the last line: the value of the jscript element has been parsed to a string element with the value 50. Here's what an initial version of the Javascript processing looks like:
function processJscript(rhs) {
    // For each element in the rhs of the rule
    for (let i=0;i<rhs.length;i++) {
	if (rhs[i].type == "jscript") {
	    // Each time we parse we need to create a new parser, otherwise
	    // it just continues on from where it ended.
	    // const parser = new nearley.Parser(Lodestone,'rhs');
	    const parser = new nearley.Parser(nearley.Grammar.fromCompiled(Lodestone),'rhs');
	    // Run the value through the initial parser
	    parser.feed(rhs[i].value);
            rhs[i].value = parser.results[0];
        };
    };
 return rhs;
};
As you can see, I'm feeding the value of the Javascript element into a parser with the 'rhs' starting point.

However, there's some legitimate Javascript that the parser will mistake for Lodestone.  For example, in Javascript [50] would be an array subscript, but the parser will mistake it for a rule weight.  There's a similar problem with the pipe symbol.  So after I've run the parser, I'll add a step to turn those back into strings:
	    // Fix or and weight tokens
	    for (let j=0;j<rhs[i].value.length;j++) {
		let token = rhs[i].value[j];
		if (token.type == 'or' || token.type == 'weight') {
		    token.type = 'string';
		    token.value = token.text;
		};
	    };
And now the value of a Javascript element will include things like “<foo>" and “<$foo>" which I want to be interpreted as Lodestone, but won't treat something like “[50]" as Lodestone.
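
So after processJscript (including the fix-up above), the element from the earlier example ends up looking roughly like this (shape only -- the real token objects also carry text and offset fields from the lexer):
// Before:  {type: 'jscript', value: '<$totalweight>/3'}
// After:
{type: 'jscript', value: [
    {type: 'otnterm', value: '<$totalweight>'},
    {type: 'string',  value: '/3'}
]}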

The processing of weight elements looks much the same, with one additional wrinkle.  Weight elements can contain embedded Javascript.  When that happens, I need to call processJscript to take care of the embedded Javascript.  Since processJscript is expecting to get a list of elements to process, I can simply give it the result of the parse:
	    // Run the value through the initial parser
	    parser.feed(rhs[i].value);
	    // Now process any Jscript in the RHS
	    rhs[i].value = processJscript(parser.results[0]);
(At this point, you may be wondering why I didn't take care of properly parsing embedded Javascript in the original lodestone.ne parser.  The answer is that (1) it would have made the parser much more complex and difficult to understand, and (2) creating a parser that allows Javascript within weights would have been tricky.  I thought it was easier to understand and more straightforward to handle it as I have shown above.  But see the Suggestions below if you want to tackle this!)

The next processing is a bit of bookkeeping. When two string elements are next to each other, they can be merged into a single element to simplify the rule.
function processStrings(rhs) {
    let newRHS = [];
    let currentString = null;
    for(let i=0;i<rhs.length;i++) {
	// Current element is not a string
	if (rhs[i].type != "string") {
	    // If we've been building up a string
	    // element, release it.
	    if (currentString != null) {
		newRHS.push(currentString);
		currentString = null;
	    };
	    // If this is a Jscript or weight it needs
	    // to be recursively processed.
	    if (rhs[i].type == "jscript" || rhs[i].type == "weight") {
		rhs[i].value = processStrings(rhs[i].value);
	    };
	    // And save this element
	    newRHS.push(rhs[i]);
	} else {
	    if (currentString == null) {
		// If not building a string, start one.
		currentString = rhs[i];
	    } else {
		// Add this string to the currentString.
		currentString.value =
		    currentString.value.concat(rhs[i].value);
	    };
	};
    };
    if (currentString != null) {
	newRHS.push(currentString);
    }
    return newRHS;
};
The basic idea here is to walk through the right-hand side elements, pushing them into newRHS. But when a string element is encountered, it is set aside (in currentString) without pushing into newRHS. If it is followed by other string elements, they're concatenated one by one onto currentString. When a non-string element is encountered (or when the end of the right-hand side is encountered) whatever has been gathered up in currentString is added to the newRHS. One thing to watch out for in implementing this pattern is the final if that releases currentString at the end of the right-hand side. It's very easy to forget that. The only other tricky bit is the recursion in the middle of the function. As with the previous processing, since Javascript and weight elements contain right-hand sides, we must recurse down into those values and process them as well.
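
Here's an example of what processStrings does, on some hypothetical input in the same shape the parser produces (<size> is just a made-up non-terminal):
processStrings([
    {type: 'string', value: 'cat'},
    {type: 'string', value: ' '},
    {type: 'string', value: 'dog'},
    {type: 'nterm',  value: '<size>'}
]);
// returns [{type: 'string', value: 'cat dog'}, {type: 'nterm', value: '<size>'}]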

There's one more bit of processing to do before building the rule dictionary. That has to do with the possible random choices on the right-hand side of a rule. A rule like this:
<start> => <a> | <b> | <c> [50] | <d>;
is executed by selecting one of the right-hand side choices at random, based upon the weights. One thing that may not be immediately apparent is that the above rule is the same as these rules:
<start> => <a> | <b>;
<start> => <c> [50];
<start> => <d>;
When the rule engine is executing, it will be useful to have all the options gathered up in one place with their associated weights. The first step of that is to break any rule that has multiple choices on the right-hand side into multiple right-hand sides.

Processing the choices is similar in some ways to processing the strings. We walk through the elements in a right-hand side, collecting them up until we hit a vertical bar (an “or" token) or the end of the rule. The elements we've gathered so far constitute one choice for this rule, so we save them in a simple choice structure that has the elements in the choice and the weight for the choice. The result when we're done is a list of choices [{weight: 1, elements: [...]}, {weight: 1, elements: [...]}]:
function processChoices(rhs) {
    let rhsList = [];
    let currentRHS = [];
    let currentWeight = '1';
    for(let i=0;i<rhs.length;i++) {
	// Starting a new choice
	if (rhs[i].type == "or") {
	    rhsList.push({weight: currentWeight, elements: currentRHS});
	    // Reset the currentRHS
	    currentRHS = [];
	    currentWeight = '1';
	    continue;
	};
	// Recursive processing of Javascript and weights
	if (rhs[i].type == "jscript" || rhs[i].type == "weight") {
	    rhs[i].value = processChoices(rhs[i].value);
	};
	// Saving the weight of the current choice
	if (rhs[i].type == "weight") {
	    // Simplification if weight is just a string
	    if (rhs[i].value.length == 1 &&
		rhs[i].value[0].elements.length == 1 &&
		rhs[i].value[0].elements[0].type == "string") {
		currentWeight = rhs[i].value[0].elements[0].value;
	    } else {
		currentWeight = rhs[i].value;
	    };
	    continue;
	};
	// Otherwise put the element in currentRHS
	currentRHS.push(rhs[i]);
    };
    // Pick up the last choice (if there is one)
    if (currentRHS.length > 0) {
	// Put the previous choice in the rhsList
	rhsList.push({weight: currentWeight, elements: currentRHS});
    };
    return rhsList;
};
There's a simplification that saves the weight as a simple string if that's possible. Otherwise, the weight is itself a right-hand side that has been processed into choices. (As is any Javascript element.)

The last step is to store all the rules into a dictionary and to coalesce any rules that have the same non-terminal on the left-hand side.
function processRules(rules) {
    const dict = {};
    // For each rule in the ruleset
    for (let rule of rules) {
	if (rule.type == "rule") {
	    rule.rhs = processWeights(rule.rhs);
	    rule.rhs = processJscript(rule.rhs);
	    rule.rhs = processStrings(rule.rhs);
	    rule.rhs = processChoices(rule.rhs);
	    // Now add it to the dictionary
	    const key = rule.lhs.value;
	    if (key in dict) {
		// Already here, so add the rhs to the existing rhs
		dict[key].rhs = dict[key].rhs.concat(rule.rhs);
	    } else {
		// New key; add to dict
		dict[key] = {rhs: rule.rhs};
	    };
	};
    };
    return dict;
};
We call the four processing steps on each right-hand side, and then index the result into the dictionary based on the left-hand side. (Note that comments never get processed here, so they just silently disappear.)
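
For a concrete picture, the rule <fourth> => 1 | 2 | 3 | 4; from earlier ends up in the dictionary looking roughly like this (shape only -- the real element objects also carry text and offset fields from the lexer):
{
  '<fourth>': {
    rhs: [
      {weight: '1', elements: [{type: 'string', value: '1'}]},
      {weight: '1', elements: [{type: 'string', value: '2'}]},
      {weight: '1', elements: [{type: 'string', value: '3'}]},
      {weight: '1', elements: [{type: 'string', value: '4'}]}
    ]
  }
}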

And that's it!  A bit of a whirlwind, but that's the complete definition of a generative grammar tool and an implementation of its parser in a single blog post.  If some parts remain mysterious, please grab the code and play with it until you understand what is going on.

Unfortunately there's not much you can do with Lodestone yet, but you can download the Part 12 branch from the Procedural Map Compasses repository on Github to access all of this code. Part 1 has instructions, but the short version (on Windows) is to double-click mongoose-free-6.9.exe to start up a web server in the Part 12 directory, and then select the test.html link.  Obviously you can browse the code and other files, and make changes of your own.  You can also try out this code on the web at https://dragonsabound-part12.netlify.app/test.html although you'll only be able to run the code, not modify it. 

Next time I'll write the engine that will execute the rules in the dictionary.

Suggestions to Explore
  • As suggested above, try writing a more complex grammar that parses the embedded Javascript and weight elements correctly so that no post-processing is required.

  • The simplest part of the rule engine will take non-terminals and expand them into other non-terminals based upon the rules. Create a simple rule set with all non-terminals:
    <start> => <a> <b>;
    <a> => <e>;
    <b> => <c>;
    <c> => <d>;
    
    Now create an interpreter that starts with “<start>" and repeatedly expands the non-terminal based on the rule set. When you can't expand the first non-terminal any more, “output" it and move on to expanding the next non-terminal. Expand the engine to include strings; these just get output as soon as they are encountered.