Friday, October 7, 2016

Is It Noisy In Here?

I'm not going to write a detailed explanation of noise and its use in procedural generation.  There's plenty of material on that topic for the interested.  I'll summarize by saying that for procedural generation, truly random numbers don't always work well.  Often it works better to use a random number generator that varies smoothly according to some input parameters, aka "noise".  This idea was famously popularized in 1985 by Ken Perlin when he invented "Perlin noise".  (He won an Academy Award for this work, so don't tell me that computer science isn't glamorous.)  Noise is often illustrated with this sort of "cloud" picture:

Here the dimensions (x, y) are the input parameters, and the noise output is a number from 0 to 1 shown as a shade from black to white.  You can see that the noise (shade) varies smoothly in interesting ways as you move across the photo.

What I will talk about here are a couple of pitfalls, issues and technical details about noise generation (specifically Perlin and Simplex noise) that I have not seen much mention of elsewhere.

The first pitfall is that some implementations of Perlin noise or Simplex noise on the Web are not correct.  For example, the first result in a Google search for "perlin noise javascript" is this page.  If you read through the code on that page you'll notice a curious thing:  there are no initialization parameters and no use of random numbers.  If you use that code, you'll generate exactly the same Perlin noise every time you use it!  That's not exactly an error, but it's probably not how you expect it to work.

The second result in the Google search has a more serious problem.  Take a look at the first twenty lines of the code and see if you see spot the problem:

// Ported from Stefan Gustavson's java implementation
// http://staffwww.itn.liu.se/~stegu/simplexnoise/simplexnoise.pdf
// Read Stefan's excellent paper for details on how this code works.
//
// Sean McCullough banksean@gmail.com

/**
 * You can pass in a random number generator object if you like.
 * It is assumed to have a random() method.
 */
var ClassicalNoise = function(r) { // Classic Perlin noise in 3D, for comparison 
   if (r == undefined) r = Math;
   this.grad3 = [[1,1,0],[-1,1,0],[1,-1,0],[-1,-1,0], 
                                 [1,0,1],[-1,0,1],[1,0,-1],[-1,0,-1], 
                                 [0,1,1],[0,-1,1],[0,1,-1],[0,-1,-1]]; 
   this.p = [];
   for (var i=0; i<256; i++) {
   this.p[i] = Math.floor(r.random()*256);
   }
   // To remove the need for index wrapping, double the permutation table length 
   this.perm = []; 
   for(var i=0; i<512; i++) {
  this.perm[i]=this.p[i & 255];
  }
};
The good news is that this code is using a random number generator to initialize the noise generator, so you won't get the same results every time you run it.  The bad news is that this code is filling a permutation table with random numbers.  This table is supposed to be shuffled numbers from 0 to 255.  Instead it's some random selection of numbers from 0 to 255.  Most of the time that will work pretty well.  Just hope you aren't the unlucky user who ends up with the table of 255 ones!

Assuming you find a correct implementation, there are also issues with the return value of Perlin and Simplex noise functions.  To start with, some implementations -- like the first one above -- return values from 0 to 1, while others -- like the second one above -- return values from -1 to 1.  So you need to be check that before using.  But a more serious issue with the return value is that it doesn't really range from -1 to 1.  Ken Perlin stated that his original implementation returned values from -1 to 1, but it actually returned values (for the two-dimensional case) from about -.70 to +.70.  This was (mostly) fixed in the design of Simplex noise, and some implementations of Perlin noise correct the range as well.  But if you're not certain, you should check the range of your noise function.

For advanced students, a further complication with return values arises when we consider using octaves.  The basic idea of octaves is to layer noise at different frequencies and different scales.  This can create a more natural result, and is how noise is most often used in procedural generation.  However, you need to be aware that this changes the range of the noise output.  The random noise tends to cancel out, so the range of the output narrows.  For example, if we have a noise source that has a range of -1 to 1 and we combine 6 layers with a persistence of 0.5, the resulting noise has a range of about -0.82 to 0.82.

(Even more advanced students will realize that I actually lied in that previous paragraph.  I'll get back to that in a moment.)

Unfortunately, range isn't the only issue with the return values of the noise functions.  It turns out that the output of these noise functions is not uniformly distributed.  If you make a histogram of the output of a Perlin noise function, it looks something like this (copied from here):


This shows that numbers near zero are much more likely than numbers near the edges of the range.  If you use a noise function to (say) create mountains, and your rule is that you have a mountain wherever the noise is greater than 0.5, you may be surprised to see that your mountains cover only about 5% of the map instead of the expected 25%!

(Now that we've seen the distribution of noise, let me return now to my earlier lie.  It's not exactly true that when you layer noise in octaves the output range is reduced.  Summing noise functions makes the output distribution taller and steeper.  On a practical level, this means that the range of the output numbers will narrow.  But it's still possible -- just very unlikely -- to get larger numbers.  Layering 6 octaves with a persistence of 0.5, the expected range is about -0.82 to 0.82, but theoretically you could get a number from approximately -2 to 2.  If a number outside of the expected range will break your code, you should check for that and clip anything outside that range.)

I've recommended testing your noise function to understand it's range, so let's talk now about how to do that.  Here's some Javascript code to generate a hundred million values of a one-dimensional noise function and check the output range by finding the highest and lowest returned values.  Can you see the problem with this code?

    var pn = new Noise();
    var minVal = 9999;
    var maxVal = -9999;
    for(var i=0; i < 100000000;i++) {
       var val = pn.noise(i);
       minVal = Math.min(minVal, val);
       maxVal = Math.max(maxVal, val);
    };
    console.log('Noise range was '+minVal+' to '+maxVal);


Don't feel bad if you don't.  There's no real error in this code -- the problem is in the noise generation code.  That starts off like this:

    Noise.prototype.noise = function(x) {
       var bx0 = Math.floor(x) % 255;

       [...]

As you can see, the whole number portion of the input "x" is taken modulo 255.  As a result, this noise function has a periodicity of 256 -- it starts repeating after 256 units in the "x" direction.   We think we're testing 100 million values, but we're actually only testing 256!  This periodicity is related to the 256 value permutation table we saw earlier.  In theory, a noise function can use a longer permutation table for a longer periodicity, but in practice you'll rarely see that implemented.

The consequence of this periodicity is that you need to scale your input parameters to the range [0, 255].  If you want to use noise to generate a landscape that is 10,000 units on a side, you need to divide your x and y coordinates by 40 before feeding them into the noise function.  (There's no periodicity with the fractional portion of the input parameters, so you can simply divide by the appropriate number.)

A less well-understood consequence concerns the octave function.  The typical implementation of octaves multiplies the input parameters by 2 every octave, so the noise being layered starts to repeat after the 8th octave (2^8 == 256).  In practice this is inconsequential because that many octaves of noise are rarely used.  But if for some reason you really need that many octaves, consider extending your noise generator to a greater periodicity.

In a future post I'll talk about ridged noise and at some point I'll get around to cleaning up the Javascript implementation I'm using and post it somewhere.

1 comment:

  1. This is super interesting! I didn't expect to see the Central Limit Theorem from Probability Theory in procedural generation, but it makes sense. Kepp up the great work!

    ReplyDelete