The second of three posts about the details of my NaNoGenMo 2016 project, Annales.
1. Vocabularies | 2. TextGen | 3. Events
I'm proud that Annales is the only representative of Haskell in 2016's entries, but there's a reason that it's the only one: knocking together quick text generators is really a job for languages like Python or JavaScript.
I chose Haskell because I'd already used it to write TextGen, a text generation library, to build @amightyhost's random armies. And also because I love coding in it. TextGen is a bit flaky and amateurish because it was my attempt to understand a couple of Haskell concepts by building my own versions: the State monad, and combinator libraries.
Haskell's purity means that random generation of anything requires a bit more explicit thought than in most languages, plus some concepts borrowed from category theory. A Haskell function always returns the same value for the same set of inputs, which at first glance would seem to make randomness impossible. But "random" numbers in any language are actually pseudo-random: they're generated by an algorithm whose outputs are sufficiently evenly distributed to seem random enough for most purposes. Haskell - if you're doing what I did and building certain things from scratch, rather than using existing libraries - requires you to explicitly pass a value representing the state of the pseudo-random algorithm between the functions which generate random values. The State monad provides an idiomatic way of doing the plumbing to pass the generator values around, and also let me write TextGen as a combinator library.
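To make the plumbing concrete, here is a minimal sketch of passing a generator around by hand. The Gen type, the toy linear congruential generator and the names next, rollDie and rollTwo are all assumptions made up for this illustration; they are not taken from TextGen or from any library.

```haskell
-- A toy linear congruential generator, standing in for a real PRNG.
-- Gen, next, rollDie and rollTwo are hypothetical names for illustration.
newtype Gen = Gen Int

-- Advance the generator, returning a pseudo-random number and the new state.
next :: Gen -> (Int, Gen)
next (Gen s) = (s' `div` 65536, Gen s')
  where s' = (1664525 * s + 1013904223) `mod` 4294967296

-- A die roll in the range 1..6, returning the updated generator as well.
rollDie :: Gen -> (Int, Gen)
rollDie g = (n `mod` 6 + 1, g')
  where (n, g') = next g

-- Two rolls: the generator returned by the first roll must be handed to
-- the second, or both rolls would come out the same.
rollTwo :: Gen -> ((Int, Int), Gen)
rollTwo g0 = ((a, b), g2)
  where
    (a, g1) = rollDie g0
    (b, g2) = rollDie g1
```

Every function that needs randomness has to accept a generator and hand back the updated one, which is exactly the bookkeeping the State monad hides.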
Combinators allow a really elegant style of programming which involves building functions from smaller, simpler functions. A TextGen is a function which takes a random number generator, calculates a random chunk of text, and returns the text and the updated generator (it could be a randomly-generated list of any kind of value: Annales uses a specific version called TextGenCh which works with text).
The most basic text generator is one which always returns the same string: these are built with the word combinator:
g = word "dog"
The combinator list makes a new generator which returns the output of several generators in sequence:
l = list [ word "dog", word "and", word "cat" ]
Neither of these does anything random: that's where choose and perhaps come in. choose returns a generator which randomly chooses one of a list of generators, and returns its output:
o = choose [ word "dog", word "cat", word "mouse" ]
perhaps will generate its argument sometimes, based on a probability: the following generator returns "the dog" 50% of the time, and "the brown dog" the other 50%.
s = list [ word "the", perhaps ( 1, 2 ) $ word "brown", word "dog" ]
The weighted combinator is a variant of choose which lets you weight the options:
e = weighted [ ( 60, common ), ( 30, strange ), ( 10, weird ) ]
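One common way to make a weighted choice is to roll a number below the sum of the weights and walk the list until the roll falls inside an option's share. The sketch below shows only that selection step, with the roll supplied by the caller; pickWeighted and rarity are hypothetical names, and this is not necessarily how TextGen's weighted is written.

```haskell
-- Pick from weighted options given a roll in the range 0 .. total - 1,
-- where total is the sum of the weights. Each weight is subtracted from
-- the roll until the roll falls inside an option's share.
-- pickWeighted is a hypothetical helper, not part of TextGen.
pickWeighted :: Int -> [(Int, a)] -> Maybe a
pickWeighted _ [] = Nothing
pickWeighted roll ((w, x) : rest)
  | roll < w  = Just x
  | otherwise = pickWeighted (roll - w) rest

-- The same weights as the example above, with strings standing in for
-- the three generators.
rarity :: Int -> Maybe String
rarity roll = pickWeighted roll [(60, "common"), (30, "strange"), (10, "weird")]
```

With these weights, rolls 0 to 59 give "common", 60 to 89 give "strange" and 90 to 99 give "weird"; a roll outside that range gives Nothing.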
There are a few other combinators to do syntactic twiddles, like aan, which prepends "a" or "an" to another generator, depending on whether its output starts with a consonant or a vowel.
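The article rule itself can be sketched as a pure function over the words a generator has already produced. withArticle is a hypothetical name, and the real aan wraps a generator rather than a list of words.

```haskell
import Data.Char (toLower)

-- Prepend "a" or "an" to a list of words, judging by the first letter of
-- the first word. The test is purely spelling-based, so it would get
-- "an hour" and "a unicorn" wrong.
-- withArticle is a hypothetical stand-in for the aan combinator.
withArticle :: [String] -> [String]
withArticle [] = []
withArticle ws@(w : _)
  | startsWithVowel w = "an" : ws
  | otherwise         = "a" : ws
  where
    startsWithVowel (c : _) = toLower c `elem` "aeiou"
    startsWithVowel []      = False
```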
There are also functions which load vocabulary files and return a TextGenCh which gives a random value from the list, which is how the nonsense words from Glossatory are integrated into the code.
All of these combinators return TextGenChs, so they can in turn be used to build up text generators of arbitrary complexity and depth.
(The technical answer to how this works is the one I've come to expect in Haskell: it's all done with lambdas. Each combinator takes its arguments and uses them to assemble and return a lambda which will, when called, run the appropriate arguments and thread the generator state between them and out the other end, using the generator, if required, to make any random choices on the way.)
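A stripped-down sketch of the idea, assuming a generator is a bare function rather than TextGen's actual type: the combinator names come from the examples above, but the bodies, the Gen type and the toy random number generator are guesses written for illustration.

```haskell
-- Toy linear congruential generator standing in for a real PRNG
-- (Gen and next are hypothetical, not TextGen's own).
newtype Gen = Gen Int

next :: Gen -> (Int, Gen)
next (Gen s) = (s' `div` 65536, Gen s')
  where s' = (1664525 * s + 1013904223) `mod` 4294967296

-- A text generator: a function from a generator state to some words
-- plus the updated state.
type TextGen = Gen -> ([String], Gen)

-- Always the same word; the state passes through untouched.
word :: String -> TextGen
word w = \g -> ([w], g)

-- Run each generator in turn, feeding the state from one into the next.
list :: [TextGen] -> TextGen
list gens = \g0 -> foldl step ([], g0) gens
  where
    step (acc, g) gen = let (ws, g') = gen g in (acc ++ ws, g')

-- Use one random number to pick a generator, then run it with the new state.
choose :: [TextGen] -> TextGen
choose []   = \g -> ([], g)
choose gens = \g ->
  let (n, g') = next g
  in (gens !! (n `mod` length gens)) g'

-- Run the generator with probability num/den (den must be positive),
-- otherwise produce no words.
perhaps :: (Int, Int) -> TextGen -> TextGen
perhaps (num, den) gen = \g ->
  let (n, g') = next g
  in if n `mod` den < num then gen g' else ([], g')
```

Nothing runs until the finished generator is applied to a starting state, for example fst (list [ word "the", perhaps ( 1, 2 ) $ word "brown", word "dog" ] (Gen 2016)).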
With a bit of a makeover and some extra features, TextGen could be turned into something quite elegant:
- Chuck out my home-grown state monad and replace it with StateT
- Integrate the vocabulary with the random-number generation state. (In Annales, the vocab files are grafted in via the variable which describes the state of the realm, which is pretty bad.)
- Write a format for describing generators and a parser for it
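As a rough idea of what the first item might look like, here is a sketch built on State from the transformers package (State s is StateT s over Identity). The Gen type and toy random number generator are hypothetical stand-ins, and the combinator bodies are illustrative rather than TextGen's own.

```haskell
import Control.Monad.Trans.State (State, evalState, state)

-- Toy linear congruential generator standing in for a real PRNG
-- (hypothetical, for illustration only).
newtype Gen = Gen Int

next :: Gen -> (Int, Gen)
next (Gen s) = (s' `div` 65536, Gen s')
  where s' = (1664525 * s + 1013904223) `mod` 4294967296

-- A text generator is now just a State action yielding words.
type TextGen = State Gen [String]

word :: String -> TextGen
word w = pure [w]

-- sequence threads the generator state through each action in turn.
list :: [TextGen] -> TextGen
list = fmap concat . sequence

choose :: [TextGen] -> TextGen
choose []   = pure []
choose gens = do
  n <- state next
  gens !! (n `mod` length gens)

-- Run a generator from an integer seed and join the words with spaces.
generate :: TextGen -> Int -> String
generate gen seed = unwords (evalState gen (Gen seed))
```

The state threading disappears into the monad, so each combinator shrinks to a line or two.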
It would be a lot more usable if instead of writing Haskell code, you could write something like this, and then feed it to a program which would build and run the combinator:
animal = "animals.txt"
adjective = "adjectives.txt"
"the" adjective "," adjective animal "jumped over the" adjective animal
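A first step towards a parser for a format like this could be a tokeniser for the template line, splitting it into quoted literals and references to named vocabularies. The Token type and tokens function are hypothetical, and the sketch deliberately ignores the definition lines such as animal = "animals.txt".

```haskell
import Data.Char (isSpace)

-- One piece of a template line: literal text, or a reference to a
-- named vocabulary. Token and tokens are hypothetical names.
data Token = Literal String | Ref String
  deriving (Eq, Show)

-- Split a template line into tokens. Quoted runs become literals and bare
-- words become references. There are no escape sequences, and an
-- unterminated quote runs to the end of the line.
tokens :: String -> [Token]
tokens [] = []
tokens ('"' : rest) = Literal lit : tokens (drop 1 after)
  where (lit, after) = break (== '"') rest
tokens s@(c : rest)
  | isSpace c = tokens rest
  | otherwise = Ref name : tokens after
  where (name, after) = break (\x -> isSpace x || x == '"') s
```

Each Literal would then map onto a word generator and each Ref onto a vocabulary-backed generator, with the whole line wrapped in list.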
I thought that most of the work on Annales was going to be the fun job of writing generators which output lots of elegant pseudo-Gibbon prose and euphemisms for deaths on the battlefield, but I ran out of time for that, so the actual text of Annales is much too repetitive. Almost all of the text generation is done in Annales.Description, if you're interested in seeing what it looks like.
What took up most of the time was the event loop which generates the underlying incidents to be described.