The first of three posts about the details of my NaNoGenMo 2016 project, Annales.

1. Vocabularies | 2. TextGen | 3. Events

Annales was originally going to be a long-form version of @GLOSSATORY, a Twitter bot I made earlier this year. Glossatory is driven by a recurrent neural net trained on around 80,000 definitions from the lexical database WordNet. A couple of months ago I was toying with using it to build a dictionary with some internal cross-referencing - the code pulled out one random definition, then generated more definitions for the words used in that definition, and so on until it reached the word count - but the results weren't very interesting.

(For more on Glossatory, see its home page. It was trained and run using Justin Johnson's Torch-RNN recurrent neural network. The Glossatory neural net itself can be downloaded here.)

When I had the idea of procedurally generating events in the history of a kingdom, I thought Glossatory would be a good source of names. In learning to imitate dictionary entries, Glossatory's 'terms' - the nonsense words being defined - have acquired a certain quality common to place names, personal names and unfamiliar terminology. Because of the use of Latin and Greek in scientific naming, they often reminded me of how the names of emperors sounded when I was a kid: alien but familiar, oddly displaced with regard to time, partaking of both the ancient and the science-fictional.

extract.py is a Python script which extracts terms from a list of Glossatory definitions and partitions them into vocabulary files according to which pattern each term matches. I chose each pattern because it seemed like a good way to pick out plausible names of one kind. English lets you make a lot of guesses about an unfamiliar word based on its last few letters, and that's what most of the patterns match on. Here are the patterns from an early draft:

  • gods: short words ending in 'sh', 'is', 'ne', 'ch', 'th', 'us', 'om', 'gh', 'or', 'rg', 'rh' or 'b'
  • religions: words ending in 'ism' or 'ity'
  • women: words ending in 'a'
  • men: words ending in 'us', 'on' or 'an'
  • tribes: words ending in 'es', 'i' or 'ae'
  • places: all other words
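
As a rough illustration of suffix-based partitioning, here is a minimal sketch. It is not the code from extract.py. The names `classify` and `partition`, the first-match-wins ordering and the six-letter cutoff for "short" are all assumptions made for the example. The ordering matters because 'us' appears in both the gods and the men patterns.

```python
from collections import defaultdict

# Hypothetical cutoff for "short words"; the real threshold is not stated.
SHORT_MAX = 6

GOD_ENDINGS = ("sh", "is", "ne", "ch", "th", "us", "om", "gh", "or", "rg", "rh", "b")

# Rules are tried in order and the first match wins (an assumption).
RULES = [
    ("gods", lambda w: len(w) <= SHORT_MAX and w.endswith(GOD_ENDINGS)),
    ("religions", lambda w: w.endswith(("ism", "ity"))),
    ("women", lambda w: w.endswith("a")),
    ("men", lambda w: w.endswith(("us", "on", "an"))),
    ("tribes", lambda w: w.endswith(("es", "i", "ae"))),
]


def classify(term):
    """Return the vocabulary name for a term, falling back to 'places'."""
    word = term.lower()
    for name, matches in RULES:
        if matches(word):
            return name
    return "places"


def partition(terms):
    """Group terms into lists keyed by vocabulary name."""
    vocab = defaultdict(list)
    for term in terms:
        vocab[classify(term)].append(term)
    return dict(vocab)
```

With these rules, `partition(["Citoga", "Blastian", "Insteridae", "Vantasmetel"])` files one term each under women, men, tribes and places. Each list could then be written out to its own file, such as women.txt.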

In the early versions, I only used Glossatory for proper nouns: most of the other vocabularies were cobbled together by hand, built with things like Old Disease Names, or extracted from the WordNet lexical database using the Python NLTK (Natural Language Toolkit), which is an absolute boon to anyone making generated texts. For another Twitter bot, @amightyhost, I'd written a script, hyponyms.py, which searches WordNet for all of the specific cases of a general concept, so if you ask it to search for 'animal' it returns 2,871 lines, starting:

post horse
Mexican hairless
dorbeetle
potter wasp
fall cankerworm
woodland caribou
Pekinese
lesser yellowlegs
white crappie
small white
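
For anyone curious how a search like this might look, here is a minimal sketch using NLTK's WordNet interface. It is not hyponyms.py itself. The function names, the choice of the first noun sense and the use of only the first lemma name per synset are assumptions. The traversal is kept separate from NLTK so it can be tried on any tree-like data.

```python
def all_hyponyms(root, children):
    """Collect every node reachable below root, visiting each one once."""
    seen = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        for child in children(node):
            if child not in seen:
                seen.add(child)
                order.append(child)
                stack.append(child)
    return order


def hyponym_names(word):
    """List one name for each synset WordNet files under a general concept."""
    # Imported here so the traversal above works without NLTK installed.
    # Needs nltk and its 'wordnet' corpus (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    # Assumption: use the first noun sense of the word as the root.
    root = wn.synsets(word, pos=wn.NOUN)[0]
    found = all_hyponyms(root, lambda synset: synset.hyponyms())
    # WordNet lemma names use underscores where English uses spaces.
    return [synset.lemma_names()[0].replace("_", " ") for synset in found]
```

Calling `hyponym_names('animal')` would walk down from the 'animal' synset through its hyponyms. The number and order of results depend on the WordNet version and on which names are kept, so they need not match the list above.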

I left this initial vocab alone for a while and concentrated on the events cycle, but as I played with that, I got sick of the original lists and started using Glossatory to make not just names but creatures, adjectives, abstract nouns, weapons, food and drink, and so on. This made the output more mysterious and got rid of a lot of the jokey anachronism of early drafts, in which rulers often "choked on a German Shepherd bone".

I didn't want to lose the English epithets in ruler names like "Streadina I the Groveling", though. For these, I used another WordNet script, which searches for all synsets (synonym sets) matching a particular part of speech. The script epithets.py dumps out about 30,000 adjectives:

able
unable
abaxial
dorsal
adaxial
ventral
acroscopic
basiscopic
abducent
abducting
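
A dump like this can be sketched with NLTK's `all_synsets`, which iterates over every synset for a given part of speech. This is a guess at the approach rather than the contents of epithets.py. The function names and the one-lemma-per-line output format are assumptions.

```python
def lemma_lines(synsets):
    """Yield every lemma name of every synset, in order, one per line."""
    for synset in synsets:
        for name in synset.lemma_names():
            # WordNet lemma names use underscores where English uses spaces.
            yield name.replace("_", " ")


def dump_adjectives(path):
    """Write WordNet's adjectives to path and return the number of lines."""
    # Imported here so lemma_lines can be used without NLTK installed.
    # Needs nltk and its 'wordnet' corpus (nltk.download('wordnet')).
    from nltk.corpus import wordnet as wn

    count = 0
    with open(path, "w", encoding="utf-8") as out:
        for line in lemma_lines(wn.all_synsets(pos=wn.ADJ)):
            out.write(line + "\n")
            count += 1
    return count
```

Because each synset contributes all of its lemmas, synonyms end up on adjacent lines, and a word that belongs to several synsets appears more than once unless the output is de-duplicated.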

A few of the original hand-crafted vocabulary files are still in the system, again because I didn't want to let go of some silly gags, even if they didn't end up in the generated output.

Here are a few excerpts from the vocabularies. The rest can be seen here.

men.txt

Kan
Gastion Carinon
Barolar Lawman
Blastian
Acrergan
Tentarian
Dolitation
Importion
Batterton
Scolice Station
Cliptrocian

women.txt

Malosm Strail
Citoga
Bittena
Artichous Spea
Thermotocta
Bospita
Scradica
Ganda
Axerella

tribes.txt

Insteridae
Rapiaceae
Seculatiae
Dotidae
Garama Lappi
Currinidae
Hair Thili
Barboridae
Convertidae

monsters.txt

chanching loorbug
aughinala
biant carteclonidae
vantasmetel
maight
nouled bootch
night-fig-fullon
roverant cele

diseases.txt

telepips
transferver
fiisycouth
paynessy
permethover
ulterspouth
titeway pronover
sedfiroinsy
logatosy
