Annales was originally going to be a long-form version of @GLOSSATORY, a Twitter bot I made earlier this year. Glossatory is driven by a recurrent neural net trained on around 80,000 definitions from the lexical database WordNet. A couple of months ago I was toying with using it to build a dictionary with some internal cross-referencing - the code pulled out one random definition, then generated more definitions for the words used in that definition, and so on until it reached the word count - but the results weren't very interesting.
When I had the idea of procedurally generating events in the history of a kingdom, I thought Glossatory would be a good source of names. In learning to imitate dictionary entries, Glossatory's 'terms' - the nonsense words being defined - have acquired a certain quality common to place names, personal names and unfamiliar terminology. Because of the use of Latin and Greek in scientific naming, they often reminded me of how the names of emperors sounded when I was a kid: alien but familiar, oddly displaced with regard to time, partaking of both the ancient and the science fictional.
extract.py is a Python script which extracts terms from a list of Glossatory definitions and partitions them into vocabulary files according to patterns that seemed likely to yield plausible names. English lets you make a lot of guesses about an unfamiliar word based on its last few letters, and that's what most of the patterns match on. Here are the patterns from an early draft:
- gods: short words ending in 'sh', 'is', 'ne', 'ch', 'th', 'us', 'om', 'gh', 'or', 'rg', 'rh' or 'b'
- religions: words ending in 'ism' or 'ity'
- women: words ending in 'a'
- men: words ending in 'us', 'on' or 'an'
- tribes: words ending in 'es', 'i' or 'ae'
- places: all other words
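The rules above can be sketched roughly as follows. This is a minimal illustration and not the actual extract.py: the names `classify`, `partition`, `RULES` and `MAX_GOD_LENGTH` are hypothetical, the length cutoff for "short" is a guess, and so is the order in which the rules are tried. The order matters because 'us' appears in both the gods and the men lists.

```python
# Hypothetical sketch of suffix-based partitioning; not the real extract.py.

# Assumption: the post only says "short words", so this cutoff is a guess.
MAX_GOD_LENGTH = 6

GOD_SUFFIXES = ("sh", "is", "ne", "ch", "th", "us", "om", "gh", "or", "rg", "rh", "b")

# Assumption: rules are tried in this order and the first match wins.
RULES = [
    ("religions", ("ism", "ity")),
    ("women", ("a",)),
    ("men", ("us", "on", "an")),
    ("tribes", ("es", "i", "ae")),
]


def classify(term):
    """Return the vocabulary name a term falls into, based on its ending."""
    word = term.lower()
    # Short words with a god-like ending are claimed first.
    if len(word) <= MAX_GOD_LENGTH and word.endswith(GOD_SUFFIXES):
        return "gods"
    for vocab, suffixes in RULES:
        # str.endswith accepts a tuple and matches any of its members.
        if word.endswith(suffixes):
            return vocab
    # Anything unmatched becomes a place name.
    return "places"


def partition(terms):
    """Group terms into a dict mapping vocabulary name to a list of terms."""
    vocabularies = {}
    for term in terms:
        vocabularies.setdefault(classify(term), []).append(term)
    return vocabularies
```

Calling `partition` on a list of terms gives one list per vocabulary, which could then be written out to files such as men.txt or tribes.txt.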
In the early versions, I only used Glossatory for proper nouns: most of the other vocabularies were cobbled together by hand, built with things like Old Disease Names, or extracted from the WordNet lexical database using the Python NLTK (Natural Language Toolkit), which is an absolute boon to anyone making generated texts. For another Twitter bot, @amightyhost, I'd written a script, hyponyms.py, which searches WordNet for all of the specific cases of a general concept, so if you ask it to search for 'animal' it returns 2,871 lines, starting:
post horse
Mexican hairless
dorbeetle
potter wasp
fall cankerworm
woodland caribou
Pekinese
lesser yellowlegs
white crappie
small white
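A search like this amounts to walking WordNet's hyponym links downward from a starting synset. The sketch below shows one way to do that with NLTK; it is not the original hyponyms.py, the function names `hyponym_names` and `main` are hypothetical, and it makes no attempt to reproduce the ordering or the line count quoted above.

```python
# Hypothetical sketch of a hyponym search; not the real hyponyms.py.


def hyponym_names(synset):
    """Collect lemma names for every synset below `synset`, at any depth."""
    names = []
    seen = set()
    stack = list(synset.hyponyms())
    while stack:
        current = stack.pop()
        # A synset can be reachable by more than one path; visit it once.
        if current in seen:
            continue
        seen.add(current)
        # WordNet stores multi-word lemmas with underscores ("post_horse").
        names.extend(name.replace("_", " ") for name in current.lemma_names())
        stack.extend(current.hyponyms())
    return names


def main(word="animal"):
    """Print every specific case of `word`, one per line."""
    # Imported here so the traversal above can be read without NLTK installed.
    # The WordNet data must be fetched once with nltk.download("wordnet").
    from nltk.corpus import wordnet as wn

    for synset in wn.synsets(word, pos=wn.NOUN):
        for name in hyponym_names(synset):
            print(name)
```

Calling `main("animal")` prints one name per line. A word with several noun senses is searched once per sense, so the same name can appear more than once.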
I left this initial vocab alone for a while and concentrated on the events cycle, but as I played with that, I got sick of the original lists and started using Glossatory to make not just names but creatures, adjectives, abstract nouns, weapons, food and drink, and so on. This made the output more mysterious and got rid of a lot of the jokey anachronism of early drafts, in which rulers often "choked on a German Shepherd bone".
I didn't want to lose the English ruler surnames like "Streadina I the Groveling", though. For these, I used another WordNet script, which searches for all synsets (synonym sets) matching a particular part of speech. The script epithets.py dumps out about 30,000 adjectives:
able
unable
abaxial
dorsal
adaxial
ventral
acroscopic
basiscopic
abducent
abducting
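For illustration, here is a rough sketch of listing adjectives by part of speech with NLTK and attaching one to a ruler's name. It is not the original epithets.py: `adjective_names`, `load_epithets` and `style_ruler` are hypothetical names, and dropping repeated words is an assumption rather than something the script is known to do.

```python
# Hypothetical sketch of an adjective dump; not the real epithets.py.


def adjective_names(synsets):
    """Return lemma names from an iterable of synsets, in first-seen order."""
    seen = set()
    names = []
    for synset in synsets:
        for name in synset.lemma_names():
            name = name.replace("_", " ")
            # Assumption: a word shared by several synsets is kept only once.
            if name not in seen:
                seen.add(name)
                names.append(name)
    return names


def load_epithets():
    """Read every adjective synset from WordNet via NLTK."""
    # Imported here so the helpers can be used without NLTK installed.
    # The WordNet data must be fetched once with nltk.download("wordnet").
    from nltk.corpus import wordnet as wn

    return adjective_names(wn.all_synsets(pos=wn.ADJ))


def style_ruler(name, numeral, epithet):
    """Format a ruler's name in the style 'Streadina I the Groveling'."""
    # Capitalise only the first letter so the rest of the epithet is untouched.
    return f"{name} {numeral} the {epithet[:1].upper()}{epithet[1:]}"
```

With the list from `load_epithets()` in hand, an epithet can be picked with `random.choice` and passed to `style_ruler`.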
A few of the original hand-crafted vocabulary files are still in the system, again because I didn't want to let go of some silly gags, even if they didn't end up in the generated output.
Here are a few excerpts from the vocabularies. The rest can be seen here.
men.txt: Kan Gastion Carinon Barolar Lawman Blastian Acrergan Tentarian Dolitation Importion Batterton Scolice Station Cliptrocian
women.txt: Malosm Strail Citoga Bittena Artichous Spea Thermotocta Bospita Scradica Ganda Axerella
tribes.txt: Insteridae Rapiaceae Seculatiae Dotidae Garama Lappi Currinidae Hair Thili Barboridae Convertidae
monsters.txt: chanching loorbug aughinala biant carteclonidae vantasmetel maight nouled bootch night-fig-fullon roverant cele
diseases.txt: telepips transferver fiisycouth paynessy permethover ulterspouth titeway pronover sedfiroinsy logatosy