Two kinds of constraint

A fun quality of Mastodon - a social communication platform I hang out on a bit too much - is that individual instantiations can fork its basic form and add odd twists which suit just that community. Oulipo.social notoriously bans that fifth Latin glyph which is most common in many linguistic forms, in a spirit of linguistic play.

On a whim, I thought: what fraction of my bot GLOSSATORY's output would satisfy oulipo.social's constraint? GLOSSATORY is a computational graph - an abstract analogy to how scholars posit tiny portions of our own brains function, a form of AI which is popular right now - which was taught using about 80,000 annotations from a scholarly linguistic corpus and which outputs a sort of absurd but grammatical dictionary.

Luckily, GLOSSATORY logs its output, so I had a big body of work from which I could try to winnow out all oulipo-compliant parts. I found that about thirty out of six thousand posts satisfy this constraint. It's amazing, in a way, that as many compliant posts as this pop up simply at random, with no constraints at all on GLOSSATORY's input data. Although it's a small ratio, it's constant.
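
As a minimal Python stab at that winnowing (a toy, not GLOSSATORY's actual program; all naming in it is my own assumption), in a script which also avoids that glyph:

```python
# Toy winnowing of a bot's log. All naming is an assumption, not GLOSSATORY's own.

# That fifth glyph, small and capital, built from its ASCII digits
# so that this script can avoid typing it.
low, cap = chr(101), chr(69)

# A tiny stand-in for a log; two of its four posts contain that glyph.
posts = [
    "SCALLANDRID: a small candy",
    "TR" + cap + cap + ": a tall plant with a trunk",
    "SAMURAISM: a form of social organization",
    "CAT: a small animal that purrs and hunts mic" + low,
]

# A post is compliant if no form of that glyph occurs in it.
ok = lambda post: low not in post and cap not in post

# Winnowing: hold on to compliant posts only.
good = [post for post in posts if ok(post)]

for post in good:
    print(post)
```

Running it prints only its two compliant posts. (A lambda stands in for a standard function, as Python's word for building a function has that glyph in it.)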

So by configuring GLOSSATORY's original bot to output a surplus of data, and automatically winnowing compliant parts into an oulipically-fit corpus, I had a supply of annotations without that fifth glyph, and could rig up GLOSSATORY's sibling to post this sort of thing:

SCALLANDRID: a small candy
SAMURAISM: a form of social organization
ROAD BOARD: a board that is part of a straight color
BOWLING: a starchy walking shot
STUFFING: a station with a short forward post
FILLING: a stairway to a moving fabric or foot
BODY OF POLYS: a small paint consisting of a long narrow coat of books
SPARK ANIMAL: an aircraft with a cigar
TYPHIKA: a family of orb; parasitic on or naturality
FIGHTING SHOT: a pitch that stands around a balloon
TONOGRAPH: a mark (usually with a window on strap)
POLYGYNOGRAPH: a social study of a proposition
CARAWAY STAR: a spar of sporting carrots

(I draw a bunch of GLOSSATORY's outputs daily for its Instagram and Mastodon illustration accounts, and in honour of its oulipo-launch, I'm just doing compliant posts for a bit.)

Its output, as is that of its non-compliant sibling, is fairly grammatical and without many spurious words. It's fascinating (ok, I find it fascinating: you may not) to watch how it transforms stylistically, just without a glyph. Oulipian GLOSSATORY is whimsical and succinct, not as abstract, lacking politics or nations, with its own flavour by comparison to its sibling.

On my posting about this bot to oulipo.social, Ojahnn had a galaxy-brain notion: to train a computational graph on a corpus of only compliant strings. Having no illicit glyphs in its input at all, such a graph would naturally output compliant posts, with a sort of total rigour from start to finish.

This is a good illustration of two rival forms of oulipian constraint. Firstly, winnowing compliant outputs from a body of ordinary words: many bots on oulipo.social in addition to GLOSSATORY follow this, providing compliant parts of famous works such as Dracula or Moby-Dick. A paradigm of surplus, in which oulipianism is a by-product of non-compliant but abundant corpora.

Or, I could build a word-calculator which was totally oulipian. A paradigm not of partition but filiation, not sorting linguistic ovids from caprids, but compliant ab ovo, or as if from an imaginary history and sociology without that glyph at all. (For it's not just about a glyph: that glyph has implications which join it to all words, animals, locations, humans and nations now and in history which you can't say without using it.)

By analogy, in composing oulipian blog posts and so on, I find my brain hunting through its inward dictionary, always looking for synonyms to supplant a non-compliant word for that which I am trying to impart - obviously, I'm working within a paradigm of winnowing. But a human who could do this in a purist way, without constantly back-tracking, but writing as if with a vocabulary containing only words without that glyph from birth? Such a woman or man I would acclaim as a prodigy, with a miraculous facility which I cannot match, an almighty Oulipian of Oulipians. But I'm distracting us from my main point.

I still didn't know if building a strict computational graph was a possibility. What could I train it on? I found that out of GLOSSATORY's original training data, which consists of 82,115 annotations, 671 did not contain that glyph:

DWARF: a plant or animal that is atypically small
THING: an action
GOING: advancing toward a goal
PRATFALL: a fall onto your buttocks
MISAPPROPRIATION: wrongful borrowing
SHY: a quick throw
GOLFING: playing golf
POP FLY: a short high fly ball
CLIP: a sharp slanting blow
SWAT: a sharp blow
JAB: a quick short straight punch
KISS: a light glancing touch
JUNCTION: an act of joining or adjoining things
SNAP ROLL: a fast roll
PROMOTION: act of raising in rank or position
UNIONIZATION: act of forming labor unions
BATH: you soak and wash your body in a bathtub
FAMILIAR: a spirit (usually in animal form) that acts as an assistant to a witch or wizard

Comparing ratios, GLOSSATORY's original output has about half as many compliant posts (a ratio of about 0.004) as its training data (671 out of 82,115, a ratio of about 0.008). (Is this not just random? Did a clan of oulipian linguists lurk among its anonymous authors? That gloss of "FAMILIAR" looks suspiciously apt...)
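
For anybody who wants to run that sum, a tiny Python tally, using only counts from this post (all naming in it is my own assumption):

```python
# Counts and ratio as said in this post; all naming is an assumption.
corpus_good, corpus_all = 671, 82115   # compliant annotations, all annotations
output_ratio = 0.004                   # compliant fraction of original output

corpus_ratio = corpus_good / corpus_all

print(round(corpus_ratio, 4))                  # 0.0082
print(round(output_ratio / corpus_ratio, 2))   # 0.49, so about half
print(round(1 / output_ratio))                 # 250 outputs, roughly, for 1 compliant post
```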

671 is kind of small for training data for this kind of computational graph, but I put a GPU to work on it anyway. I was playing with configurations controlling how fast it trains, hoping that it would pull as much structural information as it possibly could from such a small corpus. But if such a configuration is too high, optimisation is too fast and its outputs grow boring and random, akin to birdcalls or glossolalia:

(training at 0.4)
OCAISHILILITARK: chand of doun pal bor ing cor of a a iting in a of a rof of of a inal or conal
SALLILA: a of in an rcaron an brous
TISHO: a lha tich shion of of dinting whing of for osil an insior a than ing strof a ma archam
BINN: a can insit

(0.3)
JAAA: aaa aa w naaftn itaoain of
JAHBTILICD: norrt wrt i in wnrk o naniroaaliriaaoa w aolth nnkinin acin anrl an dn a in inri
A: raaoihn min anaraaa lahin ooahing rn no in aioaringing oirihoa nhioaaoiin wwtaiio aaaihia o wlatacoaintin tain ihi ahon fwoanicrhov
R: aaas ara oooau bv lan r raaaoaoaahn an uaor

(0.2)
LALISUPOORON: a sof in a strack
CON: a mormoll card
SHITION: a cat a orting a coun wor cous ond frown of of flim wing a smal for an a srountainat
SHOON: a part of datind of man or pato busting a son cal wity

(0.1)
SATRICUS: can sillortigh a for for a ming ss
COSTATION: acy wanit of achollic
LACK: a coucorom in a srath
ARDA: wall bang faymach

(0.01)
CARK: a shint of cortion
PRISM: a bird of a for a tart of a dight woman of dusish grat
SANDILA: collass for wolf
PANK: a pusting of a grouns

(0.02)
WORT: a sharmation for a partion
CHING: a containing a for dout
JACK ALLLING: a doy homan with a farty rots
HATCH: a boration of hots

At its most grammatical, this graph has a lot of non-dictionary words and is a bit shaky by comparison with its sibling: this is natural, as its training data was so small. As it's hard to find a corpus of 80,000 valid annotations, purist-oulipian GLOSSATORY is a phantom, a drunk and utopian avatar of its original, although I will probably try tuning it and improving it. And it's too absurd, I think, to suit illustration. If I'm to draw a bot's output, it should contain actual words and situations in a funny juxtaposition, as in that aircraft smoking a cigar. Purist-GLOSSATORY is too radical in its dissociation of word from word, of glyph from glyph.

It has its own kind of silly charm, though, nocturnal and hallucinatory. It's also a bit scatological, as many dirty Anglo-Saxon words also lack that glyph. So GLOSSATORY now posts output from both its original, winnowing program, and, at night (in Australia), its wobbly and farting but rigorously glyph-omitting companion.

GLOSSATORY's original information locus

GLOSSATORY on GitHub