DField Solutions · Engineering studio · Budapest
Category: Generative NLP

Glossa

Learns a voice, then speaks in it.

What it is

Glossa is a Markov-chain text generator that learns the style of a corpus and produces new text in that voice. It tokenizes the input, tallies the possible continuations of every n-gram, and then samples forward from sentence-aware seeds, weighting each next-token choice by those tallies. Its central test checks the invariant that the generator never emits a transition it did not see in training. It is a from-scratch, dependency-light build you can download and run locally.
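The tokenize → tally → sample pipeline can be sketched in TypeScript. The language is chosen on the assumption that a pnpm project is TypeScript or JavaScript. This is a minimal illustration, not Glossa's source. The names `tokenize`, `train`, `weightedPick` and `generate`, the `Model` shape, the ASCII-only tokenizer regex, and the rule that a seed is any n-gram opening the corpus or following `.`, `!` or `?` are all hypothetical.

```typescript
// Hypothetical sketch of a Markov text generator; not Glossa's actual source.
// Assumptions: word/punctuation tokens, ".", "!" or "?" end a sentence,
// and n-grams are stored under joined string keys.

type Counts = Map<string, number>; // next token -> how often it followed

interface Model {
  order: number;              // n-gram length (assumed >= 1)
  table: Map<string, Counts>; // joined n-gram -> tallied continuations
  seeds: string[][];          // n-grams that open a sentence
}

const SEP = "\u0001"; // key separator, assumed never to occur in the text
const END = /^[.!?]$/; // sentence-terminating tokens

// Runs of \w characters, apostrophes and hyphens, plus ".", "!" and "?"
// as tokens of their own (ASCII-only, a simplification).
function tokenize(text: string): string[] {
  return text.match(/[\w'-]+|[.!?]/g) || [];
}

// Tally every n-gram's continuations and remember sentence-opening n-grams.
function train(tokens: string[], order: number): Model {
  const table = new Map<string, Counts>();
  const seeds: string[][] = [];
  for (let i = 0; i + order < tokens.length; i++) {
    const gram = tokens.slice(i, i + order);
    // Sentence-aware seed: the gram starts the corpus or follows a terminator.
    if (i === 0 || END.test(tokens[i - 1])) seeds.push(gram);
    const key = gram.join(SEP);
    const counts = table.get(key) || new Map<string, number>();
    const next = tokens[i + order];
    counts.set(next, (counts.get(next) || 0) + 1);
    table.set(key, counts);
  }
  return { order, table, seeds };
}

// Choose a continuation with probability proportional to its tally.
function weightedPick(counts: Counts, rand: () => number): string {
  let total = 0;
  for (const c of counts.values()) total += c;
  let r = rand() * total;
  let last = "";
  for (const [token, c] of counts) {
    if (r < c) return token;
    r -= c;
    last = token;
  }
  return last; // only reached through floating-point rounding
}

// Walk forward from a random sentence-opening seed until a terminator,
// a dead end (an n-gram with no recorded continuation), or maxTokens.
function generate(model: Model, maxTokens = 50, rand: () => number = Math.random): string[] {
  if (model.seeds.length === 0) return [];
  const out = model.seeds[Math.floor(rand() * model.seeds.length)].slice();
  while (out.length < maxTokens && !END.test(out[out.length - 1])) {
    const counts = model.table.get(out.slice(-model.order).join(SEP));
    if (!counts) break;
    out.push(weightedPick(counts, rand));
  }
  return out;
}
```

Injecting the random source makes a run reproducible. On the corpus `the cat sat. the cat ran.` with `order = 1`, a source that always returns `0` yields the tokens `the cat sat .`, and one that always returns `0.99` yields `the cat ran .`. Every appended token is drawn from the recorded continuations of the current n-gram, so by construction the walk cannot produce a transition absent from training.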

A pure Markov text generator: tokenize → tally each n-gram's continuations → sample forward, weighted, from sentence-aware seeds. 13 tests, centred on the invariant that the generator never emits a transition it didn't see in training.
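A check for that invariant might look like the following hypothetical helper. It is not one of Glossa's 13 tests. It collects every window of `order + 1` consecutive training tokens and rejects any output that contains a window outside that set. A property-style test would run the generator many times and assert that the helper returns `true` for each result.

```typescript
// Hypothetical invariant check, not taken from Glossa's test suite:
// every (order + 1)-token window of `output` must also occur in `training`.
function onlySeenTransitions(training: string[], output: string[], order: number): boolean {
  const key = (window: string[]) => window.join("\u0001"); // assumed-safe separator
  const seen = new Set<string>();
  for (let i = 0; i + order < training.length; i++) {
    seen.add(key(training.slice(i, i + order + 1)));
  }
  for (let i = 0; i + order < output.length; i++) {
    if (!seen.has(key(output.slice(i, i + order + 1)))) return false;
  }
  return true;
}
```

For the training tokens `the cat sat . the cat ran .` at `order = 1`, the output `the cat ran .` passes. The output `the ran .` fails, because `the` is never followed by `ran` in training.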

What's inside

The full source, the tests, and CI. Open it, read it, change it. A zero-dependency core, free, in the MIT spirit.

Run it after unzipping

pnpm install && pnpm dev