unipseudo-go

Unipseudo-go Pseudoword Generator

This program generates pseudowords (pronounceable, fake words that look like real words) by chaining trigrams (sequences of three characters) extracted from a provided list of real words using Markov chains.

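The generated pseudowords are guaranteed to:

  • be unique within a run (no pseudoword is output twice);
  • not appear in the word list used to build the model.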

For more info, see:

New, B., Bourgin, J., Barra, J., & Pallier, C. (2024). UniPseudo: A universal pseudoword generator. Quarterly Journal of Experimental Psychology, 77(2), 278–286. https://doi.org/10.1177/17470218231164373

Origin

This Go program is a direct port of the original R script, pseudoword-generation-by-markov-on-trigrams.R.

That R script is the one that runs the Unipseudo web tool.

The Go port keeps the structural and algorithmic behavior of the R script, including position-dependent trigram selection and robust UTF-8 handling. It speeds up generation by pre-indexing the trigrams in maps, so that each transition is an O(1) lookup.
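To illustrate why UTF-8 handling matters for position-dependent trigrams, here is a minimal sketch. It is not taken from the program's source, and the function name runeTrigrams is hypothetical. It assumes trigrams are counted in Unicode code points (Go runes) rather than bytes, so an accented letter such as "é" occupies one position even though it takes two bytes in UTF-8.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeTrigrams returns every three-character window of word, in order,
// so that the slice index is the trigram's character position.
// Hypothetical helper, shown for illustration only.
func runeTrigrams(word string) []string {
	r := []rune(word) // decode UTF-8 once; indexing r is per code point
	var out []string
	for i := 0; i+3 <= len(r); i++ {
		out = append(out, string(r[i:i+3]))
	}
	return out
}

func main() {
	w := "délaine"
	// len counts bytes, RuneCountInString counts code points.
	fmt.Println(len(w), utf8.RuneCountInString(w))
	fmt.Println(runeTrigrams(w))
}
```

Slicing the string by byte offsets instead would cut "é" in half and shift every later position by one, which would corrupt a model that depends on where each trigram starts.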

Installation

You have two options for installing and running the pseudoword generator: downloading a pre-compiled binary (no dependencies required) or building from source.

Option 1: Download a Pre-compiled Binary

You do not need to install Go to run this program. Pre-compiled binaries are available for Linux, Windows, and macOS (both Intel and Apple Silicon/ARM).

  1. Go to the Releases page of this repository.
  2. Download the binary that matches your operating system and architecture.
  3. Extract the downloaded archive.
  4. Make the binary executable and run it from your terminal or command prompt:
    chmod +x pseudoword-generator
    ./pseudoword-generator [options]

    (On Windows, run pseudoword-generator.exe instead; the chmod step is not needed.)

Option 2: Build from Source

If you prefer to compile the program yourself, you will need to install Go.

  1. Download and install Go from the official Go website.
  2. Clone this repository to your local machine.
  3. You can either compile the binary using the provided script:
    ./build.sh
    ./pseudoword-generator [options]
    

    Or run the code directly with go run, without building a standalone executable:

    go run pseudoword_generator.go [options]
    

Usage

Command-line Options

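The flags referred to in this README are:

  • -n: number of pseudowords to generate
  • -l: length of the generated pseudowords
  • -m: minimum length a word must have to be kept in the model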

Examples

Generate 5 pseudowords of length 8 using the default French dictionary:

go run pseudoword_generator.go -n 5 -l 8

Sample Output:

murfaten
réseste
délaine
fillonti
cleulât

Generate 20 short pseudowords (length 6):

go run pseudoword_generator.go -n 20 -l 6

How it Works

  1. Data Loading: The program reads a list of words from the specified text file (one word per line). It filters out words that are too short based on the minimum length (-m) flag.
  2. Model Building: Words are padded with spaces to denote word boundaries. The program extracts trigrams and catalogs them based on their exact starting position in the word.
  3. Generation:
    • It picks a random initial trigram (starting at position 0).
    • It builds the rest of the word letter by letter. For each step, it looks at the last two letters (the “bigram”) and randomly selects a compatible third letter from the pool of trigrams found at that specific position in the model words.
    • If it hits a dead end (no valid trigram continues the sequence), it throws away the current attempt and restarts.
  4. Filtering: It ensures the newly generated string hasn’t been generated already during this run and does not exist in the original model dataset.
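The four steps above can be sketched in Go as follows. This is an illustrative reconstruction from the description, not the program's actual source: the type and function names (key, model, buildModel, attempt, generateN) are hypothetical, the padding is assumed to be one space on each side, and the maxAttempts cap and the seed parameter are additions that keep the example bounded and reproducible.

```go
package main

import (
	"fmt"
	"math/rand"
	"strings"
)

// key identifies one transition: the rune offset of a trigram inside the
// padded word, plus the trigram's first two runes.
type key struct {
	pos    int
	bigram [2]rune
}

// model is a hypothetical container; none of these names come from the program.
type model struct {
	starts [][3]rune       // trigrams found at position 0
	next   map[key][]rune  // third letters seen after a bigram at a position
	known  map[string]bool // source words, rejected if regenerated
}

// buildModel pads each sufficiently long word with one space on each side
// (an assumed padding scheme) and indexes its trigrams by position.
func buildModel(words []string, minLen int) *model {
	m := &model{next: map[key][]rune{}, known: map[string]bool{}}
	for _, w := range words {
		if len([]rune(w)) < minLen {
			continue
		}
		m.known[w] = true
		p := []rune(" " + w + " ")
		if len(p) < 3 { // empty word: no trigram to extract
			continue
		}
		m.starts = append(m.starts, [3]rune{p[0], p[1], p[2]})
		for i := 0; i+3 <= len(p); i++ {
			k := key{i, [2]rune{p[i], p[i+1]}}
			// Duplicates are kept, so frequent trigrams are drawn more often.
			m.next[k] = append(m.next[k], p[i+2])
		}
	}
	return m
}

// attempt makes one try at a pseudoword of the given length (in runes, >= 1).
// It reports false on a dead end or when the word boundary is misplaced.
func (m *model) attempt(rng *rand.Rand, length int) (string, bool) {
	s := m.starts[rng.Intn(len(m.starts))]
	p := []rune{s[0], s[1], s[2]}
	for len(p) < length+2 { // +2 accounts for the two padding spaces
		pos := len(p) - 2 // position of the trigram being completed
		cands := m.next[key{pos, [2]rune{p[pos], p[pos+1]}}]
		if len(cands) == 0 {
			return "", false // dead end: discard this attempt
		}
		p = append(p, cands[rng.Intn(len(cands))])
	}
	word := string(p[1 : len(p)-1])
	if p[len(p)-1] != ' ' || strings.ContainsRune(word, ' ') {
		return "", false // the word must end exactly at the closing pad
	}
	return word, true
}

// generateN collects up to n new, distinct pseudowords, giving up after
// maxAttempts tries (the cap is an assumption of this sketch).
func (m *model) generateN(seed int64, n, length, maxAttempts int) []string {
	if len(m.starts) == 0 {
		return nil
	}
	rng := rand.New(rand.NewSource(seed))
	seen := map[string]bool{}
	var out []string
	for a := 0; a < maxAttempts && len(out) < n; a++ {
		w, ok := m.attempt(rng, length)
		if !ok || seen[w] || m.known[w] {
			continue
		}
		seen[w] = true
		out = append(out, w)
	}
	return out
}

func main() {
	m := buildModel([]string{"lemon", "remit", "tiger", "digit"}, 3)
	for _, w := range m.generateN(1, 3, 5, 10000) {
		fmt.Println(w)
	}
}
```

With this toy list, the only strings that can pass the filter are lemit, remon, tigit and diger: each arises where two source words share the same bigram at the same position ("em" or "ig" at position 2), which is where the chain can switch from one word to the other.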

Word lists

License

This project is distributed under the terms of the GNU General Public License v3.

Copyright (c) 2026 Christophe Pallier

If you use this software, please cite this repository as:

Pallier, C. (2026). unipseudo-go (Version 1.0.3) [Computer software]. GitHub. https://github.com/chrplr/unipseudo-go

BibTeX entry:

 @software{unipseudo-go2026,
   author = {Pallier, Christophe},
   title = {Unipseudo-go: A pseudoword generator using trigram Markov chains},
   version = {1.0.3},
   date = {2026-04-16},
   url = {https://github.com/chrplr/unipseudo-go},
   publisher = {GitHub},
   abstract = {A high-performance port of the UniPseudo tool for generating pronounceable pseudowords using Markov chains.},
   keywords = {psycholinguistics, pseudowords, markov-chains, go},
   license = {GPL-v3.0}
 }