Lexical Decision Experiment

The aim of this project was to create a psycholinguistics experiment implementing a lexical decision task in the visual modality.

The experiment consists of a succession of trials in which a written stimulus is displayed on the screen and the participant must indicate, by pressing one of two response keys, whether this stimulus is a word or not. The response time is recorded.

In our experiment, the word stimuli are nouns and verbs of varying lexical frequencies (frequencies of occurrence in the language), allowing us to assess the influence of two factors (Category: Noun vs. Verb; Frequency: High vs. Low) on the speed of word recognition.


Preparation of the stimuli

Words

To get lexical frequency information, we used the Lexique database. More precisely, we downloaded the table Lexique383.tsv available at http://www.lexique.org/databases/Lexique383/Lexique383.tsv (under Linux, this can be done with the command curl -O http://www.lexique.org/databases/Lexique383/Lexique383.tsv).

We then randomly selected four subsets of twenty words each (high- and low-frequency nouns, and high- and low-frequency verbs), all between 5 and 8 letters long.

The code to randomly pick items meeting these criteria is in the script select-words-from-lexique.py (a sketch of its filtering logic is given below, after the example output). We ran it as follows:

mkdir -p stimuli
python select-words-from-lexique.py -n 20 --cgram NOM --max-freq 5.0 --min-letters 5     \
                                    --max-letters 8 --database Lexique383.tsv  > stimuli/nomlo.txt
python select-words-from-lexique.py -n 20 --cgram NOM --min-freq 100.0 --min-letters 5   \
                                    --max-letters 8 --database Lexique383.tsv > stimuli/nomhi.txt
python select-words-from-lexique.py -n 20 --cgram VER --max-freq 5.0 --min-letters 5     \
                                    --max-letters 8 --database Lexique383.tsv  > stimuli/verlo.txt
python select-words-from-lexique.py -n 20 --cgram VER --min-freq 100.0 --min-letters 5    \
                                    --max-letters 8 --database Lexique383.tsv > stimuli/verhi.txt

Thus, the words for the experiment are saved in the files nomhi.txt, nomlo.txt, verhi.txt and verlo.txt in the subfolder stimuli/.

Here is an example of output:

nomhi nomlo verhi verlo
façon fleuves avait amènera
cause véranda changer dégagée
bateau vacarme aller envoyons
équipe bluff parlé sauterai
école lever avons ralenti
bureau tarés croyais frustré
trucs cobra veulent défiguré
lumière samouraï trouver extraire
besoin mystique regardez secoué
début yacks était brouillé
force contes savais tairai
reste lézard payer envisage
travail boche faisait données
homme raton rentrer abrite
faute piercing ferais apaiser
années frayeur parlez bousculé
bonjour dérision demander parleras
table citation donnez cuits
heures entrées parles perdant
hommes réforme jouer rallume
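
For reference, here is a minimal sketch of the kind of filtering that select-words-from-lexique.py performs. This is an illustrative reimplementation, not the actual script: it assumes pandas and the standard Lexique383 column names ortho (spelling), cgram (grammatical category), nblettres (number of letters) and freqfilms2 (frequency per million in film subtitles).

# Illustrative sketch only; the actual select-words-from-lexique.py may differ.
import argparse
import pandas as pd

parser = argparse.ArgumentParser()
parser.add_argument('-n', type=int, default=20)
parser.add_argument('--cgram', default='NOM')
parser.add_argument('--min-freq', type=float, default=0.0)
parser.add_argument('--max-freq', type=float, default=float('inf'))
parser.add_argument('--min-letters', type=int, default=1)
parser.add_argument('--max-letters', type=int, default=100)
parser.add_argument('--database', default='Lexique383.tsv')
args = parser.parse_args()

# keep the entries matching the grammatical category, frequency range and length range
lexique = pd.read_csv(args.database, sep='\t')
subset = lexique[(lexique.cgram == args.cgram)
                 & (lexique.freqfilms2 >= args.min_freq)
                 & (lexique.freqfilms2 <= args.max_freq)
                 & (lexique.nblettres >= args.min_letters)
                 & (lexique.nblettres <= args.max_letters)]

# print n randomly sampled distinct words, one per line
for word in subset.ortho.drop_duplicates().sample(args.n):
    print(word)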

Pseudowords

To create 80 pseudowords, we used the Wuggy pseudoword generator, feeding it the words selected in the previous step.

The resulting 80 pseudowords are saved in the file stimuli/pseudomots.txt.

Creating the list of trials

Finally, we wrote create-experimental-list.py to merge the files nomlo.txt, nomhi.txt, verhi.txt, verlo.txt and pseudomots.txt into a single CSV file (resources/trials.csv) describing the trials.

python create-experimental-list.py > resources/trials.csv

This file has three columns, named 'Category', 'Frequency' and 'Item':

head resources/trials.csv
Category,Frequency,Item
NOUN,HIFREQ, façon
NOUN,HIFREQ, cause
NOUN,HIFREQ, bateau
NOUN,HIFREQ, équipe
NOUN,HIFREQ, école
NOUN,HIFREQ, bureau
NOUN,HIFREQ, trucs
NOUN,HIFREQ, lumière
NOUN,HIFREQ, besoin
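
For illustration, a merging script along these lines could look as follows. This is a sketch, not the actual create-experimental-list.py; in particular, the labels used for pseudoword trials, the output order, and whether trials are shuffled here are assumptions.

# Illustrative sketch; the actual create-experimental-list.py may differ.
import csv
import sys

SOURCES = [
    ('stimuli/nomhi.txt',      'NOUN',   'HIFREQ'),
    ('stimuli/nomlo.txt',      'NOUN',   'LOFREQ'),
    ('stimuli/verhi.txt',      'VERB',   'HIFREQ'),
    ('stimuli/verlo.txt',      'VERB',   'LOFREQ'),
    ('stimuli/pseudomots.txt', 'PSEUDO', 'PSEUDO'),  # pseudoword labels assumed
]

# write one CSV row per stimulus, with its category and frequency labels
writer = csv.writer(sys.stdout)
writer.writerow(['Category', 'Frequency', 'Item'])
for filename, category, frequency in SOURCES:
    with open(filename, encoding='utf-8') as stimfile:
        for line in stimfile:
            word = line.strip()
            if word:
                writer.writerow([category, frequency, word])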

Running the experiment

To run the experiment on your computer, you must have Python and the modules listed in requirements.txt installed (see note 1 below).

Once Python and the relevant modules are installed, you can download http://github.com/chrplr/PCBS-LexicalDecision/archive/refs/heads/master.zip and unzip it.

The experiment is run with the script run-lexical-decision.py:

cd PCBS-LexicalDecision
python run-lexical-decision.py resources/trials.csv
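
The .xpd extension of the data files (see below) suggests that run-lexical-decision.py is built on the Expyriment library. Here is a minimal sketch of such a trial loop; the response keys, timings and recorded variables are illustrative assumptions, not necessarily those of the actual script.

# Illustrative sketch of a lexical decision trial loop with Expyriment;
# the actual run-lexical-decision.py may differ (keys, timing, data format).
import sys
import pandas as pd
from expyriment import design, control, stimuli
from expyriment.misc.constants import K_f, K_j

trials = pd.read_csv(sys.argv[1])  # e.g. resources/trials.csv

exp = design.Experiment(name="Lexical Decision")
control.initialize(exp)
exp.add_data_variable_names(["category", "frequency", "item", "key", "rt"])

control.start()
for _, trial in trials.sample(frac=1).iterrows():  # present trials in random order
    stimuli.FixCross().present()
    exp.clock.wait(500)
    stimuli.TextLine(trial["Item"].strip()).present()
    key, rt = exp.keyboard.wait(keys=[K_f, K_j])  # 'f' = nonword, 'j' = word (assumed)
    exp.data.add([trial["Category"], trial["Frequency"], trial["Item"], key, rt])
control.end()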

Analyzing the results

After each run of run-lexical-decision.py, a file (with extension .xpd) is created in the subfolder named data.

It is possible to visualize and analyze the results stored in such a file by passing it as an argument to the script analyze-lexical-decision-times.py. For example:

python analyze-lexical-decision-times.py data/lexical-decision_01_202103282213.xpd

This generates the following graphics (with our data), showing the reaction times as a function of Category (Noun vs. Verb) and Lexical Frequency (High vs. Low):

Average Reaction times:

It also computes an ANOVA of log-transformed reaction times (logRT) as a function of Category (Noun vs. Verb) and Lexical Frequency (High vs. Low):

                      df    sum_sq   mean_sq         F    PR(>F)
frequency            1.0  0.004681  0.004681  0.295198  0.588498
category             1.0  0.012364  0.012364  0.779794  0.379989
frequency:category   1.0  0.020338  0.020338  1.282675  0.260963
Residual            76.0  1.205025  0.015856       NaN       NaN
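
The table above has the layout of a statsmodels ANOVA. For illustration, here is a minimal sketch of an analysis producing such output; it is not the actual analyze-lexical-decision-times.py, and the column names of the .xpd file and the label used for pseudoword trials are assumptions.

# Illustrative sketch; the actual analyze-lexical-decision-times.py may differ.
import sys
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Expyriment .xpd files are comma-separated with '#' comment lines;
# the column names below ('category', 'frequency', 'rt') are assumed.
data = pd.read_csv(sys.argv[1], comment="#")
words = data[data["category"] != "PSEUDO"].copy()  # keep word trials only (label assumed)
words["logRT"] = np.log(words["rt"])

# two-way ANOVA of logRT on frequency, category and their interaction
model = smf.ols("logRT ~ frequency * category", data=words).fit()
print(anova_lm(model))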

1: On our system (Ubuntu 20.04 with Anaconda Python 3), we used:

 conda create --name lexdec python=3.7
 conda activate lexdec
 pip install -r requirements.txt