13. Programming a Lexical decision task

In a lexical decision experiment, a string of characters is flashed at the center of the screen and the participant has to decide if it is a actual word or not, indicating his/her decision by pressing a left or right button. Reaction time is measured from the word onset, providing an estimate of the speed of word recognition.

Let us program such a task.

13.1. Step 1: stimuli in constants

Modify the parity task script to display either a word or a pseudoword at each trial (in a random order).

For testing purposes, you can use the following variables:

words = ['bonjour', 'chien', 'président']
pseudos = ['lopadol', 'mirance', 'clapour' ]

A solution is proposed in lexdec_v1.py

13.2. Step 2: read stimuli from a csv file

Then modify the lexical decision script to read the stimuli from a comma-separated text file (stimuli.csv) with two columns. Here is the content of stimuli.csv:


(hint: To read a csv file, one can use pandas.read_csv())

A solution is proposed in lexdec_v2.py

Note: You can use a file comparator, e.g. meld, to compare the two versions:

meld lexdec_v1.py lexcdec_v2.py

13.3. Select words in a lexical dabatase

  1. Go to http://www.lexique.org

    Click on “Recherche en Ligne” and play with the interface:

  2. how many words of grammatical category (cgram) ‘NOM’, and of length 5 (nblettres), of lexical frequency (freqfilms2) comprised between 10 and 100 per millions are there in this database? (answer=367). Save these words (i.e. the content of the field Words) into a words.csv file (you may have to clean manually, ie. remove unwanted columns, using Excel or Libroffice Calc).

13.4. Automatising database searches with R and Python

To select words, rather than using the interface at http://www.lexique.org, one can write scripts in R or Python. This promotes reproducible science.

  1. Open https://github.com/chrplr/openlexicon/tree/master/documents/Interroger-Lexique-avec-R and follow the instructions in the document interroger-lexique-avec-R.pdf

  2. Read https://github.com/chrplr/openlexicon/tree/master/scripts

To select 100 five letters long nouns for our lexical decision, execute:

import pandas
lex = pandas.read_csv("http://www.lexique.org/databases/Lexique382/Lexique382.tsv", sep='\t')
subset = lex.loc[(lex.nblettres == 5) & (lex.cgram == "NOM") & (lex.freqfilms2 > 10) & (lex.nombre == 's')]
samp = subset.sample(100)
samp2 = samp.rename(columns = {'ortho':'item'})
samp2.item.to_csv('words.csv', index=False)

This creates words.csv.

13.5. Generate nonwords

  1. Write a function that returns a nonword (a string containing random characters)

    def pseudo(length):
        """ returns a nonword of length `length` """

    Solution at create_nonwords.py

  2. Use this function to create a list of 100 nonwords and save it in a file "pseudowords.csv" (one pseudoword per line) (see https://www.pythontutorial.net/python-basics/python-write-text-file/)

13.6. Create a stimuli file

Merge words.csv and pseudowords.csv into a single stimuli2.csv file:

import pandas
w = pandas.read_csv('words.csv')
w['category'] = 'W'
p = pandas.read_csv('pseudowords.csv')
p['category'] = 'P'
allstims = pandas.concat([w, p])
allstims.to_csv('stimuli2.csv', index=False)

13.7. Use sys.argv to pass the name of the file containing the list of stimuli

Modify lexdec_v2.py to be able to pass the name of the stimuli file as an argument on the command line:

python lexdec_v3.py stimuli2.csv

(hint: use sys.argv[]; see https://www.geeksforgeeks.org/how-to-use-sys-argv-in-python/)

Solution at lexdec_v3.py

13.8. Improving the pseudowords

  1. Check out the pseudoword generator UniPseudo

  2. Generate a new list of pseudowords and add them to a new stimuli3.csv file

13.9. Data analysis

After running:

python lexdec_v3.py stimuli2.csv

the subject’s responses are stored in the subfolder data/ contains a file lexdec...xpd

You can download this xpd file as an example.

  1. Use pandas.read_csv(..., comment='#') to read the responses into a pandas dataframe.

  2. Compute the average reaction times for words and for pseudo-words.

  3. Plot the distribution of reactions times using seaborn.boxplot()

  4. Use scipy.stats.ttest_ind() to perform a Student t-test compairn gthe RTs of Words and Non-Words.

Check a solution analyze_RT.py

13.10. Finally

Check out the example of a ‘real’ lexical decision experiment at https://chrplr.github.io/PCBS-LexicalDecision/)