14. Data Analyses 

14.1. Permutation tests 

Read about the principle of permutation tests.
Implement a Python script that uses a permutation test to compare two samples.
Check out the solution I propose: permutation_test/permutation_test.py.
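
As a starting point, here is a minimal sketch of a two-sided permutation test on the difference of means, using only the standard library. It is not the solution in permutation_test/permutation_test.py: the function name `permutation_test`, its parameters and the sample values are illustrative assumptions.

```python
import random
from statistics import mean


def permutation_test(x, y, n_permutations=10_000, seed=None):
    """Two-sided permutation test on the difference of means.

    Returns the observed difference mean(x) - mean(y) and an
    estimated p-value.
    """
    rng = random.Random(seed)
    observed = mean(x) - mean(y)
    pooled = list(x) + list(y)
    n_x = len(x)
    n_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are exchangeable,
        # so shuffle the pooled data and split it again.
        rng.shuffle(pooled)
        diff = mean(pooled[:n_x]) - mean(pooled[n_x:])
        if abs(diff) >= abs(observed):
            n_extreme += 1
    # Adding 1 to numerator and denominator keeps the estimate above 0.
    p_value = (n_extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Made-up illustrative samples
    a = [12.1, 11.8, 13.0, 12.6, 12.9, 13.4]
    b = [10.2, 10.9, 11.1, 10.5, 11.6, 10.8]
    diff, p = permutation_test(a, b, n_permutations=5000, seed=0)
    print(f"difference = {diff:.3f}, p = {p:.4f}")
```

The test statistic here is the difference of means; any other statistic (medians, a t value) can be substituted without changing the shuffling logic.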

14.2. Bootstrap 

Implement the bootstrap to obtain a confidence interval on the mean of a sample.
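
One possible sketch uses the percentile bootstrap: resample the data with replacement many times, compute the mean of each resample, and read the interval off the sorted means. The function name `bootstrap_ci_mean`, its default parameters and the sample values are assumptions made for illustration; other bootstrap variants (for example BCa) exist and may behave better on small samples.

```python
import random
from statistics import mean


def bootstrap_ci_mean(sample, n_resamples=10_000, confidence=0.95, seed=None):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Draw n values with replacement, n_resamples times, and keep each mean.
    means = sorted(mean(rng.choices(sample, k=n)) for _ in range(n_resamples))
    alpha = 1.0 - confidence
    lower_index = int((alpha / 2) * n_resamples)
    upper_index = int((1 - alpha / 2) * n_resamples) - 1
    return means[lower_index], means[upper_index]


if __name__ == "__main__":
    # Made-up illustrative sample
    data = [2.1, 2.5, 1.9, 3.2, 2.8, 2.4, 2.2, 3.0]
    low, high = bootstrap_ci_mean(data, n_resamples=5000, seed=0)
    print(f"mean = {mean(data):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```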

14.3. Basic Data Analysis with R 

See http://www.pallier.org/examples-of-basic-data-analyses-with-r.html#examples-of-basic-data-analyses-with-r

14.4. Comparing means using Easy ANOVA (Analysis of Variance)

See http://www.pallier.org/easy-anova-with-r.html#easy-anova-with-r

Note

Check out Daniël Lakens’s GitHub and, in particular, https://lakens.github.io/statistical_inferences/repository

14.5. Frequency Analysis 

See short-intro-fourier

15. Lexical Statistics 

15.1. Zipf’s law 

The script Zipf/word_count.py computes the distribution of frequencies of occurrence in a list of words. Use it to compute the distribution of word frequencies in Alice in Wonderland.

Note: To remove the punctuation, you can use the following function:
```
import string


def remove_punctuation(text):
    """Replace every punctuation character and newline with a space."""
    punct = string.punctuation + "\n"
    return text.translate(str.maketrans(punct, " " * len(punct)))
```
Zipf’s law states that the product rank × frequency is roughly constant. This ‘law’ was discovered by Estoup and popularized by Zipf. See http://en.wikipedia.org/wiki/Zipf%27s_law. Create the Zipf plot for the text of Alice in Wonderland showing, on the y axis, the log of the frequency and, on the x axis, the word rank (sorting words from the most frequent to the least frequent).
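
The ranking step can be sketched as follows, assuming the text has already been read into a string. The function names `rank_frequency` and `plot_zipf` are hypothetical, and this is not the code of Zipf/word_count.py. Plotting needs matplotlib, which is imported only inside `plot_zipf` so that the counting part runs without it.

```python
import math
import string
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) tuples, most frequent word first."""
    punct = string.punctuation + "\n"
    cleaned = text.lower().translate(str.maketrans(punct, " " * len(punct)))
    counts = Counter(cleaned.split())
    return [
        (rank, word, count)
        for rank, (word, count) in enumerate(counts.most_common(), start=1)
    ]


def plot_zipf(table):
    """Plot log(frequency) against word rank (requires matplotlib)."""
    import matplotlib.pyplot as plt

    ranks = [rank for rank, _, _ in table]
    log_counts = [math.log(count) for _, _, count in table]
    plt.plot(ranks, log_counts, ".")
    plt.xlabel("word rank")
    plt.ylabel("log(frequency)")
    plt.show()


if __name__ == "__main__":
    # Tiny stand-in text; replace it with the contents of the book.
    table = rank_frequency("The cat and the dog, and the bird.")
    for rank, word, count in table:
        print(rank, word, count, rank * count)
```

To apply it to the book, read the file into a string first (for example `open("alice.txt").read()`, where the file name is an assumption) and pass the resulting table to `plot_zipf`.
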
Display the relationship between word length and word frequencies from the data in lexical-decision/lexique382-reduced.txt
Generate a random text of 1 million characters (each letter from a to z being equiprobable, and the space character being 8 times more probable). Compute the frequency of each ‘pseudoword’ and plot the rank/frequency diagram.
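
A minimal sketch of the random-text experiment is given below. It reads "8 times more probable" as "the space has 8 times the weight of any single letter", which is an interpretation; the function names `random_text` and `pseudoword_rank_frequency` are hypothetical.

```python
import random
import string
from collections import Counter


def random_text(n_chars, space_weight=8, seed=None):
    """Random string over a-z and space; the space has weight `space_weight`."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    weights = [1] * 26 + [space_weight]
    return "".join(rng.choices(alphabet, weights=weights, k=n_chars))


def pseudoword_rank_frequency(text):
    """Return (rank, count) pairs for the pseudowords, most frequent first."""
    counts = Counter(text.split())  # split() ignores runs of spaces
    return [
        (rank, count)
        for rank, (_, count) in enumerate(counts.most_common(), start=1)
    ]


if __name__ == "__main__":
    text = random_text(1_000_000, seed=0)
    table = pseudoword_rank_frequency(text)
    print(len(table), "distinct pseudowords")
    print(table[:10])
```

The resulting (rank, count) pairs can be plotted in the same way as the word counts of a real text, which makes the two rank/frequency diagrams directly comparable.
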
To know more about lexical frequencies:
- Read Harald Baayen (2001). Word Frequency Distributions. Kluwer Academic Publishers.
- Read Michel, Jean-Baptiste, Yuan Kui Shen, Aviva P. Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, et al. 2010. “Quantitative Analysis of Culture Using Millions of Digitized Books.” Science, December. https://doi.org/10.1126/science.1199644. (Use scholar.google.com to find a PDF copy.) Check out Google Ngrams at https://books.google.com/ngrams. (Note that at the bottom of the page, there is a message “Raw data is available for download here”.)

15.2. Benford’s law 

Learn about Benford’s law. Write a Python script that displays the distribution of the most significant digit in a set of numbers. Apply it to the variables in Benford-law/countries.xlsx.

A solution: Benford-law/Benford.py
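
Independently of that script, the core computation can be sketched with the standard library alone. The function names below are hypothetical, and the demonstration uses powers of 2 rather than Benford-law/countries.xlsx, since reading the spreadsheet needs an extra library and its column names are not given here.

```python
import math
from collections import Counter


def leading_digit(x):
    """Return the most significant digit (1-9) of a non-zero finite number."""
    # Scientific notation puts the leading digit first: 0.00123 -> '1.22...e-03'.
    # 17 decimals avoid rounding a value such as 9.9999999 up to 1.0e+01.
    return int(f"{abs(x):.17e}"[0])


def first_digit_distribution(values):
    """Proportion of each leading digit among the non-zero finite values."""
    digits = [leading_digit(v) for v in values if v and math.isfinite(v)]
    if not digits:
        raise ValueError("no non-zero finite values")
    counts = Counter(digits)
    return {d: counts[d] / len(digits) for d in range(1, 10)}


def benford_expected():
    """Proportions predicted by Benford's law: log10(1 + 1/d)."""
    return {d: math.log10(1 + 1 / d) for d in range(1, 10)}


if __name__ == "__main__":
    powers = [2 ** k for k in range(1, 200)]
    observed = first_digit_distribution(powers)
    expected = benford_expected()
    for d in range(1, 10):
        print(f"{d}: observed {observed[d]:.3f}   Benford {expected[d]:.3f}")
```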

15.3. Neuroimaging 

Check out nilearn, nistats and MNE-Python.
See stats-and-data-analyses/Example of a single subject-single run fMRI analysis with nistats.ipynb