# 14. Data Analyses

## 14.2. Bootstrap

• Implement the bootstrap to obtain a confidence interval on the mean of a sample.
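
A minimal sketch, using only the standard library and the simple *percentile* bootstrap. Other variants exist, such as BCa intervals, and this is not necessarily the method intended by the exercise. The function name `bootstrap_mean_ci` and its parameters are hypothetical.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=10_000, alpha=0.05, seed=None):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample with replacement, keeping the original sample size,
    # and record the mean of each bootstrap resample
    boot_means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # The interval bounds are the alpha/2 and 1 - alpha/2 quantiles
    # of the sorted bootstrap means
    lo_idx = int(alpha / 2 * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return boot_means[lo_idx], boot_means[hi_idx]
```

For example, `bootstrap_mean_ci(reaction_times, seed=42)` returns the lower and upper bounds of a 95% interval for a list of numbers `reaction_times` (a hypothetical variable). Fixing `seed` makes the result reproducible.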

# 15. Lexical Statistics

## 15.1. Zipf law

• The script `Zipf/word_count.py` computes the distribution of occurrence frequencies in a list of words. Use it to compute the distribution of word frequencies in Alice in Wonderland.

Note: To remove the punctuation, you can use the following function:

```python
import string


def remove_punctuation(text):
    # Replace every punctuation character and newline (chr(10)) by a space
    punct = string.punctuation + chr(10)
    return text.translate(str.maketrans(punct, " " * len(punct)))
```
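
A minimal sketch of the counting step, for readers who want to see the idea rather than run `Zipf/word_count.py` (whose actual code may differ). It repeats `remove_punctuation` so that it runs on its own. Lower-casing the words is a choice made here, not something the exercise requires.

```python
import string
from collections import Counter


def remove_punctuation(text):
    # Replace every punctuation character and newline (chr(10)) by a space
    punct = string.punctuation + chr(10)
    return text.translate(str.maketrans(punct, " " * len(punct)))


def word_frequencies(text):
    """Map each lower-cased word of `text` to its number of occurrences."""
    return Counter(remove_punctuation(text).lower().split())


def frequency_distribution(word_freqs):
    """Map each occurrence count to the number of distinct words having it."""
    return Counter(word_freqs.values())
```

For example, with the book saved as `alice.txt` (a hypothetical file name), `frequency_distribution(word_frequencies(open("alice.txt", encoding="utf-8").read()))` tells how many words occur once, twice, and so on. Note that `string.punctuation` only covers ASCII punctuation, so typographic quotes and apostrophes in the text would survive.
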
• Zipf's law states that the product rank × frequency is roughly constant. This ‘law’ was discovered by Estoup and popularized by Zipf (see http://en.wikipedia.org/wiki/Zipf%27s_law). Create the Zipf plot for the text of Alice in Wonderland, showing the word rank on the x axis (words sorted from the most frequent to the least frequent) and the log of the frequency on the y axis.
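
A possible sketch, assuming the word counts are already available as a mapping from words to counts (for instance a `collections.Counter`). The helper names are hypothetical. `matplotlib` is imported inside `plot_zipf` only, so the rank computations work without it.

```python
import math


def zipf_data(word_freqs):
    """Return (ranks, frequencies), words sorted from most to least frequent.

    Rank 1 is the most frequent word; tied words get consecutive ranks.
    """
    freqs = sorted(word_freqs.values(), reverse=True)
    ranks = list(range(1, len(freqs) + 1))
    return ranks, freqs


def rank_frequency_products(word_freqs):
    """rank * frequency for each word; Zipf's law predicts roughly constant values."""
    ranks, freqs = zipf_data(word_freqs)
    return [r * f for r, f in zip(ranks, freqs)]


def plot_zipf(word_freqs):
    """Plot log10(frequency) against rank."""
    import matplotlib.pyplot as plt  # imported here so zipf_data works without matplotlib

    ranks, freqs = zipf_data(word_freqs)
    plt.plot(ranks, [math.log10(f) for f in freqs], ".")
    plt.xlabel("rank")
    plt.ylabel("log10(frequency)")
    plt.show()
```

If rank × frequency is constant, log(frequency) is a linear function of log(rank) with slope -1. Replacing `plt.plot` with `plt.loglog` (and dropping the explicit `log10`) therefore shows the law as a straight line.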

• Display the relationship between word length and word frequency, using the data in `lexical-decision/lexique382-reduced.txt`.
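
A sketch under several assumptions that should be checked against the actual file: that it is tab-separated, UTF-8 encoded, has a header row, uses a decimal point in numbers, and has columns named `ortho` (the word) and `freqfilms2` (its frequency). These column names are guesses modeled on Lexique's naming and may not match `lexique382-reduced.txt`. The median is used because a few very frequent words would dominate a mean.

```python
import csv
import statistics
from collections import defaultdict


def read_lexicon(path, word_col="ortho", freq_col="freqfilms2"):
    """Yield (word, frequency) pairs from a tab-separated file with a header row.

    ASSUMPTION: the column names are hypothetical; adjust word_col and
    freq_col to the actual header of the file.
    """
    with open(path, encoding="utf-8", newline="") as f:
        for row in csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE):
            yield row[word_col], float(row[freq_col])


def median_frequency_by_length(pairs):
    """Map each word length (in characters) to the median frequency of such words."""
    by_length = defaultdict(list)
    for word, freq in pairs:
        by_length[len(word)].append(freq)
    return {n: statistics.median(fs) for n, fs in sorted(by_length.items())}
```

The resulting dictionary can then be plotted with word length on the x axis and median frequency on the y axis, for example with `matplotlib.pyplot.plot` and a logarithmic y scale.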

• Generate a random text of 1 million characters in which each letter from a to z is equiprobable and the space character is 8 times more probable than any single letter. Compute the frequency of each ‘pseudoword’ (a string of letters between spaces) and plot the rank/frequency diagram.
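
One way to set this up, as a sketch: `random.choices` draws characters with weights 1 for each letter and 8 for the space, so the space has probability 8/34. Splitting on whitespace yields the pseudowords. The function names are hypothetical, and log-log axes for the plot are a choice made here.

```python
import random
import string
from collections import Counter


def random_text(n_chars=1_000_000, space_weight=8, seed=None):
    """n_chars characters: a-z equiprobable, space space_weight times as likely as a letter."""
    rng = random.Random(seed)
    alphabet = string.ascii_lowercase + " "
    weights = [1] * 26 + [space_weight]
    return "".join(rng.choices(alphabet, weights=weights, k=n_chars))


def pseudoword_frequencies(text):
    """Count pseudowords, i.e. maximal runs of letters between spaces."""
    return Counter(text.split())


def rank_frequency(counts):
    """Frequencies sorted from most to least frequent (rank 1 first)."""
    return sorted(counts.values(), reverse=True)


def plot_rank_frequency(counts):
    """Rank/frequency diagram on log-log axes."""
    import matplotlib.pyplot as plt  # only needed for plotting

    freqs = rank_frequency(counts)
    plt.loglog(range(1, len(freqs) + 1), freqs, ".")
    plt.xlabel("rank")
    plt.ylabel("frequency")
    plt.show()
```

Typical use: `plot_rank_frequency(pseudoword_frequencies(random_text(seed=0)))`. Comparing this diagram with the one obtained for Alice in Wonderland is the point of the exercise.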

• To learn more about lexical frequencies:

A solution: `Benford-law/Benford.py`