10. Representations of numbers, text, images 

10.1. Representation of integers 

There are 10 kinds of people: those who count in binary and the others.

Computers represent everything as sequences of 0s and 1s, also known as bits (short for “binary digits”).

A number represented in base b by four digits d₃d₂d₁d₀ has the value: d₃.b³ + d₂.b² + d₁.b¹ + d₀.b⁰

In binary, there are only two possible digits: 0 and 1.
In decimal, there are 10 possible digits: 0 to 9.
In hexadecimal, there are 16 possible digits: 0-9, A, B, C, D, E, F (A = 10, B = 11, …, F = 15). For example: D8F1 = 13*16^3 + 8*16^2 + 15*16^1 + 1*16^0 = 55537

Just like a number can be written in base 10, it can be written in base 2 (or in any other base):

12 = 10 + 2 = 1.(10^1) + 2.(10^0) => '12' in base 10
12 = 8 + 4 = 2^3 + 2^2 => '1100' in base 2

33 = 30 + 3 = 3.(10^1) + 3.(10^0) => '33' in base 10
33 = 32 + 1 = 2^5 + 2^0 => '100001' in base 2

Here are the binary representations of the first integers:

0 :   0
1 :   1
2 :  10
3 :  11
4 : 100
5 : 101
6 : 110
7 : 111
...

To learn more about how integer numbers are represented in binary format, you can check out http://csunplugged.org/binary-numbers

Exercise 1: Convert (manually) the following binary numbers into decimal:

101
1000
1011
11111111

…

Answers: 5, 8, 11, 255

10.1.1. From binary to decimal 

Exercise 2: Let us write a function that, given the binary representation of a number as a string of 0s and 1s, returns its value as an integer.

Let us first suppose that we want to convert a string containing exactly 8 binary digits (e.g. ‘01011010’) into decimal. How would you do that?

…

def todec8bits(s):
    """ converts a 8 bits string (binary representation)into a integer """
    return int(s[0])*128 + int(s[1])*64 + int(s[2])*32 + \
           int(s[3])*16 + int(s[4])*8 + int(s[5])*4 + \
           int(s[6])*2 + int(s[7])

todec8bits("00001010")
todec8bits("01010101")

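One issue with this code is that it only handles 8-bit strings: a shorter string raises an IndexError, and a longer one is silently truncated to its first 8 digits: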

todec8bits("0101010")
todec8bits("010101010")

A better version of the function checks the length of its input:

def todec8bits(s):
    """ converts a 8 bits string (binary representation)into a integer """
    assert len(s) == 8  # 's' should be exactly 8 bits long
    return int(s[0])*128 + int(s[1])*64 + int(s[2])*32 + \
           int(s[3])*16 + int(s[4])*8 + int(s[5])*4 + \
           int(s[6])*2 + int(s[7])

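Remark: at the hardware level, integers are usually stored on 32 or 64 bits, depending on the processor and operating system. Python's own int type has no such limit (it grows as needed), but many libraries, such as numpy, use fixed-size integers.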

Why is this relevant? Suppose you record EEG from 256 electrodes, sampling every millisecond for one hour. How large is the data?
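A quick way to work out the answer, as a sketch assuming raw, uncompressed samples stored on 32 or 64 bits each (eeg_size_bytes is a hypothetical helper, and GB is taken as 1024³ bytes, as in the memory-size remark further down):

```python
def eeg_size_bytes(n_electrodes, samples_per_s, duration_s, bytes_per_sample):
    """Hypothetical helper: bytes needed for a raw recording (no compression)."""
    return n_electrodes * samples_per_s * duration_s * bytes_per_sample

# 256 electrodes, one sample per millisecond (1000 Hz), one hour (3600 s)
for bytes_per_sample in (4, 8):            # 32-bit or 64-bit values
    size = eeg_size_bytes(256, 1000, 3600, bytes_per_sample)
    print(bytes_per_sample * 8, "bits:", size, "bytes, about",
          round(size / 1024**3, 2), "GB")
```

With 32-bit samples this gives 921,600,000 values, i.e. roughly 3.4 GB; doubling the sample size doubles the total.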

Beware: in some programming languages, adding numbers that are too large can silently give wrong results, because the result no longer fits in the available bits (this is called integer overflow)!
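Python's own int never overflows, so the problem cannot be shown directly with it. The sketch below simulates what happens when a result is stored in a signed 32-bit integer using two's complement (the wrap-around you get, for instance, with numpy's int32 arrays); wrap_int32 is a hypothetical helper written for illustration:

```python
def wrap_int32(x):
    """Hypothetical helper: the value x takes once stored in a signed 32-bit
    integer (two's complement), i.e. x wrapped into [-2**31, 2**31 - 1]."""
    return (x + 2**31) % 2**32 - 2**31

big = 2**31 - 1              # largest signed 32-bit integer: 2147483647
print(big + 1)               # Python int: 2147483648, no overflow
print(wrap_int32(big + 1))   # as a 32-bit integer: -2147483648
```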

Here is another solution, demonstrating several Python features (list comprehensions, zip, the augmented assignment operator +=, …):

def todec(s):
    """ converts a 8 bit strings into an integer """
    assert len(s) == 8  # 's' should be exactly 8 bits long
    pow2 = [2 ** n for n in range(7, -1, -1)]
    n = 0
    for b, p in zip(s, pow2):
        n += int(b) * p
    return n

…

Exercise: modify the function above to handle strings of any size as input.

Here is code that works with strings of any length:

def todec(s):
    """ convert a string of 0 and 1 representing a binary number into an integer """
    n = 0
    for b in s:
        n = n * 2 + int(b)
    return n

for i in ['101', '1000', '1011', '11111111']:
    print(todec(i))

Can you see how and why it works?
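One way to see it is to print the running value after each bit. The sketch below is the same loop as todec (todec_verbose is a hypothetical name): multiplying by 2 shifts the digits read so far one position to the left, then the new bit is added. This is known as Horner's scheme, and Python's built-in int(s, 2) gives the same result.

```python
def todec_verbose(s):
    """Same algorithm as todec, printing the running value after each bit."""
    n = 0
    for b in s:
        n = n * 2 + int(b)   # shift previous digits left, then add the new bit
        print("after bit", b, ": n =", n)
    return n

todec_verbose('1011')   # n goes 1, 2, 5, 11
print(int('1011', 2))   # built-in conversion, also 11
```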

10.1.2. From decimal to binary 

Now we will go in the other direction: our aim is to write a program that, given a number (in decimal), computes its binary representation.

Exercise: if you already have an idea of how to program it, go ahead. Otherwise, follow these steps:

Examine the script below and execute it for various values of the variable num. Note that % is the modulo operator, which produces the remainder of an integer division: if x and y are integers, then x % y is the remainder when x is divided by y. Similarly, x // y is the quotient of the integer division (the result of x / y rounded down).

Do you understand the last line? Do you see a limitation of this program?

num = 143
d3 = (num // 1000) % 10  # thousands (// is integer division)
d2 = (num // 100) % 10   # hundreds
d1 = (num // 10) % 10    # tens
d0 = num % 10            # units
print(str(d3) + str(d2) + str(d1) + str(d0))

Adapt the above program to print the binary representation of num

…

num = 17
b0 = num % 2
b1 = (num // 2) % 2
b2 = (num // 4) % 2
b3 = (num // 8) % 2
b4 = (num // 16) % 2
b5 = (num // 32) % 2
b6 = (num // 64) % 2
b7 = (num // 128) % 2
b8 = (num // 256) % 2
print(str(b8) + str(b7) + str(b6) + str(b5) + str(b4) + str(b3) + str(b2) + str(b1) + str(b0))

…

Modify the above program to print the binary representations of all the integers between 0 and 255.

…

def tobin(num):
    """ Returns the binary representation (a string of 8 bits) of num, for 0 <= num <= 255 """
    b7 = (num // 128) % 2
    b6 = (num // 64) % 2
    b5 = (num // 32) % 2
    b4 = (num // 16) % 2
    b3 = (num // 8) % 2
    b2 = (num // 4) % 2
    b1 = (num // 2) % 2
    b0 = num % 2
    return (str(b7) + str(b6) + str(b5) + str(b4) +
            str(b3) + str(b2) + str(b1) + str(b0))

for n in range(256):
    print(n, ':', tobin(n))

…

(Advanced) Write an improved version that uses a loop and does not have a limitation in size.

…

def binary(n):
    """ returns the binary representation of ``n`` """
    if n == 0:
        return '0'
    s = ''
    while n > 0:
        b = str(n % 2)
        s = b + s
        n = n // 2
    return s
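Python also has built-in conversions that are handy for checking the functions above: bin(n) and format(n, 'b') give the binary representation (bin adds a '0b' prefix), and int(s, 2) converts a binary string back into an integer.

```python
for n in [0, 5, 17, 143, 255]:
    b = format(n, 'b')          # binary digits, without prefix
    print(n, b, bin(n), int(b, 2))
# e.g. the line for 17 reads: 17 10001 0b10001 17
```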

…

Study the following code. Do you understand why it works?

def binary(num):
    """ returns the binary representation of ``num`` """
    if num == 0:
        return '0'
    if num == 1:
        return '1'
    return binary(num // 2) + binary(num % 2)   # digits of num // 2, then the last bit

print(binary(1234))

…

Answer: it is a recursive function, that is, a function that calls itself. See http://en.wikipedia.org/wiki/Recursion_%28computer_science%29
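To watch the recursion unfold, the following sketch mirrors binary() but prints each call, indented by its depth (binary_traced and its depth parameter are hypothetical additions for illustration). For 6, the call tree is binary(6) → binary(3) + binary(0), and binary(3) → binary(1) + binary(1), which assembles '1' + '1' + '0' = '110'.

```python
def binary_traced(num, depth=0):
    """Same recursion as binary(), printing each call indented by its depth."""
    print('  ' * depth + 'binary(' + str(num) + ')')
    if num == 0:
        return '0'
    if num == 1:
        return '1'
    # binary digits of num // 2, followed by the last bit num % 2
    return binary_traced(num // 2, depth + 1) + binary_traced(num % 2, depth + 1)

print(binary_traced(6))
```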

…

Remark: measures of memory size

1 byte = 8 bits
1 Kilobyte (KB) = 1024 bytes
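1 Megabyte (MB) = 1024 KB = 1048576 bytes
1 Gigabyte (GB) = 1024 MB
Terabyte, Petabyte, Exabyte…
(In the SI convention, a kilobyte is 1000 bytes; the 1024-based units are then called kibibyte (KiB), mebibyte (MiB), gibibyte (GiB).)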

Exercise (advanced): Write a function that returns the hexadecimal representation (base 16) of a number.
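One possible sketch generalizes the loop of binary() to any base from 2 to 16: the remainder n % base picks the next digit from a string of digit symbols. The name to_base is hypothetical; Python's built-in hex() can be used to check the result.

```python
def to_base(n, base=16):
    """Return the representation of a non-negative integer n in base 2 to 16."""
    digits = '0123456789ABCDEF'
    if n == 0:
        return '0'
    s = ''
    while n > 0:
        s = digits[n % base] + s   # least significant digit is computed first
        n = n // base
    return s

print(to_base(55537))      # D8F1
print(hex(55537))          # 0xd8f1 (built-in, lowercase with prefix)
print(to_base(255, 2))     # 11111111
```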

To go further:

If you want to know how negative integer numbers are represented, see http://en.wikipedia.org/wiki/Two%27s_complement
To understand how real numbers (a.k.a. “floats”) are encoded, read “What Every Programmer Should Know About Floating-Point Arithmetic” and https://docs.python.org/2/tutorial/floatingpoint.html#tut-fp-issues
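Two quick illustrations of these topics, as a sketch. On 8 bits, two's complement stores -5 as 256 - 5 = 251; masking with 0xFF shows those 8 bits. For floats, 0.1 and 0.2 have no exact binary representation, so their sum is not exactly 0.3.

```python
# Two's complement: the 8-bit pattern of -5 is the binary form of 251
print(format(-5 & 0xFF, '08b'))                 # 11111011
print((-5).to_bytes(1, 'big', signed=True))     # b'\xfb' (0xfb = 251)

# Floats: tiny representation errors show up in arithmetic
print(0.1 + 0.2)          # 0.30000000000000004
print(0.1 + 0.2 == 0.3)   # False
```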

10.2. Representation of text 

A text file is nothing but a sequence of characters.

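For a long time, characters were encoded using the ASCII code, which associates a number between 0 and 127 with each character (letters, digits, punctuation and control characters).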

In Python, you can get the code of a character with the function ord:

print(ord('a'))
print(ord('@'))

The inverse of ord is chr.

Look up the ASCII codes of the letters of your first name in an ASCII table, and use Python's chr function to print it.

…

For example, if your name is ‘ZOE’, you would type:

print(chr(90)+chr(79)+chr(69))
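The lookup can also be done in the other direction with ord, which links back to the binary representation of integers: each ASCII code fits in one byte. A minimal sketch:

```python
name = 'ZOE'
codes = [ord(c) for c in name]              # characters -> ASCII codes
print(codes)                                # [90, 79, 69]
print(''.join(chr(c) for c in codes))       # codes -> characters: ZOE
print([format(c, '08b') for c in codes])    # each code as 8 bits
# ['01011010', '01001111', '01000101']
```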

Remark: ASCII characters are stored using one byte (8 bits) each (ASCII itself only defines 128 codes, which fit in 7 bits). This is fine for English, but cannot cover all the characters of all alphabets. ASCII cannot even encode French accented letters.

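Unicode was invented to associate a unique number (a “code point”) with each character of any human script. There are over a million possible code points (from 0 to 0x10FFFF), more than 2 bytes can hold. Text could be stored using a fixed number of bytes per character, but it is more economical to use a variable-length encoding such as UTF-8, which encodes the most common (ASCII) characters with one byte, keeping compatibility with ASCII, and the other characters with 2 to 4 bytes.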
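To see UTF-8's variable length at work, the sketch below encodes a few characters with str.encode and prints each code point (in hexadecimal), the number of bytes used, and the bytes themselves. The characters are written as escape sequences ('\u00e9' is é, '\u20ac' is €, '\U0001F600' is an emoji) so that the source stays plain ASCII.

```python
chars = ['a', '\u00e9', '\u20ac', '\U0001F600']
for c in chars:
    encoded = c.encode('utf-8')              # a bytes object
    print(hex(ord(c)), len(encoded), encoded)
# 0x61 1 b'a'
# 0xe9 2 b'\xc3\xa9'
# 0x20ac 3 b'\xe2\x82\xac'
# 0x1f600 4 b'\xf0\x9f\x98\x80'
```

Decoding reverses the process: b'\xc3\xa9'.decode('utf-8') gives back 'é'.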

print("".join([chr(c) for c in range(20000, 21000)]))

10.2.1. Strings 

In Python, text can be stored in objects called strings.

String constants are enclosed between single quotes:

'Bonjour le monde!'

Or double quotes:

"Bonjour le monde !"

Or “triple” quotes for multiline strings:

"""
Bonjour le monde!

Longtemps je me suis levé de bonne heure,
Les sanglots longs des violons,
...
"""

They have type str:

type('bonjour')

To convert an object to a string representation:

str(10)
a = dict((("a",1), ("b",2)))
str(a)

A string is nothing but a sequence of characters:

a = 'bonjour'
print(a[0])
print(a[1])
print(a[2])
print(a[2:4])
print(len(a))

for c in 'bonjour':
    print(c)

Operations on strings:

a = 'bonjour'
b = 'hello'
a + b
a + ' ' + b

Strings come with many methods to manipulate them. They can be called as functions of the type str, or, more commonly, directly on a string:

str.upper(a)
str.lower('ENS')
a.upper()
'ENS'.lower()

10.2.2. search/replace a substring within a string 

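a = 'alain marie jean marc'
print(a.find('alain'))   # index of the first occurrence: 0
print(a.find('marie'))
print(a.find('ma'))
print(a.find('marc'))
print(a.find('o'))       # -1 means "not found"

b = a.replace('marie', 'claude')   # returns a new string
print(a)   # unchanged: strings cannot be modified (they are immutable)
print(b)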

10.2.3. splitting a string at delimiters 

a = 'alain marie jean marc'
a.split(" ")

Read https://docs.python.org/3/library/stdtypes.html#string-methods to learn about more string functions.

10.2.4. Interactive input from the command line 

name = input('Comment vous appelez-vous ? ')
print("Bonjour " + name + '!')

10.2.5. Reading and writing to text files 

With a text editor (e.g. Atom), create a text file containing a few lines of arbitrary content, and save it as ‘test.txt’.
Then, with ipython running in the same directory where you saved test.txt, execute:

with open('test.txt', 'r') as f:
    o = f.read()
    print(o)
    lines = o.split("\n")
    print(lines)
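Writing works the same way, with the mode 'w' instead of 'r'. A minimal sketch, assuming a hypothetical file name output.txt in the current directory (the file is overwritten if it already exists):

```python
lines = ['first line', 'second line', 'third line']

# 'w' creates the file, or overwrites it if it already exists
with open('output.txt', 'w') as f:
    for line in lines:
        f.write(line + '\n')    # write() does not add a newline by itself

# read it back to check
with open('output.txt') as f:
    print(f.read().splitlines())   # ['first line', 'second line', 'third line']
```

Using the mode 'a' instead of 'w' appends to the end of an existing file.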

10.2.6. Counting lines and words in a text file 

Download the text of Alice in Wonderland and save it as ‘alice.txt’.

with open('alice.txt') as f:
    o = f.read()
    print(o)
    lines = o.split("\\n")
    print(lines)

Exercise: Write a program that counts the number of lines and the number of words in alice.txt (we suppose that words are separated by spaces).

…

with open('alice.txt') as f:
    o = f.read()

lines = o.splitlines()    # one element per line

nlines = len(lines)

nw = 0
for line in lines:
    nw += len(line.split())   # split() ignores repeated spaces and empty lines

print(nlines)
print(nw)

Write a program that detects if a text file contains the word ‘NSA’

…

def spot_nsa(filename):
    """ detects if the text file pointed to by filename contains 'NSA' """
    with open(filename) as f:
        o = f.read()
        lines = o.split("\n")
        found = False
        for l in lines:
            if "NSA" in l.split(" "):
                found = True
                break
    return found
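Because spot_nsa splits lines at spaces, punctuation stays attached to words, so a line such as "The NSA, they said" is not detected ("NSA," is not "NSA"). A variant using a regular expression from Python's re module avoids this: \b marks a word boundary, so 'NSA' matches as a whole word even next to punctuation, but not inside a longer word. The names spot_nsa_re and nsa_test.txt are hypothetical, used for illustration:

```python
import re

def spot_nsa_re(filename):
    """Variant of spot_nsa: True if 'NSA' appears as a whole word in the file."""
    with open(filename) as f:
        text = f.read()
    return re.search(r'\bNSA\b', text) is not None

# small test file
with open('nsa_test.txt', 'w') as f:
    f.write('The NSA, they said, was listening.\n')

print(spot_nsa_re('nsa_test.txt'))   # True
```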

10.3. Representation of images 

Images can be stored either:

as bitmaps, that is, two-dimensional arrays of dots (formats: bmp, png, gif, jpeg…)
in vector formats, where the file contains instructions for drawing objects (eps, pdf, svg, …).

Here we are just going to manipulate bitmaps.

In a black and white bitmap, each dot (pixel) is either ‘0’ (black) or ‘1’ (white).

What is the size in kilobytes of a 1024x768-pixel black and white image?

…

Answer: 1024*768/8/1024=96 KB

Execute the following code in ipython:

import numpy as np
import matplotlib.pyplot as plt

a = np.array([[0, 0, 0, 0, 0, 0, 0],
              [0, 0, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 0, 0],
              [0, 0, 1, 0, 1, 0, 0],
              [0, 0, 1, 1, 1, 0, 0],
              [0, 0, 1, 1, 1, 0, 0],
              [0, 0, 0, 0, 0, 0, 0]])
plt.imshow(a, cmap=plt.cm.gray, interpolation='nearest')
plt.show()

Numpy arrays are a new type of object. They are similar to lists, but optimised for mathematical computations. Notably, they can be multidimensional (i.e. you can use the a[i, j] notation). You can learn more about arrays at https://scipy-lectures.org/ and https://www.projectpro.io/data-science-in-python-tutorial/numpy-python-tutorial
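A few basic array operations, as a sketch. The dtype=np.uint8 argument (unsigned 8-bit integers) is a choice made here so that each element takes exactly one byte, like a pixel of a 256-grey-level image:

```python
import numpy as np

a = np.array([[0, 0, 1],
              [1, 1, 0]], dtype=np.uint8)
print(a.shape)     # (2, 3): 2 rows, 3 columns
print(a[1, 0])     # 1: row 1, column 0 (indices start at 0)
print(a * 255)     # operations apply to every element at once
print(a.nbytes)    # 6: six elements of one byte each
```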

Here is another example:

…

a = np.zeros((200, 200))
for i in range(200):
    a[i, i] = 1          # white pixels along the diagonal
plt.imshow(a, cmap=plt.cm.gray, interpolation='nearest')
plt.show()

a[0:200:2, :] = 1        # every other row (0, 2, 4, ...) set to white
plt.imshow(a, cmap=plt.cm.gray, interpolation='nearest')
plt.show()

10.3.1. Grey level pictures 

Each dot is now associated with an integer value, e.g. ranging from 0 to 255 for 8-bit codes, coding for a grey level (smaller = darker). Each dot therefore needs one byte.

How large is the file for an image 1024x768 pixels with 256 grey levels?
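A quick way to check your answer: the arithmetic is the same as for the black and white image, with 8 bits per pixel instead of 1 (this counts only the raw pixel data, ignoring any file header or compression):

```python
width, height = 1024, 768
bits_per_pixel = 8                      # 256 grey levels = 2**8
size_bytes = width * height * bits_per_pixel // 8
print(size_bytes, "bytes =", size_bytes / 1024, "KB")   # 786432 bytes = 768.0 KB
```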

The following code displays an image:

from skimage import data
from skimage.color import rgb2gray

original = data.astronaut()
grayscale = rgb2gray(original)
plt.imshow(grayscale,  cmap=plt.cm.gray)
plt.show()

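This code applies a low-pass (blurring) filter to it, namely a Gaussian filter, which replaces each pixel by a weighted average of its neighbours (the second argument, 3, is the standard deviation of the Gaussian, in pixels):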

import scipy.ndimage
bl = scipy.ndimage.gaussian_filter(grayscale, 3)
plt.imshow(bl,  cmap=plt.cm.gray)
plt.show()

Edge detector: it is easy to implement an edge detector with a neural network. See https://courses.cit.cornell.edu/bionb2220/UnderstandingLateralInhibition.html.

Using the scipy.ndimage.convolve function, apply the following filters to the image and display the results.

from skimage import data
from skimage.color import rgb2gray

original = data.astronaut()
grayscale = rgb2gray(original)

kernel1 = np.array([[-1, -1, -1],
                    [-1,  8, -1],
                    [-1, -1, -1]])

bl = scipy.ndimage.convolve(grayscale, kernel1)
plt.imshow(bl,  cmap=plt.cm.gray)
plt.show()

kernel2 = np.array([[-1, -1, -1, -1, -1],
                    [-1,  1,  2,  1, -1],
                    [-1,  2,  4,  2, -1],
                    [-1,  1,  2,  1, -1],
                    [-1, -1, -1, -1, -1]])
bl = scipy.ndimage.convolve(grayscale, kernel2)
plt.imshow(bl,  cmap=plt.cm.gray)
plt.show()
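To see what such a filter computes, here is a hand-written sketch on a tiny image (filter3x3 is a hypothetical helper, not part of scipy). Each output pixel is the weighted sum of the 3x3 neighbourhood of the input pixel. With kernel1, a uniform region gives 0 (8 times the centre minus its 8 equal neighbours), while pixels on an edge give non-zero values. Strictly speaking, scipy.ndimage.convolve flips the kernel and handles the borders differently; flipping makes no difference for symmetric kernels such as kernel1, and this sketch simply leaves border pixels at 0.

```python
import numpy as np

def filter3x3(image, kernel):
    """Hypothetical helper: weighted sum of each pixel's 3x3 neighbourhood.
    Border pixels are left at 0 to keep the code short."""
    out = np.zeros(image.shape)
    for i in range(1, image.shape[0] - 1):
        for j in range(1, image.shape[1] - 1):
            neighbourhood = image[i-1:i+2, j-1:j+2]
            out[i, j] = np.sum(neighbourhood * kernel)
    return out

image = np.zeros((9, 9))
image[3:6, 3:6] = 1        # a 3x3 white square on a black background

kernel1 = np.array([[-1, -1, -1],
                    [-1,  8, -1],
                    [-1, -1, -1]])
out = filter3x3(image, kernel1)
print(out[4, 4])   # centre of the square (uniform): 0.0
print(out[3, 3])   # corner of the square (edge): 5.0
print(out[2, 2])   # just outside the corner: -1.0
print(out[1, 1])   # black background away from the square: 0.0
```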

More manipulations are available at http://scipy-lectures.github.io/advanced/image_processing/.

10.3.2. Colored bitmaps 

Each dot is now associated with three bytes, representing the Red, Green and Blue intensities (see http://www.colorpicker.com/).

How large is the file for a 1024x768 RGB image?

Exercise: What are the RGB triplets for BLACK, WHITE, RED, YELLOW?

from skimage import data
plt.imshow(data.astronaut())
plt.show()
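One possible answer to the two questions above, as a sketch: a 1x4 colour image stored as a numpy array whose third axis holds the (R, G, B) intensities from 0 to 255 (dtype=np.uint8 gives one byte per value). Yellow is obtained by mixing red and green light at full intensity.

```python
import numpy as np

colours = np.array([[[0, 0, 0],          # black: no light
                     [255, 255, 255],    # white: all three at full intensity
                     [255, 0, 0],        # red
                     [255, 255, 0]]],    # yellow: red + green
                   dtype=np.uint8)
print(colours.shape)     # (1, 4, 3): 1 row, 4 columns, 3 bytes per pixel

size = 1024 * 768 * 3    # a 1024x768 RGB image, 3 bytes per pixel
print(size, "bytes =", size / 1024**2, "MB")   # 2359296 bytes = 2.25 MB
```

To display the colours, pass the array to plt.imshow(colours, interpolation='nearest'), as in the examples above.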