Thursday, May 10, 2012

Basic linear algebra with Numpy on Python 3.2

I'm taking Stanford's online course on machine learning through Coursera. The second week has a good overview of linear algebra and matrix operations. The instructor has provided a useful PowerPoint deck in which he explains the basics. I'm going to go through this pdf and implement the linear algebra using NumPy. I have NumPy 1.6 installed with Python 3.2.
Ml SlideMachine Learning: Linear Algebra Reviews Lecture3
First, open an interactive Python shell and load NumPy:
>>> from numpy import *

Create a Matrix

Let's try just creating the 4x2 matrix he shows in slides 2 and 3.The basic form for creating arrays is to use the array method with parenthesis:
a = array()
Inside the parenthesis, nest some square brackets, and in those brackets just put comma-separated lists of elements in more square brackets. Each square bracket represents a row. Make sure they all have the same number of columns. So to create the 4x2 matrix from slides 2 and 3 use the following
>>> a = array([[1402,191],[1371,821],[949,1437]])
>>> print(a)
[[1402  191]
 [1371  821]
 [ 949 1437]] 

Matrix Elements

Next he talks about accessing the matrix element-wise, so a11 should access '1402.'
>>> a[1,1]
821
Instead of returning the first row of the first column, it gave us the second row of the second column. This is because NumPy is using 0-indexing (start at 0) instead of 1-indexing (start at 1). So to get the first row of the first column we index from 0:
>>> a[0,0]
1402

Matrix Addition

Next let's create two 3x2 matrices and add them together. First, instantiate the matrices:
>>> b = array([[1,0],[2,5],[3,1]])
>>> c = array([[4,0.5],[2,5],[0,1]])
>>> print(b)
[[1 0]
 [2 5]
 [3 1]]
>>> print(c)
[[ 4.   0.5]
 [ 2.   5. ]
 [ 0.   1. ]]
Add b and c and see what happens:
>>> b+c
array([[  5. ,   0.5],
       [  4. ,  10. ],
       [  3. ,   2. ]])
As you can see, NumPy correctly performed an element-wise addition.

Scalar Multiplication

Next, multiply a scalar by a 3x2 matrix. We'll use matrix 'b' from above:

>>> 3 * b
array([[ 3,  0],
       [ 6, 15],
       [ 9,  3]], dtype=int32)
Again, NumPy correctly multiplied each element of the matrix by 3. Note that this can be done in any order (i.e. scalar * matrix = matrix * scalar).
Division is just multiplication by a fraction:
>>> d / 4
array([[ 1.  ,  0.  ],
       [ 1.5 ,  0.75]])

Combination of Operands

Order of operations is important. In this slide the instructor sets up three vectors (3x1) and provides an example in which he multiples by a scalar then adds then subtracts then divides. Let NumPy do it and see what happens:
>>> e = array([[1],[4],[2]])
>>> f = array([[0],[0],[5]])
>>> g = array([[3],[0],[2]])
>>> 3 * e + f - g / 3
array([[  2.        ],
       [ 12.        ],
       [ 10.33333333]])
As before, NumPy produces the same answer as the instructor found by doing it by hand.

Matrix-vector multiplication

We can multiply a matrix by a vector as long as the number of columns of the matrix is the same as the number of rows of the vector. In other words, the matrix must be as wide as the vector is long.
>>> h = array([[1,3],[4,0],[2,1]]) # 3x2
>>> i = array([[1],[5]]) # 2x1
>>> h * i
Traceback (most recent call last):
  File "<pyshell#54>", line 1, in <module>
    h * i
ValueError: operands could not be broadcast together with shapes (3,2) (2,1)
My multiplication operation didn't work, and it helpfully gave me the shapes of the arrays for which multiplication failed. This is because the multiplication operator '*' causes element-wise multiplication. For that to work, the matrices need to be of the same shape (hence the error message shosed us the shapes were different). What we want is the dot product.

The Dot Product

To multiply two matrices using the dot product, use the dot() method:
>>> h = array([[1,3],[4,0],[2,1]]) # 3x2
>>> i = array([[1],[5]]) # 2x1
>>> dot(h,i)
array([[16],
       [ 4],
       [ 7]])
That gives us the correct answer, according to the slides. A longer example:
>>> j = array([[1,2,1,5],[0,3,0,4],[-1,-2,0,0]])
>>> k = array([[1],[3],[2],[1]])
>>> dot(j,k)
array([[14],
       [13],
       [-7]])
This one checks out with the slides as well.

Matrix-Matrix Multiplication

As far as NumPy is concerned, matrix-matrix multiplication is just like matrix-vector multiplication.
>>> l = array([[1,3,2],[4,0,1]])
>>> m = array([[1,3],[0,1],[5,2]])
>>> dot(l,m)
array([[11, 10],
       [ 9, 14]])
He goes on to give one more example with a pair of 2x2 matrices:
>>> n = array([[1,3],[2,5]]) # 2x2
>>> o = array([[0,1],[3,2]]) # 2x2
>>> dot(n,o)
array([[ 9,  7],
       [15, 12]])
Next he provides some context by using the example of home prices. He sets up a 4x2 and 2x3 matrix and multiplies them to quickly come up with price predictions:
>>> p = array([[1,2104],[1,1416],[1,1534],[1,852]]) # 4x2
>>> q = array([[-40,200,-150],[0.25,0.1,0.4]]) # 2x3
>>> dot(p,q)
array([[ 486. ,  410.4,  691.6],
       [ 314. ,  341.6,  416.4],
       [ 343.5,  353.4,  463.6],
       [ 173. ,  285.2,  190.8]])

Matrix Multiplication Properties

Show that matrix multiplication is not commutative:
>>> A = array([[1,1],[0,0]]) # 2x2
>>> B = array([[0,0],[2,0]]) # 2x2
>>> dot(A,B)
array([[2, 0],
       [0, 0]])
>>> dot(B,A)
array([[0, 0],
       [2, 2]])
To test this another way I asserted equality between the two operations and found a neat element-wise comparison:
>>> dot(A,B) == dot(B,A)
array([[False,  True],
       [False, False]], dtype=bool)
To show the associative property of arrays, create another 2x2 array C and multiply them in different groupings, but in the same order, to show that the result is always the same:
>>> C = array([[1,3],[0,2]]) # 2x2
>>> A = array([[1,1],[0,0]]) # 2x2
>>> B = array([[0,0],[2,0]]) # 2x2
>>> C = array([[1,3],[0,2]]) # 2x2
>>> dot(A,dot(B,C))
array([[2, 6],
       [0, 0]])
>>> dot(dot(A,B),C)
array([[2, 6],
       [0, 0]])

Identity Matrix

NumPy comes with a built-in function for producing an identity matrix. Just pass it the dimension (numnber of rows or columns) as the argument. Optionally tell it to output elements as integers in order to clean up the output:
>>> identity(3)
array([[ 1.,  0.,  0.],
       [ 0.,  1.,  0.],
       [ 0.,  0.,  1.]])
>>> identity(3, dtype=int)
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1]])
Show that for any matrix A, AI=IA=A. We can use the same A from before:
>>> A = array([[4,2,1],[4,8,3],[1,1,0]]) # 3x3
>>> I = identity(3, dtype=int)
>>> dot(A,I)
array([[4, 2, 1],
       [4, 8, 3],
       [1, 1, 0]])
>>> A
array([[4, 2, 1],
       [4, 8, 3],
       [1, 1, 0]])
>>> dot(A,I)==dot(I,A)
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)
>>> dot(A,I) == A
array([[ True,  True,  True],
       [ True,  True,  True],
       [ True,  True,  True]], dtype=bool)

Inverse and Transpose

Show that if A is an mxm matrix, and if it has an inverse, then A(A-1) = (A-1)A = I. To do this, we can use the same 3x3 matrix A from above:
>>> A = array([[4,2,1],[4,8,3],[1,1,0]]) # 3x3
>>> inv(A)
Traceback (most recent call last):
  File "<pyshell#112>", line 1, in <module>
    inv(A)
NameError: name 'inv' is not defined
We got this error because though we loaded NumPy, we need to also load the special linear algebra library:
>>> from numpy.linalg import *
Then we can try the inversion again:
>>> inv(A)
array([[ 0.3, -0.1,  0.2],
       [-0.3,  0.1,  0.8],
       [ 0.4,  0.2, -2.4]])
Now show that that A(A-1) = (A-1)A = I:
>>> dot(A, inv(A))
array([[  1.00000000e+00,  -2.77555756e-17,   0.00000000e+00],
       [ -2.22044605e-16,   1.00000000e+00,   0.00000000e+00],
       [ -5.55111512e-17,  -1.38777878e-17,   1.00000000e+00]])
>>> dot(inv(A), A)
array([[  1.00000000e+00,   0.00000000e+00,  -5.55111512e-17],
       [ -2.22044605e-16,   1.00000000e+00,  -5.55111512e-17],
       [  0.00000000e+00,   0.00000000e+00,   1.00000000e+00]])
Note that this was meant to return the identity matrix, but that the float operations returned not zeros but numbers very close to zero. Note also that a matrix of all zeros has no inverse:
>>> C = array([[0,0],[0,0]])
>>> C
array([[0, 0],
       [0, 0]])
>>> inv(C)
Traceback (most recent call last):
  File "<pyshell#121>", line 1, in <module>
    inv(C)
  File "C:\Python32\lib\site-packages\numpy\linalg\linalg.py", line 445, in inv
    return wrap(solve(a, identity(a.shape[0], dtype=a.dtype)))
  File "C:\Python32\lib\site-packages\numpy\linalg\linalg.py", line 328, in solve
    raise LinAlgError('Singular matrix')
numpy.linalg.linalg.LinAlgError: Singular matrix
The error message agrees with the machine learning instructor, who calls these special matrices "singular" or "degenerate."

Matrix Transpose

Lastly, show some matrix transposition, whereby the rows are flipped element-wise:

>>> A = array([[1,2,0],[3,5,9]])
>>> A.transpose()
array([[1, 3],
       [2, 5],
       [0, 9]])
If we name the transposed array 'B', then we can show that the ijth element of A is the jith element of B:
>>> B = A.transpose()
>>> A[0,2] == B[2,0]
True
>>> A[1,2] == B[2,1]
True 

Conclusion

We used NumPy and NumPy's library linalg to go through the linear algebra review slides from Coursera's Machine Learning course.





Wednesday, May 2, 2012

Installing NumPy for Python 3 in Windows 7

It's time to do some scientific computing, which, in the Python world, means using NumPy. I'm in a Windows 7 (64-bit) environment running Python 3.2.3 (64-bit).

Getting NumPy installed for Python 2 or Python 3 in Ubuntu was easy. Getting it to work in Windows turned out to be more tricky.

The Short Takeaway

  • In a Windows 7 environment (even a 64-bit Windows 7 environment), you must install the 32-bit version of Python 3. The 64-bit version will not work with NumPy 1.6. 
  • Furthermore, the 32-bit version of Python 3 must be installed 'just for me', and not 'for everyone on this computer'. 
  • Finally, make sure you select the proper NumPy version (for Python 3.2), not the default version from SourceForge (which is for Python 2.6).
In this post I'm assuming you have already installed Python 3 and that you're running Windows 7. Specifically, I'm running Windows 7 Professional, 64-bit, Service Pack 1. What follows is the whole story of the troubleshooting, in case it helps out anyone else having the same issues.

Step 1: Ensure that NumPy Isn't Already Installed

Open a Python prompt and ensure that you don't already have NumPy installed:

>>> from numpy import *
Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    from numpy import *
ImportError: No module named numpy

Step 2: Download NumPy from SourceForge

The SourceForge homepage for NumPy can be found at http://sourceforge.net/projects/numpy/. From there find the latest version of NumPy for Windows, and download it to your default download location. For me as write this, that means downloading NumPy 1.6.1. Note that the default SourceForge download link pointed to a version of NumPy compiled for Python 2.6. I had to navigate a bit deeper to find a version for Python 3.

Step 3: Open the installation file

Double-click the file from wherever you downloaded it to to start the installation wizard. It's trustworthy open-source software, so it's safe to click through all the prompts and allow it to be installed.


This leads us to a crucial error:

It reads: Python version 3.2 required, which was not found in the registry. Yikes. I definitely have Python 3.2, but it's telling me it looked in the Windows registry and couldn't find it. I heard somewhere that installing Python for 'just this user (me)' instead of 'for all users of this computer' is one way to get around this.

Step 4: Reinstall Python 'just for me'

So to comply, I uninstall Python, and reinstall it, being careful to install it 'just for me' as shown below.


Step 5: Try installing NumPy again

With Python 3.2.3(64-bit) reinstalled properly, I try the NumPy installer again. This time it finds Python in the registry (presumably), and installs NumPy 1.6 without issue. Now test it out.

Step 6: Test out NumPy

To make sure it installed correctly, go into the Python interpreter and try importing NumPy:
from numpy import *
This returns the following mess:
Traceback (most recent call last):
  File "<pyshell#0>", line 1, in <module>
    from numpy import *
  File "C:\Python32\lib\site-packages\numpy\__init__.py", line 137, in <module>
    from . import add_newdocs
  File "C:\Python32\lib\site-packages\numpy\add_newdocs.py", line 9, in <module>
    from numpy.lib import add_newdoc
  File "C:\Python32\lib\site-packages\numpy\lib\__init__.py", line 4, in <module>
    from .type_check import *
  File "C:\Python32\lib\site-packages\numpy\lib\type_check.py", line 8, in <module>
    import numpy.core.numeric as _nx
  File "C:\Python32\lib\site-packages\numpy\core\__init__.py", line 5, in <module>
    from . import multiarray
ImportError: DLL load failed: %1 is not a valid Win32 application

The last line is telling: "not a valid Win32 application." Some of the online forums seem to suggest that it could be a problem with 64-bit Python. Here's the exact version I'm running: "Python 3.2.3 (default, Apr 11 2012, 07:12:16) [MSC v.1500 64 bit (AMD64)] on win32".

 Step 7: Uninstall 64-bit Python, install 32-bit Python

So I'm going to uninstall NumPy and uninstall this 64-bit version of Python 3.2.3, and in its place install a 32-bit version of Python 3.2.3. Again, be careful to install 'just for me.' When this is done, I have this version installed: "Python 3.2.3 (default, Apr 11 2012, 07:15:24) [MSC v.1500 32 bit (Intel)] on win32". Now try NumPy again.

Step 8: Try installing NumPy again

Using the same NumPy binary as every time before, re-install it. To re-cap, I'm installing this in the context of a 32-bit Python 3.2.3 installation. I get no errors from the installation of NumPy, so it's time to test it.

Step 9: Test NumPy

One more time, in the Python interpreter, try importing NumPy:
>>> from numpy import *
This time it returns nothing, meaning it worked! Try creating a NumPy array, and see if it returns the proper type:
type(array([1,6,3,7]))
<class 'numpy.ndarray'>
That worked too, which means our task of installing NumPy for Python 3 in Windows 7 has been completed.

Conclusion

Here are the condensed steps for getting NumPy to work with Python 3 in Windows 7
  1. Regardless of whether you have a 32-bit or a 64-bit operating system, install the 32-bit version of Python 3.2
  2. Make sure you have installed Python 3.2 'just for me', and not 'for all users of this computer'.
  3. Make sure you download the correct version of NumPy from SourceForge, not the default that it offers as the latest version (which is for Python 2.6 instead of Python 3)

Installing NumPy 1.6 for Python 3 in Ubuntu 12.04

I have a fresh install of Ubuntu 12.04. I want to use NumPy with Python 3. Note that Ubuntu 12.04 ships with Python 2.7 under the 'python' namespace and also ships with Python 3 under the 'python3' namespace.

Go to terminal, check python version

$ python --version
Python 2.7.3
$ python3 --version
Python 3.2.3
We want to use Python 3, so for the rest of this tutorial, make sure you're using python3 and not just python.

Go into the Python 3 interpreter:

$ python3
Python 3.2.3 (default, Apr 12 2012, 21:55:50) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> 

See if you have NumPy already installed:

>>> from numpy import *
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named numpy
Nope, it's not there, so we need to install. Exit the Python interpreter.
>>> exit()

Install NumPy using apt-get

Instead of browsing web forums and finding the right package to download and compile, just let Ubuntu's built-in package manager, apt-get, do the work. This is just like installing NumPy for the standard python installation, but because we need NumPy for the Python3 installation, we'll affix the '3' where needed:
$ sudo apt-get install python3-numpy python3-scipy
Let it have all the dependencies it wants (just type 'yes' when prompted).

See if it worked

To see if it worked, go back into the Python interpreter, import NumPy, create a NumPy array, and make sure it's a NumPy array:
>>> from numpy import *
>>> type(array([1,2,4,5]))
<type 'numpy.ndarray'>

Yup it worked.

Conclusion

In Ubuntu 12.04, just use apt-get to install NumPy for python 3 using
$ sudo apt-get install python3-numpy python3-scipy
Everything just works.

Installing NumPy 1.6 on Python 2.7 in Ubuntu 12.04

I have a fresh install of Ubuntu 12.04. I want to use NumPy, and I'm okay with using the default Python version that ships with Ubuntu 12.04, which is Python 2.7. Note that Ubuntu 12.04 also ships with Python 3 under the 'python3' namespace. 

Go to terminal, check python version

$ python --version
Python 2.7.3

Go into the Python interpreter:

$ python
Python 2.7.3 (default, Apr 20 2012, 22:44:07) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.

See if you have NumPy already installed:

>>> import numpy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named numpy
Nope, it's not there, so we need to install. Exit the Python interpreter.
>>> exit()

Install NumPy using apt-get

Instead of browsing web forums and finding the right package to download and compile, just let Ubuntu's built-in package manager, apt-get, do the work:
$ sudo apt-get install python-numpy python-scipy
Let it have all the dependencies it wants (just type 'yes' when prompted).

See if it worked

To see if it worked, go back into the Python interpreter, import NumPy, create a NumPy array, and make sure it's a NumPy array:
$ python
>>> from numpy import *
>>> type(array([1,2,4,5]))
<type 'numpy.ndarray'>

Yup it worked.

Conclusion

In Ubuntu 12.04, just use apt-get to install NumPy using
$ sudo apt-get install python-numpy python-scipy
Everything just works.