Friday, March 12, 2010

Selecting array elements in R and numpy

I was a bit confused about the behavior of vectorize in numpy (last post), so I posted a question to SO. Of course, they knew the answer immediately, as I described.

But there is one more thing that came up, and that has to do with the selection of elements from an array. I wasn't aware that numpy lets you index an array with a test that returns a boolean vector, but it does. In R, this kind of thing is common. Here is a matrix m. We can leave a row out with a negative index ("-"):

> m = 1:9
> dim(m) = c(3,3)
> m = t(m)
> m
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    4    5    6
[3,]    7    8    9
> m[-1,]
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    7    8    9

The test m[,1] > 2 asks for all rows in which the first column is greater than 2:

> m[,1]
[1] 1 4 7
> sel = m[,1] > 2
> sel
[1] FALSE TRUE TRUE

Note: I screwed this up in the original example. We use the selector to get those rows:

> m[sel,]
     [,1] [,2] [,3]
[1,]    4    5    6
[2,]    7    8    9

How would you do this in Python with numpy?

import numpy as np
A = np.arange(1,10)
A.shape = (3,3)
print(A)

[[1 2 3]
 [4 5 6]
 [7 8 9]]


# keep the rows whose first column is greater than 2
B = A[A[:,0]>2]
print(B)

[[4 5 6]
 [7 8 9]]


# the test by itself gives a boolean array, one entry per row
sel = A[:,0]>2
print(sel)

[False  True  True]


print(A[sel,:])

[[4 5 6]
 [7 8 9]]

It works!
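
One thing the numpy part doesn't cover is the counterpart of R's m[-1,]. A negative index means something different in numpy: A[-1] is the last row, not "everything but the first row". Here is a minimal sketch of a few ways to drop a row, assuming the same 3x3 array as above. The variable names (last_row, keep, no_first_mask and so on) are made up for illustration.

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# In numpy a negative index counts from the end:
# A[-1] is the LAST row, unlike R's m[-1,].
last_row = A[-1]

# Option 1: a plain slice that starts at the second row.
no_first_slice = A[1:]

# Option 2: np.delete returns a copy with row 0 removed.
no_first_delete = np.delete(A, 0, axis=0)

# Option 3: a boolean mask, the same mechanism as sel above.
keep = np.ones(A.shape[0], dtype=bool)
keep[0] = False
no_first_mask = A[keep]

print(last_row)
print(no_first_mask)
```

The mask version is the closest in spirit to the sel example, since any boolean array with one entry per row can be used to pick rows.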