Object Oriented Programming in Python

Think of objects as namespace dictionaries: keys are strings, values are functions or integers or other objects, anything that Python can deal with.
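
The dictionary view can be checked directly. The following is a minimal sketch, assuming a hypothetical class named Point that is not part of these notes; print is called with parentheses and a single argument so the snippet is valid in both Python 2 and Python 3.

```python
class Point:
    """Hypothetical class used only for this illustration."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)

# vars(p) returns the instance's namespace dictionary (p.__dict__).
assert vars(p) == {"x": 1, "y": 2}

# Attribute assignment adds a key to that dictionary...
p.label = "corner"
assert vars(p)["label"] == "corner"

# ...and looking an attribute up by its string name works through getattr().
assert getattr(p, "x") == p.x == 1

# Methods live in the class's namespace, not in the instance's.
assert "__init__" in vars(Point)
assert "__init__" not in vars(p)

print(sorted(vars(p)))  # ['label', 'x', 'y']
```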

A class definition, and a few objects created from it, look like this:

class foo:
  # Constructor
  def __init__(self, value=3):
    self.value = value
    self.count = 0  # number of updates so far
  def update(self, newvalue):
    self.value = newvalue
    self.count += 1
  def num_updates(self):
    return self.count
  def get_value(self):
    return self.value

foo1 = foo()
foo2 = foo(5)
foo1.update(7)
foo1.update(11)
print "foo1: %d %d" % (foo1.get_value(), foo1.num_updates())
print "foo2: %d %d" % (foo2.get_value(), foo2.num_updates())

Notice the default argument for __init__. What will this code snippet print?

Everything is public, but there are strong conventions for using methods whenever possible. In the example above, we could write:

foo1.value = 13

but this would be bad: it wouldn't have the same side effects as foo1.update (the update count would not change). You can signal to other programmers that they should not peek at or poke something (a private implementation detail, for example) by prefixing its name with an underscore:

class Payment:
  def __init__(self, credit_card_number):
    self._ccn = credit_card_number
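
The underscore is only a convention, which a short experiment can show. This is a sketch under assumptions: the last_four method and the card number are made up for illustration and do not come from the class above.

```python
class Payment:
    def __init__(self, credit_card_number):
        self._ccn = credit_card_number  # leading underscore: "please don't touch"

    def last_four(self):
        # Hypothetical public method: exposes only what callers should need.
        return self._ccn[-4:]

p = Payment("0000111122223333")  # made-up number, for illustration only

# Preferred: go through the public method.
print(p.last_four())  # 3333

# Nothing stops a caller from reaching in; the underscore is only a signal.
assert p._ccn == "0000111122223333"
```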

You can turn the method-call syntax inside out to see how Python actually behaves behind the scenes:

item = foo()
foo.update(  # Call the function stored on the class directly
  item,      # on a class instance -- this becomes "self"
  5)         # and an integer -- this becomes "newvalue"

which is exactly equivalent to:

item = foo()
item.update( # Call the attribute of "item" called "update" (which is a
             # reference to foo.update) and automatically pass "item" as the
             # first argument...
  5)         # on a value, which becomes the second argument.
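
The equivalence of the two spellings can be verified with a cut-down version of the class (a sketch that keeps only the parts needed here):

```python
class foo:
    def __init__(self, value=3):
        self.value = value
    def update(self, newvalue):
        self.value = newvalue

a = foo()
b = foo()

foo.update(a, 5)  # explicit: the instance is passed as "self" by hand
b.update(5)       # implicit: Python passes "b" as "self" for us

assert a.value == b.value == 5
print("%d %d" % (a.value, b.value))  # 5 5
```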

Methods defined in a class take the instance as their first parameter, conventionally named "self". The name is not mandatory, but it is strongly advised so that other people will understand your code.

What happens if you forget "self"?
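
One way to find out is to try it. The sketch below uses a hypothetical class named Broken; the exact wording of the error message differs between Python versions, so it is printed rather than assumed.

```python
class Broken:
    def update(newvalue):  # oops: no "self" parameter
        pass

item = Broken()
raised = False
try:
    # Python still passes "item" as the first argument, so update()
    # receives two arguments even though it declares only one.
    item.update(5)
except TypeError as exc:
    raised = True
    print("TypeError: " + str(exc))

assert raised
```

Forgetting "self." inside a method body is a different mistake: a bare name such as value is then looked up as a local or global variable instead of as an attribute of the instance.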

Inheritance

Suppose we were librarians and wanted to define a class hierarchy to keep our books in. We might start with something like this:

class Book:
  def __init__(self, title, author):
    self.title = title
    self.author = author
  def info(self):
    return "%s by %s" % (self.title, self.author)

Now we realize we want a special kind of book to distinguish illustrated children's books from regular books. We can derive a subclass, like this:

class ChildrensBook(Book):
  def __init__(self, title, author, illustrator):
    Book.__init__(self, title, author)
    self.illustrator = illustrator
  def info(self):
    bookinfo = Book.info(self)
    return "%s, illustrated by %s" % (bookinfo, self.illustrator)

What happens when we do this? Think about the call stack.

book = ChildrensBook("Foo", "Author", "Illustrator")
print book.info()
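
To follow the call stack, the two classes can be instrumented to record each method as it is entered. This is a sketch: the calls list is an addition for illustration only, and print is written with parentheses so the snippet runs under Python 2 and Python 3 alike.

```python
calls = []  # records the order in which methods are entered (illustration only)

class Book:
    def __init__(self, title, author):
        calls.append("Book.__init__")
        self.title = title
        self.author = author
    def info(self):
        calls.append("Book.info")
        return "%s by %s" % (self.title, self.author)

class ChildrensBook(Book):
    def __init__(self, title, author, illustrator):
        calls.append("ChildrensBook.__init__")
        Book.__init__(self, title, author)
        self.illustrator = illustrator
    def info(self):
        calls.append("ChildrensBook.info")
        bookinfo = Book.info(self)
        return "%s, illustrated by %s" % (bookinfo, self.illustrator)

book = ChildrensBook("Foo", "Author", "Illustrator")
result = book.info()
print(result)
for name in calls:
    print(name)
```

Each subclass method runs first and then explicitly hands off to the parent's version, so the recorded names alternate between ChildrensBook and Book.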

An interesting trick you can do is store a method reference and call it later:

bi = book.info
print bi()
# 'Foo by Author, illustrated by Illustrator'
book = None
print bi()
# 'Foo by Author, illustrated by Illustrator'

Here we've thrown away the reference to the ChildrensBook object itself, but since we still have this reference to one of its methods, the method call hangs onto the object for us. (This probably isn't something you need to know, but thinking about how you would implement this kind of feature could give you some insight into how programming languages are implemented.)
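
The reference that keeps the object alive is visible on the method itself. A minimal sketch, assuming a cut-down Book class and the __self__ attribute that bound methods expose in Python 2.6 and later:

```python
class Book:
    def __init__(self, title):
        self.title = title
    def info(self):
        return self.title

book = Book("Foo")
bi = book.info

# A bound method carries a reference to its instance in __self__ ...
assert bi.__self__ is book

# ... so the instance stays reachable after the name "book" is rebound.
book = None
assert bi.__self__.title == "Foo"
print(bi())  # Foo
```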

Operator overloading uses special names in Python: __add__, __mul__, and so forth. Here's a fairly complete Fraction class which uses this:

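import decimal              # used by __str__
from fractions import gcd   # used by reduced(); Python 2.6+ (math.gcd in Python 3)

class Fraction: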
	def __init__(self, num, denom):
		self.n, self.d = int(num), int(denom)
	def __add__(self, other):
		if isinstance(other, Fraction):
			return Fraction(self.n * other.d + other.n * self.d, self.d * other.d).reduced()
		if isinstance(other, int):
			return Fraction(self.n + self.d * other, self.d).reduced()
		raise TypeError("Cannot add to that type")
	__radd__ = __add__
	def __sub__(self, other):
		if isinstance(other, Fraction):
			return Fraction(self.n * other.d - other.n * self.d, self.d * other.d).reduced()
		if isinstance(other, int):
			return Fraction(self.n - self.d * other, self.d).reduced()
		raise TypeError("Cannot subtract from that type")
	def __mul__(self, other):
		if isinstance(other, Fraction):
			return Fraction(self.n * other.n, self.d * other.d).reduced()
		if isinstance(other, int):
			return Fraction(self.n * other, self.d).reduced()
		raise TypeError("Cannot multiply by that type")
	def __div__(self, other):
		if isinstance(other, Fraction):
			return Fraction(self.n * other.d, self.d * other.n).reduced()
		return NotImplemented
	def __rdiv__(self, other):
		if isinstance(other, int):
			return Fraction(self.d * other, self.n).reduced()
		return NotImplemented
	def __cmp__(self, other):
		# Cross-multiply to compare without dividing (assumes positive denominators).
		if isinstance(other, Fraction):
			return cmp(self.n * other.d, self.d * other.n)
		if isinstance(other, int):
			return cmp(self.n, self.d * other)
		raise TypeError("Cannot compare Fraction to that type")
	def ipart(self):
		# Integer part; // makes the floor division explicit.
		return self.n // self.d
	def reduced(self):
		n, d = self.n, self.d
		f = gcd(n, d)
		self.n, self.d = n // f, d // f
		return self
	def reduce(self):
		# reduced() already simplifies in place; rebinding "self" would add nothing.
		self.reduced()
	def __str__(self):
		return "Fraction(%d / %d) = %s" % (
		        self.n, self.d,
		        str(decimal.Decimal(self.n) /
		            decimal.Decimal(self.d)))
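
The dispatch rules behind these special names are easier to see on a smaller class. The sketch below assumes a hypothetical class named Meters, unrelated to Fraction. It returns NotImplemented for unsupported operands, which lets Python try the other operand's reflected method before raising TypeError itself.

```python
class Meters:
    """Hypothetical class used only to illustrate operator dispatch."""
    def __init__(self, amount):
        self.amount = amount
    def __add__(self, other):
        if isinstance(other, Meters):
            return Meters(self.amount + other.amount)
        if isinstance(other, int):
            return Meters(self.amount + other)
        return NotImplemented
    __radd__ = __add__  # addition is symmetric, so reuse the same code
    def __mul__(self, factor):
        if isinstance(factor, int):
            return Meters(self.amount * factor)
        return NotImplemented

a = Meters(2) + Meters(3)  # calls Meters(2).__add__(Meters(3))
b = 1 + Meters(3)          # int cannot add a Meters, so Meters.__radd__ is tried
c = Meters(4) * 3          # calls Meters(4).__mul__(3)

print("%d %d %d" % (a.amount, b.amount, c.amount))  # 5 4 12
```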

Regular Expressions in Python

There's a grammar to regular expressions, and different characters mean different things in different places. Keep this in mind.

Python's regular expressions (regex after this) are kept in the 're' module:

import re

Python's string handling can get in your way a bit when defining a regex pattern. For example, the backslash character '\' acts as a quoting character both in regex and in strings. So suppose you wanted a regex which looks for a single backslash. Because \ is a special character in strings, you have to escape it:

print '\\'
\

So let's try to match this pattern against something:

re.search('\\', 'word\\anotherword')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/usr/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression
sre_constants.error: bogus escape (end of line)

What happened? re thinks a backslash is an escape character, too, so we passed it an escape character with nothing after it, which is bogus. We'd need to escape the backslashes twice to avoid this:

re.search('(\\\\)', 'word\\anotherword').groups()
# ('\\',)

So to prevent us from writing a zillion backslashes, we can tell Python to use 'raw strings', which don't treat \ like a special character. This happens when you put the letter r in front of a string constant:

re.search(r'(\\)', 'word\\anotherword').groups()
#    here ^
# ('\\',)
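
The escaping rules above can be confirmed with a few assertions. This is a small sketch that uses only the re module; it relies on re.error being the exception that re raises for an invalid pattern.

```python
import re

# In an ordinary string literal, '\\' is ONE character: a single backslash.
assert len('\\') == 1

# A raw string keeps both characters, so r'\\' is the same as '\\\\'.
assert r'\\' == '\\\\'
assert len(r'\\') == 2

# A lone backslash is not a valid pattern; re raises re.error.
try:
    re.compile('\\')
    compiled = True
except re.error:
    compiled = False
assert not compiled

# The raw-string pattern finds the single backslash in the text.
match = re.search(r'\\', 'word\\anotherword')
print(match.start())  # 4
```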

Raw strings make regexes a lot easier on the eyes, so don't forget them.

Regex has lots of metacharacters, which prove very useful.

. Dot means any character (except a newline, unless you specify otherwise; see the documentation).

^ Caret means the start of a string.

$ Dollar means the end of a string.

* Star means repeat the previous thing zero or more times. For example, r'Go*gle' matches 'Ggle', 'Gogle', 'Google', and 'Goooooooooooogle'.

+ Plus means repeat the previous thing one or more times. For example, r'Go+gle' matches 'Gogle', 'Google', and 'Goooooooooooogle', but not 'Ggle'.

() Parens take multiple things and build a larger unit out of them. For example, r'ab*' matches all of 'abbbbbbb' but not all of 'ababab', while r'(ab)*' matches all of 'ababab' but not all of 'abbbbbb' or 'aaaaab'.

{m} Curly braces with one number in them require exactly that many copies of the previous thing. For example, r'Go{2}gle' matches only 'Google'.

[] Square brackets build character classes; any one character listed in the class matches. For example, if you wanted to match the words "pool", "poel", "peol", and "peel", you could write r'p[oe]{2}l'. (Why does this match 'peol' and 'poel'?) You can also invert character classes by putting '^' at the beginning, so r'[^aeiou]' matches any character but a vowel. There are some predefined character classes, like '\s' (whitespace, including tabs and spaces) and '\d' (digits, 0 through 9).
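
The examples in this list can be checked mechanically. The sketch below assumes a hypothetical helper named matches_whole, which anchors a pattern at both ends so that "matches" means "matches the entire string"; re.search on its own would also accept a match anywhere inside the text.

```python
import re

def matches_whole(pattern, text):
    """Hypothetical helper: True if pattern matches ALL of text."""
    return re.match(r'(?:%s)\Z' % pattern, text) is not None

assert matches_whole(r'Go*gle', 'Ggle')
assert matches_whole(r'Go+gle', 'Google')
assert not matches_whole(r'Go+gle', 'Ggle')

assert matches_whole(r'ab*', 'abbbbbbb')
assert not matches_whole(r'ab*', 'ababab')
assert matches_whole(r'(ab)*', 'ababab')
assert not matches_whole(r'(ab)*', 'aaaaab')

assert matches_whole(r'Go{2}gle', 'Google')
assert not matches_whole(r'Go{2}gle', 'Gogle')

for word in ('pool', 'poel', 'peol', 'peel'):
    assert matches_whole(r'p[oe]{2}l', word)

assert matches_whole(r'[^aeiou]', 'x')
assert not matches_whole(r'[^aeiou]', 'a')
assert matches_whole(r'\s\d', ' 7')

print("all examples behave as described")
```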

What does this regex match?

r'^\s*[^#].*$'

Let's walk through it.

First, ^ means the start of a string.
'\s' means any whitespace character, and * means there can be zero or more of them.
[ means we're starting a character class definition. The ^ means that it's inverted - it will match anything but the characters specified. Then # is one of the characters, and ] is the end of the character class definition. So this means "match any character except a #".
Dot means any character, and star means zero or more of them.
$ means the end of the string.

So the regex is "beginning of the string, then any amount of whitespace, then a non-# character, then any number of any character, then the end of the string."

Can someone give a more concise description of what this does?
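
One way to explore the question is to run the pattern against a few sample lines. The samples below are made up for illustration. Note one subtlety: a space is itself a non-# character, so \s* can give back a whitespace character for [^#] to consume.

```python
import re

pattern = re.compile(r'^\s*[^#].*$')

samples = [
    'x = 1',          # ordinary code
    '    x = 1',      # indented code
    '# comment',      # comment in the first column
    '    # comment',  # indented comment
    '',               # empty line
    '   ',            # whitespace only
]

# Pair each sample with whether the pattern matched it.
results = [(line, pattern.match(line) is not None) for line in samples]
for line, matched in results:
    print("%-16r %s" % (line, matched))
```

Only the comment in the first column and the empty line are rejected; the indented comment and the whitespace-only line both match, which may not be what the pattern's author intended.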
