Sue Evans & David Walser
Press space bar for next page
Whenever we describe a data type, we not only need to give a definition
of what that type is but also the operations that can be done on that
type. The definition may include the range of possible values that type
can hold.
Example: integer
Choosing the right data structure to solve a problem affects how easy the problem is to solve, how readable the code will be, and how efficient the resulting program will be.
You should substitute "algorithms" for "flowcharts" and "data structures" for "tables" in that last quote to use modern terminology.
There are several ways to create new lists:
>>> # assign the literal empty list >>> empty = [] >>> # use the list constructor >>> empty = list() >>> # give a literal assignment of items >>> items = ["bread", "milk", "cheese", "cider"] >>> items ['bread', 'milk', 'cheese', 'cider'] >>> # use the list constructor with a string >>> letters = list("hello") >>> letters ['h', 'e', 'l', 'l', 'o'] >>> # use string.split which returns a list of strings >>> words = "a bunch of words".split() >>> words ['a', 'bunch', 'of', 'words']
Operators:
>>> items[2] 'cheese' >>> items[3] 'cider' >>> items[3] = "lemonade" >>> items[3] 'lemonade'
>>> "milk" in items True >>> "juice" in items False
>>> "two words".split() == ["two", "words"] True >>> ["this", "that"] == ["that", "this"] False
Methods:
Built-in functions that operate on lists:
append(item) - add an item to the end of the list
>>> items ['bread', 'milk', 'cheese', 'lemonade'] >>> items.append('ham') >>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham'] >>> items.append('ham') >>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham'] >>>
count(item) - count occurences of an item in the list
>>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham'] >>> items.count('ham') 2 >>> items.count('cheese') 1 >>>
remove(item) - remove the first occurence of an item in the list
>>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'ham'] >>> items.remove('ham') >>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham'] >>>
extend(list) - append multiple items to end of the list
>>> items.extend(['lettuce', 'tomatoes']) >>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>>
index(item) - locate the index of an item in the list
>>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>> items.index('milk') 1 >>>
insert(index, item) - insert an item before the one at the index
>>> items ['bread', 'milk', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>> items.insert(2, 'eggs') >>> items ['bread', 'milk', 'eggs', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>>
reverse() - reverse the order of the items in the list
>>> items ['bread', 'milk', 'eggs', 'cheese', 'lemonade', 'ham', 'lettuce', 'tomatoes'] >>> items.reverse() >>> items ['tomatoes', 'lettuce', 'ham', 'lemonade', 'cheese', 'eggs', 'milk', 'bread'] >>>
sort() - sort the list
>>> items ['tomatoes', 'lettuce', 'ham', 'lemonade', 'cheese', 'eggs', 'milk', 'bread'] >>> items.sort() >>> items ['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes'] >>>
len(list) - count number of items in the list
>>> items ['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes'] >>> len(items) 8 >>>
del(list[index]) - remove the item at the index from the list
>>> items ['bread', 'cheese', 'eggs', 'ham', 'lemonade', 'lettuce', 'milk', 'tomatoes'] >>> del(items[4]) >>> items ['bread', 'cheese', 'eggs', 'ham', 'lettuce', 'milk', 'tomatoes'] >>>
Use a list when order matters.
An object that is mutable can be changed in-place. Lists are
mutable.
Example: list[2] = "foo"
>>> list = ['cat', 'dog', 'bird', 'hamster'] >>> list ['cat', 'dog', 'bird', 'hamster'] >>> list[2] = "foo" >>> list ['cat', 'dog', 'foo', 'hamster'] >>>
Strings are immutable (aka, not mutable).
Example: str[3] = 'f' is not supported
>>> str = 'fooooo' >>> str[3] = 'f' Traceback (most recent call last): File "", line 1, in ? TypeError: object does not support item assignment >>>
Lists, and other data structures can be used as return values from functions. They can also be used as arguments. Mutable data structures can be changed by the function without returning them.
# fillList() takes an empty list and fills it with the integers 0 - 9 # Input: aList, an empty list # Output: None, but the list has been modified def fillList(aList): for i in range(10): aList.append(i) def main(): print print "This program uses a function to fill a list with values" print "Notice that since the list is mutable, we need only pass" print "it to the function and let the function modify it. The list" print "doesn't need to be returned." print myList = [] print "Before calling fillList, myList = ", myList fillList(myList) print "After calling fillList, myList =", myList main()
Let's see it work!
This program uses a function to fill a list with values Notice that since the list is mutable, we need only pass it to the function and let the function modify it. The list doesn't need to be returned. Before calling fillList, myList = [] After calling fillList, myList = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
So lists are definitely mutable!
Strings are immutable so let's see the difference:
# fillString() takes an empty string and concatenates letters to the string # Input: aString, an empty string # Output: None, and the original string has NOT been modified def fillString(aString): for ch in "letters": aString += ch def main(): print print "This program uses a function to fill a string with characters" print "Notice that since the string isn't mutable, the function won't" print "be able to modify the original string. The string would have to" print "be returned. In fact, since strings are immutable, for each" print "character that is concatenated onto the string, a new string has" print "to be made." print myString = "" print "Before calling fillString, myString = ", myString fillString(myString) print "After calling fillString, myString =", myString main()
What happens when we run this:
This program uses a function to fill a string with characters Notice that since the string isn't mutable, the function won't be able to modify the original string. The string would have to be returned. In fact, since strings are immutable, for each character that is concatenated onto the string, a new string has to be made. Before calling fillString, myString = After calling fillString, myString =
names.dat
Nicholas, Charles Finin, Tim desJardins, Marie Evans, Sue
Example output:
1 DESJARDINS, MARIE 2 EVANS, SUE 3 FININ, TIM 4 NICHOLAS, CHARLES
How about a top-down design.
import string # printGreeting() explains the program to the user # Inputs: None # Output: None def printGreeting(): print "This program reads in a list of names from a file," print "capitalizes them, sorts them alphabetically, and prints" print "them out in a numbered table." # readNames() reads in the names from a file # and puts them into a list called names # Input: filename, the name of the file to open # Output: names, the list of names def readNames(filename): names = [] file = open(filename, 'r') for line in file: name = line.strip() names.append(name) file.close() return names # capitalizeNames() capitalizes names (changes the list in-place) # Inputs: the list of names # Outputs: None, but the list was modified by this function def capitalizeNames(names): for i in range(len(names)): names[i] = names[i].upper() # printTable() prints the table of numbered names # one per line def printTable(names): length = len(names) for i in range(length): print (i + 1), "\t", names[i] def main(): printGreeting() # get filename from user filename = raw_input("What's the name of the file ? ") # process the names names = readNames(filename) capitalizeNames(names) names.sort() # print the table printTable(names) main()
Definition:
Sets are an unordered collection of items, where duplicates are not allowed.
Items must be immutable (such as numbers or strings).
Operations:
>>> empty = set()
>>> cards = set(["jack", "queen", "king"]) >>> cards set(['king', 'queen', 'jack']) >>>
>>> cards1 = set(["jack", "jack", "queen", "king", "king"]) >>> cards1 set(['king', 'queen', 'jack']) >>>
Operators
>>> cards = set(['jack', 'queen', 'king']) >>> cards set(['king', 'queen', 'jack']) >>> 'queen' in cards True
>>> cards2 = set(['queen', 'jack', 'king']) >>> cards2 set(['king', 'queen', 'jack']) >>> cards == cards2 True
Methods
>>> cards set(['king', 'queen', 'jack']) >>> cards.add('ace') >>> cards set(['king', 'queen', 'jack', 'ace']) >>>
>>> cards set(['king', 'queen', 'jack', 'ace']) >>> cards2 = cards.copy() >>> cards2 set(['king', 'queen', 'jack', 'ace']) >>>
>>> cards2 set(['king', 'queen', 'jack', 'ace']) >>> cards2.remove('king') >>> cards2 set(['queen', 'jack', 'ace']) >>>
>>> cards set(['king', 'queen', 'jack', 'ace']) >>> cards2 set(['queen', 'jack', 'ace']) >>> cards.difference(cards2) set(['king']) >>> >>> cards2.add('deuce') >>> cards2 set(['deuce', 'queen', 'jack', 'ace']) >>> cards.difference(cards2) set(['king']) >>> cards2.difference(cards) set(['deuce']) >>>
>>> cards set(['king', 'queen', 'jack', 'ace']) >>> cards2 set(['queen', 'jack', 'ace']) >>> cards.intersection(cards2) set(['queen', 'jack', 'ace']) >>>
>>> cards set(['king', 'queen', 'jack', 'ace']) >>> cards3 set(['trey', 'deuce']) >>> cards3.add('ace') >>> cards3 set(['trey', 'deuce', 'ace']) >>> allcards = cards.union(cards3) >>> allcards set(['trey', 'king', 'deuce', 'jack', 'ace', 'queen']) >>>
>>> allcards set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce']) >>> cards3 set(['trey', 'deuce', 'ace']) >>> cards3.issubset(allcards) True >>> allcards.issubset(cards3) False >>>
>>> allcards set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce']) >>> cards3 set(['trey', 'deuce', 'ace']) >>> allcards.issuperset(cards3) True >>> cards3.issuperset(allcards) False >>>
>>> cards3 set(['trey', 'deuce', 'ace']) >>> cards3.clear() >>> cards3 set([]) >>>
Built-in functions that operate on sets
>>> allcards set(['trey', 'king', 'queen', 'jack', 'ace', 'deuce']) >>> len(allcards) 6 >>> allcards.clear() >>> len(allcards) 0 >>>
When would you use a set?
When order doesn't matter and you don't want to allow duplicates.
names = set() name = raw_input("Enter a name (or . to quit): ") while name != ".": names.add(name) name = raw_input("Enter a name (or . to quit): ") for name in names: print name
linuxserver1.cs.umbc.edu[128] python nameset.py Enter a name (or . to quit): sue Enter a name (or . to quit): ben Enter a name (or . to quit): dan Enter a name (or . to quit): dawn Enter a name (or . to quit): tess Enter a name (or . to quit): dan Enter a name (or . to quit): dawn Enter a name (or . to quit): . sue ben dan dawn tess linuxserver1.cs.umbc.edu[129]
import string # partAB() prints part A or part B courses the student still needs to take # Inputs: the set of courses needed for this part, needed # the set of courses taken, taken # the part being processed, part # Output: None def partAB(needed, taken, part): print if needed == taken: print 'You have satisfied the Part', part, 'requirements.' else: remaining = needed.difference(taken) print 'Part', part, 'Requirements:' for course in remaining: print "You need to take", course def main(): # set up course sets requiredSet = set(['CMSC 201', 'CMSC 202', 'CMSC 203', 'CMSC 304']) more = set(['CMSC 313', 'CMSC 331', 'CMSC 341', 'CMSC 345', 'CMSC 411']) more2 = set(['CMSC 421', 'CMSC 441']) requiredSet = requiredSet.union(more) requiredSet = requiredSet.union(more2) requiredMath = set(['MATH 151', 'MATH 152', 'MATH 221']) # start userClasses as the empty set userClasses = set() # get the classes the user has taken putting them into a set print "You should have a file that contains the names of the courses" print "you've taken toward the CS major. There should be one course" print "per line." filename = raw_input("Please enter the name of this file : ") file = open(filename, 'r') for line in file: course = line.strip() individual = set() individual.add(course) # removed all the code that counted classes and credits # all used if individual.issubset(some set of courses): userClasses = userClasses.union(individual) file.close() # call functions to produce output for each part partAB(requiredSet, userClasses, 'A') partAB(requiredMath, userClasses, 'B') main()
CMSC 201 MATH 151 CMSC 202 CMSC 203 MATH 152
You should have a file that contains the names of the courses you've taken toward the CS major. There should be one course per line. Please enter the name of this file : courses.txt Part A Requirements: You need to take CMSC 411 You need to take CMSC 313 You need to take CMSC 421 You need to take CMSC 341 You need to take CMSC 345 You need to take CMSC 331 You need to take CMSC 441 You need to take CMSC 304 Part B Requirements: You need to take MATH 221
Oh, no!
We have the right answer, but our list of courses is out of order!
So after trying the 2 different data structures, lists and sets for HW4, what data structure do you think would be the best for this program ?
Definition:
Associative arrays are tuples of key-value pairs. The keys must be immutable (such as numbers or strings). Values can be any type. They associate one value with each key. Keys aren't in any particular order. Associative arrays are implemented with the Python dictionary type.
Operations:
Associative arrays are built into many languages, but they are called many different things:
Creating new dictionaries:
# use the constructor to make an empty dictionary empty = dict() # use the empty dictionary literal empty = {} # use the constructor with some initial pairings # in this case the keys are strings numbers = dict(one=1, two=2) # use the dictionary literal with some initial pairings capitals = {'MD':'Annapolis', 'ME':'Augusta'}
Operators:
Methods:
Built-in functions that operate on dictionaries:
Operators:
>>> capitals = {'MD':'Annapolis', 'ME':'Augusta'} >>> capitals {'ME': 'Augusta', 'MD': 'Annapolis'} >>> capitals['MI'] = 'Lansing' >>> capitals {'ME': 'Augusta', 'MD': 'Annapolis', 'MI': 'Lansing'} >>> capitals['MD'] 'Annapolis' >>> capitals['MI'] 'Lansing'
>>> 'ME' in capitals True >>> 'VA' in capitals False
Methods:
copy() - create a copy of the dictionary
>>> cap2 = capitals.copy() >>> cap2 {'ME': 'Augusta', 'MD': 'Annapolis', 'MI': 'Lansing'}
clear() - remove all entries from the dictionary
>>> cap2.clear() >>> cap2 {}
get(key) - find the value associated with a key
>>> cap = capitals.get('MD') >>> cap 'Annapolis' >>> cap = capitals.get('VA') >>> cap >>>
get(key, default) - use a default value when a key isn't found
>>> cap = capitals.get('MD', 'unknown') >>> cap 'Annapolis' >>> cap = capitals.get('VA', 'unknown') >>> cap 'unknown'
has_key(key) - another way to test for key containment
>>> capitals.has_key('ME') True >>> capitals.has_key('ND') False
keys() - create a list of the keys in the dictionary
>>> capitals.keys() ['ME', 'MD', 'MI']
values() - create a list of the values in the dictionary
>>> capitals.values() ['Augusta', 'Annapolis', 'Lansing']
Built-in functions that operate on dictionaries:
len(dict) - count # of keys in the dictionary
>>> len(capitals) 3
del(dict[key]) - remove a key (and its value) from the dictionary
>>> del(capitals['ME']) >>> capitals {'MD': 'Annapolis', 'MI': 'Lansing'} >>>
Why use an associative array?
When you want to associate keys with values. Keys don't have to be numbers, or if they are, they don't have to be contiguous and starting from zero (like in a list).
import random counts = {} for i in range(1000): # simulates rolling 2 dice a = random.randrange(1, 7) b = random.randrange(1, 7) sum = a + b # gets the old count for that roll or # 0 if that sum hasn't been rolled yet count = counts.get(sum, 0) # writes the new count to an existing entry or # creates a new dictionary entry with a key of # sum and a count of 1 counts[sum] = count + 1 # sort the keys, sums is a List of keys sums = counts.keys() sums.sort() # print table print "Sum\tOccurences" print "-" * 18 for sum in sums: print sum, "\t", counts[sum]
ite207-pc-01.cs.umbc.edu[126] python dice.py Sum Occurences ------------------ 2 38 3 60 4 92 5 104 6 141 7 169 8 132 9 120 10 68 11 43 12 33 ite207-pc-01.cs.umbc.edu[127]
Here's the data file
Plymouth 02360 Bar_Harbor 04609 Mystic 06355 Cape_May 08204 Hyde_Park 12538 Niagara_Falls 14301 Paradise 17562 Rehoboth_Beach 19971 D.C. 20001 UMBC 21250 Key_West 33040 Nashville 37201 Custer 57730 West_Yellowstone 59758 Whitefish 59901 Chicago 60610 St._Louis 63121 Kansas_City 66101 Baton_Rouge 70801 Austin 78701 Boulder 80301 Cody 82414 Ketchum 83340 Moab 84532 Tempe 85281 Taos 87571 Las_Vegas 89110 Big_Sur 93920 Carmel 93923 Monterey 93940 San_Francisco 94110
Here's what the output should look like:
linuxserver1.cs.umbc.edu[149] python zipcodes.py Enter the filename of the zipcode file : zipcodes.txt Which location (Q to quit) ? Plymouth 02360 Which location ? Carmel 93923 Which location ? UMBC 21250 Which location ? Oshkosh unknown Which location ? Niagara_Falls 14301 Which location ? Q linuxserver1.cs.umbc.edu[150]
What data structures would you use for these programs?