Charles Nicholas | nicholas@umbc.edu |
ITE 356 |
Office Hours: Monday and Wednesday 2:30-4pm or by appointment. |
410-455-2594 | I don't check voicemail often, so email is better! |
TA: | Ran Liu <rliu2@umbc.edu>; office hours: Monday and Tuesday 9-11am; WebEx: https://umbc.webex.com/meet/rliu2 |
Grader: | TBD |
This course is an introduction to the theory and implementation of software systems designed to search through large collections of text. Did you ever wonder how World-Wide Web search engines work? Ever wondered why they don't? You'll learn about it here. Information retrieval (IR) is one of the oldest branches of computer science, and has influenced nearly every aspect of computer usage: "search and replace" in a word processor, querying a card catalog, grep'ing through your source code, filtering the spam out of your email, searching the Web.
This course will have two main thrusts. The first is to cover the fundamentals of IR: retrieval models, search algorithms, and IR evaluation. The second is to give a taste of the implementation issues by having you write (a good chunk of) your own text search engine and test it out on a sample text collection. This will be a semester-long project, details to follow.
You will need to have taken the equivalent of CMSC 341 (Data Structures), and an algorithms course (441 or 641) is recommended. Linear algebra (MATH 221) and Statistics (STAT 355) are recommended but not required; they give background which will be helpful in understanding many IR concepts.
We are using Introduction to Information Retrieval as the textbook.
Details about which chapters will be covered, and when, will follow. The slides to be used in class will be based on those provided by the authors of the textbook, but I may modify them from time to time. It'd be a good idea to study the slides BEFORE each class. Other papers and resources are available. Suggestions to add to this list are welcome.
The text from earlier offerings of the course, Modern Information Retrieval, second edition, by Ricardo Baeza-Yates and Berthier Ribeiro-Neto, may be useful as a reference. You can see the slides for that book at http://www.mir2ed.org.
There will be a multi-phase programming project, details to be announced, worth 50% of the grade. There will be a mid-term exam, worth 25% of the grade. There will also be a writing project, worth 25%.
Those students enrolled in CMSC 676 will be expected to write a paper of the depth that might lead to a Master's Writing Project or Thesis. Graduate students will also be expected to present their writing projects at the end of the semester, and undergraduates are welcome to do so. These presentations will take the place of the final exam, and no final exam as such is planned.
Generative AI: For this class, if you use ChatGPT (or similar chatbots or AI-based generation tools), you must describe exactly how you used it, including providing the prompt, original generation, and your edits. This applies to prose, code, or any form of content creation. Not disclosing is an academic integrity violation. If you do disclose, your answer may receive anywhere from 0 to full credit, depending on the extent of substantive edits, achievement of learning outcomes, and overall circumvention of those outcomes.
Use of AI/automatic tools for grammatical assistance (such as spell-checkers or Grammarly) or small-scale predictive text (e.g., next word prediction, tab completion) is okay. Provided the use of these tools does not change the substance of your work, use of these tools may be, but is not required to be, disclosed.
Students are expected to do their own assignments. We may allow collaboration on certain assignments during the semester, but we will tell you when that is the case. If you submit for credit work that is not your own, there will be consequences, possibly including a zero on that assignment, reduction in final grade, or forfeiture of current or future prospects for financial aid from CSEE. Here is a web site that explains UMBC's position on Academic Integrity.
Do you know about Retriever Essentials? They're there if you need them. According to their web site, "Retriever Essentials is a faculty, staff, and student-led partnership that promotes food access in the UMBC community. However, we offer more than just free groceries, we also offer toiletries, baby items, and meal swipes. The services we provide that are listed below are 100% free. You can find more in-depth information regarding each of our services in the attached documents."
We also incorporate the Syllabus Language provided by the UMBC Office of Equity and Civil Rights for this semester, as given here:
https://ecr.umbc.edu/sample-title-ix-responsible-employee-syllabus-language/
We will follow the textbook closely. I reserve the right to make minor changes along the way, but the basic structure will be as follows. Some chapters are long enough or important enough to warrant coverage over two lectures.
We will cover the chapters in the text in order, at the rate of approximately one chapter per week. The 676 presentations will take place in early May. The following is subject to change as progress of the class warrants.
Week | Dates in 2024 | Topics/Activities |
1 | 1/30, 2/1 | Introduction Chapter 2 (ppt,pdf) |
2 | 2/6, 2/8 | Discuss Terrier, and demo of Terrier Desktop installation. NO OFFICE HOURS February 7, 2024. Send email and we can schedule a time to meet. |
3 | 2/13, 2/15 | CLASS on TUESDAY 2/13 is REMOTE - nobody needs to (or should try to) attend in person! Thanks! Begin to discuss first phase of programming project. As of this date, we reserve the right to grant extra credit to people who are in class when it starts at 5:30! CLASS on THURSDAY 2/15 is REMOTE - nobody needs to (or should try to) attend in person! Thanks! Demo of the USPTO search engine located at Patent Public Search. Possible Paper Topics: Please send your term project idea to me in an email to nicholas@umbc.edu within ten days. Recording for 2/15/2024 |
4 | 2/20, 2/22 | CLASS on TUESDAY 2/20 and THURSDAY 2/22 is REMOTE - nobody needs to (or should try to) attend in person! Thanks! Finish slides from before. Project Parameters: Phase 1 is due this Friday. Recording for 2/22/2024 |
5 | 2/27, 2/29 | CLASS on TUESDAY 2/27 and THURSDAY 2/29 is REMOTE - nobody needs to (or should try to) attend in person! Thanks! Release Phase 2 of the project. Please submit project ideas by today! |
6 | 3/5, 3/7 | CLASS on TUESDAY 3/5 and THURSDAY 3/7 is REMOTE. More on Phase 2 of the project. Feedback on paper topics: most if not all have been approved. When citing references, give me title, author, and venue, not just a link. Many of you chose topics that you may find are too broad, so narrow your focus as needed. More on Probabilistic IR. |
7 | 3/12, 3/14 | CLASS on TUESDAY 3/12 and THURSDAY 3/14 is REMOTE. Phase 2 of the project is due Monday, March 11. |
 | 3/19, 3/21 | Spring Break |
8 | 3/26, 3/28 | CLASS on TUESDAY 3/26 and THURSDAY 3/28 is HYBRID. We will meet in ENG 231, or WebEx, as the student prefers. A demo of Zipf's Law using Google Colab. (Inside UMBC only) The midterm exam will really be on Thursday of this week. There will be no recording. The exam will be available over Blackboard at 5:30pm. Go to Blackboard, select this class, select Course Materials, and select Midterm. Open book and open notes. Web search is allowed, but no AI help. No other collaboration is allowed. Topics include material from the slides presented in class and textbook Chapters 1-8, PLUS the Levenshtein distance. |
9 | 4/2, 4/4 | CLASS on TUESDAY and THURSDAY this week is HYBRID. We will meet in ENG 231, or WebEx, as the student prefers. Go over the exam. Recording for 4/4/2024 |
10 | 4/9, 4/11 | Evaluation of IR systems. Chapter 8 (ppt, pdf). Schedule your presentations using the link found here. |
11 | 4/16, 4/18 | MRS coverage of Latent Semantic Analysis is a little thin. On Thursday, a special topic: authorship attribution. |
12 | 4/23, 4/25 | Class will be ONLINE ONLY on Tuesday, April 23, because Dr. Nicholas has a minor illness. (Allergies, I think, not COVID.) Class will be hybrid on Thursday 4/25. Chapter 10 (ppt) as time permits. Recording from 4/25/2024. Schedule your presentations using the link found here. Format for student presentations: You can use your own, but I can suggest: |
13 | 4/30, 5/2 | On Tuesday, I'll be giving a dry run of my Research Day talk. Recording from 4/30/2024. Student presentations for Thursday. Recording from 5/2/2024. Please participate in CSEE Research Day on Friday May 3! |
14 | 5/7, 5/9 | Project 5 is due on May 8. Student presentations for Tuesday. Recording from 5/7/2024. Student presentations for Thursday. For each speaker, fill in this feedback form. Extra credit +10 if you submitted your paper by 11:59pm May 6. |
15 | 5/14 | Student presentations for Tuesday. For each speaker, fill in this feedback form. |
NO FINAL EXAM; the writing project takes the place of the final exam. |