- A PDF file with Salton and Buckley's 1988 IP&M article.
- Some interesting papers: Salton '75, Singhal '96, Singhal '97, Harman '95.
- A few pages taken from Modern Information Retrieval by Baeza-Yates and Ribeiro-Neto. My handwritten notes on this excerpt.
- For an introduction to language models in IR, look at Ponte and Croft from SIGIR '98. My handwritten notes on language models.
- The seminal paper on LSI by Deerwester et al., from 1990.
- A seminal paper on n-grams by Marc Damashek.
- The survey paper on Distributed IR by Jamie Callan.
- Google's cluster architecture is described in Barroso, where the focus is more on computer architecture and performance than IR per se. This paper is really old, but is the most recent I know of on this topic.
- Cross-language IR: Oard's tech report is an excellent overview, though now somewhat dated. Dumais et al. discuss the use of LSI in CLIR. Another version of Dumais with my notes. An earlier version of their paper has color illustrations.
- Thesaurus processing is a form of query expansion, and an article by Susan Gauch is well-cited. A recent article by Abdelali, Cowie and Soliman on the use of semantic expansion appeared in the May 2007 issue of IP&M.
- A paper on n-grams by McNamee and Mayfield, as published and with my written notes.
- The Salton and Buckley paper on relevance feedback from 1990 is still cited, and may be the most lucid explanation.