Projects


The project is meant to give you experience with real data in the context of an unstructured exploration. There are only two hard requirements:

If you need help finding a dataset, talk to me. But there are many floating around on the internet these days. See the note on the main class page about using Kaggle or other competition datasets. One way around that issue is to combine data from a couple of competitions. In years past I've required students to use Open Baltimore data. That's not a requirement this year but it is one source to consider.

You can do the project in teams of 1, 2, or 3. I expect more work from teams of 2, and even more from teams of 3, so consider that as you form teams and come up with ideas.

Note that there is an item on the syllabus for a 1-page project description. That is an informal writeup of who is doing the project, what data you will be using, and what you intend to do. It will not be graded, but I will use it to give you feedback on whether your project is reasonable. If you don't turn it in, I will harass you mercilessly until you do.

The final report should be single spaced, 12 point font, with 1 inch margins. Other than that, any style is fine. My expectation is that the report will contain at least the following:

In terms of length, given that there will be visualizations, tables, histograms, etc. in the document, a paper shorter than 10 pages will be met with some skepticism (see the discussion above about complexity). A paper longer than 30 pages may test the limits of the reader's attention span. Note that teams with more than one person should have tried more things. This does not mean that a two-person team's paper should be twice as long as an individual's. Rather, I'd expect larger teams to explore more of the data, try more things, and present more insights.

In the end, the best projects will be ones that learn something interesting from the data. That is, if you tell me that the data says that most crime occurs after midnight on the weekends, I won't be surprised or find that particularly insightful. But if you tell me that crime patterns by weapon seem to move geographically with a weekly cycle (for example), I'd think that was pretty interesting.

The preferred format for the final project is a single Jupyter notebook, but a Word document or PDF is also fine.