General assignment information. Note that this isn't a template notebook, hence there's no 🚀 above. You will create a blank notebook for this one.
- 10 minutes to pandas
- Indexing Basics
- Group by: split-apply-combine
  - Beginning up to "GroupBy object attributes"
  - "Aggregation" up to "The aggregate() method"
You'll do the following in a notebook. Make it read like a blog post. Pretend you're explaining to a peer who hasn't taken this class. You don't need to teach them to code, but they should be able to follow what's going on.
- Find a dataset.
  - It must have at least one numeric column.
  - Don't spend too long on this step.
  - If there's more than one numeric column, pick one.
- Create a new notebook.
- Using pandas:
  - Read in the data.
  - Compute:
    - The mean
    - The median
    - The mode
  - Do a groupby() with an aggregation.
- Do the same thing, but with pure Python (without pandas).
- Write a conclusion, covering both:
  - The takeaways of the analysis
  - Reflecting on the process
- Did you use an external source, including generative AI? Please explain, or say that you didn't.
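
To make the pandas and pure-Python steps above concrete, here is a minimal sketch of what both halves might look like. The tiny inline dataset and its column names (`species`, `weight_kg`) are invented for illustration; your notebook would read a real file and use whichever numeric column you picked. It assumes pandas is installed and Python 3.8+ (for `statistics.multimode`).

```python
import csv
import io
import statistics
from collections import defaultdict

import pandas as pd

# Hypothetical stand-in for a real dataset. In a real notebook you would pass
# a file path or URL to pd.read_csv() and open() instead of an in-memory string.
CSV_TEXT = """species,weight_kg
cat,4.0
cat,5.0
cat,5.0
dog,10.0
dog,20.0
"""

# With pandas
df = pd.read_csv(io.StringIO(CSV_TEXT))

pd_mean = df["weight_kg"].mean()
pd_median = df["weight_kg"].median()
# mode() returns a Series rather than a single value, because ties are possible.
pd_mode = df["weight_kg"].mode().tolist()
# groupby() with an aggregation: mean weight per species.
pd_group_means = df.groupby("species")["weight_kg"].mean().to_dict()

# Pure Python (standard library only)
rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
weights = [float(row["weight_kg"]) for row in rows]  # csv yields strings

py_mean = statistics.mean(weights)
py_median = statistics.median(weights)
py_mode = statistics.multimode(weights)  # list of all most-common values

# Split: collect the weights for each species.
groups = defaultdict(list)
for row in rows:
    groups[row["species"]].append(float(row["weight_kg"]))
# Apply and combine: compute one mean per group.
py_group_means = {name: statistics.mean(vals) for name, vals in groups.items()}

print("pandas:", pd_mean, pd_median, pd_mode, pd_group_means)
print("python:", py_mean, py_median, py_mode, py_group_means)
```

Both halves should agree on the results. The pure-Python version has to convert strings to numbers and build the groups by hand, which is the kind of difference worth reflecting on in your conclusion.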
- Read The Joys (and Woes) of the Craft of Software Engineering
  - Note that not everything in there is applicable to data analysis
- Filtering/indexing DataFrames
- Learn about functions
- Brackets in Python and pandas
- Coding Style Guides - Please skim these; I don't expect you to understand and follow everything in them. The most important guidelines to pay attention to are indentation and keeping each statement on its own line.
- Guide to commenting your code
- Quartz Guide to Bad Data
- Learn about data dictionaries
- Glance through pandas' comparisons with other tools, for any tools you are already familiar with
- Selecting Subsets of Data in Pandas: Part 1 and Part 2
Reminder about the between-class participation requirement.