STAT207
STAT207 (Data Science Exploration) is a four-credit-hour data science course intended as a continuation of STAT107.
Content Covered
- Data science pipeline
- Dataset collection
- Data cleaning, manipulation, and representation
- Descriptive analytics
- Basic probability
- Statistical inference (confidence intervals, hypothesis testing)
- Linear and logistic regression
- Classifier models and machine learning
- Python for data science
This course delves deeper into topics covered in STAT107 and covers new models, functions, and algorithms for data science.
Prerequisites
STAT207 assumes a basic understanding of Python programming and data science topics covered in STAT107. However, it is possible to take this course without taking STAT107 if you have prior experience with Python and data science/statistics.
When to Take It
This course is most useful for students who are pursuing a Statistics minor or are interested in data science. It is also a prerequisite for CS307.
Course Structure
This class has weekly Python notebook homework assignments, designed to be completed individually. There are also weekly Python notebook labs that can be completed in groups of 2-3 students during the lab section.
There are two midterm exams and a cumulative final exam. Every exam has an in-person written section and a take-home coding section. The written section draws mostly on lecture and homework content, while the coding section is similar to the homework and labs. You are allowed to use a calculator during the written section.
There is also a final project, completed with a partner, in which you carry out a data science analysis on a dataset of your choice. You will be expected to present your findings to your lab section.
Instructors
Prof. Julia Deeke and Prof. Tori Ellison teach the lecture sections of this course, with TAs running the lab sections.
Course Tips
STAT207 is a good introductory data science course. STAT107 is listed as a prerequisite, but it is not strictly enforced. The course assumes that you are somewhat familiar with Python (specifically NumPy and pandas), so if you are not taking STAT107 first, you should learn these libraries beforehand.
The homework and labs are not very difficult and are designed to help you learn the material. The exams are more challenging, but practice exams are provided. The written section draws on topics covered in lecture, so even if you feel comfortable with the material, attending lecture is recommended: the lecture notes alone are not comprehensive or detailed.
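As a rough self-check on the NumPy and pandas familiarity mentioned above, here is a minimal sketch of that style of code. It is not taken from the course materials: the dataset, column names, and variable names are all made up for illustration. If you can read it comfortably, you are probably in reasonable shape on the Python side.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, invented for illustration (not a course dataset)
scores = pd.DataFrame({
    "section": ["A", "A", "B", "B", "B"],
    "hours_studied": [2.0, 4.5, 1.0, np.nan, 3.5],
    "score": [71, 88, 64, 79, 85],
})

# Data cleaning: drop rows that contain a missing value
clean = scores.dropna()

# Descriptive analytics: mean score within each section
mean_by_section = clean.groupby("section")["score"].mean()

# NumPy: Pearson correlation between hours studied and score
r = np.corrcoef(clean["hours_studied"], clean["score"])[0, 1]

print(mean_by_section)
print(round(r, 3))
```

If `dropna`, `groupby`, or `np.corrcoef` are unfamiliar, it may be worth working through an introductory pandas tutorial before the semester starts.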
This course moves more quickly and has much less hand-holding than STAT107, so be prepared to adjust to the faster pace. This is especially noticeable on the Python coding side of the course, where you are expected to write in Python notebooks without much guidance. ECE students should be able to handle the pace, but they may need to learn Python on their own, since it is not covered in the ECE curriculum.
Life After
STAT207 is a prerequisite for CS307, which is a more advanced data science course. More advanced statistics courses include STAT400 and STAT410.