Curriculum
Introduction to R and Visualization (2 weeks)
Prerequisites: None required, though people with some experience working with data are best suited for this program.
- Introduction to data science and fundamentals of R (1 week, 2 classes)
- Fundamentals of R programming
- Wrangling and cleaning data (1 week, 2 classes)
- Introduction to data visualization
Machine Learning and Text Mining in R (6 weeks)
Prerequisites: Students should be comfortable using R to manipulate data and must know how to create basic visualizations.
- Introduction to data science and use cases (1 week, 2 classes)
- Introduction to foundational statistics
- Best practices for model building (1 week, 2 classes)
- Clustering
- Principal component analysis (1 week, 2 classes)
- Midway capstone presentations and outlines
- Processing and cleaning text data (1 week, 2 classes)
- Text mining in R
- Introduction to classification and kNN (1 week, 2 classes)
- Logistic regression
- Decision trees and random forests (1 week, 2 classes)
- Final presentations and feedback
Project timeline (image)*
Week 1: Topic and success requirements selected.
Weeks 1-2: Project plan developed, including the required skill sets and technology; data set identified.
Week 2: Data set acquired; exploratory analysis and visualization performed.
Week 3: Initial analysis and peer review performed.
Week 4: Analysis refined.
Weeks 5-6: Application development and peer review.
Week 6: Final presentations and conclusions.
* This content is in the process of Section 508 review. If you need immediate assistance accessing this content, please submit a request to idealab@hhs.gov. Content will be updated pending the outcome of the Section 508 review.