R Programming for Data Sciences

FOR/STT 875

(Coming summer 2017)

R has emerged as a preferred programming language in a wide range of data-intensive disciplines. For example, O’Reilly Media’s 2014 Data Science Salary Survey found that R is the most popular programming language among data scientists. The goal of this course is to teach applied and theoretical aspects of R programming for data sciences. Topics will cover generic programming language concepts as they are implemented in high-level languages such as R. Course content focuses on the design and implementation of R programs to meet routine and specialized data manipulation, management, and analysis objectives. Attention will also be given to mastering the concepts and tools necessary for implementing reproducible research.

What is R?

  • An open source (and freely available for Windows, Mac OS X, and Linux) environment for statistical computing and graphics
  • Full-featured, general-purpose programming language
    • In particular, it is a scripting language (with similarities to Matlab and Python) that supports reproducibility and task automation

Why Learn R?

  • R is one of the highest paid IT skills
  • R is the most-used data science language after SQL
  • R is used by 70% of data miners
  • R is #15 of all programming languages
  • R is growing faster than any other data science language
  • R is the #1 Google search for Advanced Analytics software
  • R has more than 2 million users worldwide
  • R is used by statisticians, scientists, and social scientists, and has the widest statistical functionality of any software
  • R users add functionality via packages all the time
  • R can interact with other software, databases, the operating system, the web, etc.

Tentative Syllabus

View the proposed syllabus for FOR/STT 875.

Course Structure

FOR/STT 875 is delivered entirely online through the course management system D2L. It will be an active, project-based learning environment that focuses on:

  • History and overview of R
  • Installation and configuration of the R programming environment
  • Basic language elements and data structures
  • R+Knitr+Markdown+GitHub
  • Data input/output
  • Data storage formats
  • Subsetting objects
  • Vectorization
  • Control structures
  • Functions
  • Scoping rules
  • Loop functions
  • Graphics and visualization
  • Grammar of data manipulation (dplyr and related tools)
  • Debugging/profiling
  • Statistical simulation

Get Involved

Take a four-question survey about the course and stay updated about developments and new offerings.

Registration

FOR/STT 875: R Programming for Data Sciences is available to undergraduate, graduate and lifelong education students. There are no prerequisite or co-requisite courses.

MSU Students

  • Undergraduates can enroll in one of two ways:
    • Honors College students, contact the MSU Forestry undergraduate advisor for registration.
    • All other undergraduates, contact course instructor Dr. Andrew Finley for permission to enroll.