2025-08-13

Course Overview & Foundations

Welcome to the course! Today, we will lay the groundwork for our journey into data science using R.

  • Course Overview
  • The R Environment & RStudio
  • Basic R Syntax & Data Types
  • Introduction to R Markdown

Course overview

  • See the syllabus for the complete breakdown

  • Typical class:

    • Short lecture on the concept we will focus on
    • In-class exercise to apply the concepts we just covered
  • Evaluation:

    • Class attendance/Participation [5%]
    • Problem sets (0/1/2 scale) [15%]
    • Team projects [15%]
    • 2 midterm exams [40%]
    • Final project [25%]
  • Note that 40% of your grade comes from exams, where you have to answer questions and write code in real time

  • Feel free to use GenAI tools, but you won’t be able to use them on the exams

How to succeed

  • Stay up to date with the course outline (I will update it regularly)
  • Come to class prepared and on time
  • Ask questions if you start getting lost
    • In class
    • Tutor office hours (OH)
    • My OH

What is Data Science?

Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

The “Data Science Lifecycle”

  • Ask: Define the problem and form a hypothesis.
  • Wrangle: Get, clean, and prepare data.
  • Explore: Conduct exploratory data analysis (EDA) and visualize the data.
  • Model: Build predictive or descriptive models.
  • Communicate: Share findings and insights.
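To make the five steps concrete, here is a compressed sketch of one pass through the lifecycle. It assumes a made-up question (is a car's weight related to its fuel economy?) and uses the `mtcars` dataset that ships with R; `lm()` is just one possible modeling choice, not the only way to do the "Model" step.

```r
# Ask: is a car's weight related to its fuel economy?
# Wrangle: keep only the two columns we need (mtcars is built into R)
cars_data <- mtcars[, c("mpg", "wt")]

# Explore: summary statistics and the correlation between the two columns
summary(cars_data)
cor(cars_data$mpg, cars_data$wt)

# Model: a simple linear regression of mpg on weight
fit <- lm(mpg ~ wt, data = cars_data)

# Communicate: report the estimated intercept and slope
coef(fit)
```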

Why R?

R is a powerful, open-source programming language and environment specifically designed for statistical computing and graphics.

  • Rich Ecosystem: Thousands of add-on packages covering almost any data task.

  • Visualization: Excellent plotting capabilities with packages like ggplot2.

  • Statistical Power: Built by statisticians, for statisticians.

  • Reproducibility: Excellent tools for creating dynamic reports and dashboards.

  • Python is also widely used in data science; it is a more general-purpose language than R.

Introducing RStudio

RStudio is an integrated development environment (IDE) that makes working with R much easier and more productive. It consists of four main panes:

  • Console: Where you run R code directly.
  • Source: Where you write and save your R scripts and R Markdown files.
  • Environment/History: Where you see the variables and functions you have created.
  • Files/Plots/Packages/Help/Viewer: A multi-tab pane for file management, viewing plots, managing packages, and more.

Basic R Syntax: Assignment

Let’s start with the basics. Compared with many programming languages, R’s syntax is simple and intuitive.

Assignment Operator

Use <- to assign a value to a variable. The = sign also works, but <- is the conventional R style.

# Assign a value to x
x <- 10

# Print the value of x
print(x)
## [1] 10
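Building on the assignment rule above, a short sketch showing that `=` also assigns and that assigning to an existing name replaces its old value (the name `y` is just for illustration):

```r
# `=` also assigns, although `<-` is the conventional R style
y = 5

# Reassigning a name overwrites its previous value
x <- 10
x <- x + y
print(x)
## [1] 15
```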

Live Example: Assignment

Let’s see this in action. When you “knit” this document, the code below will be executed and the output will be displayed.

# Assign a value to a variable
my_number <- 42
my_text <- "Hello, R!"

# Combine text and values; R prints each result automatically
paste("my_number is:", my_number)
## [1] "my_number is: 42"
paste("my_text is:", my_text)
## [1] "my_text is: Hello, R!"
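`paste()` does the work in the example above: it converts each argument to text and joins the pieces with a space. A small sketch of two variations from base R, the `sep` argument and `paste0()`:

```r
my_number <- 42

# paste() separates its pieces with a single space by default
paste("Answer:", my_number)
## [1] "Answer: 42"

# sep changes the separator; paste0() uses no separator at all
paste("a", "b", "c", sep = "-")
## [1] "a-b-c"
paste0("R", "Studio")
## [1] "RStudio"
```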

Basic arithmetic

You can use R for basic arithmetic operations.

  • Addition: +
  • Subtraction: -
  • Multiplication: *
  • Division: /
  • Exponentiation: ^ or **
5 * 3 + 2
## [1] 17
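R follows the usual order of operations, which is why the example above gives 17 rather than 25. A short sketch of precedence, parentheses, and two extra operators not in the list above, `%/%` (integer division) and `%%` (remainder):

```r
# ^ binds tighter than * and /, which bind tighter than + and -
2 + 3 * 2^2
## [1] 14

# Parentheses override the default order
(2 + 3) * 2
## [1] 10

# ** is accepted as a synonym for ^
2**3
## [1] 8

# Integer division and remainder
17 %/% 5
## [1] 3
17 %% 5
## [1] 2
```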

Data Types: Numbers

The two main numeric data types are integers and doubles (real numbers).

  • Integer: Whole numbers, specified with L (e.g., 10L).
  • Double (Numeric): Decimal numbers, the default numeric type.
  • Note: class() allows you to see the data type of your variable.
class(10)
## [1] "numeric"
class(10L)
## [1] "integer"
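Because `class()` labels doubles as "numeric", it can hide the integer/double distinction described above. A brief sketch using `typeof()` and `is.integer()`, two base R functions not mentioned above, to inspect the underlying storage type:

```r
# typeof() shows the storage type that class() reports as "numeric"
typeof(10)
## [1] "double"
typeof(10L)
## [1] "integer"

# Adding integers keeps an integer, but division produces a double
typeof(5L + 1L)
## [1] "integer"
typeof(5L / 2L)
## [1] "double"
5L / 2L
## [1] 2.5

# is.integer() checks the type directly: plain 10 is a double
is.integer(10)
## [1] FALSE
```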

Data Types: Strings (Character)

Text data is stored as a character string. You can use either single quotes (') or double quotes (").

my_string <- "Data Science is fun!"
print(my_string)
## [1] "Data Science is fun!"
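A short sketch of the quoting rule above, plus two base R helpers for strings, `nchar()` (number of characters) and `toupper()`; the variable name `quote_text` is illustrative:

```r
# Single and double quotes create identical strings
identical('Data Science', "Data Science")
## [1] TRUE

# Use one kind of quote to put the other kind inside a string
quote_text <- 'She said "R is fun!"'
cat(quote_text)
## She said "R is fun!"

# Count characters (spaces and punctuation included) and change case
nchar("Data Science is fun!")
## [1] 20
toupper("r markdown")
## [1] "R MARKDOWN"
```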

Data Types: Logical (Boolean)

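Logical values are either TRUE or FALSE. T and F also work as shorthand, but they are ordinary variables that can be overwritten (e.g., T <- 0), so spelling out TRUE and FALSE is safer.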

5 > 3
## [1] TRUE
5 == 5
## [1] TRUE
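Logical values become more useful once you combine them. A sketch of the `&` (and), `|` (or), and `!` (not) operators, element-by-element comparisons on a vector, and counting matches with `sum()`; the `scores` values are made up for illustration:

```r
# Combine logical values with & (and), | (or), and ! (not)
(5 > 3) & (2 > 4)
## [1] FALSE
(5 > 3) | (2 > 4)
## [1] TRUE
!TRUE
## [1] FALSE

# Comparisons work element by element on a vector
scores <- c(72, 88, 95, 60)
scores >= 80
## [1] FALSE  TRUE  TRUE FALSE

# TRUE counts as 1 and FALSE as 0, so sum() counts the matches
sum(scores >= 80)
## [1] 2
```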

Introduction to R Markdown

R Markdown is a powerful tool for creating dynamic, reproducible documents that combine R code and its output with narrative text.

  • Narrative Text: Use Markdown syntax for headings, lists, and bold text.
  • Code Chunks: Write R code in special blocks called “chunks.”
  • Output: The results of your code, including tables and plots, are automatically embedded.

The Structure of an R Markdown File

An R Markdown file (.Rmd) has three main components:

1. The YAML Header

---
title: "My First R Markdown Document"
author: "Your Name"
date: "2025-08-04"
output: html_document
---

This specifies document metadata and the output format.

R Markdown: Narrative & Code

2. The Narrative Text

This is just regular text using Markdown syntax.

# A main heading

This is some **bold** text.

3. The Code Chunks

Code chunks begin with three backticks followed by {r} (the language name in curly braces) and end with three backticks:

```{r}
# Your R code goes here
summary(cars)
```

Live Example: R Markdown Concept

When you knit an R Markdown file, the code is executed and the output is inserted into the final document.

# Imagine this is an R Markdown chunk
my_vector <- c(1, 2, 3, 4, 5)
mean(my_vector)
## [1] 3
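Here `c()` combines the five values into a single vector. A small sketch that checks `mean()` by hand and computes a couple of other one-line summaries of the same vector; in a knitted document, each visible result would appear as its own output line:

```r
my_vector <- c(1, 2, 3, 4, 5)

# mean() is the sum divided by the number of elements
sum(my_vector) / length(my_vector)
## [1] 3

# Other common one-line summaries: the middle value and the range width
median(my_vector)
## [1] 3
max(my_vector) - min(my_vector)
## [1] 4
```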

The “Knit” Button

In RStudio, the “Knit” button is what brings it all together. It executes your R code and renders the final document to your desired format (e.g., HTML, PDF, Word).

  • It’s the key to reproducibility.
  • It ensures your analysis and the report are always in sync.

Summary and Next Steps

Today’s Summary

  • We discussed what Data Science is.
  • We introduced R and RStudio.
  • We learned basic R syntax for assignment and arithmetic.
  • We explored fundamental data types: numbers, strings, and logicals.
  • We had a first look at the power of R Markdown.

Looking Ahead to Day 2

  • Next class, we’ll dive deeper into R Markdown and introduce version control with Git and GitHub, which are essential for collaboration.

For the rest of class

  • Set up R and RStudio

  • Work on the day1_activity

  • Set up Git and GitHub

  • Ask me questions if you run into issues