Welcome to the course! Today, we will lay the groundwork for our journey into data science using R.
- Course Overview
- The R Environment & RStudio
- Basic R Syntax & Data Types
- Introduction to R Markdown
2025-08-13
See the Syllabus for the complete breakdown.
Typical class:
Evaluation:
Note that 40% of your grade comes from exams where you answer questions and write code in real time.
Feel free to use GenAI, but you won’t be able to use it on the exams.
Data science is an interdisciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.
R is a powerful, open-source programming language and environment specifically designed for statistical computing and graphics.
Rich Ecosystem: Thousands of packages for every data task imaginable.
Visualization: Unrivaled capabilities with packages like ggplot2.
Statistical Power: Built by statisticians, for statisticians.
Reproducibility: Excellent tools for creating dynamic reports and dashboards.
Python is also commonly used. It is a more general-purpose language than R.
RStudio is an integrated development environment (IDE) that makes working with R much easier and more productive. It consists of four main panes:
Let’s start with the basics. R’s syntax is simple and intuitive for a programming language.
Use <- to assign a value to a variable. The = sign also works, but <- is the conventional R style.
# Assign a value to x
x <- 10
# Print the value of x
print(x)
## [1] 10
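One reason <- is the conventional choice is that = does double duty: inside a function call it names an argument instead of creating a variable. The sketch below illustrates this with base R's round(); the variable names a, b, and rounded are made up for illustration, and the last line assumes a fresh session with no existing variable called digits.

```r
# Both operators create a variable at the top level
a <- 5
b = 5
identical(a, b)   # TRUE

# Inside a function call, = names an argument; it does not create a variable
rounded <- round(x = 3.14159, digits = 2)
exists("digits", inherits = FALSE)   # FALSE in a fresh session
```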
Let’s see this in action. When you “knit” this document, the code below will be executed and the output will be displayed.
# Assign a value to a variable
my_number <- 42
my_text <- "Hello, R!"
# Print the variables
paste("my_number is:", my_number)
## [1] "my_number is: 42"
paste("my_text is:", my_text)
## [1] "my_text is: Hello, R!"
You can use R for basic arithmetic operations.
- Addition: +
- Subtraction: -
- Multiplication: *
- Division: /
- Exponentiation: ^ or **
5 * 3 + 2
## [1] 17
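The result above depends on operator precedence: exponentiation is evaluated before multiplication and division, which come before addition and subtraction. A short sketch with made-up variable names shows how parentheses change the result. The %/% and %% operators (integer division and remainder) are not covered on this slide and are included only as an extra.

```r
# Exponentiation binds tighter than multiplication, which binds tighter than addition
result_precedence <- 2 + 3 * 2^2    # 2 + (3 * 4) = 14
result_grouped <- (2 + 3) * 2^2     # 5 * 4 = 20

# ^ and ** are interchangeable
same_power <- 2^3 == 2**3           # TRUE

# Extra: integer division and remainder
quotient <- 17 %/% 5                # 3
remainder <- 17 %% 5                # 2

print(c(result_precedence, result_grouped, quotient, remainder))
```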
The two main numeric data types are integers and doubles (real numbers). Numbers are stored as doubles by default; to create an integer, append L (e.g., 10L). class() allows you to see the data type of your variable.
class(10)
## [1] "numeric"
class(10L)
## [1] "integer"
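Note that class() reports "numeric" for a double. If you want to see the underlying storage type, base R's typeof() is more specific. The sketch below uses hypothetical variable names and also shows that dividing two integers with / gives a double.

```r
x_double <- 10      # no suffix: stored as a double
x_integer <- 10L    # L suffix: stored as an integer

typeof(x_double)    # "double"
typeof(x_integer)   # "integer"

# Dividing two integers with / returns a double
ratio <- x_integer / 4L
class(ratio)        # "numeric"

# as.integer() converts explicitly and drops the fractional part
truncated <- as.integer(3.9)   # 3L
```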
Text data is stored as a character string. You can use either single quotes (') or double quotes (").
my_string <- "Data Science is fun!"
print(my_string)
## [1] "Data Science is fun!"
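As a quick check that the two quote styles are equivalent, here is a small sketch with made-up variable names. It also shows one practical reason to have both: you can wrap a string in one style to embed the other.

```r
single_quoted <- 'Data Science'
double_quoted <- "Data Science"
identical(single_quoted, double_quoted)   # TRUE: both styles give the same string

# Use one quote style outside to embed the other inside
quote_inside <- 'She said "hello"'

class(double_quoted)   # "character"
nchar(double_quoted)   # 12
```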
Logical values are either TRUE or FALSE. You can also use T and F as shorthand, but TRUE and FALSE are safer because T and F are ordinary variables that can be overwritten.
5 > 3
## [1] TRUE
5 == 5
## [1] TRUE
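Logical values can be combined with & (and), | (or), and ! (not). The sketch below uses hypothetical variable names. Its last part demonstrates why the T shorthand can be risky: T can be reassigned, while TRUE is a reserved word.

```r
# Comparison operators return logical values
is_greater <- 5 > 3        # TRUE
is_different <- 5 != 5     # FALSE

# Combine logicals with & (and), | (or), and ! (not)
both <- is_greater & is_different     # FALSE
either <- is_greater | is_different   # TRUE
negated <- !is_greater                # FALSE

# T is an ordinary variable that can be overwritten
T <- 0
isTRUE(T)   # FALSE now, because T no longer means TRUE
rm(T)       # remove our variable so T refers to TRUE again
```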
R Markdown is a powerful tool for creating dynamic, reproducible documents that combine R code and its output with narrative text.
An R Markdown file (.Rmd) has three main components:
---
title: "My First R Markdown Document"
author: "Your Name"
date: "2025-08-04"
output: html_document
---
This specifies document metadata and the output format.
This is just regular text using Markdown syntax.
# A main heading
This is some **bold** text.
Code chunks start with three backticks followed by {r} and end with three backticks:
```{r}
# Your R code goes here
summary(cars)
```
When you knit an R Markdown file, the code is executed and the output is inserted into the final document.
# Imagine this is an R Markdown chunk
my_vector <- c(1, 2, 3, 4, 5)
mean(my_vector)
## [1] 3
In RStudio, the “Knit” button is what brings it all together. It executes your R code and renders the final document to your desired format (e.g., HTML, PDF, Word).
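You can also knit from the console. The sketch below writes a tiny .Rmd file to a temporary folder and renders it with rmarkdown::render(). The file name example_report.Rmd and its contents are made up for illustration, and rendering is skipped unless the rmarkdown package and pandoc are installed.

```r
# Build a minimal R Markdown file (hypothetical name and contents)
fence <- strrep("`", 3)   # three backticks, used to delimit the code chunk
rmd_path <- file.path(tempdir(), "example_report.Rmd")
writeLines(c(
  "---",
  'title: "Example Report"',
  "output: html_document",
  "---",
  "",
  "Some narrative text.",
  "",
  paste0(fence, "{r}"),
  "mean(c(1, 2, 3, 4, 5))",
  fence
), rmd_path)

# Render only if rmarkdown and pandoc are available on this machine
can_render <- requireNamespace("rmarkdown", quietly = TRUE) &&
  rmarkdown::pandoc_available()
if (can_render) {
  output_path <- rmarkdown::render(rmd_path, quiet = TRUE)
  print(output_path)   # path to the rendered HTML file
}
```

Changing the output line in the header (for example to word_document) changes the format that gets produced.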
Set up R and RStudio
Work on the day1_activity
Set up Git, GitHub
Ask me questions if you run into issues