---
title: "STATS 220"
subtitle: "Data import`r emo::ji('arrow_down')`/export`r emo::ji('arrow_up')`"
type: "lecture"
date: ""
output:
  xaringan::moon_reader:
    css: ["assets/remark.css"]
    lib_dir: libs
    nature:
      ratio: 16:9
      highlightStyle: github
      highlightLines: true
      countIncrementalSlides: false
---
```{r initial, echo = FALSE, cache = FALSE, results = 'hide'}
library(knitr)
options(htmltools.dir.version = FALSE, tibble.width = 60, tibble.print_min = 6)
opts_chunk$set(
  echo = TRUE, warning = FALSE, message = FALSE, comment = "#>",
  fig.path = 'figure/', cache.path = 'cache/', cache = TRUE,
  fig.align = 'center', fig.width = 12, fig.height = 8.5, fig.show = 'hold',
  dpi = 120
)
```
```{r xaringan-panelset, echo = FALSE}
xaringanExtra::use_panelset()
```
```{r external, include = FALSE, cache = FALSE}
read_chunk('R/02-import-export.R')
```
## Atomic vector (1d)
.center[]
```{r vector}
```
.footnote[image credit: Jenny Bryan]
???
* an ensemble of scalars -> vectors
---
## 1d `r emo::ji("arrow_right")` 2d
.pull-left[
.center[]
]
.pull-right[
```{r tibbles}
```
]
.footnote[image credit: Jenny Bryan]
???
* an ensemble of vectors -> rect data/tabular data, like spreadsheet
---
class: inverse middle
## Beyond 1d vectors
### 1. Lists
### 2. Matrices and arrays
### 3. Data frames and tibbles
???
* common data structures beyond 1d
* start with the most flexible one
* briefly talk about matrices
* focus on data frames, more specifically tibbles
---
.left-column[
## data strs
### - lists
]
.right-column[
An object that can contain elements of **different data types**.
```{r lists}
```
]
???
* to create a list using `list()`
* put 4 atomic vectors inside my lst
* a list of 4 elements, or length of 4
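
A minimal sketch of the construction described in these notes, for reference while presenting. The name `lst` and its four elements are illustrative assumptions; the chunk shown on the slide is read from `R/02-import-export.R` and may differ.

```r
# Hypothetical example: a list holding four atomic vectors of different types
lst <- list(
  1:3,                   # integer
  "a",                   # character
  c(TRUE, FALSE, TRUE),  # logical
  c(2.3, 5.9)            # double
)
length(lst)  # number of elements in the list
```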
---
.left-column[
## data strs
### - lists
]
.right-column[
.pull-left[
## data type
```{r lists-type}
```
## data class
```{r lists-cls}
```
]
.pull-right[
## data structure
```{r lists-str, results = "hold"}
```
]
]
???
* visual representation: a container with 4 items inside
* type: the primitive (underlying) type, cannot be modified
* class: type + attributes, can be modified
* RStudio's Values display uses `str()`
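
The three views of a list can be compared side by side. This is only a sketch on made-up data (the list `lst` here is an assumption, not the object from the course script).

```r
lst <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))  # illustrative data

typeof(lst)  # the underlying storage type
class(lst)   # the class; for a plain list it matches the type
str(lst)     # compact display of the structure, one line per element
```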
---
.left-column[
## data strs
### - lists
]
.right-column[
.pull-left[
```{r ref.label = "lists", echo = 2}
```
]
.pull-right[
.center[]
]
]
---
.left-column[
## data strs
### - lists
]
.right-column[
A list can contain other lists, i.e. it is **recursive**.
```{r lists-rec}
```
]
???
* most flex: put a list into a list
* a named list
---
.left-column[
## data strs
### - lists
]
.right-column[
.pull-left[
Test for a list
```{r is-list}
```
]
.pull-right[
Coerce to a list
```{r as-list}
```
]
]
???
* to test whether an object is of a given type, use functions prefixed with `is`
* to coerce/convert from one type to another, use functions prefixed with `as`
* from a vector of integers to a list
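
A short sketch of the test/coerce pair using base R. The object names `x` and `x_lst` are hypothetical and chosen only for illustration.

```r
x <- 1:3             # an integer vector (illustrative)
is.list(x)           # test: is `x` a list?

x_lst <- as.list(x)  # coerce: each value becomes its own list element
is.list(x_lst)
length(x_lst)
```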
---
.left-column[
## data strs
### - lists
]
.right-column[
.pull-left[
Subset by `[]`
```{r lst-sub}
```
]
.pull-right[
Subset by `[[]]`
```{r lst-sub2}
```
]
.center[![](img/pepper.png)]
.footnote[image credit: Hadley Wickham]
]
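???

A possible way to contrast the two subsetting operators live, assuming a small made-up list (`lst`, `sub1` and `sub2` are hypothetical names, not objects from the course script).

```r
lst <- list(1:3, "a", c(TRUE, FALSE, TRUE))  # illustrative list

sub1 <- lst[1]    # `[` keeps the container: a list of length 1
sub2 <- lst[[1]]  # `[[` pulls out the element itself: an integer vector

class(sub1)
class(sub2)
```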
---
.left-column[
## data strs
### - lists
### - matrices
]
.right-column[
2D structure of homogeneous data types
* `matrix()` to construct a matrix
```{r matrix}
```
* `as.matrix()` to coerce to a matrix
* `is.matrix()` to test for a matrix
]
???
* we don't deal with matrices in STATS 220; matrices are for computational statistics.
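
If a quick demonstration is needed, the three functions on this slide could be sketched as follows. The data are made up; `m` and `m_chr` are hypothetical names.

```r
m <- matrix(1:6, nrow = 2)  # construct: values fill column by column
dim(m)
is.matrix(m)                # test

# coerce: columns of mixed types are converted to one common type
m_chr <- as.matrix(data.frame(x = 1:2, y = c("a", "b")))
typeof(m_chr)
```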
---
.left-column[
## data strs
### - lists
### - matrices
]
.right-column[
**array**: generalises a matrix to more than 2 dimensions
```{r array}
```
]
---
.left-column[
## data strs
### - lists
### - matrices
### - tibbles
]
.right-column[
A data frame is a **named list** of vectors of the **same length**.
```{r data-frame}
```
]
---
.left-column[
## data strs
### - lists
### - matrices
### - tibbles
]
.right-column[
The underlying data type is a list.
```{r df-type}
```
.pull-left[
.center[data class]
```{r df-cls}
```
]
.pull-right[
.center[data attributes (meta info)]
```{r df-attrs}
```
]
]
???
* `data.frame` represents tabular data in R
* attributes: colnames and rownames
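
A sketch of how type, class and attributes relate for a data frame, using a small made-up data frame `df` (an assumption, not the one built in the course script).

```r
df <- data.frame(x = 1:3, y = c("a", "b", "c"))  # illustrative data

typeof(df)      # the underlying type
class(df)       # the class layered on top of it
attributes(df)  # names (columns), class, and row.names
length(df)      # as a list, its length is the number of columns
```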
---
.left-column[
## data strs
### - lists
### - matrices
### - tibbles
]
.right-column[
A tibble is a **modern reimagining** of the data frame.
```{r ref.label = "tibbles"}
```
* `as_tibble()` to coerce to a tibble
* `is_tibble()` to test for a tibble
]
???
* why we call it `tibble`
---
.left-column[
## data strs
### - lists
### - matrices
### - tibbles
]
.right-column[
.center[
]
```{r tbl-type}
```
]
???
* multiple classes: ordered left to right, from most specific to most general
---
## Why tibble not data frame?
.pull-left[
```{r ref.label = "data-frame"}
```
]
.pull-right[
```{r eval = FALSE}
sci_tbl <- tibble(
  department = dept,
  count = nstaff,
  percentage = count / sum(count)) #<<
sci_tbl
```
```
```{r ref.label = "tibbles", highlight.output = c(1, 3), echo = FALSE}
```
]
???
* tibble's display: friendly & informative
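
The highlighted line relies on `tibble()` evaluating its arguments in order, so a column can refer to one defined just before it. The sketch below contrasts this with `data.frame()`. The values and the column names `n_staff` and `prop` are invented for illustration, and it assumes no object called `n_staff` exists in the workspace.

```r
library(tibble)

# Illustrative values; not the data used on the slide
tbl <- tibble(
  n_staff = c(10, 30),
  prop = n_staff / sum(n_staff)  # refers to the column defined just above
)
tbl

# data.frame() evaluates its arguments in the calling environment,
# so the same reference fails when no object named `n_staff` exists there
res <- tryCatch(
  data.frame(n_staff = c(10, 30), prop = n_staff / sum(n_staff)),
  error = function(e) e
)
inherits(res, "error")
```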
---
## Glimpse data
```{r glimpse}
```
Data types and their abbreviations
.pull-left[
* `chr`: character
* `dbl`: double
* `int`: integer
* `lgl`: logical
]
.pull-right[
* `fct`: factor
* `date`: date
* `dttm`: date-time
* more [column data types](https://tibble.tidyverse.org/articles/types.html)
]
???
* text in pink indicates links
---
## Subsetting tibble
.left-column[
### - to 1d
]
.right-column[
* with `[[]]` or `$`
```{r subset-vct}
```
]
---
## Subsetting tibble
.left-column[
### - to 1d
### - by columns
]
.right-column[
* with `[]` or `[, col]`
.pull-left[
```{r subset-col1}
```
]
.pull-right[
```{r subset-col2}
```
]
]
---
## Subsetting tibble
.left-column[
### - to 1d
### - by columns
### - by rows
]
.right-column[
* with `[row, ]`
.pull-left[
```{r subset-row1}
```
]
.pull-right[
```{r subset-row2}
```
]
]
---
## Subsetting tibble
.left-column[
### - to 1d
### - by columns
### - by rows
### - by cols & rows
]
.right-column[
* with `[row, col]`
```{r subset-cr, results = "hold", eval = 1}
```
]
---
## Subsetting tibble
* Use `[[` to extract 1d vectors from 2d tibbles
* Use `[` to subset tibbles to a new tibble
  + numbers (positive/negative) as indices
  + characters (column names) as indices
  + logicals as indices
```{r ref.label = "subset-cr", eval = FALSE}
```
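???

The summary can be walked through on a toy tibble. This is a sketch only: `tbl` and its columns are made up and are not the tibble used in the earlier slides.

```r
library(tibble)

tbl <- tibble(x = 1:3, y = c("a", "b", "c"))  # illustrative tibble

tbl[["y"]]        # `[[`: a 1d character vector
tbl$y             # `$`: the same vector
tbl["y"]          # `[` with a name: a tibble with one column
tbl[, "y"]        # still a tibble (a base data frame would drop to a vector)
tbl[-1, ]         # negative number: drop the first row
tbl[tbl$x > 1, ]  # logical: keep rows where the condition is TRUE
tbl[2, "x"]       # row and column together: a 1 x 1 tibble
```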
---
class: middle inverse
## The [tidyverse](https://www.tidyverse.org) is an opinionated [collection of R packages](https://www.tidyverse.org/packages/) designed for data science. *All packages share an underlying design philosophy, grammar, and data structures.*
---
## Use {tidyverse}
```{r tidyverse, message = TRUE, cache = FALSE}
library(tidyverse)
```
---
class: inverse middle
# Data import `r emo::ji('arrow_down')`
---
background-image: url(img/pisa.png)
.footnote[