The greatest value of a picture is when it forces us to notice what we never expected to see.
-- John W. Tukey
dino
#> # A tibble: 142 x 2
#>       x     y
#>   <dbl> <dbl>
#> 1  55.4  97.2
#> 2  51.5  96.0
#> 3  46.2  94.5
#> 4  42.8  91.4
#> 5  40.8  88.3
#> 6  38.7  84.9
#> # … with 136 more rows
image credit: Steph Locke
A picture is worth a thousand words. -- Henrik Ibsen
sci_tbl
#> # A tibble: 4 x 2
#>   dept             count
#>   <chr>            <int>
#> 1 Physics             12
#> 2 Mathematics          8
#> 3 Statistics          20
#> 4 Computer Science    23
dept: discrete/categorical
count: quantitative/numeric
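The slides print sci_tbl but never show how it is built. A minimal sketch, assuming the table is simply typed in by hand from the printed output above (this construction is an assumption, not the author's code):

```r
library(tibble)

# Assumed construction: values copied from the printed tibble.
sci_tbl <- tibble(
  dept  = c("Physics", "Mathematics", "Statistics", "Computer Science"),
  count = c(12L, 8L, 20L, 23L)
)

# dept is character (discrete/categorical); count is integer (quantitative).
sapply(sci_tbl, class)
```

With sci_tbl defined this way, the barplot(), pie() and ggplot() calls that follow can be run as written.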
barplot(as.matrix(sci_tbl$count), legend = sci_tbl$dept)
pie(sci_tbl$count, labels = sci_tbl$dept)
Grammar makes language expressive. A language consisting of words and no grammar (statement = word) expresses only as many ideas as there are words. By specifying how words are combined in statements, a grammar expands a language's scope.
image credit: Thomas Lin Pedersen
decomposed to
library(ggplot2)
ggplot(data = sci_tbl) +
  geom_bar(
    aes(x = "", y = count, fill = dept),
    stat = "identity"
  )
ggplot(data = sci_tbl) + geom_bar( aes(x = "", y = count, fill = dept), stat = "identity" ) + coord_polar(theta = "y")
ggplot(data = <DATA>, mapping = aes(<MAPPINGS>)) + layer(geom = <GEOM>, stat = <STAT>, position = <POSITION>) + layer(geom = <GEOM>, stat = <STAT>, position = <POSITION>)
data: tibble/data.frame.
mapping: aesthetic mappings between data variables and visual elements, via aes().
layer(): a graphical layer is a combination of data, stat and geom with a potential position adjustment.
  geom: geometric elements to render each data observation.
  stat: statistical transformations applied to the data prior to plotting.
  position: position adjustment, such as "identity", "stack", "dodge" etc.
+: layer + layer
ggplot(data = sci_tbl, mapping = aes(x = dept, y = count)) +
  layer(geom = "bar", stat = "identity", position = "identity")
p <- ggplot(sci_tbl, aes(x = dept, y = count))
p
ggplot(): initialise the plot
layer()
p + geom_bar(stat = "identity")
p + geom_col()
stat = "identity" leaves data as is.
geom_col() is a shortcut to geom_bar(stat = "identity").
Generally, we use geom_*() instead of layer() in practice.
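To make the equivalence between layer(), geom_bar(stat = "identity") and geom_col() concrete, here is a small sketch that builds all three and inspects the stat stored on the first layer. It assumes sci_tbl holds the four rows printed earlier, and that the plot object exposes its layers as `$layers` (true for ggplot2 3.x; treat it as an assumption on other versions). The helper name stat_of is made up for this illustration.

```r
library(ggplot2)

# Assumed data: the four rows printed for sci_tbl.
sci_tbl <- data.frame(
  dept  = c("Physics", "Mathematics", "Statistics", "Computer Science"),
  count = c(12L, 8L, 20L, 23L)
)
p <- ggplot(sci_tbl, aes(x = dept, y = count))

via_layer <- p + layer(geom = "bar", stat = "identity", position = "identity")
via_bar   <- p + geom_bar(stat = "identity")
via_col   <- p + geom_col()

# Hypothetical helper: name of the stat object attached to the first layer.
stat_of <- function(plot) class(plot$layers[[1]]$stat)[1]

stats <- sapply(
  list(layer = via_layer, geom_bar = via_bar, geom_col = via_col),
  stat_of
)
stats
```

All three spellings should report the same identity stat; the geom_*() forms just fill in sensible defaults for you.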
geom_*()
p + geom_point()
p + geom_segment(aes(xend = dept, y = 0, yend = count))
geom_segment(): more aes
p + geom_point() + geom_segment(aes(xend = dept, y = 0, yend = count))
sci_tbl
#> # A tibble: 4 x 2
#>   dept             count
#>   <chr>            <int>
#> 1 Physics             12
#> 2 Mathematics          8
#> 3 Statistics          20
#> 4 Computer Science    23
sci_tbl0
#> # A tibble: 63 x 1
#>   dept
#>   <chr>
#> 1 Physics
#> 2 Physics
#> 3 Physics
#> 4 Physics
#> 5 Physics
#> 6 Physics
#> # … with 57 more rows
ggplot(sci_tbl, aes(x = dept, y = count)) + geom_bar(stat = "identity")
ggplot(sci_tbl0, aes(x = dept)) + geom_bar(stat = "count")
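The two calls above draw the same bars from differently shaped data: sci_tbl is pre-summarised, while sci_tbl0 has one row per person and lets stat = "count" do the tallying. A sketch of that relationship, assuming sci_tbl0 can be rebuilt by repeating each department name count times (the slides do not show how sci_tbl0 was made):

```r
library(ggplot2)

# Assumed data: the four rows printed for sci_tbl.
sci_tbl <- data.frame(
  dept  = c("Physics", "Mathematics", "Statistics", "Computer Science"),
  count = c(12L, 8L, 20L, 23L)
)

# One row per observation: repeat each department name `count` times.
sci_tbl0 <- data.frame(dept = rep(sci_tbl$dept, times = sci_tbl$count))

# stat = "count" tallies rows per x value before the bars are drawn.
p_count <- ggplot(sci_tbl0, aes(x = dept)) + geom_bar(stat = "count")

# layer_data() returns the data after the stat has been applied.
counted <- layer_data(p_count)
counted$count
```

The computed counts match the count column of sci_tbl, which is why stat = "identity" on the summarised table gives the same picture.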
p + geom_col(aes(colour = dept))
p + geom_col(aes(fill = dept))
p + geom_col(fill = "#756bb1")
p + geom_col(aes(fill = dept), colour = "#000000")
colour/color, fill: "red", "#756bb1"
alpha: opacity between 0 and 1
shape: "triangle open"
linetype: "dashed"
size, radius: a numerical value (in millimetres)
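The examples above use aesthetics in two ways: mapped to a variable inside aes(), or set to a constant outside it. A short sketch of the difference, assuming the same sci_tbl as before and that layers expose `$aes_params` and `$mapping` (as they do in ggplot2 3.x):

```r
library(ggplot2)

# Assumed data: the four rows printed for sci_tbl.
sci_tbl <- data.frame(
  dept  = c("Physics", "Mathematics", "Statistics", "Computer Science"),
  count = c(12L, 8L, 20L, 23L)
)
p <- ggplot(sci_tbl, aes(x = dept, y = count))

# Setting: a constant outside aes() is used literally as the bar fill.
set_fill <- p + geom_col(fill = "#756bb1")

# Mapping: inside aes() the same string is treated as a data value,
# so it goes through the fill scale and gets a legend entry.
mapped_fill <- p + geom_col(aes(fill = "#756bb1"))

set_fill$layers[[1]]$aes_params$fill
names(mapped_fill$layers[[1]]$mapping)
```

Printing mapped_fill shows the common surprise: the bars take the default scale colour rather than purple, because the hex string was handled as a category label.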
Describe a bubble chart in terms of grammar of graphics.
gg: grammar of graphics
{ggplot2}: the second version
coord_cartesian() (default)
coord_flip() (flips x and y)
coord_map()
coord_polar()
p + geom_col(aes(fill = dept)) + coord_polar(theta = "y")
live demo:
ggplot()
ggplot(data)
ggplot(data, aes())
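The three calls listed for the live demo differ only in how much of the plot specification they carry. A rough sketch of what each one holds, assuming sci_tbl as the data and ggplot2 3.x-style `$layers` and `$mapping` fields on the plot object:

```r
library(ggplot2)

# Assumed data: the four rows printed for sci_tbl.
sci_tbl <- data.frame(
  dept  = c("Physics", "Mathematics", "Statistics", "Computer Science"),
  count = c(12L, 8L, 20L, 23L)
)

canvas       <- ggplot()                                  # nothing yet
with_data    <- ggplot(sci_tbl)                           # default data only
with_mapping <- ggplot(sci_tbl, aes(x = dept, y = count)) # data + default aes

# None of them has a layer, so nothing is drawn beyond axes and background.
length(with_mapping$layers)
names(with_mapping$mapping)
```

Layers added later with + inherit the default data and mapping unless they supply their own.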
theme_grey()/theme_gray()
theme_bw(), theme_linedraw()
theme_light(), theme_dark()
theme_minimal(), theme_classic()
theme_void()
p + geom_col(aes(fill = dept)) + theme_bw()
library(ggthemes)
p + geom_col(aes(fill = dept)) + theme_economist()
element_text()
image credit: Emi Tanaka
theme() for fine-tuning
element_text()
p + geom_col(aes(fill = dept)) + theme(axis.text.x = element_text(angle = 30, vjust = 0.1))
element_line()
image credit: Emi Tanaka
element_rect()
image credit: Emi Tanaka
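The slides give code only for element_text(). As a sketch of how element_line() and element_rect() slot into theme() the same way, the example below restyles the grid lines and panel background of the bar chart. The particular colours and the choice of theme elements are illustrative assumptions, not taken from the slides.

```r
library(ggplot2)

# Assumed data: the four rows printed for sci_tbl.
sci_tbl <- data.frame(
  dept  = c("Physics", "Mathematics", "Statistics", "Computer Science"),
  count = c(12L, 8L, 20L, 23L)
)
p <- ggplot(sci_tbl, aes(x = dept, y = count))

styled <- p + geom_col(aes(fill = dept)) +
  theme(
    # element_line(): lines such as grid lines and axis ticks
    panel.grid.major.y = element_line(colour = "grey70", linetype = "dashed"),
    # element_rect(): rectangles such as panel and plot backgrounds
    panel.background = element_rect(fill = "white", colour = "black"),
    # element_text(): text such as axis labels and titles
    axis.text.x = element_text(angle = 30, hjust = 1)
  )
```

Call print(styled) to render it. Each theme argument expects the element type that matches what it draws: text, line or rectangle.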
mpg data available from {ggplot2}
mpg
#> # A tibble: 234 x 11
#>   manufacturer model displ  year   cyl trans     drv     cty
#>   <chr>        <chr> <dbl> <int> <int> <chr>     <chr> <int>
#> 1 audi         a4      1.8  1999     4 auto(l5)  f        18
#> 2 audi         a4      1.8  1999     4 manual(m… f        21
#> 3 audi         a4      2    2008     4 manual(m… f        20
#> 4 audi         a4      2    2008     4 auto(av)  f        21
#> 5 audi         a4      2.8  1999     6 auto(l5)  f        16
#> 6 audi         a4      2.8  1999     6 manual(m… f        18
#> # … with 228 more rows, and 3 more variables: hwy <int>,
#> #   fl <chr>, class <chr>
p_mpg <- ggplot(mpg, aes(displ, cty)) +
  geom_point(aes(colour = drv))
p_mpg
facet_grid()
p_mpg + facet_grid(rows = vars(drv)) # facet_grid(drv ~ .)
grid -> 2d matrix layout
facet_grid()
p_mpg + facet_grid(cols = vars(drv)) # facet_grid(~ drv)
facet_grid()
p_mpg + facet_grid(rows = vars(drv), cols = vars(cyl)) # facet_grid(drv ~ cyl)
facet_grid()
facet_wrap()
p_mpg + facet_wrap(vars(drv, cyl), ncol = 3) # facet_wrap(~ drv + cyl, ncol = 3)
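In the formula interface to facet_grid(), the left-hand side gives the rows and the right-hand side the columns. A sketch that checks this against the vars() form by comparing panel layouts. It assumes the built plot exposes its panel table as `$layout$layout` with ROW and COL columns, as in ggplot2 3.x; panel_layout is a hypothetical helper name.

```r
library(ggplot2)

p_mpg <- ggplot(mpg, aes(displ, cty)) +
  geom_point(aes(colour = drv))

# Hypothetical helper: panel positions of a built plot.
panel_layout <- function(plot) {
  ggplot_build(plot)$layout$layout[, c("PANEL", "ROW", "COL")]
}

by_vars    <- panel_layout(p_mpg + facet_grid(rows = vars(drv)))
by_formula <- panel_layout(p_mpg + facet_grid(drv ~ .))

# Same grid either way: one row per drv level, a single column.
all(by_vars$ROW == by_formula$ROW)
all(by_vars$COL == by_formula$COL)
```

facet_wrap() instead lays the panels out in reading order and wraps them after ncol columns, so it does not require a full rows-by-columns grid.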
image credit: Emi Tanaka
movies <- as_tibble(jsonlite::read_json(
  "https://vega.github.io/vega-editor/app/data/movies.json",
  simplifyVector = TRUE
))
movies
#> # A tibble: 3,201 x 16
#>   Title                US_Gross Worldwide_Gross US_DVD_Sales
#>   <chr>                   <int>           <dbl>        <int>
#> 1 The Land Girls         146083          146083           NA
#> 2 First Love, Last Ri…    10876           10876           NA
#> 3 I Married a Strange…   203134          203134           NA
#> 4 Let's Talk About Sex   373615          373615           NA
#> 5 Slam                  1009819         1087521           NA
#> 6 Mississippi Mermaid     24551         2624551           NA
#> # … with 3,195 more rows, and 12 more variables:
#> #   Production_Budget <int>, Release_Date <chr>,
#> #   MPAA_Rating <chr>, Running_Time_min <int>,
#> #   Distributor <chr>, Source <chr>, Major_Genre <chr>,
#> #   Creative_Type <chr>, Director <chr>,
#> #   Rotten_Tomatoes_Rating <int>, IMDB_Rating <dbl>,
#> #   IMDB_Votes <int>
skimr::skim(movies)
#> ── Data Summary ────────────────────────
#>                            Values
#> Name                       movies
#> Number of rows             3201
#> Number of columns          16
#> _______________________
#> Column type frequency:
#>   character                8
#>   numeric                  8
#> ________________________
#> Group variables            None
#>
#> ── Variable type: character ────────────────────────────────────────────────────
#>   skim_variable n_missing complete_rate   min   max empty n_unique whitespace
#> 1 Title                 1         1.00      1    66     0     3176          0
#> 2 Release_Date          7         0.998     8    11     0     1603          0
#> 3 MPAA_Rating         605         0.811     1     9     0        7          0
#> 4 Distributor         232         0.928     3    33     0      174          0
#> 5 Source              365         0.886     6    29     0       18          0
#> 6 Major_Genre         275         0.914     5    19     0       12          0
#> 7 Creative_Type       446         0.861     7    23     0        9          0
#> 8 Director           1331         0.584     7    27     0      550          0
#>
#> ── Variable type: numeric ──────────────────────────────────────────────────────
#>   skim_variable          n_missing complete_rate      mean         sd
#> 1 US_Gross                       7         0.998 44002085.  62555311.
#> 2 Worldwide_Gross                7         0.998 85343400. 149947343.
#> 3 US_DVD_Sales                2637         0.176 34901547.  45895122.
#> 4 Production_Budget              1         1.00  31069171.  35585913.
#> 5 Running_Time_min            1992         0.378      110.        20.2
#> 6 Rotten_Tomatoes_Rating       880         0.725       54.3       28.1
#> 7 IMDB_Rating                  213         0.933        6.28       1.25
#> 8 IMDB_Votes                   213         0.933    29909.     44938.
#>       p0      p25       p50       p75       p100
#> 1      0 5493221. 22019466. 56091762.  760167650
#> 2      0 8031285. 31168926. 97283797  2767891499
#> 3 618454 9906211. 20331558. 37794216.  352582053
#> 4    218 6575000  20000000  42000000   300000000
#> 5     46      95       107       121         222
#> 6      1      30        55        80         100
#> 7    1.4     5.6       6.4       7.2         9.2
#> 8     18   4828.     15106     35810.     519541
Are movie ratings consistent between IMDB and Rotten Tomatoes?
ggplot(movies, aes(x = IMDB_Rating, y = Rotten_Tomatoes_Rating)) + geom_point(size = 0.5, alpha = 0.5) + geom_smooth(method = "gam") + theme(aspect.ratio = 1)
Are movie ratings consistent between IMDB and Rotten Tomatoes?
ggplot(movies, aes(x = IMDB_Rating, y = Rotten_Tomatoes_Rating)) + geom_hex() + theme(aspect.ratio = 1)
The popularity of major genres
ggplot(movies, aes(y = Major_Genre)) + geom_bar()
How well liked are the major genres?
ggplot(movies) + geom_boxplot(aes(x = IMDB_Rating, y = Major_Genre))
How well liked are the major genres?
ggplot(movies) + geom_density(aes(x = IMDB_Rating, fill = Major_Genre))
How well liked are the major genres?
library(ggridges)
ggplot(movies, aes(x = IMDB_Rating, y = Major_Genre)) +
  geom_density_ridges(aes(fill = Major_Genre))
{ggplot2} now has an official extension mechanism. This means that others can now easily create their own stats, geoms and positions, and provide them in other packages. This should allow the ggplot2 community to flourish, even as less development work happens in ggplot2 itself.
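To give a flavour of what the extension mechanism looks like, here is a small custom stat that reduces each group of points to its convex hull. It follows the pattern used in the "Extending ggplot2" vignette; the names StatChull and stat_chull come from that vignette rather than from these slides, and the sketch is illustrative rather than a recommended implementation.

```r
library(ggplot2)

# A new stat: keep only the points on the convex hull of each group.
StatChull <- ggproto("StatChull", Stat,
  compute_group = function(data, scales) {
    data[chull(data$x, data$y), , drop = FALSE]
  },
  required_aes = c("x", "y")
)

# Constructor that wraps the stat in a layer, like the built-in stat_*().
stat_chull <- function(mapping = NULL, data = NULL, geom = "polygon",
                       position = "identity", na.rm = FALSE,
                       show.legend = NA, inherit.aes = TRUE, ...) {
  layer(
    stat = StatChull, data = data, mapping = mapping, geom = geom,
    position = position, show.legend = show.legend,
    inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...)
  )
}

p_hull <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  stat_chull(fill = NA, colour = "black")

# Data of the second layer after the stat has run: hull vertices only.
hull <- layer_data(p_hull, 2)
nrow(hull)
```

Because the new stat is an ordinary layer component, it combines with existing geoms, scales and facets without any changes to ggplot2 itself.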
➡️ https://exts.ggplot2.tidyverse.org/gallery/