class: center, middle, inverse, title-slide # STATS 220 ## Effective data visualisation👩🎨 --- class: inverse middle ## Graphical perception 👀 -- ### .blue[1.] Preattentive processing ### .blue[2.] Proximity ### .blue[3.] Position vs angle ### .blue[4.] Colour matters ??? * how human perceive a plot * given the same amount of info from plots, which data type help us to perceive more accurate info --- ## Preattentive processing .pull-left[ <img src="figure/t-shape-1.png" width="276" style="display: block; margin: auto;" /> ] -- .pull-right[ <img src="figure/shape-1.png" width="276" style="display: block; margin: auto;" /> ] -- .center[ <img src="figure/colour-1.png" width="276" style="display: block; margin: auto;" /> ] ??? * Have you noticed there are unusual data points? Can you locate them? * how about this one? --- ## Preattentive processing .small[.green[colour > form (shape)]] .pull-left[ <img src="figure/t-shape2-1.png" width="276" style="display: block; margin: auto;" /> ] .pull-right[ <img src="figure/shape2-1.png" width="276" style="display: block; margin: auto;" /> ] .center[ <img src="figure/colour2-1.png" width="276" style="display: block; margin: auto;" /> ] ??? * Which plot helps you to distinguish the data points? * Which plot consumes your least attention? * viewers can sense certain features, b/f our mind starts to pay attention to any specific objs. --- ## Proximity .small[.green[Make easy comparisons by grouping elements together]] .pull-left[ * compare time use by categories within each country <img src="figure/fill-cat-1.png" width="480" style="display: block; margin: auto;" /> ] .pull-right[ * compare time use by countries within each category <img src="figure/fill-country-1.png" width="480" style="display: block; margin: auto;" /> ] ??? * left: it's not easy to compare how much time spent on sleep across countries --- ## Position vs angle .small[.green[position > angle]] .pull-left[ <img src="figure/fill-country2-1.png" width="480" style="display: block; margin: auto;" /> ] .pull-right[ <img src="figure/angle-1.png" width="480" style="display: block; margin: auto;" /> .center[Pie charts are BAD‼️] ] ??? * "Other" and "Sleep", can u easily perceive these subtle differences from the pie charts? * But * A bar chart or dot chart is a preferable way of displaying this type of data. --- ## Absolute vs relative positions .small[.green[absolute > relative]] .pull-left[ <img src="figure/fill-country3-1.png" width="480" style="display: block; margin: auto;" /> ] .pull-right[ <img src="figure/rel-pos-1.png" width="480" style="display: block; margin: auto;" /> ] ??? * absolute -> share the same base line. we compare their heights * right: 100% bar chart. its much harder to compare NZ to others, bc NZ is rel. * The eye is good at judging abs and bad at judging relative. --- class: inverse middle ## Colour matters -- ### .blue[1.] Colour spaces ### .blue[2.] Colour scales ### .blue[3.] Colour blindness --- class: middle ## 3 ways to represent colour spaces 1. RGB 2. HSV/HSL 3. HCL .small[.green[for humans]] --- ## RGB * Red *(0-255)*: amount of .red[red] light * Green *(0-255)*: amount of .green[green] light * Blue *(0-255)*: amount of .blue[blue] light .center[<img src="img/rgb-viz-1.svg" width="60%">] .footnote[image credit: Claus O. Wilke] ??? * for computer & screen --- ## HSV * Hue *(0-360)*: hue of the colour * Saturation *(0-1)*: colourfulness relative to the brightness of the colour * Value *(0-1)*: subjective perception of amount of light emitted .center[<img src="img/hsv-viz-1.svg" width="70%">] .footnote[image credit: Claus O. Wilke] --- ## HSL * Hue *(0-360)*: hue of the colour * Lightness *(0-1)*: brightness relative to the brightness of a illuminated white * Saturation *(0-1)*: colourfulness relative to the brightness of the colour .center[<img src="img/hls-viz-1.svg" width="70%">] .footnote[image credit: Claus O. Wilke] --- ## HCL .small[.green[aka polar LUV]] * Hue *(0-360)*: hue of the colour * Chroma *(0-180)*: degree of vividness of a colour * Luminance *(0-100)*: amount of light emitted .center[<img src="img/cl-planes-1.svg" width="60%">] .footnote[image credit: Claus O. Wilke] ??? HCL: perceptually-based and device-independent --- ## Encoding too much <img src="figure/time-use-bad-1.png" width="960" style="display: block; margin: auto;" /> ??? * default ggplot2 colour scales -> not colour blind friendly * more than 7 qualitative colours, matching colours to categories are cumbersome * colour can be effective tool to enhance * choose colour wisely --- class: middle ## Colour scales .large[.green[**3**]] fundamental use cases for colours in data visualisations: 1. use colour to distinguish groups of data from each other 2. use colour to represent data values 3. use colour to highlight <hr> .large[.green[**3**]] types of colour palettes .small[[ColorBrewer](https://colorbrewer2.org/#type=sequential&scheme=BuGn&n=3)] 1. Qualitative 2. Sequential 3. Diverging --- ## Qualitative palettes .small[.green[for categorical data with no intrinsic ordering]] ```r colorspace::hcl_palettes("Qualitative", plot = TRUE, n = 7) ``` <img src="figure/colorspace-q-1.png" width="540" style="display: block; margin: auto;" /> ??? * use colour to distinguish discrete items/groups, but doesn't give impression of an order * a finite set of specific colours that are chosen to look clearly distinct * no one single colour stands out relative to the others --- ## Sequential palettes .small[.green[for ordered data from high to low]] ```r colorspace::hcl_palettes("Sequential", plot = TRUE, n = 7) ``` <img src="figure/colorspace-s-1.png" width="1020" style="display: block; margin: auto;" /> ??? * heatmap: used colour to represent data values, like temperature * representing continuous/ordered values * colours indicate which data values are larger or smaller * the diff bt colours shows the diff b/t data values * seq colour needs to be perceived to vary uniformly across its entire range by changing hues --- ## Diverging palettes .small[.green[for mid-range values and extremes at both ends]] ```r colorspace::hcl_palettes("Diverging", plot = TRUE, n = 7) ``` <img src="figure/colorspace-d-1.png" width="600" style="display: block; margin: auto;" /> ??? * vis the deviation of data values in one of 2 directions rel to a neutral midpoint * a straightforward eg is vis +/- values * think of a diverging scale as joining 2 seq sales at a common midpoint --- ## Use colour palettes .pull-left[ <br> ```r time_use %>% ggplot(aes(country, time_minutes)) + geom_col( aes(fill = category), position = "dodge") + * scale_fill_brewer(palette = "Dark2") + labs(y = "") + theme(legend.position = "bottom") ``` ] .pull-right[ <img src="figure/gg-palette2-1.png" width="540" style="display: block; margin: auto;" /> ] --- ## Set custom colours .pull-left[ <br> ```r time_use %>% ggplot(aes(country, time_minutes)) + geom_col( aes(fill = category), position = "dodge") + * scale_fill_manual( * values = c("#EF476F", "#FFD166", * "#06D6A0", "#118AB2", * "#073B4C", "grey")) + labs(y = "") + theme(legend.position = "bottom") ``` ] .pull-right[ <img src="figure/gg-manual2-1.png" width="540" style="display: block; margin: auto;" /> ] --- ## Colour-vision deficiency .pull-left[ .center[<img src = "https://clauswilke.com/dataviz/pitfalls_of_color_use_files/figure-html/red-green-cvd-sim-1.png", width = 90%></img>] * Red-green colour-vision deficiency (deuteranomaly & protanomaly) is the most common. ] .pull-right[ .center[<img src = "https://clauswilke.com/dataviz/pitfalls_of_color_use_files/figure-html/blue-green-cvd-sim-1.png", width = 90%></img>] * Blue-green colour-vision deficiency (tritanomaly) is rare but does occur. ] <br> ℹ️ *Approximately 8% of males and 0.5% of females suffer from some sort of color-vision deficiency.* .footnote[reference: Claus O. Wilke [Fundamentals of Data Visualization](https://clauswilke.com/dataviz/)] ??? * A small prop of people with impaired colour vision have difficulty to distinguish certain types of colours --- ## Choose colours using {colorspace} .pull-left[ * `colorspace::hclwizard()` .center[<img src = "img/wizard.png", height = "320px"></img>] ] .pull-right[ * `colorspace::hcl_color_picker()` .center[<img src = "img/hcl.png", height = "320px"></img>] ] --- class: middle ## Scales * Control how data is mapped to perceptual properties, and produce guides (axes and legends) which allow us to read the plot. * Important arguments: `breaks`, `labels`, and `limits`. * Naming scheme: `scale_[aes]_[datatype]()` ---
??? --- class: inverse middle ## Publication-ready visualisation 👩🎨 --- .left-column[ ## Towards publication-ready visualisation ] .right-column[ <blockquote class="twitter-tweet"><p lang="en" dir="ltr"><a href="https://twitter.com/hashtag/MakingOf?src=hash&ref_src=twsrc%5Etfw">#MakingOf</a> of last week's <a href="https://twitter.com/hashtag/TidyTuesday?src=hash&ref_src=twsrc%5Etfw">#TidyTuesday</a> plot <a href="https://t.co/Jsg41id5Nw">https://t.co/Jsg41id5Nw</a> <a href="https://t.co/dP6tSrjHy0">pic.twitter.com/dP6tSrjHy0</a></p>— Georgios Karamanis (@geokaramanis) <a href="https://twitter.com/geokaramanis/status/1374066377879908358?ref_src=twsrc%5Etfw">March 22, 2021</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> ] --- class: middle .pull-left[ ## .center[Exploratory data visualisation] * For **internal use** only. Need to be able to create rapidly because your first attempt will never be the most revealing. * Iteration is crucial for developing multiple displays of your data. ] .pull-right[ ## .center[Communication graphics] * When you **communicate** your findings, you need to spend much time polishing your graphics to eliminate distractions and focus on the storytelling. * Iteration is crucial to ensure all the bits and pieces works well: labels, color choices, tick marks... ] --- class: middle ## Case study: COVID-19 ```r covid19 <- read_csv("data/covid19-daily-cases.csv") covid19 ``` ``` #> # A tibble: 15,677 x 3 #> country_region date confirmed #> <chr> <date> <dbl> #> 1 Afghanistan 2020-03-01 1 #> 2 Afghanistan 2020-03-02 1 #> 3 Afghanistan 2020-03-03 2 #> 4 Afghanistan 2020-03-04 4 #> 5 Afghanistan 2020-03-05 4 #> 6 Afghanistan 2020-03-06 4 #> # … with 15,671 more rows ``` --- .left-column[ ## COVID-19 ### - scale-y ] .right-column[ ## Data as is .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-scale0-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r covid19 %>% ggplot(aes( x = date, y = confirmed, colour = country_region)) + geom_line() + * guides(colour = FALSE) # rm colour legend ``` ] ] ] ] ??? full screen of legends --- .left-column[ ## COVID-19 ### - scale-y ] .right-column[ ## Logarithmic scale .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-scale1-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r covid19 %>% ggplot(aes( x = date, * y = log10(confirmed), colour = country_region)) + geom_line() + guides(colour = FALSE) ``` ] ] ] ] ??? perceive the rate of infections, slowing down --- .left-column[ ## COVID-19 ### - scale-y ] .right-column[ ## Logarithmic scale .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-scale-log10-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r covid19 %>% ggplot(aes( x = date, y = confirmed, colour = country_region)) + geom_line() + guides(colour = FALSE) + * scale_y_log10() ``` ] ] ] .footnote[Rob J Hyndman's blog post on [Why log ratios are useful for tracking COVID-19](https://robjhyndman.com/hyndsight/logratios-covid19/)] ] --- .left-column[ ## COVID-19 ### - scale-y ### - scale-x ] .right-column[ ```r covid19_rel <- covid19 %>% group_by(country_region) %>% mutate(days = as.numeric(date - min(date))) %>% ungroup() covid19_rel ``` ``` #> # A tibble: 15,677 x 4 #> country_region date confirmed days #> <chr> <date> <dbl> <dbl> #> 1 Afghanistan 2020-03-01 1 0 #> 2 Afghanistan 2020-03-02 1 1 #> 3 Afghanistan 2020-03-03 2 2 #> 4 Afghanistan 2020-03-04 4 3 #> 5 Afghanistan 2020-03-05 4 4 #> 6 Afghanistan 2020-03-06 4 5 #> # … with 15,671 more rows ``` ] ??? log(0) -> Inf --- .left-column[ ## COVID-19 ### - scale-y ### - scale-x ] .right-column[ ## Relative days .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-rel-p-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r covid19_rel %>% ggplot(aes( x = days, y = confirmed, colour = country_region)) + geom_line() + scale_y_log10() + guides(colour = FALSE) ``` ] ] ] ] --- .left-column[ ## COVID-19 ### - scale-y ### - scale-x ### - highlight ] .right-column[ ## Highlight New Zealand .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-rel-nz-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r covid19_nz <- covid19_rel %>% filter(country_region == "New Zealand") p_nz <- covid19_rel %>% ggplot(aes(x = days, y = confirmed, * group = country_region)) + geom_line(colour = "grey", alpha = 0.5) + geom_line(colour = "#238b45", size = 1, data = covid19_nz) + scale_y_log10() + guides(colour = FALSE) p_nz ``` ] ] ] ] --- .left-column[ ## COVID-19 ### - scale-y ### - scale-x ### - highlight ### - annotate ] .right-column[ ## Label New Zealand .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-rel-nz-lab-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r p_nz <- p_nz + geom_label(aes( * x = max(days), y = max(confirmed), label = country_region), data = covid19_nz, colour = "#238b45", nudge_x = 3, nudge_y = .5) p_nz ``` ] ] ] ] --- .left-column[ ## COVID-19 ### - scale-y ### - scale-x ### - highlight ### - annotate ### - limits ] .right-column[ ## Expand limits .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-rel-nz-lim-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r p_nz <- p_nz + * scale_y_log10(labels = scales::label_comma()) + * xlim(c(0, 100)) p_nz ``` ] ] ] ] --- .left-column[ ## COVID-19 ### - scale-y ### - scale-x ### - highlight ### - annotate ### - limits ### - labels ] .right-column[ ## Every figure needs the title .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-rel-nz-title-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r p_nz <- p_nz + labs( x = "Days since March 1", y = "Confirmed cases (on log10)", title = "Worldwide coronavirus confirmed cases", subtitle = "highlighting New Zealand", caption = "Data source: John Hopkins University, CSSE" ) p_nz ``` ] ] ] ] --- .left-column[ ## COVID-19 ### - scale-y ### - scale-x ### - highlight ### - annotate ### - limits ### - labels ### - theme ] .right-column[ ## Apply themes .panelset[ .panel[.panel-name[Plot] <img src="figure/covid-rel-nz-theme-1.png" width="840" style="display: block; margin: auto;" /> .panel[.panel-name[Code] ```r # remotes::install_github("Financial-Times/ftplottools") p_nz + ftplottools::ft_theme() + theme( plot.title.position = "plot", plot.background = element_rect(fill = "#FFF1E0")) ``` ] ] ] ] --- class: inverse middle ## Interactive graphics --- ## Easily turn ggplot2 into [plotly](https://plotly-r.com) ```r library(plotly) *ggplotly(p_nz) ``` .center[<iframe src="figure/plotly.html" width="800" height="500" seamless="seamless" frameBorder="0"> </iframe>] --- class: inverse middle ## Generative art --- ## .center[A peek at [generative art works by Thomas Lin Pedersen](https://www.data-imaginist.com/art)] <img src="https://d33wubrfki0l68.cloudfront.net/b0ca94c787e0e509fe70e7b9285fafffb47a649a/3b7de/art/005_genesis/genesis338_hu19fc60717d90451585acb0fb6a565bcb_106689_500x500_fill_box_center_2.png" width="280px"> <img src="https://d33wubrfki0l68.cloudfront.net/62f4ab9648b5f2359756c5d7e7f4019defe7faf5/e68a3/art/008_emergence/emergence6802_hu7c58ad2a8fca059ec9ebad497d24566e_22821446_500x500_fill_box_center_2.png" width="280px"> <img src="https://d33wubrfki0l68.cloudfront.net/c3f652266f3310fa6ef7bc6f8c1f3080cac74d2e/522f0/art/009_prism/prism627_hu5fd6b3df9c854c0c26e8cdeebe574e51_31622651_500x500_fill_box_center_2.png" width="280px"> <img src="https://d33wubrfki0l68.cloudfront.net/64a2403d266498344f73af33668397e34e9b2b28/857ea/art/008_emergence/emergence526_huc93094888f54e1c55d93d2c9f1d65eda_17094834_500x500_fill_box_center_2.png" width="280px"> <img src="https://d33wubrfki0l68.cloudfront.net/1d067ea931f5be062c1a93bf29fd1e292e84c72b/e1949/art/005_genesis/genesis99_hue66520854e953909155239426845f632_183306_500x500_fill_box_center_2.png" width="280px"> <img src="https://d33wubrfki0l68.cloudfront.net/ba46a11d676efa5481fb8abc447539022241d5a5/4d746/art/003_unfold/unfold06_hu9cba8b0d8a0b0efc5b2fd5f2cd3f12c5_113220_500x500_fill_box_center_2.png" width="280px"> <img src="https://d33wubrfki0l68.cloudfront.net/19592674aaa73f13cd74cee56c819f9efe6cec2c/39caf/art/007_storms/storms4500_hu299e9f5ad6cb567eb7c45aeae728a52d_2314689_500x500_fill_box_center_2.png" width="280px"> <img src="https://d33wubrfki0l68.cloudfront.net/749bf5026fe35d4c1322a8ecde2c6a43294d463d/c2664/art/005_genesis/genesis9458_hu6cbae2b11ecd76683bfe4f6d913c2b92_401118_500x500_fill_box_center_2.png" width="280px"> --- ## Reading .pull-left[ .center[[<img src="https://d33wubrfki0l68.cloudfront.net/b88ef926a004b0fce72b2526b0b5c4413666a4cb/24a30/cover.png" height="520px">](https://r4ds.had.co.nz)] ] .pull-right[ * [Graphics for communication](https://r4ds.had.co.nz/graphics-for-communication.html) * [BBC Visual and Data Journalism cookbook for R graphics](https://bbc.github.io/rcookbook/) ]