
STATS 220

Effective data visualisation👩‍🎨

1 / 43

Graphical perception 👀

1. Preattentive processing

2. Proximity

3. Position vs angle

4. Colour matters

2 / 43
  • how humans perceive a plot
  • given the same amount of information, which type of plot helps us perceive it more accurately

Preattentive processing

3 / 43
  • Have you noticed there are unusual data points? Can you locate them?
  • how about this one?

Preattentive processing: colour > form (shape)

4 / 43
  • Which plot helps you to distinguish the data points?
  • Which plot consumes your least attention?
  • viewers can sense certain features before the mind starts to pay attention to any specific objects

Proximity: make easy comparisons by grouping elements together

  • compare time use by categories within each country
  • compare time use by countries within each category
5 / 43
  • left: it's not easy to compare how much time is spent on sleep across countries

Position vs angle: position > angle

Pie charts are BAD‼️

6 / 43
  • "Other" and "Sleep": can you easily perceive these subtle differences from the pie charts?
  • But
  • A bar chart or dot chart is a preferable way of displaying this type of data.

Absolute vs relative positions: absolute > relative

7 / 43
  • absolute: the bars share the same baseline, so we compare their heights
  • right: 100% bar chart. It's much harder to compare NZ to the others, because NZ is shown in a relative position.
  • The eye is good at judging absolute positions and bad at judging relative ones.

Colour matters

1. Colour spaces

2. Colour scales

3. Colour blindness

8 / 43

3 ways to represent colour spaces

  1. RGB
  2. HSV/HSL
  3. HCL for humans
9 / 43

RGB

  • Red (0-255): amount of red light
  • Green (0-255): amount of green light
  • Blue (0-255): amount of blue light

image credit: Claus O. Wilke

10 / 43
  • designed for computers and screens
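
As a small illustration of the RGB representation, base R's `rgb()` and `col2rgb()` (both in grDevices) convert between the three 0-255 channel amounts and a hex code. The hex code decomposed here is simply one of the custom colours used later in this deck.

```r
# Build a colour from red, green and blue amounts on the 0-255 scale
red <- rgb(255, 0, 0, maxColorValue = 255)
red

# Decompose a hex code back into its three channels
parts <- col2rgb("#EF476F")
parts
```

Each pair of hex digits is one channel, so `EF`, `47` and `6F` are the red, green and blue amounts.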

HSV

  • Hue (0-360): hue of the colour
  • Saturation (0-1): colourfulness relative to the brightness of the colour
  • Value (0-1): subjective perception of amount of light emitted

image credit: Claus O. Wilke

11 / 43

HSL

  • Hue (0-360): hue of the colour
  • Lightness (0-1): brightness relative to the brightness of an illuminated white
  • Saturation (0-1): colourfulness relative to the brightness of the colour

image credit: Claus O. Wilke

12 / 43

HCL aka polar LUV

  • Hue (0-360): hue of the colour
  • Chroma (0-180): degree of vividness of a colour
  • Luminance (0-100): amount of light emitted

image credit: Claus O. Wilke

13 / 43

HCL: perceptually-based and device-independent
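
A minimal sketch of the contrast between the two colour spaces, using base R's `hcl()` and `hsv()` from grDevices. The hue, chroma and luminance values are arbitrary choices for illustration. Note that `hsv()` expects hue on a 0-1 scale rather than 0-360.

```r
# Three hues, 120 degrees apart
hues <- c(0, 120, 240)

# HCL: hold chroma and luminance fixed and vary only the hue,
# so the three colours are intended to look similarly bright
hcl_cols <- hcl(h = hues, c = 50, l = 70)
hcl_cols

# HSV: the same hues at full saturation and value; grDevices::hsv()
# takes hue as a fraction of the colour wheel, hence the division
hsv_cols <- hsv(h = hues / 360, s = 1, v = 1)
hsv_cols
```

Plotting both vectors side by side (for example with `colorspace::swatchplot()`) should show the HSV colours differing much more in apparent brightness than the HCL ones.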

Encoding too much

14 / 43
  • the default ggplot2 colour scales are not colour-blind friendly
  • with more than 7 qualitative colours, matching colours to categories is cumbersome
  • colour can be an effective tool to enhance a plot
  • choose colours wisely

Colour scales

3 fundamental use cases for colours in data visualisations:

  1. use colour to distinguish groups of data from each other
  2. use colour to represent data values
  3. use colour to highlight

3 types of colour palettes (ColorBrewer)

  1. Qualitative
  2. Sequential
  3. Diverging
15 / 43

Qualitative palettes for categorical data with no intrinsic ordering

colorspace::hcl_palettes("Qualitative", plot = TRUE, n = 7)

16 / 43
  • use colour to distinguish discrete items/groups without giving the impression of an order
  • a finite set of specific colours that are chosen to look clearly distinct
  • no one single colour stands out relative to the others

Sequential palettes for ordered data from high to low

colorspace::hcl_palettes("Sequential", plot = TRUE, n = 7)

17 / 43
  • heatmap: uses colour to represent data values, like temperature
  • represents continuous/ordered values
  • colours indicate which data values are larger or smaller
  • the difference between colours shows the difference between data values
  • a sequential colour scale that changes hue needs to be perceived as varying uniformly across its entire range

Diverging palettes for mid-range values and extremes at both ends

colorspace::hcl_palettes("Diverging", plot = TRUE, n = 7)

18 / 43
  • visualises the deviation of data values in one of two directions relative to a neutral midpoint
  • a straightforward example is visualising positive and negative values
  • think of a diverging scale as joining two sequential scales at a common midpoint
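
The three palette types can also be generated as vectors of hex codes, which is handy for passing them to a manual scale. This sketch uses the {colorspace} generator functions. The palette names ("Dark 3", "Blues 3", "Blue-Red") are just picks for illustration and do not come from the slides.

```r
library(colorspace)

# Qualitative: distinct hues, similar chroma and luminance
qual <- qualitative_hcl(7, palette = "Dark 3")

# Sequential: ordered from dark to light
seqn <- sequential_hcl(7, palette = "Blues 3")

# Diverging: two sequential ramps meeting at a neutral midpoint
divg <- diverging_hcl(7, palette = "Blue-Red")

# Compare them visually
swatchplot(qual, seqn, divg)
```

With an odd `n`, the middle colour of the diverging palette is the neutral midpoint.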

Use colour palettes


time_use %>%
  ggplot(aes(country, time_minutes)) +
  geom_col(
    aes(fill = category),
    position = "dodge") +
  scale_fill_brewer(palette = "Dark2") +
  labs(y = "") +
  theme(legend.position = "bottom")

19 / 43

Set custom colours


time_use %>%
  ggplot(aes(country, time_minutes)) +
  geom_col(
    aes(fill = category),
    position = "dodge") +
  scale_fill_manual(
    values = c("#EF476F", "#FFD166",
      "#06D6A0", "#118AB2",
      "#073B4C", "grey")) +
  labs(y = "") +
  theme(legend.position = "bottom")
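
One variation worth knowing: `scale_fill_manual()` also accepts a named vector, which ties each colour to a specific category rather than relying on the order of the levels. The sketch below uses a small made-up data frame and hypothetical category names, since `time_use` itself is not shown here.

```r
library(ggplot2)

# Toy stand-in for time_use (values are invented for illustration)
toy <- data.frame(
  country = rep(c("A", "B"), each = 3),
  category = rep(c("Sleep", "Work", "Other"), times = 2),
  time_minutes = c(500, 300, 100, 520, 280, 90)
)

# Names match the category values, so the mapping is explicit
pal <- c(Sleep = "#118AB2", Work = "#EF476F", Other = "grey")

p <- ggplot(toy, aes(country, time_minutes)) +
  geom_col(aes(fill = category), position = "dodge") +
  scale_fill_manual(values = pal)
p
```

With names in place, reordering or dropping a category does not silently shift the colours of the others.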

20 / 43

Colour-vision deficiency

  • Red-green colour-vision deficiency (deuteranomaly & protanomaly) is the most common.

  • Blue-yellow colour-vision deficiency (tritanomaly) is rare but does occur.


ℹ️ Approximately 8% of males and 0.5% of females suffer from some sort of color-vision deficiency.

reference: Claus O. Wilke Fundamentals of Data Visualization

21 / 43
  • A small proportion of people have impaired colour vision and find it difficult to distinguish certain colours
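
To check a palette against these deficiencies, {colorspace} provides `deutan()`, `protan()` and `tritan()`, which simulate how colours may appear under each type. The sketch below applies them to five of the custom colours used earlier in this deck. The simulation is an approximation, not a guarantee of what any individual sees.

```r
library(colorspace)

pal <- c("#EF476F", "#FFD166", "#06D6A0", "#118AB2", "#073B4C")

# Simulated appearance under each type of colour-vision deficiency
sim <- data.frame(
  original = pal,
  deutan = deutan(pal),
  protan = protan(pal),
  tritan = tritan(pal)
)
sim

# Side-by-side swatches make collisions easier to spot
swatchplot(
  "Original" = pal,
  "Deutan" = sim$deutan,
  "Protan" = sim$protan,
  "Tritan" = sim$tritan
)
```

If two swatches in a simulated row look nearly identical, those categories will be hard to tell apart for some viewers.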

Choose colours using {colorspace}

  • colorspace::hclwizard()

  • colorspace::hcl_color_picker()

22 / 43

Scales

  • Control how data is mapped to perceptual properties, and produce guides (axes and legends) which allow us to read the plot.
  • Important arguments: breaks, labels, and limits.
  • Naming scheme: scale_[aes]_[datatype]()
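
A short sketch of the three arguments in action, using the `mpg` data set that ships with ggplot2. The particular break positions, labels and limits are arbitrary choices for illustration.

```r
library(ggplot2)

p <- ggplot(mpg, aes(displ, hwy)) +
  geom_point() +
  # scale_[aes]_[datatype]: x aesthetic, continuous data
  scale_x_continuous(
    breaks = c(2, 4, 6),             # where the ticks go
    labels = c("2 L", "4 L", "6 L"), # what the ticks say
    limits = c(1, 7)                 # the range of the axis
  ) +
  scale_y_continuous(limits = c(0, 50))
p
```

`breaks` and `labels` must have the same length when both are given as vectors. Data outside `limits` is dropped with a warning.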
23 / 43
24 / 43

Publication-ready visualisation 👩‍🎨

25 / 43

Towards publication-ready visualisation

26 / 43

Exploratory data visualisation

  • For internal use only. Need to be able to create rapidly because your first attempt will never be the most revealing.
  • Iteration is crucial for developing multiple displays of your data.

Communication graphics

  • When you communicate your findings, you need to spend much time polishing your graphics to eliminate distractions and focus on the storytelling.
  • Iteration is crucial to ensure all the bits and pieces work well: labels, colour choices, tick marks...
27 / 43

Case study: COVID-19

library(tidyverse) # provides read_csv(), the dplyr verbs and ggplot2
covid19 <- read_csv("data/covid19-daily-cases.csv")
covid19
#> # A tibble: 15,677 x 3
#>   country_region date       confirmed
#>   <chr>          <date>         <dbl>
#> 1 Afghanistan    2020-03-01         1
#> 2 Afghanistan    2020-03-02         1
#> 3 Afghanistan    2020-03-03         2
#> 4 Afghanistan    2020-03-04         4
#> 5 Afghanistan    2020-03-05         4
#> 6 Afghanistan    2020-03-06         4
#> # … with 15,671 more rows
28 / 43

COVID-19

- scale-y

Data as is

covid19 %>%
  ggplot(aes(
    x = date,
    y = confirmed,
    colour = country_region)) +
  geom_line() +
  guides(colour = "none") # remove the colour legend
29 / 43

without removing the colour legend, it fills the whole screen

COVID-19

- scale-y

Logarithmic scale

covid19 %>%
  ggplot(aes(
    x = date,
    y = log10(confirmed),
    colour = country_region)) +
  geom_line() +
  guides(colour = "none")
30 / 43

on the log scale we can perceive the rate of infection and see where it is slowing down

COVID-19

- scale-y

Logarithmic scale

covid19 %>%
  ggplot(aes(
    x = date,
    y = confirmed,
    colour = country_region)) +
  geom_line() +
  guides(colour = "none") +
  scale_y_log10()

Rob J Hyndman's blog post on Why log ratios are useful for tracking COVID-19
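
A quick numeric sketch of why the log scale helps, using a hypothetical outbreak whose case count doubles every day (invented numbers, not the COVID-19 data). On the raw scale the daily increases keep growing, while on the log10 scale every step has the same size, so exponential growth shows up as a straight line and any slowdown as a flattening.

```r
days <- 0:10
cases <- 2^days # hypothetical: cases double each day

# Raw scale: each daily increase is bigger than the last
raw_steps <- diff(cases)
raw_steps

# Log scale: each daily increase is the same, log10(2)
log_steps <- diff(log10(cases))
log_steps
```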

31 / 43

COVID-19

- scale-y

- scale-x

covid19_rel <- covid19 %>%
  group_by(country_region) %>%
  mutate(days = as.numeric(date - min(date))) %>%
  ungroup()
covid19_rel
#> # A tibble: 15,677 x 4
#>   country_region date       confirmed  days
#>   <chr>          <date>         <dbl> <dbl>
#> 1 Afghanistan    2020-03-01         1     0
#> 2 Afghanistan    2020-03-02         1     1
#> 3 Afghanistan    2020-03-03         2     2
#> 4 Afghanistan    2020-03-04         4     3
#> 5 Afghanistan    2020-03-05         4     4
#> 6 Afghanistan    2020-03-06         4     5
#> # … with 15,671 more rows
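
To see what the grouped `mutate()` does, here is the same pipeline on a tiny made-up tibble with two hypothetical countries whose records start on different dates. Because `min(date)` is evaluated within each group, every country gets its own day 0.

```r
library(dplyr)

# Invented data: country B's first record is nine days after A's
toy <- tibble(
  country_region = c("A", "A", "A", "B", "B"),
  date = as.Date(c(
    "2020-03-01", "2020-03-02", "2020-03-03",
    "2020-03-10", "2020-03-11"
  )),
  confirmed = c(1, 2, 4, 1, 3)
)

toy_rel <- toy %>%
  group_by(country_region) %>%
  mutate(days = as.numeric(date - min(date))) %>%
  ungroup()
toy_rel
```

Without `group_by()`, `min(date)` would be the earliest date in the whole table and country B would start at day 9 instead of day 0.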
32 / 43

log(0) -> -Inf
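
A minimal check of how zero counts behave under a log transform, with invented values. The logarithm of zero is negative infinity, so rows with zero confirmed cases cannot be drawn on a log axis. One option, shown here as an assumption rather than what the slides do, is to keep only positive counts before plotting.

```r
confirmed <- c(0, 1, 10, 100) # invented counts

# The zero becomes -Inf on the log scale
logged <- log10(confirmed)
logged

# Keep only positive counts so every value is finite after the transform
keep <- confirmed[confirmed > 0]
log10(keep)
```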

COVID-19

- scale-y

- scale-x

Relative days

covid19_rel %>%
  ggplot(aes(
    x = days,
    y = confirmed,
    colour = country_region)) +
  geom_line() +
  scale_y_log10() +
  guides(colour = "none")
33 / 43

COVID-19

- scale-y

- scale-x

- highlight

Highlight New Zealand

# New Zealand's rows, drawn on top of all the other countries
covid19_nz <- covid19_rel %>%
  filter(country_region == "New Zealand")
p_nz <- covid19_rel %>%
  ggplot(aes(x = days, y = confirmed,
    group = country_region)) +
  geom_line(colour = "grey", alpha = 0.5) +
  # ggplot2 >= 3.4.0 prefers linewidth = 1 over size = 1 for lines
  geom_line(colour = "#238b45", size = 1, data = covid19_nz) +
  scale_y_log10() +
  guides(colour = "none")
p_nz
34 / 43

COVID-19

- scale-y

- scale-x

- highlight

- annotate

Label New Zealand

p_nz <- p_nz +
  geom_label(aes(
    x = max(days), y = max(confirmed),
    label = country_region), data = covid19_nz,
    colour = "#238b45", nudge_x = 3, nudge_y = .5)
p_nz
35 / 43

COVID-19

- scale-y

- scale-x

- highlight

- annotate

- limits

Expand limits

p_nz <- p_nz +
  scale_y_log10(labels = scales::label_comma()) +
  xlim(c(0, 100))
p_nz
36 / 43

COVID-19

- scale-y

- scale-x

- highlight

- annotate

- limits

- labels

Every figure needs a title

p_nz <- p_nz +
  labs(
    x = "Days since March 1",
    y = "Confirmed cases (log10 scale)",
    title = "Worldwide coronavirus confirmed cases",
    subtitle = "highlighting New Zealand",
    caption = "Data source: Johns Hopkins University, CSSE"
  )
p_nz
37 / 43

COVID-19

- scale-y

- scale-x

- highlight

- annotate

- limits

- labels

- theme

Apply themes

# remotes::install_github("Financial-Times/ftplottools")
p_nz +
  ftplottools::ft_theme() +
  theme(
    plot.title.position = "plot",
    plot.background = element_rect(fill = "#FFF1E0"))
38 / 43

Interactive graphics

39 / 43

Easily turn ggplot2 into plotly

library(plotly)
ggplotly(p_nz)

40 / 43

Generative art

41 / 43
