This lab exercise is due 23:59 Monday 17 May (NZST).

  • You should submit an R Markdown file (i.e. file extension .Rmd).
  • You should submit your Rmd file on Canvas.
  • Late assignments are NOT accepted unless prior arrangements have been made for medical/compassionate reasons.

In this lab exercise, you are going to practise reproducible reporting with R Markdown for communication purposes. The data set is akl-aqi19.csv, which contains the daily maximum AQI (air quality index) values recorded at Queen Street in 2019.

Suppose that you have created an Rproj for this course. Download akl-aqi19.csv into the data/ folder under your Rproj folder.

  • NO marks will be given to a submission that cannot be reproduced on the hosted runner.
  • The rendered report should be an HTML document. Marking is based on the rendered document, not the Rmd file.
  • Use headings appropriately to split sections.
  • Show all relevant source code, output, and narrative in the appropriate sections; otherwise, marks will be deducted.
  • Set the figure’s size appropriately for clear presentation.
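For the figure-size requirement, one option is to set default chunk options through knitr. The sketch below is only an illustration: the width and height values are placeholders, not values prescribed by this lab, and should be adjusted until the rendered figure is clear.

```r
library(knitr)

# Default size (in inches) for every figure produced by later chunks.
# The numbers are placeholders chosen for illustration only.
opts_chunk$set(fig.width = 8, fig.height = 5, fig.align = "center")

# Check what is currently set.
opts_chunk$get("fig.width")
opts_chunk$get("fig.height")
```

A single chunk can override these defaults by giving fig.width and fig.height in its own chunk header.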

Marking scheme for each question

  1. Clean and readable code: 0.1 pts
  2. Correct output: 0.3 pts
  3. Sound explanations: 0.1 pts

R chunks for global settings

Include the following chunks straight after the YAML header:

```{r setup, include = FALSE}
library(knitr)
opts_knit$set(root.dir = here::here())
opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE, fig.retina = 3,
  comment = "#>")
```

```{r}
library(tidyverse)
library(lubridate)
daily_aqi <- read_csv("data/akl-aqi19.csv")
aqi_cat <- fct_inorder(c("Good", "Moderate", "Unhealthy for Sensitive",
  "Unhealthy", "Very Unhealthy", "Hazardous"))
aqi_pal <- setNames(
  c("#00E400", "#FFFF00", "#FF7E00", "#FF0000", "#8F3F97", "#7E0023"),
  aqi_cat)
```
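To see why the setup code uses fct_inorder() and a named palette, the following minimal sketch works with a shortened, three-level version of the category vector. The names lv, lv_f and pal are illustrative only and are not part of the lab code.

```r
library(forcats)

# Shortened category vector, for illustration only.
lv <- c("Good", "Moderate", "Hazardous")

# factor() sorts the levels alphabetically, which breaks the severity order.
levels(factor(lv))

# fct_inorder() keeps the levels in order of first appearance.
lv_f <- fct_inorder(lv)
levels(lv_f)

# A named vector maps each level to its colour, so a lookup by name works
# regardless of the order in which the levels are later used.
pal <- setNames(c("#00E400", "#FFFF00", "#7E0023"), levels(lv_f))
pal[["Moderate"]]
```

Keeping the levels in severity order is what makes table columns and legend keys appear from "Good" to "Hazardous" rather than alphabetically.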

Data preparation

Add 3 new columns to daily_aqi:

  1. aqi_cat: divide max_aqi into 6 categories:
    • 0 to 50: Good
    • 51 to 100: Moderate
    • 101 to 150: Unhealthy for Sensitive
    • 151 to 200: Unhealthy
    • 201 to 300: Very Unhealthy
    • 301 and higher: Hazardous
  2. month: extract month of the year
  3. mday: extract day of the month

Print the newly created tibble.

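# Boundaries of the six AQI bands; cut() uses right-closed intervals.
brks <- c(0, 50, 100, 150, 200, 300, Inf)
daily_aqi_cat <- daily_aqi %>% 
  mutate(
    # include.lowest = TRUE keeps an AQI of exactly 0 in "Good" instead of NA
    aqi_cat = cut(max_aqi, breaks = brks, labels = aqi_cat,
      include.lowest = TRUE),
    month = month(date, label = TRUE),
    mday = mday(date))
daily_aqi_cat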
#> # A tibble: 365 x 6
#>    date       location     max_aqi aqi_cat  month  mday
#>    <date>     <chr>          <dbl> <fct>    <ord> <int>
#>  1 2019-01-01 queen_street      36 Good     Jan       1
#>  2 2019-01-02 queen_street      30 Good     Jan       2
#>  3 2019-01-03 queen_street      35 Good     Jan       3
#>  4 2019-01-04 queen_street      41 Good     Jan       4
#>  5 2019-01-05 queen_street      54 Moderate Jan       5
#>  6 2019-01-06 queen_street      48 Good     Jan       6
#>  7 2019-01-07 queen_street      31 Good     Jan       7
#>  8 2019-01-08 queen_street      31 Good     Jan       8
#>  9 2019-01-09 queen_street      34 Good     Jan       9
#> 10 2019-01-10 queen_street      40 Good     Jan      10
#> # … with 355 more rows
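The binning can be checked on boundary values before applying it to the real data. The sketch below uses made-up AQI values that sit on the band edges (toy_aqi, cat_labels and toy_date are illustrative names, not part of the data set). By default cut() builds right-closed intervals such as (50, 100], so the lowest break is excluded unless include.lowest = TRUE is set.

```r
library(lubridate)

# Made-up values on the band edges, for illustration only.
toy_aqi <- c(0, 50, 51, 100, 101, 300, 301)
brks <- c(0, 50, 100, 150, 200, 300, Inf)
cat_labels <- c("Good", "Moderate", "Unhealthy for Sensitive",
  "Unhealthy", "Very Unhealthy", "Hazardous")

# Without include.lowest = TRUE, the value 0 falls outside (0, 50] and is NA.
is.na(cut(toy_aqi, breaks = brks, labels = cat_labels))[1]

# With include.lowest = TRUE, the first band becomes [0, 50].
toy_cat <- cut(toy_aqi, breaks = brks, labels = cat_labels,
  include.lowest = TRUE)
as.character(toy_cat)

# month() and mday() extract the date components used for the new columns.
toy_date <- as.Date("2019-10-22")
month(toy_date)
mday(toy_date)
```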

Question 1 [0.5 pts]

  • Present an HTML table that counts the number of days by months and AQI categories.
  • Find out what happened in October when the AQI reached the “Very Unhealthy” level, and provide a brief explanation in bold.
HINTS
  1. Use group_by(.drop = FALSE) to keep the zero counts.
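The effect of .drop = FALSE is easiest to see on a toy factor with an unused level. This is a minimal sketch on made-up data (toy, dropped and kept are illustrative names): the level "c" never occurs, so the default grouping omits it, while .drop = FALSE keeps it with a count of zero.

```r
library(dplyr)

# Toy data: level "c" is declared but never observed.
toy <- tibble(
  grp = factor(c("a", "a", "b"), levels = c("a", "b", "c"))
)

# Default behaviour: empty groups are dropped.
dropped <- toy %>%
  group_by(grp) %>%
  summarise(n = n())

# .drop = FALSE keeps a row for every factor level, including empty ones.
kept <- toy %>%
  group_by(grp, .drop = FALSE) %>%
  summarise(n = n())

dropped
kept
```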

daily_aqi_cat %>% 
  group_by(month, aqi_cat, .drop = FALSE) %>% 
  summarise(days = n()) %>% 
  ungroup() %>% 
  pivot_wider(names_from = aqi_cat, values_from = days) %>% 
  rename(Month = month) %>% 
  knitr::kable()
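| Month | Good | Moderate | Unhealthy for Sensitive | Unhealthy | Very Unhealthy | Hazardous |
|:------|-----:|---------:|------------------------:|----------:|---------------:|----------:|
| Jan   |   26 |        5 |                       0 |         0 |              0 |         0 |
| Feb   |   25 |        3 |                       0 |         0 |              0 |         0 |
| Mar   |   22 |        9 |                       0 |         0 |              0 |         0 |
| Apr   |   24 |        6 |                       0 |         0 |              0 |         0 |
| May   |   20 |       11 |                       0 |         0 |              0 |         0 |
| Jun   |   13 |       17 |                       0 |         0 |              0 |         0 |
| Jul   |   13 |       18 |                       0 |         0 |              0 |         0 |
| Aug   |   11 |       20 |                       0 |         0 |              0 |         0 |
| Sep   |   15 |       15 |                       0 |         0 |              0 |         0 |
| Oct   |   14 |       15 |                       0 |         0 |              2 |         0 |
| Nov   |   11 |       19 |                       0 |         0 |              0 |         0 |
| Dec   |   20 |       11 |                       0 |         0 |              0 |         0 |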

SkyCity convention centre fire

Question 2 [0.5 pts]

  • Present a heatmap that gives an overview of AQI in Auckland over the year.
  • Explain what primarily influenced the air quality in the second half of the year in bold.

NOTE:

  1. Each tile is a square, with a width and height of 0.95.
  2. The text size for AQI values is 3.
  3. The legend is placed at the top and displays all levels, including unused ones, in one row.
  4. The colour scale matches the AQI standard.
  5. The default expansion around the plotted data is disabled.
HINTS
  1. Use expand = expansion() in scale_*() to disable the expansion.
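As a rough illustration of this hint, the sketch below builds a small tile plot on made-up data (toy and p are illustrative names, and the values carry no meaning). Calling expansion() with no arguments returns zero multiplicative and additive padding, so passing it to a scale removes the default gap between the tiles and the panel border.

```r
library(ggplot2)

# Made-up 3 x 2 grid, for illustration only.
toy <- data.frame(
  x = rep(1:3, times = 2),
  y = rep(c("a", "b"), each = 3),
  v = 1:6
)

# expansion() defaults to mult = 0 and add = 0, i.e. no padding.
expansion()

p <- ggplot(toy, aes(x = x, y = y, fill = v)) +
  geom_tile(width = 0.95, height = 0.95) +
  scale_x_continuous(expand = expansion()) +  # continuous axis
  scale_y_discrete(expand = expansion()) +    # discrete axis
  coord_fixed()                               # equal units, so tiles are square

p
```

The same expand argument is accepted by both continuous and discrete position scales, which is why the hint refers to scale_*() in general.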

daily_aqi_cat %>% 
  ggplot(aes(x = mday, y = fct_rev(month), fill = aqi_cat)) +
  geom_tile(width = 0.95, height = 0.95) +
  geom_text(aes(label = max_aqi), size = 3) +
  labs(x = "Day of the Month", y = "Month", fill = "AQI") +
  # expansion() with no arguments removes the padding on both axes
  scale_x_continuous(breaks = c(10, 20, 30), expand = expansion()) +
  scale_y_discrete(expand = expansion()) +
  # drop = FALSE keeps unused AQI levels in the scale
  scale_fill_manual(values = aqi_pal, drop = FALSE) +
  coord_fixed() +  # equal units on both axes, so tiles are square
  guides(fill = guide_legend(nrow = 1)) +
  theme_minimal() +
  theme(legend.position = "top")

Australian bushfires