This lab exercise is due 23:59 Monday 24 May (NZST).

  • You should submit an R file (i.e. file extension .R) containing R code that assigns the appropriate values to the appropriate symbols.
  • Your R file will be run from top to bottom by an automatic grading system, which checks the values assigned to the required symbols. Any result that is not identical to the expected value receives no marks.
  • Intermediate steps to achieve the final results will NOT be checked.
  • Each question is worth 0.2 points.
  • You should submit your R file on Canvas.
  • Late assignments are NOT accepted unless prior arrangements have been made for medical or compassionate reasons.

In this lab exercise, you will scrape the top 50 user-rated horror films from IMDb. Use the following code snippet for this lab session, and include it at the top of your R file:

library(rvest)
library(tidyverse)
link <- "https://www.imdb.com/search/title/?title_type=feature&num_votes=25000,&genres=horror&sort=user_rating,desc&view=simple&sort=user_rating"
horror <- read_html(link)
horror
#> {html_document}
#> <html xmlns:og="http://ogp.me/ns#" xmlns:fb="http://www.facebook.com/2008/fbml">
#> [1] <head>\n<meta http-equiv="Content-Type" content="text/html; cha ...
#> [2] <body id="styleguide-v2" class="fixed">\n            <img heigh ...

Question 1

Scrape the poster image URLs of the top 50 horror films.

You should end up with a character vector of length 50, called film_poster.

head(film_poster)
#> [1] "https://m.media-amazon.com/images/M/MV5BNTQwNDM1YzItNDAxZC00NWY2LTk0M2UtNDIwNWI5OGUyNWUxXkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UX34_CR0,0,34,50_AL_.jpg"
#> [2] "https://m.media-amazon.com/images/M/MV5BZWFlYmY2MGEtZjVkYS00YzU4LTg0YjQtYzY1ZGE3NTA5NGQxXkEyXkFqcGdeQXVyMTQxNzMzNDI@._V1_UX34_CR0,0,34,50_AL_.jpg"
#> [3] "https://m.media-amazon.com/images/M/MV5BMmQ2MmU3NzktZjAxOC00ZDZhLTk4YzEtMDMyMzcxY2IwMDAyXkEyXkFqcGdeQXVyNzkwMjQ5NzM@._V1_UX34_CR0,0,34,50_AL_.jpg"
#> [4] "https://m.media-amazon.com/images/M/MV5BYmQxNmU4ZjgtYzE5Mi00ZDlhLTlhOTctMzJkNjk2ZGUyZGEwXkEyXkFqcGdeQXVyMzgxMDA0Nzk@._V1_UY50_CR0,0,34,50_AL_.jpg"
#> [5] "https://m.media-amazon.com/images/M/MV5BNDkxMzk2ODU4N15BMl5BanBnXkFtZTgwNTM4NjIzMjE@._V1_UY50_CR0,0,34,50_AL_.jpg"
#> [6] "https://m.media-amazon.com/images/M/MV5BNGViZWZmM2EtNGYzZi00ZDAyLTk3ODMtNzIyZTBjN2Y1NmM1XkEyXkFqcGdeQXVyNTAyODkwOQ@@._V1_UX34_CR0,0,34,50_AL_.jpg"
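Poster URLs live in an attribute of each image tag rather than in its text, so rvest’s html_attr() is the relevant tool. The sketch below runs on a tiny hand-written page instead of IMDb; the class name "poster" and the example.com URLs are hypothetical and do not reflect IMDb’s actual markup. It uses html_elements(), the rvest 1.0.0 name for the older html_nodes().

```r
library(rvest)

# Toy page parsed from a literal string; "poster" is a hypothetical class name
page <- read_html('
  <div><img class="poster" src="https://example.com/a.jpg"></div>
  <div><img class="poster" src="https://example.com/b.jpg"></div>
')

# html_attr() returns the value of the named attribute for every matched node
poster_urls <- page %>%
  html_elements("img.poster") %>%
  html_attr("src")

poster_urls
#> [1] "https://example.com/a.jpg" "https://example.com/b.jpg"
```

On real listing pages, thumbnails may be lazy-loaded, in which case the URL you want can sit in an attribute other than src. Inspect the page source before choosing the attribute name.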

Question 2

Scrape the titles of the top 50 horror films.

You should end up with a character vector of length 50, called movie.

head(movie)
#> [1] "Psycho"            "The Shining"       "Alien"            
#> [4] "Tumbbad"           "The Blue Elephant" "The Thing"
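Titles are the text content of their nodes, which html_text() extracts. Raw text often carries surrounding whitespace and newlines from the page layout, so trimming is usually needed. The toy HTML and the "title" class below are hypothetical, chosen only to show the whitespace issue.

```r
library(rvest)
library(stringr)

# Toy page; the first link deliberately contains newlines and indentation
page <- read_html('<div><a class="title">
    Film A
  </a><a class="title">Film B</a></div>')

# html_text() keeps the whitespace exactly as it appears in the source
raw_titles <- page %>%
  html_elements("a.title") %>%
  html_text()

# str_trim() removes leading and trailing whitespace from each string
titles <- str_trim(raw_titles)
titles
#> [1] "Film A" "Film B"
```

rvest 1.0.0 and later also provide html_text2(), which tidies whitespace roughly the way a browser renders it.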

Question 3

Scrape the release years of the top 50 horror films.

You should end up with a double vector of length 50, called year.

HINTS
  1. You may find one of {readr}’s parse_*() functions useful for extracting numbers.

year
#>  [1] 1960 1980 1979 2018 2014 1982 1962 1955 1920 1973 1968 2008 2004
#> [14] 1978 1968 1933 1932 1922 2010 1961 1935 1931 2017 2014 2000 1987
#> [27] 1978 1965 1963 1960 1960 1956 1933 2018 2016 2011 2009 2004 2002
#> [40] 2001 1986 1975 1954 2019 2018 2016 2010 2013 2007 1994
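As a sketch of the hint, readr::parse_number() drops any non-numeric characters before and after the first number in each string and returns a double vector. The strings below are hypothetical; the parentheses and the "(I)" disambiguation prefix are assumptions about how year text can look on a listing page, not scraped data.

```r
library(readr)

# Hypothetical year strings in a parenthesised style
raw_years <- c("(1960)", "(I) (2018)", "(2014)")

# parse_number() skips "(I) (" and ")" and keeps the first number it finds
years <- parse_number(raw_years)
years
#> [1] 1960 2018 2014
```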

Question 4

Scrape the user ratings of the top 50 horror films.

You should end up with a double vector of length 50, called rating.

rating
#>  [1] 8.5 8.4 8.4 8.3 8.2 8.1 8.1 8.1 8.1 8.0 8.0 7.9 7.9 7.9 7.9 7.9
#> [17] 7.9 7.9 7.8 7.8 7.8 7.8 7.7 7.7 7.7 7.7 7.7 7.7 7.7 7.7 7.7 7.7
#> [33] 7.7 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.6 7.5 7.5 7.5 7.5 7.5
#> [49] 7.5 7.5
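Scraped ratings arrive as character strings and must be converted to doubles. A minimal sketch, assuming the extracted text is a bare number such as "8.5" (the strings below are hypothetical):

```r
library(readr)

# Hypothetical rating strings as they might come out of html_text()
raw_ratings <- c("8.5", "8.4", "8")

# parse_double() converts strings that are entirely numeric to doubles
ratings <- parse_double(raw_ratings)
ratings
#> [1] 8.5 8.4 8.0
```

parse_double() expects the whole string to be a number, whereas parse_number() tolerates surrounding non-numeric characters; pick whichever matches the text you actually extract.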

Question 5

Create a tibble containing the scraped information, with the films ordered by rank. The columns should be named Rank, Poster, Movie, Year and Rating, in that order.

You should end up with a tibble, called top50_horror.

NOTE: the Rank column must be of type integer.

top50_horror
#> # A tibble: 50 x 5
#>     Rank Poster                          Movie             Year Rating
#>    <int> <chr>                           <chr>            <dbl>  <dbl>
#>  1     1 https://m.media-amazon.com/ima… Psycho            1960    8.5
#>  2     2 https://m.media-amazon.com/ima… The Shining       1980    8.4
#>  3     3 https://m.media-amazon.com/ima… Alien             1979    8.4
#>  4     4 https://m.media-amazon.com/ima… Tumbbad           2018    8.3
#>  5     5 https://m.media-amazon.com/ima… The Blue Elepha…  2014    8.2
#>  6     6 https://m.media-amazon.com/ima… The Thing         1982    8.1
#>  7     7 https://m.media-amazon.com/ima… What Ever Happe…  1962    8.1
#>  8     8 https://m.media-amazon.com/ima… Les diaboliques   1955    8.1
#>  9     9 https://m.media-amazon.com/ima… Das Cabinet des…  1920    8.1
#> 10    10 https://m.media-amazon.com/ima… The Exorcist      1973    8  
#> # … with 40 more rows
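Because grading requires identical results, the type of the Rank column matters. The toy example below uses hypothetical vectors, not the scraped data. It shows that seq_along() (like 1:n) yields an integer vector, while seq() with an explicit by argument yields a double.

```r
library(tibble)

# Hypothetical toy vectors standing in for scraped results
toy_movie <- c("Film A", "Film B", "Film C")
toy_year  <- c(1960, 1980, 1979)

toy_tbl <- tibble(
  Rank  = seq_along(toy_movie),  # integer vector 1, 2, 3
  Movie = toy_movie,
  Year  = toy_year
)

class(toy_tbl$Rank)
#> [1] "integer"

# By contrast, seq() with a double `by` produces a double vector
class(seq(1, 3, by = 1))
#> [1] "numeric"
```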

Question4fun (NO marks)

Turn top50_horror into a searchable paged HTML table as follows.

library(reactable)
library(htmltools)
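The snippet above stops after loading the packages. One possible continuation is sketched below on a hypothetical two-row tibble; it is not necessarily the intended table. reactable()’s searchable and defaultPageSize arguments add a search box and pagination, and a colDef() cell function renders each poster URL as an image.

```r
library(tibble)
library(reactable)
library(htmltools)

# Hypothetical toy data; substitute top50_horror once it has been built
toy <- tibble(
  Rank   = 1:2,
  Poster = c("https://example.com/a.jpg", "https://example.com/b.jpg"),
  Movie  = c("Film A", "Film B")
)

tbl <- reactable(
  toy,
  searchable = TRUE,     # adds a global search box
  defaultPageSize = 10,  # rows shown per page
  columns = list(
    # Render each poster URL as an <img> tag instead of plain text
    Poster = colDef(cell = function(value) tags$img(src = value, height = 50))
  )
)

tbl  # displays the widget in the RStudio Viewer or a browser
```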