This R Notebook is the complement to my blog post Analyzing IMDb Data The Intended Way, with R and ggplot2.

This notebook is licensed under the MIT License. If you use the code or data visualization designs contained within this notebook, it would be greatly appreciated if proper attribution is given back to this notebook and/or myself. Thanks! :)

IMDb data retrieved on July 4th, 2018.

Information courtesy of IMDb. Used with permission.

Helper function to read an IMDb file given its filename.

# Read one IMDb TSV dump; missing values are encoded as \N in these files.
read_imdb <- function(data_path) {
  path <- "/Volumes/Extreme 510/Data/imdb/"
  read_tsv(paste0(path, data_path), na = "\\N", quote = "", progress = FALSE)
}
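To see what the `na = "\\N"` argument does, here is a minimal sketch that runs without the IMDb files. The temporary file, its column layout, and the row values are made up for illustration; only the `read_tsv()` arguments mirror the helper, and I'm assuming `read_tsv()` comes from readr.

```r
library(readr)

# Hypothetical two-row file mimicking a ratings TSV (values are invented).
demo_file <- tempfile(fileext = ".tsv")
writeLines(
  c("tconst\taverageRating\tnumVotes",
    "tt_demo_1\t5.8\t1440",
    "tt_demo_2\t\\N\t\\N"),
  demo_file
)

# Same arguments as the helper: the literal string \N becomes NA, and
# quote = "" stops stray quote characters in titles from swallowing rows.
df_demo <- read_tsv(demo_file, na = "\\N", quote = "", progress = FALSE)
df_demo
```

The second row should come back with `NA` in both numeric columns instead of forcing them to be read as text.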

Helper function to pretty print the size of a dataframe for charts/notebook.

ppdf <- function(df) {
  df %>% nrow() %>% comma()
}
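A quick sketch of what this helper returns, assuming `comma()` is the one from the scales package (the library calls are not shown here). `ppdf_demo` and the toy data frame are hypothetical names used only for this example.

```r
library(scales)

# Standalone equivalent of the helper, written without the pipe.
ppdf_demo <- function(df) {
  comma(nrow(df))
}

# Toy data frame with 1,234 rows (not IMDb data).
df_toy <- data.frame(x = seq_len(1234))
ppdf_demo(df_toy)
```

The result is a character string with thousands separators, which is convenient for dropping row counts into text or chart subtitles.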

1 Ratings

df_ratings <- read_imdb("title.ratings.tsv")
df_ratings %>% head()

There are 847,394 ratings in the dataset.

Plot every point. (note: very slow!)

plot <- ggplot(df_ratings, aes(x = numVotes, y = averageRating)) +
  geom_point()

ggsave("imdb-0.png", plot, width=4, height=3)
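Since drawing hundreds of thousands of individual points is slow, one possible workaround is to bin the points into a 2D grid and plot counts per cell. The sketch below uses a synthetic data frame (`df_demo`, with invented values) in place of `df_ratings`, and the choice of `geom_bin2d()`, 50 bins, and a log-scaled x-axis are my assumptions for illustration, not necessarily what the final chart should look like.

```r
library(ggplot2)

# Synthetic stand-in for df_ratings (random values, not IMDb data).
set.seed(42)
df_demo <- data.frame(
  numVotes = round(10^runif(10000, min = 1, max = 6)),
  averageRating = round(runif(10000, min = 1, max = 10), 1)
)

# Count points per grid cell instead of drawing each one.
plot_binned <- ggplot(df_demo, aes(x = numVotes, y = averageRating)) +
  geom_bin2d(bins = 50) +
  scale_x_log10()
```

Because only the occupied grid cells are drawn, the number of rendered rectangles is bounded by the grid size (here at most 50 x 50), not by the number of rows.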