The Research Question: How variable are quality of life and cost of living?

Loading Libraries and Getting Data

library(httr)
library(jsonlite)
library(purrr)
library(tidyverse)
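# Request the list of all urban areas and stop early if the API call fails
res = GET("https://api.teleport.org/api/urban_areas/")
stop_for_status(res)

data = fromJSON(rawToChar(res$content))
# One row per urban area: its API link (href) and its name
urban_areas = data.frame(data$`_links`$`ua:item`, stringsAsFactors = FALSE)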
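fromJSON() simplifies the nested "_links" / "ua:item" array of records into a data frame with one row per urban area, which is why wrapping it in data.frame() yields the href and name columns used below. The offline sketch here illustrates that step on a small, hand-written JSON string. It is not a real API response: the URLs are placeholders, and it only assumes the two fields the code above relies on (href and name).

```r
library(jsonlite)

# Hypothetical, hand-written sample mimicking the shape the code above relies on:
# a "_links" object holding a "ua:item" array of {href, name} records
sample_json <- '{
  "_links": {
    "ua:item": [
      {"href": "https://example.org/urban_areas/slug:city-a/", "name": "City A"},
      {"href": "https://example.org/urban_areas/slug:city-b/", "name": "City B"}
    ]
  },
  "count": 2
}'

parsed <- fromJSON(sample_json)

# The array of records is simplified into a data frame: one row per urban area
sample_areas <- data.frame(parsed$`_links`$`ua:item`, stringsAsFactors = FALSE)
print(sample_areas)
```

For the live response, httr's content(res, as = "text", encoding = "UTF-8") is an equivalent way to get the JSON text that also states the character encoding explicitly.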

Cleaning Data

# Fetch one urban area's category scores and summarise them in a one-row data frame.
# `url` is the area's API link (from urban_areas$href); `name` is its display name.
get_all_scores = function(url, name){
  initial = fromJSON(rawToChar(GET(as.character(url))$content))
  city_api = fromJSON(rawToChar(GET(as.character(initial$`_links`$`ua:scores`))$content))
  city_data = data.frame(city_api$categories)
  scores = city_data$score_out_of_10

  # Row positions in scores:
  # 1 = Housing, 2 = Cost of Living, 8 = Safety, 10 = Education,
  # 11 = Environmental Quality, 14 = Internet Access, 15 = Leisure & Culture
  # "Quality of Life" is the mean of the five positivity categories; the values must
  # be combined with c(), because mean(a, b, ...) only averages its first argument.
  data.frame("Name" = as.character(name),
             "Quality of Life" = mean(scores[c(1, 8, 10, 11, 15)]),
             "Cost of Living" = scores[2],
             "Internet Access" = scores[14],
             stringsAsFactors = FALSE)
}
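Because mean() averages only its first argument, a call of the form mean(a, b, c, d, e) silently returns a: the second value is matched to trim, the third to na.rm, and the rest are dropped into .... The sketch below demonstrates this on made-up scores and adds a hypothetical helper, quality_of_life(), that averages the positivity categories by name instead of by row position. It assumes each category row carries a name column next to score_out_of_10; if the API names that column differently, the lookup would need adjusting.

```r
# Made-up scores for one city; the real table comes from the "ua:scores" endpoint.
# Assumption: each category row has a `name` column next to `score_out_of_10`.
city_data_example <- data.frame(
  name = c("Housing", "Cost of Living", "Safety", "Education",
           "Environmental Quality", "Internet Access", "Leisure & Culture"),
  score_out_of_10 = c(4, 3, 9, 5, 6, 7, 8),
  stringsAsFactors = FALSE
)

positive_categories <- c("Housing", "Safety", "Education",
                         "Environmental Quality", "Leisure & Culture")

# Hypothetical helper: mean of the chosen categories, looked up by name so the
# result does not depend on the row order returned by the API
quality_of_life <- function(city_data, categories = positive_categories) {
  idx <- match(categories, city_data$name)
  if (anyNA(idx)) {
    stop("missing categories: ", paste(categories[is.na(idx)], collapse = ", "))
  }
  mean(city_data$score_out_of_10[idx])
}

s <- setNames(city_data_example$score_out_of_10, city_data_example$name)

# Pitfall: the 2nd argument becomes trim, the 3rd na.rm, the rest go to `...`,
# so the result is just the Housing score (4)
wrong <- mean(s[["Housing"]], s[["Safety"]], s[["Education"]],
              s[["Environmental Quality"]], s[["Leisure & Culture"]])

# Correct: all five scores in one vector, (4 + 9 + 5 + 6 + 8) / 5 = 6.4
right <- quality_of_life(city_data_example)
```

Looking categories up by name also makes a missing or renamed category fail loudly instead of silently averaging the wrong rows.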

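# Call get_all_scores() once per urban area (pmap() matches the list names to the
# function's url and name arguments) and stack the one-row results
all_scores = list(url = urban_areas$href, name = urban_areas$name) %>%
  pmap(get_all_scores) %>%
  bind_rows()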
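With a few hundred urban areas and two requests each, a single failed request stops the whole pmap() run. One optional approach, sketched here as a suggestion rather than part of the original analysis, is to wrap the function in purrr's possibly() so a failing city returns NULL, which bind_rows() then skips. The sketch runs offline: fake_scores() is a hypothetical stand-in for get_all_scores() that fails on purpose for one input, and the URLs are placeholders.

```r
library(purrr)
library(dplyr)

# Hypothetical stand-in for get_all_scores(): fails for one input to mimic
# a request that errors or returns an unexpected payload
fake_scores <- function(url, name) {
  if (name == "City B") stop("request failed")
  data.frame(Name = name, Score = nchar(url), stringsAsFactors = FALSE)
}

# possibly() returns `otherwise` (here NULL) instead of raising the error
safe_scores <- possibly(fake_scores, otherwise = NULL)

inputs <- list(url  = c("https://example.org/a", "https://example.org/bb", "https://example.org/c"),
               name = c("City A", "City B", "City C"))

# pmap() calls the function once per position, matching list names to arguments;
# bind_rows() ignores the NULL left by the failed city
results <- bind_rows(pmap(inputs, safe_scores))
print(results)
```

The trade-off is that failures become silent, so comparing nrow(results) with the number of inputs is a cheap way to notice dropped cities.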

Creating the Graph

all_scores %>%
  ggplot(aes(x = Quality.of.Life, y = Cost.of.Living, color = Internet.Access)) +
  geom_point(size = 2) +
  # Minimal look: no panel border or grid lines, just solid black axis lines
  # (ggplot2 3.4.0 and later name the line-width argument `linewidth` instead of `size`)
  theme(panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(size = 0.5, linetype = "solid", colour = "black")) +
  labs(x = "Positivities in Life Score", y = "Cost of Living Score",
       color = "Internet Access Score",
       title = "Living in Urban Areas")
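A fitted trend line can make the direction of the relationship easier to judge than the point cloud alone. The sketch below adds a least-squares line with geom_smooth(method = "lm"); it uses a small made-up data frame with the same column names as all_scores so that it runs on its own, and the numbers carry no meaning.

```r
library(ggplot2)

# Made-up scores, only to make the example self-contained; the real input is all_scores
toy_scores <- data.frame(
  Quality.of.Life = c(4.1, 5.0, 5.8, 6.3, 7.2),
  Cost.of.Living  = c(3.9, 4.8, 5.5, 6.6, 7.0),
  Internet.Access = c(6.0, 5.5, 5.1, 4.2, 3.8)
)

p <- ggplot(toy_scores, aes(x = Quality.of.Life, y = Cost.of.Living)) +
  # Color is mapped only in the point layer, so the trend line ignores Internet.Access
  geom_point(aes(color = Internet.Access), size = 2) +
  # Least-squares line with its confidence band
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE, color = "black") +
  labs(x = "Positivities in Life Score", y = "Cost of Living Score",
       color = "Internet Access Score", title = "Living in Urban Areas")

print(p)  # draw the plot
```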

The main purpose of this plot is to show the positive correlation between the positivities in life and the cost of living in urban areas. The positivities score (the "Quality of Life" column in the code) is the mean of the Housing, Safety, Education, Environmental Quality, and Leisure & Culture scores. Because I chose which scores count as “Positivities,” the measure may be biased, but I believe most people would agree with the choice. A secondary point is that areas with higher positivity and cost of living scores tend to have lower internet access scores. (Personal opinion: so do not get rich and move to a cooler house, because you will sadly have less internet! :C )
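The correlations described above can also be checked numerically rather than by eye. The sketch below defines a hypothetical helper, score_correlations(), that runs cor.test() (Pearson by default) between the positivities score and each of the other two scores. It is demonstrated on made-up numbers; on the real data the call would be score_correlations(all_scores).

```r
# Hypothetical helper: Pearson correlation (r) and p-value between the positivities
# score and each of the other two scores
score_correlations <- function(scores) {
  targets <- c("Cost.of.Living", "Internet.Access")
  do.call(rbind, lapply(targets, function(col) {
    test <- cor.test(scores$Quality.of.Life, scores[[col]], method = "pearson")
    data.frame(versus = col, r = unname(test$estimate), p_value = test$p.value,
               stringsAsFactors = FALSE)
  }))
}

# Made-up scores with the same column names as all_scores
toy_scores <- data.frame(
  Quality.of.Life = c(4, 5, 6, 7, 8),
  Cost.of.Living  = c(3, 5, 4, 7, 8),
  Internet.Access = c(8, 6, 7, 5, 4)
)

correlations <- score_correlations(toy_scores)
print(correlations)
```

A positive r for Cost.of.Living and a negative r for Internet.Access would support the two claims; the p-values indicate whether each correlation is distinguishable from zero, not whether one score causes the other.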