Challenge 3

Get data into a data frame:

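#Load packages for HTTP requests, JSON parsing, data wrangling, and plotting
library(httr)
library(jsonlite)
library(dplyr)
library(ggplot2)
library(gganimate)

#Get list of cities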
cities = GET("https://developers.teleport.org/assets/urban_areas.json") 
citiesdata = fromJSON(rawToChar(cities$content))
citiesdata2 = tolower(citiesdata)
nameslist = names(citiesdata)

#Add scores to lists
City = c()
CostOfLiving = c()
Education = c()
Housing = c()
#seq_along() avoids iterating over c(1, 0) if the city list is empty
for(i in seq_along(citiesdata2)){
  scores = GET(paste("https://api.teleport.org/api/urban_areas/slug:", citiesdata2[i], "/scores/", sep = ""))
  data = fromJSON(rawToChar(scores$content))
  City = c(City, nameslist[i])
  #Scores are picked by position in data$categories:
  #1 = Housing, 2 = Cost of Living, 10 = Education
  CostOfLiving = c(CostOfLiving, data$categories$score_out_of_10[2])
  Education = c(Education, data$categories$score_out_of_10[10])
  Housing = c(Housing, data$categories$score_out_of_10[1])
}


#Add lists to dataframe
citiesdf = data.frame(City, CostOfLiving, Education, Housing) %>%
  rename(`Cost of Living` = CostOfLiving)

#Add group to use later in graphing
citiesdf = citiesdf %>%
  mutate(Group = ifelse(`Cost of Living` < 5.0, "One", "Two"))

head(citiesdf)

##          City Cost of Living Education Housing Group
## 1      Aarhus          4.015    5.3665  6.1315   One
## 2    Adelaide          4.692    5.1420  6.3095   One
## 3 Albuquerque          6.059    4.1520  7.2620   Two
## 4      Almaty          9.333    2.2830  9.2820   Two
## 5   Amsterdam          3.824    6.1800  3.0530   One
## 6   Anchorage          3.141    3.6245  5.4335   One
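
The loop above selects scores by position (1, 2, and 10), which would silently return the wrong values if the API ever reordered its categories. A minimal sketch of a name-based lookup follows. It assumes each entry in data$categories carries a `name` column alongside `score_out_of_10`; the helper `get_score` and the mock data frame are hypothetical stand-ins, not part of the original analysis.

```r
# Mock of data$categories for one city (scores taken from the Aarhus row above;
# the `name` column is an assumption about the API response)
categories = data.frame(
  name = c("Housing", "Cost of Living", "Education"),
  score_out_of_10 = c(6.1315, 4.015, 5.3665)
)

# Hypothetical helper: look up a score by category name instead of position
get_score = function(categories, category_name){
  score = categories$score_out_of_10[categories$name == category_name]
  # Return NA rather than a zero-length vector when the category is missing
  if(length(score) == 0) NA_real_ else score[1]
}

get_score(categories, "Cost of Living")
get_score(categories, "Safety")
```

Returning NA for a missing category keeps the vectors built in the loop the same length, so data.frame() would still line up city names with their scores.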

Make visualization of the data:

graph = ggplot(citiesdf, aes(x = Housing, y = `Cost of Living`, col = Group)) + 
  geom_point() +
  theme_linedraw() +
  ylab("Cost of Living Rating") +
  xlab("Housing Rating") +
  ggtitle("Correlation Between Housing and Cost of Living Based on Group", subtitle = "Group One: Cost of Living < 5, Group Two: Cost of Living >= 5") +
  transition_states(Group) +
  shadow_mark()
  
animate(graph)

This graph shows that the relationship between housing and cost of living is much more variable for cities with a cost of living rating below 5.0. In Group One (cost of living rating < 5), the points show little correlation and appear randomly scattered, with the exception of the points that sit almost exactly at 5.0. Group Two (cost of living rating >= 5) shows a strong positive correlation with a few outliers. This suggests that a city with a higher cost of living rating is very likely to also have a higher housing rating. For a city with a lower cost of living rating, however, there is much more variability in where its housing rating will lie.
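
The visual impression described above can be checked numerically by computing the Pearson correlation between the two ratings within each group. The sketch below uses only the six rows printed by head(citiesdf), so its numbers are not meaningful on their own (Group Two has just two cities here). The name `sample_df` is a hypothetical stand-in; the same calculation would be run on the full citiesdf, where the column is named `Cost of Living` rather than CostOfLiving.

```r
# Six rows copied from the head(citiesdf) output; a stand-in for the full data
sample_df = data.frame(
  City = c("Aarhus", "Adelaide", "Albuquerque", "Almaty", "Amsterdam", "Anchorage"),
  CostOfLiving = c(4.015, 4.692, 6.059, 9.333, 3.824, 3.141),
  Housing = c(6.1315, 6.3095, 7.2620, 9.2820, 3.0530, 5.4335)
)
sample_df$Group = ifelse(sample_df$CostOfLiving < 5.0, "One", "Two")

# Split by group, then compute the group size and the Pearson correlation
group_cor = sapply(split(sample_df, sample_df$Group), function(g){
  c(n = nrow(g), r = cor(g$Housing, g$CostOfLiving))
})

group_cor
```

Reporting the group size next to each correlation makes it easier to judge how much weight a given coefficient deserves.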