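#Load the packages used below
library(httr)
library(jsonlite)
library(dplyr)
library(ggplot2)
library(gganimate)
#Get list of cities (names are the city names, values are used as URL slugs)
cities = GET("https://developers.teleport.org/assets/urban_areas.json")
citiesdata = fromJSON(rawToChar(cities$content))
citiesdata2 = tolower(unlist(citiesdata))
nameslist = names(citiesdata)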
#Add scores to lists
City = c()
CostOfLiving = c()
Education = c()
Housing = c()
for(i in seq_along(citiesdata2)){
  #Request the scores for one city by its slug
  scores = GET(paste0("https://api.teleport.org/api/urban_areas/slug:", citiesdata2[i], "/scores/"))
  data = fromJSON(rawToChar(scores$content))
  City = c(City, nameslist[i])
  #Scores are picked by position: 1 = Housing, 2 = Cost of Living, 10 = Education
  CostOfLiving = c(CostOfLiving, data$categories$score_out_of_10[2])
  Education = c(Education, data$categories$score_out_of_10[10])
  Housing = c(Housing, data$categories$score_out_of_10[1])
}
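The loop above selects scores by position, which silently returns the wrong value if the order of the categories ever changes. A minimal sketch of a lookup by name is shown here. It assumes that `data$categories` also carries a `name` column next to `score_out_of_10`; `score_for` and `example_categories` are hypothetical names introduced only for illustration, and the example values are the Aarhus scores from the `head(citiesdf)` output.

```r
# Look up a category score by name; returns NA when the name is absent.
# Assumption: `categories` has `name` and `score_out_of_10` columns.
score_for <- function(categories, category_name) {
  categories$score_out_of_10[match(category_name, categories$name)]
}

# Hypothetical stand-in for data$categories (Aarhus scores from head(citiesdf)).
example_categories <- data.frame(
  name = c("Housing", "Cost of Living", "Education"),
  score_out_of_10 = c(6.1315, 4.015, 5.3665)
)

education_score <- score_for(example_categories, "Education")
missing_score <- score_for(example_categories, "Safety")
print(education_score)
```

Inside the loop, a call such as `score_for(data$categories, "Housing")` could then replace `data$categories$score_out_of_10[1]`, provided the assumption about the `name` column holds.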
#Add lists to dataframe
citiesdf = data.frame(City, CostOfLiving, Education, Housing) %>%
  rename(`Cost of Living` = CostOfLiving)
#Add group to use later in graphing
citiesdf = citiesdf %>%
  mutate(Group = ifelse(`Cost of Living` < 5.0, "One", "Two"))
head(citiesdf)
##          City Cost of Living Education Housing Group
## 1      Aarhus          4.015    5.3665  6.1315   One
## 2    Adelaide          4.692    5.1420  6.3095   One
## 3 Albuquerque          6.059    4.1520  7.2620   Two
## 4      Almaty          9.333    2.2830  9.2820   Two
## 5   Amsterdam          3.824    6.1800  3.0530   One
## 6   Anchorage          3.141    3.6245  5.4335   One
#The column was renamed above, so refer to it as `Cost of Living`
graph = ggplot(citiesdf, aes(x = Housing, y = `Cost of Living`, col = Group)) +
  geom_point() +
  theme_linedraw() +
  ylab("Cost of Living Rating") +
  xlab("Housing Rating") +
  ggtitle("Correlation Between Housing and Cost of Living Based on Group", subtitle = "Group 1: Cost of Living < 5, Group 2: Cost of Living >= 5") +
  transition_states(Group) +
  shadow_mark()
animate(graph)
This graph shows that the relationship between housing and cost of living is much more variable for cities with a cost of living rating below 5.0. In Group 1 (cost of living < 5), the points show little correlation and appear randomly spread out, with the exception of the points that sit almost exactly at 5.0. Group 2 shows a strong positive correlation with a few outliers. This suggests that a city with a higher cost of living rating is very likely to have a higher housing rating as well. For a city with a lower cost of living rating, however, there is much more variability in where its housing rating will lie.
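The visual impression above can be checked numerically by computing the Pearson correlation separately for each group. The following is a minimal sketch in base R; `group_correlations` and `sample_df` are hypothetical names, and `sample_df` holds only the six rows printed by `head(citiesdf)` so that the snippet runs on its own. With so few rows (Group Two has only two points) the resulting numbers are not meaningful; the function is meant to be applied to the full `citiesdf`.

```r
# Pearson correlation between two columns, computed separately for each group.
group_correlations <- function(df, x, y, group = "Group") {
  sapply(split(df, df[[group]]), function(g) cor(g[[x]], g[[y]]))
}

# The six rows printed by head(citiesdf); used only to make the sketch runnable.
sample_df <- data.frame(
  City = c("Aarhus", "Adelaide", "Albuquerque", "Almaty", "Amsterdam", "Anchorage"),
  `Cost of Living` = c(4.015, 4.692, 6.059, 9.333, 3.824, 3.141),
  Education = c(5.3665, 5.1420, 4.1520, 2.2830, 6.1800, 3.6245),
  Housing = c(6.1315, 6.3095, 7.2620, 9.2820, 3.0530, 5.4335),
  Group = c("One", "One", "Two", "Two", "One", "One"),
  check.names = FALSE  # keep the space in the `Cost of Living` column name
)

result <- group_correlations(sample_df, "Housing", "Cost of Living")
print(result)
```

On the full data this would be called as `group_correlations(citiesdf, "Housing", "Cost of Living")`. A coefficient near 0 for group One and near 1 for group Two would support the reading of the graph given above.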