Most, if not all, of the datasets you’ve worked with have been read into R and worked with internally. This often takes the form of reading in TXT, CSV, or TSV files stored locally on your machine. However, many datasets are stored on servers or machines other than your own and it could be tedious, redundant, or inefficient to create copies of them on your own machine. Fortunately, we’re not forced into working exclusively with local data files!
When you open RStudio you should notice that next to the Environment and History tabs is a Connections tab. This tab helps assist you in accessing external data sources and tracking your current connections to them. As helpful as this tab can be, all of your work connecting to external databases or other data sources can be done via your own code in a script or RMarkdown file if you wish.
We’re going to get some practice by connecting to a SQLite database. To keep the exercise as simple as possible, the database is actually still going to live on your own machine! However, we will still be accessing data without reading any files into our environment.
Please go to the following site, read about the Chinook sample database tables, and then download the Chinook sample database:
It should have downloaded a zipped file with the database file inside it. You’ll need to unzip the file and put the database file in a location of your choosing, but remember where you put it!
Now, follow the code on the site below to connect to and access the chinook.db file you just downloaded.
Notes:
RSQLite
package.dbConnect()
function, you should specify the path to your chinook.db file including the filename itself.db
functions in variables in the R
environment or even pipe them into other pipe-friendly functions.How many observations/rows are in the employees
table?
How many different job titles are included in the employees
table?
What is the average of the Total
variable in the invoices
table?
Before you finish this portion of the coursework, take a moment to run ?dbConnect
and notice that with this function you have the ability to connect to all sorts of external data sources that may require more credentials or authentication.
Now that we’ve talked about the JSON format for data files, it’s time we talked about one of the primary vehicles for them: APIs.
This is the tip of the iceberg when it comes to APIs, but it will suffice to enable a lot of cool things for us. The video above used some nice examples like the restaurant and travel services, but there are APIs for accessing all kinds of data out there…like Facebook, Twitter, Google Maps, etc.
For our class we’re not interested in creating APIs, but mostly using them to access data that we otherwise couldn’t get our hands on…at least not as easily.