Chapter 4 Create and connect to the adventureworks database in PostgreSQL
This chapter demonstrates how to:
- Create and connect to the PostgreSQL
adventureworks
database in Docker- Keep necessary credentials secret while being available to R when it executes.
- Leverage Rstudio features to get an overview of the database
- Set up the environment for subsequent chapters
4.1 Overview
Docker commands can be run from a terminal (e.g., the Rstudio Terminal pane) or with a system2()
command. The necessary functions to start, stop Docker containers and do other busy work are provided in the sqlpetr
package.
Note: The functions in the package are designed to help you focus on interacting with a dbms from R. You can ignore how they work until you are ready to delve into the details. They are all named to begin with
sp_
. The first time a function is called in the book, we provide a note explaining its use.
Please install the sqlpetr
package if not already installed:
library(devtools)
if (!require(sqlpetr)) {
remotes::install_github(
"smithjd/sqlpetr",
force = TRUE, build = FALSE, quiet = TRUE)
}
Note that when you install this package the first time, it will ask you to update the packages it uses and that may take some time.
These packages are called in this Chapter:
4.2 Verify that Docker is up, running, and clean up if necessary
The
sp_check_that_docker_is_up
function from thesqlpetr
package checks whether Docker is up and running. If it’s not, then you need to install, launch or re-install Docker.
## [1] "Docker is up, running these containers:"
## [2] "CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES"
## [3] "611b69c981a7 postgres:11 \"docker-entrypoint.s…\" 3 days ago Up About a minute 0.0.0.0:5432->5432/tcp adventureworks"
4.3 Clean up if appropriate
Force-remove the adventureworks
container if it was left over (e.g., from a prior runs):
## [1] 0
4.4 Build the adventureworks Docker image
Now we set up a “realistic” database named adventureworks
in Docker.
NOTE: This chapter doesn’t go into the details of creating or restoring the
adventureworks
database. For more detail on what’s going on behind the scenes, you can examine the step-by-step code in:
source('book-src/restore-adventureworks-postgres-on-docker.R')
To save space here in the book, we’ve created a function
in sqlpetr
to build this image, called OUT OF DATE!! . Vignette Building the adventureworks
Docker Image describes the build process.
*Ignore the errors in the following step:
## docker run --detach --name adventureworks --publish 5432:5432 --mount type=bind,source="/Users/jds/Documents/Library/R/r-system/sql-pet",target=/petdir postgres:11
4.5 Run the adventureworks Docker Image
Now we can run the image in a container and connect to the database. To run the
image we use an sqlpetr
function called OUT OF DATE sp_pg_docker_run
For the rest of the book we will assume that you have a Docker container called
adventureworks
that can be stopped and started. In that sense each chapter in the book is independent.
4.6 Connect to PostgreSQL
*CHECK for sqlpetr
update!The
sp_make_simple_pgfunction we called above created a container from the
postgres:11library image downloaded from Docker Hub. As part of the process, it set the password for the PostgreSQL database superuser
postgres` to the value
“postgres”.
For simplicity, we are using a weak password at this point and it’s shown here and in the code in plain text. That is bad practice because user credentials should not be shared in open code like that. A subsequent chapter demonstrates how to store and use credentials to access the DBMS so that they are kept private.
The
sp_get_postgres_connection
function from thesqlpetr
package gets a DBI connection string to a PostgreSQL database, waiting if it is not ready. This function connects to an instance of PostgreSQL and we assign it to a symbol,con
, for subsequent use. Theconnctions_tab = TRUE
parameter opens a connections tab that’s useful for navigating a database.
Note that we are using port 5439 for PostgreSQL inside the container and published to
localhost
. Why? If you have PostgreSQL already running on the host or another container, it probably claimed port 5432, since that’s the default. So we need to use a different port for our PostgreSQL container.
Use the DBI package to connect to the adventureworks
database in PostgreSQL. Remember the settings discussion about [keeping passwords hidden][Pause for some security considerations]
Sys.sleep(sleep_default)
# con <- connection_open( # use in an interactive session
con <- dbConnect( # use in other settings
RPostgres::Postgres(),
# without the previous and next lines, some functions fail with bigint data
# so change int64 to integer
bigint = "integer",
host = "localhost",
port = 5432, # this version still using 5432!!!
user = "postgres",
password = "postgres",
dbname = "adventureworks"
)
4.7 Adventureworks Schemas
Think of the Adventureworks database as a model of the Adventureworks business. The business is organized around different departments (humanresources, sales, and purchasing), business processes (production), and resources (person). Each schema is a container for the all the database objects needed to model the departments, business processes, and resources. As a data analyst, the connections tab has three of the five database objects of interest. These are schemas, tables and views. The other two database objects of interest not shown in the connetions tab are the table primary and foreign keys, PK and FK. Those database objects enforce the referential integrity of the data and the performance of the application. Let the DBA’s worry about them.
The Connections tab has three icons. The node icon represents a schema. The schema helps organize the structure and design of the database. The schema contains the views, the grid with the glasses, and tables, the grids without the glasses, that are of interest to the data analyst. A table is a database object usually represents something useful to a business process. For example, a sales person may enter a new order. The first screen is typically called the sales order header screen which contains information about the customer placing the order. This information is captured in salesorderheader table. The customers ordered items are typically entered via multiple screens. These are captured in the salesorderdetail table.
A view is a database object that maybe a subset of either the columns or rows of a single table. For example, the customer table has information on all the customers, but the customer view, c, shows only a single customer.
Or a view may have data from a primary/driving table and joined to other tables to provide a better understanding/view of the information in the primary table. For example, the primary table typically has a primary key column, PK, and zero or more foreign key columns, FK. The PK and FK are usually an integer which is great for a computer, but not so nice us mere mortals. An extended view pulls information associated with the FK. For example a sales order view a customer foreign key, can show the actual customer name.
4.8 Investigate the database using Rstudio
The Rstudio Connections tab shows that you are connected to Postgres and that the adventureworks
database has a many schemas each of which has multiple tables and views in it. The drop-down icon to the left of a table lists the table’s columns.
Clicking on the icon to the left of a schema
expands the list of tables
and views
in that schema
. Clicking on the View
or Table
icon opens up Rstudio’s View
pane to get a peek at the data:
The number of rows and columns shown in the View pane depends on the size of the window.
4.9 Cleaning up: diconnect from the database and stop Docker
Always have R disconnect from the database when you’re done.
Stop the adventureworks
container:
Show that the container still exists even though it’s not running
## CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
## 37f05f6c5c62 postgres:11 "docker-entrypoint.s…" 26 seconds ago Exited (0) Less than a second ago adventureworks
Next time, you can just use this command to start the container:
sp_docker_start("adventureworks")
And once stopped, the container can be removed with:
sp_check_that_docker_is_up("adventureworks")
4.10 Using the adventureworks
container in the rest of the book
After this point in the book, we assume that Docker is up and that we can always start up our adventureworks database with:
sp_docker_start("adventureworks")