EX05 - Data Layer


Overview

In this abbreviated exercise, you will focus on connecting a subset of our checkin project's REST API to a true database-backed data layer. We will maintain a strong separation of concerns between FastAPI's HTTP routing, request, and response handling and SQLAlchemy's data persistence concerns. The shared vocabulary of "data transfer objects" enabling these layers to engage with one another will be models defined in Pydantic. These architectural decisions were discussed in Class 17 on Backend Layers.

Unlike the SQLAlchemy tutorial, in this exercise you will run a full PostgreSQL database with long-lived persistence. We will set this database up together as part of the exercise. The database will run as a separate Docker container from your service, just as it would in production, but it will all be managed by the DevContainer setup we establish.

This exercise is designed to be a solo exercise and everyone needs to create their own repository and submission. However, you are welcome to collaborate with your final project teammates and work together through each of these steps as long as everyone is completing the work on their own machines.

Getting Started

To begin, accept the following project on GitHub Classroom: https://classroom.github.com/a/tAAB1uSR

Next, from outside of VSCode, in a terminal on your host machine, clone your project. We have seen issues when students attempt to clone DevContainer projects from within VSCode. These problems can be avoided if you begin by cloning your repository on your host machine and then open the folder in VSCode. VSCode will subsequently ask you if you want to open the project in a DevContainer. Go ahead and do so to build the container image.

Establishing Environment Variables

In order to establish our database, we need to provide some configuration settings. It is common and best practice to pass configuration settings to our servers via environment variables. Environment variables are passed into processes from their "environment": typically the shell, but we will see another handy way in .env files. You have encountered environment variables in your shell already: PATH is one such variable. We will learn more about environment variables and why they are important in cloud-deployed 12-factor applications soon.
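
As a quick aside, any process can read the environment variables it inherited. A tiny illustration in Python (nothing you need to run for this exercise):

# Reading an inherited environment variable in Python.
from os import getenv

print(getenv("PATH"))  # The PATH your shell passed into this process.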

Adding a .env file to backend

Environment variables can be clunky to manage and share via the terminal alone. Often you would like environment variables to be scoped to the projects you are working on rather than global to your user account or system. A common modern convention is defining environment variables in a dotfile named .env.

The .env file is a common place where secrets and system-specific configuration values are stored. Thus, it is important to be sure .env is listed in your project's .gitignore file; confirm this now. You do not want to commit .env files to the history of your repository.

Now that you have confirmed the .gitignore entry, from within your backend directory, create a file named .env. This is where we'll establish some important environment variables for the backend database configuration. Copy and paste the following contents:

MODE=development
POSTGRES_USER=postgres
POSTGRES_PASSWORD=postgres
POSTGRES_HOST=db
POSTGRES_PORT=5432
POSTGRES_DATABASE=comp423

It is worth noting your shell program does not inherently pay attention to or read .env files, but soon we will see how Docker and our application can make use of them.

Save your .env file and continue on.

Adding the Postgres Database Service to Docker Compose

Your DevContainer’s .devcontainer/docker-compose.yml file specifies only a single container in your project: your development container for backend work. Let’s add an additional container to our project to run Postgres.

In doing so, we'll be introducing a Docker Compose "service". This is yet another overloaded usage of the term "service" you have seen in the course. In this context, a Docker Compose service refers to a container in a compose project that other containers in the same project can reach by name.

The service container you're about to set up will be named db. Its image will be based on a modern release of PostgreSQL. It will be given environment variables via the .env file you just established. These environment variables specify the default username and password used to access the database. They are kept simple (postgres/postgres) because we are working in a development environment on your machine only. In production, which we are not concerned with in this exercise, you would choose a randomized password.

Finally, this service needs to be backed by a persistent file system such that when you restart your project all of the data remains saved to disk. We will use a feature of Docker called a “Storage Volume” and attach storage to the container at the directory PostgreSQL expects to save its data.

Define the db Service

Open .devcontainer/docker-compose.yml. Under services, you will see the httpd entry that defines our developer server container. Next, you’ll add a sibling to this service named db. Add the following entry:

    db:
        image: postgres:16.0
        restart: unless-stopped
        env_file:
            - ../backend/.env
        ports:
            - '5432:5432'
        volumes:
            - db-data:/var/lib/postgresql/data

In terms of indentation levels, db should be at the same level as httpd, and image should be at the same level as build above in httpd. Indentation is significant in YAML files, so be careful yours is correct here to avoid problems later.

Notice under volumes we're listing a volume named db-data that will be mounted into our db container's /var/lib/postgresql/data directory, the place where PostgreSQL saves its data. Next, you'll establish this volume:

Define the db-data Storage Volume

Add the following to your .devcontainer/docker-compose.yml file at the top-level of indentation; volumes should be at the same level as services.

volumes:
    db-data:

We are not overriding any of the default storage volume settings for db-data, so this entry looks a bit bare, but notice we are explicitly declaring db-data to be the name of a volume in our project. The db service's reference to db-data in its volumes list above is what connects it to this entry. Once we rebuild and restart the DevContainer, you'll be able to see this volume from within the Docker dashboard.

Rebuild Container

After reconfiguring our docker-compose.yml file (or Dockerfile, though no changes should be necessary in it), you will need to rebuild your container for the updates to land in your DevContainer. Open the Command Palette and run "Dev Containers: Rebuild Container". Your project should reopen without issue, but if you have problems, select the option to reopen the folder on your host machine and return to the steps above. You may also scan the error output to see if you can find the error that led to the restart failing. The most likely culprits are the contents of your .env file, the db service definition, or the volumes declaration above.

Open your Docker Dashboard and view your Containers tab. You should see the ex05 stack; expand it and you will see db and httpd. Both should be running.

Now go to the Volumes tab of the dashboard and notice the storage volume that was created for your database data to be persistent.

Create Database in PostgreSQL

Now we need to create a Database for our app. The PostgreSQL server is capable of hosting many databases, each with their own tables and other resources, independent of one another.

Back in your VS Code Dev Container, open up a terminal and run the following commands. After the first command, type in postgres at the password prompt.

$ psql --host=db --user=postgres
# Use postgres as the password, too

postgres=# CREATE DATABASE comp423;
postgres=# \q

Here we're using the de facto command-line client for connecting to a PostgreSQL server: psql. Notice the host name db corresponds to what we named this service in the docker-compose file. Technically, this is a DNS hostname, just like www.unc.edu. One of the features of docker-compose is that it resolves these hostnames to the correct containers once they are started. Also notice the user and password are what you specified as environment variables in your .env file. When the PostgreSQL service started for the first time, it used these environment variables to set the system's username and password by convention.

The second line is a SQL statement to create a database named comp423 within your db container’s PostgreSQL server. The third line \q exits the psql client REPL.

Database Code Review

There are three source code files to review ahead of the next step:

  1. backend/database.py
  2. backend/env.py
  3. backend/entities.py

Notice in database.py, we are reading the environment variables you established in .env in order to load the configuration necessary to create a connection string for SQLAlchemy. You'll additionally notice the function getenv is being imported from backend/env.py. Take a quick gander at this file. You will see the getenv function is merely passed through from the standard library's os (operating system) package. However, you will also notice this module imports and calls a function named load_dotenv from the dotenv third-party library (installed from requirements.txt). This library is responsible for loading the .env file's contents into environment variables; without it, the contents of .env would otherwise be ignored.

Back in database.py, you'll see the engine being established with SQL echoing turned on, and a function definition named db_session. The db_session function is peculiar in its usage of yield, making it a generator function. We will not go into depth on this detail, but if you would like to learn more please read about dependencies with yield. This function will produce Session objects for our backend services layer to depend upon.
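
To make these pieces concrete, here is a rough sketch of what the two modules amount to. This is not a copy of the provided source; helper names like _engine_str are illustrative, so treat your project's actual files as the source of truth:

# env.py (sketch): load .env into the process environment, then re-export getenv.
from os import getenv
from dotenv import load_dotenv

load_dotenv()  # Reads backend/.env and populates environment variables.


# database.py (sketch): build a connection string and expose a Session generator.
import sqlalchemy
from sqlalchemy.orm import Session
from env import getenv

def _engine_str() -> str:
    """Assemble a PostgreSQL connection string from environment variables."""
    user = getenv("POSTGRES_USER")
    password = getenv("POSTGRES_PASSWORD")
    host = getenv("POSTGRES_HOST")
    port = getenv("POSTGRES_PORT")
    database = getenv("POSTGRES_DATABASE")
    return f"postgresql://{user}:{password}@{host}:{port}/{database}"

engine = sqlalchemy.create_engine(_engine_str(), echo=True)

def db_session():
    """Yield a Session for a request, closing it once the route has responded."""
    session = Session(engine)
    try:
        yield session
    finally:
        session.close()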

Next, open entities.py and notice this is the same simple UserEntity discussed in class. It has some helper methods defined for going back and forth between Pydantic models (found in models.py).
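
For reference, the shape of that entity and its conversion helpers is roughly the sketch below. It assumes SQLAlchemy 2.0-style declarative mapping, and the helper names to_model/from_model here are assumptions; defer to the actual entities.py:

# entities.py (sketch): a SQLAlchemy entity with Pydantic conversion helpers.
from sqlalchemy import String
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column
from models import User  # The Pydantic model this entity converts to and from.

class Base(DeclarativeBase):
    pass

class UserEntity(Base):
    __tablename__ = "users"

    pid: Mapped[int] = mapped_column(primary_key=True)
    first_name: Mapped[str] = mapped_column(String(64))
    last_name: Mapped[str] = mapped_column(String(64))

    @classmethod
    def from_model(cls, model: User) -> "UserEntity":
        """Construct an entity from a Pydantic User model."""
        return cls(pid=model.pid, first_name=model.first_name, last_name=model.last_name)

    def to_model(self) -> User:
        """Convert this entity into a Pydantic User model."""
        return User(pid=self.pid, first_name=self.first_name, last_name=self.last_name)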

Scripting Developer Database Schema Reset and Sample Data

As a quick recap, you now have a running PostgreSQL container and a database named comp423 in it. However, you do not yet have a users table within this database, nor any data in the table. Let’s write a quick script that will be easy to run to reset our developer database tables.

Create a directory named script in the backend directory and add a file to it named reset_db.py. Go ahead and add the contents below and complete the TODO. For the mock UserEntity object you are constructing and adding to the database, perhaps choose your own pid, first_name, and last_name to initialize the object with.

import database
from entities import Base, UserEntity

# Reset Tables
Base.metadata.drop_all(database.engine)
Base.metadata.create_all(database.engine)

# Enter Mock Data
from sqlalchemy.orm import Session
session = Session(database.engine)

# TODO: Add a UserEntity to the database session and commit it.
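
If you get stuck, the completed TODO amounts to constructing an entity, adding it to the session, and committing the transaction. A sketch, with placeholder values you should replace with your own:

user = UserEntity(pid=730000000, first_name="Sally", last_name="Student")
session.add(user)   # Stage the new entity in the session
session.commit()    # Persist the staged changes to the database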

After you have completed the TODO (refer to your notes on SQLAlchemy for the steps involved), save your file and run the following commands to run your script:

$ cd backend
$ python3 -m script.reset_db

You should be able to wade through the SQL echoed out by SQLAlchemy and see the drop table, create table, and insert statements being issued. This script can be re-run, since it attempts to wipe the database clean (drop all) before building it back up. However, do note that these kinds of actions truly do wipe your database's data. This is not an issue for this exercise, but in future scenarios or production management, you want to be very careful with commands and scripts like this!

Run the Backend FastAPI Server and open /docs

We will not have a front-end in this exercise as the focus is connecting the HTTP layer with the business logic / database layer. We will only have the OpenAPI /docs facilities provided by FastAPI.

Go ahead and start your development server. With backend as your present working directory, you can start the uvicorn server as usual: uvicorn main:app --reload. Then navigate your browser to localhost:8000/docs. You should see the four routes defined in main.py. Only one of them will actually work: GET /api/users.

If you are not seeing the correct backend, go back to VSCode and check the "PORTS" tab in the same pane as your terminal. You should see a green dot next to port 8000 on your container, and you may see that your Local Address was mapped to a different port, in which case you should navigate to that address with /docs appended.

Try executing the route /api/users. If all is well up to this point in the exercise, you should see the response include the mock data you established in reset_db.py. If you do not, then you should not proceed any further as some step above failed. If you are seeing an error, it likely has to do with database creation or table creation. If you are seeing no error, but an empty array response body, then your UserEntity in reset_db.py was not successfully inserted and committed to the database. Resolve these issues before proceeding.

Your Task

Your task in this exercise is to complete the three remaining route implementations in main.py, for routes new_user, get_user, and delete_user. These routes should persist and retrieve their data from the PostgreSQL database. The concerns of database interactions should be defined in user_service.py. Here, between the two files, you will see a complete end-to-end implementation for the route you just tested: get_users depends on UserService#all.

As discussed in class, the parameter and return types of your service should be User models. All database concerns, and UserEntity usage, should be encapsulated in your UserService class. One simple rule of thumb for this is looking at the imports of main.py versus user_service.py. Notice main.py does not import any functionality from sqlalchemy; it is implemented in terms of user_service.
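
To see this layering in one place, here is a rough sketch of the already-implemented pair, get_users and UserService#all. It assumes SQLAlchemy 2.0-style queries and the conversion helpers on UserEntity; the exact code in your repository is authoritative:

# main.py (sketch): HTTP concerns only; app, User, UserService, and Depends
# are assumed to be imported at the top of the module.
@app.get("/api/users")
def get_users(user_service: UserService = Depends()) -> list[User]:
    return user_service.all()


# user_service.py (sketch): all database and UserEntity concerns live here.
from fastapi import Depends
from sqlalchemy import select
from sqlalchemy.orm import Session
from database import db_session
from entities import UserEntity
from models import User

class UserService:
    def __init__(self, session: Session = Depends(db_session)):
        self._session = session

    def all(self) -> list[User]:
        """Query every UserEntity row and convert each to a User model."""
        entities = self._session.scalars(select(UserEntity)).all()
        return [entity.to_model() for entity in entities]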

Code Read: Dependency Injection

A new feature, which is demonstrated in each of the four route functions, is the ability to provide custom dependency injectors. The idea is very similar to Angular's notion of dependency injection, but this mechanism is specific to FastAPI. Each of the route functions has a parameter of user_service: UserService = Depends(). The Depends default is FastAPI's convention and system for specifying custom dependencies. This particular dependency indicates FastAPI needs to construct a new instance of UserService before each route evaluation (which happens once per request from a web client).

Next, open user_service.py and notice the definition of UserService#__init__. Specifically, notice that it has a parameter with the signature session: Session = Depends(db_session). Recall, db_session is a special function defined in database.py. This dependency tells FastAPI that when it injects a UserService, it must first call db_session to get a fresh Session and pass it in as an argument to UserService's constructor.

Thus, thanks to the FastAPI framework, when you request any of these API end-points, FastAPI constructs this object graph and passes it into the function being routed to as a parameter. This has three distinct benefits. First, it alleviates you from needing to construct these objects manually in code for each route. Second, when it comes time to write tests, we will have a simple way of substituting our own test stubs and mocks for the real dependencies. Third, thanks to FastAPI's conventions, we can define our dependencies to automatically clean up after themselves once the route has responded to the client. In our example, the session with the database is closed automatically in database.py's db_session function. Dependency injection is a common, modern pattern seen widely throughout many frameworks (you saw this in Angular, too!).
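
We will return to testing later in the course, but as a rough sketch of that second benefit: FastAPI's dependency_overrides mechanism lets a test swap db_session for a session bound to a throwaway database. The in-memory SQLite engine here is an assumption for illustration only:

# Sketch of a test module overriding the db_session dependency.
from fastapi.testclient import TestClient
from sqlalchemy import create_engine
from sqlalchemy.orm import Session
from sqlalchemy.pool import StaticPool

from main import app
from database import db_session
from entities import Base

test_engine = create_engine(
    "sqlite://",
    connect_args={"check_same_thread": False},
    poolclass=StaticPool,  # Share one connection so created tables persist.
)
Base.metadata.create_all(test_engine)

def override_db_session():
    session = Session(test_engine)
    try:
        yield session
    finally:
        session.close()

app.dependency_overrides[db_session] = override_db_session
client = TestClient(app)

def test_get_users():
    assert client.get("/api/users").status_code == 200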

Technical Requirements

There are a few technical requirements for the routes you implement and the backing service layer:

  1. All UserService methods should retain and adhere to their method signatures’ parameter and return types.
  2. new_user route should result in a 422 response if there is an error experienced in the creation of the user (such as attempting to register a duplicate PID, which the database will refuse and raise an error for you to handle in main.py; see the sketch after this list).
  3. get_user should result in a 404 response if the user cannot be found in the database.
  4. delete_user should result in a 404 response if there is an issue deleting a user by their PID.
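
For the error-handling shape these requirements imply, here is a hedged sketch of new_user as it might look in main.py. The service method name create and the detail message are assumptions; adapt them to the actual signatures in user_service.py:

# Sketch of new_user's error handling in main.py.
from fastapi import HTTPException

@app.post("/api/users")
def new_user(user: User, user_service: UserService = Depends()) -> User:
    try:
        return user_service.create(user)
    except Exception:
        # The service raises when, e.g., the database refuses a duplicate PID.
        raise HTTPException(status_code=422, detail="Unable to create user")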

As you complete each route and its backing service method, you should be able to interactively test persistence in your database via the OpenAPI docs interface. Further, you can fully close your uvicorn server, start it back up, and restart your Dev Container and host machine, and your data from earlier sessions will be retained thanks to finally having a truly persistent layer in the PostgreSQL database.

Hand-in

Hand-in instructions to follow soon.

Contributor(s): Kris Jordan