ARTICLE

Persisting Application Data over Time

This article, excerpted from chapter 10, covers

§ Persisting Data

§ Database Systems

§ Database Structures

§ Modeling Data with SQLAlchemy

Take 35% off The Well-Grounded Python Developer by entering fccfarrell into the discount code box at manning.com.

This article is about persisting application data over time. You don’t run the applications you use forever, and despite the stability of computer systems, they are shut down and restarted periodically.

Imagine using a complex spreadsheet and re-entering all the data every time you shut the application or computer down. Even with the enormous processing power a computer has, it would hardly be a helpful device if there was no way to restore the information entered into it.

The Other Half

As a developer, it’s easy to think of the application code you’re creating as the primary product of your efforts. But, in reality, your cool, essential application with all of its well thought out code is only half the story. The other, equally important half is the data your application helps the user work with. Modifying, transforming and providing insights into data your users are interested in is the raw material an application works with.

Maintaining Information over Time

The other half of a computer system’s versatility is the file system. A file system saves data to a storage medium independent of electrical power. Most personal computer systems maintain file systems on either mechanical or solid-state drives. These storage devices have file system structures layered over them by the operating system of the computer.

Application programs give meaning to the data in a file. For example, when a photo viewing application opens a JPEG image file, the user sees a picture. The photo application can interpret the contents of the file and generate the expected visual results.

If a user were to open the same JPEG image file with a text editor, they would see a large block of largely incomprehensible data. Most of the files in a file system are like this, making sense to the applications that can read and interpret them.

Any application needs to save and recall content to display to users. The content saved to the file system is in a format understood by the application.

Accessing Data

Before diving directly into database systems, let’s talk about storing data in a file in general. To do so, we’ll use something everyone who’s bought anything online is familiar with, customer orders for products. Later, we’ll use this idea to illustrate some issues when storing data to file systems.

To begin with, imagine an online store that only sells a single product to many customers. Each customer might create multiple orders for that single product. To make the data relatively easy to present in this article, we’ll keep the amount of information very low, the customer’s name, their address, the product name, and the quantity in the order.

A common format for data in a file system is the comma-delimited value format or CSV. A CSV file is easy to understand and has the advantage of being human-readable and accessible by computer systems.

A CSV file is a simple text file where each line of text is a record of data ending in a carriage return. A comma character separates the data elements in each line of text. The first line of a CSV file often contains the names of each comma-separated field in the remainder of rows of text in the file.

That’s pretty much it, no concept of data type for each element in a record. An application reads the CSV file and splits each comma-separated line of text into fields of text data.

The imaginary company selling only one product to each customer could save all of the customer information and their orders in a single CSV file:

This CSV file is sufficient to represent the customers, orders, and address where to ship them. The first field contains the customer’s name, the second their address, the third their zip code, the fourth field is the product name, and the last field is the quantity of the products in the order. Because the company only sells a single product, this could work.

Even in this example, you might notice a potential problem. There’s redundant data in the file. For example, customers and their addresses are represented multiple times for each separate order, as in the case of Joe and Mary.

We could resolve the problem by removing the redundancy and having only one record for each customer, and representing multiple orders in that same record. You could create more comma-separated fields containing the quantities, but this doesn’t work. Because there’s no way to know how many orders a customer will create, it would be difficult for an application reading the CSV file to know how many fields to expect.

We could pack multiple product and quantity fields into the single order field, but we’d need to use a delimiter distinct from a comma to separate the values. So instead, we use the pipe “|” character to separate orders and the dash “-“ character to separate the product from the quantity. Doing this allows you to maintain multiple orders per record so that an application can still parse the lines of text in the file.

Implementing this idea creates a CSV file like this:

This implementation reduces the redundancy in the file as well as its size. Reducing the redundancy comes at the cost of increased processing when reading and interpreting the data in this CSV file. The application will have to parse for the comma, the pipe, and the dash symbol delimiters in the order field.

Suppose our imaginary company decides to sell multiple products, and customers can ship an order to any address they want. Now customer orders need to contain the shipping address information, which nees its own delimiters within the field to stay distinct.

We could extend the CSV file in this way:

Adding more delimiters could work, but it’s getting silly with multiple data items to parse in the orders field. This approach also doesn’t scale well as adding more products makes the orders field even more complicated.

Resolving this problem means recognizing the logical divisions between the data elements to be stored. For example, a customer can ship multiple orders to different shipping addresses, and each order can contain multiple products and quantities.

Each customer can have multiple orders, but each order is related to only a single customer. Similarly, each address can be related to multiple orders, but each order will ship to only a single address.

Orders and products are a little more challenging. An order can contain multiple products, and a product can be part of multiple orders. To resolve this, we invent the concept of an order item. An item relates to an order and a product, providing this two-way connection.

We can break the data into separate CSV files along these logical lines, essentially where we’ve added additional delimiters in the text. Taking this action creates five CSV files: customer, address, product, order, and item. The five CSV files separate the data along the logical lines. Unfortunately, there’s no way to connect a customer to an order, an order to address, or an item to either an order or a product.

To connect the data, we need to create relationships between the rows of data in the files. We can do this by creating a unique identifying value for each row in every CSV file. At a minimum, the row identifier only needs to be unique across the rows in an individual CSV file.

We’ll add another column at the beginning of each row and assign an integer value incremented for each row. The integer value uniquely identifies each row of data in a CSV file, but there are still no relationships between the CSV files.

To create relationships, we add the unique identifier from one record in a CSV file to another to indicate the relationship between the two CSV files. For example, a single customer can create multiple orders, but each order only belongs to a single customer.

We’ll add the unique identifier as a new value to all the rows in the order CSV file that relate to that customer to create this relationship. This kind of relationship is called one-to-many; each customer is related to multiple orders, but each order relates to only a single customer.

There’s also another relationship we have to establish. Each order can consist of multiple products, and each product can relate to multiple orders. A relationship like this is called a many-to-many. Conceptually this is a many-to-one relationship combined with a one-to-many and is implemented by creating relationship associations.

To do this, we’ll create an item CSV file that contains the unique id’s from both the order and product CSV files for each item. In this way, an order can connect to multiple items and connect to multiple products.

The five CSV files to represent the information is presented below:

The diagram shows the five CSV files, their contents, and the relationships between them. Each file has a unique id value as the first field in each row of data. The structure shows the Customer, Product, and Address files have no redundant data. It also shows the Order and Item files contain relationship data primarily, aside from the unique id and the qty value in Item.

Because of the structure and contents of the CSV files, our imaginary company could continue to add new customers, add new products to sell, and addresses to ship to, all without creating unsustainable redundant information. The example program in examples/01/main.py uses this information to create simple invoice PDF files for all the orders in the system. The diagram below shows invoice_1.pdf with indications for the source of the data shown:

I won’t present the example program here as it’s available in the book. The program works by reading all the CSV files into memory and creating a Transactions container class to hold the information. Next, the orders in the Transactions class are inserted into the transaction information fields in a Jinja2 template. The resulting rendered HTML is converted to a PDF file.

The above ideas and implementation work but have significant limitations. Because the CSV files are read into memory, the number of customers, products, and orders are limited to the amount of memory the application has available.

The example program has only one use, to create a set of order invoices for all the orders in the system. There is no facility to search for an order, customer or product. Any additional use cases, like searching or reporting our imaginary company might want, requires quite a bit more programming development.

Our imaginary company would likely want to have multiple users interacting with the data, both customers and employees. Coordinating multiple access has to be handled by the application so the data stays consistent and uncorrupted. If multiple applications access the CSV files, this presents another level of complexity to coordinate that access, keep the data synchronized and current in all applications and prevent the files from becoming corrupted.

There’s also no standardized way to use the CSV files. The CSV files are shared easily enough, but anyone wanting to use them would need detailed knowledge of the structure of the files and the relationships implied by that structure. They’d also have to maintain that structure if they wanted to modify the data contents. One solution to the problems is to move the data to a database system.

Database Systems

Database systems allow you to persist data as well as the relationships between that data. One common type of database is the Relational Database Management System or RDBMS. RDBMS systems provide the functionality to create, update and delete tables stored within them. These tables are analogous to the two-dimensional tables represented by the CSV files used in the example above.

RDBMS systems also have the functionality to create and update the relationships between tables by connecting the unique id values across table boundaries.

One of the advantages of a database system over using files to persist information is that creating, updating, and maintaining the data is handled by the database. All the knowledge necessary to maintain the data is handled by the database.

Tables

The use of tables represents the data a database maintains. Conceptually tables in a database are two-dimensional collections of rows and columns.

Like the CSV files presented previously, the rows are the individual records, and the columns are the fields within a row. Unlike a CSV file where the columns are strings separated by a delimiter, the columns in database tables have defined data types. The data types supported depend on the particular database, but data types of text, integer, real (decimal numbers), and blobs (binary objects) are generally supported.

The CSV files that allowed our imaginary company to track customers, orders, and products were small enough to include diagrams, as was shown previously. However, in real systems, the number of rows and columns can be extensive, which wouldn’t lend itself to the same kind of diagrams.

Tables in a database can be represented graphically as part of an Entity Relationship Diagram, or ERD. Rather than show the rows and columns that make up a table, the column and data type information for a record is shown. As an example, here is the database table representing the customer information:

The diagram header is the table name, and each row represents a column in the database table. Here the three columns in the diagram provide information about the database column. The second is the column name, and the third is the column data type.

The first column in the diagram provides additional information about the column. In the example above, the first row has the abbreviation PK. PK is shorthand for Primary Key, which is the unique id associated with each row of data in the table.

The order CSV file contained nothing but unique id values, one for the row unique id and two others to connect to the customer and address CSV file rows. The ERD for the order table is presented here:

Just like the customer table has three columns describing the database columns of the table, the order table has three columns for the data it maintains. The new FK abbreviation is shorthand for Foreign Key. A Foreign Key creates a relationship between two tables by referencing the Primary Key of another table.

Relationships

As important as storing and modifying data is to any application, the relationships between the data are just as important. The updated CSV files enabled us to reduce the original single CSV files data redundancy.

The separation of distinct data into separate tables does mean there needs to be a way to reconnect related data. RDBMS systems establish relationships between multiple tables with the use of Primary and Foreign Keys.

The Primary Key in a database table is a column in a row (a record) whose value is unique across the entire table. It’s often the case the Primary Key column exists for the sole purpose of providing this unique id value and provides no information about the record itself.

Most RDBMS systems have functionality to create auto-incrementing integer values when new rows are inserted into a table. These make convenient Primary Key values that are assured to be unique across the table as new records are inserted into the table and the value increments.

The existence of Primary Keys, and their use as Foreign Keys in other tables, allows relationships to exist between tables.

One To Many

In our example, there is a relationship between customers and orders. A customer can create zero to many orders for products, but each order is related to only one customer. This kind of relationship is called One-To-Many.

To establish a One-To-Many relationship, the unique customer_id value from the customer table is also present as a column of data in the order table as customer_id. The customer_id value is a Foreign Key relating to the customer table. Any number of order records can have the same customer_id value, creating the One-To-Many relationship.

When creating a Foreign Key in a table, part of the definition given to the database engine is to what table the Foreign Key relates. The Foreign Key tells the database engine there’s a relationship and helps it provide the functionality to use that relationship.

Many To Many

Our imaginary company also establishes another kind of relationship. An order can have many items, each related to a product. At the same time, a product can be part of many items. A relationship like this is called a Many-To-Many.

Establishing a Many-To-Many relationship is more involved and, in some ways, can be thought of as a One-To-Many relationship connected to a Many-To-One relationship. To create this requires an association table that acts as the Many parts between the two just mentioned.

The item table creates the association between the order and product tables. The item table has a Foreign Key to the order table and a Foreign Key to the product table.

Transaction Database

The transaction database you’ll create uses a naming convention for the tables and the columns within those tables. The tables are named using singular nouns for what they contain: customer, product, etc. The naming convention seems counter-intuitive since a table has multiple records, and the plural version of the noun might seem more fitting.

The table is defined in terms of one row of data and the data types and meanings of the record’s columns. How the table is accessed can return one or more records, but the table itself is configured based on a single record.

Additionally, it can get surprisingly awkward to use plurals when naming tables. Plus, good developers are lazy, and the singular versions are shorter and require less typing when working with them.

The Primary Key columns are named using a convention of the table name appended with “_id”. While seemingly redundant and wordy as the Primary Key name, it’s quickly apparent the column is a Foreign Key when used in another table.

Creating an ERD for the transaction database generates this:

This diagram represents the transaction database table’s structure and the relationships between them using common database ERD notation and symbols. Notice how the connection between tables goes from Primary Key in one table to Foreign Key in another.

The connecting lines are all variations of One-To-Many relationships. The existence of the Item table creates the Many-To-Many relationship between the Order ⇔ Items ⇔ Product tables.

Creating, updating, and interacting with the data in an RDBMS system uses the Structured Query Language (SQL) most RDBMS systems provide.

Structured Query Language: SQL

Accessing the functionality of a database is standardized, so any programming language that has a library available to connect to the database can use it. This standardization makes the database much easier to share between applications than a proprietary system.

Much of the standardized functionality of RDBMS systems are exposed to the user by using Structured Query Language, or SQL. SQL interacts with an RDBMS system as a declarative programming language. A declarative language lets you express what you want a computer system to do and not explicitly instruct the system on how to perform that task.

One way to think about this would be going to a bakery and asking for a cake. You expect that the baker will give you a cake, not ask for a recipe make a cake.

Getting Data

You’ll be creating the transaction database later in this article, but here I’ll show some SQL queries to access data. This SQL statement:

SELECT * FROM customer;

Returns these results:

customer_id  name
----------- ----------
1 Joe
2 Mary
3 Sue

The SQL command keywords are in uppercase, which is just a convention. The statement asks the database to return all rows from the customer table. The “*” character is a wildcard to get all columns for each row returned. The “;” character at the end of the SQL statement is the terminator for the command.

This SQL query asks for only the names in the customer database sorted in descending alphabetical order:

SELECT name FROM customer ORDER BY name DESC;

name
----------
Sue
Mary
Joe

SQL also provides functions that transform and use the data. The statement below returns the number of customers:

SELECT COUNT(*) AS 'Total Customers' FROM customer;
Total Customers
---------------
3

The COUNT function returns the total number of results produced by the query and assigns that value to an alias ‘Total Customers’ used as the column heading for the output results.

Using Relationships

Because the tables in the transaction database represent normalized data without redundancies, making interesting queries requires using relationships. In this SQL statement, the customers, all the addresses used for their orders, the number of times they’ve used an address for an order, are returned and sorted alphabetically by name:

SELECT c.name, a.street, a.zipcode, COUNT(c.name) AS 'Times Used'
FROM CUSTOMER c
JOIN 'order' o ON o.customer_id = c.customer_id
JOIN address a ON a.address_id = o.address_id
GROUP BY a.street
ORDER BY c.name;

name street zipcode Times Used
---------- ---------- ---------- ----------
Joe 12 Main St 12345 2
Mary 127 Margol 40322 1
Mary 41 Orange 40321 1
Sue 212 Grove 34213 1

Here the SQL statement spans multiple lines, which works fine as the statement isn’t completed until the final “;” character. As before, only some values from the tables are returned, but those values span multiple tables.

Initially, the query starts at the customer table and assigns it to an alias shorthand used in other parts of the query to reduce ambiguity. To get the customer address used with each order, the query needs to use the relationships between the customer, order, and address tables. Using the “JOIN” keyword achieves this. It tells the database how to connect one table to another using the Primary Key from one to the Foreign Key of another.

The Primary Key from one table must equal the Foreign Key of the other for that row to be part of the results. The “ON” keyword provides the condition to be met to include the data.

Notice the ‘order’ table is in single quotes in the first “JOIN” statement. The single quotes are necessary because the word order is a SQL keyword; placing it in single quotes tells SQL to interpret ‘order’ as the table name rather than a keyword.

The “GROUP BY a.street” tells SQL to aggregate the results based on identical street values. The results returned indicate this. For example, Joe has two orders but used the same address for both. Mary also has two orders but used a different address for each one.

The SQL to get the data to build invoices for all the orders our imaginary company has is written like this:

SELECT
c.name, a.street, a.zipcode, o.order_id, p.name, i.qty
FROM 'order' o
JOIN customer c ON c.customer_id = o.customer_id
JOIN address a ON a.address_id = o.address_id
JOIN item i ON o.order_id = i.order_id
JOIN product p ON p.product_id = i.product_id

And returns these results:

name        street      zipcode     order_id    name        qty
---------- ---------- ---------- ---------- ---------- ----------
Joe 12 Main St 12345 1 widget 2
Joe 12 Main St 12345 1 thingy 3
Joe 12 Main St 12345 2 thingy 5
Mary 41 Orange 40321 3 widget 1
Mary 41 Orange 40321 3 thingy 9
Mary 127 Margol 40322 4 widget 7
Sue 212 Grove 34213 5 widget 3

This SQL query joins all the tables in the transaction database to recreate the redundant data for customers, orders, addresses, products, and items.

SQLAlchemy

SQLAlchemy is a popular and powerful database access library for Python that provides an Object-Relational Mapper (ORM). One of the benefits of working with Python is that it’s an object-oriented language and everything in Python is an object. Working with data as Python objects feels more natural and more Pythonic.

Python can access database systems using SQL, and this is a viable approach. Most Python database libraries that support SQL return lists of tuples or dictionaries containing the results of the SQL statement.

The SQL statement illustrating how to get the data to create invoices for orders shows the data, but all relationship information is lost. To use this data would require software to glean the hierarchal relationships in orders ó items ó products. The disconnect between objects and flat data is known as object-relational impedance mismatch, and it’s a problem the SQLAlchemy ORM solves.

Benefits

Using SQLAlchemy to access a database allows you to think about objects and methods rather than SQL and result sets. For the most part, you don’t need to know SQL to work with the underlying database. Instead, SQLAlchemy builds the necessary SQL statement to map the resulting data to Python objects and vice versa.

Most RDBMS databases support SQL; however, they often add proprietary functionality to their implementation. Aside from specific use cases, SQLAlchemy abstracts these differences from you and works at a higher level.

Another advantage SQLAlchemy provides is protecting your application from SQL injection attacks. For example, if your application adds user-supplied information to database queries, your application is vulnerable to this kind of attack.

Exploits Of A Mom (https://xkcd.com/license.html)

Modeling the Database

Connecting the database to SQLAlchemy requires modeling the table structures with Python class definitions. The models map the table record structure, and the relationships between tables, to Python classes. Instances of these Python classes are created due to calling methods on the classes, which SQLAlchemy translates to SQL statements.

Because the ultimate goal is to use SQLAlchemy with the MyBlog application, we’ll use Flask and the flask_sqlalchemy modules to help define the classes. The flask_sqlalchemy module provides convenience features and definitions, but the classes defined here could just as easily have been defined with the SQLAlchemy module alone.

Defining the Classes

The examples/02/main.py example program imports a models.py module. The models.py module contains all of the code to create the database object, model the tables, and populate the database with data from the CSV files of examples/01.

The Database Connection

All of the classes to be defined inherit from a common SQLAlchemy provided database object. The database object is created prior to defining the classes and is created in this way:

app = Flask(__name__)    #A
app.config["SQLALCHEMY_DATABASE_URI"] = "sqlite:///transaction.sqlite" #B
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False #C
db = SQLAlchemy(app, session_options={"autoflush": False}) #D

#A Create the Flask instance

#B Configure SQLAlchemy to use SQLite and where to create the database file

#C Turn off an unnecessary default configuration that generates a warning

#D Creates the SQLAlchemy database object, in this case with autoflush turned off

The intent of the code above is to create the db object instance used to define the table models. The database itself is stored in the single file “transaction.sqlite”. The “sqlite://” prefix on the URI connection string indicates using the SQLite database system.

Modeling The Tables

The transaction database ERD shown previously is a pretty good guide to create the class definitions needed to access the database with SQLAlchemy. The class definitions define the database table to create, the column names within a record, and their data type.

There are also fields defined that don’t exist in the database but are created and maintained by SQLAlchemy when instances of the models are created. These extra fields give you useful functionality when working with the model instances, particularly relationships between tables.

Let’s look at the Customer and Order class definitions:

class Customer(db.Model):    #A
__tablename__ = "customer" #B
customer_id = db.Column(db.Integer, primary_key=True) #C
name = db.Column(db.String) #D
orders = db.relationship("Order", backref=db.backref("customer")) #E

#A Create the class, inheriting from the db instance Model class

#B Associate the class definition with the customer database table

#C Create the customer_id column as an integer type and as the primary_key

#D Create the name column as a string

#E Create the instance only attribute orders, connecting a customer to all their orders

class Order(db.Model):    #A
__tablename__ = "order" #B
order_id = db.Column(db.Integer, primary_key=True) #C
customer_id = db.Column(db.Integer, db.ForeignKey("customer.customer_id")) #D
address_id = db.Column(db.Integer, db.ForeignKey("address.address_id")) #E

#A Create the class, inheriting from the db instance Model class

#B Associate the class definition with the order database table

#C Create the order_id column as an integer type and as the primary_key

#D Create the customer_id as an integer and foreign_key to the customer table and customer_id field

#E Create the address_id as an integer and foreign key to the address table and address_id field

There’s quite a bit going on in these class definitions. By inheriting from the db.Model class, the Customer and Order classes, gets SQLAlchemy functionality allowing the classes to interact with the underlying database.

The customer_id column is defined as an integer and as the Primary Key. By doing this, the customer_id field is initialized by an auto-incrementing function every time a new Customer instance is added to the database. The same happens for the order_id field in the Order class.

The name column is a simple string that maps to whatever database type best supports Python string type variables. Since SQLite is the underlying database, that type is TEXT.

The Customer class attribute orders are interesting and useful. It does not define a column in the database customer table at all. Instead, it creates an attribute maintained by SQLAlchemy that is available to you as a developer.

The orders attribute uses the relationship established by the customer_id Foreign Key created in the Order class. A Customer instance has an orders attribute that is a Python list of the Order instances associated with the customer.

The odd-looking backef parameter passed to db.relationship(…) creates an SQLAlchemy maintained attribute named customer in the Order class definition that points back to the Customer instance to which the order relates. Conceptually the relationships look like this:

The orders attribute lets you write Python code like this when you have a Customer instance:

print(f”Customer {customer.name} has these order number”)
for order in customer.orders:
print(f”Order number: {order.order_id}”)

The relationships and the attributes created and maintained by SQLAlchemy are very useful when printing out the order invoices. The rest of the SQLAlchemy model definitions follow:

class Address(db.Model):
__tablename__ = "address"
address_id = db.Column(db.Integer, primary_key=True)
street = db.Column(db.String)
zipcode = db.Column(db.String)
orders = db.relationship("Order", backref=db.backref("address"))

class Product(db.Model):
___tablename__ = "product"
product_id = db.Column(db.Integer, primary_key=True)
name = db.Column(db.String)

class Item(db.Model): #A
__tablename__ = "item" #B
order_id = db.Column(db.Integer, db.ForeignKey("order.order_id"), primary_key=True) #C
product_id = db.Column(db.Integer, db.ForeignKey("product.product_id"), primary_key=True) #D
qty = db.Column(db.Integer) #E
order = db.relationship("Order", backref=db.backref("items")) #F
product = db.relationship("Product") #G

#A Create the class, inheriting from the db instance Model class

#B Associate the class with the item database table

#C Create the order_id as an integer and foreign_key to the order table and order_id field

#D Create the product_id as an integer and foreign_key to the product table and product_id field

#E Create the qty field to track the quantity of product for this item

#F Create the instance only attribute order, connecting an order to this item

#G Create the instance only attribute product, connection a product to this item

The Item class definition creates the many-to-many association relationship between order, the items in that order, and products related to the items.

Creating and Using the Database

Once the SQLAlchemy models are defined, the database can be created. This line of Python code creates the database:

db.create_all()

If the transaction.sqlite SQLite database file defined earlier doesn’t exist; it’s created with table structures defined by the models, and those tables will be empty. However, if the transaction.sqlite database file does exist; the code above won’t re-create it; it will just connect to it.

It’s essential to recognize any changes made to the SQLAlchemy models won’t appear in the database if the database already exists. You can delete and recreate the database, and it will match the models, which is fine in this case but an unreasonable action most of the time.

With an existing database, you’ll need to use SQL statements, or other database tools, to modify the database to match the SQLAlchemy models. As a working developer, it’s not common to create a database from scratch. The more frequent activity is modifying an existing database to add new features and functionality to it.

Inserting Data

Even though creating and populating a database from scratch isn’t an everyday activity for a developer, we’ll look at it in the examples/02/models.py to see how SQLAlchemy creates and inserts database table records.

The program’s goal in examples/02/main.py is to replicate the behavior of examples/01/main.py but use a database instead of CSV tables. To do so means parsing the CSV files and inserting the data into the database using SQLAlchemy. The models.py module contains the SQLAlchemy models and the statement to create the database. It also has a custom function to read the CSV files and load them into the database tables.

def load_database():
customers = CsvData("customer.csv") #A
addresses = CsvData("address.csv") #A
orders = CsvData("order.csv") #A
products = CsvData("product.csv") #A
items = CsvData("item.csv") #A

with session_manager() as session: #B
# create the customers
for customer in customers.data.values(): #C
session.add(Customer(
name=customer.get("name")
)) #C

# create addresses
for address in addresses.data.values(): #D
session.add(Address(
street=address.get("street"),
zipcode=address.get("zipcode")
)) #D
# create products
for product in products.data.values(): #E
session.add(Product(
name=product.get("name")
)) #E
# commit these items
session.commit() #F

# build a map of orders
orders_map = {str(index): Order() for index, order in enumerate(orders.data.values(), start=1)} #G

# build the orders and items
for item in items.data.values(): #H
# get the order_id and order associated with this item
order_id = item.get("order_id") #I
order = orders_map.get(order_id) #I

# get the customer, address and product associated with the item
customer_id = orders.data.get(order_id).get("customer_id") #J
customer = session.query(Customer).filter(Customer.customer_id == customer_id).one_or_none() #J
address_id = orders.data.get(order_id).get("address_id") #K
address = session.query(Address).filter(Address.address_id == address_id).one_or_none() #K

if order.customer is None: #L
order.customer = customer #L
if order.address is None: #L
order.address = address #L

# create an item with it's many-to-many associations
product_id = item.get("product_id") #M
product = session.query(Product).filter(Product.product_id == product_id).one_or_none() #M
new_item = Item(
qty=item.get("qty")
) #M
new_item.product = product #M
order.items.append(new_item) #M

# add the populated orders to the session and database
for order in orders_map.values(): #N
session.add(order) #N
session.commit() #N

#A Load all of the CSV files into variables that are rows of dictionaries

#B Use a context manager to control when the objects are committed (or not) to the database

#C Create Customer instances and add them to the database session

#D Create Address instances and add them to the database session

#E Create Product instance and add them to the database session

#F Commit the session to the database, assigning unique ids to all the objects in that session and persisting them in the database

#G Create an orders map to help connect orders, items, customers, and products together

#H Iterate over the items

#I Find the order the current item is related to

#J Find the customer to which the found order is related. The second statement is an SQLAlchemy query to get the customer instance

#K Find the address to which the found order is related. The second statement is an SQLAlchemy query to get the address instance

#L Assign the customer and address to the order only if they don’t already exist

#M Find the product instance to relate to the item, assign it to the item and then append the item to the order

#N Add all the initialized orders to the session and commit the session to the database, persisting the orders and items

There is quite a lot going on in the code above. The gist is to read the CSV files and use the data to create instances of the corresponding SQLAlchemy models. Then use the SQLAlchemy maintained attributes to develop the relationships between the instances.

Creating the customer, address, and product instances and then persisting them to the database with the session.commit() statement generates the unique id Primary Key value for each record. The Primary Key values are used later to establish relationships when creating the orders and the items in those orders.

Using the Data

The examples/02/main.py program demonstrates using the transaction.sqlite database to generate the invoice PDF files for all the orders.

import os
import csv
import sqlite3
from pathlib import Path
from jinja2 import Environment, FileSystemLoader
from weasyprint import HTML
from models import load_database, Order, session_manager #A


def create_invoice(order): #B
"""Create the PDF invoice for the order

Args:
info (dict): The info information to generate the invoice with
"""
invoice_filename = f"invoice_{order.order_id}.pdf"

# delete existing order invoice file if exists
if os.path.exists(invoice_filename):
os.remove(invoice_filename)

# set up Jinja2 to generate the HTML and then the PDF file
path = Path(__file__).parent
env = Environment(loader=FileSystemLoader(Path(path)))
template = env.get_template("invoice_template.jinja")
html_out = template.render(order=order) #C
HTML(string=html_out).write_pdf(
invoice_filename,
stylesheets=[
"page.css",
"bootstrap.css",
]
)

# load the database
load_database() #D

# generate an invoice file for all the orders
with session_manager() as session: #E
for order in session.query(Order).all(): #E
create_invoice(order) #E

#A Import functionality from the models.py module. Notice only the SQLAlchemy Order definition is used

#B Pass a single Order instance to the create_invoice function

#C Pass the single Order instance to the Jinja template as the context parameter

#D Call the load_database function defined in the models.py module to populate the database

#E With a database session, query the database for all orders, iterate over them and pass the single instances to the create_invoice function

The bulk of this program creates the invoice PDF files from the single SQLAlchemy Order instance passed to it. Because of the relationships established by the model’s everything necessary to print an invoice is connected to the Order instance.

The Jinja template uses the order context parameter to fill in the dynamic parts of the template:

<html lang="en">
<head>
<title>Invoice</title>
</head>
<body>
<div class="container border border-dark border-2 rounded-2">
<div class="container border border-dark mt-3 mb-3">
<h3>Invoice</h3>
Customer: {{order.customer.name}}<br /> #A
Street: {{order.address.street}}<br /> #B
Zip Code: {{order.address.zipcode}}<br /> #B
Order Number: {{order.order_id}}<br /> #C
</div>
<table class="table table-striped table-bordered caption-top">
<caption>
Order Items
</caption>
<thead>
<tr>
<th>Item Number</th>
<th>Product Name</th>
<th>Quantity</th>
</tr>
</thead>
<tbody>
{% for item in order.items %} #D
<tr> #D
<td>{{loop.index}} #D
<td>{{item.product.name}} #D
<td>{{item.qty}} #D
</tr> #D
{% endfor %} #D
</tbody>
</table>
</div>
</body>
</html>

#A Use the order instance customer attribute to get the customer’s name

#B Use the order instance address attribute to get the address the order was shipped to

#C Print out the order’s unique id value

#D Use the order instance items collection to print out the item information in a table

Creating models for the tables with relationships between the tables SQLAlchemy provides attributes to access the data hierarchically rather than in a flat two-dimensional way.

The models, and the object instances created from them, let you think about the data in Pythonic ways rather than manage relationships yourself and jump from list to dictionary and back.

The examples in this article and their simple data requirements allowed us to see how databases can greatly enhance merely persisting data. Adding relationships to the data structure, the storage of interrelated data is greatly simplified, and functionality is improved.

That’s all for this article. If you want to see more of the book, check it out on Manning’s liveBook platform here.

--

--

Follow Manning Publications on Medium for free content and exclusive discounts.

Love podcasts or audiobooks? Learn on the go with our new app.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store