Introduction to Azure Storage
From Learn Azure in a Month of Lunches, Second Edition by Iain Foulds
This article delves into Azure storage.
Storage may not seem like an obvious topic when you examine how to build and run applications, but it’s a broad service that covers a lot more than you may expect. In Azure, there’s much more available than somewhere to store files or virtual disks for your VMs.
Let’s look at what your fictional pizza company may need to build an app that processes orders from customers for takeout or delivery. The app needs a data store that holds the available pizzas, list of toppings, and prices. As orders are received and processed, the app needs a way to send messages between the different application components. The frontend website then needs mouth-watering images to show customers what the pizzas look like. As you can see in figure 1, Azure Storage has a variety of storage features and can cover all three of these needs:
- Blob storage — For unstructured data such as media files and documents. Applications can store data in blob storage, such as images, and then render them. You could store images of your pizzas in blob storage.
- Table storage — For structured, non-relational data in a NoSQL data store. As with any debate on SQL versus NoSQL data stores, plan your application and estimate the performance requirements when it comes to processing large amounts of data. You could store the list of pizzas on your menu in table storage.
- Queue storage — For cloud applications to communicate between various tiers and components in a reliable and consistent manner. You can create, read, and delete messages that pass between application components. You could use queue storage to pass messages between the web frontend when a customer makes an order and the backend to process and bake the pizzas.
- File storage — For a good old-fashioned Server Message Block (SMB) file share, accessible by both Windows and Linux/macOS platforms. Often used to centralize log collection from VMs.
Azure Storage for VMs is straightforward. You create and use Azure managed disks, a type of virtual hard disk (VHD) that abstracts away a lot of design considerations around performance and distributing the virtual disks across the platform. You create a VM, attach any managed data disks, and let the Azure platform figure out redundancy and availability.
Let’s discuss a couple of different types of data storage. First is table storage. Most people are probably familiar with a traditional SQL database such as Microsoft SQL Server, MySQL, or PostgreSQL. These are relational databases, made up of one or more tables which contain one or more rows of data. Relational databases are common for application development and can be designed, visualized, and queried in a structured manner — the S in SQL (for Structured Query Language).
NoSQL databases are a little different. They don’t follow the same structured approach, and data isn’t stored in tables where each row contains the same fields. NoSQL databases have different implementations: examples include MongoDB or CouchDB. The touted advantages of NoSQL databases are the ability to scale horizontally (meaning you can add more servers rather than adding more memory or CPU), to handle larger amounts of data, and to process large data sets more efficiently.
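To make horizontal scaling a little more concrete, here is a minimal Python sketch, not tied to any particular NoSQL product, that spreads records across servers by hashing each record's key. The server names and the `server_for` and `distribute` helpers are hypothetical, and real databases use more sophisticated placement schemes than a simple modulo.

```python
import hashlib

def server_for(key, servers):
    """Pick a server for a key using a stable hash (illustrative only)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

def distribute(keys, servers):
    """Group keys by the server each one is assigned to."""
    placement = {server: [] for server in servers}
    for key in keys:
        placement[server_for(key, servers)].append(key)
    return placement

# Hypothetical server names and order keys.
orders = [f"order-{n}" for n in range(1000)]
two_servers = distribute(orders, ["server-a", "server-b"])
three_servers = distribute(orders, ["server-a", "server-b", "server-c"])

# Adding a server lowers the number of records each server has to hold.
for name, keys in three_servers.items():
    print(name, len(keys))
```

With a simple modulo like this, adding a server also moves many existing keys to a different server, which is one reason production systems tend to use other partitioning approaches.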
How the data is stored in a NoSQL database can be defined in a few categories:
- Key-value, such as Redis
- Column, such as Cassandra
- Document, such as MongoDB
Each approach has pros and cons from a performance, flexibility, or complexity viewpoint. An Azure storage table uses a key-value store, and it’s a good introduction to NoSQL databases when you’re used to an SQL database such as Microsoft SQL Server or MySQL.
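As a rough illustration of the key-value idea, the sketch below models a table as a Python dictionary. In Azure table storage, each entity is identified by a `PartitionKey` and a `RowKey`; everything else here (the `pizzamenu` partition, the pizza names and prices, and the helper functions) is made up for illustration, and no Azure SDK is used.

```python
# In-memory model of a key-value table; this is not the Azure SDK.
table = {}

def insert_entity(partition_key, row_key, **properties):
    """Store an entity under its (PartitionKey, RowKey) pair."""
    key = (partition_key, row_key)
    if key in table:
        raise KeyError(f"entity {key} already exists")
    table[key] = dict(properties)

def get_entity(partition_key, row_key):
    """Look up a single entity directly by its full key."""
    return table[(partition_key, row_key)]

def query_partition(partition_key):
    """Return every entity in one partition, keyed by RowKey."""
    return {rk: props for (pk, rk), props in table.items() if pk == partition_key}

# Hypothetical menu data. The third entity carries an extra field,
# because entities in the same table don't need identical fields.
insert_entity("pizzamenu", "001", description="Pepperoni", cost=18)
insert_entity("pizzamenu", "002", description="Veggie", cost=15)
insert_entity("pizzamenu", "003", description="Hawaiian", cost=12, seasonal=True)

print(get_entity("pizzamenu", "002"))
for row_key, pizza in sorted(query_partition("pizzamenu").items()):
    print(row_key, pizza["description"], pizza["cost"])
```

Looking up one entity by its full key is a direct dictionary access, whereas listing the menu scans a partition, which hints at why key design matters in a key-value store.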
You can download and install the Microsoft Azure Storage Explorer from www.storageexplorer.com if you’d like to visualize the data. You don’t need to do this right now. The Storage Explorer is a great tool to learn what tables and queues look like in action. In this article, I don’t want to take you too far down the rabbit hole of NoSQL databases. In fact, in the following exercise, you use the Cosmos DB API to connect to Azure Storage and create a table. The use of Azure tables is more of an introduction to NoSQL databases than a solid example of production use.
For now, let’s run a quick sample app to see how you can add and query data, as you’d do with an application. These samples are basic but show how you can store the types of pizzas you sell and how much each pizza costs. Rather than use something large like Microsoft SQL Server or MySQL, let’s use a NoSQL database with Azure table storage.
Try it now
To see Azure tables in action, complete the following steps.
- Open the Azure portal in a web browser, and then open the Cloud Shell.
- Grab a copy of the Azure samples from GitHub as follows:
- Change into the directory that contains the Azure Storage samples:
- Install a couple of Python dependencies, if they aren’t already installed. Here you install the azurerm package, which handles the communication that allows you to create and manage Azure resources, and two azure packages, which are the underlying Python SDKs for Azure Cosmos DB and Storage:
pip install --user azurerm azure-cosmosdb-table azure-storage-queue
- What does --user mean when you install the packages? If you use the Azure Cloud Shell, you can’t install packages in the core system. You don’t have permissions. Instead, the packages are installed in your user’s environment. These package installs persist across sessions and let you use all the neat Azure SDKs in these samples.
- Run the sample Python application for tables. Follow the prompts to enjoy some pizza:
Snakes on a plane
Python is a widely used programming language which is often used in “Intro to Computer Science” classes. If you mainly work in the IT operations or administration side of things, think of Python as a powerful scripting language that works across OSs. Python isn’t only for scripting — it can also be used to build complex applications. As an example, Azure CLI is written in Python.
I use Python for some of the samples in this book because they should work outside of the Azure Cloud Shell without modification. macOS and Linux distros include Python natively. Windows users can download and quickly install Python and run these scripts locally. Python is great for those with little-to-no programming experience, as well as more seasoned developers in other languages. The Azure documentation for Azure Storage and many other services provides support for a range of languages including .NET, Java, and Node.js. You’re not limited to using Python as you build your own applications that use tables.
Azure tables are cool when you start to dip your toes into the world of cloud application development. As you begin to build and manage applications natively in the cloud, you typically break an application into smaller components which can each scale and process data on their own. To allow these different components to communicate and pass data back and forth, some form of message queue is typically used. Enter Azure Queues.
Queues allow you to create, read, and then delete messages that carry small chunks of data. These messages are created and retrieved by different application components as they pass data back and forth. Reading a message doesn’t delete it: Azure Queues removes a message only after an application component confirms that it has finished acting on it.
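The read-then-delete pattern can be sketched with a small in-memory queue. This is not the Azure Storage Queue SDK; the `SketchQueue` class, its method names, and the 30-second timeout are assumptions for illustration. The idea it models is that a message that has been read is hidden from other readers for a while and is removed only when the reader explicitly deletes it. If the reader never confirms, the message becomes visible again.

```python
import itertools

class SketchQueue:
    """In-memory stand-in for a queue with read-then-delete semantics."""

    def __init__(self, visibility_timeout=30):
        self.visibility_timeout = visibility_timeout
        self._messages = {}              # message id -> [content, visible_at]
        self._ids = itertools.count(1)

    def put(self, content, now=0):
        """Add a message that is visible immediately."""
        message_id = next(self._ids)
        self._messages[message_id] = [content, now]
        return message_id

    def get(self, now):
        """Return (id, content) of the oldest visible message and hide it.

        Returns None when no message is currently visible.
        """
        for message_id, entry in self._messages.items():
            content, visible_at = entry
            if visible_at <= now:
                entry[1] = now + self.visibility_timeout
                return message_id, content
        return None

    def delete(self, message_id):
        """Confirm that processing finished; only now is the message removed."""
        del self._messages[message_id]

    def __len__(self):
        return len(self._messages)

# Time is passed in explicitly so the example is deterministic.
queue = SketchQueue(visibility_timeout=30)
queue.put("veggie pizza", now=0)

first = queue.get(now=1)     # a worker reads the message
hidden = queue.get(now=5)    # nothing is visible to other workers yet
retry = queue.get(now=40)    # the first worker never confirmed, so it reappears
queue.delete(retry[0])       # confirmation removes the message for good
print(first, hidden, retry, len(queue))
```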
Let’s continue the example application that handles pizza orders. You may have a frontend application component that customers interact with to order their pizza, and then a message queue that transmits messages to a backend application component for you to process those orders. As orders are received, messages on the queue can be visualized as shown in figure 2.
As the backend application component processes each pizza order, the messages are removed from the queue. Figure 3 shows what the queue looks like once you have a veggie pizza in the oven and that first message is removed.
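The flow in this example can be sketched in a few lines of Python. The sketch uses `collections.deque` as a stand-in for the queue rather than Azure Queues, and the orders other than the veggie pizza, along with the `place_order` and `process_next_order` helpers, are hypothetical.

```python
from collections import deque

orders = deque()   # stand-in for the message queue
oven = []          # pizzas the backend has started baking

def place_order(pizza):
    """Frontend: add a message describing the order to the queue."""
    orders.append(pizza)

def process_next_order():
    """Backend: read the oldest message, act on it, then remove it."""
    if not orders:
        return None
    pizza = orders[0]      # read the message without removing it
    oven.append(pizza)     # act on the order
    orders.popleft()       # remove the message only after processing
    return pizza

for pizza in ["veggie", "pepperoni", "hawaiian"]:
    place_order(pizza)
print(list(orders))        # all three orders are waiting

process_next_order()
print(list(orders), oven)  # the veggie order has left the queue
```

Because the frontend only appends and the backend only takes from the front, the two components never need to call each other directly, which is what lets them scale independently.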
Storage availability and redundancy
Azure datacenters are designed to be fault-tolerant with redundant internet connections, power generators, multiple network paths, storage arrays, and the like. You still need to do your part when you design and run applications. With Azure Storage, you choose what level of storage redundancy you need. This level varies for each application and how critical the data is. Let’s examine the available storage-redundancy options:
- Locally redundant storage (LRS) — Your data is replicated three times inside the single datacenter in which your storage account was created. This option provides for redundancy in the event of a single hardware failure, but if the entire datacenter goes down (rare, but possible), your data goes down with it.
- Zone-redundant storage (ZRS) — The next level up from LRS is to replicate your data three times across two or three datacenters in a region (where multiple datacenters exist in a region), or across regions. ZRS is also available across availability zones.
- Geo-redundant storage (GRS) — With GRS, your data is replicated three times in the primary region in which your storage is created and then replicated three times in a paired region. The paired region is usually hundreds of miles away or more. For example, West US is paired with East US, North Europe is paired with West Europe, and Southeast Asia is paired with East Asia. GRS provides a great redundancy option for production applications.
- Read-access geo-redundant storage (RA-GRS) — This is the premium data-redundancy option. Your data is replicated across paired regions like GRS, but you also have read access to the data in the secondary region.
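The trade-offs between these options can be summarized as a small lookup table. The sketch below encodes only what the descriptions above state or imply (three copies for LRS and ZRS, three plus three for GRS, and read access to the secondary for RA-GRS). The outage flags are my reading of those descriptions, and the `RedundancyOption` type and `choose_redundancy` helper are hypothetical, not part of any Azure SDK.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RedundancyOption:
    name: str
    copies: int
    survives_datacenter_outage: bool
    survives_region_outage: bool
    readable_secondary: bool

# Ordered from least to most redundant, following the list above.
OPTIONS = [
    RedundancyOption("LRS", 3, False, False, False),
    RedundancyOption("ZRS", 3, True, False, False),
    RedundancyOption("GRS", 6, True, True, False),
    RedundancyOption("RA-GRS", 6, True, True, True),
]

def choose_redundancy(datacenter_outage=False, region_outage=False,
                      read_secondary=False):
    """Return the least redundant option that meets the stated requirements."""
    for option in OPTIONS:
        if datacenter_outage and not option.survives_datacenter_outage:
            continue
        if region_outage and not option.survives_region_outage:
            continue
        if read_secondary and not option.readable_secondary:
            continue
        return option
    raise ValueError("no option meets the requirements")

# Example: a production app that must keep its data if a whole region fails.
print(choose_redundancy(region_outage=True).name)
```

A helper like this is only a starting point for a decision; cost and how critical the data is would also weigh in when you pick an option for a real application.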
That’s all for this article.