Microsoft Azure

Microsoft Azure, like AWS, offers a wide variety of data platform services that let you design a persistent data layer for your applications or find a data storage option suited to your data ingestion and processing needs. Azure offers more than 20 relational and non-relational data store services, so navigating them can be a little overwhelming for a person without a background in data platforms. But it is not that scary once you take the time to learn about the individual services and their intended use cases.

Personally, I have spent a lot of my time working with traditional on-premises application architectures. In that world, SQL Server or some other relational data store acts, most of the time, as the de facto standard for an application’s data layer. In the world of modern, cloud-native applications and big data, a wider variety of non-relational data stores is used, and it can be difficult to quickly grasp the entire spectrum of these options even at a foundational level, let alone build deep expertise in all of them. Azure offers data stores of both types, sufficient for architecting from scratch or for moving your on-premises applications to the cloud. As you learn more about them, you will see that some of these services are intended for new development, while others are there to facilitate a transition of existing applications to the cloud with minimal changes. One way to get familiar with the entire spectrum of Azure data platform services is to take a course or study learning materials geared towards the Microsoft DP-900 Azure Data Fundamentals exam, which covers the basics of the entire Azure data platform offering (I’m taking this exam in a few days). From there, you can explore specific services or service types in depth based on your practical needs or personal interests. As one of my favorite AWS instructors, Adrian Cantrill, says, “architecture is at the core of everything”, and to make a good architectural decision it is very important to know all the available building blocks. Adrian normally uses this wording while pushing for always taking the Architect stream AWS exams first, but to me the entry-level Microsoft DP-900 exam can serve equally well as an introduction to the architectural choices available on the Microsoft data platform.
But if we circle back to the idea of taking Azure data platform services one by one, or in small type-based groups, this is exactly what I intend to do in this series of blog posts, starting with Azure key-value store services.

Azure key-value store services are a subset of NoSQL (aka non-relational) Azure data store services intended for storing semi-structured data. You can think about a key-value data store as an associative array, which is normally defined as an abstract data type (also known as a map, symbol table, or dictionary) composed of a collection of key-value pairs, with each key being unique within the collection. This unique key is used to retrieve a specific value, and the values themselves can be anything from a number or string to a complex object, such as a JSON document. Key-value data stores keep data as a single collection without structure or relations, which differentiates this data store type from relational databases. As there is no need to design a data structure up front, you can provision such data stores rapidly, and they also provide very fast writes. At the same time, they only support retrieval by key, and the values are completely opaque to the data store: it is the responsibility of your application to understand the data and do any real work with it. The data store only gives you back an entire value, and any changes to it or processing of it have to be fully handled by your application.
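
To make the associative array idea concrete, here is a minimal sketch in Python. The `KeyValueStore` class and the sample key are hypothetical and exist only for illustration; a plain in-memory dictionary stands in for a real service. The point is that the store handles bytes it never inspects, while the application decides that those bytes are JSON.

```python
import json
from typing import Dict, Optional


class KeyValueStore:
    """Toy in-memory key-value store: unique keys mapped to opaque byte values."""

    def __init__(self) -> None:
        self._items: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        # Writing to an existing key overwrites the whole value.
        self._items[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # Retrieval works by key only; the store never looks inside the value.
        return self._items.get(key)

    def delete(self, key: str) -> None:
        self._items.pop(key, None)


store = KeyValueStore()

# The application decides that the value is JSON; the store only sees bytes.
store.put("user:42", json.dumps({"name": "Ada", "city": "London"}).encode("utf-8"))

raw = store.get("user:42")
user = json.loads(raw) if raw is not None else None
```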

Key-value data stores are intended for semi-structured data, i.e., data that has some structure but no schema. You can think about this data type as a middle ground between structured data stored in an RDBMS and unstructured data typically stored in blobs (binary large objects), which can only be opened and manipulated by external applications. Key-value data is just one type of semi-structured data, other types being JSON and XML documents or graphs (and yes, a value in a key-value store can itself be a JSON document, for example).

When might you want to use key-value data stores in general? A key-value data store is a good choice:

  • When you need the simplest and the fastest NoSQL data store
  • When you need fast, high-volume data ingestion
  • When you need fast read and write operations, and search capabilities are secondary
  • When data structure/number of properties per entity varies

For sure, it can’t be all advantages with no downsides, so let’s mention some limitations:

  • Search is only possible on key, i.e., you can’t search on values as values are opaque for the data storage system which sees them as unstructured blocks – it is the responsibility of an app that retrieves these values to know data structure and do something with it
  • Write operations are restricted to inserts and deletes (i.e., to perform a value update, the application will need to retrieve an entire value, change it in memory, and overwrite the entire value in a data store)
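
The update limitation can be sketched as a read-modify-write cycle. This is an illustration only: a plain Python dictionary stands in for the data store, and the `update_field` helper and the sample key are hypothetical names.

```python
import json
from typing import Dict

# A plain dict stands in for the key-value store (hypothetical, for illustration).
store: Dict[str, str] = {
    "device:7": json.dumps({"model": "X1", "firmware": "1.0"}),
}


def update_field(kv: Dict[str, str], key: str, field: str, new_value: str) -> None:
    """Change one field of a stored JSON value using read-modify-write."""
    document = json.loads(kv[key])   # 1. retrieve the entire value
    document[field] = new_value      # 2. change it in application memory
    kv[key] = json.dumps(document)   # 3. overwrite the entire value


update_field(store, "device:7", "firmware", "1.1")
updated = json.loads(store["device:7"])
```

Even though only one field changes, the whole value travels to the application and back, which is worth keeping in mind for large values.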

When it comes to Azure, you can get a key-value data store in two flavors – Azure Table Storage or Azure Cosmos DB Table API, with the latter being heavily promoted by Microsoft as a more feature-rich version of Table Storage. Let’s have a look at these two options to understand when you may want to use each of them.

Azure Table Storage is one of the services built on the Azure storage account. Personally, I still can’t stop perceiving Azure storage accounts as a kind of matryoshka that hides too many things inside, but I will save these thoughts for a write-up on storage accounts. If you think about what a key-value store and a storage account offer at a foundational level, you will see that it is quite logical to build one on top of the other. This means that to provision Azure Table Storage, you first create a storage account and then create a table in it to enable the Table Storage service (see screenshot below). You can create multiple tables within one storage account, and the number of tables and entities in them is limited only by the capacity limit of the storage account, which translates into a maximum possible table size of 500 TiB (roughly 550 TB).

Azure Storage Account – Create Table

Once provisioned, Azure Table Storage accepts authenticated calls from inside and outside the Azure cloud. It can be used to store flexible datasets and is well suited for web application data, address books, device information, or other types of metadata.

An Azure Table Storage table is exposed through a URL of the format shown below and can be consumed using the OData protocol.

Azure Table Storage – Table URL structure
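
Since the screenshot is not reproduced here, the sketch below builds such URLs in Python. It assumes the public Azure cloud endpoint pattern `https://<account>.table.core.windows.net/<table>` and the OData-style addressing of a single entity by partition key and row key. The account, table, and key values are made up, the helper names are hypothetical, and the sketch only doubles single quotes inside key values without doing any percent-encoding.

```python
def table_url(account: str, table: str) -> str:
    """Return the endpoint URL of a table (public Azure cloud assumed)."""
    return f"https://{account}.table.core.windows.net/{table}"


def entity_url(account: str, table: str, partition_key: str, row_key: str) -> str:
    """Return the URL that addresses one entity by its two keys."""
    # OData string literals escape a single quote by doubling it.
    pk = partition_key.replace("'", "''")
    rk = row_key.replace("'", "''")
    return f"{table_url(account, table)}(PartitionKey='{pk}',RowKey='{rk}')"


devices_table = table_url("myaccount", "devices")
one_device = entity_url("myaccount", "devices", "sensors", "001")
```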

As you can see, the Table Storage service consists of the following components: a storage account, which contains one or more tables, and entities inside the tables, which hold the key-value pairs.

Entity size is limited to 1 MB, and an entity can include up to 252 properties (name-value pairs) in addition to 3 system properties – partition key, row key, and timestamp. You may note that the introduction of partitioning and the partition key makes things slightly more complicated than the pure associative array definition. You can refer to the Table Service Data Model documentation to explore this, but in short, it allows for more efficient data queries and more interesting certification exam questions (make sure you have a good grasp of the partition key concept). Keep in mind that, as well as being part of the addressing scheme for entities, partitions define the scope for transactions and form the basis of the table service’s scalability.
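
One way to picture this data model is a dictionary keyed by the (partition key, row key) pair. The sketch below is an in-memory illustration, not the service’s implementation: the helper names and sample entities are hypothetical, and only the property limit quoted above is enforced.

```python
from typing import Any, Dict, List, Tuple

MAX_CUSTOM_PROPERTIES = 252  # limit quoted in the text above
SYSTEM_PROPERTIES = {"PartitionKey", "RowKey", "Timestamp"}

Entity = Dict[str, Any]
Table = Dict[Tuple[str, str], Entity]


def insert_entity(table: Table, entity: Entity) -> None:
    """Insert an entity addressed by its (PartitionKey, RowKey) pair."""
    custom = [name for name in entity if name not in SYSTEM_PROPERTIES]
    if len(custom) > MAX_CUSTOM_PROPERTIES:
        raise ValueError("too many custom properties")
    key = (entity["PartitionKey"], entity["RowKey"])
    if key in table:
        raise KeyError(f"entity {key} already exists")
    table[key] = entity


def entities_in_partition(table: Table, partition_key: str) -> List[Entity]:
    """Return all entities that share one partition key."""
    return [e for (pk, _), e in table.items() if pk == partition_key]


devices: Table = {}
insert_entity(devices, {"PartitionKey": "sensors", "RowKey": "001", "city": "Oslo"})
insert_entity(devices, {"PartitionKey": "sensors", "RowKey": "002", "model": "X1"})
insert_entity(devices, {"PartitionKey": "gateways", "RowKey": "001"})

sensors = entities_in_partition(devices, "sensors")
```

Note that the same row key can appear in different partitions, and that entities in the same table do not need to share the same set of properties.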

All in all, Microsoft defines Table Storage as a NoSQL key-value store for rapid development using massive semi-structured datasets, while also offering the more feature-rich Azure Cosmos DB Table API service. Microsoft emphasizes that both services use the same concepts and that a unified SDK works with both.

Azure Cosmos DB Table API. With Azure Cosmos DB being a multi-model database service, you can use its Table API to get a key-value store which can be considered a more advanced version of Azure Table Storage. Throughout their documentation, Microsoft positions Cosmos DB Table API as a better and newer version of Azure Table Storage (and I guess overall Cosmos DB is what they push for as a recommended data store, the same way as AWS pushes for Aurora DB – both of these offerings provide attractive feature sets combined with backward-compatibility features and are positioned as a go-to choice for building new applications on these platforms).

To create an Azure Cosmos DB Table API service instance, you create an Azure Cosmos DB account selecting Azure Table type (see screenshot below).

Azure Cosmos DB – Create Azure Table API account

Once the Azure Table API account is created, you can use Data Explorer to create tables within it. Don’t be confused by the presence of the Create Database option under Common Tasks in Data Explorer – this is not possible within a Table API account, and hopefully Microsoft will make their unified Data Explorer UI more context-aware in the future so that it only shows things relevant to the account type.

Azure Cosmos DB Data Explorer – Create Table

As I mentioned, concepts we discussed in relation to Azure Table Storage are the same for Cosmos DB Table API, but with this service, we also get the following additional features:

  • Improved performance and availability – single-digit millisecond latency for reads and writes
  • Turnkey global distribution – you can add or remove regions to distribute into at any time depending on your needs
  • Automatic secondary indexes – all properties are indexed automatically without management overhead (Table Storage only allows indexing on the Partition and Row keys)
  • Optional consumption-based serverless model – you are only charged for the Request Units consumed by your database operations and the storage consumed by your data (still offering milliseconds latency levels)
  • Enterprise-grade security – built-in encryption for data at rest and in transit, IP-based access control
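
To illustrate why automatic secondary indexes matter, the sketch below contrasts two ways of finding entities by a non-key property: scanning every entity, and looking the answer up in an index built over all properties. This is a conceptual, in-memory illustration with made-up data and hypothetical helper names; it does not describe how either service implements queries or indexing.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple

Entity = Dict[str, Any]

entities: List[Entity] = [
    {"PartitionKey": "sensors", "RowKey": "001", "city": "Oslo", "status": "active"},
    {"PartitionKey": "sensors", "RowKey": "002", "city": "Bergen", "status": "active"},
    {"PartitionKey": "gateways", "RowKey": "001", "city": "Oslo", "status": "retired"},
]


def find_by_scan(items: List[Entity], prop: str, value: Any) -> List[Entity]:
    """Without an index on the property, every entity has to be inspected."""
    return [e for e in items if e.get(prop) == value]


def build_property_index(items: List[Entity]) -> Dict[Tuple[str, Any], List[Entity]]:
    """Index every (property name, value) pair so lookups avoid a full scan."""
    index: Dict[Tuple[str, Any], List[Entity]] = defaultdict(list)
    for entity in items:
        for name, value in entity.items():
            index[(name, value)].append(entity)
    return index


index = build_property_index(entities)
oslo_by_scan = find_by_scan(entities, "city", "Oslo")
oslo_by_index = index.get(("city", "Oslo"), [])
```

Both approaches return the same entities; the difference is how much data has to be touched to answer the query.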

Some of the listed features are seemingly clear improvements that become available to you simply by virtue of selecting a more advanced service and paying for it. At the same time, some of them require a better understanding of the consequences of your configuration choices (e.g., global distribution requires you to understand the flexible consistency model options and their limitations). Most of these advanced features are common to all Cosmos DB accounts irrespective of API type.

Now let’s have a look at the feature differences between these two services, which will give you a clearer view of how they stack up against each other. In addition to feature differences, there are certain behavioral differences between the two which are mentioned in Cosmos Table API FAQ, and although you can use unified .NET Azure Tables SDK Azure.Data.Tables to interact with both services, you need to be aware of those. You can see the key feature and behavioral differences between Azure key-value store services enumerated in the two tables below.

Table 1. Azure Table Storage vs Cosmos DB Table API – feature differences

Table 2. Azure Table Storage vs Cosmos DB Table API – behavioral differences

Now that we have looked at both services in terms of their features, we still need to discuss their pricing. Personally, the pricing of cloud services in general reminds me of mobile phone operators’ offerings, where you are confronted with various options with deceptively low per-minute rates, which cause a bit of a headache when you need to estimate your projected monthly bill or decide on a preferred option. But it seems to be the price we need to pay to be saved from huge CAPEX at the start of any project 😊… In the end, it translates into a cloud provider guarding its revenue and the client getting rid of CAPEX in exchange for (mostly) consumption-based OPEX; the rest must be calculated, evaluated, and reviewed based on actual usage. Just to give you some idea of the pricing for these services, Cosmos DB costs 0.008 or 0.016 USD per hour per 100 RU/s of provisioned throughput, depending on whether you are using multi-region writes, and 0.282 USD per 1M RUs when the serverless model is used. This must be combined with the cost of the SSD-backed storage consumed by your data and indexes across all the Azure regions your database is distributed to, billed per GB. You also need to consider that 400 RU/s is the minimum you have to provision for Azure Cosmos DB containers and databases (in the case of provisioned throughput). With Table Storage being a storage account-based service, you are charged per GB per month. The exact rate depends on the redundancy configuration of the storage account (LRS/GRS/RA-GRS/ZRS/GZRS/RA-GZRS) and varies between 0.045 and 0.1265 USD, and it must be combined with a fee of 0.00036 USD per 10,000 transactions for tables.
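
The rates above can be turned into a rough monthly estimate. The sketch below is a back-of-the-envelope calculation, not a pricing calculator: it uses only the rates quoted in this post, assumes a 730-hour month, leaves out Cosmos DB storage costs, and uses hypothetical function names and workload figures.

```python
HOURS_PER_MONTH = 730  # assumption: average number of hours in a month

# Rates quoted in the text above, in USD.
PROVISIONED_RATE_SINGLE_WRITE = 0.008  # per hour per 100 RU/s
PROVISIONED_RATE_MULTI_WRITE = 0.016   # per hour per 100 RU/s
SERVERLESS_RATE = 0.282                # per 1M RUs consumed
TABLE_TRANSACTION_RATE = 0.00036       # per 10,000 transactions
MIN_PROVISIONED_RU = 400               # minimum provisioned throughput


def cosmos_provisioned_monthly(ru_per_second: int, multi_region_write: bool = False) -> float:
    """Throughput cost only; storage is billed separately."""
    if ru_per_second < MIN_PROVISIONED_RU:
        raise ValueError("provisioned throughput cannot be below 400 RU/s")
    rate = PROVISIONED_RATE_MULTI_WRITE if multi_region_write else PROVISIONED_RATE_SINGLE_WRITE
    return ru_per_second / 100 * rate * HOURS_PER_MONTH


def cosmos_serverless_monthly(request_units: float) -> float:
    """Request cost only; storage is billed separately."""
    return request_units / 1_000_000 * SERVERLESS_RATE


def table_storage_monthly(stored_gb: float, transactions: float, gb_rate: float = 0.045) -> float:
    """Storage plus transactions; gb_rate depends on the redundancy option."""
    return stored_gb * gb_rate + transactions / 10_000 * TABLE_TRANSACTION_RATE


provisioned = cosmos_provisioned_monthly(400)
serverless = cosmos_serverless_monthly(5_000_000)
table_storage = table_storage_monthly(stored_gb=10, transactions=1_000_000)
```

With these made-up workload numbers, the minimum provisioned throughput alone comes to about 23.36 USD a month, 5 million serverless RUs to about 1.41 USD, and 10 GB plus a million transactions in Table Storage at the lowest rate to about 0.49 USD.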

To conclude, both of these key-value store services offer the same data store type and use similar concepts, with Cosmos DB Table API providing improved performance and availability at a (slightly) higher price. You can think about them as a storage account and Cosmos DB-based key-value store services, and their strengths and limitations are largely shaped by the service they are based on (storage account and Cosmos DB respectively). As you can interact with those services using unified Table API you can also think about these services as “standard” and “premium” Table API versions.

Without going into details, I would say that Table Storage is a perfect fit for testing and small projects where the benefits of Cosmos DB are irrelevant, it is not clear whether they will ever become necessary, and your budget is too constrained to pay even slightly more for unneeded premium features. Being a simpler service, Table Storage also implies fewer configuration decisions, which increases its appeal for PoC and experimental projects. Pricing-wise, you have a better chance of minimizing your costs with Table Storage, especially when choosing between Table Storage and Cosmos DB Table API with provisioned throughput, which can’t be set lower than 400 RU/s and is billed hourly (so it generates expense irrespective of usage). The availability of Cosmos DB Table API serverless changes these pricing concerns: with that option, both services provide consumption-based pricing, which leaves you with the conclusion that you should opt for Cosmos DB whenever you have a real need for any of the advanced features it offers. The exception to this rule of thumb can be large-scale, big-budget projects, where the cost of switching to another storage service at a later stage may exceed the relatively small premium paid for unused features.
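
The choice between provisioned and serverless throughput can be framed as a break-even calculation. The sketch below uses only the rates quoted earlier in this post, assumes a 730-hour month and single-region writes, and ignores storage costs, so treat the result as an order-of-magnitude hint rather than a quote.

```python
HOURS_PER_MONTH = 730          # assumption: average number of hours in a month
PROVISIONED_RATE = 0.008       # USD per hour per 100 RU/s (single-region writes)
SERVERLESS_RATE = 0.282        # USD per 1M RUs consumed
MIN_PROVISIONED_RU = 400       # minimum provisioned throughput in RU/s

# Monthly cost of the smallest provisioned configuration, paid regardless of usage.
provisioned_floor = MIN_PROVISIONED_RU / 100 * PROVISIONED_RATE * HOURS_PER_MONTH

# Monthly RU consumption at which serverless costs the same as that floor.
break_even_rus = provisioned_floor / SERVERLESS_RATE * 1_000_000
```

Under these assumptions the break-even lands at roughly 83 million RUs per month: below that, serverless is cheaper than even the minimum provisioned throughput.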

If you want to learn more about Azure key-value store services, Microsoft documentation covers all the details you may want to know about them, and Azure Architecture center has really good content to read up on architectural design patterns and considerations. For your convenience, you may find some of the links to Microsoft documentation below.

Azure Table storage documentation

Azure Cosmos DB Table API documentation

Table Storage Pricing

Azure Cosmos DB Pricing

Azure Architecture Guide – Understand data store models

Table Design Patterns

Design scalable and performant tables

Design a scalable partitioning strategy for Azure Table storage

That’s all I wanted to say about Azure key-value data store services, and I hope it was useful for you. Like I said at the beginning of this post, taking all available Azure data store types one by one is a good strategy to understand them better, and my plan is to write more posts on other data store types, such as column-family and analytical stores. Please, do let me know in the comments if you are interested in that.
