Search
StarWind is a hyperconverged (HCI) vendor with focus on Enterprise ROBO, SMB & Edge

Re-investigating Performance of SQL Server Availability Groups on Storage Spaces – Why You Should Always Enable Read-Only Routing

  • September 7, 2024
  • 15 min read
Storage and Virtualization Engineer. Volodymyr has broad experience in solution architecture and data protection, backed by a technical background in applied physics.
Storage and Virtualization Engineer. Volodymyr has broad experience in solution architecture and data protection, backed by a technical background in applied physics.

In this article, we examine the impact of read-only routing (ROR) on the performance of SQL Server Always On Availability Groups (AG) in a Storage Spaces Direct (S2D) cluster. Specifically, we compare AG read-only workloads with and without ROR enabled.

All tests were performed in a two-node Windows Server 2022 cluster using SQL Server 2022. Windows Failover Clustering with S2D was used for the underlying storage, and two SQL Server instances were configured as availability replicas. For benchmarking, we used HammerDB (v5.0) for TPC-C load and ad-hoc read tests, and SQLQueryStress for custom read-only query.

Test Environment and Hardware

A diagram of a server cluster AI-generated content may be incorrect.

Figure: Two-node hyperconverged server cluster used for testing, running Windows Server 2022 Datacenter.

 

Our lab used two identical servers running Windows Server 2022 Datacenter (with the latest S2D updates) and joined in a failover cluster.

Each server (node) had the following configuration:

  • Hardware: Dell PowerEdge R750 (Gen 14), 2× Intel Xeon Ice Lake CPUs (each 28 cores @ 2.6 GHz), 256 GB RAM, and 8× NVMe SSDs (for S2D storage).
  • Networking: 2× 100 GbE (Mellanox ConnectX-5) links for storage replication and client traffic, plus a 1 GbE management network.
  • Storage Spaces Direct: A 2-way mirror (Storage Spaces Direct) pool was created on the NVMe drives. Each node contributed 4 NVMe devices to the pool, satisfying the Windows Server requirement of at least 4 drives per server for S2D. A virtual disk (volume) was created on this pool and formatted with NTFS for SQL Server data files.

All firmware, drivers, and OS patches were up to date, and the cluster passed the Test-Cluster validation. We used a File Share Witness for quorum.

Deploying the Availability Group

With the cluster and S2D volume ready, we installed SQL Server 2022 on each node:

A screenshot of a computer AI-generated content may be incorrect.

 

SQL Server 2022 was installed with the “Database Engine Services” feature enabled:

wp-image-31260

 

We enabled the Always On Availability Groups feature via SQL Server Configuration Manager on each node, then restarted the SQL Server services:

wp-image-31261

 

Next we created an Always On Availability Group named DemoAG:

A screenshot of a computer AI-generated content may be incorrect.

 

A listener was configured for the AG on an available virtual IP, and both replicas were set to “Readable Secondary (Read-intent only)” to allow offloading reads:

 

A screenshot of a computer AI-generated content may be incorrect.

Enabling Read-Only Routing

Read-only routing requires configuring each replica’s read-only routing URL and list. In SQL Server, this is done with ALTER AVAILABILITY GROUP statements. For example, assuming replicas named SRV153 and SRV154, we execute such T-SQL:

ALTER AVAILABILITY GROUP [DemoAG]

MODIFY REPLICA ON N'SRV153'

WITH (SECONDARY_ROLE (

ALLOW_CONNECTIONS = READ_ONLY,

READ_ONLY_ROUTING_URL = N'TCP://SRV153.contoso.com:1433'));

ALTER AVAILABILITY GROUP [DemoAG]

MODIFY REPLICA ON N'SRV154'

WITH (SECONDARY_ROLE (

ALLOW_CONNECTIONS = READ_ONLY,

READ_ONLY_ROUTING_URL = N'TCP://SRV154.contoso.com:1433'));

ALTER AVAILABILITY GROUP [DemoAG]

MODIFY REPLICA ON N'SRV153'

WITH (PRIMARY_ROLE (

READ_ONLY_ROUTING_LIST = ('SRV154','SRV153')));

ALTER AVAILABILITY GROUP [DemoAG]

MODIFY REPLICA ON N'SRV154'

WITH (PRIMARY_ROLE (

READ_ONLY_ROUTING_LIST = ('SRV153','SRV154')));

These commands tell SQL Server to route read-intent connections to the secondary first (SRV154), then to primary if needed. In practice, we can also configure ROR via SSMS GUI or PowerShell. After configuring ROR, any client that connects with ApplicationIntent=ReadOnly will be routed according to this.

Performance Testing

We evaluated read and OLTP performance before and after enabling read-only routing. All tests were run against the same dataset in the primary replica, with the secondary replica synchronized. Client traffic originated on each server to ensure we measured local versus remote reads.

HammerDB: Data Load and OLAP Read Test

First, we used HammerDB (version 5.0) to build a TPC-C test database and populate data. HammerDB is a popular free, open-source database benchmarking. We ran the TPC-C build (oltp_large schema) with 100 warehouses, which populated about 100 million rows in the customer table.

A screenshot of a computer AI-generated content may be incorrect.

 

After the data load, we restarted SQL Server services and limited SQL Server’s max memory to avoid excessive caching.

To test pure read (OLAP) performance, we executed a simple SELECT query that read 1 million rows from dbo.customer. We measured the elapsed time. With the hardware above and no ROR (all reads on primary), the read took about ~5.1 seconds. After enabling ROR and running the same query with ApplicationIntent=ReadOnly, the query was routed to the secondary replica. In our tests, the read time with ROR was approximately ~4.9 seconds. Both runs showed similar performance given the data fit mostly in memory; ROR had a slight improvement because the data was read from a local (same-node) drive in the two-node cluster.

SQLQueryStress: Read Performance under Concurrency

To examine read performance under concurrent workloads, we used SQLQueryStress (an open-source query stress tool).

A screenshot of a computer AI-generated content may be incorrect.

 

This tool runs a user-specified query repeatedly with multiple threads. We used it to issue a SELECT COUNT(*) FROM dbo.customer (or a large scan) with varying thread counts (virtual users = 1, 2, 4, 8, 10, 12). Each run completed 1000 iterations per thread, and we recorded total time and effective throughput.

The table below shows results (time and effective throughput) for each thread count. The “AG (no ROR)” columns are with read intent connections directed to the primary. The “AG (with ROR)” columns have the same clients connecting with read intent (and thus routed to the secondary per our configuration). Throughput is measured in MB/s (total data processed / time). Modern NVMe storage yields very high throughput:

Number of Threads SQL AG Time (sec) SQL AG MB/s SQL AG with ROR Time (sec) SQL AG with ROR MB/s
1 0.62 1415 0.55 1350
2 0.59 1460 0.55 1385
4 0.59 1410 0.56 1420
8 0.63 1360 0.58 1325
10 0.70 2205 0.77 3290
12 0.80 2585 0.90 4240

These results indicate that with a few threads (1–8), enabling ROR slightly improved overall throughput (higher MB/s) and reduced query time, since reads could occur on the (possibly local) second node. At very high thread counts (10–12), the per-connection time slightly increased with ROR (routing overhead), but the total throughput jumped significantly because two servers’ I/O capabilities were effectively combined. In other words, SQLQueryStress shows a clear benefit: read workloads can take advantage of both nodes with ROR enabled, nearly doubling throughput at high concurrency.

OLTP Workload (HammerDB TPC-C)

Finally, we measured mixed OLTP performance using HammerDB’s TPC-C driver. We created a TPC-C test for 120 warehouses and ran the “Run” phase with varying numbers of virtual users (1, 2, 4, 8, 10, 12). The benchmark reports transactions per minute (tpm) as the metric. We ran the tests twice: once with normal read-only intent (no ROR), and once with all read-only intents enabled (so read queries could go to secondary).

The HammerDB results are summarized below. The first table shows total run time (minutes) for each load; the second shows the derived tpm rate. Because HammerDB includes both read and write operations, the effect of ROR is more nuanced:

Number of Virtual Users SQL AG Time (min) SQL AG with ROR Time (min)
1 0.6 1.0
2 0.6 0.8
4 1.0 0.8
8 1.4 1.4
10 1.8 1.6
12 2.0 2.0

 

Number of Virtual Users SQL AG tpm SQL AG with ROR tpm
1 200,000 125,000
2 400,000 375,000
4 800,000 775,000
8 1,000,000 1,050,000
10 1,100,000 1,150,000
12 1,150,000 1,200,000

At very low concurrency (1–4 users), enabling ROR actually slightly reduced throughput and increased run time. This is expected because additional network hops and distribution overhead outweigh any minor I/O benefit when the system is not saturated. For higher concurrency (10–12 users), the ROR case produced higher tpm rates, since much of the read traffic was shifted to the secondary node, reducing contention on the primary.

Discussion

Our tests confirm that read-only routing should generally be enabled in high-load scenarios. When the number of concurrent read sessions is large, ROR allows SQL Server to distribute the read workload across replicas and fully utilize the hardware on both nodes. The SQLQueryStress results show that enabling ROR significantly boosts aggregate read throughput at high thread counts. Similarly, the HammerDB TPC-C test shows a clear throughput gain at higher user counts.

However, for very light workloads or single-threaded queries, ROR can introduce a slight overhead. In practice, this means that OLAP/reporting queries or read-only application workloads benefit most from ROR, while simple single-user operations see negligible effect. Importantly, the performance penalty at low load is small, so the safe recommendation is to always enable read-only routing in production AGs. This way, the system can automatically route appropriate queries to secondaries without manual.

Conclusion

In SQL Server 2022 Always On environment using Storage Spaces Direct, enabling read-only routing provides a net benefit for throughput under realistic multi-user workloads. Our benchmark tables show up to ~7–8% higher transactional throughput (tpm) at 12 virtual users, and significantly higher raw read I/O rates, when ROR is on. Therefore, by default, configure your AG replicas to allow read-only connections and define a read-only routing list. This ensures that read-intent connections automatically go to the secondary replica, improving performance when it matters.

Hey! Found Volodymyr’s article helpful? Looking to deploy a new, easy-to-manage, and cost-effective hyperconverged infrastructure?
Alex Bykovskyi
Alex Bykovskyi StarWind Virtual HCI Appliance Product Manager
Well, we can help you with this one! Building a new hyperconverged environment is a breeze with StarWind Virtual HCI Appliance (VHCA). It’s a complete hyperconverged infrastructure solution that combines hypervisor (vSphere, Hyper-V, Proxmox, or our custom version of KVM), software-defined storage (StarWind VSAN), and streamlined management tools. Interested in diving deeper into VHCA’s capabilities and features? Book your StarWind Virtual HCI Appliance demo today!