Data residency, sovereignty, and localization: a practical guide for infrastructure teams

Pick any dataset in your environment. Customer records, VM backups, audit logs, archived file shares, SaaS exports – whatever’s generating support tickets this week.

Now ask three questions.

Where’s it physically stored? Which laws apply? Can it legally leave?

Most infrastructure teams can answer the first one. The next two are harder.

Today, those questions influence cloud strategy, vendor selection, disaster recovery planning, AI adoption, and whether certain projects can move forward at all. Data residency, data sovereignty, and data localization describe different constraints. Understanding the distinction matters as much as understanding availability zones or backup retention policies. (If you think AI doesn’t affect residency, ask your legal team about the EU AI Act.)

What is data residency?

Data residency is simply where data is physically stored. That’s it.

If your backups live in Dublin, your object storage sits in Frankfurt, and your DR copies replicate to Warsaw, those are residency decisions. They describe location.

We rarely choose a region purely because a regulation says we should. Compliance matters, but so do latency requirements, customer contracts, cyber insurance questionnaires, disaster recovery objectives, and procurement policies.

A SaaS provider serving European customers might select Frankfurt because it reduces latency. A government customer might require in-country storage as part of the procurement process. A cyber insurer might be uncomfortable with certain jurisdictions and require data to remain elsewhere.

The important point? Residency answers a physical question: where does the data actually live? Everything else comes later.

What is data sovereignty?

If residency tells you where the data is, sovereignty tells you whose laws get involved.

Once data exists within a country’s jurisdiction, local authorities can exercise whatever legal powers are available under that country’s laws. That’s the foundation of data sovereignty.

The challenge is that physical location and legal exposure don’t always align as neatly as architecture diagrams suggest. A US company can store European customer data in Frankfurt and satisfy a residency requirement. That doesn’t automatically eliminate the legal reach of US authorities if the provider, parent company, or service operator falls under US jurisdiction. This is where many compliance discussions become confusing. Storage location and legal authority are related, but they aren’t the same thing.

We usually focus on where the data sits. Legal and compliance teams focus on who can compel access to it.

Sovereignty beyond borders: Schrems II and the CLOUD Act

One of the biggest misconceptions in infrastructure planning is that legal authority stops at national borders. It doesn’t.

We’ve all sat in meetings where legal and infrastructure talk past each other on this point.

GDPR is a good example. Its obligations follow the personal data of individuals in the EU regardless of where that data is stored. A dataset containing EU personal information remains subject to GDPR obligations whether it sits in Paris, Virginia, Singapore, or Sydney.

That tension became highly visible after the Schrems II ruling in 2020. The ruling invalidated the EU-US Privacy Shield framework and forced organizations to reassess how personal data moved between jurisdictions.

The US CLOUD Act introduces pressure from the opposite direction. Under certain circumstances, US authorities can compel US-based providers to produce data under their control even when that data is stored outside the United States, which means your Frankfurt copy isn’t necessarily beyond reach.

From an infrastructure perspective, this creates an uncomfortable reality. The storage location may be Frankfurt. The legal request may originate in Washington. Both facts can be true at the same time.
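
One way to make that concrete is to record, for each dataset, the storage country alongside the jurisdictions of the entities that operate it. The Python sketch below is a minimal illustration, not a legal determination: the `DatasetPlacement` fields, the dataset name, and the country codes are all hypothetical, and the model deliberately ignores what any specific law actually permits.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetPlacement:
    """Hypothetical record of where a dataset sits and who operates it."""
    name: str
    storage_country: str    # residency: where the bytes physically live
    provider_country: str   # where the service operator is incorporated
    parent_country: str     # where the provider's parent company is incorporated


def jurisdictions_in_play(p: DatasetPlacement) -> set[str]:
    """Return every country that may have a legal route to the data.

    Coarse by design: it only unions the storage location with the
    operator and parent jurisdictions.
    """
    return {p.storage_country, p.provider_country, p.parent_country}


frankfurt_copy = DatasetPlacement(
    name="eu-customer-records",
    storage_country="DE",
    provider_country="IE",
    parent_country="US",
)

exposure = jurisdictions_in_play(frankfurt_copy)
print(sorted(exposure))  # prints ['DE', 'IE', 'US']
```

The residency answer is a single value. The sovereignty answer is a set, and it is usually larger than the architecture diagram implies.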

For some workloads, that risk is acceptable. For others – government, defense, healthcare, critical infrastructure – it becomes a deciding factor in platform selection.

This is why sovereignty discussions increasingly involve lawyers, compliance officers, procurement teams, security architects, and infrastructure engineers in the same meeting. Everyone’s looking at the same dataset through a different lens.

What is data localization?

Localization is the strictest form of residency. Instead of stating where data should be stored, localization rules state where it must stay, and by extension everywhere it cannot go.

Certain categories of information must remain within national borders, sometimes with strict restrictions on replication, backup, processing, or access from outside the country. Government records, national identity systems, defense information, healthcare records, financial transaction data, and critical infrastructure information commonly fall into this category, although the exact rules vary by jurisdiction.

Unlike standard residency requirements, localization leaves little room for architectural flexibility.

A residency requirement might allow data to remain in a particular region while permitting carefully controlled cross-border backups. A localization requirement often removes that option entirely.

We’ve had to maintain separate infrastructure stacks in different countries for localization requirements, and it’s as expensive as it sounds. Organizations may need to limit SaaS adoption, redesign disaster recovery strategies, or deploy dedicated backup environments that would’ve otherwise been consolidated elsewhere. Every release, migration, and platform upgrade becomes more complicated because data movement itself becomes a regulated activity.

The three concepts at a glance

The table below summarizes the distinction.

Concept	What it answers	Driver	Flexibility
Data residency	Where is the data physically stored?	Performance, customer preference, compliance	Usually a business decision
Data sovereignty	Which laws and authorities apply to the data?	Jurisdictional law, corporate parentage	Determined by location, ownership, and provider structure
Data localization	Can the data leave a specific country?	Regulatory mandate	Often non-negotiable

Why this matters today

We’re paying more attention to residency and sovereignty today for a few reasons.

Enforcement is the first. A decade ago, many organizations treated cross-border data compliance as a problem reserved for hyperscalers and multinational banks. The assumption was that regulators would never bother with ordinary enterprises that barely had a presence outside their home country. That assumption collapsed around 2022, and now even mid-sized SaaS vendors get questionnaires about data flows.

Scope is the second. Regulations are expanding beyond traditional personal information. Frameworks such as the EU Data Act, DORA, and the AI Act introduce requirements that affect operational data, industrial systems, third-party providers, cloud platforms, and AI-driven workloads.

The third reason is software itself. AI-powered features now appear in products that were never previously part of a data governance discussion. They are slipping into productivity suites, collaboration tools, CRM platforms, customer support systems, and developer tooling. Those products process data through AI services that may operate across multiple jurisdictions without your infrastructure team ever seeing a network diagram.

Many organizations understand where their databases reside. Far fewer understand where every AI-assisted workflow ultimately sends data. That gap is becoming a governance challenge of its own.

The cost of getting residency wrong

Direct fines are the most visible cost of a residency failure, but they’re rarely the most expensive component.

We’ve watched this sequence play out more than once.

A typical enforcement sequence runs through several stages: regulatory inquiry, legal team bottleneck, paused customer renewals, and forced engineering migration. The true cost lies in the disruption. When regulators found that Standard Contractual Clauses (SCCs) backed only by superficial safeguards were insufficient, even for hyperscalers, the message was clear. If companies with the deepest legal resources can lose compliance cases, ordinary enterprise stacks are exposed too.

For mid-sized businesses caught in cross-border compliance traps, the damage is measured in hundreds of engineering hours spent on emergency re-architecture. Those projects run on a regulator’s timeline and stall the core product roadmap.

Answer these questions before deployment. That’s considerably cheaper than explaining yourself once regulators start asking where the data went.

Where residency typically breaks

Most residency failures don’t happen in the primary database. That’s usually the one everyone remembers to check.

The problems tend to appear in the surrounding systems: backups, analytics platforms, SaaS integrations, AI features, disaster recovery environments, support tooling, and identity services. By the time data reaches those layers, ownership is often split across multiple teams, vendors, and platforms.

We’ve all seen the pattern: the production workload satisfies residency requirements perfectly, and everything around it doesn’t.

Cloud and SaaS workloads

One of the most persistent misconceptions in cloud architecture is that selecting a region solves residency. Choosing Frankfurt in a cloud console only determines where the primary workload runs. Modern applications generate far more data than the records stored in the production database.

Backups need somewhere to live. Logs are collected and indexed. Monitoring platforms aggregate telemetry. Analytics teams build reporting warehouses. Customer support systems store attachments. Disaster recovery environments replicate data elsewhere. Third-party integrations copy information into collaboration tools, CRM systems, billing platforms, and marketing automation services. Each of those systems has its own residency profile.

I’ve seen organizations spend months validating the location of a production database only to discover that log archives were being exported to another region entirely. Nobody had configured it intentionally. The service simply used the vendor’s default storage location.
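
A per-copy check can catch this kind of drift. The sketch below is a minimal Python illustration, assuming you can export an inventory of copies with their regions. The system names, region identifiers, and the `ALLOWED_REGIONS` set are hypothetical. Copies with no confirmed region are treated as violations, because "unknown" is how vendor defaults slip through.

```python
from typing import NamedTuple, Optional


class DataCopy(NamedTuple):
    """One stored copy of a dataset (hypothetical inventory row)."""
    system: str
    kind: str               # e.g. "primary", "backup", "logs"
    region: Optional[str]   # None means nobody has confirmed the location


# Assumption: the customer contract limits storage to these regions.
ALLOWED_REGIONS = {"eu-central", "eu-west"}


def residency_violations(copies: list[DataCopy]) -> list[DataCopy]:
    """Return copies that sit outside the allowed regions or are unverified."""
    return [c for c in copies if c.region not in ALLOWED_REGIONS]


inventory = [
    DataCopy("orders-db", "primary", "eu-central"),
    DataCopy("orders-db", "backup", "eu-west"),
    DataCopy("log-archive", "logs", "us-east"),       # vendor default, never reviewed
    DataCopy("support-attachments", "files", None),   # location unknown
]

for item in residency_violations(inventory):
    print(f"{item.system} ({item.kind}): region={item.region}")
```

Run against this sample inventory, the check flags the log archive and the support attachments while the production database and its backup pass.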

SaaS platforms introduce additional complexity because customers rarely control every component of the underlying architecture.

A useful example is Atlassian’s approach to data residency. The company publishes detailed documentation explaining which products support regional pinning, which data types remain global, and what happens when tenants move between regions. Some services can be pinned to a specific geography. Others rely on shared global platforms that aren’t region-specific.

Identity systems are a common example. User accounts, authentication services, and directory infrastructure often sit outside the residency controls available for application data because they operate as global services shared across tenants.

AI features

AI has introduced an entirely new category of residency challenges. Every major software vendor is adding AI-assisted capabilities to existing products. The problem is that the AI components often operate independently from the rest of the platform’s residency architecture.

The CRM may reside in Europe. The model serving endpoint may not. Treat AI features as separate data-processing systems.

Consider a helpdesk platform that summarizes support tickets. Customer data may remain inside the tenant’s selected region, while the summarization request itself is processed by an inference service elsewhere.

The same issue appears in document search systems. Embeddings, vector databases, prompt histories, and retrieval indexes are all forms of data, and each requires its own storage location, retention policy, and residency controls that the parent application’s compliance team may never have reviewed.

Development tools are no exception. Code completion platforms and AI assistants may retain prompts, telemetry, or interaction logs. Those records can contain source code, credentials, internal architecture details, or customer information.

Business intelligence platforms present similar risks. An AI-generated report may only display a handful of rows to the user, but the underlying inference request could involve much larger datasets behind the scenes. This is why AI governance increasingly overlaps with residency governance.

The questions are usually the same. Where is the inference performed? Are prompts retained? Where are embeddings stored? Is tenant data used for model training? Can processing be restricted to a specific region?

Some vendors have invested heavily in answering those questions. Inference runs in-region, prompts aren’t retained, training on customer data is disabled, and documentation clearly explains the architecture. Others have simply connected their products to whichever model endpoint was easiest to deploy.

The same logic applies to internally developed AI systems. Retrieval-augmented generation (RAG) pipelines, embedding stores, prompt caches, vector databases, fine-tuning datasets, and inference logs all create new data stores. Each one needs a residency decision.
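
For internally built pipelines, one way to enforce that decision is a routing guard that refuses to send a request anywhere outside the approved regions. The Python sketch below is illustrative only: `InferenceEndpoint`, `pick_endpoint`, and the endpoint names and regions are hypothetical, and a real deployment would also need to cover embeddings, caches, and logs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InferenceEndpoint:
    """Hypothetical description of a model-serving endpoint."""
    name: str
    region: str
    retains_prompts: bool


class NoCompliantEndpoint(Exception):
    """Raised when no endpoint satisfies the residency policy."""


def pick_endpoint(endpoints, allowed_regions, allow_prompt_retention=False):
    """Return the first endpoint that is in-region and meets the retention rule.

    Failing closed is the point: a request that cannot be served in-region
    is refused rather than silently routed elsewhere.
    """
    for ep in endpoints:
        if ep.region not in allowed_regions:
            continue
        if ep.retains_prompts and not allow_prompt_retention:
            continue
        return ep
    raise NoCompliantEndpoint(f"no endpoint within {sorted(allowed_regions)}")


endpoints = [
    InferenceEndpoint("global-llm", "us-east", retains_prompts=True),
    InferenceEndpoint("eu-llm-logged", "eu-central", retains_prompts=True),
    InferenceEndpoint("eu-llm", "eu-central", retains_prompts=False),
]

chosen = pick_endpoint(endpoints, allowed_regions={"eu-central"})
print(chosen.name)  # prints eu-llm
```

Raising an exception when nothing qualifies is a deliberate design choice here. Falling back to a global endpoint under load is exactly the behavior a residency policy is meant to rule out.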

This is one reason many organizations with strict sovereignty requirements are deploying AI workloads on infrastructure they directly control. That might mean on-premises gear, a private cloud, or a sovereign cloud environment. Keeping the models close to governed data is often simpler than proving the data never crossed a border in the first place.

Backups and disaster recovery

If there’s one place residency problems consistently hide, it’s the backup environment.

Production systems tend to receive attention because everyone knows they’re important. Backup systems often inherit settings that were configured years ago and never revisited.

Consider a common scenario. Production data is correctly stored in Germany because a customer contract requires it. Nightly backups are then replicated to object storage in another region because someone (I think it was a contractor who left in 2022) enabled a default cross-region replication policy during a migration, and everyone else has forgotten about it. Regulators don’t care which copy caused the violation. A backup is still data. (Obvious, but apparently not to everyone.)

The same issue appears throughout the recovery stack. Immutable repositories live somewhere. WORM storage has a physical location. Snapshot archives sit in specific regions. Disaster recovery replicas run in designated failover sites, while log retention platforms store operational records independently from production systems. Each of those layers creates its own residency obligations.

Organizations can carefully document the location of primary workloads while having no clear inventory of where recovery copies actually exist. During an audit, the backup architecture can then become a larger compliance concern than the production system itself.

Vendor support introduces another dimension that often gets overlooked. Imagine a backup vendor troubleshooting an incident where a support engineer accesses a repository from another country, mounts a backup set, and reviews the contents to diagnose a problem.

From a technical perspective, that’s routine support activity. From a regulatory perspective, it may be a cross-border data transfer. The location of the storage matters, but so does the location of the people who can access it. That’s why mature residency programs map not only where data resides, but also who can reach it and from where.
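
Mapping who can reach the data, and from where, can start with something as simple as filtering access logs by the actor’s location. The following Python sketch assumes an access log that records the country each actor connected from. The `AccessEvent` schema, the actor names, and the approved-country set are hypothetical.

```python
from typing import NamedTuple


class AccessEvent(NamedTuple):
    """One entry from an access log (hypothetical schema)."""
    actor: str
    actor_country: str   # where the person was when they accessed the data
    resource: str
    action: str


# Assumption: access is only approved from these countries.
APPROVED_ACCESS_COUNTRIES = {"DE", "FR"}


def cross_border_access(events):
    """Return events where the actor was outside the approved countries."""
    return [e for e in events if e.actor_country not in APPROVED_ACCESS_COUNTRIES]


log = [
    AccessEvent("ops-anna", "DE", "backup-repo-01", "list"),
    AccessEvent("vendor-support-17", "IN", "backup-repo-01", "mount"),
    AccessEvent("ops-luc", "FR", "backup-repo-01", "restore"),
]

for e in cross_border_access(log):
    print(f"{e.actor} from {e.actor_country}: {e.action} on {e.resource}")
```

In this sample, the storage never moved, yet the vendor engineer mounting a backup set from another country is the one event that gets flagged.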

Encryption and key management: the part that actually matters

When residency concerns emerge, encryption is often the first solution people reach for. Encrypting data at rest and in transit protects it from unauthorized access, but it doesn’t change where the data is stored. Encrypted data sitting in Virginia still resides in Virginia. Encrypted backups replicated to another country are still located in that country.

The more important question is who controls the keys. With provider-managed encryption, the cloud platform generates and controls the key material. The customer benefits from strong encryption, but the provider remains part of the trust chain.

Customer-managed keys improve the situation, but the keys often still reside within the provider’s key management infrastructure.

For highly regulated workloads, organizations increasingly move toward external key management models. Bring Your Own Key (BYOK) provides additional control, but Hold Your Own Key (HYOK) and external key management systems go further by keeping key material entirely outside the cloud provider’s environment. In these architectures, the provider stores encrypted data but doesn’t possess the keys required to decrypt it independently. That sounds simple until you have to explain to a support engineer at 2 a.m. why they can’t restore your backup.

This distinction can change the sovereignty picture. Or at least it changes how auditors look at it, which is basically the same thing.

When we evaluate sovereign cloud architectures, national cloud providers, or highly regulated deployment models, who holds the keys is often the first thing architects investigate and the last thing auditors stop asking about.

So if data location determines where the data sits, key control often determines who can actually use it.
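
The key-custody models described above can be summarized as a small lookup. This Python sketch rests on one loudly simplified assumption: the provider can decrypt on its own whenever usable key material sits inside its environment. Real products differ in detail, and the `KeyModel` names and the mapping are illustrative rather than any vendor’s actual behavior.

```python
from enum import Enum


class KeyModel(Enum):
    PROVIDER_MANAGED = "provider-managed"
    CUSTOMER_MANAGED = "customer-managed"   # customer sets policy; keys sit in provider KMS
    BYOK = "byok"                           # customer supplies the key to the provider KMS
    HYOK = "hyok"                           # key material stays outside the provider


# Simplified assumption: key material inside the provider's environment
# means the provider is technically able to decrypt without the customer.
KEY_MATERIAL_INSIDE_PROVIDER = {
    KeyModel.PROVIDER_MANAGED: True,
    KeyModel.CUSTOMER_MANAGED: True,
    KeyModel.BYOK: True,
    KeyModel.HYOK: False,
}


def provider_can_decrypt_alone(model: KeyModel) -> bool:
    """Return True if the provider holds everything needed to decrypt."""
    return KEY_MATERIAL_INSIDE_PROVIDER[model]


for model in KeyModel:
    answer = provider_can_decrypt_alone(model)
    print(f"{model.value:18} provider can decrypt alone: {answer}")
```

Only the last row flips to False, which is why key custody tends to dominate sovereignty reviews.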

Architecture options for stricter requirements

Eventually, some organizations discover that standard cloud residency controls aren’t enough.

The region is correct, the contracts are in place, and encryption is configured properly. Yet the legal, regulatory, or customer requirements still aren’t fully satisfied.

Most architecture discussions then converge on two options: sovereign cloud, or infrastructure under direct organizational control.

Sovereign cloud

The major hyperscalers are well aware of the sovereignty problem.

Over the last few years, Microsoft, AWS, and Google have all introduced offerings designed for workloads that require more than regional data residency. The goal is to provide additional operational and legal separation between customer workloads and the broader hyperscale platform.

In Europe, that has led to initiatives such as Microsoft Cloud for Sovereignty, AWS European Sovereign Cloud, and Google’s Sovereign Controls framework. Additional partnerships have emerged as well. Projects such as Bleu in France and S3NS (a Thales-Google joint venture) combine hyperscale technology with locally controlled operating entities in an effort to address sovereignty concerns that standard public cloud deployments cannot fully solve.

“Sovereign cloud” is not a single architecture. Different providers solve different parts of the problem. Some focus on operational control or administrative isolation. Others emphasize local staffing requirements. Some concentrate on key ownership or jurisdictional separation. Others restrict support access. That’s why organizations evaluating sovereign cloud offerings should start with a simple question: which specific risk are we trying to eliminate?

We’ve found sovereign cloud can be the right answer. It just helps to know exactly which question you’re asking it to solve.

On-premises, HCI, and hybrid cloud

For some workloads, the simplest answer is still the most effective one: keep the data on infrastructure you directly control.

On-premises deployments remain common across government, healthcare, financial services, critical infrastructure, defense, and other heavily regulated sectors for exactly this reason. When data resides in a facility you operate, or in a facility operated under clearly defined contractual controls, many residency questions become easier to answer.

Where is the data? In that building. Who can access it? The people on this access list. Where are the backups? In these locations. The conversation becomes direct.

That doesn’t mean cloud adoption stops. Most organizations continue to use cloud services extensively. What changes is workload placement.

Data subject to strict residency, sovereignty, or localization requirements often remains on-premises, while less sensitive workloads move to public cloud environments where scalability and operational flexibility provide stronger business value.

This is the foundation of most hybrid cloud strategies: place each workload where its requirements are easiest to satisfy.

For organizations pursuing this model, several design principles tend to matter: full visibility into where every copy of the data resides; high availability that does not depend on cross-border replication; backup and recovery workflows that remain within approved jurisdictions; administrative access paths that can be controlled and audited; encryption and key management architectures aligned with sovereignty requirements.

This is where hyperconverged infrastructure frequently enters the conversation.

Platforms such as StarWind Virtual SAN allow organizations to build highly available shared storage using local server resources, keeping compute, storage, and availability services within a single operational environment. For teams that prefer a preconfigured deployment model, StarWind HCI Appliance provides the same architectural approach in an integrated platform.

The fewer external dependencies involved in storing, replicating, and recovering data, the easier it becomes to understand exactly where that data lives and who can reach it.

How to build a residency strategy

We’ve used a checklist across most environments. Seven steps. Or six. Finishing them matters more than counting them.

Catalog the data. Residency can’t be enforced on data that hasn’t been inventoried. Begin with the systems your infrastructure team actually manages. Databases, file shares, backups, SaaS platforms, analytics environments, object storage, collaboration systems, and AI platforms are usually enough to expose the largest risks.
Classify it. Personal, financial, healthcare, public-sector, customer-confidential, logs, metadata. The category determines the applicable rules.
Map the flows. Production, replication, backup, DR, monitoring, support access, AI features, and third-party integrations. The support and AI flows are typically the least documented.
Check residency per copy. Primary, replicas, snapshots, logs, backups. These rarely share a default region.
Pin residency in contracts. Whenever possible, require vendors to identify specific regions and operational controls. “Stored within the EU” sounds precise until a provider reorganizes infrastructure, acquires another company, or changes service architecture. Specific commitments tend to age better than vague ones.
Decide on key management architecture. Customer-managed, BYOK, or HYOK, with a documented rationale tied to each workload class.
Test the DR path by region. Run a restore exercise and confirm that the recovered data lands in the region specified in the customer contract. Residency gaps discovered during an audit are substantially more expensive than gaps discovered during a drill.
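
The last step lends itself to automation. The sketch below shows one way a restore drill could be checked in Python, assuming the drill produces a record of where each workload was recovered. The `RestoreResult` record, the workload names, and the `contract_regions` mapping are hypothetical.

```python
from typing import NamedTuple


class RestoreResult(NamedTuple):
    """Outcome of one restore exercise (hypothetical record)."""
    workload: str
    restored_region: str


# Assumption: maps each workload to the regions its customer contract permits.
contract_regions = {
    "orders-db": {"eu-central"},
    "file-share": {"eu-central", "eu-west"},
}


def drill_failures(results):
    """Return restores that landed outside the contracted regions.

    A workload with no contract entry is treated as a failure, since
    nobody has confirmed where it is allowed to go.
    """
    failures = []
    for r in results:
        permitted = contract_regions.get(r.workload, set())
        if r.restored_region not in permitted:
            failures.append(r)
    return failures


drill = [
    RestoreResult("orders-db", "eu-west"),    # failover site is in the wrong region
    RestoreResult("file-share", "eu-west"),
]

for r in drill_failures(drill):
    print(f"{r.workload} restored to {r.restored_region}, not permitted by contract")
```

Here the database restore fails the drill even though `eu-west` is a perfectly valid region for the file share, which is the kind of per-workload difference a single global region setting hides.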

Questions to ask any cloud or SaaS vendor

Before we sign a contract, we get a handful of questions answered in writing. Not because vendors are necessarily trying to hide anything, but because assumptions have a habit of becoming problems later.

Start with the basics. Where are primary workloads stored? Where are backups stored? Where are logs, metadata, and telemetry stored? Where do disaster recovery copies reside?

Then move into the sovereignty discussion. Which legal jurisdiction governs the provider? Is the parent company subject to laws with extraterritorial reach, such as the CLOUD Act? Which support personnel can access customer data and from where?

For AI-enabled products, ask additional questions. Where is inference performed? Are prompts retained? Are prompts or customer data used for model training? Where are embeddings and vector indexes stored? Can AI processing be restricted to specific regions, or does the model endpoint move based on load?

Finally, ask about encryption. Are customer-managed keys supported? Is BYOK available? Is external key management supported? Which layers of the platform can be protected with customer-controlled keys?

The answers determine whether the vendor can realistically support the commitments you’ve already made to your own customers.

Conclusion

If you’re still treating residency as a checkbox you fill after the architecture is built, you’re doing it backwards. The teams that avoid emergency migrations are the ones that map data flows before they sign vendor contracts, not after.

Start with the systems that touch your production data but aren’t the production database itself. Backups, logs, AI features, and SaaS integrations are where residency actually breaks. Fix those first.

In 2026, the question won’t be whether you know where your database lives. Everyone knows that. The question will be whether you can prove where your embeddings and inference logs ended up, or whether your third-party support tickets stayed in-region. My bet is that within eighteen months, localization rules will start showing up as engineering tickets rather than legal footnotes. The organizations that treat them as infrastructure constraints now will be the ones that don’t have to rip out their stack later.

FAQ

What is the difference between data residency and data sovereignty?

Residency describes the physical storage location of data (a geographical fact). Sovereignty describes the legal authority that applies to it, based on national borders and corporate ownership. They do not always match: a server in Frankfurt is under German and EU jurisdiction, but if its operator is owned by a US parent company, the data on it may still be reachable under the US CLOUD Act.

Does choosing a cloud region solve data residency?

Only for the primary workload. Backups, logs, metadata, analytics extracts, AI inference paths, and support tooling do not automatically inherit that region and must be verified independently.

Is data residency the same as data localization?

No. Residency can be a business preference or a standard contractual clause. Localization is a legal mandate requiring specific categories of data to remain within national borders, often with little or no room for exceptions. The exact rules vary by jurisdiction.

How do AI features change residency requirements?

They introduce data paths that are easy to overlook. Prompts, embeddings, and inference logs may be processed or stored outside the tenant region, which can create compliance risk under GDPR and, depending on the use case, frameworks such as the EU AI Act unless the architecture explicitly accounts for them.

What is the difference between BYOK and HYOK?

BYOK (Bring Your Own Key) stores key material inside the cloud provider’s environment, meaning the provider retains theoretical access. HYOK (Hold Your Own Key) keeps key material entirely outside the cloud provider on customer-controlled infrastructure, preventing the provider from decrypting data under foreign legal compulsion.

Can on-premises infrastructure solve residency?

Largely. On-premises and hyperconverged infrastructure (HCI) give you direct control over data location, administrative access paths, and recovery workflows, which removes much of the ambiguity inherent in global public cloud architectures. Obligations that follow the data itself, such as GDPR, still apply wherever it sits.