SmartNIC - New Network Interface Controller

At VMworld 2020, during the first keynote, Pat Gelsinger announced the new Project Monterey. This project is based on a new type of network interface controller (NIC), called a SmartNIC, that can totally change (or at least improve) the future of computing, storage, networking, and security.

What is a SmartNIC?

A SmartNIC is a NIC that not only accelerates networking functions (most existing NICs already do that), but is also adaptable to different use cases such as security, virtualization, storage, load balancing, and data path optimization.

Basically, it’s a NIC with a general-purpose CPU (and/or FPGA), out-of-band management, and virtualized device functionality:

From https://blogs.vmware.com/vsphere/2020/09/announcing-project-monterey-redefining-hybrid-cloud-architecture.html

A SmartNIC is an evolution of the existing NIC that adds DPU (Data Processing Unit) capabilities, with different implementation approaches: ASIC, FPGA, or System-on-a-Chip (SOC).

From https://blog.mellanox.com/2018/08/defining-smartnic/

For more information on what a SmartNIC is, see also these vendors’ pages:

Why SmartNIC?

A SmartNIC can accelerate many different workloads compared to traditional NICs, and even compared to existing NICs that already have some intelligent functions for specific offload use cases:

From https://blog.mellanox.com/2018/08/defining-smartnic/

By offloading different workloads to the SmartNIC, more of the main server’s resources become available for other workloads, and this is the first evident advantage of this solution.

But we are just at the first step: SmartNICs can really change future hardware architecture design and can be the catalyst to accelerate digital transformation.

SmartNICs can also be used as a foundation to implement other solutions, for example Composable Infrastructure.

SmartNIC and computing

As written before, there are ASIC-, FPGA-, and SOC-based SmartNICs and, for generic code, the SOC-based type is the most interesting and flexible.

But still, they are not exactly general-purpose CPUs for generic code.

For example, the NVIDIA BlueField-2® data processing unit (DPU) has up to 8 ARMv8 A72 cores (64-bit) with 8 GB / 16 GB of on-board DDR4 (ECC). That is a good amount of resources, but not a fully generic CPU, or at least not one compatible with Intel/AMD x86-64 processors.

That requires code compiled specifically for ARM, which means that existing applications and workloads may not be offloadable to SmartNICs (even SOC-based ones). For this reason, it is not possible to simply add the computational resources of a SmartNIC to the computing power of a server; you can only run specific code on the SmartNIC instead of on the main CPU.
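The architecture mismatch can be illustrated with a minimal Python sketch. The helper names (`isa_family`, `can_run_on`) and the alias sets are hypothetical and only for illustration; the only real API used is `platform.machine()` from the standard library. It assumes a SOC-based SmartNIC that reports itself as `aarch64`.

```python
import platform

# Illustrative alias sets (assumption): common machine strings per ISA family.
ARM64_NAMES = {"aarch64", "arm64"}
X86_64_NAMES = {"x86_64", "amd64"}


def isa_family(machine: str) -> str:
    """Map an OS-reported machine string to an instruction-set family."""
    name = machine.lower()
    if name in ARM64_NAMES:
        return "arm64"
    if name in X86_64_NAMES:
        return "x86_64"
    return "unknown"


def can_run_on(binary_isa: str, target_machine: str) -> bool:
    """A natively compiled binary runs only where the ISA family matches."""
    return isa_family(target_machine) == binary_isa


if __name__ == "__main__":
    here = platform.machine()
    print(f"this host reports {here!r} -> {isa_family(here)}")
    # An x86-64 build cannot simply be moved to an ARM-based SmartNIC.
    print("x86_64 binary on aarch64 SmartNIC:", can_run_on("x86_64", "aarch64"))
```

Interpreted or recompiled code is not affected in the same way, which is why already ported software matters here.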

Anyway, a lot of OSes and programs have already been ported to the ARM platform.

SmartNIC and storage

Funnily enough, the idea of bringing computational capabilities into a specific device comes from the storage world.

Introduced in 2013, the Seagate Kinetic Open Storage platform was an interesting idea: combining an open source object storage protocol with Ethernet connectivity in a single HDD.

It garnered industry support from AOL, Digital Sense, and Hewlett Packard, which collaborated to support the open Kinetic API and helped bring this new storage technology to market.

But the project has seemed almost dead since 2016 and has been replaced by a different project.

Computational Storage Services (CSS) is a solution, coupled to the storage, designed to offload host processing or reduce data movement.

Of course, SmartNICs are not directly related to CSS: SmartNIC is coupled to the network part, CSS is coupled to the storage part!

But a SmartNIC can still be useful for some data processing (such as compression, deduplication, redundancy, and/or encryption) on the host side.
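To give an idea of the kind of per-block work involved, here is a minimal Python sketch of deduplication plus compression. It is not a SmartNIC API: the function names `process_blocks` and `rebuild` are hypothetical, and the code runs on an ordinary CPU using only the standard `hashlib` and `zlib` modules. The point is only that this work costs CPU cycles for every block, wherever it runs.

```python
import hashlib
import zlib


def process_blocks(blocks):
    """Deduplicate blocks by SHA-256 digest and compress each unique block."""
    store = {}   # digest -> compressed payload (one copy per unique block)
    layout = []  # sequence of digests needed to rebuild the original stream
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        if digest not in store:
            store[digest] = zlib.compress(block)
        layout.append(digest)
    return store, layout


def rebuild(store, layout):
    """Reverse the process: decompress and reassemble in original order."""
    return b"".join(zlib.decompress(store[d]) for d in layout)


if __name__ == "__main__":
    data = [b"a" * 4096, b"b" * 4096, b"a" * 4096]
    store, layout = process_blocks(data)
    print(f"{len(data)} blocks in, {len(store)} unique blocks stored")
```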

And SmartNICs can help to virtualize software-defined storage (SDS), hyperconverged infrastructure (HCI), and other cloud resources.

SmartNIC and networking

SmartNICs bring computing to the network side, which makes it possible to add more protocols, add new virtual functions, offload the network stack, and so on.

Network virtualization (by offloading the VXLAN, NVGRE, or Geneve protocols) and even virtual switches can also use SmartNICs, for example to provide a programmable data path for virtual switch acceleration.
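As a concrete example of the per-packet work behind such an offload, the sketch below builds and parses the 8-byte VXLAN header as laid out in RFC 7348 (an 8-bit flags field with the "I" bit set, 24 reserved bits, a 24-bit VNI, and 8 reserved bits). The function names are hypothetical, and the outer Ethernet/IP/UDP headers that a real encapsulation also needs are deliberately left out.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid
VXLAN_HEADER_LEN = 8


def vxlan_header(vni: int) -> bytes:
    """Build the 8-byte VXLAN header for a 24-bit VXLAN Network Identifier."""
    if not 0 <= vni < 2**24:
        raise ValueError("VNI must fit in 24 bits")
    # Word 0: flags (8 bits) + reserved (24 bits)
    # Word 1: VNI (24 bits) + reserved (8 bits)
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)


def encapsulate(vni: int, inner_frame: bytes) -> bytes:
    """Prefix an inner Ethernet frame with a VXLAN header."""
    return vxlan_header(vni) + inner_frame


def decapsulate(payload: bytes):
    """Return (vni, inner_frame) from a VXLAN payload."""
    if len(payload) < VXLAN_HEADER_LEN:
        raise ValueError("payload shorter than a VXLAN header")
    word0, word1 = struct.unpack("!II", payload[:VXLAN_HEADER_LEN])
    if not (word0 >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag not set")
    return word1 >> 8, payload[VXLAN_HEADER_LEN:]
```

Done in software, this runs for every packet on the host CPU, which is exactly the kind of repetitive work a NIC can take over.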

The following table provides some examples of interesting networking functions provided by SmartNICs:

From https://blog.mellanox.com/2018/09/why-you-need-dpu-based-smart-nic-use-cases/

Software-defined networking (SDN) and, by extension, network functions virtualization (NFV) are the perfect use cases for SmartNICs.

SmartNIC and security

A SmartNIC can offload security features directly in the network path and sometimes also in the data part.

It can easily protect all the data in motion (or in transit) with encryption and other security controls.

It can also help to secure the data at rest on the host side.

And it can provide, as written before, several network functions, such as packet inspection and flow table processing, that are useful to increase network security.
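Flow table processing can be pictured as a match-action lookup keyed on a packet's 5-tuple. The Python sketch below is a simplified software model under that assumption; `FlowKey` and `FlowTable` are hypothetical names, and real SmartNICs implement this in hardware with wildcard matching, priorities, and counters that are not modeled here.

```python
from typing import NamedTuple


class FlowKey(NamedTuple):
    """Exact-match 5-tuple identifying a flow."""
    src_ip: str
    dst_ip: str
    protocol: str
    src_port: int
    dst_port: int


class FlowTable:
    """Exact-match flow table with a default action for unknown flows."""

    def __init__(self, default_action: str = "drop"):
        self._entries = {}
        self._default = default_action
        self.hits = 0
        self.misses = 0

    def add(self, key: FlowKey, action: str) -> None:
        self._entries[key] = action

    def lookup(self, key: FlowKey) -> str:
        action = self._entries.get(key)
        if action is None:
            self.misses += 1
            return self._default
        self.hits += 1
        return action


if __name__ == "__main__":
    table = FlowTable()
    web = FlowKey("10.0.0.5", "10.0.0.9", "tcp", 40000, 443)
    table.add(web, "allow")
    print(table.lookup(web))
    print(table.lookup(FlowKey("10.0.0.7", "10.0.0.9", "tcp", 40001, 22)))
```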

But what can be really interesting is the security air gap between the host OS and the SmartNIC OS, which can be used to physically isolate some functions, making them unreachable from the host OS side.

Project Monterey

VMware’s Project Monterey is a redesign and rethinking of VMware Cloud Foundation (VCF) to take advantage of these disruptive hardware capabilities.

The idea is to move and offload some functionality that used to run on the main server’s CPU directly to the SmartNIC CPU.

For example, the vSAN data management engine, part of the NSX engine and functions, or the ESXi host management code:

From https://blogs.vmware.com/vsphere/2020/09/announcing-project-monterey-redefining-hybrid-cloud-architecture.html

The most interesting points are the performance and security aspects.

By running the storage and network services on the SmartNIC, not only will storage and network I/O performance improve, but the pressure on the core CPU will also be reduced, leaving more cycles for the virtualized workloads.

And by bringing the host management layer into the SmartNIC, the ESXi instance on the SmartNIC will now manage the x86 ESXi host. This provides better security but also better manageability, because it allows improving the Lifecycle Manager (LCM), for example to manage the host firmware update process in a better way.

And most important is the true security air gap: having an ESXi instance on the SmartNIC provides greater defense-in-depth, as written before.

Of course, this means that there are two different ESXi instances running simultaneously, one on the main x86 CPU and one on the SmartNIC. These two ESXi instances can be managed separately or as a single logical instance. But what about licensing? Will the second ESXi be free of charge? Or at least discounted?

This will be one of the aspects to consider, including what happens with multiple SmartNICs, used for example to design a more resilient infrastructure.

But the idea is cool and it will be interesting to see its evolution.

SmartNIC and the future of computing, storage, networking and security