NVIDIA Quantum InfiniBand Automates Security For 10K GPUs
NVIDIA is automating security configuration for Quantum InfiniBand clusters that scale to 10,000 GPUs. With intent-based profiles in the Unified Fabric Manager (UFM), deploying critical security features such as PKey isolation and GUID-based access control now takes minutes instead of days, according to NVIDIA.
How do intent-based profiles accelerate cluster security?
NVIDIA’s new intent-based security profiles automate Subnet Manager configuration, which previously was a manual, multi-step process. Administrators can now select one of three primary profiles (General, Bare Metal Cloud, and Secured Bare Metal Cloud) to deploy security measures with a single click, NVIDIA states.

This shift targets the complexity of high-performance computing (HPC) and artificial intelligence workloads. By automating Partition Key (PKey) isolation and Management Datagram (MAD) key protection, NVIDIA claims deployment and testing times have dropped from hours or days to mere minutes.
What makes PKey-based isolation different from traditional VLANs?
The Bare Metal Cloud profile utilizes PKey-based isolation to separate tenants within a shared physical fabric. While NVIDIA notes this functions similarly to VLANs in Ethernet, PKey isolation is enforced at the hardware level. This prevents compromised host-side software from circumventing security boundaries.
According to NVIDIA, the Subnet Manager centrally manages all partition assignments. Nodes cannot self-assign partitions, and applications cannot specify their own partition usage. This centralized control keeps tenants logically separated, with the boundary enforced by the fabric hardware rather than by host software.
The port attributes that carry these partition assignments are further protected by the Management Key, which NVIDIA says is accessible only to the Subnet Manager and the InfiniBand silicon.
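To make the partition check concrete, here is a minimal Python sketch of how two ports' partition keys could be compared. It assumes the P_Key layout from the InfiniBand architecture specification: a 16-bit value whose top bit marks full versus limited membership. The function names and example key values are illustrative and are not part of UFM or any NVIDIA API. In a real fabric this comparison is performed by the adapter and switch hardware, not by host code.

```python
# Assumed P_Key layout (InfiniBand architecture): bit 15 is the membership
# flag (1 = full member, 0 = limited member); bits 0-14 name the partition.
FULL_MEMBER_BIT = 0x8000
PARTITION_MASK = 0x7FFF


def is_full_member(pkey: int) -> bool:
    """Return True if the P_Key carries full membership."""
    return bool(pkey & FULL_MEMBER_BIT)


def pkeys_match(a: int, b: int) -> bool:
    """Two P_Keys permit traffic only if they name the same partition
    and at least one side is a full member."""
    same_partition = (a & PARTITION_MASK) == (b & PARTITION_MASK)
    return same_partition and (is_full_member(a) or is_full_member(b))


def can_communicate(table_a: list[int], table_b: list[int]) -> bool:
    """Check two ports' P_Key tables for any matching pair (illustrative helper)."""
    return any(pkeys_match(a, b) for a in table_a for b in table_b)


# Hypothetical tenants: each port's table is written only by the Subnet Manager.
tenant_a_port = [0x8001]          # full member of partition 0x0001
tenant_b_port = [0x8002]          # full member of partition 0x0002
shared_storage = [0x8001, 0x8002] # full member of both partitions
```

With these example tables, `can_communicate(tenant_a_port, tenant_b_port)` is false while both tenants can reach `shared_storage`, which is the isolation pattern the Bare Metal Cloud profile is described as providing.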
How does the Secured Bare Metal Cloud profile enhance protection?
For environments facing higher risk, such as hyperscale cloud computing or agentic AI workloads, NVIDIA provides the Secured Bare Metal Cloud profile. This profile layers several additional protections on top of basic PKey isolation.
The profile includes full Management Datagram (MAD) key protection using randomized seeds for multiple key types, including MKEY, VSKEY, and PMKEY. It also implements GUID-based access control via the allowed_guid_list feature, which restricts which devices can communicate on the network.
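The idea behind GUID-based access control can be sketched as a simple allow-list lookup. The example below is a hedged illustration in Python, not the `allowed_guid_list` implementation: the one-GUID-per-line input format, the `#` comment syntax, and every function name are assumptions made for this sketch.

```python
def normalize_guid(text: str) -> int:
    """Parse a 64-bit GUID written in hex, with or without '0x' or ':' separators."""
    cleaned = text.strip().lower().replace(":", "")
    if cleaned.startswith("0x"):
        cleaned = cleaned[2:]
    value = int(cleaned, 16)  # raises ValueError on non-hex input
    if not 0 < value < 2**64:
        raise ValueError(f"GUID out of 64-bit range: {text!r}")
    return value


def load_allowed_guids(lines) -> set[int]:
    """Build the allow list from an iterable of lines (hypothetical format:
    one GUID per line, '#' starts a comment, blank lines are ignored)."""
    allowed = set()
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            allowed.add(normalize_guid(entry))
    return allowed


def is_allowed(guid: str, allowed: set[int]) -> bool:
    """Default-deny: a device is admitted only if its GUID is on the list."""
    return normalize_guid(guid) in allowed


# Made-up GUIDs for illustration only.
allowed = load_allowed_guids([
    "0x0002c90300a1b2c3  # compute node 1",
    "00:02:c9:03:00:a1:b2:c4",
    "",
])
```

Normalizing every GUID to an integer before comparison avoids mismatches caused by letter case or separator style, which is a common source of silent allow-list failures.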
To defend against denial-of-service (DoS) attacks, NVIDIA integrated MAD rate limiting and source-based rate limiting. These features proactively throttle traffic to prevent network saturation, according to the company.
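NVIDIA does not describe the algorithm behind these limiters here, so the following Python sketch substitutes a token bucket, a common rate-limiting technique, to show what per-source throttling looks like in principle. The class names, the rate and burst parameters, and the use of a string as the source identifier are all assumptions for illustration.

```python
class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/second up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst   # start full so short bursts are admitted
        self.last = now

    def allow(self, now: float) -> bool:
        """Admit one packet if a token is available at time `now` (seconds)."""
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SourceRateLimiter:
    """Keeps one bucket per source so a noisy sender cannot starve the others."""

    def __init__(self, rate: float, burst: float):
        self.rate = rate
        self.burst = burst
        self.buckets: dict[str, TokenBucket] = {}

    def allow(self, source: str, now: float) -> bool:
        bucket = self.buckets.get(source)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, now)
            self.buckets[source] = bucket
        return bucket.allow(now)


# Hypothetical limits: 2 management packets/second, bursts of up to 3.
limiter = SourceRateLimiter(rate=2.0, burst=3.0)
flood = [limiter.allow("node-a", now=0.0) for _ in range(5)]
```

In this run the first three packets from `node-a` are admitted and the remaining two are dropped, while a different source arriving at the same instant still gets its own full bucket.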
Comparison: Manual vs. Automated Configuration
| Feature | Manual UFM/SM Setup | Intent-Based Profiles |
|---|---|---|
| Deployment Time | Hours to Days | Minutes |
| Configuration | Multi-step, manual | Single-click automation |
| Scalability | Requires specialized expertise | Zero-touch scaling for hundreds of nodes |
How does Continuous Security Verification (CSV) maintain network health?
Beyond initial setup, NVIDIA uses Continuous Security Verification (CSV) to monitor the fabric’s security posture. Integrated into UFM, CSV performs static analysis and log-based auditing to identify vulnerabilities in real time, rather than relying on one-time checks.

The system generates a “Security Health Score,” which provides administrators with a snapshot of their network’s integrity. If a vulnerability is found, CSV offers automated remediation guidance to fix the issue, according to NVIDIA.
Users can adjust the verbosity of these reports, ranging from critical errors to informational messages. This granularity is designed for complex environments where tens of thousands of GPUs are interconnected and minor misconfigurations could lead to significant outages.
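How CSV actually computes its score is not described here, so the sketch below is purely hypothetical. It shows, in Python, one plausible way a health score and a verbosity filter could relate to a list of findings. The severity levels, the penalty weights, and all names are invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    INFO = 1
    WARNING = 2
    CRITICAL = 3


# Hypothetical weights: points deducted from a perfect score of 100.
PENALTY = {Severity.INFO: 0, Severity.WARNING: 5, Severity.CRITICAL: 25}


@dataclass(frozen=True)
class Finding:
    severity: Severity
    message: str


def health_score(findings: list[Finding]) -> int:
    """Start at 100 and subtract a penalty per finding, never going below 0."""
    return max(0, 100 - sum(PENALTY[f.severity] for f in findings))


def filter_report(findings: list[Finding], min_severity: Severity) -> list[Finding]:
    """Verbosity control: keep only findings at or above the chosen level."""
    return [f for f in findings if f.severity >= min_severity]


# Made-up findings for illustration.
findings = [
    Finding(Severity.CRITICAL, "management key protection disabled on one switch"),
    Finding(Severity.WARNING, "port present in fabric but missing from allow list"),
    Finding(Severity.INFO, "profile applied successfully"),
]
```

Separating the score from the report filter means an operator can lower the verbosity to critical errors only without changing the score, since informational messages carry no penalty in this sketch.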
Frequently Asked Questions
What is the primary benefit of NVIDIA’s intent-based profiles?
They reduce the time required to configure InfiniBand security from days to minutes by automating the Subnet Manager settings.
Can a node change its own partition in a Bare Metal Cloud profile?
No. Partition assignment is managed centrally by the Subnet Manager; nodes cannot self-assign partitions.
What is a Security Health Score?
It is a metric produced by the Continuous Security Verification (CSV) tool that evaluates the current security posture of an InfiniBand deployment based on logs and static analysis.
How does NVIDIA prevent DoS attacks in these clusters?
Through the Secured Bare Metal Cloud profile, which implements MAD rate limiting and source-based rate limiting.