Keysight Introduces AI Data Center Builder to Validate and Optimize Network Architecture and Host ..
Keysight Technologies, Inc. introduces Keysight AI (KAI) Data Center Builder, an advanced software
suite that emulates real-world workloads to evaluate how new algorithms,
components, and protocols impact the performance of AI training. KAI Data
Center Builder’s workload emulation capability integrates large language model
(LLM) and other artificial intelligence (AI) model training workloads into the
design and validation of AI infrastructure components – networks, hosts, and
accelerators. This solution enables tighter synergy between hardware design,
protocols, architectures, and AI training algorithms, boosting system
performance.
AI operators use various parallel processing strategies, also known as model partitioning,
to accelerate AI model training. Aligning model partitioning with AI cluster topology
and configuration enhances training performance. During the AI cluster design
phase, critical questions are best answered through experimentation. Many of
the questions focus on data movement efficiency between the graphics processing
units (GPUs). Key considerations include:
Scale-up design of GPU interconnects inside an AI host or rack
Scale-out network design, including bandwidth per GPU and topology
Configuration of network load balancing and congestion control
Tuning of the training framework parameters
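To illustrate why bandwidth per GPU appears among these considerations, the following is a minimal sketch of the standard bandwidth-only estimate for a ring all-reduce, one common way to synchronize gradients. It is not part of KAI Data Center Builder; the function name, the 2 GB gradient size, and the link speeds are hypothetical values chosen for illustration, and the estimate ignores latency, congestion, and overlap with computation.

```python
def ring_allreduce_seconds(message_bytes: float, num_gpus: int, gbps_per_gpu: float) -> float:
    """Bandwidth-only lower bound on ring all-reduce time (illustrative, not a Keysight API)."""
    if num_gpus < 2:
        return 0.0  # a single GPU has nothing to exchange
    bytes_per_second = gbps_per_gpu * 1e9 / 8
    # In a ring all-reduce each GPU sends 2 * (N - 1) / N of the message in total:
    # (N - 1) / N during reduce-scatter and (N - 1) / N during all-gather.
    traffic_per_gpu = 2 * (num_gpus - 1) / num_gpus * message_bytes
    return traffic_per_gpu / bytes_per_second


# Hypothetical example: 1 billion parameters stored in 16-bit precision (2 GB of gradients).
gradient_bytes = 2e9
for gbps in (100, 200, 400):
    seconds = ring_allreduce_seconds(gradient_bytes, num_gpus=8, gbps_per_gpu=gbps)
    print(f"{gbps} Gb/s per GPU: about {seconds * 1e3:.0f} ms per all-reduce")
```

Real jobs deviate from this bound, which is one reason such questions are explored through experimentation rather than arithmetic alone.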
The KAI Data Center Builder workload emulation solution reproduces the network communication
patterns of real-world AI training jobs to accelerate experimentation, shorten
the learning curve, and provide deeper insight into
the causes of performance degradation, which is difficult to achieve with
real AI training jobs alone. Keysight customers can access a library of LLM
workloads like GPT and Llama, with a selection of popular model partitioning
schemas like Data Parallel (DP), Fully Sharded Data Parallel (FSDP), and
three-dimensional (3D) parallelism.
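As a rough illustration of what a 3D parallelism schema involves, the sketch below enumerates the ways a fixed number of GPUs can be split into data-parallel, tensor-parallel, and pipeline-parallel degrees. It does not represent the KAI Data Center Builder library or its interface; the function name is hypothetical, and the rule that a tensor-parallel group must fit inside one host is a common practice assumed here, not something the announcement states.

```python
from itertools import product


def layouts_3d(num_gpus: int, gpus_per_host: int) -> list[tuple[int, int, int]]:
    """List (dp, tp, pp) degrees with dp * tp * pp == num_gpus (illustrative only)."""
    divisors = [d for d in range(1, num_gpus + 1) if num_gpus % d == 0]
    layouts = []
    for tp, pp in product(divisors, repeat=2):
        # Assumption: keep each tensor-parallel group inside a single host.
        if tp > gpus_per_host:
            continue
        if num_gpus % (tp * pp) != 0:
            continue
        dp = num_gpus // (tp * pp)
        layouts.append((dp, tp, pp))
    return layouts


# Hypothetical cluster: 16 GPUs, 8 per host.
for dp, tp, pp in layouts_3d(16, 8):
    print(f"data={dp} tensor={tp} pipeline={pp}")
```

Each layout produces a different mix of traffic within and between hosts, which is why the choice of schema interacts with cluster topology.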
The workload emulation application in the KAI Data Center Builder enables AI
operators to:
Experiment with parallelism parameters, including partition sizes and their distribution over the available AI infrastructure (scheduling)
Understand the impact of communications within and among partitions on overall job completion time (JCT)
Identify low-performing collective operations and drill down to identify bottlenecks
Analyze network utilization, tail latency, and congestion to understand the impact they have on JCT
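The link between tail latency and JCT can be sketched with a small example: a synchronous collective operation finishes only when its slowest transfer finishes, so one delayed flow sets the completion time for every participant. The function name and the flow times below are hypothetical and are not output from KAI Data Center Builder.

```python
import statistics


def collective_summary(flow_times_ms: list[float]) -> dict[str, float]:
    """Summarize one collective from its per-flow completion times (illustrative only)."""
    median_ms = statistics.median(flow_times_ms)
    # The collective completes when the last flow completes.
    completion_ms = max(flow_times_ms)
    return {
        "median_ms": median_ms,
        "completion_ms": completion_ms,
        "tail_ratio": completion_ms / median_ms,
    }


# Hypothetical measurements: four flows near 10 ms and one delayed by congestion.
summary = collective_summary([10.0, 10.2, 9.9, 10.1, 18.0])
print(summary)
```

A tail ratio well above 1 suggests that a few slow flows, rather than average throughput, are limiting the operation.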
The KAI Data Center Builder's new workload emulation capabilities enable AI operators, GPU
cloud providers, and infrastructure vendors to bring realistic AI workloads
into their lab setups to validate the evolving designs of AI clusters and new
components. They can also experiment to fine-tune model partitioning schemas,
parameters, and algorithms to optimize the infrastructure and improve AI
workload performance.
Ram Periakaruppan, Vice President and General Manager, Network Test & Security
Solutions, Keysight, said: "As AI infrastructure grows in scale and
complexity, the need for full-stack validation and optimization becomes
crucial. To avoid costly delays and rework, it's essential to shift validation
to earlier phases of the design and manufacturing cycle. KAI Data Center
Builder’s workload emulation brings a new level of realism to AI component and
system design, optimizing workloads for peak performance."
KAI Data Center Builder is the foundation of the Keysight Artificial Intelligence (KAI)
architecture, a portfolio of end-to-end solutions designed to help customers
scale artificial intelligence processing capacity in data centers by validating
AI cluster components using real-world AI workload emulation.