Microsoft has announced the launch of new Azure virtual machines (VMs) aimed specifically at enhancing cloud-based AI supercomputing capabilities.
The new ND H200 v5 series virtual machines are now generally available to Azure customers, enabling enterprises to address increasingly demanding AI workloads.
By leveraging the new VM series, users can boost foundation model inference and training capabilities, the tech giant revealed.
Scale, efficiency and performance
In a blog post, Microsoft said that a large number of customers and partners are already using the new series of virtual machines to boost artificial intelligence capabilities.
“The scale, efficiency and improved performance of our ND H200 v5 virtual machines are already driving customer adoption and adoption of Microsoft AI services, such as Azure Machine Learning and Azure OpenAI Service,” the company said.
Among them is OpenAI, which is using the new VM series to drive research and development and to fine-tune ChatGPT for users, according to Trevor Cai, the company's head of infrastructure.
“We are excited to adopt Azure's new H200 virtual machines,” he said. “We've seen that H200 offers improved performance with minimal portability effort; we look forward to using these virtual machines to accelerate our research, improve the ChatGPT experience, and further our mission.”
Under the hood of the H200 v5 series
The Azure ND H200 v5 VMs are designed with Microsoft's systems approach to “improve efficiency and performance,” the company said, and each includes eight Nvidia H200 Tensor Core GPUs.
Microsoft said this addresses a growing “gap” for business users regarding computing power.
Because GPUs' raw compute capability has grown faster than their attached memory capacity and memory bandwidth, a bottleneck has emerged for AI inference and model training, the tech giant said.
“Azure ND H200 v5 series virtual machines offer a 76% increase in high-bandwidth memory (HBM) to 141 GB and a 43% increase in HBM bandwidth to 4.8 TB/s compared to the previous generation of Azure ND H100 v5 virtual machines,” Microsoft said in its announcement.
“This increase in HBM bandwidth allows GPUs to access model parameters faster, helping to reduce overall application latency, which is a critical metric for real-time applications such as interactive agents.”
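Those percentages can be sanity-checked against the GPUs' published specifications. A minimal sketch, assuming the previous-generation ND H100 v5 used H100 GPUs with 80 GB of HBM and roughly 3.35 TB/s of bandwidth (Nvidia's published H100 SXM figures, not stated in this article):

```python
# Assumed per-GPU specs: H100 SXM (80 GB HBM, ~3.35 TB/s) vs
# H200 (141 GB HBM, 4.8 TB/s), per Nvidia's published datasheets.
h100_hbm_gb, h100_bw_tbs = 80, 3.35
h200_hbm_gb, h200_bw_tbs = 141, 4.8

# Percentage increases, generation over generation.
hbm_gain = (h200_hbm_gb / h100_hbm_gb - 1) * 100
bw_gain = (h200_bw_tbs / h100_bw_tbs - 1) * 100

print(f"HBM capacity gain:  {hbm_gain:.0f}%")   # ~76%
print(f"HBM bandwidth gain: {bw_gain:.0f}%")    # ~43%
```

Both results line up with the 76% and 43% figures Microsoft quotes.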
In addition, the new VM series can fit larger, more complex large language models (LLMs) within the memory of a single machine, the company said. This improves performance and lets users avoid the costly overhead of running distributed applications across multiple virtual machines.
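To see why the extra capacity matters for single-machine serving, consider a back-of-the-envelope weight-memory check. This is an illustrative sketch: the parameter count and bytes-per-parameter choices are assumptions for the example, not figures from Microsoft's announcement, and it ignores KV-cache and activation memory:

```python
def weights_fit(params_billion: float, bytes_per_param: int,
                gpus: int, hbm_per_gpu_gb: float) -> bool:
    """Return True if the model weights alone fit in aggregate GPU HBM."""
    weight_gb = params_billion * bytes_per_param  # 1e9 params * bytes = GB
    return weight_gb <= gpus * hbm_per_gpu_gb

# A 405B-parameter model in FP8 (1 byte/param) needs ~405 GB of weights.
print(weights_fit(405, 1, 8, 141))  # 8 x 141 GB = 1128 GB -> fits
print(weights_fit(405, 2, 8, 80))   # FP16 on 8 x 80 GB = 640 GB -> does not fit
```

Once weights no longer fit on one VM, tensors must be sharded across machines, adding the cross-VM communication overhead the article describes.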
Microsoft believes that better GPU memory management for model weights and batch sizes is also a key differentiator for the new series of VMs.
Current GPU memory limitations directly affect the performance and latency of LLM-based inference workloads and create additional costs for businesses.
By leveraging greater HBM capacity, H200 v5 virtual machines are able to support larger batch sizes, which Microsoft says dramatically improves GPU utilization and performance compared to previous iterations.
“In early testing, we saw up to a 35% performance increase with ND H200 v5 virtual machines compared to the ND H100 v5 series for inference workloads running the Llama 3.1 405B model (with world size 8, input length 128, output length 8, and maximum batch sizes of 32 for H100 and 96 for H200),” the company said.
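The batch-size effect can be sketched with a first-order model of memory-bound decoding: each decode step streams the full model weights from HBM once, and that cost is amortized across every sequence in the batch. All numbers below are illustrative assumptions, not Microsoft's benchmark setup, and real throughput gains (like the 35% above) are smaller than this idealized model suggests:

```python
# First-order sketch: per-token share of weight-streaming time for
# memory-bound decode. Assumes ~405 GB of weights (e.g. a 405B model
# in FP8) and 8 GPUs' aggregate HBM bandwidth; ignores compute, KV-cache
# reads, and communication, so it overstates real-world scaling.
def weight_read_ms_per_token(batch_size: int, weight_gb: float,
                             agg_bw_gbs: float) -> float:
    step_time_s = weight_gb / agg_bw_gbs   # GB / (GB/s) = seconds per step
    return step_time_s / batch_size * 1000  # amortized over the batch

agg_bw = 8 * 4800  # 8 x H200 at 4.8 TB/s each, in GB/s
small = weight_read_ms_per_token(32, 405, agg_bw)
large = weight_read_ms_per_token(96, 405, agg_bw)
print(f"batch 32: {small:.2f} ms/token, batch 96: {large:.2f} ms/token")
```

Tripling the batch size cuts the amortized weight-read cost per token by 3x in this idealized model, which is the mechanism behind the improved GPU utilization Microsoft cites.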