Customers have a lot of options when it comes to building their generative AI stacks to train, fine-tune, and run AI models. In some cases, the number of options can be overwhelming. To help simplify decision-making and shrink the all-important time it takes to train a first model, Nvidia offers DGX Cloud, which arrived on AWS last week.
Nvidia’s DGX systems are considered the gold standard for GenAI workloads, including training large language models (LLMs), fine-tuning them, and running inference workloads in production. DGX systems are equipped with the latest GPUs, including Nvidia H100s and H200s, as well as the company’s enterprise AI stack, including Nvidia Inference Microservices (NIMs), Riva, NeMo, and the RAPIDS frameworks, among other tools.
With its DGX Cloud offering, Nvidia is giving customers the same array of GenAI development and production capabilities that come with DGX systems, delivered via the cloud. It previously offered DGX Cloud on Microsoft Azure, Google Cloud, and Oracle Cloud Infrastructure, and last week at re:Invent 2024, it announced the availability of DGX Cloud on AWS.
“When you think about DGX Cloud, we’re offering a managed service that gets you the best of the best,” said Alexis Bjorlin, the vice president of DGX Cloud at Nvidia. “It’s more of an opinionated solution to optimize the AI performance and the pipelines.”
There’s a lot that goes into building a GenAI system beyond just requisitioning Nvidia GPUs, downloading Llama-3, and throwing some data at it. There are often additional steps, like data curation, fine-tuning of a model, and synthetic data generation, that a customer must integrate into an end-to-end AI workflow and protect with guardrails, Bjorlin said. Customers also face design questions: How much accuracy do you need? Do you need to shrink the models?
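To make those stages concrete, here is a minimal sketch of such an end-to-end workflow. Every function in it (curate, generate_synthetic, fine_tune, apply_guardrails) is a hypothetical placeholder written for illustration, not an Nvidia or DGX Cloud API; only the ordering of the stages is the point.

```python
# A hypothetical sketch of the end-to-end workflow described above.
# Every function here is a placeholder stub written for illustration,
# not an Nvidia or DGX Cloud API; only the ordering of stages matters.

def curate(raw_docs):
    # Data curation: deduplicate and drop empty records.
    return sorted({doc.strip() for doc in raw_docs if doc.strip()})

def generate_synthetic(dataset):
    # Synthetic data generation: in practice a teacher LLM would write
    # new examples; here we just tag rephrased copies.
    return [f"(synthetic) {doc}" for doc in dataset]

def fine_tune(base_model, dataset):
    # Fine-tuning: adapt a pre-trained model to the curated data.
    return {"base": base_model, "train_examples": len(dataset)}

def apply_guardrails(model):
    # Guardrails: wrap the tuned model with input/output safety checks.
    model["guardrails"] = ["toxicity_filter", "pii_filter"]
    return model

docs = ["support ticket A", "support ticket A", "product FAQ"]
data = curate(docs)
data += generate_synthetic(data)
print(apply_guardrails(fine_tune("llama-3-8b", data)))
```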
Nvidia has a significant amount of experience building these AI pipelines on a variety of different types of infrastructure, and it shares that experience with customers through its DGX Cloud service. That allows it to cut down on the complexity the customer is exposed to, thereby accelerating the GenAI development and deployment lifecycle, Bjorlin said.
“Getting up and running with time-to-first-train is a key metric,” Bjorlin told BigDATAwire in an interview last week at re:Invent. “How long does it take you to get up and fine tune a model and have a model that is your own customized model that you can then choose what you do with? That’s one of the metrics we hold ourselves accountable to: developer velocity.”
But the expertise extends beyond just getting that first training or fine-tuning workload up and running. With DGX Cloud, Nvidia can also provide expert assistance in some of the finer aspects of model development, such as optimizing the training routines, Bjorlin said.
“Sometimes we’re working with customers and they want more efficient training,” she said. “So they want to move from FP16 or BF16 to FP8. Maybe it’s the quantization of the data? How do you take and train a model and shard it across the infrastructure using four types of parallelism, whether it’s data parallel, pipeline parallel, model parallel, or expert parallel?
“We look at the model and we help architect…it to run on the infrastructure,” she continued. “All of this is fairly complex because you’re trying to do an overlap of both your compute and your comms and your memory timelines. So you’re trying to get the maximum efficiency. That’s why we’re offering outcome-based capabilities.”
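As a rough illustration of the precision side of that tuning, the PyTorch sketch below runs a few training steps under BF16 autocast, the kind of mixed-precision move Bjorlin describes as a stop on the way to FP8. It is a generic example, not DGX Cloud code; FP8 training in practice typically goes through Nvidia’s Transformer Engine on H100-class hardware, which is not shown here.

```python
# Generic PyTorch mixed-precision sketch (not DGX Cloud code). BF16 autocast
# is shown; FP8 would typically use Nvidia's Transformer Engine on H100s.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10)).to(device)
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

x = torch.randn(32, 512, device=device)
y = torch.randint(0, 10, (32,), device=device)

for step in range(3):
    opt.zero_grad()
    # Forward pass and loss run in BF16; master weights stay in FP32.
    with torch.autocast(device_type=device, dtype=torch.bfloat16):
        loss = nn.functional.cross_entropy(model(x), y)
    loss.backward()  # backward runs outside the autocast context
    opt.step()
    print(f"step {step}: loss={loss.item():.4f}")
```

The parallelism choices she lists are orthogonal to precision: data parallelism would wrap this same loop in torch.nn.parallel.DistributedDataParallel, while pipeline, model, and expert parallelism split the model itself across GPUs.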
With DGX Cloud running on AWS, Nvidia is supporting H100 GPUs running on EC2 P5 instances (in the future, DGX Cloud will also be supported on the new P6 instances that AWS announced at the conference). That will give customers of all sizes the processing oomph to train, fine-tune, and run some of the largest LLMs.
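For reference, the sketch below uses boto3 (the AWS SDK for Python) to look up the public specs of the p5.48xlarge instance type at the top of the P5 family. It assumes configured AWS credentials and is only a way to inspect the underlying hardware; DGX Cloud itself is provisioned as a managed service, not by launching raw EC2 instances.

```python
# Inspect the EC2 P5 instance type underlying DGX Cloud on AWS.
# Assumes AWS credentials are configured; queries public metadata only.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")
resp = ec2.describe_instance_types(InstanceTypes=["p5.48xlarge"])
info = resp["InstanceTypes"][0]
gpu = info["GpuInfo"]["Gpus"][0]
print(f'{info["InstanceType"]}: {gpu["Count"]}x {gpu["Manufacturer"]} {gpu["Name"]}, '
      f'{info["VCpuInfo"]["DefaultVCpus"]} vCPUs')
```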
A variety of customer types are using DGX Cloud on AWS: a few very large companies training foundation models, and a larger number of smaller firms fine-tuning pre-trained models with their own data, Bjorlin said. Nvidia needs to maintain the flexibility to accommodate all of them.
“More and more people are consuming compute through the cloud. And we need to be experts at understanding that to continually optimize our silicon, our systems, our data center scale designs and our software stack,” she said.
One of the advantages of using DGX Cloud, besides the time-to-first-train, is that customers can get access to a DGX system with as little as a one-month commitment. That’s beneficial for AI startups, such as members of Nvidia’s Inception program, that are still testing their AI ideas and perhaps aren’t ready to go into production.
Nvidia has 9,000 Inception partners, and having DGX Cloud available on AWS will help them succeed, Bjorlin said. “It’s a proving ground,” she said. “They get a lot of developers in a company saying, ‘I’m going to try out a few instances of DGX Cloud on AWS.’”
“Nvidia is a very developer-centric company,” she added. “Developers around the world are coding and working on Nvidia systems, and so it’s an easy way for us to bring them in and have them build an AI application, and then they can go and serve on AWS.”
Related Items:
Nvidia Introduces New Blackwell GPU for Trillion-Parameter AI Models
NVIDIA Is Increasingly the Secret Sauce in AI Deployments, But You Still Need Experience
The Generative AI Future Is Now, Nvidia’s Huang Says