NVIDIA AI Foundry: Creating custom Llama 3.1 generative AI models for global enterprises

Wallstreetcn
2024.07.24 04:37

Now, global enterprises can use NVIDIA's AI Foundry service to build "super models," training them on their own proprietary data or on synthetic data. Accenture is the first to use the new service to create custom models for its clients.

Author: Fang Jiayao

Source: Hard AI

Can companies customize their own "super models" and build generative AI applications tailored to their specific needs?

On Tuesday, July 23, NVIDIA announced two new offerings: NVIDIA AI Foundry and NVIDIA NIM™ inference microservices.

Previously, Meta released Llama 3.1, its largest open-source AI model to date, and NVIDIA AI Foundry will provide customization services for the Llama 3.1 models to enterprises worldwide. NVIDIA and Meta are collaborating to enhance the generative AI capabilities of global enterprises.

Core Services and Features

1) NVIDIA AI Foundry

Enterprises and countries can use the Llama 3.1 model, along with NVIDIA's software, computing power, and expertise, to create custom "super models" for their specific industries. These models can be trained using the enterprise's proprietary data as well as synthetic data generated from Llama 3.1 405B and NVIDIA Nemotron™ reward models.

2) NVIDIA NIM Inference Microservices

The NIM inference microservices for the Llama 3.1 models are now available for download, delivering up to 2.5 times higher inference throughput than running the models without NIM. Enterprises can combine the Llama 3.1 NIM microservices with NVIDIA NeMo Retriever NIM microservices to create state-of-the-art retrieval pipelines for AI assistants and digital human avatars.

These services are supported by the NVIDIA DGX™ Cloud AI platform, developed in collaboration with leading global public cloud service providers, providing enterprises with powerful computing resources and the ability to scale on-demand based on AI requirements.

Llama 3.1 is a series of generative AI models publicly released by Meta. These models are open source and can be used by enterprises and developers to build advanced generative AI applications. The Llama 3.1 collection includes models with 8 billion, 70 billion, and 405 billion parameters, trained on more than 16,000 NVIDIA H100 Tensor Core GPUs and optimized for data centers, the cloud, and local devices.

By pairing the Llama 3.1 NIM microservices with the new NVIDIA NeMo Retriever NIM microservices, enterprises can build cutting-edge retrieval pipelines for a range of AI application scenarios, such as AI copilots, intelligent assistants, and digital human avatars, while significantly improving the deployment and operational efficiency of the Llama 3.1 models in production environments.
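To make the microservice idea concrete: NIM endpoints speak an OpenAI-compatible HTTP API, so a client interacts with a deployed Llama 3.1 NIM much like any chat-completion service. The sketch below only builds the request payload a client would POST; the endpoint URL and model name are illustrative assumptions, not values from the article.

```python
import json

# Hypothetical local NIM endpoint (assumption for illustration).
NIM_ENDPOINT = "http://localhost:8000/v1/chat/completions"

def build_chat_request(user_prompt: str,
                       model: str = "meta/llama-3.1-8b-instruct",
                       max_tokens: int = 256,
                       temperature: float = 0.2) -> str:
    """Return the JSON body a client would POST to an OpenAI-compatible
    NIM endpoint. No network call is made here."""
    payload = {
        "model": model,  # assumed model identifier
        "messages": [{"role": "user", "content": user_prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return json.dumps(payload)

body = build_chat_request("Summarize our data-retention policy.")
print(json.loads(body)["model"])
```

In practice the same payload could be sent with any HTTP client; keeping payload construction separate from transport makes it easy to swap the endpoint between a local container and a cloud deployment.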

Meeting the AI Needs of Enterprises and Countries

Currently, many companies and countries are looking to customize large-scale language models to build generative AI applications with domain-specific knowledge and localization features. For example, medical companies need AI models to understand medical terms and practices, while financial companies require AI models with expertise in the financial field.

Companies in sectors such as healthcare, energy, financial services, retail, transportation, and telecommunications have started using NVIDIA NIM microservices to support their Llama deployments. The first companies to use the new NIM microservices with Llama 3.1 include Aramco, AT&T, and Uber.

NVIDIA's founder and CEO Jensen Huang stated:

"The release of Meta's Llama 3.1 model is a crucial moment for global enterprises adopting generative AI. Llama 3.1 opens the door for every company and industry to create the most advanced generative AI applications. NVIDIA's AI Foundry fully integrates Llama 3.1, enabling companies to build and deploy customized Llama supermodels at any time."

Meta's founder and CEO Mark Zuckerberg said:

"The new Llama 3.1 model is an extremely important step for open-source AI. With NVIDIA's AI Foundry, companies can easily create and customize the most advanced AI services people want, and deploy them through NVIDIA NIM. I am excited to put this into everyone's hands."

Success Stories of Early Adopters

Global professional services company Accenture was the first to adopt NVIDIA AI Foundry, using its AI Refinery™ framework to build customized Llama 3.1 models for internal use and client services. Julie Sweet, Chair and CEO of Accenture, said:

"Generative AI is transforming industries, and companies are eager to deploy applications driven by custom models. With NVIDIA AI Foundry, we can help clients quickly create and deploy customized Llama 3.1 models, driving transformative AI applications."

Comprehensive Support from NVIDIA AI Foundry

1) End-to-End Services and Partnerships

NVIDIA AI Foundry is a comprehensive service platform that provides companies with the ability to quickly build AI models by integrating NVIDIA's technical resources and the power of the open community.

Companies can choose or customize a Llama 3.1 model through this service and develop their AI models using the NVIDIA NeMo platform and the Nemotron-4 340B model, which ranks first on the Hugging Face RewardBench leaderboard.

After development, companies can create NIM inference microservices to deploy and run these custom models on their preferred cloud platforms and NVIDIA-Certified Systems in support of business operations. NVIDIA provides expert support and a partner ecosystem to help companies accelerate the entire process from AI model development to production deployment.

2) NVIDIA Nemotron supports advanced model customization

Companies requiring additional training data can use Llama 3.1 405B and Nemotron-4 340B in combination to generate synthetic data, improving the accuracy of models in specific domains. Customers with their own training data can further enhance model accuracy by using NVIDIA NeMo for domain adaptive pre-training (DAPT) on the Llama 3.1 model.
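The synthetic-data workflow above amounts to a generate-then-filter loop: a large model proposes candidate examples, and a reward model keeps only the ones that score well. A minimal conceptual sketch follows; the filtering function, threshold, and toy reward are assumptions for illustration, not NVIDIA APIs (a real pipeline would call Nemotron-4 340B as the reward model).

```python
from typing import Callable, List, Tuple

def filter_synthetic_pairs(
    pairs: List[Tuple[str, str]],           # (prompt, generated_response)
    score_fn: Callable[[str, str], float],  # reward model: higher is better
    threshold: float = 0.7,
) -> List[Tuple[str, str]]:
    """Keep only generated examples the reward model scores at or above
    the threshold; the survivors become fine-tuning data."""
    return [(p, r) for p, r in pairs if score_fn(p, r) >= threshold]

# Toy stand-in reward: crudely prefers longer, more substantive answers.
def toy_reward(prompt: str, response: str) -> float:
    return min(len(response) / 40.0, 1.0)

data = [
    ("What is DAPT?", "Domain-adaptive pre-training continues training on in-domain text."),
    ("What is DAPT?", "No idea."),
]
kept = filter_synthetic_pairs(data, toy_reward)
print(len(kept))  # 1
```

The design point is that generation and scoring stay decoupled: the same filter works whether the scores come from a toy heuristic or a 340B-parameter reward model.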

The collaboration between NVIDIA and Meta offers a way for developers to create smaller, more efficient Llama 3.1 models that can be deployed on various devices, including AI workstations and laptops.

3) NeMo Retriever microservice for improved retrieval accuracy

By using the new NVIDIA NeMo Retriever NIM inference microservice for retrieval-augmented generation (RAG), organizations can enhance response accuracy when deploying customized Llama super models in production. This microservice provides the highest retrieval accuracy for open and commercial text question-answering.
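A retrieval-augmented generation pipeline of this kind has two stages: retrieve the most relevant passages, then assemble them with the question into a grounded prompt for the model. The sketch below uses a toy word-overlap ranker as a stand-in for the NeMo Retriever embedding service, and the prompt template is an assumption for illustration.

```python
from typing import List

def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Rank documents by word overlap with the query and return the top k.
    (A real RAG pipeline would use embedding similarity instead.)"""
    q = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def build_rag_prompt(query: str, docs: List[str]) -> str:
    """Assemble retrieved context and the question into a single LLM prompt."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

corpus = [
    "NIM packages models as containerized inference microservices.",
    "Llama 3.1 comes in 8B, 70B, and 405B parameter sizes.",
    "DGX Cloud provides on-demand AI compute.",
]
prompt = build_rag_prompt("What sizes does Llama 3.1 come in?", corpus)
```

Grounding the model's answer in retrieved passages, rather than its parametric memory alone, is what improves response accuracy for the deployed custom model.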

4) Extensive enterprise ecosystem support

NVIDIA NIM partners can integrate the new microservices into their AI solutions, bringing generative AI enhancements to more than 5 million developers and 19,000 startups. NVIDIA AI Enterprise offers production support for the Llama 3.1 NIM and NeMo Retriever NIM microservices. Members of the NVIDIA Developer Program will soon have free access to NIM microservices for research, development, and testing on their preferred infrastructure.