Exploring the Powerful Features of NVIDIA NeMo for Generative AI
Hey everyone,
In this post, we'll delve into the powerful capabilities of NVIDIA NeMo, NVIDIA's platform for building and deploying generative AI. Its models are served through NVIDIA NIM, a set of inference microservices that is changing how enterprises deploy generative AI. Whether you're working with large language models (LLMs) or multi-modal models, the platform offers a robust foundation for seamless integration and high scalability. Let's explore its features and see it in action through some coding examples.
Introduction to NVIDIA NeMo
NVIDIA NeMo is a versatile platform that supports various AI models, including LLMs and multi-modal models, and it provides access to NVIDIA AI Foundation models through hosted API endpoints. With these APIs, you can easily integrate the models into your applications, making the platform scalable and efficient for enterprise use.
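For a quick taste of that API-first workflow, here is a minimal sketch using the langchain_nvidia_ai_endpoints package (installed in the setup below) with the same model this post uses throughout; it assumes NVIDIA_API_KEY is already set in your environment:

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# Assumes NVIDIA_API_KEY is set in the environment (setup steps below)
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
print(llm.invoke("What is NVIDIA NeMo?").content)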
Key Features:
- Inference Microservices: Deploy AI models behind standard APIs with NVIDIA NIM.
- Multi-Modal Models: Support for both text and image inputs.
- High Scalability: Optimized for enterprise workloads.
- NVIDIA AI Foundation Models: Access to powerful pre-trained models such as Llama 3 (see the sketch below for listing what's available).
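If you want to see which Foundation models your key can reach, recent versions of langchain_nvidia_ai_endpoints expose a discovery helper; a small sketch (the exact method availability depends on your installed library version):

from langchain_nvidia_ai_endpoints import ChatNVIDIA

# List the models exposed by the NVIDIA endpoints
# (assumes NVIDIA_API_KEY is set; availability depends on the library version)
for model in ChatNVIDIA.get_available_models():
    print(model.id)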
Getting Started with NVIDIA NeMo
To get started, follow these steps to set up your environment and explore the capabilities of NVIDIA NeMo.
Step 1: Create an API Key
First, create an API key by visiting build.nvidia.com (the NVIDIA API Catalog). This key will authenticate your access to the NVIDIA AI Foundation endpoints.
Step 2: Set Up the Environment
Create a virtual environment and install the required libraries.
conda create -n venv python=3.10
conda activate venv
Create a requirements.txt file with the following content:
openai
langchain_nvidia_ai_endpoints
langchain_community
faiss-cpu
python-dotenv
streamlit
pypdf
Install the required libraries:
pip install -r requirements.txt
Set up your API key in a .env file:
NVIDIA_API_KEY=your_api_key_here
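Before writing any model code, it's worth confirming the key actually loads; a minimal sanity check using python-dotenv (already in requirements.txt):

import os
from dotenv import load_dotenv

# Read the .env file and confirm the key is visible to the process
load_dotenv()
assert os.getenv("NVIDIA_API_KEY"), "NVIDIA_API_KEY not found - check your .env file"
print("API key loaded.")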
Example 1: Using NVIDIA NeMo for Text Generation
Let's start with a simple text generation example using NVIDIA NeMo.
Step 1: Create the app.py File
Create a file named app.py and add the following code:
from openai import OpenAI
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Configure the client with the NVIDIA endpoint and API key
client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",
    api_key=os.getenv("NVIDIA_API_KEY")
)

# Create a streaming chat completion
completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.5,
    top_p=1,
    max_tokens=1024,
    stream=True
)

# Print the response as it streams in
for chunk in completion:
    if chunk.choices[0].delta.content is not None:
        print(chunk.choices[0].delta.content, end="")
Step-by-Step Explanation
- Import Libraries: We import openai for making API calls, os for accessing environment variables, and dotenv for loading environment variables from a .env file.
- Load Environment Variables: We load the environment variables using load_dotenv(), which reads the .env file and sets the variables in the environment.
- Configure the Client: We create an OpenAI client with the API key from the environment variables and point it at the NVIDIA endpoint via base_url.
- Create the Chat Completion: We call client.chat.completions.create with the model name and a prompt, using parameters like max_tokens, temperature, and top_p to control the output, and stream=True to receive the response incrementally (a non-streaming variant is sketched after this list).
- Print the Response: We send the prompt ("hello") and print the response chunk by chunk as it streams in.
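If you don't need token-by-token output, the same call works without streaming and returns the whole message at once; a minimal variant of the call above:

# Non-streaming variant: the full response arrives in one object
completion = client.chat.completions.create(
    model="meta/llama3-70b-instruct",
    messages=[{"role": "user", "content": "hello"}],
    temperature=0.5,
    top_p=1,
    max_tokens=1024,
    stream=False
)
print(completion.choices[0].message.content)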
Step 2: Run the App
Run the script using the following command:
python app.py
You should see the generated text output in the console.
Example 2: Building an End-to-End Application
Next, let's build an end-to-end application that combines text and image processing using NVIDIA NeMo.
Step 1: Create the final_app.py File
Create a file named final_app.py and add the following code:
import streamlit as st
import os
from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain.chains import create_retrieval_chain
from langchain_community.vectorstores import FAISS
from dotenv import load_dotenv
import time

# Load environment variables
load_dotenv()

# Make the NVIDIA API key available to the client libraries
os.environ['NVIDIA_API_KEY'] = os.getenv("NVIDIA_API_KEY")

# Function to create vector embeddings
def vector_embedding():
    if "vectors" not in st.session_state:
        st.session_state.embeddings = NVIDIAEmbeddings()
        st.session_state.loader = PyPDFDirectoryLoader("./us_census")  # Data ingestion
        st.session_state.docs = st.session_state.loader.load()  # Document loading
        st.session_state.text_splitter = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50)  # Chunk creation
        st.session_state.final_documents = st.session_state.text_splitter.split_documents(st.session_state.docs[:30])  # Splitting
        st.session_state.vectors = FAISS.from_documents(st.session_state.final_documents, st.session_state.embeddings)  # FAISS vector store built from NVIDIA embeddings

# Streamlit app setup
st.title("Nvidia NIM Demo")
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")

prompt = ChatPromptTemplate.from_template(
    """
    Answer the questions based on the provided context only.
    Please provide the most accurate response based on the question.
    <context>
    {context}
    </context>
    Questions: {input}
    """
)

prompt1 = st.text_input("Enter Your Question From Documents")

if st.button("Documents Embedding"):
    vector_embedding()
    st.write("Vector Store DB Is Ready")

if prompt1:
    document_chain = create_stuff_documents_chain(llm, prompt)
    retriever = st.session_state.vectors.as_retriever()
    retrieval_chain = create_retrieval_chain(retriever, document_chain)
    start = time.process_time()
    response = retrieval_chain.invoke({'input': prompt1})
    st.write(f"Response time: {time.process_time() - start}")
    st.write(response['answer'])

    # With a Streamlit expander
    with st.expander("Document Similarity Search"):
        # Show the relevant chunks retrieved for the answer
        for i, doc in enumerate(response["context"]):
            st.write(doc.page_content)
            st.write("--------------------------------")
Step-by-Step Explanation
- Import Libraries: We import streamlit for the web interface, os for accessing environment variables, dotenv for loading environment variables, and the LangChain components for document loading, splitting, embedding, and retrieval.
- Load Environment Variables: We load the environment variables using load_dotenv().
- Load the API Key: We set NVIDIA_API_KEY from the environment variables.
- Define the Function: vector_embedding() loads the PDF documents from ./us_census, splits the first 30 loaded pages into smaller chunks, creates vector embeddings with NVIDIAEmbeddings, and stores them in a FAISS index kept in Streamlit's session state.
- Streamlit App Setup: We set up the Streamlit app with a title, a text input, and a button. Clicking "Documents Embedding" calls vector_embedding() to build the vector store. When a question is entered, the retrieval chain runs, the answer is displayed, and the relevant document chunks are shown in an expander. The same pipeline also works outside Streamlit; see the standalone sketch after this list.
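For reference, the same retrieval pipeline can run outside Streamlit. Here is a minimal standalone sketch; it assumes the same ./us_census PDF directory and .env setup used above, and the sample question is purely illustrative:

from langchain_nvidia_ai_endpoints import NVIDIAEmbeddings, ChatNVIDIA
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.chains.combine_documents import create_stuff_documents_chain
from langchain.chains import create_retrieval_chain
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.vectorstores import FAISS
from dotenv import load_dotenv

load_dotenv()

# Build the vector store once
docs = PyPDFDirectoryLoader("./us_census").load()
chunks = RecursiveCharacterTextSplitter(chunk_size=700, chunk_overlap=50).split_documents(docs)
vectors = FAISS.from_documents(chunks, NVIDIAEmbeddings())

# Wire up the same chain the Streamlit app uses
llm = ChatNVIDIA(model="meta/llama3-70b-instruct")
prompt = ChatPromptTemplate.from_template(
    "Answer based on the provided context only.\n<context>\n{context}\n</context>\nQuestions: {input}"
)
chain = create_retrieval_chain(vectors.as_retriever(), create_stuff_documents_chain(llm, prompt))

# Illustrative question against the indexed documents
print(chain.invoke({"input": "What does the data say about median household income?"})["answer"])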
Step 2: Run the App
Run the Streamlit app using the following command:
streamlit run final_app.py
You can now interact with the app to create vector embeddings and get responses to your queries.
Conclusion
NVIDIA NeMo, served through NIM inference microservices, offers a powerful path for deploying generative AI models with high scalability and straightforward integration. By following this guide, you can explore these capabilities and build versatile applications for a range of tasks. Stay tuned for more tutorials and examples as we continue to explore the world of generative AI.
Thank you for reading, and happy coding!