Exploring Gemini Pro: A Comprehensive Guide to Using Multi-Modal Language Models

- June 03, 2024

Hey everyone,

In this post, we'll dive into the powerful capabilities of Gemini Pro, a multi-modal language model that supports both text and image inputs. We'll explore how you can leverage this model to create end-to-end applications for various tasks, including text summarization, Q&A, and image analysis. Let's get started!

Introduction to Gemini Pro

Gemini Pro is a multi-modal language model that can process both text and images, which makes it suitable for a wide range of applications. Whether you want to build a text summarization tool, a Q&A system, or an image analysis app, Gemini Pro has you covered. In the Python SDK used below, text-only requests go to the "gemini-pro" model and requests that include images go to "gemini-pro-vision". The best part? At the time of writing, it is available for free, with a limit of 60 requests per minute.
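If several users share one key through your app, it helps to stay under the 60-requests-per-minute limit on the client side. Below is a minimal sketch of a sliding-window limiter, assuming a single-threaded process. SlidingWindowLimiter is a hypothetical helper, not part of the google-generativeai SDK.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any `period`-second window.

    Hypothetical helper for illustration; not part of google-generativeai.
    Not thread-safe.
    """

    def __init__(self, max_calls=60, period=60.0, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._calls = deque()  # timestamps of calls inside the current window

    def _evict(self, now):
        # Drop timestamps that have fallen out of the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()

    def acquire(self):
        """Block until another call is allowed, then record it."""
        now = self._clock()
        self._evict(now)
        if len(self._calls) >= self.max_calls:
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self.period - (now - self._calls[0]))
            now = self._clock()
            self._evict(now)
        self._calls.append(now)
```

You would call limiter.acquire() right before model.generate_content(...). The clock and sleep parameters exist only so the behavior can be checked without real waiting.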

Setting Up Your Environment

To get started, you'll need a Gemini API key, which you can create in Google AI Studio. Once you have it, follow these steps to set up your environment:

Step 1: Create a Virtual Environment

First, create a virtual environment. This guide uses Python 3.10; any recent Python 3 release supported by the google-generativeai package should also work.

conda create -n venv python=3.10
conda activate venv

Step 2: Create a Requirements File

Create a requirements.txt file with the following content:

streamlit
google-generativeai
python-dotenv
pillow

Step 3: Install Required Libraries

Install the required libraries using the following command:

pip install -r requirements.txt
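Before writing any app code, it can be worth confirming that the packages are importable from the active environment. Here is a small sketch using only the standard library. missing_modules is a hypothetical helper, and the list uses import names, which differ from the pip names for python-dotenv (dotenv) and Pillow (PIL, which vision.py imports later).

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the module names from `names` that cannot be found (hypothetical helper)."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # find_spec("a.b") raises when the parent package "a" is not installed.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Import names, not pip names: python-dotenv -> dotenv, pillow -> PIL.
    required = ["streamlit", "google.generativeai", "dotenv", "PIL"]
    print("Python", sys.version.split()[0])
    print("Missing modules:", missing_modules(required) or "none")
```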

Step 4: Set Up Environment Variables

Create a .env file in your project directory and add your API key:

GOOGLE_API_KEY=your_api_key_here
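os.getenv returns None when GOOGLE_API_KEY is missing, so a typo in .env leads to a confusing failure further down instead of a clear message at startup. A minimal fail-fast sketch follows; require_api_key and the placeholder comparison are assumptions for illustration, not part of python-dotenv or the Gemini SDK.

```python
from collections.abc import Mapping

# The placeholder value used in the .env example above.
PLACEHOLDER = "your_api_key_here"


def require_api_key(env: Mapping, var: str = "GOOGLE_API_KEY") -> str:
    """Return the API key from `env`, or raise a readable error (hypothetical helper)."""
    key = (env.get(var) or "").strip()
    if not key:
        raise RuntimeError(f"{var} is not set; add it to your .env file")
    if key == PLACEHOLDER:
        raise RuntimeError(f"{var} still holds the placeholder value")
    return key
```

In app.py you would call it after load_dotenv(), for example genai.configure(api_key=require_api_key(os.environ)).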

Building a Text-Based Application

Let's start by building a simple text-based application using Streamlit and Gemini Pro.

Step 1: Create the App File

Create a file named app.py and add the following code:

import streamlit as st
import os
import google.generativeai as genai
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Configure the API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Function to get Gemini Pro response
def get_gemini_response(question):
    model = genai.GenerativeModel("gemini-pro")
    response = model.generate_content(question)
    return response.text

# Streamlit app setup
st.set_page_config(page_title="Gemini Pro Text Application")
st.header("Gemini Pro LM Application")

input_text = st.text_input("Ask a question:")
submit = st.button("Submit")

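if submit:
    if input_text.strip():
        response = get_gemini_response(input_text)
        st.subheader("Response:")
        st.write(response)
    else:
        # Avoid sending an empty prompt to the API
        st.warning("Please enter a question first.")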
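Requests can fail transiently, for example when you exceed the per-minute quota. Below is a minimal retry sketch with exponential backoff and jitter. call_with_retries is a hypothetical helper, and the caller must say which exception types are worth retrying.

```python
import random
import time


def call_with_retries(fn, *args, retry_on, attempts=3, base_delay=1.0,
                      sleep=time.sleep, **kwargs):
    """Call fn(*args, **kwargs), retrying `retry_on` errors with exponential backoff.

    Hypothetical helper for illustration.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return fn(*args, **kwargs)
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            # Delays of about 1s, 2s, 4s, ... plus jitter so clients don't retry in lockstep.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay / 2))
```

Assuming quota errors surface as google.api_core.exceptions.ResourceExhausted (the google-api-core class for HTTP 429), the app could call call_with_retries(get_gemini_response, input_text, retry_on=(ResourceExhausted,)). Making retry_on a required argument avoids silently retrying programming errors.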

Step 2: Run the App

Run the Streamlit app using the following command:

streamlit run app.py

You can now interact with Gemini Pro by asking questions and receiving responses.
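The introduction mentions summarization and Q&A. With the text app above, both come down to wrapping the user's input in an instruction before passing it to get_gemini_response. Here is a sketch with hypothetical templates; build_prompt and the template wording are assumptions you would tune for your own use case.

```python
# Hypothetical prompt templates; adjust the wording for your own use case.
TEMPLATES = {
    "summarize": "Summarize the following text in {n} bullet points:\n\n{text}",
    "qa": (
        "Answer the question using only the context below. "
        "If the answer is not in the context, say so.\n\n"
        "Context:\n{text}\n\nQuestion: {question}"
    ),
}


def build_prompt(task, text, question=None, n=3):
    """Wrap user input in a task-specific instruction (hypothetical helper)."""
    if task == "summarize":
        return TEMPLATES["summarize"].format(n=n, text=text.strip())
    if task == "qa":
        if not question:
            raise ValueError("the 'qa' task needs a question")
        return TEMPLATES["qa"].format(text=text.strip(), question=question.strip())
    raise ValueError(f"unknown task: {task!r}")
```

For example, get_gemini_response(build_prompt("summarize", article_text)), where article_text is whatever the user pasted in; a st.selectbox could let users choose the task.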

Building an Image-Based Application

Next, let's build an application that analyzes images using Gemini Pro Vision.
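Model names such as gemini-pro and gemini-pro-vision are tied to specific model versions and may be retired over time, so it helps to check which names your key can actually use. The SDK's genai.list_models() returns model objects with name and supported_generation_methods attributes. The hypothetical helper below keeps the ones that support generateContent; the demo uses stand-in objects so it runs offline.

```python
from types import SimpleNamespace


def generate_content_models(models):
    """Return names of models that list generateContent among their methods (hypothetical helper)."""
    return [
        m.name
        for m in models
        if "generateContent" in getattr(m, "supported_generation_methods", [])
    ]


if __name__ == "__main__":
    # Offline stand-ins shaped like the objects genai.list_models() yields.
    fake_models = [
        SimpleNamespace(name="models/gemini-pro",
                        supported_generation_methods=["generateContent", "countTokens"]),
        SimpleNamespace(name="models/embedding-001",
                        supported_generation_methods=["embedContent"]),
    ]
    print(generate_content_models(fake_models))
```

With a configured key, print(generate_content_models(genai.list_models())) lists the live model names.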

Step 1: Create the Vision App File

Create a file named vision.py and add the following code:

import streamlit as st
import os
from PIL import Image
import google.generativeai as genai
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Configure the API key
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))

# Function to get Gemini Pro Vision response
def get_gemini_response(image, input_text=None):
    model = genai.GenerativeModel("gemini-pro-vision")
    if input_text:
        response = model.generate_content([input_text, image])
    else:
        response = model.generate_content(image)
    return response.text

# Streamlit app setup
st.set_page_config(page_title="Gemini Pro Image Application")
st.header("Gemini Pro Vision Application")

input_text = st.text_input("Ask something about the image (optional; leave blank for a general analysis):")
uploaded_file = st.file_uploader("Choose an image...", type=["jpg", "jpeg", "png"])

if uploaded_file is not None:
    image = Image.open(uploaded_file)
    st.image(image, caption='Uploaded Image.', use_column_width=True)

    if st.button("Analyze Image"):
        response = get_gemini_response(image, input_text)
        st.subheader("Response:")
        st.write(response)
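Decoding the upload with PIL works, but the SDK also accepts inline image data as a dictionary with mime_type and data keys, which skips the decode step. Below is a minimal sketch, assuming that dict shape (it matches the SDK's documented examples, but check it against your installed version). image_part is a hypothetical helper.

```python
import mimetypes


def image_part(filename, data, mime_type=None):
    """Build an inline image part {"mime_type": ..., "data": ...} (hypothetical helper).

    Uses `mime_type` if given, otherwise guesses it from the file extension.
    """
    mime = mime_type or mimetypes.guess_type(filename)[0]
    if mime is None or not mime.startswith("image/"):
        raise ValueError(f"cannot determine an image MIME type for {filename!r}")
    return {"mime_type": mime, "data": data}
```

In vision.py this could look like part = image_part(uploaded_file.name, uploaded_file.getvalue(), uploaded_file.type), followed by model.generate_content([input_text, part]) when a prompt is given.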

Step 2: Run the Vision App

Run the Streamlit app using the following command:

streamlit run vision.py

You can now upload images and analyze them using Gemini Pro Vision.

Conclusion

Gemini Pro is a powerful multi-modal language model that can handle both text and image inputs. By following this guide, you can create versatile applications that leverage the capabilities of Gemini Pro for various tasks. Stay tuned for more tutorials where we will explore advanced use cases and dive deeper into the world of generative AI.

Thank you for reading, and happy coding!

Revanth Tech Trends