Utilizing MLflow for Generative AI Applications



Hello everyone!

In this blog post, we will explore how to leverage MLflow with generative AI models, specifically focusing on large language models (LLMs). MLflow is an open-source platform that helps manage the end-to-end lifecycle of machine learning projects, including experiment tracking, visualization, evaluation, model registry, and serving. We'll also discuss how to integrate DagsHub for remote experiment tracking. By the end of this post, you'll understand how to work with generative AI applications using MLflow and track various performance metrics.

What is MLflow?

MLflow is a robust platform designed to manage the entire lifecycle of machine learning projects. It offers several key features, and a short tracking example follows the list:

  • Experiment Tracking: Track and compare different models and their performance metrics.
  • Model Registry: Register and version your models.
  • Model Serving: Deploy models as REST APIs.
  • Visualization and Evaluation: Visualize and evaluate model performance.
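
If you are new to MLflow, the tracking API boils down to a handful of calls. The sketch below is a minimal, self-contained example (the experiment, parameter, and metric names are purely illustrative); after running it, the mlflow ui command opens a local dashboard where the run can be browsed.

import mlflow

# Create (or reuse) an experiment and log a single run to it
mlflow.set_experiment("quickstart")

with mlflow.start_run():
    # Illustrative parameter and metric values
    mlflow.log_param("temperature", 0.7)
    mlflow.log_metric("accuracy", 0.91)

# Inspect the run locally with: mlflow ui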

Setting Up Your Environment

First, let's set up our environment and install the necessary libraries. We'll use Google Colab for this demonstration, but you can also run it locally.


Step 1: Install Required Libraries

Create a requirements.txt file with the following content:

langchain==0.1.16
langchain_community==0.0.33
langchain_openai==0.0.8
openai==1.12.0
tiktoken==0.6.0
mlflow==2.12.1
faiss-cpu
starlette
fastapi
watchfiles
slowapi
gunicorn
waitress
python-dotenv
dagshub

Then, install the required libraries using the following command:

pip install -r requirements.txt

Step 2: Import Libraries and Initialize Variables

Import the necessary libraries, load your OpenAI API key (from a .env file rather than hard-coding it), and set the experiment name.

import os
import time

import dagshub
import mlflow
import openai
import pandas as pd
from dotenv import load_dotenv
from openai import OpenAI

# Load the OpenAI API key from a .env file (OPENAI_API_KEY=...)
load_dotenv()
client = OpenAI(api_key=os.getenv("OPENAI_API_KEY"))

# Initialize the MLflow experiment
experiment_name = "LLM_Evaluation"
mlflow.set_experiment(experiment_name)

Step 3: Create Sample Test Data

Create a DataFrame with sample inputs and expected outputs to evaluate your LLM. For this example, we'll use simple questions and answers.

# Sample test data
sample_data = pd.DataFrame({
    'inputs': [
        'What is the capital of France?', 
        'Who wrote "To Kill a Mockingbird"?'
    ],
    'expected_outputs': [
        'The capital of France is Paris.', 
        '"To Kill a Mockingbird" was written by Harper Lee.'
    ]
})
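
MLflow can also record the test data itself alongside a run. The following is an optional sketch (it assumes MLflow 2.4 or newer, and the dataset name is a made-up label); in practice you would call log_input inside the run defined in Step 4.

# Wrap the sample data as an MLflow dataset; "targets" names the
# reference-answer column, and the dataset name is illustrative
dataset = mlflow.data.from_pandas(
    sample_data, targets="expected_outputs", name="llm_sample_questions"
)

with mlflow.start_run():
    mlflow.log_input(dataset, context="evaluation")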

Step 4: Define the Experiment

Define the experiment using MLflow's tracking capabilities. The run below logs the model once, then loops over the test questions, calls OpenAI's Chat Completions API (the openai>=1.0 client), and logs each input, expected output, and generated answer as run parameters.

with mlflow.start_run() as run:
    # System prompt shared by every test question
    system_prompt = "Answer the following question in one sentence."
    model_name = "gpt-4"

    # Log the model once for this run
    mlflow.openai.log_model(
        model=model_name,
        task=openai.chat.completions,
        artifact_path="model"
    )

    # Collect the results for each test question
    results = []

    for index, row in sample_data.iterrows():
        user_prompt = row['inputs']
        expected_output = row['expected_outputs']

        # Generate a response with the Chat Completions API and time the call
        start = time.perf_counter()
        response = client.chat.completions.create(
            model=model_name,
            messages=[
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": user_prompt}
            ],
            max_tokens=50
        )
        latency = time.perf_counter() - start

        # Extract the generated text
        generated_text = response.choices[0].message.content.strip()

        # Log the question, the reference answer, and the model's answer
        mlflow.log_param(f"input_{index}", user_prompt)
        mlflow.log_param(f"expected_output_{index}", expected_output)
        mlflow.log_param(f"generated_text_{index}", generated_text)

        # Store the result, including the measured latency
        results.append({
            'input': user_prompt,
            'expected_output': expected_output,
            'generated_text': generated_text,
            'latency': latency
        })

Step 5: Evaluate the Model

Evaluate the model using metrics such as similarity, latency, and readability. Similarity and readability are left as placeholders here, while latency is the average response time measured in Step 4. Re-opening the run from Step 4 keeps the metrics and the CSV artifact attached to the same MLflow run.

# Convert results to a DataFrame for evaluation
results_df = pd.DataFrame(results)

# Example evaluation metrics (similarity and readability are placeholders;
# latency is the average response time measured in Step 4)
metrics = {
    'similarity': 0.95,  # Placeholder value
    'latency': results_df['latency'].mean(),
    'readability': 10  # Placeholder value
}

# Re-open the run from Step 4 so everything attaches to the same run
with mlflow.start_run(run_id=run.info.run_id):
    # Log the metrics
    for metric, value in metrics.items():
        mlflow.log_metric(metric, value)

    # Save evaluation results to a CSV file and attach it as an artifact
    evaluation_results = pd.DataFrame([metrics])
    evaluation_results.to_csv("evaluation_results.csv", index=False)
    mlflow.log_artifact("evaluation_results.csv")
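
The metrics above are hand-rolled; MLflow also ships a built-in evaluation API that can score a static table of questions, reference answers, and generated answers. The following is a sketch rather than a drop-in step: it assumes MLflow 2.8 or newer, and some of the built-in metrics (for example readability and toxicity scores) require optional packages such as evaluate and textstat to be installed.

# Score the generated answers with MLflow's built-in evaluator
with mlflow.start_run():
    eval_results = mlflow.evaluate(
        data=results_df,
        targets="expected_output",
        predictions="generated_text",
        model_type="question-answering"
    )
    print(eval_results.metrics)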

Step 6: Integrate DagsHub for Remote Tracking

Integrate DagsHub to store and visualize your experiment results remotely. Instead of a separate logger object, the dagshub client can point MLflow's tracking directly at your repository: calling dagshub.init with mlflow=True sets the MLflow tracking URI to the repository's hosted MLflow server, so every run from Steps 4 and 5 shows up on DagsHub. The call belongs before mlflow.set_experiment(), so in practice you would add it to Step 2 and re-run the experiment.

# Point MLflow at your DagsHub repository's hosted MLflow tracking server.
# Run this near the top of the script (before mlflow.set_experiment and
# mlflow.start_run) so that the parameters, metrics, and artifacts from
# Steps 4 and 5 are logged to DagsHub instead of the local mlruns folder.
dagshub.init(
    repo_owner='YOUR_DAGSHUB_USERNAME',
    repo_name='YOUR_REPO_NAME',
    mlflow=True
)

# After re-running the experiment, the runs appear under the repository's
# Experiments tab on DagsHub, where they can be compared and visualized.
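
Finally, the model logged in Step 4 can be loaded back and queried like any other MLflow model, and the same runs:/ URI works with the mlflow models serve command to expose it as a REST API. The snippet below is a minimal sketch: it assumes the run object from Step 4 is still in scope, that OPENAI_API_KEY is set in the environment, and that a plain list of questions is an acceptable input for the OpenAI flavor's pyfunc wrapper (the accepted input format can vary slightly between MLflow versions).

# Load the model logged in Step 4 as a generic pyfunc and query it
loaded_model = mlflow.pyfunc.load_model(f"runs:/{run.info.run_id}/model")
print(loaded_model.predict(["What is the capital of France?"]))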

Conclusion

MLflow is a powerful tool for managing the lifecycle of machine learning projects, including generative AI applications. By integrating DagsHub, you can remotely track and visualize your experiment results, making it easier to manage and share your projects. Whether you're working with OpenAI's GPT models or other LLMs, MLflow provides the flexibility and features needed to streamline your workflow.

We hope this post has given you a clear understanding of how to use MLflow with generative AI models. Stay tuned for more detailed tutorials and examples as we continue to explore the exciting world of generative AI.

Thank you for reading!
