How to Build Your Own PDF Chat App with Python
Hey there, young tech enthusiasts! Today, we're going to learn how to create a super cool chat app that lets you talk to your PDF files. Imagine being able to upload a PDF document and then ask questions about it, like chatting with a friend. By the end of this guide, you'll know how to build your very own PDF chat app using Python!
What We'll Do
We'll create a chat app that:
- Uploads a PDF file.
- Splits the PDF content into smaller chunks.
- Allows you to ask questions about the PDF.
- Gives you answers based on the PDF content.
What You’ll Need
- Basic knowledge of Python.
- A few Python packages (we'll show you how to install these).
- Curiosity and excitement to learn something new!
Step-by-Step Guide
1. Setting Up Your Environment
First, we need to set up a virtual environment to keep all our project files organized. Open your command terminal and type:
conda create -n pdf_chat python=3.8
conda activate pdf_chat
2. Installing Required Packages
Next, we need to install some Python packages that will help us read PDFs, create a user interface, and interact with language models.
Create a file named requirements.txt with the following content:
langchain==0.0.154
PyPDF2==3.0.1
python-dotenv==1.0.0
streamlit==1.18.1
faiss-cpu==1.7.4
streamlit-extras
joblib
altair==4.2.0
tiktoken
openai==0.27.0
Then, run this command to install the packages:
pip install -r requirements.txt
3. Setting Up the .env File
Create a file named .env in the same directory as your app.py file. This file will store your OpenAI API key. Add the following line to your .env file:
OPENAI_API_KEY=your_openai_api_key_here
Replace your_openai_api_key_here with your actual OpenAI API key. You can get this key by signing up on the OpenAI platform and generating an API key.
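If you want to double-check that the key is being picked up before building the whole app, here's a tiny optional sketch. Run it from the same folder as your .env file:
# Optional sanity check: confirm the key is read from .env
from dotenv import load_dotenv
import os

load_dotenv()
print("API key found:", os.getenv("OPENAI_API_KEY") is not None)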
4. Creating the App
Now, let's create the main Python file for our app. We'll call it app.py. Open app.py and start by importing the necessary packages:
import streamlit as st
from dotenv import load_dotenv
import joblib
from PyPDF2 import PdfReader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.callbacks import get_openai_callback
from langchain.chat_models import ChatOpenAI
import os
5. Building the User Interface
We’ll use Streamlit to create a simple user interface. Streamlit is great for building web apps with Python.
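If Streamlit is new to you, here's a tiny standalone sketch to try first. Save it as, say, hello.py (the file name is just an example) and run it with streamlit run hello.py:
# hello.py -- a minimal Streamlit app, separate from our main project
import streamlit as st

st.title("Hello, Streamlit!")
st.write("If you can see this in your browser, Streamlit is working.")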
Add this code to app.py to set up the basic UI:
# Sidebar contents
with st.sidebar:
    st.title('🤗💬 LLM Chat App')
    st.markdown('''
    ## About
    This app is an LLM-powered chatbot built using:
    - Streamlit
    - LangChain
    - OpenAI LLM model
    ''')
    st.markdown('---')  # Add a horizontal rule for separation
    st.write('Made with ❤️ by [RevanthTechTrends](https://youtube.com/@RevanthTechTrends)')

load_dotenv()

def main():
    st.header("Chat with PDF 💬")

    # Upload a PDF file
    pdf = st.file_uploader("Upload your PDF", type='pdf')

    if pdf is not None:
        pdf_reader = PdfReader(pdf)
        text = ""
        for page in pdf_reader.pages:
            text += page.extract_text() or ""  # some pages may have no extractable text

        # Split the PDF text into overlapping chunks
        text_splitter = RecursiveCharacterTextSplitter(
            chunk_size=1000,
            chunk_overlap=200,
            length_function=len
        )
        chunks = text_splitter.split_text(text=text)

        store_name = pdf.name[:-4]
        st.write(f'{store_name}')

        # Reuse cached embeddings if we've already processed this PDF
        if os.path.exists(f"{store_name}.joblib"):
            VectorStore = joblib.load(f"{store_name}.joblib")
        else:
            embeddings = OpenAIEmbeddings()
            VectorStore = FAISS.from_texts(chunks, embedding=embeddings)
            joblib.dump(VectorStore, f"{store_name}.joblib")

        # Accept user questions/query
        query = st.text_input("Ask questions about your PDF file:")

        if query:
            # Find the chunks most similar to the question
            docs = VectorStore.similarity_search(query=query, k=3)

            llm = ChatOpenAI(
                openai_api_key=os.getenv("OPENAI_API_KEY"),
                model_name="gpt-4",
                max_tokens=1000,
                temperature=0.5
            )
            chain = load_qa_chain(llm=llm, chain_type="stuff")
            with get_openai_callback() as cb:
                response = chain.run(input_documents=docs, question=query)
                print(cb)  # log token usage/cost to the terminal
            st.write(response)

if __name__ == '__main__':
    main()
Running the App
Install the Requirements: Open your command terminal and navigate to the directory containing requirements.txt. Run the following command to install all the necessary packages:
pip install -r requirements.txt
Run the Streamlit App: Use the following command to run the Streamlit app:
streamlit run app.py
Open the App: Streamlit will provide a local URL (usually http://localhost:8501). Open this URL in your web browser to interact with your PDF chat app.
How It Works
Uploading and Reading the PDF: The app allows you to upload a PDF file and then reads its content using the PyPDF2 library.
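Here's what that reading step looks like on its own, as a minimal sketch (the file path example.pdf is just a placeholder):
# Minimal sketch: read all the text out of a PDF on disk
from PyPDF2 import PdfReader

reader = PdfReader("example.pdf")
text = "".join(page.extract_text() or "" for page in reader.pages)
print(text[:200])  # preview the first 200 characters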
Splitting Text into Chunks: The app splits the PDF text into smaller chunks to make it easier for the language model to process. This is done using the RecursiveCharacterTextSplitter from the LangChain library.
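You can try the splitter by itself with a made-up string, just to see chunking in action:
# Minimal sketch: split a long string into overlapping chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len
)
chunks = splitter.split_text("Some very long text pulled from a PDF... " * 200)
print(len(chunks), "chunks created")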
Creating Embeddings: Embeddings are like digital fingerprints for text. They help the app understand the content of each chunk. The app uses OpenAI's embedding model to create these embeddings.
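Here's a minimal sketch of creating a single embedding. It assumes your OPENAI_API_KEY is set, since the call goes out to OpenAI's API (and uses a small amount of credit):
# Minimal sketch: turn one sentence into an embedding vector
from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()
vector = embeddings.embed_query("What is this document about?")
print(len(vector))  # a long list of numbers, e.g. 1536 values for OpenAI's embedding model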
Caching Embeddings: To save time and resources, the app stores the embeddings in a file so it doesn't have to recompute them every time you upload the same PDF.
Querying the PDF: You can type in questions, and the app will find the most relevant chunks of text from the PDF. It uses a technique called similarity search to do this.
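Here's a minimal sketch of that search step using a few made-up chunks. It also assumes OPENAI_API_KEY is set, because FAISS.from_texts calls the embedding model:
# Minimal sketch: build a small FAISS index and search it
from langchain.embeddings.openai import OpenAIEmbeddings
from langchain.vectorstores import FAISS

chunks = [
    "The report covers revenue for 2023.",
    "The appendix lists all project contributors.",
    "Security audits were completed in March.",
]
store = FAISS.from_texts(chunks, embedding=OpenAIEmbeddings())
docs = store.similarity_search("Who worked on the project?", k=1)
print(docs[0].page_content)  # should surface the contributors chunk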
Getting Answers: The relevant chunks of text are then passed to a language model, which generates answers to your questions.
Final Thoughts
Congratulations! You've built your very own PDF chat app. This project is a fun way to learn about reading PDF files, creating user interfaces, and working with language models. Keep experimenting and see what other cool features you can add. Maybe try using different types of documents or adding more sophisticated querying options. The sky's the limit!
Happy coding, and see you next time!