Building a RAG Chatbot with LangChain, ChromaDB, Streamlit & OpenAI

MLOps Project Series - Part 2

Powered by OpenAI Embeddings + GPT, using my rag-chatbot-langchain repo

Large Language Models are powerful, but out of the box they don’t know anything about your PDFs, playbooks, or eBooks. That’s where Retrieval-Augmented Generation (RAG) comes in.

In this project, I built a document-aware chatbot that can answer questions based on any PDFs you drop into a folder — using:

  • LangChain for orchestration
  • ChromaDB as a vector store
  • OpenAI Embeddings + Chat Models for intelligence
  • Streamlit for the UI
  • Docker for easy deployment

The complete implementation lives here:

👉 https://github.com/rjshk013/mlops-project/tree/master/rag-chatbot-langchain

This article explains how it works and gives you a step-by-step guide to run it from my repo.

⚠️ Important: This Project Uses OpenAI Only (Not HuggingFace)

The current codebase is built specifically for OpenAI, not HuggingFace.

Concretely:

  • Embeddings are generated via:

    from langchain_openai import OpenAIEmbeddings
    embeddings = OpenAIEmbeddings()

  • The LLM answering your questions uses:

    from langchain_openai import ChatOpenAI
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)

👉 This means:

  • You must have a valid OpenAI account
  • You must generate an OpenAI API key
  • You must set OPENAI_API_KEY in a .env file (or environment variable)
  • OpenAI will charge you per API usage (embeddings + chat)

There is no HuggingFace / local LLM integration in this version. If you remove the API key, the app will fail with authentication or quota errors.

🏗️ High-Level Architecture

Here’s what happens under the hood:

  1. You place PDFs into the data/ folder
  2. The app:
  • Loads the PDFs
  • Splits them into overlapping text chunks
  • Calls the OpenAI Embeddings API to embed each chunk
  • Stores embeddings + metadata in ChromaDB (local folder chroma_db/)
  3. When you ask a question:
  • ChromaDB retrieves the most relevant chunks
  • LangChain builds a RAG prompt using those chunks as context
  • ChatOpenAI (a GPT model) generates the final answer
  • Streamlit displays the response in a chat UI

It’s a classic RAG loop: Retrieve → Augment → Generate.
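
To make the loop concrete, here is a minimal sketch of the retrieve → augment → generate step using the same stack. The variable names, prompt wording, and k=4 are illustrative choices, not copied from chatbot.py:

# A minimal sketch of the retrieve → augment → generate step with this stack.
# Names, prompt wording, and k=4 are illustrative, not copied from chatbot.py.
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_core.prompts import ChatPromptTemplate

embeddings = OpenAIEmbeddings()
vectorstore = Chroma(persist_directory="chroma_db", embedding_function=embeddings)
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0.7)
prompt = ChatPromptTemplate.from_template(
    "Answer using only this context:\n\n{context}\n\nQuestion: {question}"
)

def answer(question: str) -> str:
    docs = vectorstore.similarity_search(question, k=4)   # Retrieve
    context = "\n\n".join(d.page_content for d in docs)   # Augment
    messages = prompt.format_messages(context=context, question=question)
    return llm.invoke(messages).content                   # Generate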

🧱 Project Structure (From the Repo)

Inside mlops-project/rag-chatbot-langchain:

rag-chatbot-langchain/
├── chatbot.py           # Streamlit app (RAG chatbot)
├── ingest_database.py   # Optional batch ingestion script
├── requirements.txt     # Python dependencies
├── Dockerfile           # Image definition
├── docker-compose.yml   # Orchestration
├── data/                # Put your PDFs here
├── chroma_db/           # Chroma vector DB (auto-created)
├── .env                 # OpenAI API key (you create this)
└── README.md

🔑 OpenAI Requirement: Why We Need the API Key

Two core parts of the pipeline depend on OpenAI:

  1. Embeddings during ingestion
  • OpenAIEmbeddings() calls the embeddings API for each chunk
  2. LLM during question answering
  • ChatOpenAI() calls the chat/completions API to generate the response

That’s why the .env must include:

OPENAI_API_KEY=sk-...

And it’s why OpenAI charges for usage:
Every embedding and chat request consumes compute on their infrastructure, so you’re billed per token.
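
For context, here is a sketch of how the key usually reaches the OpenAI clients, assuming the common python-dotenv pattern (the repo may load it slightly differently):

# A sketch of loading the key, assuming the common python-dotenv pattern;
# the repo may do this slightly differently.
import os
from dotenv import load_dotenv

load_dotenv()  # reads OPENAI_API_KEY from .env into the process environment
if not os.getenv("OPENAI_API_KEY"):
    raise RuntimeError("Set OPENAI_API_KEY in .env or as an environment variable")
# OpenAIEmbeddings() and ChatOpenAI() pick the key up from the environment automatically.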

🚀 Step-by-Step: How to Run This Project from My Repo

Here’s a clean, ordered guide to run this from scratch.

✅ Prerequisites

  1. OpenAI Account with billing enabled
  2. OpenAI API Key from:
    https://platform.openai.com/api-keys
  3. Git installed
  4. EITHER:
  • Docker + Docker Compose (recommended), or
  • Python 3.10+ with pip

1️⃣ Clone the Repository

git clone https://github.com/rjshk013/mlops-project.git
cd mlops-project/rag-chatbot-langchain

2️⃣ Create .env with Your OpenAI API Key

In the rag-chatbot-langchain directory, create a file named .env:

nano .env

Add:

OPENAI_API_KEY=sk-your-key-here

🔐 Keep this file private. Don’t commit it to GitHub.

3️⃣ Add Your PDF Documents

Put one or more PDFs into the data/ folder:

mkdir -p data
cp /path/to/your-documents/*.pdf data/

Examples:

  • Kubernetes eBooks
  • Internal design docs
  • Policy PDFs
  • Training material

These are the knowledge base for your chatbot.

4️⃣ Run the App (Option A: Docker Compose) ✅ Recommended

Make sure Docker and Docker Compose are installed, then run:

docker-compose up --build

What this does:

  • Builds the container image using Dockerfile
  • Installs dependencies from requirements.txt
  • Mounts ./data and ./chroma_db into /app inside the container
  • Exposes port 8501 for the Streamlit app
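
For reference, an illustrative docker-compose.yml matching the behaviour described above; the repo’s actual file may differ in service name and details:

# Illustrative compose file; not copied verbatim from the repo.
services:
  rag-chatbot:
    build: .
    env_file:
      - .env
    ports:
      - "8501:8501"
    volumes:
      - ./data:/app/data
      - ./chroma_db:/app/chroma_db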

Once it’s up, open:

👉 http://localhost:8501

5️⃣ Ingest Documents from the UI

Inside the Streamlit page:

  1. Go to the left sidebar
  2. Click 📥 Ingest Documents
  3. The app will:
  • Load all PDFs from data/
  • Split them into text chunks
  • Call OpenAIEmbeddings to generate embeddings
  • Persist the resulting vector store in chroma_db/

If successful, you’ll see:

✅ Ingested X document chunks!

6️⃣ Ask Questions in the Chat

Now at the bottom of the page:

  • Type a question like:

“What is Kubernetes and why is it used?”
“According to the eBook, what are the main components of the Kubernetes control plane?”

The chat flow:

  • Uses st.chat_input to capture your prompt
  • Retrieves relevant chunks from Chroma
  • Builds a prompt via ChatPromptTemplate
  • Sends it to ChatOpenAI with the context
  • Streams the answer back into the UI

You should see document-grounded answers, not generic responses.
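
Pieced together, that flow looks roughly like the sketch below. Here answer() is a hypothetical helper standing in for the retrieval-and-generation logic sketched earlier; the exact widgets and streaming details in chatbot.py may differ:

# Rough shape of a Streamlit chat turn; answer() is the hypothetical helper
# from the earlier sketch, and chatbot.py's exact streaming details may differ.
import streamlit as st

if question := st.chat_input("Ask a question about your PDFs"):
    st.chat_message("user").write(question)
    with st.chat_message("assistant"):
        st.write(answer(question))  # retrieve → augment → generate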

7️⃣ (Optional) Run Ingestion via Script Instead of UI

You also have ingest_database.py, which can perform ingestion from the CLI.

It uses:

from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFDirectoryLoader

Run it as:

python ingest_database.py

This will:

  • Load PDFs from data/
  • Generate embeddings (via OpenAI)
  • Populate the same chroma_db/ directory

After that, when you launch chatbot.py, it will attach to the existing vector store.
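
Below is a minimal sketch of what such an ingestion script can look like with those imports. The chunk sizes and the choice of RecursiveCharacterTextSplitter are illustrative, not copied from the repo:

# Minimal ingestion sketch; chunk sizes and splitter choice are illustrative.
from langchain_openai.embeddings import OpenAIEmbeddings
from langchain_chroma import Chroma
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = PyPDFDirectoryLoader("data/").load()                  # load every PDF in data/
splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
chunks = splitter.split_documents(docs)                      # overlapping text chunks

Chroma.from_documents(                                       # embed via OpenAI and persist
    chunks,
    embedding=OpenAIEmbeddings(),
    persist_directory="chroma_db",
)
print(f"Ingested {len(chunks)} document chunks")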

8️⃣ Run Locally Without Docker (Option B)

If you prefer to run it directly on your machine:

1. Create & activate virtualenv (optional but recommended)

python -m venv venv
source venv/bin/activate # on Windows: venv\Scripts\activate

2. Install dependencies

pip install --upgrade pip
pip install -r requirements.txt

3. Ensure .env has your OPENAI_API_KEY

(As shown earlier)

4. Run the app

streamlit run chatbot.py

Then open:

👉 http://localhost:8501

🧪 How to Confirm It’s Really Using Your PDFs

A few good tests:

  1. Ask about a very specific phrase from your PDF
  • “What does the document say about ‘kubelet and node agent’?”
  2. Ask something not present in the document
  • “Explain Istio multi-mesh, as per this document.”
    The bot should say it’s not mentioned.
  3. Ask for a summary of a specific chapter or section
  • “Summarize the chapter that explains Kubernetes cluster components.”

If the answers match your docs and don’t hallucinate unrelated content, your RAG setup is working well.

🔮 Possible Next Steps (Future Work)

Right now, this project is:

  • ✅ OpenAI only
  • ✅ RAG-based
  • ✅ Streamlit UI + Dockerized

You could extend it by:

  • Adding source citations (showing which PDF pages were used)
  • Adding upload-from-UI instead of manual data/ copy
  • Swapping OpenAI models (e.g., GPT-4o)
  • Adding rate limiting & logging
  • Creating a FastAPI backend and separating the UI

🙌 Conclusion

You’ve built a full Retrieval-Augmented Generation chatbot from scratch — including ingestion, embeddings, search, and a fully interactive UI.

This project is perfect for:

  • ML Engineers
  • MLOps portfolios
  • DevOps engineers learning GenAI
  • Interview preparation
  • Production RAG deployments
