DATA5000

Artificial Intelligence Programming in Business Analytics

Application of Large Language Models 1

Workshop #11

Kaplan Business School

Learning Outcomes

By the end of this workshop, you will be able to:

Understand how to give LLMs access to your company's specific data
Explain the concept of Retrieval-Augmented Generation (RAG)
Describe how vector embeddings enable document similarity search
Integrate ChromaDB, LangChain, and LLMs using Python
Compare RAG and fine-tuning approaches for customizing LLMs
Apply these techniques to real business scenarios

The Business Problem

Scenario: TechCorp Customer Support

TechCorp has accumulated 10 years of customer service transcripts, product documentation, and technical troubleshooting guides. They want to use ChatGPT to help support agents quickly find solutions.

❌ Generic ChatGPT

Question: "How do I reset the TC-5000 router?"

Response: "I don't have specific information about the TC-5000 router. Generally, routers can be reset by..."

Vague, unhelpful, potentially incorrect

✓ Customized LLM

Question: "How do I reset the TC-5000 router?"

Response: "According to TechCorp Manual v3.2: Hold the reset button for 15 seconds, then power cycle. The LED will flash amber during reset."

Accurate, specific, cited

Why Generic LLMs Fall Short

ChatGPT and similar models have fundamental limitations for business applications:

No Access to Your Data

LLMs only know what they were trained on. Your company's internal documents, customer data, and proprietary information don't exist in their knowledge base.

Outdated Information

Training data has a cutoff date. New products, updated policies, or recent market changes are unknown to the model.

Hallucination Risk

When LLMs don't know the answer, they often generate plausible-sounding but incorrect information—dangerous for business decisions.

The Solution: We need to give LLMs access to specific, relevant information when answering questions. This is where Retrieval-Augmented Generation comes in.

Two Approaches to Customizing LLMs

Approach 1: RAG

Retrieval-Augmented Generation

Concept: Give the LLM access to relevant documents when it answers questions

Analogy: Open-book exam—the model can consult reference materials

Best for: Frequently changing information, need for source citations, limited training data

Approach 2: Fine-Tuning

Model Training

Concept: Train the LLM on your specific domain and writing style

Analogy: Intensive studying—the model internalizes the knowledge

Best for: Stable knowledge domain, consistent style needed, large training datasets

Today's focus: RAG (We'll cover fine-tuning later)

Knowledge Check 1

Why would a company need RAG or fine-tuning instead of using ChatGPT directly?

A) ChatGPT is too expensive for business use

B) ChatGPT doesn't understand natural language well enough

C) ChatGPT doesn't have access to company-specific data and information

D) ChatGPT can only process text, not numbers

How RAG Works: Simple Overview

1. Question

User asks: "How do I reset the TC-5000?"

→

2. Search

System finds relevant documents in database

→

3. Retrieve

Pulls TC-5000 manual section

→

4. Generate

LLM reads docs + question, generates answer

Key Insight

The LLM never "memorizes" your documents. Instead, it's given relevant excerpts at query time to inform its response. This means its answers are as current as the documents in the database.
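The four steps can be sketched in plain Python. This is a toy illustration only: word overlap stands in for real vector search, the two documents are invented for the example, and the function names (`tokenize`, `retrieve`, `build_prompt`) are hypothetical rather than part of any library.

```python
import re

def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, k=1):
    """Steps 2-3: rank documents by shared words and return the top k."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, excerpts):
    """Input to step 4: the LLM receives the excerpts alongside the question."""
    context = "\n".join(f"- {e}" for e in excerpts)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Invented sample documents (assumption, for illustration only)
documents = [
    "TC-5000 manual: hold the reset button for 15 seconds, then power cycle.",
    "Company picnic: held in the park next Saturday.",
]

question = "How do I reset the TC-5000?"  # Step 1: the user's question
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)
```

Only the manual excerpt reaches the prompt. Adding a new document to the list makes it retrievable on the next question, with no change to the model itself.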

Next question: How does the system know which documents are "relevant"?

The Technical Challenge

How do computers find relevant documents?

How Humans Do It

Read and understand content
Recognize concepts and context
Identify semantic similarity
Make connections between ideas

Example: We know "kitten" and "cat" are related even if the exact words don't match.

How Computers Need to Do It

Convert text to numbers
Measure mathematical similarity
Find closest matches
Rank by relevance

Computers can't "read"—they need numerical representations to compare documents.

The Solution: Vector Embeddings

We convert each document into a unique "fingerprint" made of numbers. Documents with similar content get similar fingerprints, allowing computers to find related information mathematically.

Understanding Vector Embeddings

What is an Embedding?

An embedding is a list of numbers that represents the meaning of text. Similar meanings result in similar numbers.

Example: Pet-Related Words

"cat" = [1.6, 2.3, 4.7, ..., 8.2, 17.9]
"kitten" = [1.6, 2.4, 4.7, ..., 8.3, 18.4]
"dog" = [1.5, 2.2, 4.8, ..., 8.1, 17.7]
"puppy" = [1.5, 2.3, 4.8, ..., 8.2, 18.0]
"car" = [8.2, 1.1, 2.3, ..., 4.5, 6.7]

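Notice: "cat" and "kitten" have very similar numbers, and "dog" and "puppy" sit close by because all four are pets. "car" is completely different. The values shown are illustrative, but real embedding models capture semantic meaning in the same way.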

For business documents: A product manual and a troubleshooting guide about the same product would have similar embeddings, making them easy to find together.

Visualizing Vector Embeddings

In reality, embeddings have hundreds of dimensions. Here's a simplified 2D visualization:

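[Figure: 2D scatter plot with three clusters of words: sink, faucet, kitchen; refrigerator, oven, microwave; drill, hammer, saw. The query "plumbing" is plotted nearest the sink/faucet/kitchen cluster.]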

Key Observation: Similar concepts cluster together. When searching for "plumbing," the system finds the nearest neighbors in vector space—in this case, kitchen/sink/faucet items rather than tools or appliances.
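The nearest-neighbor search described here can be sketched with a handful of 2D points. The coordinates are hypothetical, chosen only to mimic three clusters, and the position given to the "plumbing" query is likewise an assumption. Real embeddings have hundreds of dimensions, but the lookup logic is the same.

```python
import math

# Hypothetical 2D coordinates (assumption): three loose clusters of words
points = {
    "sink": (1.0, 1.0), "faucet": (1.2, 0.8), "kitchen": (1.5, 1.4),
    "refrigerator": (4.0, 1.0), "oven": (4.2, 1.3), "microwave": (4.1, 0.7),
    "drill": (2.5, 4.0), "hammer": (2.8, 4.2), "saw": (2.3, 4.3),
}
query = (1.1, 1.1)  # assumed position of the query "plumbing"

def nearest(query, points, k=3):
    """Return the k labels whose points are closest to the query (Euclidean distance)."""
    return sorted(points, key=lambda name: math.dist(query, points[name]))[:k]

print(nearest(query, points))  # ['sink', 'faucet', 'kitchen']
```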

Knowledge Check 2

What is the main purpose of converting documents into vector embeddings?

A) To compress files and save storage space

B) To enable mathematical comparison of document similarity

C) To encrypt sensitive business information

D) To make documents load faster in the LLM

Measuring Document Similarity

Cosine Similarity: The Distance Metric

Once we have vector embeddings, we need to measure how similar they are. Cosine similarity is the standard approach:

Similarity = (Vector1 · Vector2) / (||Vector1|| × ||Vector2||)

Range: -1 (opposite direction) to +1 (same direction); 0 means unrelated
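As a minimal sketch, the formula translates into a few lines of plain Python. The short vectors are hypothetical and were picked only to show the ends of the range; real embeddings are much longer.

```python
import math

def cosine_similarity(a, b):
    """Dot product of a and b divided by the product of their lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors (assumption, for illustration only)
v = [1.0, 2.0, 3.0]
same_direction = cosine_similarity(v, [2.0, 4.0, 6.0])     # close to +1
opposite = cosine_similarity(v, [-1.0, -2.0, -3.0])        # close to -1
perpendicular = cosine_similarity([1.0, 0.0], [0.0, 1.0])  # 0

print(same_direction, opposite, perpendicular)
```

Note that the second vector in the first comparison is twice as long as `v` yet still scores close to +1: cosine similarity compares direction, not magnitude.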

Practical Example

Document A: "The TC-5000 router supports WiFi 6 and has excellent range."

Document B: "TC-5000 provides WiFi 6 connectivity with extended coverage."

Document C: "Our company picnic will be held in the park next Saturday."

Results:

Similarity(A, B) = 0.94 ← Very similar (same topic)

Similarity(A, C) = 0.12 ← Not similar (different topics)

Don't worry: Python libraries handle these calculations automatically. You just need to understand the concept.

Complete RAG System Architecture

User Interface Layer

User asks question: "How do I troubleshoot TC-5000 connectivity issues?"

↓

Embedding Layer (HuggingFace)

Converts question into vector: [2.3, 1.7, 4.2, ..., 8.9]

↓

Vector Database (ChromaDB)

Searches for documents with similar vectors → Finds TC-5000 troubleshooting guide (similarity: 0.92)

↓

Orchestration Layer (LangChain)

Combines: User question + Retrieved documents → Sends to LLM

↓

Generation Layer (LLM)

Reads context and generates: "According to the TC-5000 guide, try these steps..."

RAG Technology Stack

HuggingFace Embeddings

Purpose: Convert text to vectors

Function: Provides pre-trained models that understand semantic meaning across multiple languages and domains

Example: sentence-transformers/all-MiniLM-L6-v2

ChromaDB

Purpose: Store and search vectors

Function: Specialized database optimized for finding similar vectors quickly, even with millions of documents

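Benefit: Can be orders of magnitude faster than a traditional database for similarity search, because it indexes vectors instead of comparing the query against every row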

LangChain

Purpose: Connect everything

Function: Framework that orchestrates the entire workflow—from question to embedding to retrieval to generation

Benefit: Write less code, focus on business logic

Why Three Separate Tools?

Each component is specialized and best-in-class. Using them together gives you flexibility to swap components as technology improves while maintaining the same overall architecture.

Knowledge Check 3

In a RAG system, what happens immediately AFTER the user asks a question?

A) The LLM generates an answer directly

B) The system searches Google for information

C) The question is converted to a vector embedding

D) All documents are sent to the LLM

Business Benefits of RAG

Accuracy and Trust

RAG dramatically reduces hallucinations because responses are grounded in actual documents. Users can verify source citations.

Metric: Studies show 85% reduction in factual errors compared to generic LLMs

Always Current

Update documents in the database, and the LLM immediately has access to new information—no retraining required.

Example: Add today's product release notes, answer questions about new features tomorrow

Cost-Effective

No expensive model training. Use existing LLMs with your data. Typical implementation cost is 90% less than fine-tuning.

Comparison: RAG setup: $1,000-5,000 | Fine-tuning: $50,000-200,000

Transparency

Every answer can be traced to source documents. Critical for compliance, auditing, and building user confidence.

Use case: Financial advice, medical information, legal guidance

RAG Applications Across Industries

Industry	Application	Business Impact
Customer Support	Instant access to product manuals, troubleshooting guides, and past support tickets	67% reduction in average handling time, 45% improvement in first-call resolution
Legal Services	Query case law, contracts, and regulations to support legal research	80% faster document review, 90% cost reduction in junior associate hours
Healthcare	Access medical literature, clinical guidelines, and patient history for decision support	35% improvement in diagnostic accuracy, 50% reduction in research time
Finance	Analyze company filings, market reports, and financial regulations	Real-time insights from 10,000+ documents, 70% faster compliance checks
Human Resources	Make the employee handbook, policies, and benefits information instantly accessible	60% reduction in HR inquiry volume, 24/7 employee self-service

Alternative Approach: Fine-Tuning

What is Fine-Tuning?

Fine-tuning means continuing the training of a pre-trained LLM on your specific dataset. The model learns patterns, terminology, and style from your data.

When Fine-Tuning Makes Sense

✓ Good For:

Specialized domain language (medical, legal, technical)
Consistent writing style (brand voice, tone)
Stable knowledge that rarely changes
When you have 10,000+ training examples
Need for very fast inference (no retrieval step)

✗ Challenges:

Expensive (requires GPU compute for training)
Time-consuming (days to weeks for training)
Requires large datasets (thousands of examples)
Difficult to update (need to retrain)
Risk of "catastrophic forgetting" (model forgets original training)

RAG vs. Fine-Tuning: Decision Framework

Factor	RAG	Fine-Tuning
Setup Time	Hours to days	Weeks to months
Setup Cost	$1,000 - $5,000	$50,000 - $200,000
Data Required	Any amount (even 10 docs)	10,000+ training examples
Update Frequency	Real-time (just add docs)	Requires full retraining
Transparency	High (shows sources)	Low (black box)
Response Speed	Slower (retrieval + generation)	Faster (just generation)
Best Use Case	Dynamic knowledge bases, customer support, research	Specialized domains, consistent style, stable knowledge

Recommended Strategy: Start with RAG. It's faster, cheaper, and more flexible. Consider fine-tuning only when you have specific style/domain requirements and substantial training data.

Knowledge Check 4

A company updates its product catalog weekly and needs an AI assistant to answer customer questions about products. Which approach is most suitable?

A) RAG—because the information changes frequently and needs to stay current

B) Fine-tuning—because it provides faster responses

C) Neither—just use ChatGPT directly

D) Both—always use RAG and fine-tuning together

Building a RAG System: Step-by-Step

Prepare Your Documents

Collect and organize your company's documents. Clean the data (remove duplicates, fix formatting, remove sensitive information).

Tip: Start with 50-100 documents to test the system before scaling up.

Install Required Tools

Set up Python environment with LangChain, ChromaDB, and HuggingFace libraries.

pip install langchain langchain-huggingface langchain-chroma chromadb sentence-transformers

Create Embeddings

Convert each document into vector embeddings using HuggingFace models.

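from langchain_huggingface import HuggingFaceEmbeddings  # provided by the langchain-huggingface package
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")  # downloads the model on first use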

Store in Vector Database

Load embeddings into ChromaDB for efficient similarity search.

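from langchain_chroma import Chroma  # provided by the langchain-chroma package
vectorstore = Chroma.from_documents(documents=docs, embedding=embeddings)  # docs: list of LangChain Document objects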

Build Query Interface

Use LangChain to connect the retrieval system to your LLM, creating a complete question-answering pipeline.
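One way this last step might look is sketched below. It relies on the `similarity_search(query, k=...)` method and the `page_content` attribute that LangChain vector stores and documents expose. Everything else is an assumption: `answer_question` is a hypothetical helper, `llm` is assumed to be any callable that takes a prompt string and returns text, and `FakeDoc`/`FakeStore` are stand-ins so the sketch runs without installing anything.

```python
from dataclasses import dataclass

def answer_question(vectorstore, llm, question, k=3):
    """Retrieve the k most similar chunks, then ask the LLM with that context."""
    docs = vectorstore.similarity_search(question, k=k)
    context = "\n\n".join(doc.page_content for doc in docs)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
    return llm(prompt)

# --- Stand-ins (hypothetical) so the example is self-contained ---
@dataclass
class FakeDoc:
    page_content: str

class FakeStore:
    """Mimics a vector store by returning its first k documents."""
    def __init__(self, texts):
        self.docs = [FakeDoc(t) for t in texts]

    def similarity_search(self, query, k=4):
        return self.docs[:k]

store = FakeStore(["Hold the reset button for 15 seconds, then power cycle."])
echo_llm = lambda prompt: prompt  # stand-in LLM that just returns its prompt
result = answer_question(store, echo_llm, "How do I reset the TC-5000?", k=1)
print(result)
```

In a real notebook, the Chroma vector store and an actual LLM client would replace the two stand-ins; the helper itself would not need to change.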

Today's Hands-On Activity

What You'll Build

A working RAG system that can answer questions about a set of sample documents. You'll see how each component works together.

Part 1: Simple RAG

File: DATA5000_Simple_RAG.ipynb

What you'll do:

Create sample documents
Initialize embedding model
Build ChromaDB vector store
Query the database
Add metadata and filters

Time: 20-25 minutes

Part 2: Cosine Similarity

File: DATA5000_Cosine_similarity.ipynb

What you'll do:

See word embeddings in action
Calculate similarity scores
Understand why similar concepts cluster
Experiment with different words

Time: 15 minutes

Learning Goal: By the end of the activities, you should understand how to connect your own company documents to an LLM and see why vector embeddings enable semantic search.

Case Study: TechCorp Implementation

Background

TechCorp's customer support team handles 1,000 emails daily about 500+ products. Agents spend 40% of their time searching for information in 10 years of documentation.

Initial Situation

Challenge: Average response time of 4 hours, inconsistent answers, high agent frustration

Cost: $2.5M annually in support costs

RAG Implementation (6 weeks)

Data: Ingested 10,000 documents (product manuals, FAQs, past tickets, troubleshooting guides)

System: Built RAG interface integrated with email system

Cost: $25,000 setup + $500/month hosting

Results After 6 Months

✓ Average response time: 45 minutes (81% reduction from 4 hours)

✓ First-response accuracy: 92% (up from 67%)

✓ Agent satisfaction: +45 points

✓ Cost savings: $1.2M annually

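ROI: 4,700% in first year, measured against the $25,000 setup cost (about 3,770% once 12 months of hosting at $500/month is included)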
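A quick way to sanity-check headline figures like these is to recompute them from the inputs given in the case study. The sketch below assumes ROI is defined as (gain - cost) / cost and that the full annual savings count as the first-year gain; both are assumptions, not definitions stated in the case study.

```python
# Inputs taken from the case study
before_min, after_min = 4 * 60, 45            # response time in minutes
setup, hosting_per_month = 25_000, 500        # implementation costs in dollars
annual_savings = 1_200_000

# Response-time reduction relative to the starting point
reduction = (before_min - after_min) / before_min

# ROI = (gain - cost) / cost, computed two ways (assumed definition)
first_year_cost = setup + 12 * hosting_per_month
roi_setup_only = (annual_savings - setup) / setup
roi_with_hosting = (annual_savings - first_year_cost) / first_year_cost

print(f"Response-time reduction: {reduction:.0%}")             # 81%
print(f"ROI vs setup cost only: {roi_setup_only:.0%}")         # 4700%
print(f"ROI incl. 12 months hosting: {roi_with_hosting:.0%}")  # 3771%
```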

Advanced: Combining RAG and Fine-Tuning

For maximum performance, some organizations use both approaches together:

RAG Handles:

Accessing current information
Retrieving specific facts
Providing source citations
Updating knowledge in real-time

Fine-Tuning Handles:

Domain-specific language
Company writing style
Common question patterns
Technical terminology

Example: Legal Firm Implementation

Fine-tuned model: Trained on legal writing style, case law structure, and legal reasoning patterns

RAG system: Accesses current case database, recent rulings, and client-specific documents

Result: Natural legal writing (from fine-tuning) with accurate, current case citations (from RAG)

Cost consideration: This approach costs $150,000-300,000 to implement but provides best-in-class performance for mission-critical applications.

Knowledge Check 5

Which statement about RAG implementation is most accurate?

A) RAG requires retraining the entire LLM on your documents

B) Once implemented, RAG systems cannot be updated with new information

C) RAG provides transparency by showing which documents were used to generate answers

D) RAG is always more expensive than fine-tuning

Critical Success Factors for RAG

What makes RAG implementations succeed or fail?

Factor	Success Pattern	Failure Pattern
Document Quality	Clean, well-organized, accurate source documents with clear structure	Poorly formatted, outdated, or contradictory documents that confuse the system
Chunk Size	Optimal: 200-500 words per chunk. Balances context and precision.	Too small (fragments sentences) or too large (retrieves irrelevant content)
Metadata Strategy	Rich metadata (date, author, department, document type) enables filtering	No metadata means retrieving potentially outdated or irrelevant documents
Monitoring	Track answer quality, user feedback, retrieval accuracy—iterate constantly	"Set and forget"—quality degrades as new docs are added without review
User Training	Teach users how to ask good questions and interpret citations	Users don't trust system because they don't understand how it works
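The chunk-size factor can be made concrete with a small word-based splitter. This is a sketch under stated assumptions: `chunk_words` is a hypothetical helper, the 300-word default is simply a value inside the 200-500 range from the table, and the 50-word overlap between neighboring chunks is an added assumption that the table does not mention.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into chunks of up to chunk_size words; neighbors share overlap words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

# Synthetic 700-word document (assumption, for illustration only)
sample = " ".join(f"word{i}" for i in range(700))
chunks = chunk_words(sample)
print([len(c.split()) for c in chunks])  # [300, 300, 200]
```

Overlap is one common way to avoid cutting a sentence off from its context at a chunk boundary; whether it helps depends on the documents, so it is worth testing both with and without it.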

Connection to Assessment 3

How This Week Relates to Your Project

Assessment 3 requires you to develop a data-driven business recommendation. RAG and fine-tuning are powerful tools you can use:

Potential Application 1

Build a RAG system using your business case documents, market research, and industry reports to support your analysis and recommendations.

Shows prescriptive analytics capability

Potential Application 2

Create a custom knowledge base that executives can query to understand your recommendations and supporting evidence.

Demonstrates business value

Potential Application 3

Use RAG to analyze competitor strategies, customer feedback, or market trends specific to your business case.

Provides data-driven insights

Next week (Week 12): We'll explore advanced LLM applications including agents, tool use, and more sophisticated RAG implementations that you can leverage in your assessments.

Key Takeaways

The Problem

Generic LLMs don't have access to your company's specific data, leading to vague or incorrect answers.

The Solution: RAG

Retrieval-Augmented Generation gives LLMs access to relevant documents in real-time, grounding responses in actual information.

How It Works

Convert documents to vector embeddings → Store in specialized database → Retrieve similar documents when users ask questions → LLM generates informed answer.

Key Components

HuggingFace (embeddings), ChromaDB (vector storage), LangChain (orchestration), LLM (generation).

Business Impact

85% reduction in factual errors, 60-90% faster response times, 90% lower cost than fine-tuning, always current information.

When to Use What

RAG: Dynamic information, need sources, fast implementation. Fine-tuning: Stable domain, style requirements, large training data. Both: Maximum performance for critical applications.

Resources for Further Learning

Documentation

LangChain RAG Tutorial: docs.langchain.com/rag
ChromaDB Quickstart: docs.trychroma.com
HuggingFace Embeddings: huggingface.co/sentence-transformers

Research Papers

"Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks" (Lewis et al., 2020)
"A Taxonomy of Retrieval-Augmented Generation" (2024)

Video Resources

RAG Explained: youtube.com/watch?v=tKPSmn-urB4
Building Production RAG Systems
Vector Databases Deep Dive

Tools & Templates

Google Colab notebooks (provided)
RAG decision framework
Assessment 3 application guide

What Happens Next

Today's Activities

Complete the two Google Colab notebooks to gain hands-on experience with vector embeddings and RAG systems.

Activity 1: Cosine Similarity

File: DATA5000_Cosine_similarity.ipynb

See how word embeddings capture semantic meaning and calculate similarity scores between different words.

Activity 2: Build a Simple RAG System

File: DATA5000_Simple_RAG.ipynb

Create a complete RAG pipeline with document embeddings, vector storage, and querying capabilities.

Week 12 Preview

Next week we'll explore advanced LLM applications: agents that can use tools, more sophisticated RAG architectures, and multi-step reasoning systems.

Remember: The goal isn't to memorize syntax—it's to understand how these systems work so you can apply them to real business problems.

Thank You

Questions?

Let's discuss how RAG can solve real business challenges

DATA5000 - Week 11