Understanding how AI generates content from data
From prediction to creation: The technology behind ChatGPT, Claude, and Gemini
| Week | Topic | Key Learning | Data Type |
|---|---|---|---|
| 1-2 | Predictive ML | Pattern recognition in structured data | Numbers, categories |
| 3 | Deep Learning & Transformers | Neural networks with attention mechanisms | Time series, sequences |
| 4-6 | Causal AI | Understanding cause and effect | Treatment effects, interventions |
| 7 | Generative AI | Creating new content from patterns | Text, images, code |
| 8 | Prompt Engineering | Controlling AI outputs effectively | Instructions, context |
Today's Focus: Moving from analyzing data to generating new content based on learned patterns
In Week 3, you used Temporal Fusion Transformers (TFT) to forecast Australia's inflation rate.
| Quarter | GDP Growth | Unemployment | Interest Rate | Inflation |
|---|---|---|---|---|
| 2023-Q1 | 2.3% | 3.5% | 3.6% | 7.8% |
| 2023-Q2 | 1.8% | 3.6% | 4.1% | 6.0% |
TFT Output: Predicted inflation for 2023-Q3: 5.2% ± 0.8% (95% confidence interval)
| Goal | Example | Input → Output |
|---|---|---|
| Predict single output | House price | 5 features → $450,000 |
| Predict sequences | Inflation forecast | Time series → Next 4 quarters |
| Generate new content | Business report | Prompt → Full text/code |
Key Insight: Generative AI uses the same transformer architecture you've already seen, but applies it to predict the next word in a sequence rather than the next number. By repeatedly predicting "what comes next," it can generate entire documents, code, or analyses.
Generative AI creates new content by learning patterns from massive datasets, then using those patterns to generate text, images, code, or other outputs that appear human-created.
| Question | Output | Output Style |
|---|---|---|
| "Will customer X churn?" | 0.73 (73% probability) | Fixed, numerical output |
| "Why might customer X churn?" | "Based on usage patterns, customer X shows declining engagement over 3 months, with support tickets about pricing. Recommend targeted retention offer..." | Flexible, contextual explanation |
Tokenization is the process of breaking text into smaller units (tokens) that can be converted into numbers for the AI model to process.
Result: 9 tokens
Each token maps to a unique integer ID in the model's vocabulary. A token is often a whole word, but longer or rarer words may be split into several tokens.
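A minimal sketch of the text-to-numbers step, assuming a toy word-level vocabulary. The vocabulary, the IDs, and the `tokenize` function are hypothetical; production models use subword tokenizers, so counts will differ.

```python
# Toy word-level tokenizer. The vocabulary and IDs are made up for
# illustration; real tokenizers split text into subword pieces.
TOY_VOCAB = {"<unk>": 0, "the": 1, "quarterly": 2, "revenue": 3, "increased": 4}

def tokenize(text: str) -> list[int]:
    """Lowercase, split on whitespace, and map each piece to its ID."""
    unknown_id = TOY_VOCAB["<unk>"]
    return [TOY_VOCAB.get(word, unknown_id) for word in text.lower().split()]

token_ids = tokenize("The quarterly revenue increased")
print(token_ids)       # [1, 2, 3, 4]
print(len(token_ids))  # 4
```

Words missing from the vocabulary fall back to the `<unk>` ID here; subword tokenizers avoid this by breaking unfamiliar words into smaller known pieces.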
Rule of Thumb: 1 token ≈ 0.75 words in English
Or approximately: 100 words ≈ 133 tokens
| Document Type | Word Count | Token Count | Notes |
|---|---|---|---|
| Short email | 150 words | ~200 tokens | Simple vocabulary |
| Business memo | 500 words | ~667 tokens | Professional language |
| Technical report | 2,000 words | ~2,800 tokens | Complex terminology, more tokens per word |
| Customer chat log | 100 words | ~150 tokens | Casual language, abbreviations |
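The rule of thumb can be turned into a quick estimator. This is a rough sketch: `estimate_tokens` is a hypothetical helper, and the only reliable count comes from the model's own tokenizer.

```python
def estimate_tokens(word_count: int, words_per_token: float = 0.75) -> int:
    """Estimate token count from word count using the 0.75 rule of thumb."""
    return round(word_count / words_per_token)

for label, words in [("Short email", 150), ("Business memo", 500)]:
    print(f"{label}: {words} words ≈ {estimate_tokens(words)} tokens")
# Short email: 150 words ≈ 200 tokens
# Business memo: 500 words ≈ 667 tokens
```

The technical report and chat log rows sit above this estimate because jargon and abbreviations split into more tokens per word; passing a lower `words_per_token` approximates that.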
Embeddings convert each token into a vector (list of numbers) that captures its meaning in mathematical space. Similar words have similar vectors.
Example: Token "Customer" (ID: 2456)
Example: Token "Client" (ID: 3892)
Notice: Similar meanings = similar numbers!
When we reduce 768 dimensions to 3D for visualization, words with similar meanings appear close together:
| Word Pair | Cosine Similarity | Relationship |
|---|---|---|
| "revenue" ↔ "income" | 0.87 | Very similar |
| "profit" ↔ "revenue" | 0.72 | Related concepts |
| "profit" ↔ "loss" | 0.41 | Opposite but related |
| "profit" ↔ "customer" | 0.23 | Weakly related |
| "profit" ↔ "bicycle" | 0.05 | Unrelated |
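Scale: 1.0 = identical direction (near-identical meaning), 0.0 = unrelated. Cosine similarity can also be negative (down to -1.0) when vectors point in opposite directions.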
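Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below uses made-up 3-dimensional vectors purely for illustration; they are not real embeddings, and the scores will not match the table.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy vectors; real embeddings have hundreds of dimensions.
revenue = [0.9, 0.8, 0.1]
income = [0.8, 0.9, 0.2]
bicycle = [-0.1, 0.1, 0.9]

print(round(cosine_similarity(revenue, income), 2))   # close to 1: similar
print(round(cosine_similarity(revenue, bicycle), 2))  # close to 0: unrelated
```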
Generative AI models learn by analyzing massive amounts of text data from books, websites, articles, and code repositories. The scale is unprecedented in computing history.
| Model | Training Tokens | Equivalent Books | Training Cost |
|---|---|---|---|
| GPT-3 (2020) | 300 billion | ~600,000 books | ~$4.6 million (estimated) |
| GPT-4 (2023) | ~13 trillion | ~26 million books | ~$100 million (estimated) |
| Claude 3 (2024) | Similar scale | Tens of millions of books | Similar magnitude |
For Context: One book ≈ 500,000 tokens (about 375,000 words)
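The "Equivalent Books" column follows directly from the 500,000 tokens-per-book assumption. A short check, with `equivalent_books` as a hypothetical helper name:

```python
TOKENS_PER_BOOK = 500_000  # assumption stated in the slide

def equivalent_books(training_tokens: int) -> int:
    """Convert a training-token count into a number of books."""
    return training_tokens // TOKENS_PER_BOOK

print(f"{equivalent_books(300_000_000_000):,}")     # 600,000
print(f"{equivalent_books(13_000_000_000_000):,}")  # 26,000,000
```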
During training, the model learns patterns by analyzing how words appear together millions of times across different contexts. It builds statistical understanding of language structure.
Learned: "quarterly revenue" is often followed by:
Model assigns probabilities based on training frequency
Learned: After "The customer", likely words:
No explicit grammar rules programmed
Pattern: "The company reported [X] earnings"
Result: Model learns "strong" is more likely than "purple" in this context
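A deliberately simplified sketch of "probabilities from frequency": count which word follows a given word in a tiny made-up corpus. Real language models do not store counts like this; they adjust neural network parameters during training. The intuition is the same, though: continuations seen often get higher probability.

```python
from collections import Counter

# Tiny hypothetical corpus; real training data contains trillions of tokens.
corpus = [
    "the company reported strong earnings",
    "the company reported strong earnings",
    "the company reported weak earnings",
    "the company reported record earnings",
]

def next_word_probabilities(sentences: list[str], context: str) -> dict[str, float]:
    """Count which word follows `context` and convert counts to probabilities."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.split()
        for current, following in zip(words, words[1:]):
            if current == context:
                counts[following] += 1
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

probs = next_word_probabilities(corpus, "reported")
print(probs)                     # {'strong': 0.5, 'weak': 0.25, 'record': 0.25}
print(probs.get("purple", 0.0))  # 0.0
```

"purple" never follows "reported" in this corpus, so it gets zero probability, with no grammar rule involved.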
Generative AI works by repeatedly predicting the next most likely token based on all previous tokens. This simple process, when repeated, creates coherent text.
Input Context: "The quarterly revenue"
Selected Token: "increased" (highest probability)
Next Step: Model now predicts next token after "The quarterly revenue increased"
This process repeats until a complete response is generated
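The predict-select-repeat loop can be sketched with a hypothetical lookup table standing in for the model. Two simplifying assumptions: the table conditions only on the last token (a real model uses the whole context), and all entries beyond the "revenue" row are invented for illustration.

```python
# Hypothetical next-token probabilities keyed by the previous token only.
NEXT_TOKEN_PROBS = {
    "revenue": {"increased": 0.45, "decreased": 0.22, "remained": 0.05},
    "increased": {"by": 0.60, "significantly": 0.25},
    "by": {"12%": 0.40, "8%": 0.30},
    "12%": {"<end>": 0.90},
}

def generate(prompt: str, max_new_tokens: int = 10) -> str:
    """Greedy decoding: always append the highest-probability next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = NEXT_TOKEN_PROBS.get(tokens[-1])
        if not candidates:
            break  # nothing known about what follows this token
        next_token = max(candidates, key=candidates.get)
        if next_token == "<end>":
            break  # the stand-in model signals the response is complete
        tokens.append(next_token)
    return " ".join(tokens)

print(generate("The quarterly revenue"))
# The quarterly revenue increased by 12%
```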
| Purpose | Input | Output | Attention Asks |
|---|---|---|---|
| Predict future numerical values | Time series data (GDP, unemployment, etc.) | Next quarter's inflation rate | Which past time periods are important? |
| Generate text, code, analysis | Text prompt (tokenized) | Next token (repeated for full text) | Which previous words are important? |
Processing Flow: Text → Tokens → Embeddings → 96+ Transformer Layers → Next Token Probabilities → Select Token → Repeat
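Self-attention lets each word "look at" the other words in the input to understand context. This is how the model works out that "Apple" in "Apple stock rose despite market concerns" refers to the company, not the fruit. The weights in the table are illustrative relevance scores for that sentence; in an actual attention head, each word's weights over all words sum to 1.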
| Word | Attends Most To | Attention Weight | Why? |
|---|---|---|---|
| Apple | stock | 0.85 | Determines it's company context |
| stock | Apple, rose | 0.78, 0.62 | Subject and action relationship |
| rose | stock, despite | 0.82, 0.43 | Main action and contrast |
| despite | rose, concerns | 0.71, 0.69 | Contrast marker |
| market | concerns | 0.88 | Modifies concerns |
| concerns | market, rose | 0.83, 0.47 | Type and contrast |
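A minimal sketch of how attention weights are computed for one word, assuming made-up 2-dimensional vectors. It shows only the scaled dot-product and softmax steps; a full attention layer also uses learned projection matrices and multiplies the weights by value vectors, which are omitted here.

```python
import math

def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query: list[float], keys: list[list[float]]) -> list[float]:
    """Scaled dot-product attention weights of one query over all keys."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    return softmax(scores)

# Hypothetical vectors for three words of the example sentence.
words = ["Apple", "stock", "rose"]
keys = [[1.0, 0.0], [0.9, 0.6], [0.1, 1.0]]
query_for_apple = [2.0, 1.0]

weights = attention_weights(query_for_apple, keys)
for word, weight in zip(words, weights):
    print(f"{word}: {weight:.2f}")
print(round(sum(weights), 6))  # 1.0
```

With these toy numbers "Apple" attends most to "stock", mirroring the first row of the table.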
At $0.03 per 1,000 input tokens (GPT-4 pricing):
Parameters are the numbers the model adjusts during training to learn patterns. More parameters generally means more capacity to learn complex patterns, but also higher costs.
| Model | Parameters | Release | Context Window | Relative Speed |
|---|---|---|---|---|
| GPT-3 | 175 billion | 2020 | 2,048 tokens | Fast |
| GPT-3.5 | 175 billion (widely assumed, not confirmed) | 2022 | 4,096 tokens | Very fast |
| GPT-4 | ~1.8 trillion (estimated) | 2023 | 8,192 tokens at launch; 128,000 with GPT-4 Turbo | Slower |
| Claude 3 Opus | Unknown (comparable) | 2024 | 200,000 tokens | Medium |
| Claude 3.5 Sonnet | Unknown | 2024 | 200,000 tokens | Fast |
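The context window limits how much text fits in one request. A rough check, combining two window sizes from the table with the 0.75 words-per-token rule of thumb; `fits_in_context` is a hypothetical helper, and in practice the window must also leave room for the model's reply.

```python
# Context window sizes taken from the table (in tokens).
CONTEXT_WINDOWS = {"GPT-3.5": 4_096, "Claude 3 Opus": 200_000}

def fits_in_context(word_count: int, model: str, words_per_token: float = 0.75) -> bool:
    """Estimate tokens from words and compare against the model's window."""
    estimated_tokens = round(word_count / words_per_token)
    return estimated_tokens <= CONTEXT_WINDOWS[model]

print(fits_in_context(2_000, "GPT-3.5"))         # True  (~2,667 tokens)
print(fits_in_context(50_000, "GPT-3.5"))        # False (~66,667 tokens)
print(fits_in_context(50_000, "Claude 3 Opus"))  # True
```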
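Temperature is a sampling setting (typically 0.0 to 2.0) that controls how deterministic vs. creative the model's outputs are. It is chosen per request, unlike the model parameters learned during training, and it affects how the model samples from its probability distribution.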
Effect: Always picks highest probability token
Output: Consistent, predictable, focused
Best For:
Effect: Samples more randomly from probabilities
Output: Creative, varied, exploratory
Best For:
Prompt: "The quarterly revenue"
| Temperature | Next Token | Explanation |
|---|---|---|
| 0.0 | "increased" (45%) | Always picks highest probability |
| 0.7 | "decreased" (22%) | Occasionally picks 2nd or 3rd option |
| 1.5 | "remained" (5%) | Even low-probability options possible |
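A sketch of how temperature reshapes the distribution before sampling. Raising each probability to the power 1/T and renormalizing is equivalent to dividing the model's logits by T before the softmax. Assumptions: only the three candidates from the table are used (the remaining probability mass is ignored), and `apply_temperature` and `sample` are hypothetical helper names.

```python
import math
import random

def apply_temperature(probs: dict[str, float], temperature: float) -> dict[str, float]:
    """Rescale a distribution; lower temperature concentrates it on the top token."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all weight on the most likely token.
        best = max(probs, key=probs.get)
        return {token: float(token == best) for token in probs}
    scaled = {t: math.exp(math.log(p) / temperature) for t, p in probs.items()}
    total = sum(scaled.values())
    return {t: v / total for t, v in scaled.items()}

def sample(probs: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one token from the temperature-adjusted distribution."""
    adjusted = apply_temperature(probs, temperature)
    return rng.choices(list(adjusted), weights=list(adjusted.values()), k=1)[0]

base = {"increased": 0.45, "decreased": 0.22, "remained": 0.05}
for temperature in (0.0, 0.7, 1.5):
    adjusted = apply_temperature(base, temperature)
    print(temperature, {t: round(p, 2) for t, p in adjusted.items()})

print(sample(base, 0.0, random.Random(0)))  # increased
```

At temperature 0.0 the result is always "increased"; at 1.5 the low-probability "remained" gets a noticeably larger share than at 0.7.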
Prompt: "Write a brief email to the team about Q3 revenue results"
Pricing Model: Most AI APIs charge separately for input tokens (what you send) and output tokens (what you receive)
| Token Type | Cost per 1,000 tokens | Typical Use |
|---|---|---|
| Input (Prompt) | $0.03 | Your questions, context, data |
| Output (Generated) | $0.06 | AI's responses |
| Cached Input | $0.015 | Repeated context (50% discount) |
ROI Consideration: If each conversation saves 5 minutes of human agent time ($0.50 labor cost), monthly savings = $5,000, which implies 10,000 conversations per month. Net benefit = $4,670/month, which implies about $330/month in API costs.
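The ROI figures can be reproduced from the pricing table. The workload numbers below (10,000 conversations per month, 500 input and 300 output tokens each) are assumptions, not stated in the source; they were chosen because they are consistent with the $5,000 savings and $4,670 net benefit quoted above.

```python
INPUT_PRICE_PER_1K = 0.03   # from the pricing table
OUTPUT_PRICE_PER_1K = 0.06  # from the pricing table

def conversation_cost(input_tokens: int, output_tokens: int) -> float:
    """API cost in dollars for one conversation."""
    return (input_tokens / 1000) * INPUT_PRICE_PER_1K + (
        output_tokens / 1000
    ) * OUTPUT_PRICE_PER_1K

# Assumed workload (hypothetical values, see the note above).
conversations_per_month = 10_000
input_tokens, output_tokens = 500, 300
labor_saving_per_conversation = 0.50  # from the ROI note

api_cost = conversations_per_month * conversation_cost(input_tokens, output_tokens)
savings = conversations_per_month * labor_saving_per_conversation
print(f"API cost:    ${api_cost:,.2f}")            # $330.00
print(f"Savings:     ${savings:,.2f}")             # $5,000.00
print(f"Net benefit: ${savings - api_cost:,.2f}")  # $4,670.00
```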
Hallucination: When the model generates information that sounds plausible but is factually incorrect, not supported by training data, or fabricated.
| Category | Hallucination Example | Risk Level |
|---|---|---|
| Statistics | "Studies show 73% of customers prefer..." (no such study exists) | High |
| Citations | "According to Smith et al. (2023)..." (paper doesn't exist) | High |
| Product Features | "This software includes blockchain integration" (it doesn't) | Medium |
| Company Details | "ABC Corp was founded in 1995" (actually 1998) | Medium |
| Technical Specs | "The API supports 10,000 requests/sec" (actual limit: 1,000) | High |
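One practical safeguard is to check generated claims against a trusted record before using them. A minimal sketch, assuming the claims have already been extracted from the model's text by hand; the dictionary keys and the `check_claims` helper are hypothetical, and the reference values come from the corrections in the table.

```python
# Trusted reference values, taken from the corrections in the table.
TRUSTED_FACTS = {
    "ABC Corp founding year": 1998,
    "API requests per second": 1_000,
}

def check_claims(claims: dict[str, int]) -> dict[str, str]:
    """Label each generated claim as verified, contradicted, or unverifiable."""
    results = {}
    for name, claimed_value in claims.items():
        if name not in TRUSTED_FACTS:
            results[name] = "unverifiable"  # no source to check against
        elif TRUSTED_FACTS[name] == claimed_value:
            results[name] = "verified"
        else:
            results[name] = "contradicted"
    return results

generated = {
    "ABC Corp founding year": 1995,
    "API requests per second": 10_000,
    "Customer preference statistic (%)": 73,
}
for claim, status in check_claims(generated).items():
    print(f"{claim}: {status}")
```

Claims with no trusted source, such as the uncited statistic, are flagged as unverifiable rather than accepted.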
Scenario: You asked AI to summarize your company's Q3 performance. Review these 5 statements:
Business Problem: TeleConnect receives 1,000 customer reviews weekly. Manual categorization takes 5 minutes per review. Can generative AI automate this accurately?
| Review ID | Customer Review Text | Length |
|---|---|---|
| 001 | "The service is reliable but customer support response time is terrible. Waited 3 days for callback." | 16 words |
| 002 | "Love the new mobile app features! Much easier to manage my account now." | 13 words |
| 003 | "Pricing is too high compared to competitors. Considering switching despite good service quality." | 13 words |
| Setting | Temperature | Accuracy | Consistency | Speed |
|---|---|---|---|---|
| Configuration 1 | 0.2 | 94% | Very High | ~50 tokens/sec |
| Configuration 2 | 0.7 | 87% | Medium | ~50 tokens/sec |
| Configuration 3 | 1.2 | 76% | Low | ~50 tokens/sec |
Conclusion: Of the three settings tested, temperature 0.2 gave the highest accuracy and consistency, making it the best choice for classification tasks
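To put the results in business terms, the sketch below converts the case-study numbers into weekly manual effort and expected misclassifications per temperature setting. It assumes the measured accuracy holds across all 1,000 weekly reviews; `expected_errors` is a hypothetical helper.

```python
REVIEWS_PER_WEEK = 1_000  # from the business problem
MINUTES_PER_REVIEW = 5    # from the business problem

def expected_errors(accuracy: float, reviews: int = REVIEWS_PER_WEEK) -> int:
    """Expected number of misclassified reviews at a given accuracy."""
    return round(reviews * (1 - accuracy))

manual_hours = REVIEWS_PER_WEEK * MINUTES_PER_REVIEW / 60
print(f"Manual effort: {manual_hours:.1f} hours/week")  # 83.3 hours/week

# Accuracy by temperature, from the results table.
accuracy_by_temperature = {0.2: 0.94, 0.7: 0.87, 1.2: 0.76}
for temperature, accuracy in accuracy_by_temperature.items():
    print(f"T={temperature}: ~{expected_errors(accuracy)} misclassified reviews/week")
# T=0.2: ~60, T=0.7: ~130, T=1.2: ~240
```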
Today's Lab: You'll use Google Colab to interact with Anthropic's Claude API and see tokenization, temperature, and costs in action.
Due Week 9 - How this week's content directly applies to your group project
Now that you understand how generative AI works (tokens, embeddings, temperature, patterns), next week you'll learn how to use it effectively through prompt engineering.