Week 2: Applying Analytics Methodology to Industry Projects

DATA6000: Industry Business Analytics Project: Capstone

Focus Areas:

  • Formulating the right business questions
  • Sourcing data for industry projects
  • Applying analytics methodology steps

Learning Objectives

By the end of this session, you will be able to:

  • Transform vague business requests into specific, actionable analytics questions
  • Identify and evaluate appropriate data sources for industry capstone projects
  • Apply the 6-phase analytics methodology systematically to real-world problems
  • Recognize common pitfalls in problem formulation and data sourcing
  • Plan and scope an analytics project with realistic constraints

Why Problem Formulation is Critical

"A problem well-stated is a problem half-solved" - Charles Kettering

Problem formulation is the most critical element in any business analytics project because:

  • Direction: It determines every subsequent decision in your project
  • Resources: It defines what data, tools, and techniques you need
  • Success Metrics: It establishes how you measure project success
  • Stakeholder Alignment: It ensures everyone understands the project goals

Key Insight: Most failed analytics projects fail not because of poor analysis, but because they solved the wrong problem. Getting the problem right is your first and most important task.

The Translation Challenge

From Business Language to Analytics Language

Business stakeholders speak in terms of goals and outcomes. Analysts need specific, measurable, and answerable questions.

Business Language (Vague) | Analytics Language (Specific)
"We need to improve sales" | "What factors predict which customers will make repeat purchases within 30 days?"
"Customers are leaving" | "Which customer segments have the highest churn rate in the past 6 months, and what behaviors precede churn?"
"Optimize our operations" | "Where are the bottlenecks in our order fulfillment process that add more than 24 hours to delivery time?"
"Understand our customers better" | "What are the distinct customer segments based on purchasing behavior, and what are their characteristics?"

What Makes a Good Analytics Question?

The SMART-A Framework for Analytics Questions

S Specific: Clearly defined scope and focus. Who, what, where, and when are specified.
M Measurable: Can be answered with data and quantitative methods. Success can be objectively evaluated.
A Actionable: The answer will lead to concrete business decisions or actions.
R Relevant: Directly tied to business objectives and stakeholder needs.
T Time-bound: Has a clear timeframe for analysis and impact.
A Analytically Tractable: Can be answered with available or obtainable data and appropriate methods.

The 5W1H Method for Problem Definition

A systematic framework to translate business problems into analytics questions:

The 5W1H Framework

WHO: Stakeholders, target users, affected parties

WHAT: Objectives, outcomes, deliverables

WHERE: Context, location, business unit

WHEN: Timeframe, deadlines, temporal scope

WHY: Motivation, business value, impact

HOW: Success metrics, KPIs, measurement

5W1H Method: Applied Example

Business Request: "We need to reduce customer churn"

Applying the 5W1H Framework:

WHO: Subscription customers who have been active for at least 3 months

WHAT: Identify customers at high risk of canceling their subscription

WHERE: Focus on the North American market segment initially

WHEN: Predict churn probability for the next 60 days; project completion in 8 weeks

WHY: Customer acquisition costs are 5x retention costs; a 10% reduction in churn is worth $2M in annual revenue

HOW: Success measured by model accuracy (>80%), precision (>70%), and business impact (5% churn reduction in pilot)

Resulting Analytics Question: "Which features and behaviors predict subscription cancellation within 60 days for North American customers who have been active for 3+ months, with sufficient accuracy to enable targeted retention interventions?"
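
The accuracy and precision targets in the HOW step can be checked directly from prediction counts. The sketch below is one minimal way to do that; the class name, function names, and the pilot counts are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Counts from comparing predicted churn against actual churn."""
    tp: int  # predicted churn, customer churned
    fp: int  # predicted churn, customer stayed
    fn: int  # predicted stay, customer churned
    tn: int  # predicted stay, customer stayed


def accuracy(c: ConfusionCounts) -> float:
    """Share of all predictions that were correct."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.fn + c.tn)


def precision(c: ConfusionCounts) -> float:
    """Share of customers flagged as churners who actually churned."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def meets_targets(c: ConfusionCounts, min_accuracy=0.80, min_precision=0.70) -> bool:
    """Thresholds default to the targets in the HOW step above."""
    return accuracy(c) > min_accuracy and precision(c) > min_precision


# Hypothetical results for 1,000 customers (illustrative numbers only).
pilot = ConfusionCounts(tp=120, fp=40, fn=80, tn=760)
print(f"accuracy={accuracy(pilot):.2f}, precision={precision(pilot):.2f}")
print("meets targets:", meets_targets(pilot))
```

Pairing precision with accuracy matters when churners are a minority: a model that never flags anyone can still score a high accuracy while being useless for retention.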

Common Pitfalls in Problem Formulation

Watch Out For These Mistakes:

Too Vague

Problem: "Improve customer experience"

Why it fails: No specific metrics, no clear scope, impossible to measure success

Fix: "Reduce average customer support response time from 4 hours to 2 hours"

Too Broad

Problem: "Analyze all business operations"

Why it fails: Unfocused, resource-intensive, no clear deliverable

Fix: "Identify inefficiencies in the order fulfillment process that cause delays >24 hours"

Solution-Focused

Problem: "Build a neural network for sales"

Why it fails: Specifies a technique before understanding the problem

Fix: "Predict monthly sales by product category to optimize inventory levels"

Not Actionable

Problem: "Explore interesting patterns in data"

Why it fails: No business decision will result from the analysis

Fix: "Identify customer segments with different purchasing behaviors to personalize marketing"

Practice: Identifying Problem Quality

Evaluate these problem statements. Which are well-formulated and which need improvement?

Problem Statement | Quality Assessment
"Use machine learning to improve things" | ❌ Poor: Too vague, solution-focused, not actionable
"Which product features correlate with customer ratings above 4 stars in our mobile app?" | ✓ Good: Specific, measurable, actionable
"Make customers happy" | ❌ Poor: Not measurable, no scope, undefined
"What factors predict employee turnover within 90 days for new hires in sales roles?" | ✓ Good: Specific, time-bound, measurable
"Do something about logistics" | ❌ Poor: No objective, no scope, too vague

Quiz 1: Problem Definition

A retail manager says "We need to improve sales." Which is the BEST analytics question?

Data Sourcing for Industry Projects

Why Data Sourcing Matters

Once you have defined your business problem, the next critical step is identifying and accessing the right data.

Without appropriate data, even the most well-defined problem cannot be solved. Data sourcing determines the feasibility and quality of your entire project:

  • Data availability constrains what questions you can actually answer
  • Data quality determines the reliability of your insights
  • Data access affects project timelines and feasibility
  • Data costs impact project budgets and sustainability

Types of Data Sources in Industry Settings

Three Categories of Data Sources

Internal Sources

  • Transactional databases
  • CRM systems
  • ERP systems
  • Web analytics
  • Operational logs
  • Employee records
  • Financial systems

External Sources

  • Public datasets
  • Government data
  • Third-party APIs
  • Market research
  • Social media
  • Web scraping
  • Industry reports

Hybrid/Enriched

  • Internal + external combined
  • Purchased enrichment
  • Partner data sharing
  • Surveys + transaction data
  • Demographic overlays

The Data Availability Matrix

Evaluate potential data sources using this framework:

Data Availability Matrix

Axes: Relevance (low to high) and Accessibility (low to high)

Low Relevance + High Accessibility

Status: Easy to get but not useful

Action: Avoid unless exploratory

Example: Public datasets unrelated to your problem

High Relevance + High Accessibility

Status: Ideal data sources

Action: Prioritize these sources

Example: Internal transaction database

Low Relevance + Low Accessibility

Status: Worst case scenario

Action: Eliminate from consideration

Example: Restricted competitor data not related to problem

High Relevance + Low Accessibility

Status: Valuable but challenging

Action: Assess effort vs. value

Example: Partner data requiring legal agreements
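
The four quadrants can be turned into a simple lookup when you have many candidate sources to triage. This is a sketch only: the 1-5 scoring scale, the threshold of 3, and the names `DataSource` and `recommend` are assumptions made for illustration, not part of the matrix itself.

```python
from dataclasses import dataclass

# Action for each quadrant, keyed by (high relevance?, high accessibility?).
ACTIONS = {
    (True, True): "Prioritize these sources",
    (True, False): "Assess effort vs. value",
    (False, True): "Avoid unless exploratory",
    (False, False): "Eliminate from consideration",
}


@dataclass
class DataSource:
    name: str
    relevance: int      # hypothetical 1-5 score agreed with stakeholders
    accessibility: int  # hypothetical 1-5 score


def recommend(source: DataSource, threshold: int = 3) -> str:
    """Map a scored source to the action for its quadrant."""
    key = (source.relevance >= threshold, source.accessibility >= threshold)
    return ACTIONS[key]


sources = [
    DataSource("Internal transaction database", 5, 5),
    DataSource("Partner data requiring legal agreements", 5, 2),
    DataSource("Public datasets unrelated to your problem", 1, 5),
    DataSource("Restricted competitor data not related to problem", 1, 1),
]
for s in sources:
    print(f"{s.name}: {recommend(s)}")
```

Scoring forces the team to state why a source is considered relevant or accessible, which is often more useful than the resulting label.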


Real-World Data Sourcing Challenges

Industry projects face practical challenges that academic projects rarely encounter:

Data Silos

Data scattered across different departments, systems, and formats with no integration

Impact: Requires significant time for data collection and integration

Data Quality Issues

Missing values, inconsistent formats, duplicates, errors, and outdated information

Impact: Data cleaning is often cited as consuming 60-80% of project time

Access Restrictions

Privacy regulations, security policies, legal constraints, and approval processes

Impact: Delays in project start and potential scope changes

Documentation Gaps

Unclear data definitions, missing metadata, undocumented business rules

Impact: Risk of misinterpreting data and drawing wrong conclusions

Key Takeaway: Always assess data availability and quality EARLY in your project. Many projects fail because data challenges were discovered too late.

Data Sourcing Checklist for Capstone Projects

Before Committing to a Data Source:

✓ Availability: Can you actually access this data? What approvals are needed?
✓ Timeframe: How long will it take to obtain access and extract the data?
✓ Completeness: Does the data cover the full scope of your analysis (time period, geographic area, customer segments)?
✓ Quality: What is the expected quality? Are there known issues?
✓ Format: In what format is the data? Will conversion or significant preprocessing be required?
✓ Volume: Is the data volume sufficient for your analysis? Too large to handle?
✓ Documentation: Is there a data dictionary? Are field definitions clear?
✓ Compliance: Are there legal, privacy, or ethical constraints on use?
✓ Cost: Is there any cost to obtain or use the data?
✓ Backup Plan: What alternatives exist if this data source falls through?
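
Several checklist items (Completeness, Quality, Volume) can be given a first rough answer by profiling a small extract before committing to the source. The following is a minimal sketch using only the Python standard library; the function name `profile_rows`, the column names, and the sample rows are hypothetical.

```python
import csv
import io


def profile_rows(rows, key_field):
    """Return row count, duplicate-key count, and missing rate per field."""
    rows = list(rows)
    total = len(rows)
    fields = rows[0].keys() if rows else []
    # A value counts as missing when it is absent or blank after trimming.
    missing = {
        f: sum(1 for r in rows if (r[f] or "").strip() == "") / total
        for f in fields
    }
    keys = [r[key_field] for r in rows]
    duplicates = len(keys) - len(set(keys))
    return {"rows": total, "duplicate_keys": duplicates, "missing_rate": missing}


# Hypothetical four-row extract standing in for a real export.
sample_csv = """customer_id,plan_type,last_login
C001,monthly,2024-05-01
C002,,2024-04-17
C002,annual,
C003,monthly,2024-05-03
"""
report = profile_rows(csv.DictReader(io.StringIO(sample_csv)), key_field="customer_id")
print(report)
```

A profile like this does not replace a conversation with the data owner, but it gives you concrete numbers to bring to that conversation.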

Quiz 2: Data Sourcing

For predicting employee turnover, which data source combination is MOST appropriate?

Applying the Analytics Methodology

Revisiting the 6-Phase Framework

Phase 1: Problem Definition & Data Sourcing
Phase 2: Data Processing & Management
Phase 3: Analytics Techniques
Phase 4: Visualization & Evaluation
Phase 5: Communication & Recommendations
Phase 6: Ethics & Security

Today's Focus: Phase 1 is where projects succeed or fail. Getting problem definition and data sourcing right determines everything that follows.

Phase 1: Problem Definition & Data Sourcing

Detailed Workflow

1. Initial Stakeholder Meeting
Understand business context, objectives, constraints. Document in your own words.
2. Apply 5W1H Framework
Systematically clarify who, what, where, when, why, and how for the project.
3. Draft Analytics Question
Translate business need into specific, measurable, actionable analytics question.
4. Identify Required Data
List all data elements needed to answer the analytics question.
5. Map to Data Sources
Identify where each data element can be obtained. Apply availability matrix.
6. Assess Feasibility
Evaluate data access, quality, timeline. Identify gaps and risks.
7. Refine or Pivot
Adjust analytics question based on data reality. Get stakeholder approval.
8. Finalize Project Scope
Document agreed problem, data sources, success criteria, timeline.

The Project Scoping Template

Essential components of a well-scoped analytics project:

1. Business Objective Statement

What business problem are you solving? Why does it matter?

Example: "Reduce customer churn by identifying at-risk customers early to enable targeted retention"

2. Analytics Question

The specific, measurable question you will answer with data.

Example: "Which features predict subscription cancellation within 60 days with >80% accuracy?"

3. Success Criteria and KPIs

How will you measure if the project succeeded?

Example: "Model achieves 80% accuracy, 70% precision. Pilot reduces churn by 5% in 3 months"

4. Data Requirements Specification

What data do you need? Where will you get it? What are the access requirements?

Example: "Customer database (internal), usage logs (internal), support tickets (Zendesk API)"

5. Constraints and Assumptions

Timeline, budget, resource limitations, known data issues.

Example: "8-week timeline, data limited to last 12 months, assuming data quality >90%"
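
One way to keep the five template components honest is to store the scope as a structured record and check which parts are still empty before the sign-off meeting. This is a sketch under stated assumptions: the class `ProjectScope` and its field names are hypothetical, chosen to mirror the five headings above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class ProjectScope:
    """One field per component of the scoping template."""
    business_objective: str = ""
    analytics_question: str = ""
    success_criteria: list = field(default_factory=list)
    data_requirements: list = field(default_factory=list)
    constraints: list = field(default_factory=list)

    def missing_components(self):
        """Names of template components that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]


draft = ProjectScope(
    business_objective="Reduce customer churn by identifying at-risk customers early",
    analytics_question="Which features predict subscription cancellation within 60 days?",
    data_requirements=[
        "Customer database (internal)",
        "Usage logs (internal)",
        "Support tickets (Zendesk API)",
    ],
)
print(draft.missing_components())  # ['success_criteria', 'constraints']
```

An empty list from `missing_components()` does not mean the scope is good, only that nothing has been left blank.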

Case Study: E-commerce Customer Churn

Walking Through the Complete Process

Initial Business Request

The VP of Marketing at an online subscription box company approaches you:

"We're losing too many customers. Can you figure out why people are leaving and help us keep them?"

Your Task: Transform this vague request into a well-defined analytics project.

Let's apply everything we've learned to this real-world scenario step by step.

Case Study: Applying 5W1H

Structured Discovery Questions

WHO is affected?

→ Subscription customers (monthly plan holders), Marketing team, Customer success team

WHAT do we want to achieve?

→ Identify customers likely to cancel before they do, so we can intervene with retention offers

WHERE is this happening?

→ All markets, but focus on US initially (80% of customer base)

WHEN is the timeframe?

→ Predict 30-60 days in advance. Project completion in 10 weeks.

WHY does this matter?

→ 20% annual churn rate costs $5M in lost revenue. Retention is 5x cheaper than acquisition.

HOW will we measure success?

→ Model performance: 75% accuracy, 70% precision. Business impact: 10% reduction in churn in pilot group.
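
A target such as "10% reduction in churn" can be read as a relative change or as a drop in percentage points, and the two readings differ a lot in value. The arithmetic below works through both using the figures from the WHY step. It assumes that lost revenue is proportional to the churn rate and that the pilot result would carry over to the whole customer base; both are simplifications made for illustration.

```python
ANNUAL_CHURN_RATE = 0.20   # from the case study
LOST_REVENUE = 5_000_000   # from the case study, dollars per year


def revenue_recovered(new_churn_rate):
    """Assumes lost revenue scales linearly with churn rate (a simplification)."""
    return LOST_REVENUE * (ANNUAL_CHURN_RATE - new_churn_rate) / ANNUAL_CHURN_RATE


# Reading 1: a 10% relative reduction takes churn from 20% to 18%.
relative = ANNUAL_CHURN_RATE * (1 - 0.10)
# Reading 2: a 10 percentage point reduction takes churn from 20% to 10%.
absolute = ANNUAL_CHURN_RATE - 0.10

print(f"relative: churn {relative:.0%}, recovers ${revenue_recovered(relative):,.0f}")
print(f"absolute: churn {absolute:.0%}, recovers ${revenue_recovered(absolute):,.0f}")
```

Under these assumptions the two readings are worth roughly $0.5M and $2.5M per year, so it is worth confirming with the stakeholder which one is meant before writing it into the success criteria.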

Case Study: Crafting the Analytics Question

From Business Request to Analytics Question

Original Request (Too Vague)

"We're losing too many customers. Can you figure out why people are leaving and help us keep them?"

Refined Analytics Question (Specific & Actionable)

"Which customer behavioral patterns and account characteristics predict subscription cancellation within 30-60 days for US-based monthly subscribers, with sufficient accuracy (>75%) and precision (>70%) to enable cost-effective retention interventions?"

Why This Question Works: It's specific (30-60 days, US, monthly), measurable (>75% accuracy), actionable (enables interventions), relevant (addresses churn), time-bound (project timeline), and tractable (we can get the data).

Case Study: Identifying Data Requirements

What data do we need to answer the analytics question?

Data Category | Specific Data Elements | Source
Customer Profile | Customer ID, subscription start date, plan type, demographics, location | Internal CRM
Behavioral Data | Login frequency, product views, time on site, feature usage, last login date | Web analytics (Google Analytics)
Transaction History | Payment history, failed payments, refund requests, plan changes | Payment system (Stripe)
Support Interactions | Support ticket count, resolution time, satisfaction scores, complaint types | Support system (Zendesk)
Product Engagement | Box customization rate, product ratings, shipping issues, delivery feedback | Internal database
Outcome Variable | Subscription status (active/cancelled), cancellation date, cancellation reason | Internal CRM

Case Study: Assessing Data Availability

Apply the Data Availability Matrix to each source:

High Relevance + High Accessibility

  • Internal CRM data (customer profiles, subscription status)
  • Transaction history from payment system
  • Internal product engagement data

Action: Prioritize these - start here

High Relevance + Low Accessibility

  • Web analytics data (requires API setup and historical data export)
  • Support ticket data (in separate system, needs integration)

Action: Worth the effort - plan for 2-week data integration

Low Relevance + High Accessibility

  • General industry benchmark data
  • Public e-commerce statistics

Action: Use only for context, not core analysis

Low Relevance + Low Accessibility

  • Competitor customer data (impossible to obtain)
  • Detailed social media sentiment (complex to collect and low direct relevance)

Action: Eliminate - not worth pursuing

Case Study: Selecting Analytics Approach

Based on the defined problem and available data, what methodology makes sense?

Methodology Selection Rationale

Problem Type: Binary classification (will churn or won't churn)

Data Type: Mix of structured numerical and categorical data

Data Volume: 50,000 customers with 18 months of historical data (sufficient for supervised learning)

Business Requirement: Need interpretable results to understand why customers churn

Recommended Approach: Supervised classification models (logistic regression for interpretability, random forest for comparison, evaluation on holdout test set)
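
Evaluation on a holdout test set only works if the test set resembles the data the model will face, including the share of churners. The sketch below shows a stratified split written with the standard library so the mechanics are visible; the function name `stratified_split`, the 20% test fraction, and the synthetic customers are assumptions for illustration. In practice a library routine such as scikit-learn's `train_test_split` with its `stratify` argument does the same job.

```python
import random
from collections import defaultdict


def stratified_split(records, label_key, test_fraction=0.2, seed=42):
    """Split records into train/test sets, keeping class proportions similar."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    by_label = defaultdict(list)
    for rec in records:
        by_label[rec[label_key]].append(rec)
    train, test = [], []
    for group in by_label.values():
        shuffled = group[:]
        rng.shuffle(shuffled)
        n_test = round(len(shuffled) * test_fraction)
        test.extend(shuffled[:n_test])
        train.extend(shuffled[n_test:])
    return train, test


def churn_rate(rows):
    return sum(r["churned"] for r in rows) / len(rows)


# Synthetic customers: 20% churned, mirroring the case study's churn rate.
customers = [{"customer_id": i, "churned": i % 5 == 0} for i in range(1000)]
train, test = stratified_split(customers, label_key="churned")

print(len(train), len(test))
print(f"train churn {churn_rate(train):.2f}, test churn {churn_rate(test):.2f}")
```

Both candidate models (logistic regression and random forest) would be fitted on `train` only and compared on `test`, so that the reported accuracy and precision reflect customers the models have not seen.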

Quiz 3: Methodology Application

What is the FIRST step when starting an analytics capstone project?

Practical Considerations: Stakeholder Management

Getting Buy-In and Managing Expectations

Technical excellence alone doesn't guarantee project success. You must manage stakeholders effectively.

Key Stakeholders

  • Sponsor: Executive champion
  • End Users: Who will use insights
  • Data Owners: Control data access
  • IT/Technical: Support infrastructure
  • Compliance: Legal/privacy oversight

Best Practices

  • Set clear expectations early
  • Communicate in business terms, not jargon
  • Regular status updates (weekly)
  • Be transparent about limitations
  • Document all agreements

Common Stakeholder Issues

  • Scope creep: "While you're at it, can you also analyze..."
  • Unrealistic expectations: "Can you predict next year's sales with 95% accuracy?"
  • Changing priorities: Mid-project shifts in business focus
  • Data gatekeeping: Stakeholders reluctant to share data

Solution: Written project scope document signed by all stakeholders at project start.

Timeline Planning: Realistic vs Optimistic

Student projects often underestimate time requirements. Here's what actually takes time:

Project Phase | Student Estimate | Realistic Industry Timeline
Problem definition & scoping | 1 week | 2-3 weeks (multiple stakeholder meetings)
Data access approval | Immediate | 1-4 weeks (legal/IT approvals)
Data collection & integration | 1 week | 2-4 weeks (multiple systems, APIs, extraction)
Data cleaning & preparation | 1 week | 3-6 weeks (often cited as 60-80% of project time)
Analysis & modeling | 2 weeks | 2-3 weeks (the "fun" part is shortest)
Validation & refinement | 1 week | 2-3 weeks (multiple iterations)
Documentation & presentation | 1 week | 2 weeks (stakeholder-ready materials)
TOTAL | 7 weeks | 14-25 weeks

Planning Principle: Whatever timeline you think is reasonable, add 50% buffer for unexpected delays. They WILL happen.
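
The realistic totals and the 50% buffer rule are simple enough to compute for your own plan. The sketch below sums the "Realistic Industry Timeline" ranges from the table and applies the buffer to a 10-week estimate; it assumes the phases run one after another with no overlap, and the function names are hypothetical.

```python
# (phase, min_weeks, max_weeks) from the "Realistic Industry Timeline" column.
REALISTIC = [
    ("Problem definition & scoping", 2, 3),
    ("Data access approval", 1, 4),
    ("Data collection & integration", 2, 4),
    ("Data cleaning & preparation", 3, 6),
    ("Analysis & modeling", 2, 3),
    ("Validation & refinement", 2, 3),
    ("Documentation & presentation", 2, 2),
]


def total_range(phases):
    """Sum the lower and upper bounds, assuming phases do not overlap."""
    low = sum(lo for _, lo, _ in phases)
    high = sum(hi for _, _, hi in phases)
    return low, high


def with_buffer(weeks, buffer=0.5):
    """Apply the planning principle: add a 50% buffer by default."""
    return weeks * (1 + buffer)


low, high = total_range(REALISTIC)
print(f"Realistic total: {low}-{high} weeks")
print(f"A 10-week estimate with a 50% buffer: {with_buffer(10):g} weeks")
```

If the buffered figure exceeds the time you actually have, that is a signal to narrow the scope rather than to drop the buffer.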

Risk Assessment and Mitigation

Identify and plan for potential project risks early:

Risk Category | Specific Risk | Mitigation Strategy
Data Access | Cannot obtain necessary data due to privacy/security constraints | Identify alternative data sources; have backup project scope
Data Quality | Data has >50% missing values or major quality issues | Early data quality assessment; plan for imputation or scope adjustment
Technical | Data volume too large for available tools/infrastructure | Sample data for initial analysis; cloud computing resources
Scope | Problem too complex for project timeline | Break into phases; focus on MVP (minimum viable product)
Stakeholder | Stakeholder changes priorities mid-project | Written scope agreement; regular check-ins; document changes
Skills Gap | Required techniques beyond current skill level | Identify learning resources early; seek mentorship; simplify approach

Quiz 4: Risk Identification

Your capstone project requires customer transaction data, but the company's data is spread across 5 different systems with inconsistent formats. What is the PRIMARY risk?

Why Documentation Matters from Day One

Good documentation is not optional - it's essential for project success and reproducibility.

Essential Documentation Components

Project Charter: Problem statement, objectives, scope, stakeholders, timeline
Data Dictionary: All data sources, field definitions, data types, missing value codes
Analysis Log: Date, what you tried, results, decisions made, lessons learned
Code Repository: Version controlled, well-commented, README file explaining structure
Decision Documentation: Why you chose certain methods, what alternatives you considered
Meeting Notes: Stakeholder discussions, feedback received, action items
Methodology Documentation: Preprocessing steps, model parameters, validation approach

Best Practice: Document as you go, not at the end. Future you (and your teammates) will thank present you.
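
An analysis log is easier to keep up if adding an entry takes a few seconds. The sketch below builds one log record with the fields listed under Analysis Log and appends it to a file as a single JSON line. The function names, the file format, and the example entry are all hypothetical; a spreadsheet or a notebook cell would serve equally well.

```python
import json
from datetime import date


def log_entry(tried, result, decision, lesson="", when=None):
    """Build one analysis-log record: date, what you tried, results, decision, lesson."""
    return {
        "date": (when or date.today()).isoformat(),
        "tried": tried,
        "result": result,
        "decision": decision,
        "lesson": lesson,
    }


def append_entry(path, entry):
    """Append the entry as one JSON line so the log stays easy to diff."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")


# Illustrative entry only; the content is invented for the example.
entry = log_entry(
    tried="Excluded customers with under 3 months of history",
    result="Smaller training set, fewer ambiguous labels",
    decision="Keep the exclusion and note it in the scope document",
    lesson="Check eligibility rules with the data owner first",
    when=date(2024, 3, 4),
)
print(json.dumps(entry, indent=2))
```

Calling `append_entry("analysis_log.jsonl", entry)` after each working session keeps the log in the same repository as the code, so it is versioned along with it.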

Common Student Mistakes (and How to Avoid Them)

Learn from Others' Mistakes

❌ Mistake #1: Jumping to Solutions

Starting with "I want to use neural networks" before understanding the problem

Fix: Always start with problem definition. Let the problem guide technique selection.

❌ Mistake #2: Ignoring Data Reality

Designing an ambitious project without checking data availability first

Fix: Validate data access in Week 1. Pivot early if needed.

❌ Mistake #3: Scope Creep

Continuously adding new questions and expanding scope

Fix: Write down scope. Say no to new additions. Focus on doing one thing well.

❌ Mistake #4: Poor Time Management

Spending 80% of time on modeling, 10% on data, 10% on communication

Fix: Plan 60% data work, 20% modeling, 20% communication.

❌ Mistake #5: Weak Problem Definition

Accepting vague requirements without clarification

Fix: Use 5W1H framework. Push back on vagueness. Get specifics.

❌ Mistake #6: No Backup Plan

Single point of failure (one data source, one approach)

Fix: Always have Plan B for critical dependencies.

What Makes a Successful Capstone Project?

Success Criteria Beyond Technical Excellence

1. Clear Business Value

Your project solves a real problem that stakeholders care about

2. Appropriate Scope

Achievable within time and resource constraints, yet substantial

3. Data-Driven Insights

Analysis yields actionable findings backed by evidence

4. Methodological Rigor

Appropriate techniques, proper validation, transparent limitations

5. Effective Communication

Findings presented clearly to both technical and business audiences

6. Implementation Readiness

Clear recommendations with feasible next steps

Workshop Activity: Scope Your Own Project

Group Exercise (20 minutes)

Activity Instructions

You will work in groups of 3-4 to analyze a business brief and develop a project scope.

1. Read the Business Brief (2 min)
Your group will receive a real business scenario from an industry partner
2. Apply 5W1H Framework (5 min)
Systematically analyze who, what, where, when, why, and how
3. Define Analytics Question (5 min)
Transform business need into a specific, measurable analytics question
4. Identify Data Sources (5 min)
List required data and potential sources, assess availability
5. Present to Class (3 min per group)
Share your analytics question and rationale

Workshop Scenario

Healthcare Provider: Emergency Department Wait Times

Background: St. Mary's Hospital is a 400-bed facility with a busy emergency department (ED). The Chief Medical Officer is concerned about patient satisfaction scores related to wait times.

Business Request:

"Our ED wait times are hurting patient satisfaction and our reputation. We need to understand what's causing delays and fix them. Can you help us reduce wait times?"

Context:

  • ED sees 200 patients per day on average
  • Recent patient satisfaction scores: 3.2/5.0 (industry benchmark: 4.0/5.0)
  • Common complaints mention "long wait times" but no specific data
  • Hospital has electronic health records (EHR) system with timestamped patient flow data
  • Leadership wants recommendations within 8 weeks

Your Task: Apply the frameworks from today's lecture to scope this project. What's your analytics question? What data do you need? What are the risks?
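
When assessing whether the EHR data can support your question, it helps to see how a wait time would be derived from timestamps. The sketch below computes the median wait by arrival hour from a few hypothetical records; the field names `arrived` and `seen_by_clinician`, the timestamp format, and the definition of "wait" are assumptions you would need to confirm against the hospital's data dictionary.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median

FMT = "%Y-%m-%d %H:%M"

# Hypothetical patient-flow records; field names are illustrative only.
visits = [
    {"arrived": "2024-03-04 08:05", "seen_by_clinician": "2024-03-04 08:50"},
    {"arrived": "2024-03-04 08:40", "seen_by_clinician": "2024-03-04 10:10"},
    {"arrived": "2024-03-04 14:15", "seen_by_clinician": "2024-03-04 14:35"},
    {"arrived": "2024-03-04 14:30", "seen_by_clinician": "2024-03-04 15:10"},
]


def wait_minutes(visit):
    """Minutes between arrival and first clinician contact."""
    arrived = datetime.strptime(visit["arrived"], FMT)
    seen = datetime.strptime(visit["seen_by_clinician"], FMT)
    return (seen - arrived).total_seconds() / 60


def median_wait_by_hour(records):
    """Group visits by hour of arrival and take the median wait in each group."""
    by_hour = defaultdict(list)
    for v in records:
        hour = datetime.strptime(v["arrived"], FMT).hour
        by_hour[hour].append(wait_minutes(v))
    return {hour: median(waits) for hour, waits in sorted(by_hour.items())}


print(median_wait_by_hour(visits))
```

Deciding which two timestamps define a "wait" is itself a problem-formulation question, and one to settle with the Chief Medical Officer before any analysis.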

Quiz 5: Applied Scenario

A healthcare provider wants to "reduce patient wait times." You have access to: appointment schedules, patient check-in logs, doctor availability, and treatment duration records. What is the MOST appropriate initial analytics approach?

Key Takeaways from Week 2

Problem formulation is the most critical element of any analytics project

Remember These Core Principles:

1. Start with the Problem, Not the Solution
Understand the business need before selecting analytical techniques
2. Use Frameworks to Structure Your Thinking
5W1H and SMART-A frameworks transform vague requests into specific questions
3. Validate Data Availability Early
Don't design a project around data you can't access
4. Plan for Reality, Not Best Case
Data issues, delays, and scope changes are normal - plan accordingly
5. Document Everything
Your future self and stakeholders will thank you

Preparing for Your Capstone Project

Action Items for Next Week

✓ Identify Potential Project Topics
Think about business problems you're interested in solving. Consider your professional interests and industry connections.
✓ Research Data Availability
For your potential topics, investigate what data might be available. Contact industry partners if relevant.
✓ Practice Problem Formulation
Take 3 vague business statements and translate them into specific analytics questions using the frameworks from today.
✓ Review Week 1 Content
Refresh your understanding of the 6-phase analytics framework and how Phase 1 connects to later phases.
✓ Set Up Project Documentation Structure
Create folders for your capstone project: data, code, documentation, references.
✓ Complete Workshop Activity
If you didn't finish the in-class workshop, complete the ED wait time scenario analysis.
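
The folder set-up in the action items can be scripted so every team member starts from the same layout. The sketch below creates the four folders named above plus a README stub; the function name and README text are hypothetical, and the demo runs in a temporary directory so it leaves nothing behind.

```python
import tempfile
from pathlib import Path

FOLDERS = ["data", "code", "documentation", "references"]


def create_project_structure(root):
    """Create the capstone folders under root; safe to re-run."""
    root = Path(root)
    created = []
    for name in FOLDERS:
        folder = root / name
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    readme = root / "README.md"
    if not readme.exists():  # never overwrite an existing README
        readme.write_text(
            "# Capstone project\n\nDescribe the folder structure here.\n",
            encoding="utf-8",
        )
    return created


# Demo in a temporary directory; pass your own path for real use.
with tempfile.TemporaryDirectory() as tmp:
    project_root = Path(tmp) / "capstone_project"
    created = create_project_structure(project_root)
    names = sorted(p.name for p in created)
    readme_exists = (project_root / "README.md").exists()
    print(names, readme_exists)
```

For your own project, call `create_project_structure("capstone_project")` once from the directory where you keep coursework, then put the result under version control.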

Additional Resources

Recommended Reading and Tools

Problem Definition:

  • "Cracking the PM Interview" by Gayle Laakmann McDowell and Jackie Bavaro - Problem-solving frameworks
  • "The Lean Startup" by Eric Ries - Validating problem-solution fit

Data Sourcing:

  • Google Dataset Search (datasetsearch.research.google.com)
  • Kaggle Datasets (kaggle.com/datasets)
  • Government open data portals (data.gov, data.gov.au)

Project Management:

  • CRISP-DM methodology for data mining projects
  • Trello or Asana for project task tracking
  • GitHub for code version control and collaboration

Week 2 Summary

Success in analytics projects starts with getting the problem right

We covered:

  • Translating business language to analytics questions
  • The 5W1H and SMART-A frameworks
  • Data sourcing challenges and strategies
  • The Data Availability Matrix
  • Applying the 6-phase methodology
  • Practical project management considerations

Next Week: Data Processing & Management (Phase 2)
