Welcome to Your Capstone Journey
Your role: Industry Analytics Consultant
These are not textbook exercises: you'll tackle actual industry challenges
You will add new knowledge to your chosen industry through original research and analysis
Scenario:
You are hired as an analytics consultant. An industry has a problem. Your job is to research, analyze, and provide actionable recommendations.
This is YOUR project. You drive the direction.
Explore an industry and formulate YOUR unique business question
Understanding what's already known before you contribute something new
Written report exploring industry background and analytics applications
Literature Review: Explore industry and formulate question
Methodology Peer Review: 8-minute presentation + feedback
Industry Research Report: Complete project + elevator pitch
Each assessment builds on the previous one. Start early on your literature review!
Every successful analytics project follows a journey
Key Point: This is an iterative process, not a linear one!
Business Problem: "We're losing customers"
Analytics Question: "Which customer segments have the highest churn probability in the next 90 days?"
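A first pass at an analytics question like this one can be as simple as an observed churn rate per segment. The sketch below assumes a hypothetical list of (segment, churned within 90 days) records; the segment names and values are invented for illustration. An observed historical rate is only a rough stand-in for a predicted probability.

```python
from collections import defaultdict

# Hypothetical customer records: (segment, churned within 90 days?)
CUSTOMERS = [
    ("new", True), ("new", True), ("new", False),
    ("loyal", False), ("loyal", False), ("loyal", True), ("loyal", False),
    ("discount", True), ("discount", False),
]

def churn_rate_by_segment(customers):
    """Return the observed share of churned customers for each segment."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for segment, did_churn in customers:
        totals[segment] += 1
        churned[segment] += int(did_churn)
    return {segment: churned[segment] / totals[segment] for segment in totals}

def rank_segments(customers):
    """Return (segment, churn rate) pairs, highest churn rate first."""
    rates = churn_rate_by_segment(customers)
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)
```

Calling `rank_segments(CUSTOMERS)` puts the segment with the highest observed churn first, which turns "we're losing customers" into something you can measure.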
Internal: Company databases, CRM, transaction logs
External: Market research, public datasets, APIs
Structured: Databases, spreadsheets, tables
Unstructured: Text, images, videos, social media
Primary: Collected directly (surveys, experiments)
Secondary: Existing data from other sources
Real-time: Streaming data, live feeds
Historical: Archived data, trend analysis
Data is correct and error-free
Directly relates to your question
You can obtain and use it
Worth the investment
Proper permissions obtained
Personal information protected
Clear data provenance
Represents all groups appropriately
Scenario: A retail company wants to reduce customer churn. They have access to:
Explanation: Transaction history provides the ground truth of customer behavior (who churned vs. who stayed) and reveals purchasing patterns. This forms your baseline before exploring other signals.
Automation is faster but you lose hands-on understanding
Manual work takes longer but builds intuition
Automation requires upfront investment
Scenario: You're analyzing student mental health data and want to predict which students are at risk of dropping out. You have:
Explanation: This is a binary classification problem (drop out: Yes/No) with labeled historical data. Classification algorithms can handle both numerical and categorical features to identify patterns that predict dropout risk.
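To make the idea concrete, here is a minimal sketch of a binary classifier trained on labeled history with one numerical and one categorical feature. It uses logistic regression written in plain Python, chosen only because it fits in a few lines; it is one classification algorithm among many. The dataset, the feature names (engagement score, enrolment mode), and the training settings are all hypothetical.

```python
import math

# Hypothetical labeled history: (engagement score 0-1, enrolment mode, dropped out?)
HISTORY = [
    (0.90, "full-time", 0), (0.80, "part-time", 0),
    (0.85, "full-time", 0), (0.70, "part-time", 0),
    (0.20, "part-time", 1), (0.10, "full-time", 1),
    (0.30, "part-time", 1), (0.15, "full-time", 1),
]

def encode(engagement, mode):
    """Bias term, numerical feature as-is, categorical feature one-hot encoded."""
    return [1.0, engagement, 1.0 if mode == "part-time" else 0.0]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(rows, learning_rate=1.0, epochs=5000):
    """Fit logistic regression weights with batch gradient descent."""
    features = [encode(e, m) for e, m, _ in rows]
    labels = [y for _, _, y in rows]
    weights = [0.0] * len(features[0])
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(features, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x))) - y
            for j, xi in enumerate(x):
                gradient[j] += error * xi / len(rows)
        weights = [w - learning_rate * g for w, g in zip(weights, gradient)]
    return weights

def dropout_probability(weights, engagement, mode):
    """Predicted probability (0-1) that a student drops out."""
    x = encode(engagement, mode)
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

WEIGHTS = train(HISTORY)
```

`dropout_probability(WEIGHTS, 0.25, "part-time")` returns a probability between 0 and 1 for a new student. In a real project you would evaluate on held-out data rather than the training rows.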
When to use: Understanding historical data, initial exploration
When to use: Forecasting, risk assessment, decision support
When to use: Action-oriented decisions, resource allocation
Understanding what has already happened: patterns in existing data that explain the past
Outputs that describe the quality of your model:
Saying something about the future - this is where business value is created
Output: Future forecasts with confidence intervals
Business Use: Planning, budgeting, resource allocation
Output: Deploying the model to make predictions on new data
Business Use: Real-time decision making, automation, risk scoring
Your capstone project needs to answer a business question about the future, not just describe the past. Expected results can be inferred from test accuracy and the confusion matrix.
The band around your forecast line shows the range the result is expected to fall within at a stated confidence level (for example, 95%)
"Worst case" scenario
Plan for minimum expected outcome
Most likely scenario
Base case planning
"Best case" scenario
Opportunity planning
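The worst, base, and best cases above can be read straight off a forecast band. The sketch below is one simple way to build such a band, assuming past forecast errors are roughly normally distributed; the error values and the point forecast are hypothetical.

```python
from statistics import NormalDist, stdev

# Hypothetical past forecast errors (actual minus forecast), in the forecast's units
PAST_ERRORS = [-10.0, 5.0, 8.0, -4.0, 1.0]

def forecast_band(past_errors, point_forecast, confidence=0.95):
    """Return (worst case, base case, best case) for a single forecast.

    Assumes forecast errors are approximately normal with the same spread
    as the past errors; this is a simplification, not a full forecasting model.
    """
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    spread = z * stdev(past_errors)
    return point_forecast - spread, point_forecast, point_forecast + spread
```

`forecast_band(PAST_ERRORS, 1000.0)` gives a lower bound to plan the minimum expected outcome, the point forecast for base-case planning, and an upper bound for opportunity planning. A higher confidence level widens the band.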
How is the business going to use the model in practice?
Replace or augment human decisions
Key question: How does model performance compare to human performance?
Anticipate customer, employee, or system actions
Key question: What is the cost of being wrong, and is it worth it overall?
Critical: Always consider the real-world consequences of model errors
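One way to approach "what is the cost of being wrong?" is to put a price on each type of error and compare models on total cost. The sketch below assumes hypothetical error counts and per-error costs; in practice these figures come from your stakeholders.

```python
def error_cost(false_positives, false_negatives, cost_per_fp, cost_per_fn):
    """Total cost of a model's mistakes under fixed per-error costs."""
    return false_positives * cost_per_fp + false_negatives * cost_per_fn

# Hypothetical costs: an unnecessary check-in is cheap, a missed case is expensive
COST_PER_FALSE_POSITIVE = 50
COST_PER_FALSE_NEGATIVE = 2000

# Hypothetical error counts for two candidate models on the same test set
cautious_model = error_cost(2, 60, COST_PER_FALSE_POSITIVE, COST_PER_FALSE_NEGATIVE)
sensitive_model = error_cost(40, 15, COST_PER_FALSE_POSITIVE, COST_PER_FALSE_NEGATIVE)
```

Under these assumed costs the model that raises more false alarms is still far cheaper overall, because each missed case costs much more than each unnecessary intervention.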
Avoid:
Include:
Accuracy: Overall correctness
Can be misleading with imbalanced data!
Precision: Of predicted positives, how many are actually positive?
Important when false positives are costly
Recall: Of actual positives, how many did we catch?
Important when false negatives are costly
F1 score: Balance between precision and recall
Useful when you need both to be good
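These four metrics can all be computed from the confusion matrix counts. The sketch below uses the standard definitions; the example counts are hypothetical and chosen to show how accuracy can mislead on imbalanced data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1 from confusion matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Hypothetical imbalanced case: 50 positives in 1000 records,
# and a "model" that never flags anyone
NEVER_FLAGS = classification_metrics(tp=0, fp=0, fn=50, tn=950)
```

The never-flag model scores 95% accuracy while catching none of the positive cases, which is why recall and precision need to be reported alongside accuracy.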
Cancer detection: Catch all cases, even with false positives
At-risk student identification: Better to offer support unnecessarily than miss someone
Spam detection: Avoid marking important emails as spam
Fraud detection in payments: Don't block legitimate transactions
Scenario: Your predictive model for identifying at-risk students shows:
Explanation: High precision (95% of flagged students do drop out) but low recall (only 40% of actual dropouts are caught) means the model is conservative. In this context, recall matters more: it is better to offer support to students who might not need it than to miss students who do need help.
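To see what 95% precision and 40% recall mean in headcount terms, the sketch below converts them into counts. The cohort of 100 actual dropouts is a hypothetical figure chosen for easy arithmetic.

```python
def unpack_precision_recall(precision, recall, actual_positives):
    """Translate precision and recall into counts for a given number of true cases."""
    caught = recall * actual_positives
    missed = actual_positives - caught
    flagged = caught / precision          # everyone the model flagged
    false_alarms = flagged - caught       # flagged students who would not drop out
    f1 = 2 * precision * recall / (precision + recall)
    return {"caught": caught, "missed": missed,
            "false_alarms": false_alarms, "f1": f1}

# Hypothetical cohort: 100 students who actually drop out
SUMMARY = unpack_precision_recall(precision=0.95, recall=0.40, actual_positives=100)
```

With these inputs the model catches about 40 students and misses about 60, while raising only around 2 false alarms. The F1 score is roughly 0.56, well below the headline precision figure.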
What problem were you solving?
Why does it matter?
How did you solve it?
(Brief on methods)
What should stakeholders DO with this information?
What's the expected outcome?
Problem: What challenge did you address? (15 seconds)
Solution: What did you do about it? (20 seconds)
Impact: What difference does it make? (25 seconds)
Recommendations must be:
"Improve student support"
"Implement a weekly check-in program for students identified as at-risk"
Stakeholders can actually do this with their available resources and authority
Tied directly to your findings - show the connection between data and recommendation
Consider constraints: budget, time, technical capability, organizational culture
How will you know if it worked? Define success metrics
Ethics must be considered at EVERY phase of your project
When in doubt, consult your facilitator
Scenario: A university wants to implement facial recognition technology during online lectures to:
Explanation: Biometric data (facial recognition) is highly sensitive personal information. Continuous monitoring raises significant privacy concerns. Students must give informed consent, understanding exactly what data is collected, how it's used, who has access, and their right to opt out. Accuracy, costs, and usability are important, but consent is the foundational ethical requirement.
Before diving into real constraints, let's think BIG. What if you had access to any data and any technology?
Context: You're hired by a university
Problem: Student mental health concerns, especially with remote learning
Task: Investigate and provide recommendations
Think about challenges you've faced or observed
We'll use these insights to inform our Blue Sky project
How can we identify and support students struggling with mental health in remote learning?
Grades, attendance, assignment submissions, LMS engagement patterns
Login patterns, time-of-day activity, video engagement metrics
Discussion forum sentiment, email communication patterns
Self-reported surveys, support service usage
AI emotion detection (facial expressions), voice stress analysis
Home study setup quality, internet stability
What data is actually NECESSARY vs. what would be "nice to have"?
Data flows automatically from sources to centralized system
Secure, encrypted storage with role-based access
Immediate alerts for critical indicators
Automated checks for completeness and accuracy
In your actual project, you'll likely use spreadsheets and manual processes. But understanding these principles helps you make better decisions.
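Even with spreadsheets, the "automated checks for completeness and accuracy" principle can be applied with a few lines of code. The sketch below assumes hypothetical field names (`student_id`, `attendance_rate`, `last_login`) and a hypothetical rule that attendance is stored as a fraction between 0 and 1.

```python
# Hypothetical required fields for one student record
REQUIRED_FIELDS = ("student_id", "attendance_rate", "last_login")

def check_record(record):
    """Return a list of data quality problems found in one record (empty if clean)."""
    problems = []
    # Completeness: every required field must be present and non-empty
    for field in REQUIRED_FIELDS:
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Accuracy: attendance is assumed to be a fraction between 0 and 1
    rate = record.get("attendance_rate")
    if isinstance(rate, (int, float)) and not 0.0 <= rate <= 1.0:
        problems.append("attendance_rate outside 0-1")
    return problems
```

Running `check_record` over every row before analysis gives you a documented list of issues to fix or exclude, which also supports clear data provenance.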
Student risk indicators (traffic light system)
Recent activity summary
Recommended actions
Class-level trends
Comparison across courses
Time-based patterns
Institution-wide metrics
Intervention effectiveness
Resource allocation
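The traffic light risk indicator and the class-level view above can be sketched as a simple mapping from a risk score to a colour. The thresholds (0.4 and 0.7) and the 0-1 score scale are hypothetical; a real project would set them with stakeholders.

```python
from collections import Counter

def traffic_light(risk_score, amber_at=0.4, red_at=0.7):
    """Map a risk score between 0 and 1 to a traffic light colour."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= red_at:
        return "red"
    if risk_score >= amber_at:
        return "amber"
    return "green"

def class_summary(risk_scores):
    """Count how many students in a class fall into each colour."""
    return Counter(traffic_light(score) for score in risk_scores)
```

`traffic_light` serves the individual student view, while `class_summary` supports class-level trends. Lowering the thresholds flags more students, which trades more false positives for fewer false negatives.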
In student wellbeing, is it better to have false positives or false negatives? Why?
Needs: Budget justification, policy implications
Format: Executive summary with ROI
Needs: Practical interventions they can implement
Format: Action-oriented guidelines
Needs: Case prioritization, resource allocation
Format: Detailed reports with risk scores
Needs: Transparency about monitoring, opt-out options
Format: Plain language explanations
You'll likely use public datasets or simulated data, not sensitive student information
Work within your skillset, but push yourself to learn one new technique
12 weeks to complete the project - scope accordingly
Free or student-licensed tools only
Make informed trade-offs, not compromises
Understand what you're giving up and why. Document your decisions.
Scenario: You're working on a recommendation system to suggest products to online shoppers. Your model shows:
Explanation: High accuracy doesn't mean the model is adding business value. A recommendation system that only suggests what customers would find anyway doesn't help with discovery or engagement. This demonstrates that model evaluation must align with business objectives, not just statistical metrics. Consider diversity, novelty, serendipity, and long-term engagement.
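Novelty and diversity can be measured alongside accuracy. The sketch below shows two simple, illustrative definitions (they are assumptions for this example, not standard library metrics): the share of recommendations the customer has not already bought, and the share of distinct categories in the list. The product names and categories are hypothetical.

```python
def novelty_rate(recommended, already_purchased):
    """Share of recommended items the customer has not bought before."""
    if not recommended:
        return 0.0
    seen = set(already_purchased)
    return sum(1 for item in recommended if item not in seen) / len(recommended)

def category_diversity(recommended, category_of):
    """Number of distinct categories divided by the number of recommendations."""
    if not recommended:
        return 0.0
    categories = {category_of[item] for item in recommended}
    return len(categories) / len(recommended)

# Hypothetical example data
RECOMMENDED = ["milk", "bread", "tent", "novel"]
PURCHASED = ["milk", "bread", "eggs"]
CATEGORY_OF = {"milk": "grocery", "bread": "grocery", "tent": "outdoor", "novel": "books"}
```

A model that scores well on accuracy but near zero on `novelty_rate` is mostly recommending what customers would have found anyway.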
This is YOUR project. You drive the direction. We guide, but you decide.
Expect to refine your approach multiple times. That's normal and encouraged.
Treat this as a real consulting engagement. Quality matters.
Submit your own work with properly cited sources. Plagiarism will not be tolerated.
Start early on literature review! Don't leave it until the last minute.
You'll encounter challenges. That's where learning happens.
This capstone is your opportunity to showcase everything you've learned and create something meaningful for your portfolio.
Case studies, skill building, and project development time
Office hours: Check learning portal for schedule
Email for questions
Feedback on drafts (with sufficient lead time)
Collaboration is encouraged (but submission is individual)
Discussion forums for questions
Peer review activities in class
Access to software through student licenses
Public datasets and repositories
Technical guides for tools
1. Industry Brainstorming
Start thinking about industries that interest you. What problems do you want to solve?
2. Review Previous Subjects
What skills from DATA4000-5000 can you leverage? Where are your strengths?
3. Read Assessment 1 Guidelines
Thoroughly review the requirements for the literature review
4. Browse Industry News
Read reports and articles for inspiration. What challenges are industries facing?
5. Prepare for Next Week
Come with 2-3 industry ideas to discuss
Strategies for choosing an industry
Narrowing your focus
Finding the sweet spot
Finding quality sources
Academic vs. industry literature
Organizing your research
From broad problem to specific question
Making it answerable
Testing feasibility
We'll work through another case study applying the 6-phase framework
Make it meaningful
Make it impactful
Make it yours