DATA6000 Week 3

Assessment 1 Bootcamp

From Industry Problem to Analytics Solution

Today's Goal: Leave this workshop with significant progress on your Assessment 1 Literature Review

Assessment 1 Due: Week 5 (2 weeks from today)

Today's Learning Outcomes

1. Progress Check

Assess where you are in your Assessment 1 journey and identify gaps

2. Analytics Challenges

Understand real-world data analytics challenges and how they affect YOUR project

3. Analytics Types & Tools

Match your business problem to appropriate analytics methods and tools

4. Active Development

Make tangible progress on your literature review with peer feedback

By the end of today: You should have your industry, 3 business problems, and at least 5 relevant sources identified.

Where Should You Be Right Now?

Week 1-2: Industry selection

Week 3 (NOW): 3 problems + Sources

Week 4: Draft + Visualisations

Week 5: Submit

Assessment 1 Requirements Checklist

  • Industry selected
  • 3 key business problems identified
  • Existing analysis reviewed for each problem
  • Data sources evaluated
  • ONE business question formulated
  • Methodology briefly outlined
  • Originality statement prepared
  • 3 visualisations from YOUR dataset
  • 10+ relevant references

Quick Self-Assessment

Answer honestly - this helps identify what you need to focus on today.

How many of the 3 business problems for your industry have you clearly defined?
A) None yet - still deciding on industry
B) 1 problem identified
C) 2 problems identified
D) All 3 problems clearly defined

If you selected A or B: Today's workshop is critical for you. Use the activities to make rapid progress.

The Reality of Analytics Projects

Your Plan

Find data → Clean it → Analyse → Done!

📊 → 🧹 → 📈 → ✅

VS

Reality

Find data → Wrong format → Clean → Missing values → Re-clean → Tool issues → Finally analyse...

📊 → ❌ → 🧹 → ❓ → 🔄 → 🛠️ → 📈

Why this matters for Assessment 1: Understanding these challenges helps you write a realistic methodology section and choose appropriate data sources.

Where Data Scientists Spend Their Time

Time Allocation (Industry Reality)

Activity | Time
Data cleaning & organizing | 60%
Collecting data sets | 19%
Mining data for patterns | 9%
Refining algorithms | 4%
Building training sets | 3%
Other | 5%

Source: CrowdFlower Data Science Survey

What This Means For Your Project

When evaluating data sources, consider:

  • Is the data already clean and structured?
  • What preprocessing will be required?
  • Are there missing values to handle?
  • Is the format compatible with your tools?

Assessment Tip: Address data quality in your "Data Sources" section!
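
The checklist above can be partly automated before you commit to a dataset. The following is a minimal sketch, assuming a CSV file and using only the Python standard library; the sample rows, the column names, and the list of "missing" tokens are all invented for illustration and should be adapted to your own data.

```python
import csv
import io

# Hypothetical sample standing in for a downloaded dataset. For a real file,
# pass open("your_file.csv", newline="") instead of the StringIO object.
SAMPLE_CSV = """customer_id,age,region,monthly_spend
1,25,NSW,120.50
2,,VIC,89.00
3,41,N/A,
4,37,QLD,230.10
"""

# Tokens treated as missing. This list is an assumption; adjust it to your data.
MISSING_TOKENS = {"", "n/a", "na", "null", "none", "-"}


def missing_value_report(csv_file):
    """Return {column: fraction of rows whose value looks missing}."""
    reader = csv.DictReader(csv_file)
    counts = {name: 0 for name in reader.fieldnames}
    total_rows = 0
    for row in reader:
        total_rows += 1
        for name in counts:
            # Short rows yield None for absent fields, so fall back to "".
            value = (row[name] or "").strip().lower()
            if value in MISSING_TOKENS:
                counts[name] += 1
    if total_rows == 0:
        return {name: 0.0 for name in counts}
    return {name: counts[name] / total_rows for name in counts}


report = missing_value_report(io.StringIO(SAMPLE_CSV))
for column, share in report.items():
    print(f"{column}: {share:.0%} missing")
```

A report like this gives you concrete figures to quote in your "Data Sources" section, for example the share of missing values in each column you plan to use.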

Knowledge Check: Data Challenges

A retail dataset has customer ages recorded as "25", "twenty-five", "25 years", and "N/A". What type of data cleaning issue is this?
A) Missing data
B) Outliers
C) Structural errors / inconsistent formatting
D) Duplicate observations

Remember: When discussing data sources in your assessment, mention potential data quality issues and how they might be addressed.
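
As one illustration of "how they might be addressed", the age values from the question above could be standardised with a small parsing function. This is a rough sketch under stated assumptions: the word lookup tables cover only simple spelled-out numbers (no teens, no hundreds), and the function name `parse_age` is hypothetical.

```python
import re

# Small lookup tables for spelled-out numbers. These are illustrative
# assumptions, not a complete number parser (teens are not handled).
UNITS = {
    "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
MISSING_TOKENS = {"", "n/a", "na", "null", "unknown"}


def parse_age(raw):
    """Convert one raw age entry to an int, or None when it cannot be read."""
    text = raw.strip().lower()
    if text in MISSING_TOKENS:
        return None
    # Case 1: digits anywhere in the text, e.g. "25" or "25 years".
    match = re.search(r"\d+", text)
    if match:
        return int(match.group())
    # Case 2: spelled-out values such as "twenty-five" or "forty".
    total = 0
    recognised = False
    for word in re.split(r"[\s-]+", text):
        if word in TENS:
            total += TENS[word]
            recognised = True
        elif word in UNITS:
            total += UNITS[word]
            recognised = True
    return total if recognised else None


raw_ages = ["25", "twenty-five", "25 years", "N/A"]
cleaned_ages = [parse_age(value) for value in raw_ages]
print(cleaned_ages)  # [25, 25, 25, None]
```

Note that the "N/A" entry is kept as a missing value rather than guessed; how to treat it (drop the row, impute, or flag it) is a separate decision worth mentioning in your methodology.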

Types of Business Analytics

Your methodology must clearly state which type(s) of analytics you will use.

Descriptive Analytics

Question: "What happened?"

Methods: Summary statistics, data aggregation, visualisation, dashboards

Example: Analysing last year's sales by region

Predictive Analytics

Question: "What could happen?"

Methods: Regression, classification, forecasting, machine learning

Example: Predicting customer churn probability

Prescriptive Analytics

Question: "What should we do?"

Methods: Optimisation, simulation, decision analysis

Example: Recommending optimal pricing strategy

Descriptive → Predictive → Prescriptive

Increasing complexity and business value
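
To make the difference between the first two types concrete, here is a minimal sketch using only the Python standard library. The monthly sales figures are invented for illustration, and the straight-line trend is just one simple forecasting choice among many; prescriptive analytics (for example, optimising price) is not covered here.

```python
from statistics import mean

# Hypothetical monthly sales (units sold), invented for illustration.
monthly_sales = [100, 110, 120, 130, 140, 150]

# Descriptive: "What happened?" -> summarise the history.
average_sales = mean(monthly_sales)
best_month = monthly_sales.index(max(monthly_sales)) + 1  # 1-based month number

# Predictive: "What could happen?" -> fit a straight-line trend by ordinary
# least squares and extrapolate one month ahead.
months = list(range(1, len(monthly_sales) + 1))
x_bar = mean(months)
y_bar = mean(monthly_sales)
slope = (
    sum((x - x_bar) * (y - y_bar) for x, y in zip(months, monthly_sales))
    / sum((x - x_bar) ** 2 for x in months)
)
intercept = y_bar - slope * x_bar
next_month = len(monthly_sales) + 1
forecast = intercept + slope * next_month

print(f"Descriptive: average {average_sales}, best month {best_month}")
print(f"Predictive: month {next_month} forecast {forecast:.1f}")
```

The descriptive part only restates the data you already have; the predictive part makes a claim about data you do not have yet, which is why it needs a model and a justification in your methodology.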

Knowledge Check: Analytics Types

A hospital wants to identify which patients are most likely to be readmitted within 30 days. Which type of analytics is this?
A) Descriptive - analysing historical readmission rates
B) Predictive - forecasting future readmission likelihood
C) Prescriptive - recommending specific interventions
D) Diagnostic - understanding why readmissions occur

A retailer uses historical sales data to create monthly revenue dashboards. This is:
A) Descriptive analytics
B) Predictive analytics
C) Prescriptive analytics
D) Real-time analytics

Choosing the Right Analytics Tools

Your Assessment 1 must briefly outline the methodologies you will explore.

1. Define Questions → 2. Locate Data → 3. Assess Volume → 4. Select Tools

Step 1: Define Your Questions

  • What specific problem are you solving?
  • What decisions will the analysis inform?
  • How will you measure success?

Step 2: Locate Your Data

  • Is it in one source or multiple?
  • What format is it in?
  • Can you access it legally/ethically?

Step 3: Assess Data Volume

  • How much data do you have?
  • Will it grow over time?
  • What are the storage requirements?

Step 4: Select Tools

  • Match tool capabilities to your needs
  • Consider your skill level
  • Factor in time constraints

Common Analytics Tools Comparison

Tool | Best For | Skill Level | Your Project?
Tableau / Power BI | Visualisation, dashboards, descriptive analytics | Beginner-Intermediate | Visualisations required
Python (Pandas, Scikit-learn) | Data manipulation, ML, predictive analytics | Intermediate-Advanced | If doing predictive
R | Statistical analysis, visualisation | Intermediate-Advanced | Statistical focus
Excel | Basic analysis, pivot tables | Beginner | Simple analyses
SQL | Data extraction, database queries | Beginner-Intermediate | If data is in databases

Assessment 1 Requirement: You must upload your visualisation file (Tableau, Power BI) to the file Dropbox. Plan accordingly!

Knowledge Check: Tool Selection

You need to create 3 interactive visualisations for your assessment and have limited coding experience. Which tool choice is MOST appropriate?
A) Python with Matplotlib only
B) Tableau or Power BI
C) Raw SQL queries
D) R with ggplot2

For a predictive customer churn model with a large dataset (500,000+ records), the most suitable tool would be:
A) Excel pivot tables
B) Tableau calculated fields
C) Python with Scikit-learn
D) Power BI dashboards

Common Project Pitfalls to Avoid

Pitfall 1: Over-Engineering

Creating complex models when simple analysis would suffice.

Symptom: Spending weeks on deep learning when descriptive statistics answer the question.

Solution: Start simple. Add complexity only if needed.

Pitfall 2: Tool Fascination

Choosing tools because they're "cool" rather than appropriate.

Symptom: Using neural networks for a dataset of 100 records.

Solution: Match tools to problem complexity and data size.

Pitfall 3: Ignoring Communication

Creating brilliant analysis that no one understands.

Symptom: Technical jargon overwhelming stakeholders.

Solution: Focus on key findings and actionable insights.

Pitfall 4: Scope Creep

Trying to solve every problem at once.

Symptom: Assessment keeps growing beyond 1000 words.

Solution: Focus on ONE clear business question.

Activity 1: Map Your Business Problem

15 minutes
Individual Exercise: Problem-Analytics-Tool Mapping

Complete this mapping for YOUR Assessment 1 project:

Component | Your Response
Industry: |
Business Problem 1: |
Analytics Type: | ☐ Descriptive ☐ Predictive ☐ Prescriptive
Business Problem 2: |
Analytics Type: | ☐ Descriptive ☐ Predictive ☐ Prescriptive
Business Problem 3: |
Analytics Type: | ☐ Descriptive ☐ Predictive ☐ Prescriptive
Primary Tool(s): |

Evaluating Data Sources

Assessment 1 requires you to "evaluate the types of data sources available."

Data Source Evaluation Criteria

Accessibility

  • Is it publicly available?
  • Are there licensing restrictions?
  • What format is it in?

Quality

  • How complete is the data?
  • Is it accurate and reliable?
  • How recent is it?

Relevance

  • Does it address your problem?
  • Are the variables appropriate?
  • Is the sample size adequate?

Common Data Sources by Industry

Industry | Example Data Sources
Healthcare | AIHW, Medicare statistics, hospital discharge data, clinical trials
Retail | Kaggle datasets, company reports, ABS retail trade data
Finance | ASX data, Yahoo Finance, RBA statistics, company filings
Government | data.gov.au, ABS, state government open data portals

Activity 2: Data Source Evaluation Matrix

20 minutes
Individual Exercise: Evaluate Your Data Sources

For each potential data source, complete this evaluation:

Criteria | Data Source 1 | Data Source 2
Name/URL: | |
Accessibility (1-5): | |
Data Quality (1-5): | |
Relevance (1-5): | |
Key Variables: | |
Limitations: | |
Will use for Assessment? | ☐ Yes ☐ No ☐ Maybe | ☐ Yes ☐ No ☐ Maybe
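
If you want a quick way to compare the 1-5 ratings from this matrix, one option is a weighted score. The sketch below is an illustration only: the weights, the class name `DataSourceRating`, and the two example sources are assumptions and not part of the activity. Equal weights would simply give the average of the three ratings.

```python
from dataclasses import dataclass


@dataclass
class DataSourceRating:
    """One column of the evaluation matrix, with ratings on a 1-5 scale."""
    name: str
    accessibility: int
    quality: int
    relevance: int

    def __post_init__(self):
        # Reject ratings outside the 1-5 scale used in the matrix.
        for criterion in ("accessibility", "quality", "relevance"):
            score = getattr(self, criterion)
            if not 1 <= score <= 5:
                raise ValueError(f"{criterion} must be 1-5, got {score}")


# Assumed weights (they sum to 1.0); change them to reflect your priorities.
WEIGHTS = {"accessibility": 0.3, "quality": 0.3, "relevance": 0.4}


def weighted_score(rating, weights=WEIGHTS):
    """Combine the three ratings into a single comparable number."""
    return sum(getattr(rating, criterion) * weight
               for criterion, weight in weights.items())


# Hypothetical ratings for two candidate sources.
candidates = [
    DataSourceRating("Source A (hypothetical)", accessibility=5, quality=3, relevance=4),
    DataSourceRating("Source B (hypothetical)", accessibility=3, quality=5, relevance=5),
]

ranked = sorted(candidates, key=weighted_score, reverse=True)
for rating in ranked:
    print(f"{rating.name}: {weighted_score(rating):.1f}")
```

A score like this supports your decision but does not replace the "Limitations" row: a source with a high score and a serious licensing restriction may still be unusable.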

Finding Credible References

Assessment 1 requires at least 10 relevant, credible references.

High-Quality Sources

  • Peer-reviewed journal articles
  • Industry reports (Gartner, McKinsey, Deloitte)
  • Government publications
  • Conference proceedings
  • Books from reputable publishers

Avoid or Use Sparingly

  • Wikipedia (use as starting point only)
  • Personal blogs without credentials
  • Outdated sources (>5 years for tech)
  • Sources without clear authorship
  • Marketing materials disguised as research

Where to Search

Google Scholar
ProQuest
IEEE Xplore
JSTOR
KBS Library

Knowledge Check: Evaluating Sources

Which of the following would be considered the MOST credible source for discussing machine learning applications in healthcare?
A) A popular tech blog post from 2019
B) A Wikipedia article on AI in medicine
C) A 2023 peer-reviewed article in Nature Medicine
D) A company white paper promoting their AI product

Your reference list has 12 sources, but 8 of them are from the same author. This is problematic because:
A) You can only cite each author once
B) It suggests narrow research and potential bias
C) Self-citation is not allowed in academic work
D) The reference count is too low

Activity 3: Industry & Dataset Discovery

25 minutes
Individual Exercise: Find Your Industry & Data

Use the printed handout to complete this activity.

This structured exercise will help you:

  1. Explore industries that match your interests and career goals
  2. Discover publicly available datasets for each industry
  3. Evaluate which industry-dataset combination is most feasible
  4. Make an informed decision for your Assessment 1

Key Dataset Repositories to Explore:

General

  • Kaggle.com
  • data.gov.au
  • UCI ML Repository
  • Google Dataset Search

Australian Gov

  • ABS (abs.gov.au)
  • AIHW (health)
  • data.qld.gov.au
  • data.nsw.gov.au

Specialised

  • Yahoo Finance
  • World Bank Data
  • WHO Data
  • Statista

Important: Complete the handout thoroughly. By the end, you should have identified at least 2 potential industry-dataset combinations to explore further.

Formulating Your Unique Business Question

Assessment 1 requires you to generate a unique business question for ONE of your problems.

A good business question is:

Question Framework: Transform Problems into Questions

Business Problem | Weak Question | Strong Question
Customer churn | "Why do customers leave?" | "Which customer behaviours in the first 30 days predict churn within 6 months?"
Hospital readmissions | "How can we reduce readmissions?" | "What patient characteristics and discharge factors predict 30-day readmission for cardiac patients?"
Sales forecasting | "What will sales be?" | "How do seasonal patterns and promotional activities influence weekly sales volume by product category?"

Demonstrating Originality

You must explain why your analysis is original given existing research.

Ways to Be Original

  • New context: Apply existing methods to Australian/local data
  • New combination: Combine methods not previously used together
  • New timeframe: Analyse recent data where older studies exist
  • New variables: Include factors not previously considered
  • New industry application: Transfer successful approaches from other sectors

Example Originality Statements

"While Smith (2022) examined customer churn in US telecommunications, no study has applied these methods to the Australian market with its unique regulatory environment."


"Previous research focused on demographic factors; this analysis incorporates social media sentiment data not available in earlier studies."

Tip: Your literature review should naturally lead to identifying this gap. If you can't find a gap, your literature review may not be comprehensive enough.

Visualisation Requirements

Critical Requirement: You need at least THREE relevant visualisations from YOUR dataset. The Tableau/Power BI file must be uploaded to the file Dropbox.

Visualisation Planning

Visualisation 1

Purpose: Show data overview/distribution

Examples: Histogram, bar chart, pie chart

Demonstrates: You understand your data

Visualisation 2

Purpose: Show relationships/trends

Examples: Scatter plot, line chart, heatmap

Demonstrates: Insight into patterns

Visualisation 3

Purpose: Support your business question

Examples: Depends on your question

Demonstrates: Data can answer your question

By next week: Have your dataset downloaded and create at least one draft visualisation to bring to class.

Activity 4: Draft Your Business Question

15 minutes
Individual Exercise: Business Question Development

Using the Question Framework (weak vs. strong questions) from earlier in this workshop, draft your unique business question:

Selected Business Problem:
Draft Question (v1):

Quality Check - Does your question:

Refined Question (v2):

Communicating Your Analysis

Your literature review must communicate complex ideas clearly.

"Stakeholders need the key findings and action items. Save the technical details for the appendix."

- Andrew Seitz, Senior Data Analyst, Snowflake

Writing Tips for Assessment 1

Do

  • Use clear, concise language
  • Define technical terms when first used
  • Connect each section logically
  • Use visuals to support text
  • Cite sources appropriately

Don't

  • Assume reader knows jargon
  • Include irrelevant technical details
  • List sources without synthesis
  • Exceed word count (1000 ± 10%)
  • Forget the visualisation upload

Preparing for Week 4

By next week, you should have:

Completed:

  • Industry and 3 business problems finalised
  • At least 8-10 references collected
  • Data source identified and downloaded
  • Business question drafted
  • At least 1 visualisation created

In Progress:

  • Literature review draft (500+ words)
  • Methodology section outline
  • Originality statement draft
  • Additional visualisations

Week 4 Workshop: Bring your draft literature review for peer feedback. We will workshop your visualisations and methodology sections.

Today's Key Takeaways

1. Know Your Analytics Type

Descriptive, Predictive, or Prescriptive - match to your business problem

2. Choose Tools Wisely

Match tools to your skills, data, and timeline. Tableau/Power BI required for visualisations.

3. Evaluate Data Sources

Consider accessibility, quality, and relevance before committing.

4. Start Writing NOW

2 weeks to submission. Draft early, revise often.

Questions? Use the remaining workshop time for facilitator consultation.

Resources & Support

Data Sources

  • Kaggle.com - datasets and competitions
  • data.gov.au - Australian government data
  • UCI Machine Learning Repository
  • Google Dataset Search
  • KBS Library databases

Learning Resources

  • Tableau Public tutorials
  • Power BI documentation
  • DataCamp (Python/R)
  • LinkedIn Learning (via KBS)
  • YouTube tutorials

Academic Support

  • KBS Academic Success Centre
  • Library research assistance
  • Facilitator consultation hours
  • Peer study groups

Assessment Support

  • Assessment brief on MyKBS
  • Marking rubric review
  • Turnitin similarity check
  • File Dropbox for visualisations