DATA4100 · Week 2
Statistical Methods for Summarising & Visualising Data
Turning retail data into decisions — hands-on with Power BI and Tableau using the Superstore dataset.
Kaplan Business School Australia
Today's Workshop
Learning Outcomes & Agenda
Four things you will be able to do by the end of this session
1
Ask the Right Questions
Frame business problems as analytical questions using the Superstore case as context.
2
Choose the Right Chart
Use the chart-selection framework to match your data and question to the correct visualisation type.
3
Build Charts in Power BI
Create bar, stacked, pie, treemap, line, and scatter chart visuals step by step from a CSV file.
4
Interpret & Communicate
Read chart outputs critically and explain what each visualisation means for the business.
Dataset
Superstore Sales — a fictional US retailer selling Furniture, Office Supplies, and Technology across four regions. Download from the Week 2 reading list on MyKBS → GitHub → Datasets → Superstore Dataset.csv
Context
The Superstore Business Case
What data do we have and what are we trying to find?

You are a Business Analyst at Superstore. Your mandate is to identify weaknesses and opportunities to enhance growth and profitability. Every chart you build should lead to a decision.

ORDER DETAILS
order_idorder_dateship_dateship_mode
CUSTOMER & PRODUCT
customersegmentmanufactoryproduct_name
CATEGORISATION
categorysubcategory
GEOGRAPHY
regioncitystatezip
FINANCIALS
salesprofitquantitydiscountprofit_margin
  • Open the CSV in Excel first — scan the rows and columns before jumping into visualisation tools.
  • How many rows (orders) are there? What date range does the data cover?
  • Are there any blank cells, unusual values, or obvious data quality issues?
  • Which fields are numbers? Which are categories? Which are dates?
Why This Matters
A Business Analyst who understands the data structure before visualising it asks better questions and catches errors early. Never skip this step.
Key Data Fact
The dataset spans January 2019 – December 2022 and contains 9,994 order rows across three product categories and four US regions.
Business Questions
Five Questions Worth Answering
Each question maps to a statistical method and a specific chart type
1
How do sales and order volumes vary across the four regions — Central, East, South, West?
Method: Descriptive Statistics → Bar Chart / Map
2
Which product subcategory performed best in terms of total sales?
Method: Descriptive Statistics → Bar Chart (sorted descending)
3
What are the top-performing products in Furniture, Office Supplies, and Technology?
Method: Descriptive Statistics → Treemap / Clustered Bar
4
Has sales performance improved over time — and what might the next 12 months look like?
Method: Time Series & Forecasting → Line Chart + Forecast
5
Does offering higher discounts actually increase profit — or does it destroy it?
Method: Correlation → Scatter Plot (Discount vs. Profit)
Activity
In groups, add three more questions of your own. For each one, decide: which statistical method applies, and which chart type would you use?
Overview
Six Statistical Methods, Six Visual Approaches
Every method has a natural chart type — and a set of fields that suit it
01
Descriptive Statistics
Bar, Column, Stacked Bar, Pie, Donut, Treemap, Table
sales · profit · quantity · profit_margin per segment / region / category
02
Time Series & Forecasting
Line Chart (continuous date axis), Forecast overlay
order_date + sales · profit · quantity
03
Correlation
Scatter Plot, Correlation Matrix, Heatmap
discount vs. profit · sales vs. quantity
04
Geographical Methods
Choropleth Map, Bubble Map, Filled Map
city · state · region + any financial metric
05
Regression
Scatter Plot with Trend Line (line of best fit)
discount → profit · quantity → sales
06
Distributions
Histogram, Density Plot, Box Plot
sales · profit · discount (shape of the data)
Key Insight
Choosing the wrong chart doesn't just look bad — it misleads the audience. The next slide gives you a framework to always choose correctly.
Framework
Which Chart Do I Need?
Start with your purpose — the chart type follows from what you want to show
What would you like to show? Comparison Relationship Distribution Composition Bar Chart Col. Chart Scatter Bubble Histogram Box Plot Pie / Donut Stacked Bar Comparison Relationship Distribution Composition Over Time? → Add a Line Chart (any of the above)
Nationalities at campus?
Pie or Bar Chart
→ Composition (parts of whole)
Attendance vs. grades?
Scatter Plot
→ Relationship between two numeric vars
Distribution of final grades?
Histogram
→ Distribution of one numeric var
Remember
This framework applies every time you build a new chart. Bookmark this slide.
Statistical Method 01
Descriptive Statistics: Mean, Median & Mode
Measures of location — where is the centre of the data?
Mean (Average)
Sum all values ÷ count of values. Sensitive to extreme outliers — a single huge sale can pull it up.
M
Median
Middle value in an ordered list. If n is even, average the two middle values. Robust to outliers.
Mo
Mode
The most frequently occurring value. Useful for categories (e.g. most common ship mode).
Data: 9 14 11 10 10 8 10 9 14 8 15 12 9 9 7 9
Ordered: 7 8 8 9 9 9 9 9 10 10 10 11 12 14 14 15

Mean = 164 ÷ 16 = 10.25 min
Median = (9 + 10) ÷ 2 = 9.5 min (n is even)
Mode = 9 (appears 5 times)
StatisticExcel FormulaPower BI Aggregation
Mean=AVERAGE(A1:A100)Average (default)
Median=MEDIAN(A1:A100)Median (from dropdown)
Mode=MODE(A1:A100)Not built-in — use DAX
Bridge to Power BI
We just calculated these by hand. Now we'll build a table visual in Power BI to verify — dragging segment to Rows and Average of profit_margin to Values. The table should show: Consumer 0.11 · Corporate 0.12 · Home Office 0.14.
Which should you use?
For skewed data (like sales, which has expensive outliers), median is more representative than mean. Power BI lets you switch aggregation with one click.
Power BI — Step 0
Loading the Superstore Dataset into Power BI
Do this once at the start of every Power BI session
1
Open Power BI Desktop. On the splash screen, click "Get data from another source →"
2
In the Get Data dialog, select Text/CSV from the list on the right. Click Connect.
3
Navigate to your Downloads folder, select Superstore Dataset.csv. The preview shows the first 200 rows. Confirm the delimiter is Comma and the file origin is UTF-8.
4
Click Load (not Transform Data). Your fields will now appear in the Data pane on the right side of the screen.
5
Click the Table icon (second icon on the left panel) to view the raw data and confirm it loaded correctly before building any charts.
Common Issue
If order_date shows as Text rather than Date, Power BI won't recognise it as a date field. You'll fix this in the Time Series section of this workshop.
Superstore Dataset.csv — Preview (first 200 rows) order_id order_date customer segment sales CA-2019-103800 3/01/2019 Darren P. Consumer 261.96 CA-2019-112326 4/01/2019 Phillina O. Home Office 731.94 CA-2019-141817 5/01/2019 Mick Brown Consumer 14.62 · · · 9,994 rows total · · · Load Transform
After Loading
All column names appear in the Data pane (right side). You are now ready to build visuals by dragging fields onto the canvas.
Power BI — Chart 01
Bar Chart: Sales by Subcategory
Comparison — which product subcategory generates the most revenue?
Business Question
"Which subcategory should Superstore prioritise for marketing investment?"
1
Click on a blank area of the canvas. In the Visualizations pane (right), select the Clustered Column Chart icon (first row, second icon).
2
From the Data pane, drag subcategory to the X-axis field well.
3
Drag sales to the Y-axis field well. Power BI will automatically use Sum of sales.
4
Click the Focus mode icon (square with arrow, top-right of the chart frame) for a full-screen view. Click Back to report to return.
Interpretation
Phones and Chairs lead in sales (~$330K each). Fasteners and Labels are at the bottom (<$15K). Sort the axis descending to make the ranking instantly readable: click the chart → the "..." menu → Sort descending.
Visualizations Config
Clustered Column Chart
X-axis
subcategory ×
Y-axis
Sum of sales ×
Legend
Add data fields here
Output: Sum of Sales by Subcategory
350K 250K 150K 50K Phones Chairs Storage Tables Binders Machines Bookcases Accessories Copiers Appliances Fasteners
Power BI — Charts 02 & 03
Clustered & Stacked Bar Charts
Comparing multiple measures, and showing part-to-whole breakdowns
"Does the category with most orders also generate the most profit?"
1
Select Clustered Column Chart
2
Drag category → X-axis
3
Drag quantity and profit → Y-axis (both)
4
Two differently-coloured bars now appear per category. Click Focus mode.
Read the chart
Technology has high profit despite fewer orders than Office Supplies — higher-value items. Furniture has many orders but low profit margins.
"What's the product mix breakdown within each category?"
1
Click the Stacked Bar Chart icon (horizontal bars, first column)
2
Drag category → Y-axis
3
Drag Count of sales → X-axis
4
Drag subcategory → Legend. Each colour = one subcategory.
Common Mistake
Don't add too many legend items. If you have more than 8 subcategories in one stack, the colours become impossible to distinguish. Use a filter or aggregate smaller categories into "Other".
Office Supplies Furniture Technology 0K 1K 2K 3K 4K 5K 6K 7K
Power BI — Chart 04
Treemap: Top Customers by Profit
Who are the highest-value customers? Use filters to focus on the top tier.
Business Question
"Which customers should Superstore focus retention efforts on?"
1
In the Visualizations pane, select the Treemap icon (looks like nested squares).
2
Drag customerCategory field well.
3
Drag profitValues field well. Each customer gets a rectangle — bigger = more profit.
4
Add a filter: In the Filters pane, find Sum of profit. Change the filter type to "is greater than", type 1000, and click Apply filter. This removes low-value customers from the view.
Interpretation
Tamara Chand is the largest rectangle — the single highest-profit customer. The filter at >$1,000 reveals roughly 20–30 customers worth targeting for loyalty programs.
Visualizations Config
Treemap
Category
customer ×
Details
Add data fields here
Values
Sum of profit ×
FILTER — Filters on this visual
Sum of profit
is greater than → 1000
Click "Apply filter"
Output: Top Customers by Profit (filtered)
Tamara Chand Hunter L. Raymond B. Sanjit C. Adrian B. Tom A. Daniel R. Greg T. Keith D.
Power BI — Charts 05 & 06
Tables & Matrices
Precise numerical summaries — great for verifying calculations and cross-tabulating categories
"Which customer segment is most profitable per sale?"
1
Select the Table icon in the Visualizations pane.
2
Drag segmentColumns (or the single Rows field)
3
Drag profit_marginValues. Click the dropdown arrow next to it and choose Average.
SegmentAvg Profit Margin
Consumer0.11
Corporate0.12
Home Office0.14 ← highest
Total0.12
"Which specific subcategory has the highest average transaction value?"
1
Select the Matrix icon (looks like a grid with +/- expand arrows).
2
Drag categoryRows
3
Drag subcategoryColumns
4
Drag salesValues, set to Average. Use the horizontal slider at the bottom of the matrix to scroll and see all columns.
CategoryChairsCopiersPhones
Furniture503.86
Office Supplies
Technology2,198.94
Blank Cells
Blank cells in a matrix don't mean zero — they mean that subcategory doesn't belong to that category. This is correct behaviour.
Power BI — Charts 07 & 08
Pie Chart & Donut Chart
Parts-to-whole — use only when you have fewer than 6 categories and the total is meaningful
"Is there a dominant region in terms of order volume?"
1
Select the Pie Chart icon.
2
Drag regionLegend
3
Drag salesValues, set to Count. Power BI calculates percentages automatically.
East 32% West 28.5% South 23% Central 16% ■ East ■ West ■ South ■ Central
"Which customer segment generates the most total profit?"
1
Select the Donut Chart icon (next to pie chart).
2
Drag segmentLegend
3
Drag profitValues, set to Sum. A donut works the same as a pie but leaves room for a KPI label in the centre.
$286K total profit ■ Consumer 46.8% ■ Corporate 32.1% ■ Home 21%
Common Mistake
Don't use pie/donut charts when slices have similar sizes — humans are bad at comparing angles. Use a bar chart instead when differences are subtle.
Statistical Method 02
Time Series Analysis
Tracking how a numerical measure changes at regular time intervals

A time series is a set of observations of a quantitative variable taken at equally-spaced points in time — daily, monthly, quarterly, yearly.

  • Trend — Is the overall direction going up, down, or flat over the period?
  • Seasonality — Does a pattern repeat at regular intervals (e.g., sales spike every November/December)?
  • Outliers — Are there sudden spikes or drops that need explaining?
  • Structural breaks — Did something change the baseline? (e.g. COVID-19 in March 2020)
Superstore Context
"The dataset runs January 2019 – December 2022. Our line chart will show whether median sales are growing, and whether there are seasonal patterns worth forecasting."
Median Monthly Sales — Jan 2019 to Jul 2022
80 65 50 35 Peak: ~80 Jan 19 Jul 19 Jan 21 Jan 22 Jul 22 order_date (monthly bins) Median Sales ($) Noticeable spikes: Q3–Q4 each year (seasonal)
Power BI — Time Series Setup
Preparing Date Data for Time Series Charts
order_date loads as Text by default — you must convert it and create monthly bins before plotting
1
Click the Table view icon on the left sidebar (looks like a grid).
2
Click on the order_date column header to select it.
3
In the Column tools tab at the top, find the Data type dropdown (currently says "Text"). Change it to Date.
4
The column will now show dates formatted as Wednesday, 2 January 2019 etc. Power BI can now use this as a time axis.
5
With order_date still selected, go to Column tools → Groups icon (looks like a binned histogram icon).
6
In the Groups dialog: Name it order_date (bins), Group type = Bin, Bin size = 1 Month. Click OK.
7
A new column order_date (bins) appears in your data. Each row now shows January 2019, February 2019 etc. Use this for the line chart X axis.
order_date (original)order_date (bins)
Wednesday, 2 January 2019January 2019
Thursday, 3 January 2019January 2019
Friday, 4 January 2019January 2019
Sunday, 6 January 2019January 2019
Monday, 11 February 2019February 2019
Tuesday, 12 February 2019February 2019
Why Monthly Bins?
Plotting every individual day produces a noisy, cluttered line. Grouping into months reveals the trend and seasonality clearly — the signal without the noise.
Groups Dialog Settings
Name: order_date (bins)
Field: order_date
Group type: Bin
Bin type: Size of bins
Bin size: 1Months
Power BI — Chart 09
Line Chart: Median Sales Over Time
Using the monthly bins column to visualise the time series
Business Question
"Is Superstore's sales performance improving over time? Are there seasonal patterns?"
1
Back in Report view, click a blank canvas area. Select the Line Chart icon in Visualizations.
2
Drag order_date (bins) (the new column you just created) → X-axis.
3
Drag salesY-axis. Click the dropdown arrow next to it and change the aggregation from Sum to Median.
4
Click Focus mode to view full-screen. The X axis should show Jan 2019 → Jul 2022.
Read the chart
Notice recurring spikes in Q3–Q4 each year — the holiday season drives higher-value orders. The overall median is relatively flat (~$50–60), suggesting growth is in volume not ticket size. Hover over any point to see the exact month and value.
Visualizations Config
Line Chart
X-axis
order_date (bins) ×
Y-axis
Median of sales ×
Secondary y-axis
Add data fields here
Legend
Add data fields here
Output: Median Sales over Time
80 65 50 35 Jan 2019 Jan 2021 Jul 2022 Q4 spike
Statistical Method 03
Correlation: Does Discounting Hurt Profit?
Measuring the strength and direction of the linear relationship between two variables
ValueMeaningChart Look
r = +1.0Perfect positive relationshipAll points on a rising line
r = +0.5 to +0.9Moderate to strong positiveCloud of points rising left→right
r = 0No linear relationshipRandom scatter, no slope
r = −0.5 to −0.9Moderate to strong negativeCloud of points falling left→right
r = −1.0Perfect negative relationshipAll points on a falling line
  • Measures strength, not slope — high correlation doesn't tell you how much profit changes per unit of discount.
  • Linear only — a curved (nonlinear) relationship can look like r ≈ 0.
  • Two variables only — can't capture how three or more variables interact.
  • Correlation ≠ Causation — discounts don't cause low profit; other factors matter too.
Discount vs. Profit — Scatter Plot
5K 0 -3K -5K 0.0 0.2 0.4 0.6 $0 Discount Profit ($)
Key Finding
Clear negative correlation: higher discounts = lower (or negative) profit. Orders with ≥50% discount regularly lose money. Superstore's discount strategy needs review.
Power BI — Chart 10
Scatter Plot: Discount vs. Profit
Visualising the correlation between two continuous variables
Business Question
"Does increasing the discount rate cause Superstore to lose money?"
1
Select the Scatter Chart icon in the Visualizations pane.
2
Drag discountX-axis
3
Drag profitY-axis
4
Important: Click the dropdown arrow next to each field and choose "Don't summarize". This shows each individual order as a dot — not just the totals.
5
Optional: Drag categoryLegend to colour-code by product category. Drag salesSize to make larger orders appear as bigger dots.
Why "Don't Summarize"?
If you leave the default "Sum", Power BI collapses all orders into one dot — you'd see nothing useful. "Don't Summarize" keeps every row as its own data point.
Visualizations Config
Scatter Chart
X-axis
discount — Don't summarize ×
Y-axis
profit — Don't summarize ×
Legend (optional)
category ×
Size (optional)
sales ×
What to Look For
The cluster of red dots in the bottom-right confirms it: high-discount orders consistently generate negative profit. This is an actionable finding — recommend a maximum discount threshold.
Going Further
In Tableau, you can add a Trend Line via Analytics pane → Trend Line → Linear. This adds a line of best fit and shows the r² value.
Tableau
Time Series & Forecasting in Tableau
Tableau's built-in exponential smoothing forecast extends your line chart into the future
1
Open the Superstore sample workbook (Start screen → Sample Workbooks → Superstore), or connect to the CSV via the left pane.
2
Drag Order DateColumns. Click the field pill and change to Continuous → Month (scroll down in the menu for the second "Month" option).
3
Drag Segment and SUM(Sales)Rows.
4
Drag Segment → the Colour box under Marks. You now have three separate coloured time-series lines (Consumer, Corporate, Home Office).
5
Click the Analytics tab on the left panel (next to the Data tab).
6
Drag Forecast from the Model section and drop it onto the chart. A shaded extension appears beyond the last data point.
7
Right-click the forecast area → Forecast → Describe Forecast to see the model details (seasonal periods, trend, confidence intervals).
Forecast Chart — Sales by Segment
60K 40K 20K Actual Forecast Consumer Corporate 2019 2020 2021 2022 2023
What the Shaded Area Means
The shaded confidence band = the 95% range of likely values. A wider band = more uncertainty. If the band is very wide, the forecast is unreliable — don't over-commit to it.
Reference Card
Common Mistakes — One per Chart Type
These are the errors that cost marks in assessments and credibility in professional reports
Chart TypeCommon MistakeWhy It MattersFix
Bar / Column Y-axis doesn't start at zero Small differences appear huge — visually misleading Always start bar charts at zero. Use a line chart if you need to zoom in.
Pie / Donut More than 6 slices Tiny slices are unreadable; angles are hard to compare Aggregate small categories into "Other" or switch to a bar chart.
Stacked Bar Too many legend colours Viewers can't distinguish 12+ colours reliably Limit to 6–8 segments or group smaller ones.
Line Chart Connecting non-continuous data Suggests a trend where there may be none Use a line chart only for time series or naturally ordered data.
Scatter Plot Forgetting "Don't Summarize" You get one dot instead of thousands — no pattern visible Always change aggregation to "Don't summarize" for both axes.
Treemap Showing all records unfiltered Hundreds of tiny equal boxes convey nothing Apply a filter (e.g., profit > $1,000) to focus on the meaningful tier.
Forecast Over-trusting a wide confidence interval Wide band = high uncertainty; decisions based on it can fail Report the confidence interval alongside the forecast line. Note limitations.
Summary
The Complete Picture: Question → Method → Chart → Tool
Use this as a reference card for your dashboard assessment
Business QuestionStatistical MethodChart TypeTool
"Which subcategory has the highest sales?" Descriptive Statistics Bar Chart (sorted) Power BI
"How do quantity and profit compare by category?" Descriptive Statistics Clustered Column Chart Power BI
"What is the product mix within each category?" Descriptive Statistics Stacked Bar Chart Power BI
"Who are the top-profit customers?" Descriptive Statistics Treemap + Filter Power BI
"Which segment has the highest profit margin?" Descriptive Statistics Table / Matrix Power BI
"What share of orders comes from each region?" Descriptive Statistics Pie / Donut Chart Power BI
"Is sales performance growing? Any seasonality?" Time Series Line Chart (monthly bins) Power BI / Tableau
"What will sales look like in the next 12 months?" Forecasting Line Chart + Forecast Band Tableau
"Does discounting destroy profit?" Correlation Scatter Plot Power BI / Tableau
"Which states/cities are high-growth opportunities?" Geographical Filled Map / Choropleth Power BI / Tableau
Dashboard Activity
Pick three questions from this table. Build charts that answer them in Power BI. Give your dashboard a title that tells a story (e.g. "Superstore Profitability — What's Working and What Isn't"). Add slicers for Region and Year so your audience can explore.