Chapter 19
Data Analysis Overview
Basic Business Statistics
11th Edition
Learning Objectives
In this chapter, you learn:
The steps involved in choosing what statistical methods to use to conduct a data analysis
Good Data Analysis Requires Choosing the Proper Technique(s)
Choosing the proper technique(s) to use requires the consideration of:
The purpose of the analysis
The type of variable being analyzed
Numerical
Categorical
The assumptions about the variable you are willing to make
Questions To Ask When Analyzing Numerical Variables
Do you seek to:
Describe the characteristics of the variable (possibly broken into several groups)
Draw conclusions about the mean and standard deviation of the variable in a population
Determine whether the mean and standard deviation of the variable differ depending on the group
Determine which factors affect the value of the variable
Predict the value of the variable based on the value of other variables
How to Describe the Characteristics of a Numerical Variable
Develop tables and charts and compute descriptive statistics to describe the variable’s characteristics:
Tables and charts
Stem-and-leaf display, percentage distribution, histogram, polygon, boxplot, normal probability plot
Statistics
Mean, median, mode, quartiles, range, interquartile range, standard deviation, variance, and coefficient of variation
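The statistics listed above can be computed with Python's standard library. This is a minimal sketch; the data set and variable names are hypothetical and chosen only for illustration.

```python
import statistics

# Hypothetical sample: delivery times in minutes (illustrative data only).
times = [12, 15, 11, 18, 15, 20, 14, 15]

mean = statistics.mean(times)
median = statistics.median(times)
mode = statistics.mode(times)

# Quartiles; statistics.quantiles uses the "exclusive" method by default,
# so other software may report slightly different quartile values.
q1, q2, q3 = statistics.quantiles(times, n=4)

value_range = max(times) - min(times)
iqr = q3 - q1                           # interquartile range
variance = statistics.variance(times)   # sample variance (n - 1 denominator)
std_dev = statistics.stdev(times)       # sample standard deviation
cv = std_dev / mean * 100               # coefficient of variation, in percent

print(f"mean={mean} median={median} mode={mode}")
print(f"Q1={q1} Q3={q3} range={value_range} IQR={iqr}")
print(f"variance={variance:.3f} std dev={std_dev:.3f} CV={cv:.1f}%")
```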
How to draw conclusions about the population mean or standard deviation
Confidence interval for the mean based on the t-distribution
Hypothesis test for the mean (t-test)
Hypothesis test for the variance (χ² test)
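The three techniques above reduce to short formulas. The sketch below computes the test statistics and a confidence interval, assuming the critical t value is looked up in a t table (the standard library has no t distribution). The function names, the sample, and the hypothesized values are illustrative assumptions.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return (t statistic, degrees of freedom) for H0: mean = mu0."""
    n = len(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return (statistics.mean(sample) - mu0) / std_error, n - 1

def t_confidence_interval(sample, t_critical):
    """Confidence interval for the mean; t_critical comes from a t table."""
    n = len(sample)
    margin = t_critical * statistics.stdev(sample) / math.sqrt(n)
    center = statistics.mean(sample)
    return center - margin, center + margin

def chi_square_variance(sample, sigma0_squared):
    """Return (chi-square statistic, df) for H0: variance = sigma0_squared."""
    n = len(sample)
    return (n - 1) * statistics.variance(sample) / sigma0_squared, n - 1

# Hypothetical sample and hypothesized values.
sample = [12, 15, 11, 18, 15, 20, 14, 15]
t_stat, t_df = one_sample_t(sample, mu0=13)
# 2.3646 is the two-tail 95% critical value of t with 7 degrees of freedom.
low, high = t_confidence_interval(sample, t_critical=2.3646)
chi2_stat, chi2_df = chi_square_variance(sample, sigma0_squared=10)

print(f"t = {t_stat:.4f} with {t_df} df")
print(f"95% CI for the mean: ({low:.3f}, {high:.3f})")
print(f"chi-square = {chi2_stat:.4f} with {chi2_df} df")
```

Each statistic is then compared with the critical value for its degrees of freedom. Both tests assume the population is approximately normally distributed.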
How to determine whether the mean or standard deviation differs by group
Two independent groups studying central tendency
Normally distributed numerical variables
Pooled t-test if you can assume variances are equal
Separate-variance t-test if you cannot assume variances are equal
Both tests assume the variables are normally distributed; you can examine this assumption by developing boxplots and normal probability plots
To decide whether the variances are equal, you can conduct an F-test for the difference between two variances
Numerical variables not normally distributed
Wilcoxon rank sum test
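For the normally distributed case, the choice between the pooled and separate-variance t-tests can be sketched as follows. The functions return test statistics and degrees of freedom only; critical values are assumed to come from F and t tables. All names and data are hypothetical.

```python
import math
import statistics

def f_ratio(a, b):
    """F statistic for the difference between two variances, with its df."""
    return statistics.variance(a) / statistics.variance(b), len(a) - 1, len(b) - 1

def pooled_t(a, b):
    """Pooled-variance t statistic (assumes equal population variances)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * statistics.variance(a)
                  + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / std_error, n1 + n2 - 2

def separate_variance_t(a, b):
    """Separate-variance t statistic with approximate degrees of freedom."""
    n1, n2 = len(a), len(b)
    v1 = statistics.variance(a) / n1
    v2 = statistics.variance(b) / n2
    t_stat = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t_stat, df

# Hypothetical samples from two independent groups.
group_a = [10, 12, 14, 16, 18]
group_b = [9, 10, 11, 12, 13]

f_stat, f_df1, f_df2 = f_ratio(group_a, group_b)
t_pooled, df_pooled = pooled_t(group_a, group_b)
t_separate, df_separate = separate_variance_t(group_a, group_b)

print(f"F = {f_stat:.3f} with ({f_df1}, {f_df2}) df")
print(f"pooled t = {t_pooled:.4f} with {df_pooled} df")
print(f"separate-variance t = {t_separate:.4f} with {df_separate:.2f} df")
```

With equal sample sizes the two t statistics coincide, and only the degrees of freedom differ.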
How to determine whether the mean or standard deviation differs by group (continued)
Two groups of matched items or repeated measures studying central tendency
Paired differences normally distributed
Paired t-test
Paired differences not normally distributed
Wilcoxon signed ranks test
Two independent groups studying variability
Numerical variables normally distributed
F-test
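For matched items or repeated measures, the paired t-test works on the differences within each pair. A minimal sketch, assuming the differences are approximately normally distributed; the function name and data are hypothetical.

```python
import math
import statistics

def paired_t(first, second):
    """Return (t statistic, df) for H0: mean paired difference = 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(first, second)]
    n = len(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / std_error, n - 1

# Hypothetical repeated measurements on the same five items.
before = [20, 22, 19, 24, 25]
after = [18, 21, 17, 23, 21]

t_stat, df = paired_t(before, after)
print(f"paired t = {t_stat:.4f} with {df} df")
```

When the differences are clearly not normal, the Wilcoxon signed ranks test listed above is the alternative.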
How to determine whether the mean or standard deviation differs by group (continued)
Three or more independent groups and studying central tendency
Numerical variables normally distributed
One-way analysis of variance
Three or more groups of matched or repeated measurements
Numerical variables normally distributed
Randomized block design
Numerical variables not normally distributed
Friedman test
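For three or more independent groups, the one-way analysis of variance compares the variation among group means with the variation within groups. The sketch below computes only the F statistic and its degrees of freedom; the function name and data are hypothetical.

```python
import statistics

def one_way_anova(groups):
    """Return (F statistic, df among groups, df within groups)."""
    all_values = [x for group in groups for x in group]
    grand_mean = statistics.mean(all_values)
    group_means = [statistics.mean(group) for group in groups]
    c, n = len(groups), len(all_values)

    # Sum of squares among groups and within groups.
    ss_among = sum(len(g) * (m - grand_mean) ** 2
                   for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_among = ss_among / (c - 1)
    ms_within = ss_within / (n - c)
    return ms_among / ms_within, c - 1, n - c

# Hypothetical measurements from three independent groups.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_among, df_within = one_way_anova(groups)
print(f"F = {f_stat:.3f} with ({df_among}, {df_within}) df")
```

The F statistic is compared with the upper-tail critical value of the F distribution for these degrees of freedom.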
How to determine which factors affect the value of the variable
Two factors to be examined
Two-factor factorial design
How to predict the value of a variable based on the value of other variables
One independent variable
Simple linear regression model
Two or more independent variables
Multiple regression model
Data taken over a period of time and you want to forecast future time periods
Moving averages
Exponential smoothing
Least-squares forecasting
Autoregressive modeling
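Three of the forecasting techniques above can be sketched in a few lines. The code assumes a simple unweighted moving average, exponential smoothing started at the first observation, and a linear least-squares trend with time coded 0, 1, 2, and so on. The function names, the series, and the smoothing weight are illustrative assumptions.

```python
def moving_averages(series, length):
    """Averages of each run of `length` consecutive observations."""
    return [sum(series[i:i + length]) / length
            for i in range(len(series) - length + 1)]

def exponential_smoothing(series, weight):
    """E1 = Y1, then Ei = W * Yi + (1 - W) * E(i-1), with 0 < W < 1."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(weight * y + (1 - weight) * smoothed[-1])
    return smoothed

def linear_trend(series):
    """Least-squares intercept and slope with time coded 0, 1, 2, ..."""
    n = len(series)
    x_bar = (n - 1) / 2
    y_bar = sum(series) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(series))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical annual sales figures.
sales = [10, 12, 14, 16, 18]

ma3 = moving_averages(sales, 3)
smoothed = exponential_smoothing(sales, 0.5)
intercept, slope = linear_trend(sales)
next_period = intercept + slope * len(sales)   # forecast one period ahead

print(ma3)
print(smoothed)
print(f"trend: Y = {intercept:.2f} + {slope:.2f} X, next forecast {next_period:.2f}")
```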
How to determine whether the values of a variable are stable over time
Studying a process for which data have been collected over time
Develop X̄ and R charts
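Control limits for the two charts can be sketched as below, assuming equal-sized subgroups and the usual control chart factors A2, D3, and D4, which must be looked up for the subgroup size. The values used here are the standard table entries for subgroups of size 5; the function name and data are hypothetical.

```python
def xbar_r_limits(subgroups, a2, d3, d4):
    """Return (LCL, center line, UCL) for the X-bar chart and the R chart."""
    means = [sum(g) / len(g) for g in subgroups]
    ranges = [max(g) - min(g) for g in subgroups]
    grand_mean = sum(means) / len(means)     # center line of the X-bar chart
    mean_range = sum(ranges) / len(ranges)   # center line of the R chart
    return {
        "xbar": (grand_mean - a2 * mean_range, grand_mean,
                 grand_mean + a2 * mean_range),
        "r": (d3 * mean_range, mean_range, d4 * mean_range),
    }

# Hypothetical process data: three subgroups of five measurements each.
subgroups = [
    [10, 11, 9, 10, 10],
    [11, 12, 10, 11, 11],
    [9, 10, 8, 9, 9],
]

# Control chart factors for subgroup size 5 (from a standard factor table).
limits = xbar_r_limits(subgroups, a2=0.577, d3=0.0, d4=2.114)
print("X-bar chart (LCL, CL, UCL):", limits["xbar"])
print("R chart (LCL, CL, UCL):", limits["r"])
```

In practice many more subgroups are collected before the limits are used. Three are shown only to keep the arithmetic easy to follow.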
Questions To Ask When Analyzing Categorical Variables
Do you seek to:
Describe the proportion of items of interest in each category (possibly broken into several groups)
Draw conclusions about the proportion of items of interest in a population
Determine whether the proportion of items of interest differs depending on the group
Predict the proportion of items of interest based on the value of other variables
Determine whether the proportion of items of interest is stable over time
How to describe the proportion of items of interest in each category
Summary tables
Charts
Bar chart
Pie chart
Pareto chart
Side-by-side bar charts
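A summary table, and the ordering behind a Pareto chart, can be built directly from raw category labels. A minimal sketch; the function name and the complaint data are hypothetical.

```python
from collections import Counter

def summary_table(values):
    """Rows of (category, count, percentage, cumulative percentage),
    sorted from most to least frequent, as a Pareto chart would show them."""
    counts = Counter(values)
    total = len(values)
    rows = []
    cumulative = 0.0
    for category, count in counts.most_common():
        percentage = 100 * count / total
        cumulative += percentage
        rows.append((category, count, percentage, cumulative))
    return rows

# Hypothetical categorical data: reasons for customer complaints.
complaints = ["late"] * 5 + ["damaged"] * 3 + ["wrong item"] * 2

rows = summary_table(complaints)
for category, count, percentage, cumulative in rows:
    print(f"{category:<12}{count:>4}{percentage:>8.1f}%{cumulative:>8.1f}%")
```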
How to draw conclusions about the proportion of items of interest
Confidence interval for proportion of items of interest
Hypothesis test for the proportion of items of interest (Z-test)
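Both techniques rely on the normal approximation, so they can be sketched with `statistics.NormalDist` from the standard library. The function names and the counts are hypothetical, and the sketch assumes the sample is large enough for the approximation to hold.

```python
import math
from statistics import NormalDist

def proportion_confidence_interval(x, n, confidence=0.95):
    """Confidence interval for a proportion, given x items of interest in n."""
    p = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

def proportion_z_test(x, n, pi0):
    """Return (Z statistic, two-tail p-value) for H0: proportion = pi0."""
    p = x / n
    z = (p - pi0) / math.sqrt(pi0 * (1 - pi0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical survey: 60 of 100 respondents are items of interest.
low, high = proportion_confidence_interval(60, 100)
z_stat, p_value = proportion_z_test(60, 100, pi0=0.5)

print(f"95% CI: ({low:.3f}, {high:.3f})")
print(f"Z = {z_stat:.3f}, two-tail p-value = {p_value:.4f}")
```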
How to determine whether the proportion of items of interest differs depending on the group
Categorical variable has two categories
Two independent groups
Two-proportion Z-test
χ² test for the difference between two proportions
Two groups of matched or repeated measurements
McNemar test
More than two independent groups
χ² test for the difference among several proportions
More than two categories and more than two groups
χ² test
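For independent groups, the two-proportion Z-test and the χ² test can be sketched as follows. The χ² function takes a table with one row per group and one column per category, so it also covers more than two groups. Names and counts are hypothetical, and critical values are assumed to come from tables.

```python
import math

def two_proportion_z(x1, n1, x2, n2):
    """Z statistic for H0: the two population proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error

def chi_square_statistic(table):
    """Chi-square statistic and df for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Hypothetical counts: 30 of 100 items of interest in group 1, 20 of 100 in group 2.
z_stat = two_proportion_z(30, 100, 20, 100)
chi2_stat, df = chi_square_statistic([[30, 70], [20, 80]])

print(f"Z = {z_stat:.4f}")
print(f"chi-square = {chi2_stat:.4f} with {df} df")
```

For two groups and two categories, the χ² statistic equals the square of the Z statistic, so the two tests lead to the same two-tail conclusion.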
How to predict the proportion of items of interest based on the value of other variables
Logistic regression
How to determine whether the proportion of items of interest is stable over time
Studying a process and data is taken over time
Collected items of interest over time
p-chart
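The limits of a p-chart are the average proportion plus and minus three standard errors. A minimal sketch, assuming equal sample sizes in every period; the function name and counts are hypothetical.

```python
import math

def p_chart_limits(items_of_interest, sample_size):
    """Return (LCL, center line, UCL) for a p-chart with equal sample sizes."""
    p_bar = sum(items_of_interest) / (len(items_of_interest) * sample_size)
    margin = 3 * math.sqrt(p_bar * (1 - p_bar) / sample_size)
    # A proportion cannot fall below 0 or above 1, so clamp the limits.
    return max(0.0, p_bar - margin), p_bar, min(1.0, p_bar + margin)

# Hypothetical data: items of interest found in four daily samples of 100.
counts = [4, 6, 5, 5]
lcl, center, ucl = p_chart_limits(counts, sample_size=100)
print(f"LCL = {lcl:.4f}, center = {center:.4f}, UCL = {ucl:.4f}")
```

A sample proportion outside these limits suggests the process is not stable over time.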
Data Analysis Tree
Numerical & Categorical Variables
Numerical Variables: Possible Questions
How to describe the characteristics of the variable (possibly broken into several groups)?
How to draw conclusions about the mean and standard deviation of the variable in the population?
How to determine whether the mean and standard deviation of the variable differ depending on the group?
How to determine which factors affect the value of the variable?
How to predict the value of the variable based on the value of other variables?
How to determine whether the values of the variable are stable over time?
Categorical Variables: Possible Questions
How to describe the proportion of items of interest in each category (possibly broken into several groups)?
How to draw conclusions about the proportion of items of interest in a population?
How to determine whether the proportion of items of interest differs depending on the group?
How to predict the proportion of items of interest based on the value of other variables?
How to determine whether the proportion of items of interest is stable over time?
Data Analysis Tree: Numerical Variables
How to describe the characteristics of the variable (possibly broken into several groups)?
Create tables and charts: stem-and-leaf display, percentage distribution, histogram, polygon, boxplot, normal probability plot
Calculate statistics: mean, median, mode, quartiles, range, interquartile range, standard deviation, variance, coefficient of variation
How to draw conclusions about the mean and standard deviation of the variable in the population?
Mean: confidence interval for mean (t or z), hypothesis test for mean (t or z)
Variance / standard deviation: hypothesis test for variance (χ² test)
How to determine whether the mean and standard deviation of the variable differ depending on the group?
Mean, 2 independent groups:
Pooled t test (both variables must be normal, variances equal)
Separate-variance t test (both variables must be normal)
Wilcoxon rank sum test (variables do not have to be normal)
Variance, 2 independent groups:
F-test (both variables must be normal)
2 matched groups
Data Analysis Tree: Numerical Variables (continued)
How to determine which factors affect the value of the variable?
Two factors to be examined: two-factor factorial design
How to predict the value of the variable based on the value of other variables?
One independent variable: simple linear regression
Two or more independent variables: multiple regression model
Data taken over time to forecast the future: moving averages, exponential smoothing, least-squares forecasting, autoregressive modeling
How to determine whether the values of the variable are stable over time?
Studied a process and taken data over time: develop X̄ and R charts
Data Analysis Tree: Categorical Variables
How to describe the proportion of items of interest in each category (possibly broken into several groups)?
Summary tables, bar charts, pie charts, Pareto charts, side-by-side charts
How to draw conclusions about the proportion of items of interest in a population?
Confidence interval for the proportion of items of interest
Hypothesis test for the proportion of items of interest
How to determine whether the proportion of items of interest differs depending on the group?
Two categories & two independent groups: two-proportion Z test, χ² test for the difference between two proportions
Two categories & two matched groups: McNemar test
Two categories & more than two independent groups: χ² test for the difference among several proportions
Data Analysis Tree: Categorical Variables (continued)
How to predict the proportion of items of interest based on the value of other variables?
Logistic regression
How to determine whether the proportion of items of interest is stable over time?
Studying a process and collected items of interest over time: p-chart
Chapter Summary
Discussed how to choose the appropriate technique(s) for data analysis for both numerical and categorical variables
Discussed potential questions and the associated appropriate techniques for numerical variables
Discussed potential questions and the associated appropriate techniques for categorical variables