Statistics, Sample Test (Exam Review) Solution
Module 1: Chapters 1, 2 & 3 Review
Chapter 1: Introduction to Statistics
Chapter 2: Exploring Data with Tables and Graphs
Chapter 3: Describing, Exploring, and Comparing Data
Chapter 1: Introduction to Statistics
1. True or False: The values of the variance and the standard deviation are never negative.
True – both measure the variation of all values from the mean, and both are built from squared deviations, so they cannot be negative (they can be zero)
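Population Variance: \[ \sigma^2 = \frac{\sum (x - \mu)^2}{N} \]
Population Standard Deviation: \[ \sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}} \]
Sample Variance: \[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \]
Sample Standard Deviation: \[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} \]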
2. What kind of variable is “weights of bears”? Quantitative or qualitative?
Quantitative – “weights of bears” gives numbers that represent counts or measurements
3. What kind of variable is “gender of bears”? Quantitative or qualitative?
Qualitative – “gender of bears” is distinguished by nonnumeric characteristics
4. Define a population in statistics.
Population is the complete collection of all elements (scores, people, measurement, etc)
to be studied
5. The value of the middle term in a ranked data set is called
the median
6. Given any data, how do you find the mode?
Mode is the value that appears with the greatest frequency among the data. A data set can
have one, more than one, or no mode (when all numbers appear with equal frequency).
7. True or False: The “number of chairs” is considered to be a continuous variable.
False – The number of chairs is not continuous. We cannot have ¼ of a chair.
Discrete: Data that result when the number of possible values is either finite or countable: 0, 1, 2, 3, . . .
Examples: Number of students in a class, number of cars in a parking lot.
Continuous: Data that can take any value in an interval. They result from infinitely many possible values that correspond to some continuous scale covering a range of values without gaps, interruptions, or jumps.
Examples: The weight or height of a person.
8. What is a Pareto chart? What does each axis represent?
A Pareto Chart is a bar graph, for categorical (qualitative) data (similar to Histogram
for quantitative data). The vertical scale represents frequencies or relative frequencies,
and horizontal scale represents different categories. Bars are arranged in descending order
to emphasize the order of impact.
9. Define a parameter and a statistic.
parameter: a numerical measurement describing some characteristic of a population
statistic: a numerical measurement describing some characteristic of a sample
10. Define random sample and simple random sample.
random sample: members of the population are selected in such a way that each
individual member has an equal chance of being selected
simple random sample (of size n): subjects selected in such a way that every possible
sample of the same size n has the same chance of being chosen
11. Define the following types of sampling: systematic, convenience, stratified, cluster
systematic sampling: select some starting point, and then select every kth element in the population
convenience sampling: use results that are easy to get
stratified sampling: subdivide the population into at least two different subgroups that
share the same characteristics, then draw a sample from each subgroup (stratum)
cluster sampling: divide the population into sections (or clusters that are similar to one another), randomly select some of those clusters, and choose all members from the selected clusters
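The sampling schemes above can be sketched in a few lines of Python. This is a minimal illustration rather than a prescribed procedure: the population of IDs 1 to 20, the two groups "A" and "B", the fixed seed, and all function names are assumptions made for the example.

```python
import random


def systematic_sample(population, k, start=0):
    """Select a starting point, then take every k-th element."""
    return population[start::k]


def stratified_sample(strata, per_stratum, rng):
    """Draw a random sample from every subgroup (stratum)."""
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep all of their members."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(0)                      # fixed seed (assumption) for repeatable draws
population = list(range(1, 21))             # hypothetical member IDs 1..20
groups = {"A": population[:10], "B": population[10:]}  # hypothetical subgroups

every_fifth = systematic_sample(population, k=5, start=2)
strat = stratified_sample(groups, per_stratum=3, rng=rng)
clust = cluster_sample(groups, n_clusters=1, rng=rng)
print(every_fifth, strat, clust)
```

Note the contrast in the last two functions: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.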
12. What are different levels of measurement of data? Give examples.
nominal level of measurement: qualitative data
ex) gender of subjects
ordinal level of measurement: categories with some order (differences between data values either cannot be determined or are meaningless, but there is an order)
ex) course grades A, B, C, D, F
interval level of measurement: differences between data values are meaningful, but there is no natural zero starting point (the value 0 does not mean “none” of the quantity)
ex) years such as 1000, 2000, 1492, 1776
ratio level of measurement: interval level modified to include a natural zero starting point
ex) price of college textbooks ($0 means no cost)
Levels of Measurement: Examples
RATIO
➢ Distances (in km) travelled by cars (0 km represents no distance travelled, and 400 km is twice as far as 200 km.)
➢ Prices of college textbooks ($0 does represent no cost, and a $100 book does cost twice as much as a $50 book.)
INTERVAL
➢ Body temperatures of 98.2°F and 98.6°F
➢ The years 1769 and 1845
ORDINAL
➢ Ranks of colleges in U.S. News and World Report (Ranks can be first, second, third, and so on, which determines an ordering.)
➢ A school teacher assigns grades of A, B, C, D, or F (These grades can be arranged in order, but we can’t determine the difference between the grades.)
NOMINAL
➢ Eye colors (blue, brown, black, other)
➢ Political party (Democrat, Republican, Independent, other)
13. What’s the difference between an observational study and an experiment? Give
examples.
observational study: observing and measuring specific characteristics without
attempting to modify the subjects being studied
ex) Charles Darwin’s observation of Darwinian finches at the Galapagos Islands
experiment: apply some treatment and then observe its effects on the subjects
ex) giving subjects some type of medicine and seeing whether it cures a certain type of disease among them
14. Describe Cross Sectional, Retrospective, and Prospective Studies. Give examples.
Cross Sectional Study: Data are observed (an Observational Study ), measured, and
collected at one point in time. (A cross-sectional study is like a snapshot of a
particular group of people at a given point in time; it is used to describe what is
happening at that time.)
Example: A medical study examines the frequency of cancer in populations at different geographical locations at the same point in time. Any differences among them can then most likely be attributed to location rather than to something that happened over time.
Retrospective (or Case Control) Study: Data are collected from the past by going
back in time (data that already exist).
Example: Researchers ask participants about their smoking habits over the past 20 years. They can then analyze any possible correlations between those smoking habits and diseases such as lung cancer.
Prospective (or Longitudinal or Cohort) Study: Data are collected in the future from
groups (called cohorts) sharing common factors. (Longitudinal studies look at a group of
people over an extended period.)
Example: A medical study follows a cohort of middle-aged people who vary in terms of
smoking habits, to test the hypothesis that the 20-year incidence rate of lung cancer will
be highest among heavy smokers, followed by moderate smokers, and then nonsmokers.
15. What is Sampling Error, Non-sampling Error, and Nonrandom Sampling Error?
Sampling Error: Sampling error is the difference between a sample result and the true
population result that is the consequence of chance sample variations
Non-sampling Error:
The non-sampling error occurs due to data that are incorrectly collected, recorded, or
analyzed. It may happen by selecting a biased sample, using a defective instrument, or
copying the data incorrectly.
Nonrandom Sampling Error:
Nonrandom Sampling Error is the result of using a sampling method that is not
random, such as using a convenience sample or a voluntary response sample.
Voluntary response sample (or self-selected survey): One in which the respondents themselves decide whether to be included. In this case, valid conclusions can be made only about the specific group of people who agree to participate.
16. What are some characteristics of an Experiment?
Confounding: Occurs in an experiment when the experimenter is not able to distinguish
between the effects of different factors.
Blinding: Subject does not know he or she is receiving a treatment or placebo.
Blocks: Groups of subjects with similar characteristics.
Completely Randomized Experimental Design: Subjects are assigned to the different treatment groups through a process of random selection.
Replication: Repetition of an experiment when there are enough subjects to recognize
the differences in different treatments.
Sample Size: Sample size must be large enough to display the true nature of the
population data and should be obtained using an appropriate random method.
17. Explain some Misuses of Statistics.
Bad Samples, Small Samples, Misleading Graphs, Distorted Percentages, Loaded Questions, Order of Questions, Refusals, Correlation & Causality, Self-Interest Study, Precise Numbers, Partial Pictures, Deliberate Distortions.
Pictographs: Double the length, width, and height of a cube, and the volume increases by a factor of eight. To correctly interpret a graph, we should analyze the numerical information given in the graph instead of being misled by its general shape.
Loaded questions: the wording of a question can change the responses.
95% yes: “Should the Governor have the line-item veto to eliminate waste?”
53% yes: “Should the Governor have the line-item veto, or not?”
If sample data are not collected in an appropriate way, the data may be so completely useless that no amount of statistical training can salvage them. Randomness typically plays a critical role in determining which data to collect.
Chapter 2: Exploring Data with Tables and Graphs
1. Given the frequency table, answer the following questions.
Age group Frequency
11-20 5
21-30 6
31-40 9
41-50 11
51-60 4
a. The number of classes in the table is 5 [number of statistical age groups defined]
b. The class width is 10
(upper limit – lower limit + 1 unit, or the difference of two consecutive lower limits or upper limits, i.e. 21 – 11 = 10)
c. The midpoint of the 4th class is 45.5
(41+50)/2 = 45.5
d. The Lower Boundary of the 5th class is 50.5
(50+51)/2 = 50.5 (think of it as the midpoint between the upper limit of the 4th class and the lower limit of the 5th class)
e. The Upper Limit of the 1st class is 20
The 1st class is 11-20, so 20 is the upper limit
f. The sample size is 35
5+6+9+11+4 = 35
g. The relative frequency of the 1st class is 1/7
relative frequency: f/n
relative frequency of the 1st class = f/n = 5/35 = 1/7 ≈ 0.1429 (or 14.29%)
Class  Age group  Frequency  Midpoint = (LL+UL)/2  LB - UB    RF = f/n
1)     11-20      5          (11+20)/2 = 15.5      10.5-20.5  5/35 = 1/7
2)     21-30      6          25.5                  20.5-30.5  6/35
3)     31-40      9          35.5                  30.5-40.5  9/35
4)     41-50      11         45.5                  40.5-50.5  11/35
5)     51-60      4          55.5                  50.5-60.5  4/35
n = ∑f = 35
h. Find the modal class, and the mode.
The modal class: #4, 41-50, with the largest frequency of 11.
The mode ≈ midpoint of that class = 45.5 (for grouped data, the mode is estimated by the midpoint of the modal class)
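The quantities asked for in this question can be checked with a short Python sketch. It assumes, as in the table, that the class limits are whole numbers, so each boundary sits 0.5 below a lower limit or 0.5 above an upper limit; the variable names are illustrative only.

```python
# (lower limit, upper limit, frequency) for each age class in the table
classes = [(11, 20, 5), (21, 30, 6), (31, 40, 9), (41, 50, 11), (51, 60, 4)]

n = sum(f for _, _, f in classes)            # sample size
width = classes[1][0] - classes[0][0]        # difference of consecutive lower limits

rows = []
for low, high, f in classes:
    rows.append({
        "midpoint": (low + high) / 2,
        "boundaries": (low - 0.5, high + 0.5),   # assumes integer-valued limits
        "rel_freq": f / n,
    })

modal = max(classes, key=lambda c: c[2])     # class with the largest frequency
mode_estimate = (modal[0] + modal[1]) / 2    # midpoint of the modal class

print(n, width, rows[3]["midpoint"], rows[4]["boundaries"], mode_estimate)
```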
2. The following frequency table describes the speeds of drivers ticketed through a 30 mph
speed zone.
Speed Frequency (number of drivers)
42-45 25
46-49 14
50-53 7
54-57 3
58-61 1
a. Calculate the relative frequencies for all classes.
n = 50
first class: f/n = 25/50 = 0.5 (or 50%)
second class: 14/50 = 0.28 (or 28%)
third class: 7/50 = 0.14 (or 14%)
fourth class: 3/50 = 0.06 (or 6%)
fifth class: 1/50 = 0.02 (or 2%)
∑rf = 1 (or 100%)
b. What percentage represents the speed of 53 mph or less?
“53 mph or less” covers the first three classes, so add their relative frequencies:
cumulative relative frequency = 0.5 + 0.28 + 0.14 = 0.92
Or: (25 + 14 + 7) / 50 = 46 / 50 = 92%
92% of the ticketed drivers drove 53 mph or less
c. What are the class boundaries?
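each class boundary is the midpoint between the upper limit of one class and the lower limit of the next, for example (45+46)/2 = 45.5
for the two outer boundaries, the same amount (0.5) is subtracted from the first lower limit or added to the last upper limit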
class boundaries: 41.5-45.5, 45.5-49.5, 49.5-53.5, 53.5-57.5, 57.5-61.5
Class  Speed  Frequency (number of drivers)  Q a: RF = f/n   Q c: Boundaries
1      42-45  25                             25/50 = 1/2     41.5-45.5
2      46-49  14                             14/50 = 7/25    45.5-49.5
3      50-53  7                              7/50            49.5-53.5
4      54-57  3                              3/50            53.5-57.5
5      58-61  1                              1/50            57.5-61.5
n = ∑f = 50
d. Construct a histogram corresponding to the frequency distribution table.
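[Histogram: frequency on the vertical axis (0 to 30 in steps of 5) against speed in mph on the horizontal axis, with bars drawn between the class boundaries 41.5, 45.5, 49.5, 53.5, 57.5, and 61.5.]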
3. The following frequency table describes the speeds of drivers ticketed through a 30 mph
speed zone.
Speed Frequency (number of drivers)
42-45 25
46-49 14
50-53 7
54-57 3
58-61 1
a. Prepare the cumulative frequency distribution. (See below)
b. Prepare the cumulative relative frequency distribution.
Cumulative speed  Cumulative frequency   Cumulative relative frequency
42-45             25                     25/50 = 0.5 (or 50%)
42-49             25+14 = 39             39/50 = 0.78 (or 78%)
42-53             25+14+7 = 46           46/50 = 0.92 (or 92%)
42-57             25+14+7+3 = 49         49/50 = 0.98 (or 98%)
42-61             25+14+7+3+1 = 50       50/50 = 1 (or 100%)
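The running totals in the cumulative table can be reproduced with a minimal Python sketch; the variable names are assumptions for illustration.

```python
from itertools import accumulate

freqs = [25, 14, 7, 3, 1]            # class frequencies from the speed table
n = sum(freqs)                        # 50 ticketed drivers

cum_freq = list(accumulate(freqs))    # running total of the frequencies
cum_rel = [c / n for c in cum_freq]   # cumulative relative frequencies

for upper, c, r in zip([45, 49, 53, 57, 61], cum_freq, cum_rel):
    print(f"42-{upper}: {c:2d}  {r:.2f}")
```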
c. Draw an ogive of the cumulative percentage distribution.
d. Using the ogive find the percentage of drivers who drove 47 mph or less.
47 lies between the class boundaries 45.5 and 49.5 on the horizontal axis, where the ogive rises from 50% to 78%. Reading the graph at 47 gives approximately 60%; therefore, about 60% to 61% of drivers drove 47 mph or less.
[Ogive: cumulative percentage on the vertical axis (0 to 120 in steps of 20) plotted at the class boundaries 41.5, 45.5, 49.5, 53.5, 57.5, and 61.5 mph on the horizontal axis.]
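Reading a value off the ogive amounts to linear interpolation between two plotted points. The sketch below assumes straight line segments between class boundaries (the usual way an ogive is drawn); the function name is hypothetical.

```python
def ogive_percent(x, boundaries, cum_percent):
    """Interpolate the cumulative percentage at x along the ogive segments."""
    for i in range(len(boundaries) - 1):
        x0, x1 = boundaries[i], boundaries[i + 1]
        if x0 <= x <= x1:
            y0, y1 = cum_percent[i], cum_percent[i + 1]
            return y0 + (x - x0) / (x1 - x0) * (y1 - y0)
    raise ValueError("x lies outside the plotted class boundaries")


boundaries = [41.5, 45.5, 49.5, 53.5, 57.5, 61.5]
cum_percent = [0, 50, 78, 92, 98, 100]   # cumulative % reached at each boundary

at_47 = ogive_percent(47, boundaries, cum_percent)
print(at_47)
```

For 47 mph this gives 50 + (1.5/4)(28) = 60.5, in line with the 60% to 61% read from the graph.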
4. Given the following sample.
Sample: The ages of forty tenured faculty at CSULB (n = 40 ages)
Age Sample Dataset
45 59 51 62 58 54 42 59 49 47 52 63 40 53 61 47 54 58 53
32 61 39 51 37 43 53 46 56 58 48 55 50 57 60 54 63 60 55
a. Construct a dotplot.
b. Construct a stemplot (stem-and-leaf plot).
Solution to a: A dotplot is a graphical display of data using dots for relatively small data sets
where values fall into a number of discrete values (categories). Data values are plotted as dots
along a horizontal scale of values. Equal values will be drawn as a stack of dots. The purpose of
the dotplot is to represent each observation as a dot.
Below is the list of the 40 ages in order from youngest to oldest.
Ages (Sorted)
32 37 39 40 42 43 45 46 47 47 48 49 50 51 51 52 53 53 53 54
54 54 55 55 56 56 57 58 58 58 59 59 59 60 60 61 61 62 63 63
Clearly, the ages range from 32 to 63 years. Also, there are more tenured faculty at older ages.
Solution to b: Stemplots concisely display the data in order from smallest to largest. Stemplots
can provide useful information about small data sets.
A stemplot, like a histogram, is a tool to help you visualize a quantitative data set. The name “stemplot” comes from the layout: each row has one “stem” holding the largest place-value digits on the left and the “leaves” (the final digits) on the right. We can see the distribution of the data while keeping the original data values.
Again, we find that the ages range from 32 years to 63 years. We also find that there are more tenured CSULB faculty at older ages.
3 | 2 7 9
4 | 0 2 3 5 6 7 7 8 9
5 | 0 1 1 2 3 3 3 4 4 4
5 | 5 5 6 6 7 8 8 8 9 9 9
6 | 0 0 1 1 2 3 3
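A stemplot can be generated directly from the sorted ages. The sketch below is one simple way to do it, assuming two-digit values (tens digit as stem, units digit as leaf) and one row per stem; the function name is illustrative.

```python
from collections import defaultdict

# The 40 sorted ages listed above
ages = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47, 48, 49, 50, 51, 51, 52, 53, 53, 53, 54,
        54, 54, 55, 55, 56, 56, 57, 58, 58, 58, 59, 59, 59, 60, 60, 61, 61, 62, 63, 63]


def stemplot(values):
    """Return one text row per stem: tens digit, a bar, then the unit digits."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(leaf)
    return [f"{stem} | {' '.join(str(leaf) for leaf in leaves)}"
            for stem, leaves in sorted(rows.items())]


lines = stemplot(ages)
print("\n".join(lines))
```

Because every stem stays on a single row here, the 21 ages in the fifties appear as one long row, which makes the concentration of faculty at older ages easy to see.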
Chapter 3: Describing, Exploring, and Comparing Data
1. The following data gives the number of hours that a few employees at the GM factory
worked last week.
17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25
Ranked Data: (Note: We don’t need to rank data for some calculations such as finding the mean,
however, it’s a good practice to do so for those calculations that need ranked data.)
14, 16, 17, 18, 20, 21, 23, 24, 25, 27, 28, 30, 31, 34, 37, 38, 40, 42
n = 18
a) Find the mean
\[ \bar{x} = \frac{\sum x}{n} = \frac{14+16+17+18+20+21+23+24+25+27+28+30+31+34+37+38+40+42}{18} = \frac{485}{18} \approx 26.9444 \]
b) Find the mode
There is no mode (each value appears only once).
c) Find the median.
(25+27)/2 = 26
d) Find the midrange.
MR = (Min + Max)/2 = (14+42)/2 = 28
e) Find the range
R= Max – Min = 42 – 14 = 28.
f) Find the variance.
Sample Variance:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{n \sum x^2 - \left(\sum x\right)^2}{n(n - 1)} \]
\[ s^2 = \frac{18\,(14^2 + 16^2 + \dots + 42^2) - (14 + 16 + \dots + 42)^2}{18(18 - 1)} = \frac{18(14343) - (485)^2}{18(17)} = 74.99673203\ldots \approx 75 \]
Or we can do the following:
\[ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(14 - 26.944)^2 + (16 - 26.944)^2 + \dots + (42 - 26.944)^2}{18 - 1} \approx 75 \]
g) Find the standard deviation.
Sample Standard Deviation:
\[ s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{75} \]
s ≈ 8.66
h) Find the interquartile range (IQR).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
14 16 17 18 20 21 23 24 25 27 28 30 31 34 37 38 40 42
Q2 = median = (25+27)/2 = 26
Q1 = median of the first half of the data = 20
Q3 = median of the second half of the data = 34
interquartile range: Q3 – Q1 = 34 – 20 = 14
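Parts a) through h) can be verified with Python's standard `statistics` module. This is a sketch for checking the arithmetic; the quartiles follow the "median of each half" approach used above, which is only one of several quartile conventions.

```python
import statistics

hours = [17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25]
ranked = sorted(hours)
n = len(ranked)

mean = statistics.mean(hours)
median = statistics.median(hours)
midrange = (ranked[0] + ranked[-1]) / 2
data_range = ranked[-1] - ranked[0]
variance = statistics.variance(hours)   # sample variance: divides by n - 1
stdev = statistics.stdev(hours)         # sample standard deviation

# Quartiles as the medians of the lower and upper halves of the ranked data
lower, upper = ranked[: n // 2], ranked[(n + 1) // 2:]
q1, q3 = statistics.median(lower), statistics.median(upper)
iqr = q3 - q1

print(mean, median, midrange, data_range, variance, stdev, q1, q3, iqr)
```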
2. IQ scores have a mean of 100 and a standard deviation of 15.
a) Find the coefficient of variation.
Given: \( \mu = 100 \), \( \sigma = 15 \)
\[ CV = \frac{\sigma}{\mu} \cdot 100\% = \frac{15}{100} \cdot 100\% = 15\% \]
b) Use the range rule of thumb to establish the minimum and maximum “usual” IQ scores.
\[ \mu \pm 2\sigma: \quad 100 - 2(15) = 70 \ \text{to} \ 100 + 2(15) = 130 \]
The usual minimum is 70 and the usual maximum is 130.
c) Using Chebyshev’s Theorem, find the least percentage of those who will have an IQ score of 70 to 130.
\[ 1 - \frac{1}{K^2}, \quad K = 2 \] (K is the number of standard deviations away from the mean)
\[ 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4} \]
At least 75% have an IQ score of 70 to 130.
d) Using the empirical rule, find the percentage of those who will have an IQ score of 70 to 130.
About 95% will have an IQ score of 70 to 130.
(70 and 130 are 2 standard deviations away from the mean; the empirical rule applies to bell-shaped distributions)
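The calculations in this question fit in a few lines of Python. The function name below is hypothetical, and the 95% figure from the empirical rule is entered as a known constant for bell-shaped data rather than computed.

```python
def chebyshev_min_fraction(k):
    """Least fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem is only informative for k > 1")
    return 1 - 1 / k**2


mu, sigma = 100, 15
k = 2

cv_percent = sigma / mu * 100                  # coefficient of variation, in %
low, high = mu - k * sigma, mu + k * sigma     # range rule of thumb: mu ± 2 sigma
chebyshev = chebyshev_min_fraction(k)          # holds for any distribution
empirical = 0.95                               # bell-shaped distributions only

print(cv_percent, (low, high), chebyshev, empirical)
```

Chebyshev's 75% is a guaranteed minimum for any distribution, which is why it is lower than the 95% that the empirical rule gives for bell-shaped data.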
3. Given the following set of data: 32, 19, 14, 7, 15, 3, 4, 5, 9, 16, 15, 16, 19, 50
a) Rank the data from smallest to largest.
b) Prepare a box-and-whisker plot. [Box plot]
c) Does this data set contain any outliers? [Make sure to show the lower and the upper fences
on your graph]
d) Are the data symmetric or skewed? [If skewed, are they skewed left or right?]
a) Answer: 3, 4, 5, 7, 9, 14, 15, 15, 16, 16, 19, 19, 32, 50
b) Answer: There are different ways to calculate Q1 and Q3. Here is one:
Q1 = 7 (4th data value), since L = (25/100)(14) = 3.5, which rounds up to 4;
Q2 = median = (15+15)/2 = 15;
Q3 = 19 (11th data value), since L = (75/100)(14) = 10.5, which rounds up to 11
Minimum  Q1  Median  Q3  Maximum
3        7   15      19  50
c) Answer: Outlier: 50
The values for Q1 – 1.5×IQR and Q3 + 1.5×IQR are the "fences" that mark off the "reasonable"
values from the outlier values. Outliers lie outside the fences.
IQR = Q3 – Q1 = 19 – 7 = 12; IQR x 1.5 = 12 x 1.5 = 18
A data value is considered an outlier if it is less than
Q1 – 1.5 IQR = 7 – 18 = –11
A data value is considered an outlier if it is larger than
Q3 + 1.5 IQR = 19 + 18 = 37
A data value is considered an extreme outlier if it is less than
Q1 – 3 IQR = 7 – 36 = –29
A data value is considered an extreme outlier if it is larger than
Q3 + 3 IQR = 19 + 36 = 55
Only 50 lies outside the fences (50 > 37), and since 50 < 55 it is not an extreme outlier.
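The fence calculation can be sketched in Python as follows. Q1 = 7 and Q3 = 19 are taken from the answer to part b rather than recomputed, since different quartile conventions can give slightly different values; the variable names are illustrative.

```python
data = [3, 4, 5, 7, 9, 14, 15, 15, 16, 16, 19, 19, 32, 50]
q1, q3 = 7, 19                      # quartiles from part b
iqr = q3 - q1

inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # fences for ordinary outliers
outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # fences for extreme outliers

outliers = [x for x in data if x < inner[0] or x > inner[1]]
extreme = [x for x in data if x < outer[0] or x > outer[1]]

print(inner, outer, outliers, extreme)
```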
d) Are the data symmetric or skewed? [If skewed, are they skewed left or right?]
Answer: Skewed to the right
Note 1: Make sure the drawing is to scale.
Note 2: Skewed data show an uneven boxplot in which case the median cuts the box
into two unequal pieces. Longer part on the right or above the median indicates data is skewed
to the right. Longer part on the left or below the median indicates data is skewed to the left.
Note 3: Sometimes the box may look even or uneven and even Skewed to one side,
however, the whiskers (tails on each side of the box) may indicate otherwise. Therefore, pay
attention to both.
4. Draw the box-and-whisker plot for the following data set:
77, 79, 80, 86, 87, 87, 94, 99
Median: (86 + 87) ÷ 2 = 86.5 = Q2
Answer: There are different ways to calculate Q1 and Q3. Here is an easier way:
The median splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99. Since each half of the data set contains an even number of values, the sub-medians are the averages of the middle two values.
Q1 = (79 + 80) ÷ 2 = 79.5
Q3 = (87 + 94) ÷ 2 = 90.5
Minimum = 77, Q1 = 79.5, Q2= 86.5, Q3= 90.5, Maximum = 99
Box & Whisker Plot:
[Box-and-whisker plot: whiskers from 77 to 99, box from 79.5 to 90.5, median line at 86.5.]
OR:
Minimum  Q1    Median  Q3    Maximum
77       79.5  86.5    90.5  99
This set of five values has been given the name "the five-number summary".
To find the outliers:
IQR = Q3 – Q1= 90.5 -79.5 = 11.
The values for Q1 – 1.5×IQR and Q3 + 1.5×IQR are the "fences" that mark off the "reasonable"
values from the outlier values. Outliers lie outside the fences.
The outliers will be any values below Q1 – 1.5×IQR = 79.5 – 1.5 × 11 = 79.5 – 16.5 = 63 or
above Q3 + 1.5×IQR = 90.5 + 1.5 × 11 = 90.5 + 16.5 = 107.
The extreme values (Outliers) will be those below Q1 – 3×IQR or above Q3 + 3×IQR.
Answer: This data is almost symmetric (Normal, bell shaped)
5. Answer the following:
a. If the mean of a set of data is 23.00, and 12.60 has a z-score of –1.30, then the
standard deviation must be: A) 4.00 B) 32.00 C) 64.00 D) 8.00
Answer: D
\[ z = \frac{x - \mu}{\sigma} \;\rightarrow\; \sigma = \frac{x - \mu}{z} = \frac{12.6 - 23}{-1.3} = 8 \]
b. Find the z score for each student and indicate which one is higher.
Art Major (AM): x = 46, x̄ = 50, s = 5
Theatre Major (TM): x = 70, x̄ = 75, s = 7
a. Both students have the same score.
b. Neither student received a positive score; therefore, the higher score cannot be
determined.
c. The theater major has a higher score than the art major.
d. The art major has a higher score than the theater major.
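Answer: c
\[ z = \frac{x - \bar{x}}{s} \]
\[ z_{\text{Art}} = \frac{46 - 50}{5} = \frac{-4}{5} = -0.8 \]
\[ z_{\text{Theater}} = \frac{70 - 75}{7} = \frac{-5}{7} \approx -0.7143 \]
Since –0.7143 > –0.8, the theater major has the higher z score.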
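Both parts of this question use the same z-score formula, once solved for the standard deviation and once applied directly. A minimal Python check, with an illustrative helper name:

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Part a: solve z = (x - mu) / sigma for sigma
sigma = (12.6 - 23) / -1.3

# Part b: compare two students measured on different scales
z_art = z_score(46, 50, 5)
z_theater = z_score(70, 75, 7)
higher = "theater" if z_theater > z_art else "art"

print(sigma, z_art, round(z_theater, 4), higher)
```

Both z scores are negative, but a comparison is still possible: the value closer to zero is the relatively higher score.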
6. Answer the following:
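a. If the five-number summary for a set of data is 0, 3, 6, 7, and 16, then the mean of this set of data is
A. 6 B. There is insufficient information to calculate the mean C. 8 D. 5
Answer: B. Why? The five-number summary gives only the minimum, Q1, the median, Q3, and the maximum. The mean depends on every data value, and those are not given.
b. Which of the following is true?
a. D50 = P5 = Q25 b. D5 = P50 = Q2 c. D50 = P5 = Q2 d. D5 = P5 = Q5
Answer: b (the 5th decile, the 50th percentile, and the 2nd quartile are all equal to the median)
D: Decile, 10 equal parts, P: Percentile, 100 equal parts, Q: Quartile, 4 equal parts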
7. A student received the following grades: an A in Statistics (4 units), an F in Physics II (5 units), a B in Sociology (3 units), a B in a Literature seminar (2 units), and a D in Tennis (1 unit). Assuming A = 4 grade points, B = 3 grade points, C = 2 grade points, D = 1 grade point, and F = 0 grade points, the student's grade point average is:
Grades W (Number of units) Worth (Points)
A 4 4
F 5 0
B 3 3
B 2 3
D 1 1
\[ \bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{(4 \cdot 4) + (5 \cdot 0) + (3 \cdot 3) + (2 \cdot 3) + (1 \cdot 1)}{4 + 5 + 3 + 2 + 1} = \frac{32}{15} \approx 2.133 \]
Answer: 2.133
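The weighted mean above translates directly into Python. The tuple layout and names are assumptions made for this sketch.

```python
# (course, units as the weight w, grade points x)
grades = [
    ("Statistics", 4, 4),   # A
    ("Physics II", 5, 0),   # F
    ("Sociology", 3, 3),    # B
    ("Literature", 2, 3),   # B
    ("Tennis", 1, 1),       # D
]

total_points = sum(units * points for _, units, points in grades)
total_units = sum(units for _, units, _ in grades)
gpa = total_points / total_units     # weighted mean: sum(w * x) / sum(w)

print(total_points, total_units, round(gpa, 3))
```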
8. Given the following data set, find the value that corresponds to the
a) 75th percentile.
b) 30th percentile.
c) Find the percentile corresponding to number 44.
d) Using range Rule of Thumb, estimate the standard deviation.
10, 44, 15, 23, 14, 18, 72, 56
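Ranked: 10, 14, 15, 18, 23, 44, 56, 72
\[ n = 8, \quad \text{location: } L = \frac{k}{100} \cdot n = \frac{75}{100} \cdot 8 = 6 \]
Since 6 is a whole number, average the 6th and 7th values:
\[ \frac{6\text{th} + 7\text{th}}{2} = \frac{44 + 56}{2} = 50 \]
a) Answer: 50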
\[ n = 8, \quad \text{location: } L = \frac{k}{100} \cdot n = \frac{30}{100} \cdot 8 = 2.4 \]
Always round up: 3rd value: 15
b) Answer: 3rd value = 15
\[ \text{percentile: } p = \frac{\text{number of values} < 44}{n} = \frac{5}{8} = 0.625 \approx 63\% \]
Round accordingly.
c) Answer: P63
d) \[ s \approx \frac{R}{4} = \frac{\text{Max} - \text{Min}}{4} = \frac{72 - 10}{4} = 15.5 \]
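The locator rule used in parts a) and b) (compute L = (k/100)·n, average two neighbors if L is whole, otherwise round up) can be written as a small Python function. This sketches that one convention only; other textbooks and libraries define percentiles differently, and the function names here are hypothetical.

```python
import math


def percentile_value(values, k):
    """Value at the k-th percentile using the locator L = (k / 100) * n."""
    if not 0 < k < 100:
        raise ValueError("k must be strictly between 0 and 100")
    ranked = sorted(values)
    loc = k * len(ranked) / 100
    if loc == int(loc):                       # whole number: average L-th and (L+1)-th
        i = int(loc)
        return (ranked[i - 1] + ranked[i]) / 2
    return ranked[math.ceil(loc) - 1]         # otherwise always round up


def percentile_of(values, x):
    """Percentile rank of x: share of values below x, rounded half up."""
    below = sum(1 for v in values if v < x)
    return math.floor(below / len(values) * 100 + 0.5)


data = [10, 44, 15, 23, 14, 18, 72, 56]
p75 = percentile_value(data, 75)
p30 = percentile_value(data, 30)
rank_44 = percentile_of(data, 44)
sd_estimate = (max(data) - min(data)) / 4     # range rule of thumb

print(p75, p30, rank_44, sd_estimate)
```

The explicit `floor(... + 0.5)` is deliberate: Python's built-in `round` rounds 62.5 to the nearest even integer (62), whereas the solution rounds it up to 63.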
9. The following data represent the numbers of keyboards assembled on a sample of 25 days at a company:
in a company:
45 52 48 41 56 46 44 48 53 51 53 51
48 46 43 52 50 54 47 44 47 50 49 52
42
a. Construct a frequency distribution table with 4 classes.
b. Use the data to find the mean.
c. Use the frequency distribution to find the mean.
d. Are the two means reasonably close?
\[ \text{Width} = \frac{\text{Range (High} - \text{Low)}}{\text{Number of classes}} \]
4 classes: \[ W = \frac{56 - 41}{4} = 3.75 \] and rounding up gives w = 4.
The lower limit of the 1st class (starting point) is a convenient number ≤ the smallest value, such as 41.
Frequency Distribution Table
Classes Frequency Relative Frequency
41 - 44 5 0.2
45 - 48 8 0.32
49 - 52 8 0.32
53 - 56 4 0.16
b. \[ \bar{x} = \frac{\sum x}{n} = \frac{45 + 52 + \dots + 42}{25} = \frac{1212}{25} = 48.48 \]
c. \[ \bar{x} = \frac{\sum f \cdot X_m}{n} = \frac{5(42.5) + 8(46.5) + 8(50.5) + 4(54.5)}{25} = \frac{1206.5}{25} = 48.26 \] (\(X_m\) is the midpoint of each class.)
d. Yes, they are reasonably close.

Practice Test 1 solutions

  • 1.
    1 Statistics, Sample Test(Exam Review) Solution Module 1: Chapters 1, 2 & 3 Review Chapter 1: Introduction to Statistics Chapter 2: Exploring Data with Tables and Graphs Chapter 3: Describing, Exploring, and Comparing Data Chapter 1: Introduction to Statistics 1. True or False: The value of variance and standard deviation is never negative. True – these are absolute quantities that is a measure of variation of all values from the mean (it can be zero) ( ) ( ) ( ) ( ) 2 2 2 2 2 2 Population Variance: Population Standard Deviation: Sample Variance: 1 Sample Standard Deviation: 1 x N x N x x s n x x s n     − = − = − = − − = −     2. What kind of variable “weights of bears” is? Quantitative or Qualitative Quantitative – variable “weights of bears” gives numbers that represent counts or measurements 3. What kind of variable “gender of bears” is? Quantitative or Qualitative Qualitative – “gender of bears” is distinguished by nonnumeric characteristics 4. Define a population in statistics. Population is the complete collection of all elements (scores, people, measurement, etc) to be studied 5. The value of the middle term in a ranked data set is called the median
  • 2.
    2 6. Given anydata, how do you find the mode? Mode is the value that appears with the greatest frequency among the data. A data set can have one, more than one, or no mode (when all numbers appear with equal frequency). 7. True or False: The “number of chairs” is considered to be a continuous variable. False – The number of chairs is not continuous. We cannot have ¼ amounts of chairs. Discrete: Data result when the number of possible values is either a finite number or a countable number of possible values: 0, 1, 2, 3, . . . Examples: Number of students in a class, Number of cars in a parking lot. Continuous: Data that can take any value in an interval. Data result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps. (Interval) Examples: The Weight, or height of a person. 8. What is a Pareto chart? What does each axis represent? A Pareto Chart is a bar graph, for categorical (qualitative) data (similar to Histogram for quantitative data). The vertical scale represents frequencies or relative frequencies, and horizontal scale represents different categories. Bars are arranged in descending order to emphasize the order of impact. 9. Define a parameter and a statistic. parameter: a numerical measurement describing some characteristic of a population statistic: a numerical measurement describing some characteristic of a sample 10. Define random sample and simple random sample. random sample: members of the population are selected in such a way that each individual member has an equal chance of being selected simple random sample (of size n): subjects selected in such a way that every possible sample of the same size n has the same chance of being chosen 11. 
Define the following types of sampling: systematic, convenience, stratified, cluster systematic sampling: select some starting point, and then select every Kth element in population convenience sampling: use results that are easy to get stratified sampling: subdivide the population into at least two different subgroups that share the same characteristics, then draw a sample from each subgroup (stratum) cluster sampling: divide the population into sections (or clusters that are similar to one
  • 3.
    3 another ), randomlyselect some of those clusters, choose all members from selected clusters 12. What are different levels of measurement of data? Give examples. nominal level of measurement: qualitative data ex) gender of subjects ordinal level of measurement: categories with some order (differences between data values either cannot be determined or is meaningless but there is an order) ex) course grades A, B, C, D, F interval level of measurement: differences between data values are meaningful, but there is no natural starting point (the value 0 does not mean lack of) ex) years such as 1000, 2000, 1492, 1776 ratio level of measurement: interval level modified to include natural zero starting point ex: price of college textbooks ($0 means no cost) LEVELS OF Measurement Examples RATIO ➢ Distances (in km) travelled by cars (0 km represents no distance travelled, and 400 km is twice as far as 200 km.) ➢ Prices of college textbooks ($0 does represent no cost, and a $100 book does cost twice as much as a $50 book.) INTERVAL ➢ Body temperatures of 98.20 F and 98.60 F ➢ The years 1769 and 1845 ORDINAL ➢ Ranks of colleges in U.S. News and World Report (Ranks can be first, second, third, and so on, which determines an ordering) ➢ A school teacher assigns grades of A, B, C, D, or F (These grades can be arranged in order, but we can’t determine difference between the grades.) NOMINAL ➢ Eye colors (blue, brown, black, other) ➢ Political party (Democrat, republican, Independent, other) 13. What’s the difference between an observational study and an experiment? Give examples. observational study: observing and measuring specific characteristics without attempting to modify the subjects being studied ex) Charles Darwin’s observation of Darwinian finches at the Galapagos Islands experiment: apply some treatment and then observe its effects on the subjects ex) giving some type of medicine and see whether it cures certain type of disease among subjects 14. 
Describe Cross Sectional, Retrospective, and Prospective Studies. Give examples.
  • 4.
    4 Cross Sectional Study:Data are observed (an Observational Study ), measured, and collected at one point in time. (A cross-sectional study is like a snapshot of a particular group of people at a given point in time; it is used to describe what is happening at that time.) Example: A medical study examining the frequency of cancer among a population of different geographical locations. By doing this, any differences among them can most likely be attributed to geographical locations differences rather than something that happened over time. Retrospective (or Case Control) Study: Data are collected from the past by going back in time (data that already exist). Example: Researcher ask participants about their smoking habits over the past 20 years. Then, they can analyze any possible correlations between their smoking habits and diseases such as lung cancer. Prospective (or Longitudinal or Cohort) Study: Data are collected in the future from groups (called cohorts) sharing common factors. (Longitudinal studies look at a group of people over an extended period.) Example: A medical study follows a cohort of middle-aged people who vary in terms of smoking habits, to test the hypothesis that the 20-year incidence rate of lung cancer will be highest among heavy smokers, followed by moderate smokers, and then nonsmokers. 15. What is Sampling Error, Non-sampling Error, and Nonrandom Sampling Error? Sampling Error: Sampling error is the difference between a sample result and the true population result that is the consequence of chance sample variations Non-sampling Error: The non-sampling error occurs due to data that are incorrectly collected, recorded, or analyzed. It may happen by selecting a biased sample, using a defective instrument, or copying the data incorrectly. Nonrandom Sampling Error: Nonrandom Sampling Error is the result of using a sampling method that is not random, such as using a convenience sample or a voluntary response sample. 
Voluntary response sample: (or self-selected survey)
  • 5.
    5 One in whichthe respondents themselves decide whether to be included. In this case, valid conclusions can be made only about the specific group of people who agree to participate. 16. What are some characteristics of an Experiment? Confounding: Occurs in an experiment when the experimenter is not able to distinguish between the effects of different factors. Blinding: Subject does not know he or she is receiving a treatment or placebo. Blocks: Groups of subjects with similar characteristics. Completely Randomized Experimental Design: Subjects are put into blocks through a process of random selection. Replication: Repetition of an experiment when there are enough subjects to recognize the differences in different treatments. Sample Size: Sample size must be large enough to display the true nature of the population data and should be obtained using an appropriate random method. 17. Explain some Misuses of Statistics. Bad Samples, Small Samples, Misleading Graphs, Distorted Percentages, Loaded Questions, Order of Questions, Refusals, Correlation & Causality, Self Interest Study, Precise Numbers, Partial Pictures, Pictographs (Double the length, width, and height of a cube, and the volume increases by a factor of eight). To correctly interpret a graph, we should analyze the numerical information given in the graph instead of being misled by its general shape. Deliberate Distortions Loaded question: 95% yes: Should the Governor have the line-item veto to eliminate waste? 53% yes: “Should the Governor have the line-item veto, or not? If sample data are not collected in an appropriate way, the data may be completely useless that no amount of statistical training can salvage them. Randomness typically plays a critical role in determining which data to collect.
  • 6.
    Chapter 2: Exploring Data with Tables and Graphs

    1. Given the frequency table, answer the following questions.

    Age group   Frequency
    11-20       5
    21-30       6
    31-40       9
    41-50       11
    51-60       4

    a. The number of classes in the table is 5 (the number of age groups defined).
    b. The class width is 10 (upper limit - lower limit + 1 unit, or the difference of two consecutive lower limits or upper limits, e.g. 21 - 11).
    c. The midpoint of the 4th class is 45.5: (41 + 50)/2 = 45.5
    d. The lower boundary of the 5th class is 50.5: (50 + 51)/2 = 50.5 (think of it as the midpoint between the upper limit of the 4th class and the lower limit of the 5th class).
    e. The upper limit of the 1st class is 20 (the 1st class is 11-20).
    f. The sample size is 35: 5 + 6 + 9 + 11 + 4 = 35
    g. The relative frequency of the 1st class is f/n = 5/35 = 1/7 ≈ 0.1429 (or 14.29%).

    Age group   Frequency   Midpoint = (LL+UL)/2   LB - UB     RF = f/n
    1) 11-20    5           (11+20)/2 = 15.5       10.5-20.5   5/35 = 1/7
    2) 21-30    6           25.5                   20.5-30.5   6/35
    3) 31-40    9           35.5                   30.5-40.5   9/35
    4) 41-50    11          45.5                   40.5-50.5   11/35
    5) 51-60    4           55.5                   50.5-60.5   4/35
    $n = \sum f = 35$

    h. Find the modal class, and the mode.
  • 7.
    The modal class: class 4 (41-50), which has the largest frequency, 11. The mode is estimated by the midpoint of that class: 45.5

    2. The following frequency table describes the speeds of drivers ticketed through a 30 mph speed zone.

    Speed    Frequency (number of drivers)
    42-45    25
    46-49    14
    50-53    7
    54-57    3
    58-61    1

    a. Calculate the relative frequencies for all classes.
    n = 50
    first class: f/n = 25/50 = 0.5 (or 50%)
    second class: 14/50 = 0.28 (or 28%)
    third class: 7/50 = 0.14 (or 14%)
    fourth class: 3/50 = 0.06 (or 6%)
    fifth class: 1/50 = 0.02 (or 2%)
    ∑rf = 1 (or 100%)

    b. What percentage represents the speed of 53 mph or less?
    A speed of 53 mph or less covers the first three classes, so we need their cumulative relative frequency:
    0.5 + 0.28 + 0.14 = 0.92
    Or: (25 + 14 + 7)/50 = 46/50 = 92%
    92% of the drivers had a speed of 53 mph or less.

    c. What are the class boundaries?
    Each inner class boundary is the midpoint between the upper limit of one class and the lower limit of the next. For the two outer boundaries, the same amount (0.5) is subtracted from the first lower limit and added to the last upper limit.
    class boundaries: 41.5-45.5, 45.5-49.5, 49.5-53.5, 53.5-57.5, 57.5-61.5

       Speed    Frequency (number of drivers)   Q a: RF = f/n    Q c: Boundaries
    1  42-45    25                              25/50 = 1/2      41.5-45.5
    2  46-49    14                              14/50 = 7/25     45.5-49.5
    3  50-53    7                               7/50             49.5-53.5
    4  54-57    3                               3/50             53.5-57.5
    5  58-61    1                               1/50             57.5-61.5
    $n = \sum f = 50$
  • 8.
    d. Construct a histogram corresponding to the frequency distribution table.
    [Histogram: Frequency (vertical axis, 0 to 30) against SPEED (mph) (horizontal axis, class boundaries 41.5, 45.5, 49.5, 53.5, 57.5, 61.5)]

    3. The following frequency table describes the speeds of drivers ticketed through a 30 mph speed zone.

    Speed    Frequency (number of drivers)
    42-45    25
    46-49    14
    50-53    7
    54-57    3
    58-61    1

    a. Prepare the cumulative frequency distribution. (See below)
    b. Prepare the cumulative relative frequency distribution.

    Cumulative speed   Cumulative frequency    Cumulative relative frequency
    42-45              25                      25/50 = 0.5 (or 50%)
    42-49              25+14 = 39              39/50 = 0.78 (or 78%)
    42-53              25+14+7 = 46            46/50 = 0.92 (or 92%)
    42-57              25+14+7+3 = 49          49/50 = 0.98 (or 98%)
    42-61              25+14+7+3+1 = 50        50/50 = 1 (or 100%)
  • 9.
    c. Draw an ogive of the cumulative percentage distribution.
    [Ogive: cumulative percentage (vertical axis, 0 to 120) plotted at the class boundaries 41.5, 45.5, 49.5, 53.5, 57.5, 61.5 (horizontal axis)]

    d. Using the ogive, find the percentage of drivers who drove 47 mph or less.
    47 lies between 45.5 and 49.5 on the horizontal axis, where the ogive rises from 50% to 78%. Reading the curve at 47 gives approximately 60%; therefore about 60% to 61% of drivers drove 47 mph or less.
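The relative frequencies, cumulative percentages, class boundaries, and the ogive reading at 47 mph from questions 2 and 3 can be checked with a short script. This is only a sketch: the helper name `ogive_percent` is made up for illustration, and it assumes the ogive is drawn with straight segments between class boundaries, so the reading at 47 mph is a linear interpolation.

```python
from itertools import accumulate

# Class limits and frequencies from the ticketed-speed table.
classes = [(42, 45), (46, 49), (50, 53), (54, 57), (58, 61)]
freqs = [25, 14, 7, 3, 1]
n = sum(freqs)  # 50 drivers

# Relative and cumulative relative frequencies, as percentages.
rel_pct = [100 * f / n for f in freqs]
cum_pct = [100 * c / n for c in accumulate(freqs)]

# Class boundaries sit half a unit outside the class limits.
boundaries = [classes[0][0] - 0.5] + [upper + 0.5 for _, upper in classes]


def ogive_percent(x):
    """Read the ogive at x by linear interpolation (hypothetical helper)."""
    # The ogive starts at 0% on the first boundary.
    points = list(zip(boundaries, [0.0] + cum_pct))
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
    raise ValueError("x lies outside the class boundaries")


print(rel_pct)            # relative frequencies in percent
print(cum_pct)            # cumulative percentages
print(boundaries)         # class boundaries
print(ogive_percent(47))  # ogive reading at 47 mph
```

The interpolated reading at 47 mph is 60.5%, which agrees with the "60% to 61%" read from the graph.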
  • 10.
    4. Given the following sample.
    Sample: The ages of forty tenured faculty at CSULB (n = 40 ages)

    Age Sample Dataset
    45 59 51 62 58 54 42 59 49 47
    52 63 40 53 61 47 54 58 53 32
    61 39 51 37 43 53 46 56 58 48
    55 50 57 60 54 63 60 55

    a. Construct a dotplot.
    b. Construct a stemplot (stem-and-leaf plot).

    Solution to a:
    A dotplot is a graphical display for relatively small data sets whose values fall into a number of discrete values (categories). Each observation is plotted as a dot along a horizontal scale of values, and equal values are drawn as a stack of dots.
    Below is the list of the 40 ages in order from youngest to oldest.

    Ages (Sorted)
    32 37 39 40 42 43 45 46 47 47
    48 49 50 51 51 52 53 53 53 54
    54 54 55 55 56 56 57 58 58 58
    59 59 59 60 60 61 61 62 63 63

    Clearly, the ages range from 32 to 63 years. Also, there are more tenured faculty at older ages.

    Solution to b:
    Stemplots concisely display the data in order from smallest to largest and can provide useful information about small data sets. A stemplot, like a histogram, is a tool to help you visualize a quantitative data set. The name "stemplot" comes from the layout: each row has one "stem" holding the largest place-value digits on the left and the "leaves" (the remaining digit of each value) on the right. We can see the distribution of the data while keeping the original data values. Again, we find that the ages span from 32 years to 63 years, and that there are more tenured CSULB faculty at older ages.

    3 | 2 7 9
    4 | 0 2 3 5 6 7 7 8 9
    5 | 0 1 1 2 3 3 3 4 4 4 5 5 6 6 7 8 8 8 9 9 9
    6 | 0 0 1 1 2 3 3
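The stemplot can also be built mechanically from the sorted ages. The following is a minimal sketch, assuming the tens digit is the stem and the units digit is the leaf; the function name `stemplot` is chosen here for illustration only.

```python
from collections import defaultdict

# The 40 sorted ages listed in the solution to part a.
ages = [32, 37, 39, 40, 42, 43, 45, 46, 47, 47,
        48, 49, 50, 51, 51, 52, 53, 53, 53, 54,
        54, 54, 55, 55, 56, 56, 57, 58, 58, 58,
        59, 59, 59, 60, 60, 61, 61, 62, 63, 63]


def stemplot(values):
    """Return one text row per stem: tens digit | units digits."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    return [f"{stem} | {' '.join(map(str, leaves[stem]))}"
            for stem in sorted(leaves)]


for row in stemplot(ages):
    print(row)
```

Each printed row keeps every original value recoverable (stem 4 with leaf 7 is the age 47), which is the property that distinguishes a stemplot from a histogram.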
  • 11.
    Chapter 3: Describing, Exploring, and Comparing Data

    1. The following data give the number of hours that a few employees at the GM factory worked last week.
    17, 38, 27, 14, 18, 34, 16, 42, 28, 24, 40, 20, 23, 31, 37, 21, 30, 25
    Ranked Data: (Note: We don't need to rank the data for some calculations, such as finding the mean; however, it is good practice to do so because other calculations need ranked data.)
    14, 16, 17, 18, 20, 21, 23, 24, 25, 27, 28, 30, 31, 34, 37, 38, 40, 42
    n = 18

    a) Find the mean.
    $\bar{x} = \frac{\sum x}{n}$ = (14+16+17+18+20+21+23+24+25+27+28+30+31+34+37+38+40+42)/18 = 485/18 ≈ 26.9444

    b) Find the mode.
    There is no mode (each value occurs only once).

    c) Find the median.
    (25 + 27)/2 = 26

    d) Find the midrange.
    MR = (Min + Max)/2 = (14 + 42)/2 = 28

    e) Find the range.
    R = Max - Min = 42 - 14 = 28

    f) Find the variance.
    Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{n \sum x^2 - (\sum x)^2}{n(n - 1)}$
    $s^2 = \frac{18(14^2 + 16^2 + \dots + 42^2) - (14 + 16 + \dots + 42)^2}{18(18 - 1)} = \frac{18(14343) - 485^2}{18(17)} = 74.99673203\ldots \approx 75$
    Or we can do the following:
    Sample Variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{(14 - 26.944)^2 + (16 - 26.944)^2 + \dots + (42 - 26.944)^2}{18 - 1} \approx 75$
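The answers to parts a) through f) can be verified numerically, including the claim that the shortcut formula and the definition formula give the same sample variance. This is a sketch using Python's standard `statistics` module; the variable names are chosen here for illustration.

```python
from statistics import mean, median, multimode, variance

hours = [17, 38, 27, 14, 18, 34, 16, 42, 28, 24,
         40, 20, 23, 31, 37, 21, 30, 25]
n = len(hours)

x_bar = mean(hours)                       # a) mean
modes = multimode(hours)                  # b) all 18 values tie, so no mode
med = median(hours)                       # c) median
midrange = (min(hours) + max(hours)) / 2  # d) midrange
data_range = max(hours) - min(hours)      # e) range

# f) sample variance, definition formula: sum of squared deviations / (n - 1)
s2_definition = sum((x - x_bar) ** 2 for x in hours) / (n - 1)

# f) sample variance, shortcut formula: (n*sum(x^2) - (sum x)^2) / (n(n - 1))
s2_shortcut = (n * sum(x * x for x in hours) - sum(hours) ** 2) / (n * (n - 1))

print(x_bar, med, midrange, data_range)
print(s2_definition, s2_shortcut, variance(hours))
```

When every value occurs exactly once, `multimode` returns all of them, which corresponds to the "no mode" answer in part b).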
  • 12.
    g) Find the standard deviation.
    Sample Standard Deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{75}$
    s ≈ 8.66

    h) Find the interquartile range (IQR).

    Position:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
    Value:    14 16 17 18 20 21 23 24 25 27 28 30 31 34 37 38 40 42

    Q2 = median = (25 + 27)/2 = 26
    Q1 = median of the first half of the data = 20
    Q3 = median of the second half of the data = 34
    interquartile range: Q3 - Q1 = 34 - 20 = 14

    2. IQ scores have a mean of 100 and a standard deviation of 15.
    a) Find the coefficient of variation.
    Given: $\mu = 100$, $\sigma = 15$, so $CV = \frac{\sigma}{\mu} = \frac{15}{100} = 15\%$

    b) Use the range rule of thumb to establish the minimum and maximum "usual" IQ scores.
    $\mu \pm 2\sigma$: 100 - 2(15) = 70 to 100 + 2(15) = 130
    The usual minimum is 70 and the usual maximum is 130.

    c) Using Chebyshev's Theorem, find the least percentage of those who will have an IQ score of 70 to 130.
    $1 - \frac{1}{K^2}$ with K = 2 (K is the number of standard deviations away from the mean)
    $1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}$
    At least 75% have an IQ score of 70 to 130.

    d) Using the empirical rule, find the percentage of those who will have an IQ score of 70 to 130.
    About 95% will have an IQ score of 70 to 130. (70 and 130 are 2 standard deviations away from the mean.)

    3. Given the following set of data:
    32, 19, 14, 7, 15, 3, 4, 5, 9, 16, 15, 16, 19, 50
    a) Rank the data from smallest to largest.
    b) Prepare a box-and-whisker plot. [Box plot]
  • 13.
    c) Does this data set contain any outliers? [Make sure to show the lower and the upper fences on your graph]
    d) Are the data symmetric or skewed? [If skewed, are they skewed left or right?]

    a) Answer: 3, 4, 5, 7, 9, 14, 15, 15, 16, 16, 19, 19, 32, 50

    b) Answer: There are different ways to calculate Q1 and Q3. Here is one:
    Q1 = 7 (4th value) since L = (25/100)(14) = 3.5, rounded up to 4
    Q2 = median = (15 + 15)/2 = 15
    Q3 = 19 (11th value) since L = (75/100)(14) = 10.5, rounded up to 11

    Minimum   Q1   Median   Q3   Maximum
    3         7    15       19   50

    c) Answer: Outlier: 50
    The values Q1 - 1.5×IQR and Q3 + 1.5×IQR are the "fences" that mark off the "reasonable" values from the outlier values. Outliers lie outside the fences.
    IQR = Q3 - Q1 = 19 - 7 = 12; 1.5 × IQR = 1.5 × 12 = 18
    A value is considered an outlier if it is less than Q1 - 1.5 IQR = 7 - 18 = -11
    A value is considered an outlier if it is larger than Q3 + 1.5 IQR = 19 + 18 = 37
    A value is considered an extreme outlier if it is less than Q1 - 3 IQR = 7 - 36 = -29
    A value is considered an extreme outlier if it is larger than Q3 + 3 IQR = 19 + 36 = 55

    d) Answer: Skewed to the right
    Note 1: Make sure the drawing is to scale.
    Note 2: Skewed data show an uneven boxplot, in which the median cuts the box into two unequal pieces. A longer part on the right of (or above) the median indicates the data are skewed to the right. A longer part on the left of (or below) the median indicates the data are skewed to the left.
    Note 3: Sometimes the box alone looks even, or looks skewed to one side, while the whiskers (the tails on each side of the box) indicate otherwise. Therefore, pay attention to both.
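The fence calculation in answer c) can be sketched in a few lines. The quartiles Q1 = 7 and Q3 = 19 are taken from answer b) rather than recomputed, since different quartile conventions give different values; the helper name `classify` is hypothetical.

```python
data = [3, 4, 5, 7, 9, 14, 15, 15, 16, 16, 19, 19, 32, 50]
q1, q3 = 7, 19  # quartiles taken from answer b)

iqr = q3 - q1
inner_fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)  # outlier limits
outer_fences = (q1 - 3 * iqr, q3 + 3 * iqr)      # extreme-outlier limits


def classify(x):
    """Label a value relative to the inner and outer fences."""
    if x < outer_fences[0] or x > outer_fences[1]:
        return "extreme outlier"
    if x < inner_fences[0] or x > inner_fences[1]:
        return "outlier"
    return "within fences"


flagged = {x: classify(x) for x in data if classify(x) != "within fences"}
print(inner_fences, outer_fences, flagged)
```

Only 50 is flagged, and because it lies between the upper inner fence (37) and the upper outer fence (55), it is an outlier but not an extreme one.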
  • 14.
    4. Draw the box-and-whisker plot for the following data set:
    77, 79, 80, 86, 87, 87, 94, 99

    Median: (86 + 87) ÷ 2 = 86.5 = Q2
    Answer: There are different ways to calculate Q1 and Q3. Here is an easier way:
    The median splits the list into two halves: 77, 79, 80, 86 and 87, 87, 94, 99. Since each half contains an even number of values, the sub-medians are the averages of the middle two values.
    Q1 = (79 + 80) ÷ 2 = 79.5
    Q3 = (87 + 94) ÷ 2 = 90.5
    Minimum = 77, Q1 = 79.5, Q2 = 86.5, Q3 = 90.5, Maximum = 99

    Box & Whisker Plot: [Box plot]
    OR:

    Minimum   Q1     Median   Q3     Maximum
    77        79.5   86.5     90.5   99

    This set of five values has been given the name "the five-number summary".
    To find the outliers: IQR = Q3 - Q1 = 90.5 - 79.5 = 11.
    The values Q1 - 1.5×IQR and Q3 + 1.5×IQR are the "fences" that mark off the "reasonable" values from the outlier values. Outliers lie outside the fences.
    The outliers will be any values below Q1 - 1.5×IQR = 79.5 - 1.5 × 11 = 79.5 - 16.5 = 63 or above Q3 + 1.5×IQR = 90.5 + 1.5 × 11 = 90.5 + 16.5 = 107. All eight values lie between 63 and 107, so this data set has no outliers.
    The extreme outliers will be those below Q1 - 3×IQR or above Q3 + 3×IQR.
  • 15.
    Answer: These data are almost symmetric (approximately normal, bell shaped).

    5. Answer the following:
    a. If the mean of a set of data is 23.00, and 12.60 has a z-score of -1.30, then the standard deviation must be:
    A) 4.00   B) 32.00   C) 64.00   D) 8.00
    Answer: D
    $z = \frac{x - \mu}{\sigma} \rightarrow \sigma = \frac{x - \mu}{z} = \frac{12.6 - 23}{-1.3} = 8$

    b. Find the z score for each student and indicate which one is higher.
    Art Major (AM): $x = 46$, $\bar{x} = 50$, $s = 5$
    Theatre Major (TM): $x = 70$, $\bar{x} = 75$, $s = 7$
    a. Both students have the same score.
    b. Neither student received a positive score; therefore, the higher score cannot be determined.
    c. The theater major has a higher score than the art major.
    d. The art major has a higher score than the theater major.
  • 16.
    Answer: C
    Art: $z = \frac{x - \bar{x}}{s} = \frac{46 - 50}{5} = \frac{-4}{5} = -0.8$
    Theater: $z = \frac{x - \bar{x}}{s} = \frac{70 - 75}{7} = \frac{-5}{7} \approx -0.7143$
    Since -0.7143 > -0.8, the theater major has the higher z score.

    6. Answer the following:
    a. If the five-number summary for a set of data is 0, 3, 6, 7, and 16, then the mean of this set of data is
    A. 6
    B. There is insufficient information to calculate the mean
    C. 8
    D. 5
    Answer: B. Why? The five-number summary gives only the minimum, Q1, the median, Q3, and the maximum; the mean requires every data value.

    b. Which of the following is true?
    a. $D_{50} = P_5 = Q_{25}$
    b. $D_5 = P_{50} = Q_2$
    c. $D_{50} = P_5 = Q_2$
    d. $D_5 = P_5 = Q_5$
    Answer: b
    D: Decile, 10 equal parts; P: Percentile, 100 equal parts; Q: Quartile, 4 equal parts

    7. A student received the following grades: an A in Statistics (4 units), an F in Physics II (5 units), a B in Sociology (3 units), a B in a Literature seminar (2 units), and a D in Tennis (1 unit). Assuming A = 4 grade points, B = 3 grade points, C = 2 grade points, D = 1 grade point, and F = 0 grade points, the student's grade point average is:

    Grade   w (number of units)   x (grade points)
    A       4                     4
    F       5                     0
    B       3                     3
    B       2                     3
    D       1                     1

    $\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i} = \frac{(4 \cdot 4) + (5 \cdot 0) + (3 \cdot 3) + (2 \cdot 3) + (1 \cdot 1)}{4 + 5 + 3 + 2 + 1} = \frac{32}{15} \approx 2.133$
    Answer: 2.133
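The z-score comparison in question 5 and the weighted mean in question 7 are both one-line formulas, so they are easy to check. The sketch below uses illustrative names (`z_score`, `courses`) that are not part of the original solution.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


# Question 5a: solve z = (x - mean) / sd for sd.
sd_5a = (12.6 - 23) / -1.3

# Question 5b: compare the two students.
z_art = z_score(46, 50, 5)
z_theater = z_score(70, 75, 7)
higher = "theater" if z_theater > z_art else "art"

# Question 7: weighted mean with units as weights.
# Each pair is (grade points x, units w).
courses = [(4, 4), (0, 5), (3, 3), (3, 2), (1, 1)]
gpa = sum(x * w for x, w in courses) / sum(w for _, w in courses)

print(round(sd_5a, 2), z_art, round(z_theater, 4), higher, round(gpa, 3))
```

Both z scores are negative, so the "higher" score is the one closer to zero, which is why the theater major ranks above the art major.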
  • 17.
    8. Given the following data set, find the value that corresponds to the
    a) 75th percentile.
    b) 30th percentile.
    c) Find the percentile corresponding to number 44.
    d) Using the range rule of thumb, estimate the standard deviation.
    10, 44, 15, 23, 14, 18, 72, 56

    Ranked: 10, 14, 15, 18, 23, 44, 56, 72

    a) n = 8, location: $L = \frac{k}{100} \cdot n = \frac{75}{100} \cdot 8 = 6$
    Since 6 is a whole number, average the 6th and 7th values: (44 + 56)/2 = 50
    Answer: 50

    b) n = 8, location: $L = \frac{k}{100} \cdot n = \frac{30}{100} \cdot 8 = 2.4$
    Always round up: 3rd value = 15
    Answer: 3rd value = 15

    c) percentile of 44: $p = \frac{\text{number of values less than } 44}{n} = \frac{5}{8} = 0.625$; round accordingly: 63
    Answer: $P_{63}$

    d) $s \approx \frac{R}{4} = \frac{\text{Max} - \text{Min}}{4} = \frac{72 - 10}{4} = 15.5$
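The locator rule used in parts a) and b), the percentile rank in part c), and the range rule of thumb in part d) can be sketched as follows. The function names are hypothetical, and the code follows the rule exactly as stated above: average the L-th and (L+1)-th values when L is whole, otherwise round L up.

```python
import math

data = sorted([10, 44, 15, 23, 14, 18, 72, 56])


def percentile_value(values, k):
    """Value at the k-th percentile of sorted values (0 < k < 100)."""
    locator = k * len(values) / 100
    if locator.is_integer():
        i = int(locator)
        # Whole-number locator: average the L-th and (L+1)-th values.
        return (values[i - 1] + values[i]) / 2
    # Otherwise round the locator up and take that value.
    return values[math.ceil(locator) - 1]


def percentile_of(values, x):
    """Percentile rank of x: share of values below x, rounded half up."""
    below = sum(1 for v in values if v < x)
    # int(p + 0.5) rounds 62.5 up to 63; Python's round() would give 62.
    return int(100 * below / len(values) + 0.5)


p75 = percentile_value(data, 75)
p30 = percentile_value(data, 30)
rank_44 = percentile_of(data, 44)
s_estimate = (max(data) - min(data)) / 4  # range rule of thumb

print(p75, p30, rank_44, s_estimate)
```

Rounding half up is done by hand here because the built-in `round` uses round-half-to-even, which would turn 62.5 into 62 instead of the 63 given in the answer.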
  • 18.
    9. The following data represent the numbers of keyboards assembled for a sample of 25 days in a company:
    45 52 48 41 56 46 44 48 53 51 53 51 48 46 43 52 50 54 47 44 47 50 49 52 42
    a. Construct a frequency distribution table with 4 classes.
    b. Use the data to find the mean.
    c. Use the frequency distribution to find the mean.
    d. Are the two means reasonably close?

    a. Width = Range (High - Low) / Number of classes
    4 classes: $W = \frac{56 - 41}{4} = 3.75$, and rounding up gives w = 4.
    The lower limit of the 1st class (the starting point) is a convenient number ≤ the smallest value, such as 41.

    Frequency Distribution Table
    Classes    Frequency   Relative Frequency
    41 - 44    5           0.2
    45 - 48    8           0.32
    49 - 52    8           0.32
    53 - 56    4           0.16

    b. $\bar{x} = \frac{\sum x}{n} = \frac{45 + 52 + \dots + 42}{25} = \frac{1212}{25} = 48.48$

    c. $\bar{x} = \frac{\sum f \cdot X_m}{n} = \frac{5(42.5) + 8(46.5) + 8(50.5) + 4(54.5)}{25} = \frac{1206.5}{25} = 48.26$ ($X_m$ is the midpoint of each class.)

    d. Yes, they are reasonably close.
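The whole of question 9 can be reproduced with a short script: build the four classes, tally the frequencies, and compare the raw mean with the grouped mean. This is a sketch under the same choices made above (width rounded up to 4, first lower limit equal to the smallest value, 41); the variable names are illustrative.

```python
import math

keyboards = [45, 52, 48, 41, 56, 46, 44, 48, 53, 51, 53, 51, 48,
             46, 43, 52, 50, 54, 47, 44, 47, 50, 49, 52, 42]
n = len(keyboards)
num_classes = 4

# Class width: range / number of classes, rounded up (3.75 -> 4).
width = math.ceil((max(keyboards) - min(keyboards)) / num_classes)
start = min(keyboards)  # starting point chosen as the smallest value

# Lower and upper class limits, e.g. (41, 44), (45, 48), ...
classes = [(start + i * width, start + (i + 1) * width - 1)
           for i in range(num_classes)]
freqs = [sum(lo <= x <= hi for x in keyboards) for lo, hi in classes]
rel_freqs = [f / n for f in freqs]
midpoints = [(lo + hi) / 2 for lo, hi in classes]

mean_raw = sum(keyboards) / n
mean_grouped = sum(f * m for f, m in zip(freqs, midpoints)) / sum(freqs)

for (lo, hi), f, rf in zip(classes, freqs, rel_freqs):
    print(f"{lo} - {hi}: f = {f}, rf = {rf}")
print(mean_raw, mean_grouped)
```

The grouped mean treats every value in a class as if it sat at the class midpoint, which is why it differs slightly from the mean of the raw data.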