Chi_Sq_DATA
Employee_Ref | Promoted | Management_Training_Program | ||||
1 | No | No | ||||
2 | Yes | Yes | ||||
3 | No | No | ||||
4 | Yes | Yes | ||||
5 | No | No | ||||
6 | No | No | ||||
7 | Yes | Yes | ||||
8 | Yes | Yes | ||||
9 | Yes | Yes | ||||
10 | Yes | Yes | ||||
11 | No | No | ||||
12 | No | No | ||||
13 | Yes | Yes | ||||
14 | No | No | ||||
15 | Yes | No | ||||
16 | Yes | Yes | ||||
17 | No | Yes | ||||
18 | Yes | Yes | ||||
19 | Yes | Yes | ||||
20 | Yes | Yes | ||||
21 | Yes | Yes | ||||
22 | Yes | Yes | ||||
23 | No | No | ||||
24 | Yes | Yes | ||||
25 | No | No | ||||
26 | Yes | Yes | ||||
27 | No | No | ||||
28 | Yes | Yes | ||||
29 | Yes | Yes | ||||
30 | No | No | ||||
31 | No | No | ||||
32 | Yes | Yes | ||||
33 | Yes | Yes | ||||
34 | No | No | ||||
35 | Yes | Yes | ||||
36 | No | No | ||||
37 | Yes | Yes | ||||
38 | No | No | ||||
39 | Yes | Yes | ||||
40 | No | No | ||||
41 | No | No | ||||
42 | Yes | Yes | ||||
43 | Yes | Yes | ||||
44 | Yes | Yes | ||||
45 | Yes | Yes | ||||
46 | No | No | ||||
47 | No | No | ||||
48 | No | No | ||||
49 | Yes | No | ||||
50 | Yes | Yes | ||||
51 | Yes | Yes | ||||
52 | Yes | Yes | ||||
53 | Yes | Yes | ||||
54 | Yes | Yes | ||||
55 | No | No | ||||
56 | Yes | Yes | ||||
57 | No | No | ||||
58 | Yes | Yes | ||||
59 | No | No | ||||
60 | No | No | ||||
61 | Yes | Yes | ||||
62 | Yes | Yes | ||||
63 | Yes | Yes | ||||
64 | Yes | Yes | ||||
65 | No | No | ||||
66 | Yes | Yes | ||||
67 | No | No | ||||
68 | No | No | ||||
69 | No | No | ||||
70 | No | No | ||||
71 | No | No | ||||
72 | No | No | ||||
73 | Yes | Yes | ||||
74 | No | Yes | ||||
75 | No | No | ||||
76 | No | No | ||||
77 | No | No | ||||
78 | Yes | Yes | ||||
79 | No | Yes | ||||
80 | Yes | Yes | ||||
81 | Yes | Yes | ||||
82 | No | Yes | ||||
83 | No | No | ||||
84 | No | No | ||||
85 | No | No | ||||
86 | No | No | ||||
87 | No | No | ||||
88 | No | No | ||||
89 | No | No | ||||
90 | Yes | Yes | ||||
91 | No | No | ||||
92 | Yes | Yes | ||||
93 | No | Yes | ||||
94 | Yes | Yes | ||||
95 | Yes | No | ||||
96 | Yes | Yes | ||||
97 | Yes | Yes | ||||
98 | Yes | Yes | ||||
99 | No | No | ||||
100 | No | No | ||||
101 | No | No | ||||
102 | Yes | Yes | ||||
103 | Yes | Yes | ||||
104 | Yes | Yes | ||||
105 | Yes | Yes | ||||
106 | Yes | Yes | ||||
107 | Yes | Yes | ||||
108 | Yes | Yes | ||||
109 | No | No | ||||
110 | Yes | Yes | ||||
111 | No | No | ||||
112 | Yes | Yes | ||||
113 | No | No | ||||
114 | Yes | Yes | ||||
115 | Yes | No | ||||
116 | No | No | ||||
117 | Yes | Yes | ||||
118 | No | No | ||||
119 | No | Yes | ||||
120 | Yes | Yes | ||||
121 | No | No | ||||
122 | Yes | Yes | ||||
123 | Yes | Yes | ||||
124 | No | No | ||||
125 | Yes | Yes | ||||
126 | No | No | ||||
127 | Yes | Yes | ||||
128 | Yes | Yes | ||||
129 | Yes | Yes | ||||
130 | Yes | Yes | ||||
131 | Yes | No | ||||
132 | No | No | ||||
133 | Yes | Yes | ||||
134 | Yes | Yes | ||||
135 | No | No | ||||
136 | Yes | Yes | ||||
137 | Yes | Yes | ||||
138 | Yes | Yes | ||||
139 | No | No | ||||
140 | No | No | Here is where the "HIDE" command was used | |||
141 | No | No | ||||
142 | No | No | ||||
143 | No | No | ||||
144 | No | No | ||||
145 | Yes | Yes | ||||
146 | No | No | We are testing the idea that Promotion = function (Going to the training program). | |||
147 | Yes | Yes | ||||
148 | Yes | Yes | ||||
149 | No | No | ||||
150 | No | Yes | Chi square test with Excel formula | |||
Formula | =COUNTIF(B2:B151,$A$154) | |||||
# | Promoted | Training Program | Formula | =CHISQ.TEST(C154:C155,D154:D155) | ||
Yes | ||||||
No | p-value = | |||||
Total | ||||||
In Per Cent | ||||||
# | Promoted | Training Program | ||||
Yes | ||||||
No | ||||||
Total |
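If you want to check the spreadsheet result outside Excel, here is a minimal Python sketch of a chi-square test of independence on the 2x2 cross-tabulation of Promoted and Management_Training_Program. The four cell counts were tallied by hand from the 150 rows above, so please re-check them against the sheet. The function name `chi_square_2x2` is made up for illustration, and this cross-tabulated test is one standard approach that may not match exactly how the CHISQ.TEST ranges in the sheet are laid out.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (chi-square statistic, p-value) using 1 degree of freedom
    and no continuity correction.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Assumption: counts tallied by hand from the rows above (re-verify).
# Rows: Promoted Yes / No. Columns: Training Program Yes / No.
chi2, p_value = chi_square_2x2(75, 5, 7, 63)
print(f"chi-square = {chi2:.2f}, p-value = {p_value:.3g}")
```

A very small p-value would suggest that promotion and attending the training program are not independent in this sample.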
ANOVA_DATA
Item_# | Week_Code | Satisfaction_Level | Office_Code | Office | |||||||
1 | Week 2 | 38 | 1 | Boston | Week # | Boston | Atlanta | Los Angeles | Portland | ||
2 | Week 4 | 40 | 1 | Boston | 1 | 38 | 42 | 44 | 31 | ||
3 | Week 6 | 30 | 1 | Boston | 2 | 40 | 45 | 43 | 43 | ||
4 | Week 8 | 42 | 1 | Boston | 3 | 30 | 44 | 38 | 44 | ||
5 | Week 10 | 36 | 1 | Boston | 4 | 42 | 43 | 40 | 43 | ||
6 | Week 12 | 41 | 1 | Boston | 5 | 36 | 46 | 37 | 36 | ||
7 | Week 14 | 40 | 1 | Boston | 6 | 41 | 42 | 35 | 43 | ||
8 | Week 16 | 42 | 1 | Boston | 7 | 40 | 35 | 37 | 46 | ||
9 | Week 18 | 36 | 1 | Boston | 8 | 42 | 43 | 35 | 45 | ||
10 | Week 20 | 41 | 1 | Boston | 9 | 36 | 46 | 40 | 46 | ||
11 | Week 22 | 40 | 1 | Boston | 10 | 41 | 43 | 37 | 36 | ||
12 | Week 24 | 35 | 1 | Boston | 11 | 40 | 42 | 36 | 40 | ||
13 | Week 26 | 34 | 1 | Boston | 12 | 35 | 45 | 39 | 41 | ||
14 | Week 28 | 41 | 1 | Boston | 13 | 34 | 39 | 40 | 40 | ||
15 | Week 30 | 39 | 1 | Boston | 14 | 41 | 39 | 36 | 39 | ||
16 | Week 32 | 42 | 1 | Boston | 15 | 39 | 44 | 44 | 38 | ||
17 | Week 2 | 42 | 2 | Atlanta | 16 | 42 | 43 | 39 | 42 | ||
18 | Week 4 | 45 | 2 | Atlanta | Average | 38.6 | 42.6 | 38.8 | 40.8 | ||
19 | Week 6 | 44 | 2 | Atlanta | |||||||
20 | Week 8 | 43 | 2 | Atlanta | |||||||
21 | Week 10 | 46 | 2 | Atlanta | |||||||
22 | Week 12 | 42 | 2 | Atlanta | |||||||
23 | Week 14 | 35 | 2 | Atlanta | |||||||
24 | Week 16 | 43 | 2 | Atlanta | |||||||
25 | Week 18 | 46 | 2 | Atlanta | |||||||
26 | Week 20 | 43 | 2 | Atlanta | |||||||
27 | Week 22 | 42 | 2 | Atlanta | |||||||
28 | Week 24 | 45 | 2 | Atlanta | |||||||
29 | Week 26 | 39 | 2 | Atlanta | |||||||
30 | Week 28 | 39 | 2 | Atlanta | |||||||
31 | Week 30 | 44 | 2 | Atlanta | |||||||
32 | Week 32 | 43 | 2 | Atlanta | |||||||
33 | Week 2 | 44 | 3 | Los Angeles | |||||||
34 | Week 4 | 43 | 3 | Los Angeles | |||||||
35 | Week 6 | 38 | 3 | Los Angeles | |||||||
36 | Week 8 | 40 | 3 | Los Angeles | |||||||
37 | Week 10 | 37 | 3 | Los Angeles | |||||||
38 | Week 12 | 35 | 3 | Los Angeles | |||||||
39 | Week 14 | 37 | 3 | Los Angeles | |||||||
40 | Week 16 | 35 | 3 | Los Angeles | |||||||
41 | Week 18 | 40 | 3 | Los Angeles | |||||||
42 | Week 20 | 37 | 3 | Los Angeles | |||||||
43 | Week 22 | 36 | 3 | Los Angeles | |||||||
44 | Week 24 | 39 | 3 | Los Angeles | |||||||
45 | Week 26 | 40 | 3 | Los Angeles | |||||||
46 | Week 28 | 36 | 3 | Los Angeles | |||||||
47 | Week 30 | 44 | 3 | Los Angeles | |||||||
48 | Week 32 | 39 | 3 | Los Angeles | |||||||
49 | Week 2 | 31 | 4 | Portland | |||||||
50 | Week 4 | 43 | 4 | Portland | |||||||
51 | Week 6 | 44 | 4 | Portland | |||||||
52 | Week 8 | 43 | 4 | Portland | |||||||
53 | Week 10 | 36 | 4 | Portland | |||||||
54 | Week 12 | 43 | 4 | Portland | |||||||
55 | Week 14 | 46 | 4 | Portland | |||||||
56 | Week 16 | 45 | 4 | Portland | |||||||
57 | Week 18 | 46 | 4 | Portland | |||||||
58 | Week 20 | 36 | 4 | Portland | |||||||
59 | Week 22 | 40 | 4 | Portland | |||||||
60 | Week 24 | 41 | 4 | Portland | |||||||
61 | Week 26 | 40 | 4 | Portland | |||||||
62 | Week 28 | 39 | 4 | Portland | |||||||
63 | Week 30 | 38 | 4 | Portland | |||||||
64 | Week 32 | 42 | 4 | Portland | |||||||
40.2 | |||||||||||
40 |
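The averages in the table above can be reproduced, and the one-way ANOVA F statistic computed, with a short Python sketch. This is only an illustration under the usual one-way ANOVA setup (four independent groups of 16 weekly readings); the variable names are my own, and the p-value is left out because the standard library has no F distribution.

```python
from statistics import mean

# Satisfaction levels by office, copied from the table above.
offices = {
    "Boston": [38, 40, 30, 42, 36, 41, 40, 42, 36, 41, 40, 35, 34, 41, 39, 42],
    "Atlanta": [42, 45, 44, 43, 46, 42, 35, 43, 46, 43, 42, 45, 39, 39, 44, 43],
    "Los Angeles": [44, 43, 38, 40, 37, 35, 37, 35, 40, 37, 36, 39, 40, 36, 44, 39],
    "Portland": [31, 43, 44, 43, 36, 43, 46, 45, 46, 36, 40, 41, 40, 39, 38, 42],
}

all_values = [x for values in offices.values() for x in values]
grand_mean = mean(all_values)
group_means = {name: mean(values) for name, values in offices.items()}

# Between-group sum of squares: group size times squared distance
# of each group mean from the grand mean.
ss_between = sum(
    len(values) * (group_means[name] - grand_mean) ** 2
    for name, values in offices.items()
)
# Within-group sum of squares: squared distance of each reading
# from its own group mean.
ss_within = sum(
    (x - group_means[name]) ** 2
    for name, values in offices.items()
    for x in values
)

df_between = len(offices) - 1
df_within = len(all_values) - len(offices)
f_statistic = (ss_between / df_between) / (ss_within / df_within)

for name, m in group_means.items():
    print(f"{name}: mean = {m:.1f}")
print(f"grand mean = {grand_mean:.1f}")
print(f"F({df_between}, {df_within}) = {f_statistic:.2f}")
```

The F statistic would then be compared with a critical value (or turned into a p-value) using the degrees of freedom printed alongside it.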
As mentioned earlier, the mid-term will have conceptual and quantitative multiple-choice questions. You need to read all 4 chapters and you need to be able to solve problems in all 4 chapters in order to do well in this test.
The following are for review and learning purposes only. I am not indicating that identical or similar problems will be in the test. As I have indicated in the class syllabus, all the exams in this course will have multiple-choice questions and problems.
Suggestion: treat this review set as you would an actual test. Sit down with your one page of notes and your calculator, and give it a try. That way you will know what areas you still need to study.
ADMN 210
Answers to Review for Midterm #1
1) Classify each of the following as nominal, ordinal, interval, or ratio data.
a. The time required to produce each tire on an assembly line – ratio since it is numeric with a valid 0 point meaning “lack of”
b. The number of quarts of milk a family drinks in a month - ratio since it is numeric with a valid 0 point meaning “lack of”
c. The ranking of four machines in your plant after they have been designated as excellent, good, satisfactory, and poor – ordinal since it is ranking data only
d. The telephone area code of clients in the United States – nominal since it is a label
e. The age of each of your employees - ratio since it is numeric with a valid 0 point meaning “lack of”
f. The dollar sales at the local pizza house each month - ratio since it is numeric with a valid 0 point meaning “lack of”
g. An employee’s identification number – nominal since it is a label
h. The response time of an emergency unit - ratio since it is numeric with a valid 0 point meaning “lack of”
2) True or False: The highest level of data measurement is the ratio-level measurement.
True (you can do the most powerful analysis with this kind of data)
3) True or False: Interval- and ratio-level data are also referred to as categorical data.
False (Interval and ratio level data are numeric and therefore quantitative, NOT qualitative….Nominal is qualitative)
4) A small portion or a subset of the population on which data is collected for conducting statistical analysis is called __________.
A sample! A population is the total group, a census IS the population, and a data set can be either a sample or a population.
5) One of the advantages for taking a sample instead of conducting a census is this:
a sample is more accurate than census
a sample is difficult to take
a sample cannot be trusted
a sample can save money when data collection process is destructive
Answer: a sample can save money when the data collection process is destructive
6) Selection of the winning numbers in a lottery is an example of __________.
convenience sampling
random sampling
nonrandom sampling
regulatory sampling
Answer: random sampling
7) A type of random sampling in which the population is divided into non-overlapping subpopulations is called __________.
stratified random sampling
cluster sampling
systematic random sampling
regulatory sampling
Answer: stratified random sampling
8) A type of random sampling in which every kth item (where k is some number) in the population is selected for inclusion in the sample is called __________.
stratified random sampling
cluster sampling
systematic sampling
regulatory sampling
Answer: systematic sampling
9) Judgment sampling is an example of __________.
convenience sampling
random sampling
nonrandom (non-probabilistic) sampling
justice department sampling
Answer: nonrandom (non-probabilistic) sampling
10) For the following data, construct a frequency distribution with six classes.
57 23 35 18 21
26 51 47 29 21
46 43 29 23 39
50 41 19 36 28
31 42 52 29 18
28 46 33 28 20
Class width = (high – low)/6 = (57 – 18)/6 = 6.5. Let’s round up to 7 for convenience. NOTE: each student will have something slightly different!
Class Interval | Frequency
18 - under 25 | 8 (just count up how many observations are 18 through 24)
25 - under 32 | 8
32 - under 39 | 3
39 - under 46 | 4
46 - under 53 | 6
53 - under 60 | 1
TOTAL | 30
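The counting above can be checked with a few lines of Python. This is just a sketch that assumes the same choices made in the answer: six classes, a class width of 7, and the first class starting at the smallest value (18). If you picked a different width or starting point, your counts will differ.

```python
# The 30 observations from question 10.
data = [
    57, 23, 35, 18, 21,
    26, 51, 47, 29, 21,
    46, 43, 29, 23, 39,
    50, 41, 19, 36, 28,
    31, 42, 52, 29, 18,
    28, 46, 33, 28, 20,
]

num_classes = 6
class_width = 7      # (57 - 18) / 6 = 6.5, rounded up for convenience
start = min(data)    # the first class begins at the smallest value, 18

# Each value falls in class number (value - start) // class_width.
frequencies = [0] * num_classes
for x in data:
    frequencies[(x - start) // class_width] += 1

for i, count in enumerate(frequencies):
    lower = start + i * class_width
    print(f"{lower} - under {lower + class_width}: {count}")
print("TOTAL", sum(frequencies))
```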
11) What type of graph would be most appropriate for the frequency distribution above?
Pie chart
Bar chart
Pareto diagram
Histogram
Answer: Histogram (the classes are numeric intervals)
12) For the following frequency distribution, determine the relative frequency, percent, and the cumulative frequency.
*Round your answer to 3 decimal places, the tolerance is +/-0.001.
Class Interval | Frequency | Relative Frequency | Percent | Cumulative Frequency
20–under 25 | 17 | 17/82 = .207* | 20.7% | 17
25–under 30 | 20 | 20/82 = .244* | 24.4% | 17 + 20 = 37
30–under 35 | 16 | .195* | 19.5% | 37 + 16 = 53
35–under 40 | 15 | .183* | 18.3% | 53 + 15 = 68
40–under 45 | 8 | .098* | 9.8% | 68 + 8 = 76
45–under 50 | 6 | .073* | 7.3% | 76 + 6 = 82
TOTAL | 82 | 1.000 | 100.0% |
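The three columns follow mechanically from the frequencies, as this small Python sketch shows. The variable names are my own; the only inputs are the six class frequencies from the table.

```python
from itertools import accumulate

classes = ["20-under 25", "25-under 30", "30-under 35",
           "35-under 40", "40-under 45", "45-under 50"]
frequencies = [17, 20, 16, 15, 8, 6]

total = sum(frequencies)
relative = [f / total for f in frequencies]   # frequency / total
percent = [100 * r for r in relative]         # relative frequency as a percent
cumulative = list(accumulate(frequencies))    # running total of frequencies

for name, f, r, p, c in zip(classes, frequencies, relative, percent, cumulative):
    print(f"{name}: freq={f}, rel={r:.3f}, pct={p:.1f}%, cum={c}")
print(f"TOTAL: {total}, relative frequencies sum to {sum(relative):.3f}")
```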
13) True or False: Frequency distribution is a summary of data presented in the form of class intervals and frequencies.
True – that’s the definition of a frequency distribution!
14) True or False: The range of a data set is defined as the difference between the mean and the median.
False – Range is the difference between the highest and lowest numbers in the data!
15) True or False: The sum of the relative frequencies of a grouped data set is always equal to one.
True – don’t forget, relative frequencies are just decimal versions of percentages, and percentages have to add up to 100%.
16) The U.S. Department of the Interior releases figures on mineral production. Following are the values (in billions of dollars) of the 15 leading states in nonfuel mineral production in the United States in 2008.
1.68, 1.81, 1.85, 1.89, 2.05, 2.05, 2.08, 2.74, 3.21, 3.30, 4.00, 4.17, 4.20, 6.48, 7.84
a. Calculate the mean, median, and mode.
Mean = sum of all data/15 = $3.29 billion
Median: the position = 2*(15+1)/4 = 8th location = $2.74 billion
Mode: 2.05 since it is the only value that appears more than once
b. Calculate the range, interquartile range, sample variance, and sample standard deviation.
Range = 7.84 – 1.68 = 6.16
Interquartile range = Q3 – Q1.
Q1 is at the following location: (15+1)/4 = 4th location = $1.89 billion
Q3 is at the following location: 3*(15+1)/4 = 12th location = $4.17 billion
So Interquartile range = 4.17 – 1.89 = 2.28
NOTE: make sure you understand what quartiles mean!
Sample variance = 3.3321 (See below)
Sample standard deviation = 1.8254 (see below)
Value ($ billions) | X - mean | Squared
1.68 | -1.61 | 2.5921
1.81 | -1.48 | 2.1904
1.85 | -1.44 | 2.0736
1.89 | -1.40 | 1.9600
2.05 | -1.24 | 1.5376
2.05 | -1.24 | 1.5376
2.08 | -1.21 | 1.4641
2.74 | -0.55 | 0.3025
3.21 | -0.08 | 0.0064
3.30 | 0.01 | 0.0001
4.00 | 0.71 | 0.5041
4.17 | 0.88 | 0.7744
4.20 | 0.91 | 0.8281
6.48 | 3.19 | 10.1761
7.84 | 4.55 | 20.7025
TOTAL: 49.35 | | 46.6496
MEAN = 49.35/15 = 3.29
Variance = 46.6496/(15-1) = 3.3321
SD = sqrt(3.3321) = 1.8254
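If you want to check parts a and b on a computer instead of by hand, here is a minimal Python sketch using the standard `statistics` module. It assumes the same quartile rule used above, position = p(n + 1), which is what the default "exclusive" method of `statistics.quantiles` uses; other software may use a different rule and give slightly different quartiles.

```python
import statistics

# Nonfuel mineral production values ($ billions) from question 16.
values = [1.68, 1.81, 1.85, 1.89, 2.05, 2.05, 2.08, 2.74,
          3.21, 3.30, 4.00, 4.17, 4.20, 6.48, 7.84]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

data_range = max(values) - min(values)
# Default method is "exclusive": quartile positions are p * (n + 1).
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1

variance = statistics.variance(values)   # sample variance, divides by n - 1
sd = statistics.stdev(values)            # sample standard deviation

print(f"mean={mean:.2f}, median={median}, mode={mode}")
print(f"range={data_range:.2f}, Q1={q1:.2f}, Q3={q3:.2f}, IQR={iqr:.2f}")
print(f"variance={variance:.4f}, sd={sd:.4f}")
```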
c. Compute the coefficient of skewness for these data and interpret. [Ignore]
Just use the Data Analysis portion of Excel and interpret. It is 1.48, so there is a right skew of the data (slightly long right hand tail)
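For the curious, the skewness figure can be reproduced without Excel. The sketch below assumes that the Data Analysis tool reports the adjusted sample skewness, n/((n-1)(n-2)) times the sum of cubed standardized deviations, which is the formula documented for Excel's SKEW function. The function name `sample_skewness` is my own.

```python
import statistics


def sample_skewness(values):
    """Adjusted sample skewness: n/((n-1)(n-2)) * sum(((x - mean)/s)^3)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample standard deviation (n - 1)
    cubed_sum = sum(((x - mean) / sd) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_sum


values = [1.68, 1.81, 1.85, 1.89, 2.05, 2.05, 2.08, 2.74,
          3.21, 3.30, 4.00, 4.17, 4.20, 6.48, 7.84]

skew = sample_skewness(values)
print(f"skewness = {skew:.2f}")   # positive means a longer right-hand tail
```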
17) The following graphic of residential housing data (selling price and size in square feet) indicates:
a correlation close to -1
a correlation close to 0 (no relation between the two variables)
a correlation close to 1
a negative relationship between the two variables
18) The Polk Company reported that the average age of a car on U.S. roads in a recent year was 7.5 years.
a) Suppose the distribution of ages of cars on U.S. roads is approximately bell-shaped. If 99.7% of the ages are between 1 year and 14 years, what is the standard deviation of car age?
We know that 99.7% of the data are within 3 standard deviations of the mean = 6.5 years (I found that from 14 – 7.5 or 7.5 – 1). So 6.5/3 = 2.167.
b) Suppose the standard deviation is 1.7 years and the mean is 7.5 years. Between what two values would 95% of the car ages fall?
95% of the data falls within 2 standard deviations of the mean.
So 7.5 + 2 * 1.7 = 10.9, and 7.5 – 2 * 1.7 = 4.1.
19) A large manufacturing firm tests job applicants who recently graduated from college. The test scores are bell shaped with a mean of 500 and a standard deviation of 50.
a) What proportion of people get scores between 400 and 600?
Points are 2 standard deviations away, so 95%
b) What proportion of people get scores higher than 450?
Point is 1 standard deviation away, so 68/2 + 50 = 84%
c) Management is considering placing a new hire in an upper level management position if the person scores in the upper 0.15% of the distribution. What is the lowest score a college graduate can earn to qualify for the position?
(X – 500)/50 = 3 SDs, so X = 500 + 3 * 50 = 650
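The empirical rule gives rounded figures. If you are curious how close they are, the sketch below compares them with exact normal-curve values, assuming the scores follow a normal distribution with mean 500 and standard deviation 50 (the problem only says "bell shaped", so that is an assumption). For the test, the empirical rule answers above are the ones to use.

```python
from statistics import NormalDist

# Assumption: test scores are exactly normal with mean 500 and SD 50.
scores = NormalDist(mu=500, sigma=50)

# a) proportion between 400 and 600 (within 2 standard deviations)
p_between = scores.cdf(600) - scores.cdf(400)

# b) proportion above 450 (more than 1 standard deviation below the mean)
p_above_450 = 1 - scores.cdf(450)

# c) lowest score that still falls in the upper 0.15%
cutoff = scores.inv_cdf(1 - 0.0015)

print(f"a) {p_between:.4f}  (empirical rule: 0.95)")
print(f"b) {p_above_450:.4f}  (empirical rule: 0.84)")
print(f"c) {cutoff:.1f}  (empirical rule: 650)")
```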
20) According to the Bureau of Labor Statistics, the average annual salary of a worker in Detroit, Michigan, is $35,748. Suppose the median annual salary for a worker in this group is $31,369 and the mode is $29,500.
a) Is the distribution of salaries for this group skewed? If so, how and why?
Since these three measures are not equal, the distribution is skewed. The distribution is skewed to the right because the mean is greater than the median.
b) Which of these measures of central tendency would you use to describe these data? Why?
Often, the median is preferred in reporting income data because it yields information about the middle of the data while ignoring extremes.
21) True or False: The median is the most frequently occurring value in a set of data.
False – the MODE is the most frequently occurring, not the median
22) True or False: A disadvantage of the mean as the measure of central tendency is that it is affected by extremely large or extremely small values in the data set.
True – that’s why you use the median for data sets with outliers!
23) True or False: The variance is the average of the squared deviations about the arithmetic mean for a set of numbers.
True
24) What is the median for the following five numbers? 223, 264, 216, 218, 229
Put the data in order: 216, 218, 223, 229, 264
The center number is the median = 223
25) The second quartile of a data set is always equal to its ________.
Median (by definition)
26) The sum of deviations from the mean for a data set is equal to __________.
Zero…that’s why we have to square the deviations to find the variance and standard deviation!
27) Scores obtained by students in an advanced placement test have a symmetric mound shaped (bell shaped) distribution with a mean of 70 and a standard deviation of 10. What is the proportion of students who received between 60 and 80 points?
60 is 1 standard deviation to the left of center and 80 is 1 standard deviation to the right, so by the empirical rule the answer is about 68%
28) For the previous problem, what is the proportion of students who received less than 50 points?
Find the Z point for 50: (50 – 70)/10 = -2. The area between -2 and +2 is 95%, so the area “less than 50” is (100% - 95%)/2 = 2.5%
29) The following joint probability table contains a breakdown on the age and gender of U.S. physicians in a recent year, as reported by the American Medical Association.
Age of U.S. Physicians
Gender | < 35 | 35 - 44 | 45 - 54 | 55 - 64 | > 65 | TOTAL
Male | 0.11 | 0.20 | 0.19 | 0.12 | 0.16 | 0.78
Female | 0.07 | 0.08 | 0.04 | 0.02 | 0.01 | 0.22
TOTAL | 0.18 | 0.28 | 0.23 | 0.14 | 0.17 | 1.00
a) What is the probability that one randomly selected physician is 35–44 years old?
P(35 – 44) = .28/1.00 = .28
NOTE: in a probability table (as opposed to a frequency table like the one in example #31), you don’t really have to be dividing by the total since the total is 1.00. I write it in to remind you that you MUST divide by something when you are finding probabilities!
b) What is the probability that one randomly selected physician is both a woman and 45–54 years old?
P(woman and 45 – 54) = intersection = 0.04/1.00 = .04
c) What is the probability that one randomly selected physician is a man or is 35–44 years old?
P(man or 35 – 44) = .78 + .28 - .20 = .86/1.00 = .86
d) What is the probability that one randomly selected physician is less than 35 years old or 55–64 years old?
P(< 35 or 55 – 64) = .18/1.00 + .14/1.00 = .32 (NOTE: no need to subtract anything since there are no “common points”…that is, those two categories are mutually exclusive)
e) What is the probability that one randomly selected physician is a woman if she is 45–54 years old?
P(woman | 45 – 54) = .04/.23 = 0.1739
f) What is the probability that a randomly selected physician is neither a woman nor 55–64 years old?
P(not woman and not 55 – 64) = P(man and not 55 – 64)
= (.11+.20+.19+.16)/1.00 = .66
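All six answers come from the same table, so they can be checked together. Here is a small Python sketch that stores the joint probabilities and applies the marginal, addition, and conditional probability rules. The helper names `p_gender` and `p_age` are made up for this illustration.

```python
ages = ["<35", "35-44", "45-54", "55-64", ">65"]

# Joint probabilities P(gender and age group) from the table.
joint = {
    "Male": dict(zip(ages, [0.11, 0.20, 0.19, 0.12, 0.16])),
    "Female": dict(zip(ages, [0.07, 0.08, 0.04, 0.02, 0.01])),
}


def p_gender(gender):
    """Marginal probability of a gender (row total)."""
    return sum(joint[gender].values())


def p_age(age):
    """Marginal probability of an age group (column total)."""
    return sum(joint[gender][age] for gender in joint)


p_a = p_age("35-44")                                             # marginal
p_b = joint["Female"]["45-54"]                                   # intersection
p_c = p_gender("Male") + p_age("35-44") - joint["Male"]["35-44"]  # union
p_d = p_age("<35") + p_age("55-64")                              # mutually exclusive
p_e = joint["Female"]["45-54"] / p_age("45-54")                  # conditional
p_f = sum(p for age, p in joint["Male"].items() if age != "55-64")

for label, p in zip("abcdef", [p_a, p_b, p_c, p_d, p_e, p_f]):
    print(f"{label}) {p:.4f}")
```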
30) Purchasing Survey asked purchasing professionals what sales traits impressed them most in a sales representative. Seventy-eight percent selected "thoroughness." Forty percent responded "knowledge of your own product." The purchasing professionals were allowed to list more than one trait. Suppose 27% of the purchasing professionals listed both "thoroughness" and "knowledge of your own product" as sales traits that impressed them most. A purchasing professional is randomly sampled.
a) Make a probability table including the above information.
b) What is the probability that the professional selected "thoroughness" or "knowledge of your own product"?
 | Mentioned knowledge | Didn’t mention knowledge | TOTAL
Mentioned thoroughness | .27 | .78 - .27 = .51 | .78
Didn’t mention thoroughness | .40 - .27 = .13 | .60 - .51 = .09 | 1 - .78 = .22
TOTAL | .40 | 1 - .40 = .60 | 1.00
So P(thorough or knowledge) = (.78 + .40 - .27)/1.00 = .91
c) What is the probability that the professional selected neither "thoroughness" nor "knowledge of your own product"?
P(neither thorough nor knowledge) = P(not thorough and not knowledge)
= intersection = 0.09/1.00 = 0.09
d) If it is known that the professional selected "thoroughness," what is the probability that the professional selected "knowledge of your own product"?
P(knowledge | thorough) = .27/.78 = 0.346
e) What is the probability that the professional did not select "thoroughness" and did select "knowledge of your own product"?
P(didn’t mention thoroughness and did mention knowledge) = intersection
= 0.13/1.00 = 0.13
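The whole probability table follows from just three given numbers. This Python sketch rebuilds each cell the same way the table does (by subtraction) and then answers parts b through e. The variable names are my own.

```python
# Given in the problem.
p_thorough = 0.78    # P(thoroughness)
p_knowledge = 0.40   # P(knowledge of your own product)
p_both = 0.27        # P(thoroughness and knowledge)

# Remaining cells of the 2x2 probability table.
p_thorough_only = p_thorough - p_both     # thorough, not knowledge
p_knowledge_only = p_knowledge - p_both   # knowledge, not thorough

# b) addition rule: subtract the overlap so it is not counted twice
p_either = p_thorough + p_knowledge - p_both
# c) neither is the complement of "either"
p_neither = 1 - p_either
# d) conditional probability: divide by the probability of what is known
p_knowledge_given_thorough = p_both / p_thorough
# e) did not select thoroughness and did select knowledge
p_not_thorough_and_knowledge = p_knowledge_only

print(f"b) {p_either:.2f}")
print(f"c) {p_neither:.2f}")
print(f"d) {p_knowledge_given_thorough:.3f}")
print(f"e) {p_not_thorough_and_knowledge:.2f}")
```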
31) The table below contains data from a sample of 200 people regarding opinion about the latest congressional plan to eliminate anti-trust exemptions for professional baseball (broken down by gender).
OPINION ABOUT THE PLAN
Gender | For | Neutral | Against | Totals
Female | 38 | 54 | 12 | 104
Male | 12 | 36 | 48 | 96
Totals | 50 | 90 | 60 | 200
Please show your work for parts "a" through "e" or no credit will be given!
a) What is the probability that a person selected at random is for the plan?
P(for) = 50/200 = .25
b) If we know that the person is a female, what is the probability that the person is for the plan?
P(for | female) = 38/104 = .365
c) What is the probability that the person is male and against the plan?
P(male and against) = 48/200 = .24
d) What is the probability that the person is male or is neutral about the plan?
P(male or neutral) = (96+90-36)/200 = .75
e) Is opinion about the plan related to gender, or are opinion and gender independent? Please use statistical concepts and numerical calculations in your answer, or no credit will be given.
Check to see if P(A) = P(A|B) = P(A|C) etc.
Is P(for the plan) = P(for | female)? .25 ≠ .365 so NOT independent
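With a frequency table you always divide by a count, and which count you divide by is the whole question. The sketch below works parts a through e from the raw counts. It uses the textbook independence check (is the marginal probability equal to the conditional probability?), not a formal hypothesis test; the variable names are my own.

```python
import math

# Counts from the table: gender by opinion.
counts = {
    "Female": {"For": 38, "Neutral": 54, "Against": 12},
    "Male": {"For": 12, "Neutral": 36, "Against": 48},
}

total = sum(sum(row.values()) for row in counts.values())
n_female = sum(counts["Female"].values())
n_male = sum(counts["Male"].values())
n_for = sum(row["For"] for row in counts.values())
n_neutral = sum(row["Neutral"] for row in counts.values())

p_for = n_for / total                                   # a) marginal
p_for_given_female = counts["Female"]["For"] / n_female  # b) conditional
p_male_and_against = counts["Male"]["Against"] / total  # c) intersection
p_male_or_neutral = (n_male + n_neutral - counts["Male"]["Neutral"]) / total  # d) union

# e) independent only if P(for) equals P(for | female)
independent = math.isclose(p_for, p_for_given_female)

print(f"a) {p_for:.3f}")
print(f"b) {p_for_given_female:.3f}")
print(f"c) {p_male_and_against:.3f}")
print(f"d) {p_male_or_neutral:.3f}")
print(f"e) independent? {independent}")
```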
32) True or False: If two events are independent, the joint probability of the two events is always equal to the product of the marginal probabilities of two events.
True – Think about it…P(A and B) = P(A) * P(B | A). But if A and B are independent, then P(B | A) is the same as P(B). In other words, if A and B are independent, the P(A and B) = P(A) * P(B). We will use that in chapter 5 and more!
33) True or False: If the conditional probability of an event A given another event B is same as the marginal probability of the event A, then events A and B are mutually exclusive.
False – as I just said, if P(A | B) = P(A), that means that A and B are independent…that doesn’t mean that A and B are mutually exclusive. Remember: if A and B are mutually exclusive, then if one happens, the other can’t…in other words, P(A and B) = 0.
34) If the occurrence or non-occurrence of one event does not affect the occurrence or non-occurrence of another event, the two events are ________________________.
Independent (by definition)
35) A listing of all elementary outcomes (i.e. the outcomes which cannot be broken down into other events) of an experiment (i.e. a decision making situation under uncertainty) is called a __________.
sample space
36) How many different combinations of a 3-member debating team can be formed from a group of 16 qualified students?
16C3 = 16!/(3! * (16-3)!) = 16 * 15 * 14/(3 * 2 * 1) = 560
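A quick check of the combination count in Python, using both the built-in `math.comb` (available in Python 3.8 and later) and the factorial formula written out:

```python
import math

n, r = 16, 3

# Built-in count of combinations: order does not matter.
teams = math.comb(n, r)

# The same thing from the formula n! / (r! * (n - r)!).
teams_by_formula = math.factorial(n) // (math.factorial(r) * math.factorial(n - r))

print(teams, teams_by_formula)
```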
[Figure for question 17: Selling Price ($1,000), axis 70 to 130, versus Square Feet, axis 1400 to 2400]
Review for Midterm #1
As mentioned earlier, the mid-term will have conceptual and quantitative multiple-choice questions. You need to read all 4 chapters and you need to be able to solve problems in all 4 chapters in order to do well in this test.
The following are for review and learning purposes only. I am not indicating that identical or similar problems will be in the test. As I have indicated many times, all the exams in this course will have multiple-choice questions and problems.
Suggestion: treat this review set as you would an actual test. Sit down with your one page of notes and your calculator, and give it a try. That way you will know what areas you still need to study.
1) Classify each of the following as nominal, ordinal, interval, or ratio data.
a. The time required to produce each tire on an assembly line
b. The number of quarts of milk a family drinks in a month
c. The ranking of four machines in your plant after they have been designated as excellent, good, satisfactory, and poor
d. The telephone area code of clients in the United States
e. The age of each of your employees
f. The dollar sales at the local pizza house each month
g. An employee’s identification number
h. The response time of an emergency unit
2) True or False: The highest level of data measurement is the ratio-level measurement.
3) True or False: Interval- and ratio-level data are also referred to as categorical data.
4) A small portion or a subset of the population on which data is collected for conducting statistical analysis is called __________.
5) One of the advantages for taking a sample instead of conducting a census is this:
a sample is more accurate than census
a sample is difficult to take
a sample cannot be trusted
a sample can save money when data collection process is destructive
6) Selection of the winning numbers in a lottery is an example of __________.
convenience sampling
random sampling
nonrandom sampling
regulatory sampling
7) A type of random sampling in which the population is divided into non-overlapping subpopulations is called __________.
stratified random sampling
cluster sampling
systematic random sampling
regulatory sampling
8) A type of random sampling in which every kth item (where k is some number) in the population is selected for inclusion in the sample is called __________.
stratified random sampling
cluster sampling
systematic sampling
regulatory sampling
9) Judgment sampling is an example of __________.
convenience sampling
random sampling
nonrandom (non-probabilistic) sampling
justice department sampling
10) For the following data, construct a frequency distribution with six classes.
57 23 35 18 21
26 51 47 29 21
46 43 29 23 39
50 41 19 36 28
31 42 52 29 18
28 46 33 28 20
11) What type of graph would be most appropriate for the frequency distribution above?
Pie chart
Bar chart
Pareto diagram
Histogram
12) For the following frequency distribution, determine the relative frequency, percent, and the cumulative frequency.
*Round your answer to 3 decimal places, the tolerance is +/-0.001.
Class Interval | Frequency
20–under 25 | 17
25–under 30 | 20
30–under 35 | 16
35–under 40 | 15
40–under 45 | 8
45–under 50 | 6
TOTAL | 82
13) True or False: Frequency distribution is a summary of data presented in the form of class intervals and frequencies.
14) True or False: The range of a data set is defined as the difference between the mean and the median.
15) True or False: The sum of the relative frequencies of a grouped data set is always equal to one.
16) The U.S. Department of the Interior releases figures on mineral production. Following are the values (in billions of dollars) of the 15 leading states in nonfuel mineral production in the United States in 2008.
1.68, 1.81, 1.85, 1.89, 2.05, 2.05, 2.08, 2.74, 3.21, 3.30, 4.00, 4.17, 4.20, 6.48, 7.84
a. Calculate the mean, median, and mode.
b. Calculate the range, interquartile range, sample variance, and sample standard deviation.
c. Compute the coefficient of skewness for these data and interpret.
17) The following graphic of residential housing data (selling price and size in square feet) indicates:
a correlation close to -1
a correlation close to 0 (no relation between the two variables)
a correlation close to 1
a negative relationship between the two variables
18) The Polk Company reported that the average age of a car on U.S. roads in a recent year was 7.5 years.
a) Suppose the distribution of ages of cars on U.S. roads is approximately bell-shaped. If 99.7% of the ages are between 1 year and 14 years, what is the standard deviation of car age?
b) Suppose the standard deviation is 1.7 years and the mean is 7.5 years. Between what two values would 95% of the car ages fall?
19) A large manufacturing firm tests job applicants who recently graduated from college. The test scores are bell shaped with a mean of 500 and a standard deviation of 50.
a) What proportion of people get scores between 400 and 600?
b) What proportion of people get scores higher than 450?
c) Management is considering placing a new hire in an upper level management position if the person scores in the upper 0.15% of the distribution. What is the lowest score a college graduate can earn to qualify for the position?
20) According to the Bureau of Labor Statistics, the average annual salary of a worker in Detroit, Michigan, is $35,748. Suppose the median annual salary for a worker in this group is $31,369 and the mode is $29,500.
a) Is the distribution of salaries for this group skewed? If so, how and why?
b) Which of these measures of central tendency would you use to describe these data? Why?
21) True or False: The median is the most frequently occurring value in a set of data.
22) True or False: A disadvantage of the mean as the measure of central tendency is that it is affected by extremely large or extremely small values in the data set.
23) True or False: The variance is the average of the squared deviations about the arithmetic mean for a set of numbers.
24) What is the median for the following five numbers? 223, 264, 216, 218, 229
25) The second quartile of a data set is always equal to its ________.
26) The sum of deviations from the mean for a data set is equal to __________.
27) Scores obtained by students in an advanced placement test have a symmetric mound shaped (bell shaped) distribution with a mean of 70 and a standard deviation of 10. What is the proportion of students who received between 60 and 80 points?
28) For the previous problem, what is the proportion of students who received less than 50 points?
29) The following joint probability table contains a breakdown on the age and gender of U.S. physicians in a recent year, as reported by the American Medical Association.
Age of U.S. Physicians
Gender | < 35 | 35 - 44 | 45 - 54 | 55 - 64 | > 65 | TOTAL
Male | 0.11 | 0.20 | 0.19 | 0.12 | 0.16 | 0.78
Female | 0.07 | 0.08 | 0.04 | 0.02 | 0.01 | 0.22
TOTAL | 0.18 | 0.28 | 0.23 | 0.14 | 0.17 | 1.00
a) What is the probability that one randomly selected physician is 35–44 years old?
b) What is the probability that one randomly selected physician is both a woman and 45–54 years old?
c) What is the probability that one randomly selected physician is a man or is 35–44 years old?
d) What is the probability that one randomly selected physician is less than 35 years old or 55–64 years old?
e) What is the probability that one randomly selected physician is a woman if she is 45–54 years old?
f) What is the probability that a randomly selected physician is neither a woman nor 55–64 years old?
30) Purchasing Survey asked purchasing professionals what sales traits impressed them most in a sales representative. Seventy-eight percent selected "thoroughness." Forty percent responded "knowledge of your own product." The purchasing professionals were allowed to list more than one trait. Suppose 27% of the purchasing professionals listed both "thoroughness" and "knowledge of your own product" as sales traits that impressed them most. A purchasing professional is randomly sampled.
a) Make a probability table including the above information.
b) What is the probability that the professional selected "thoroughness" or "knowledge of your own product"?
c) What is the probability that the professional selected neither "thoroughness" nor "knowledge of your own product"?
d) If it is known that the professional selected "thoroughness," what is the probability that the professional selected "knowledge of your own product"?
e) What is the probability that the professional did not select "thoroughness" and did select "knowledge of your own product"?
31) From a previous midterm: The table below contains data from a sample of 200 people regarding opinion about the latest congressional plan to eliminate anti-trust exemptions for professional baseball (broken down by gender).
OPINION ABOUT THE PLAN
Gender | For | Neutral | Against | Totals |
Female | 38 | 54 | 12 | 104 |
Male | 12 | 36 | 48 | 96 |
Totals | 50 | 90 | 60 | 200 |
Please show your work for parts "a" through "e" or no credit will be given!
a) What is the probability that a person selected at random is for the plan?
b) If we know that the person is a female, what is the probability that the person is for the plan?
c) What is the probability that the person is male and is against the plan?
d) What is the probability that the person is male or is neutral about the plan?
e) Is opinion about the plan related to gender, or are opinion and gender independent? Please use statistical concepts and numerical calculations in your answer.
32) True or False: If two events are independent, the joint probability of the two events is always equal to the product of the marginal probabilities of two events.
33) True or False: If the conditional probability of an event A given another event B is the same as the marginal probability of the event A, then events A and B are mutually exclusive.
34) If the occurrence or non-occurrence of one event does not affect the occurrence or non-occurrence of another event, the two events are ________________________.
35) A listing of all elementary outcomes (i.e. the outcomes which cannot be broken down into other events) of an experiment (i.e. a decision making situation under uncertainty) is called a __________.
36) How many different combinations of a 3-member debating team can be formed from a group of 16 qualified students?
Figure: Selling Price ($1,000) versus Square Feet
Name: XXXXXX
Analysis Assignment
6/6/2017
In this analysis assignment, we will use the chi-square test to analyze whether a management
training program is related to the promotion of managers and use ANOVA to analyze
whether the mean satisfaction rating of employees is the same in four different offices.
1. Chi-Square
In the chi-square test, a total of 40 employees were not promoted. If promotion and training
were unrelated, 17.6 of these 40 employees would be expected not to have participated in the
training program and 22.4 would be expected to have participated. In the observed data,
however, 27 of the 40 employees (67.5% of those not promoted) did not participate in the
training program, while 13 employees (32.5% of those not promoted) did participate.
In addition, a total of 60 employees were promoted. Among these 60 employees, 26.4 would be
expected not to have participated in the program and 33.6 would be expected to have
participated. In the observed data, however, 17 of the 60 employees (28.3% of those
promoted) did not participate in the program, while 43 employees (71.7% of those promoted)
did participate.
Pearson chi-square significance < 0.05: chance is not the only factor that causes the
differences.
Therefore, promoted employees participated in the management training program more often
than expected. Clearly, the management training program is related to the promotion of
employees.
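The expected counts quoted above can be reproduced by hand. The following is a minimal sketch, assuming the observed counts 27, 13, 17, and 43 from the crosstabulation; the helper name `expected_counts` is illustrative and is not part of SPSS. Under independence, each expected count is the row total times the column total divided by the sample size.

```python
# Observed counts, keyed by (Promoted, Management_Training_Program).
observed = {
    ("No", "No"): 27, ("No", "Yes"): 13,
    ("Yes", "No"): 17, ("Yes", "Yes"): 43,
}

def expected_counts(table):
    """Expected count per cell under independence: row total * column total / n."""
    n = sum(table.values())
    row_totals, col_totals = {}, {}
    for (row, col), count in table.items():
        row_totals[row] = row_totals.get(row, 0) + count
        col_totals[col] = col_totals.get(col, 0) + count
    return {(r, c): row_totals[r] * col_totals[c] / n for (r, c) in table}

expected = expected_counts(observed)
# Residual = observed count minus expected count.
residuals = {cell: observed[cell] - expected[cell] for cell in observed}

for cell in observed:
    print(cell, observed[cell], round(expected[cell], 1), round(residuals[cell], 1))
```

In a 2x2 table all four residuals share the same magnitude, because the row and column totals are fixed.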
2. ANOVA
In the ANOVA test, we analyze the satisfaction rating data from the different office
locations to see whether the mean satisfaction rating is the same in all four offices.
According to the output, the p-value is 0.034, which is less than 0.05. Therefore, we reject
the null hypothesis that the four samples come from populations with the same mean. So, the
satisfaction ratings in the four offices are not all the same. Moreover, from the scattergram
below, we can see that office 3 has the smallest variance (the data within the group differ
the least) and contains the smallest data value, while office 2 has the largest variance (the
data within the group differ the most) and contains the largest data value.
The p-value in the first subset is > 0.05, so there are no distinct differences; the p-value
in the second subset is < 0.05, so there are distinct differences.
Figure (scattergram): monthly satisfaction rating of different offices
XXXXXXXXXXX ARE 112
Analysis Assignment A
For this analysis assignment, we were introduced to the program SPSS, a statistical tool that helps us compute statistics for a given data set. For Data Set 1, we used the chi-square test to see whether there was a relationship between a management training program and promotion. The most important statistical values for this data are the “count” and the “expected count.” The “count” is the observed number of employees in each combination of promotion status and training status. The “expected count” is the number of employees expected in each combination if promotion and training were unrelated. The difference between the observed count and the expected count is the “residual.” All four cells have the same residual magnitude of 9.4; two of the residuals are positive and two are negative. The positive residuals belong to the employees who were not promoted and did not have management training and to those who were promoted and did have management training. The negative residuals belong to the employees who were not promoted but had management training and to those who were promoted but did not have management training.
For Data Set 2, we used a different test, the ANOVA test, to compare monthly employee satisfaction ratings at offices in different locations. The test results give a p-value of 0.034, which is below 0.05, so the mean satisfaction ratings are not the same in all offices. The four office locations are placed into two “subsets.” Each subset is classified as homogeneous, meaning that the offices within it do not differ significantly from one another. From the ANOVA output, we were able to identify which offices had homogeneous employee satisfaction ratings and which offices differed. Subset 1 contained offices 1, 3, and 4, while subset 2 contained offices 2, 3, and 4. This tells us that the only offices that differed in employee satisfaction were offices 1 and 2.
XXXXXXXXXX ARE 112 Analysis 6/7/17
Promoted * Management_Training_Program Crosstabulation
Promoted | Statistic | Management_Training_Program: No | Management_Training_Program: Yes | Total |
No | Count | 27 | 13 | 40 |
No | Expected Count | 17.6 | 22.4 | 40.0 |
No | % within Promoted | 67.5% | 32.5% | 100.0% |
No | % within Management_Training_Program | 61.4% | 23.2% | 40.0% |
No | % of Total | 27.0% | 13.0% | 40.0% |
No | Residual | 9.4 | -9.4 | |
Yes | Count | 17 | 43 | 60 |
Yes | Expected Count | 26.4 | 33.6 | 60.0 |
Yes | % within Promoted | 28.3% | 71.7% | 100.0% |
Yes | % within Management_Training_Program | 38.6% | 76.8% | 60.0% |
Yes | % of Total | 17.0% | 43.0% | 60.0% |
Yes | Residual | -9.4 | 9.4 | |
Total | Count | 44 | 56 | 100 |
Total | Expected Count | 44.0 | 56.0 | 100.0 |
Total | % within Promoted | 44.0% | 56.0% | 100.0% |
Total | % within Management_Training_Program | 100.0% | 100.0% | 100.0% |
Total | % of Total | 44.0% | 56.0% | 100.0% |
The chi-squared analysis and the management training program crosstabulation show a statistically significant difference between the expected and observed promotions. The crosstabulation showed that 17.6 employees who did not complete the training program were expected not to be promoted. The actual number of employees who were not trained and did not get promoted was 27. This difference is statistically significant because the chi-squared analysis gave a significance value below 0.001, that is, a less than one in a thousand chance of data like these occurring if training and promotion were unrelated. The number of employees who were trained and promoted was higher than expected (43 observed against 33.6 expected), while employees who did not get the management training were promoted less often than expected (17 observed against 26.4 expected). These differences are statistically significant; the data suggest that going to the management training would increase the chances of getting a promotion.
Chi-Square Tests
Test | Value | df | Asymptotic Significance (2-sided) | Exact Sig. (2-sided) | Exact Sig. (1-sided) |
Pearson Chi-Square | 14.942 (a) | 1 | .000 | | |
Continuity Correction (b) | 13.395 | 1 | .000 | | |
Likelihood Ratio | 15.211 | 1 | .000 | | |
Fisher's Exact Test | | | | .000 | .000 |
N of Valid Cases | 100 | | | | |
a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 17.60.
b. Computed only for a 2x2 table
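The Pearson and continuity-corrected values in the Chi-Square Tests output can be checked from the four observed counts. The following is a rough sketch, not SPSS's own routine: the function names are illustrative, and the p-value shortcut through `math.erfc` is valid only for 1 degree of freedom, as in a 2x2 table.

```python
import math

# Observed counts: rows are Promoted (No, Yes), columns are Training (No, Yes).
observed = [[27, 13], [17, 43]]

def chi_square_2x2(table, yates=False):
    """Pearson chi-square statistic; yates=True applies the continuity correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            exp = row_totals[i] * col_totals[j] / n
            diff = abs(obs - exp)
            if yates:
                # Shrink each |observed - expected| by 0.5, never below zero.
                diff = max(diff - 0.5, 0.0)
            stat += diff ** 2 / exp
    return stat

def p_value_df1(stat):
    """Upper-tail chi-square probability for 1 degree of freedom only."""
    return math.erfc(math.sqrt(stat / 2))

pearson = chi_square_2x2(observed)
corrected = chi_square_2x2(observed, yates=True)
print(round(pearson, 3), round(corrected, 3), p_value_df1(pearson) < 0.001)
```

SPSS prints the significance as .000 because it rounds to three decimals; the value is small but not exactly zero.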
The ANOVA analysis is used to determine whether the level of employee satisfaction is the same in the four offices. About the same level of satisfaction is seen in offices 1, 3, and 4. Offices 2, 3, and 4 also have very similar levels of satisfaction. There is a noticeable difference in satisfaction between offices 1 and 2: office 1 had a mean satisfaction of 39.25 and office 2 had a mean satisfaction of 45.88. The ANOVA analysis gives the manager insight into this difference and calls to the manager’s attention that they should look into these offices and determine why the satisfaction varies.
Monthly_Satisfaction_Rating
Tukey HSD (a)
Office Code | N | Subset for alpha = 0.05: 1 | Subset for alpha = 0.05: 2 |
1 | 8 | 39.25 | |
3 | 8 | 39.38 | 39.38 |
4 | 8 | 41.38 | 41.38 |
2 | 8 | | 45.88 |
Sig. | | .813 | .053 |
Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 8.000.
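One way to read the homogeneous subsets is that two offices differ significantly only if they never appear together in the same subset. The sketch below applies that reading to the means in the Tukey HSD output; the function name `differing_pairs` is illustrative, and the code only interprets the reported subsets, it does not rerun the test.

```python
from itertools import combinations

# Office code -> mean rating, grouped by homogeneous subset as reported.
subsets = [
    {1: 39.25, 3: 39.38, 4: 41.38},  # subset 1
    {3: 39.38, 4: 41.38, 2: 45.88},  # subset 2
]

def differing_pairs(homogeneous_subsets):
    """Return office pairs that share no homogeneous subset."""
    offices = sorted({office for subset in homogeneous_subsets for office in subset})
    pairs = []
    for a, b in combinations(offices, 2):
        share_subset = any(a in s and b in s for s in homogeneous_subsets)
        if not share_subset:
            pairs.append((a, b))
    return pairs

print(differing_pairs(subsets))
```

For this output the only such pair is offices 1 and 2, which matches the conclusion in the report.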
Employee_Ref | Promoted | Management_Training_Program |
1 | No | Yes |
2 | No | Yes |
3 | No | Yes |
4 | No | Yes |
5 | No | No |
6 | No | No |
7 | No | No |
8 | No | No |
9 | No | No |
10 | No | No |
Here is where the "HIDE" command was used
140 | No | No |
141 | No | No |
142 | No | No |
143 | No | No |
144 | Yes | No |
145 | Yes | No |
146 | Yes | No |
147 | Yes | Yes |
148 | Yes | Yes |
149 | Yes | Yes |
150 | Yes | Yes |
 | # Promoted | Training Program |
Yes | 96 | 83 |
No | 54 | 67 |
Total | 150 | 150 |
p-value = 0.02701
In percent:
 | # Promoted | Training Program |
Yes | 64.0% | 55.3% |
No | 36.0% | 44.7% |
Total | 100.0% | 100.0% |
We are testing the idea that Promotion = function (Going to the training program).
Formula used for the counts: =COUNTIF(B2:B151,$A$154)
Chi-square test with Excel formula: =CHISQ.TEST(C154:C155,B154:B155)
ARE 122 - Spring 2020 Chi Square Test
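The worksheet's p-value of 0.02701 can be reproduced outside Excel. The sketch below assumes what the CHISQ.TEST call implies: the Training Program counts (83, 67) are the actual range and the # Promoted counts (96, 54) are the expected range, so with two cells in one column there is 1 degree of freedom. The function name is illustrative, and the `math.erfc` shortcut applies only to that 1-degree-of-freedom case.

```python
import math

def chisq_test_two_cells(actual, expected):
    """Chi-square statistic and p-value for a two-cell column (1 degree of freedom)."""
    if len(actual) != 2 or len(expected) != 2:
        raise ValueError("this sketch handles exactly two cells")
    stat = sum((a - e) ** 2 / e for a, e in zip(actual, expected))
    # For 1 degree of freedom the upper-tail probability equals erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

training_counts = [83, 67]  # Training Program: Yes, No
promoted_counts = [96, 54]  # # Promoted: Yes, No

stat, p_value = chisq_test_two_cells(training_counts, promoted_counts)
print(round(stat, 2), round(p_value, 5))
```

This compares two sets of marginal counts, treating one as the expected values. It is a different calculation from the crosstabulation test of independence, which needs the four cell counts.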
