Chi_Sq_DATA

Employee_Ref	Promoted	Management_Training_Program
1	No	No
2	Yes	Yes
3	No	No
4	Yes	Yes
5	No	No
6	No	No
7	Yes	Yes
8	Yes	Yes
9	Yes	Yes
10	Yes	Yes
11	No	No
12	No	No
13	Yes	Yes
14	No	No
15	Yes	No
16	Yes	Yes
17	No	Yes
18	Yes	Yes
19	Yes	Yes
20	Yes	Yes
21	Yes	Yes
22	Yes	Yes
23	No	No
24	Yes	Yes
25	No	No
26	Yes	Yes
27	No	No
28	Yes	Yes
29	Yes	Yes
30	No	No
31	No	No
32	Yes	Yes
33	Yes	Yes
34	No	No
35	Yes	Yes
36	No	No
37	Yes	Yes
38	No	No
39	Yes	Yes
40	No	No
41	No	No
42	Yes	Yes
43	Yes	Yes
44	Yes	Yes
45	Yes	Yes
46	No	No
47	No	No
48	No	No
49	Yes	No
50	Yes	Yes
51	Yes	Yes
52	Yes	Yes
53	Yes	Yes
54	Yes	Yes
55	No	No
56	Yes	Yes
57	No	No
58	Yes	Yes
59	No	No
60	No	No
61	Yes	Yes
62	Yes	Yes
63	Yes	Yes
64	Yes	Yes
65	No	No
66	Yes	Yes
67	No	No
68	No	No
69	No	No
70	No	No
71	No	No
72	No	No
73	Yes	Yes
74	No	Yes
75	No	No
76	No	No
77	No	No
78	Yes	Yes
79	No	Yes
80	Yes	Yes
81	Yes	Yes
82	No	Yes
83	No	No
84	No	No
85	No	No
86	No	No
87	No	No
88	No	No
89	No	No
90	Yes	Yes
91	No	No
92	Yes	Yes
93	No	Yes
94	Yes	Yes
95	Yes	No
96	Yes	Yes
97	Yes	Yes
98	Yes	Yes
99	No	No
100	No	No
101	No	No
102	Yes	Yes
103	Yes	Yes
104	Yes	Yes
105	Yes	Yes
106	Yes	Yes
107	Yes	Yes
108	Yes	Yes
109	No	No
110	Yes	Yes
111	No	No
112	Yes	Yes
113	No	No
114	Yes	Yes
115	Yes	No
116	No	No
117	Yes	Yes
118	No	No
119	No	Yes
120	Yes	Yes
121	No	No
122	Yes	Yes
123	Yes	Yes
124	No	No
125	Yes	Yes
126	No	No
127	Yes	Yes
128	Yes	Yes
129	Yes	Yes
130	Yes	Yes
131	Yes	No
132	No	No
133	Yes	Yes
134	Yes	Yes
135	No	No
136	Yes	Yes
137	Yes	Yes
138	Yes	Yes
139	No	No
140	No	No	Here is where the "HIDE" command was used
141	No	No
142	No	No
143	No	No
144	No	No
145	Yes	Yes
146	No	No	We are testing the idea that Promotion = function (Going to the training program).
147	Yes	Yes
148	Yes	Yes
149	No	No
150	No	Yes	Chi square test with Excel formula
Formula	=COUNTIF(B2:B151,$A$154)
#	Promoted	Training Program	Formula	=CHISQ.TEST(C154:C155,D154:D155)
Yes
No			p-value =
Total
In Per Cent
#	Promoted	Training Program
Yes
No
Total

ANOVA_DATA

Item_#	Week_Code	Satisfaction_Level	Office_Code	Office
1	Week 2	38	1	Boston	Week #	Boston	Atlanta	Los Angeles	Portland
2	Week 4	40	1	Boston	1	38	42	44	31
3	Week 6	30	1	Boston	2	40	45	43	43
4	Week 8	42	1	Boston	3	30	44	38	44
5	Week 10	36	1	Boston	4	42	43	40	43
6	Week 12	41	1	Boston	5	36	46	37	36
7	Week 14	40	1	Boston	6	41	42	35	43
8	Week 16	42	1	Boston	7	40	35	37	46
9	Week 18	36	1	Boston	8	42	43	35	45
10	Week 20	41	1	Boston	9	36	46	40	46
11	Week 22	40	1	Boston	10	41	43	37	36
12	Week 24	35	1	Boston	11	40	42	36	40
13	Week 26	34	1	Boston	12	35	45	39	41
14	Week 28	41	1	Boston	13	34	39	40	40
15	Week 30	39	1	Boston	14	41	39	36	39
16	Week 32	42	1	Boston	15	39	44	44	38
17	Week 2	42	2	Atlanta	16	42	43	39	42
18	Week 4	45	2	Atlanta	Average	38.6	42.6	38.8	40.8
19	Week 6	44	2	Atlanta
20	Week 8	43	2	Atlanta
21	Week 10	46	2	Atlanta
22	Week 12	42	2	Atlanta
23	Week 14	35	2	Atlanta
24	Week 16	43	2	Atlanta
25	Week 18	46	2	Atlanta
26	Week 20	43	2	Atlanta
27	Week 22	42	2	Atlanta
28	Week 24	45	2	Atlanta
29	Week 26	39	2	Atlanta
30	Week 28	39	2	Atlanta
31	Week 30	44	2	Atlanta
32	Week 32	43	2	Atlanta
33	Week 2	44	3	Los Angeles
34	Week 4	43	3	Los Angeles
35	Week 6	38	3	Los Angeles
36	Week 8	40	3	Los Angeles
37	Week 10	37	3	Los Angeles
38	Week 12	35	3	Los Angeles
39	Week 14	37	3	Los Angeles
40	Week 16	35	3	Los Angeles
41	Week 18	40	3	Los Angeles
42	Week 20	37	3	Los Angeles
43	Week 22	36	3	Los Angeles
44	Week 24	39	3	Los Angeles
45	Week 26	40	3	Los Angeles
46	Week 28	36	3	Los Angeles
47	Week 30	44	3	Los Angeles
48	Week 32	39	3	Los Angeles
49	Week 2	31	4	Portland
50	Week 4	43	4	Portland
51	Week 6	44	4	Portland
52	Week 8	43	4	Portland
53	Week 10	36	4	Portland
54	Week 12	43	4	Portland
55	Week 14	46	4	Portland
56	Week 16	45	4	Portland
57	Week 18	46	4	Portland
58	Week 20	36	4	Portland
59	Week 22	40	4	Portland
60	Week 24	41	4	Portland
61	Week 26	40	4	Portland
62	Week 28	39	4	Portland
63	Week 30	38	4	Portland
64	Week 32	42	4	Portland
		40.2
		40
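
The table above can feed a one-way ANOVA with office as the factor. The following is a minimal sketch of that calculation in plain Python, assuming a standard one-way design with equal weight on every observation; the variable names are illustrative and not part of the original workbook. It reproduces the per-office averages and the overall mean shown in the table and then forms the F statistic. Turning F into a p-value needs an F-distribution routine (for example from a statistics package), which is left out here.

```python
# One-way ANOVA F statistic computed by hand from the ANOVA_DATA table.
scores = {
    "Boston":      [38, 40, 30, 42, 36, 41, 40, 42, 36, 41, 40, 35, 34, 41, 39, 42],
    "Atlanta":     [42, 45, 44, 43, 46, 42, 35, 43, 46, 43, 42, 45, 39, 39, 44, 43],
    "Los Angeles": [44, 43, 38, 40, 37, 35, 37, 35, 40, 37, 36, 39, 40, 36, 44, 39],
    "Portland":    [31, 43, 44, 43, 36, 43, 46, 45, 46, 36, 40, 41, 40, 39, 38, 42],
}

all_values = [x for group in scores.values() for x in group]
grand_mean = sum(all_values) / len(all_values)
group_means = {office: sum(v) / len(v) for office, v in scores.items()}

# Between-group and within-group sums of squares.
ss_between = sum(len(v) * (group_means[o] - grand_mean) ** 2 for o, v in scores.items())
ss_within = sum((x - group_means[o]) ** 2 for o, v in scores.items() for x in v)

df_between = len(scores) - 1               # number of offices minus 1
df_within = len(all_values) - len(scores)  # observations minus number of offices
f_stat = (ss_between / df_between) / (ss_within / df_within)

for office, mean in group_means.items():
    print(f"{office}: mean = {mean:.1f}")
print(f"grand mean = {grand_mean:.1f}, F({df_between}, {df_within}) = {f_stat:.2f}")
```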

As mentioned earlier, the mid-term will have conceptual and quantitative multiple-choice questions. You need to read all 4 chapters and you need to be able to solve problems in all 4 chapters in order to do well in this test.

The following are for review and learning purposes only. I am not indicating that identical or similar problems will be in the test. As I have indicated in the class syllabus, all the exams in this course will have multiple-choice questions and problems.

Suggestion: treat this review set as you would an actual test. Sit down with your one page of notes and your calculator, and give it a try. That way you will know what areas you still need to study.

ADMN 210

Answers to Review for Midterm #1

1) Classify each of the following as nominal, ordinal, interval, or ratio data.

a. The time required to produce each tire on an assembly line – ratio since it is numeric with a valid 0 point meaning “lack of”

b. The number of quarts of milk a family drinks in a month - ratio since it is numeric with a valid 0 point meaning “lack of”

c. The ranking of four machines in your plant after they have been designated as excellent, good, satisfactory, and poor – ordinal since it is ranking data only

d. The telephone area code of clients in the United States – nominal since it is a label

e. The age of each of your employees - ratio since it is numeric with a valid 0 point meaning “lack of”

f. The dollar sales at the local pizza house each month - ratio since it is numeric with a valid 0 point meaning “lack of”

g. An employee’s identification number – nominal since it is a label

h. The response time of an emergency unit - ratio since it is numeric with a valid 0 point meaning “lack of”

2) True or False: The highest level of data measurement is the ratio-level measurement.

True (you can do the most powerful analysis with this kind of data)

3) True or False: Interval- and ratio-level data are also referred to as categorical data.

False (Interval and ratio level data are numeric and therefore quantitative, NOT qualitative….Nominal and ordinal data are qualitative)

4) A small portion or a subset of the population on which data is collected for conducting statistical analysis is called __________.

A sample! A population is the total group, a census IS the population, and a data set can be either a sample or a population.

5) One of the advantages for taking a sample instead of conducting a census is this:

a sample is more accurate than census

a sample is difficult to take

a sample cannot be trusted

a sample can save money when data collection process is destructive

6) Selection of the winning numbers in a lottery is an example of __________.

convenience sampling

random sampling

nonrandom sampling

regulatory sampling

7) A type of random sampling in which the population is divided into non-overlapping subpopulations is called __________.

stratified random sampling

cluster sampling

systematic random sampling

regulatory sampling

8) A type of random sampling in which every kth item (where k is some number) in the population is selected for inclusion in the sample is called __________.

stratified random sampling

cluster sampling

systematic sampling

regulatory sampling

9) Judgment sampling is an example of __________.

convenience sampling

random sampling

nonrandom (non-probabilistic) sampling

justice department sampling

10) For the following data, construct a frequency distribution with six classes.

57 23 35 18 21

26 51 47 29 21

46 43 29 23 39

50 41 19 36 28

31 42 52 29 18

28 46 33 28 20

Class width = (high – low)/6 = (57 – 18)/6 = 6.5. Let’s round up to 7 for convenience. NOTE: each student will have something slightly different!

Class Interval	Frequency
18 - under 25	8	(just count up how many observations are 18 through 24)
25 - under 32	8
32 - under 39	3
39 - under 46	4
46 - under 53	6
53 - under 60	1
TOTAL	30
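
The counting step in question 10 can be checked with a short Python sketch. It assumes the same choices as the answer above (six classes starting at the lowest value, width rounded up to 7); a different starting point or width would give a slightly different table.

```python
import math

data = [57, 23, 35, 18, 21, 26, 51, 47, 29, 21, 46, 43, 29, 23, 39,
        50, 41, 19, 36, 28, 31, 42, 52, 29, 18, 28, 46, 33, 28, 20]

num_classes = 6
low, high = min(data), max(data)
width = math.ceil((high - low) / num_classes)  # (57 - 18) / 6 = 6.5, rounded up to 7

# Each observation falls in the class whose lower limit is low + index * width.
counts = [0] * num_classes
for x in data:
    counts[(x - low) // width] += 1

for i, count in enumerate(counts):
    start = low + i * width
    print(f"{start} - under {start + width}: {count}")
print("TOTAL", sum(counts))
```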

11) What type of graph would be most appropriate for the frequency distribution above?

Pie chart

Bar chart

Pareto diagram

Histogram

12) For the following frequency distribution, determine the relative frequency, percent, and the cumulative frequency.

*Round your answer to 3 decimal places, the tolerance is +/-0.001.

Class Interval	Frequency	Relative Frequency	Percent	Cumulative Frequency
20–under 25	17	17/82 = .207*	20.7%	17
25–under 30	20	20/82 = .244*	24.4%	17 + 20 = 37
30–under 35	16	.195*	19.5%	37 + 16 = 53
35–under 40	15	.183*	18.3%	53 + 15 = 68
40–under 45	8	.098*	9.8%	68 + 8 = 76
45–under 50	6	.073*	7.3%	76 + 6 = 82
TOTAL	82	1.000	100.0%
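
As a check on the table for question 12, here is a small Python sketch that derives the three extra columns from the frequencies alone. The list names are illustrative.

```python
from itertools import accumulate

classes = ["20-under 25", "25-under 30", "30-under 35",
           "35-under 40", "40-under 45", "45-under 50"]
freqs = [17, 20, 16, 15, 8, 6]

total = sum(freqs)
relative = [f / total for f in freqs]   # frequency divided by the total
cumulative = list(accumulate(freqs))    # running sum of the frequencies

for name, f, r, c in zip(classes, freqs, relative, cumulative):
    print(f"{name}\t{f}\t{r:.3f}\t{r:.1%}\t{c}")
print(f"TOTAL\t{total}\t{sum(relative):.3f}")
```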

13) True or False: Frequency distribution is a summary of data presented in the form of class intervals and frequencies.

True – that’s the definition of a frequency distribution!

14) True or False: The range of a data set is defined as the difference between the mean and the median.

False – Range is the difference between the highest and lowest numbers in the data!

15) True or False: The sum of the relative frequencies of a grouped data set is always equal to one.

True – don’t forget, relative frequencies are just decimal versions of percentages, and percentages have to add up to 100%.

16) The U.S. Department of the Interior releases figures on mineral production. Following are the values (in billions of dollars) of the 15 leading states in nonfuel mineral production in the United States in 2008.

1.68, 1.81, 1.85, 1.89, 2.05, 2.05, 2.08, 2.74, 3.21, 3.30, 4.00, 4.17, 4.20, 6.48, 7.84

a. Calculate the mean, median, and mode.

Mean = sum of all data/15 = $3.29 billion

Median: the position = 2*(15+1)/4 = 8th location = $2.74 billion

Mode: 2.05 since it is the only value that appears more than once

b. Calculate the range, interquartile range, sample variance, and sample standard deviation.

Range = 7.84 – 1.68 = 6.16

Interquartile range = Q3 – Q1.

Q1 is at the following location: (15+1)/4 = 4th location = $1.89 billion

Q3 is at the following location: 3*(15+1)/4 = 12th location = $4.17 billion

So Interquartile range = 4.17 – 1.89 = 2.28

NOTE: make sure you understand what quartiles mean!

Sample variance = 3.3321 (See below)

Sample standard deviation = 1.8254 (see below)

	Value ($ billions)	X-mean	squared
	1.68	-1.61	2.5921
	1.81	-1.48	2.1904
	1.85	-1.44	2.0736
	1.89	-1.40	1.9600
	2.05	-1.24	1.5376
	2.05	-1.24	1.5376
	2.08	-1.21	1.4641
	2.74	-0.55	0.3025
	3.21	-0.08	0.0064
	3.30	0.01	0.0001
	4.00	0.71	0.5041
	4.17	0.88	0.7744
	4.20	0.91	0.8281
	6.48	3.19	10.1761
	7.84	4.55	20.7025
TOTAL	49.35		46.6496
MEAN	3.29	Variance	3.3321	=46.6496/(15-1)
		SD	1.8254	=sqrt(3.3321)
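
The hand calculations for question 16 can be cross-checked with Python's standard `statistics` module. This is only a sketch: it assumes Python 3.8 or later, and it uses the "exclusive" quantile method because that method places the quartiles at positions based on (n + 1), matching the location rule used in the answer above.

```python
import statistics

values = [1.68, 1.81, 1.85, 1.89, 2.05, 2.05, 2.08, 2.74,
          3.21, 3.30, 4.00, 4.17, 4.20, 6.48, 7.84]

mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)

# Quartiles located at (n + 1)/4 and 3(n + 1)/4, as in the answer key.
q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")

data_range = max(values) - min(values)
variance = statistics.variance(values)  # sample variance, divides by n - 1
std_dev = statistics.stdev(values)      # sample standard deviation

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
print(f"range = {data_range:.2f}, IQR = {q3 - q1:.2f}")
print(f"variance = {variance:.4f}, SD = {std_dev:.4f}")
```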

c. Compute the coefficient of skewness for these data and interpret. [Ignore]

Just use the Data Analysis portion of Excel and interpret. It is 1.48, so there is a right skew of the data (slightly long right hand tail)

17) The following graphic of residential housing data (selling price and size in square feet) indicates:

a correlation close to -1

a correlation close to 0 (no relation between the two variables)

a correlation close to 1

a negative relationship between the two variables

18) The Polk Company reported that the average age of a car on U.S. roads in a recent year was 7.5 years.

a) Suppose the distribution of ages of cars on U.S. roads is approximately bell-shaped. If 99.7% of the ages are between 1 year and 14 years, what is the standard deviation of car age?

We know that 99.7% of the data fall within 3 standard deviations of the mean, so 3 standard deviations = 6.5 years (I found that from 14 – 7.5 or 7.5 – 1). The standard deviation is therefore 6.5/3 = 2.167 years.

b) Suppose the standard deviation is 1.7 years and the mean is 7.5 years. Between what two values would 95% of the car ages fall?

95% of the data falls within 2 standard deviations of the mean.

So 7.5 + 2 * 1.7 = 10.9, and 7.5 – 2 * 1.7 = 4.1.

19) A large manufacturing firm tests job applicants who recently graduated from college. The test scores are bell shaped with a mean of 500 and a standard deviation of 50.

a) What proportion of people get scores between 400 and 600?

Points are 2 standard deviations away, so 95%

b) What proportion of people get scores higher than 450?

450 is 1 standard deviation below the mean, so 68%/2 + 50% = 84%

c) Management is considering placing a new hire in an upper level management position if the person scores in the upper 0.15% of the distribution. What is the lowest score a college graduate can earn to qualify for the position?

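The upper 0.15% is the area more than 3 standard deviations above the mean, since (100% – 99.7%)/2 = 0.15%. So (X – 500)/50 = 3, and X = 500 + 3 * 50 = 650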
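
The empirical rule used in question 19 is an approximation to the normal distribution. The sketch below, which assumes the scores are exactly normal with mean 500 and standard deviation 50, uses `statistics.NormalDist` from the Python standard library to show how close the rule-of-thumb answers are to the exact normal values. For the test itself, the empirical-rule answers above are the ones to use.

```python
from statistics import NormalDist

scores = NormalDist(mu=500, sigma=50)

# a) proportion between 400 and 600 (empirical rule: about 95%)
between_400_600 = scores.cdf(600) - scores.cdf(400)

# b) proportion above 450 (empirical rule: about 84%)
above_450 = 1 - scores.cdf(450)

# c) score that cuts off the upper 0.15% (empirical rule: 650)
cutoff_top = scores.inv_cdf(1 - 0.0015)

print(f"between 400 and 600: {between_400_600:.4f}")
print(f"above 450: {above_450:.4f}")
print(f"cutoff for upper 0.15%: {cutoff_top:.1f}")
```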

20) According to the Bureau of Labor Statistics, the average annual salary of a worker in Detroit, Michigan, is $35,748. Suppose the median annual salary for a worker in this group is $31,369 and the mode is $29,500.

a) Is the distribution of salaries for this group skewed? If so, how and why?

Since these three measures are not equal, the distribution is skewed. The distribution is skewed to the right because the mean is greater than the median.

b) Which of these measures of central tendency would you use to describe these data? Why?

Often, the median is preferred in reporting income data because it yields information about the middle of the data while ignoring extremes.

21) True or False: The median is the most frequently occurring value in a set of data.

False – the MODE is the most frequently occurring value, not the median

22) True or False: A disadvantage of the mean as the measure of central tendency is that it is affected by extremely large or extremely small values in the data set.

True – that’s why you use the median for data sets with outliers!

23) True or False: The variance is the average of the squared deviations about the arithmetic mean for a set of numbers.

True

24) What is the median for the following five numbers? 223, 264, 216, 218, 229

Put the data in order: 216, 218, 223, 229, 264

The center number is the median = 223

25) The second quartile of a data set is always equal to its ________.

Median (by definition)

26) The sum of deviations from the mean for a data set is equal to __________.

Zero…that’s why we have to square the deviations to find the variance and standard deviation!

27) Scores obtained by students on an advanced placement test have a symmetric, mound-shaped (bell-shaped) distribution with a mean of 70 and a standard deviation of 10. What proportion of students received between 60 and 80 points?

60 is 1 standard deviation to the left of center and 80 is 1 standard deviation to the right, so by the empirical rule the answer is about 68%

28) For the previous problem, what is the proportion of students who received less than 50 points?

Find the Z point for 50: (50 – 70)/10 = -2. The area between -2 and +2 is 95%, so the area “less than 50” is (100% - 95%)/2 = 2.5%

29) The following joint probability table contains a breakdown on the age and gender of U.S. physicians in a recent year, as reported by the American Medical Association.

	Age of U.S. Physicians
	< 35	35 - 44	45 - 54	55 - 64	> 65	TOTAL
Male	0.11	0.20	0.19	0.12	0.16	0.78
Female	0.07	0.08	0.04	0.02	0.01	0.22
TOTAL	0.18	0.28	0.23	0.14	0.17	1.00

a) What is the probability that one randomly selected physician is 35–44 years old?

P(35 – 44) = .28/1.00 = .28

NOTE: in a probability table (as opposed to a frequency table like the one in example #31), you don’t really have to be dividing by the total since the total is 1.00. I write it in to remind you that you MUST divide by something when you are finding probabilities!

b) What is the probability that one randomly selected physician is both a woman and 45–54 years old?

P(woman and 45 – 54) = intersection = 0.04/1.00 = .04

c) What is the probability that one randomly selected physician is a man or is 35–44 years old?

P(man or 35 – 44) = .78 + .28 - .20 = .86/1.00 = .86

d) What is the probability that one randomly selected physician is less than 35 years old or 55–64 years old?

P(< 35 or 55 – 64) = .18/1.00 + .14/1.00 = .32 (NOTE: no need to subtract anything since there are no “common points”…that is, those two categories are mutually exclusive)

e) What is the probability that one randomly selected physician is a woman if she is 45–54 years old?

P(woman | 45 – 54) = .04/.23 = 0.1739

f) What is the probability that a randomly selected physician is neither a woman nor 55–64 years old?

P(not woman and not 55 – 64) = P(man and not 55 – 64)

= (.11+.2+.19+.16)/1.00 = .66
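
All six parts of question 29 come from the same joint table, so they can be checked together. The following Python sketch stores the joint probabilities and rebuilds each answer from the addition rule, the conditional-probability rule, and the complement rule. The helper names are illustrative.

```python
age_groups = ["< 35", "35 - 44", "45 - 54", "55 - 64", "> 65"]
joint = {
    "Male":   [0.11, 0.20, 0.19, 0.12, 0.16],
    "Female": [0.07, 0.08, 0.04, 0.02, 0.01],
}

def p_gender(gender):
    """Marginal probability of a gender (row total)."""
    return sum(joint[gender])

def p_age(age):
    """Marginal probability of an age group (column total)."""
    i = age_groups.index(age)
    return sum(row[i] for row in joint.values())

def p_both(gender, age):
    """Joint probability (one cell of the table)."""
    return joint[gender][age_groups.index(age)]

answers = {
    "a": p_age("35 - 44"),
    "b": p_both("Female", "45 - 54"),
    "c": p_gender("Male") + p_age("35 - 44") - p_both("Male", "35 - 44"),
    "d": p_age("< 35") + p_age("55 - 64"),  # mutually exclusive, nothing to subtract
    "e": p_both("Female", "45 - 54") / p_age("45 - 54"),
    "f": 1 - (p_gender("Female") + p_age("55 - 64") - p_both("Female", "55 - 64")),
}

for part, value in answers.items():
    print(f"{part}) {value:.4f}")
```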

30) Purchasing Survey asked purchasing professionals what sales traits impressed them most in a sales representative. Seventy-eight percent selected "thoroughness." Forty percent responded "knowledge of your own product." The purchasing professionals were allowed to list more than one trait. Suppose 27% of the purchasing professionals listed both "thoroughness" and "knowledge of your own product" as sales traits that impressed them most. A purchasing professional is randomly sampled.

a) Make a probability table including the above information.

b) What is the probability that the professional selected "thoroughness" or "knowledge of your own product"?

	Mentioned knowledge	Didn’t mention knowledge	TOTAL
Mentioned thoroughness	.27	.78 - .27 = .51	.78
Didn’t mention thoroughness	.40 - .27 = .13	.60 - .51 = .09	1 - .78 = .22
TOTAL	.40	1 - .40 = .60	1.00

So P(thorough or knowledge) = (.78 + .40 - .27)/1.00 = .91

c) What is the probability that the professional selected neither "thoroughness" nor "knowledge of your own product"?

P(neither thorough nor knowledge) = P(not thorough and not knowledge)

= intersection = 0.09/1.00 = 0.09

d) If it is known that the professional selected "thoroughness," what is the probability that the professional selected "knowledge of your own product"?

P(knowledge | thorough) = .27/.78 = 0.346

e) What is the probability that the professional did not select "thoroughness" and did select "knowledge of your own product"?

P(didn’t mention thoroughness and did mention knowledge) = intersection

= 0.13/1.00 = 0.13
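
The probability table in question 30 is built from only three given numbers. As a sketch of that reasoning in Python (variable names are illustrative), every other cell follows by subtraction:

```python
# The three values given in the problem.
p_thorough = 0.78
p_knowledge = 0.40
p_both = 0.27

# Remaining cells of the 2 x 2 table, filled in by subtraction.
p_thorough_only = p_thorough - p_both    # thoroughness but not knowledge
p_knowledge_only = p_knowledge - p_both  # knowledge but not thoroughness
p_either = p_thorough + p_knowledge - p_both
p_neither = 1 - p_either

# Conditional probability for part d.
p_knowledge_given_thorough = p_both / p_thorough

print(f"either: {p_either:.2f}, neither: {p_neither:.2f}")
print(f"knowledge only: {p_knowledge_only:.2f}")
print(f"P(knowledge | thorough): {p_knowledge_given_thorough:.3f}")
```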

31) The table below contains data from a sample of 200 people regarding opinion about the latest congressional plan to eliminate anti-trust exemptions for professional baseball (broken down by gender).

	OPINION ABOUT THE PLAN
	For	Neutral	Against	Totals
Female	38	54	12	104
Male	12	36	48	96
Totals	50	90	60	200

Please show your work for parts "a" through "e" or no credit will be given!

a) What is the probability that a person selected at random is for the plan?

P(for) = 50/200 = .25

b) If we know that the person is a female, what is the probability that the person is for the plan?

P(for | female) = 38/104 = .365

c) What is the probability that the person is male and against the plan?

P(male and against) = 48/200 = .24

d) What is the probability that the person is male or is neutral about the plan?

P(male or neutral) = (96+90-36)/200 = .75

e) Is opinion about the plan related to gender, or are opinion and gender independent? Please use statistical concepts and numerical calculations in your answer, or no credit will be given.

Check to see if P(A) = P(A|B) = P(A|C) etc.

Is P(for the plan) = P(for | female)? .25 ≠ .365 so NOT independent
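
The independence check in part e compares a marginal probability with a conditional one. A minimal Python sketch of that comparison, run over every opinion and gender in the table rather than just one pair, might look like this (the function names are illustrative):

```python
counts = {
    "Female": {"For": 38, "Neutral": 54, "Against": 12},
    "Male":   {"For": 12, "Neutral": 36, "Against": 48},
}
total = sum(sum(row.values()) for row in counts.values())

def p_opinion(opinion):
    """Marginal probability of an opinion."""
    return sum(row[opinion] for row in counts.values()) / total

def p_opinion_given(opinion, gender):
    """Conditional probability of an opinion given a gender."""
    return counts[gender][opinion] / sum(counts[gender].values())

# Independent only if P(opinion) = P(opinion | gender) in every cell.
independent = all(
    abs(p_opinion(op) - p_opinion_given(op, g)) < 1e-9
    for g in counts for op in counts[g]
)

print(f"P(for) = {p_opinion('For'):.3f}")
print(f"P(for | female) = {p_opinion_given('For', 'Female'):.3f}")
print("independent" if independent else "NOT independent")
```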

32) True or False: If two events are independent, the joint probability of the two events is always equal to the product of the marginal probabilities of two events.

True – Think about it…P(A and B) = P(A) * P(B | A). But if A and B are independent, then P(B | A) is the same as P(B). In other words, if A and B are independent, then P(A and B) = P(A) * P(B). We will use that in chapter 5 and more!

33) True or False: If the conditional probability of an event A given another event B is same as the marginal probability of the event A, then events A and B are mutually exclusive.

False – as I just said, if P(A | B) = P(A), that means that A and B are independent…that doesn’t mean that A and B are mutually exclusive. Remember: if A and B are mutually exclusive, then if one happens, the other can’t…in other words, P(A and B) = 0.

34) If the occurrence or non-occurrence of one event does not affect the occurrence or non-occurrence of another event, the two events are ________________________.

Independent (by definition)

35) A listing of all elementary outcomes (i.e. the outcomes which cannot be broken down into other events) of an experiment (i.e. a decision making situation under uncertainty) is called a __________.

sample space

36) How many different combinations of a 3-member debating team can be formed from a group of 16 qualified students?

16C3 = 16!/(3! * (16-3)!) = (16 * 15 * 14)/(3 * 2 * 1) = 560
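
The combination count can be confirmed in Python, both with the built-in `math.comb` (available from Python 3.8) and with the factorial formula written out:

```python
import math

students, team_size = 16, 3

# Built-in combination count.
teams = math.comb(students, team_size)

# Same count from the formula n! / (r! * (n - r)!).
by_formula = math.factorial(students) // (
    math.factorial(team_size) * math.factorial(students - team_size)
)

print(teams, by_formula)
```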

[Figure for question 17: scatter plot of Selling Price ($1,000) versus Square Feet]

ADMN 210

Review for Midterm #1

As mentioned earlier, the mid-term will have conceptual and quantitative multiple-choice questions. You need to read all 4 chapters and you need to be able to solve problems in all 4 chapters in order to do well in this test.

The following are for review and learning purposes only. I am not indicating that identical or similar problems will be in the test. As I have indicated many times, all the exams in this course will have multiple-choice questions and problems.

Suggestion: treat this review set as you would an actual test. Sit down with your one page of notes and your calculator, and give it a try. That way you will know what areas you still need to study.

1) Classify each of the following as nominal, ordinal, interval, or ratio data.

a. The time required to produce each tire on an assembly line

b. The number of quarts of milk a family drinks in a month

c. The ranking of four machines in your plant after they have been designated as excellent, good, satisfactory, and poor

d. The telephone area code of clients in the United States

e. The age of each of your employees

f. The dollar sales at the local pizza house each month

g. An employee’s identification number

h. The response time of an emergency unit

2) True or False: The highest level of data measurement is the ratio-level measurement.

3) True or False: Interval- and ratio-level data are also referred to as categorical data.

4) A small portion or a subset of the population on which data is collected for conducting statistical analysis is called __________.

5) One of the advantages for taking a sample instead of conducting a census is this:

a sample is more accurate than census

a sample is difficult to take

a sample cannot be trusted

a sample can save money when data collection process is destructive

6) Selection of the winning numbers in a lottery is an example of __________.

convenience sampling

random sampling

nonrandom sampling

regulatory sampling

7) A type of random sampling in which the population is divided into non-overlapping subpopulations is called __________.

stratified random sampling

cluster sampling

systematic random sampling

regulatory sampling

8) A type of random sampling in which every kth item (where k is some number) in the population is selected for inclusion in the sample is called __________.

stratified random sampling

cluster sampling

systematic sampling

regulatory sampling

9) Judgment sampling is an example of __________.

convenience sampling

random sampling

nonrandom (non-probabilistic) sampling

justice department sampling

10) For the following data, construct a frequency distribution with six classes.

57 23 35 18 21

26 51 47 29 21

46 43 29 23 39

50 41 19 36 28

31 42 52 29 18

28 46 33 28 20

11) What type of graph would be most appropriate for the frequency distribution above?

Pie chart

Bar chart

Pareto diagram

Histogram

12) For the following frequency distribution, determine the relative frequency, percent, and the cumulative frequency.

*Round your answer to 3 decimal places, the tolerance is +/-0.001.

Class Interval Frequency

20–under 25 17

25–under 30 20

30–under 35 16

35–under 40 15

40–under 45 8

45–under 50 6

TOTAL 82

13) True or False: Frequency distribution is a summary of data presented in the form of class intervals and frequencies.

14) True or False: The range of a data set is defined as the difference between the mean and the median.

15) True or False: The sum of the relative frequencies of a grouped data set is always equal to one.

16) The U.S. Department of the Interior releases figures on mineral production. Following are the values (in billions of dollars) of the 15 leading states in nonfuel mineral production in the United States in 2008.

1.68, 1.81, 1.85, 1.89, 2.05, 2.05, 2.08, 2.74, 3.21, 3.30, 4.00, 4.17, 4.20, 6.48, 7.84

a. Calculate the mean, median, and mode.

b. Calculate the range, interquartile range, sample variance, and sample standard deviation.

c. Compute the coefficient of skewness for these data and interpret.

17) The following graphic of residential housing data (selling price and size in square feet) indicates:

a correlation close to -1

a correlation close to 0 (no relation between the two variables)

a correlation close to 1

a negative relationship between the two variables

18) The Polk Company reported that the average age of a car on U.S. roads in a recent year was 7.5 years.

a) Suppose the distribution of ages of cars on U.S. roads is approximately bell-shaped. If 99.7% of the ages are between 1 year and 14 years, what is the standard deviation of car age?

b) Suppose the standard deviation is 1.7 years and the mean is 7.5 years. Between what two values would 95% of the car ages fall?

19) A large manufacturing firm tests job applicants who recently graduated from college. The test scores are bell shaped with a mean of 500 and a standard deviation of 50.

a) What proportion of people get scores between 400 and 600?

b) What proportion of people get scores higher than 450?

c) Management is considering placing a new hire in an upper level management position if the person scores in the upper 0.15% of the distribution. What is the lowest score a college graduate can earn to qualify for the position?

20) According to the Bureau of Labor Statistics, the average annual salary of a worker in Detroit, Michigan, is $35,748. Suppose the median annual salary for a worker in this group is $31,369 and the mode is $29,500.

a) Is the distribution of salaries for this group skewed? If so, how and why?

b) Which of these measures of central tendency would you use to describe these data? Why?

21) True or False: The median is the most frequently occurring value in a set of data.

22) True or False: A disadvantage of the mean as the measure of central tendency is that it is affected by extremely large or extremely small values in the data set.

23) True or False: The variance is the average of the squared deviations about the arithmetic mean for a set of numbers.

24) What is the median for the following five numbers? 223, 264, 216, 218, 229

25) The second quartile of a data set is always equal to its ________.

26) The sum of deviations from the mean for a data set is equal to __________.

27) Scores obtained by students on an advanced placement test have a symmetric, mound-shaped (bell-shaped) distribution with a mean of 70 and a standard deviation of 10. What proportion of students received between 60 and 80 points?

28) For the previous problem, what is the proportion of students who received less than 50 points?

29) The following joint probability table contains a breakdown on the age and gender of U.S. physicians in a recent year, as reported by the American Medical Association.

	Age of U.S. Physicians
	< 35	35 - 44	45 - 54	55 - 64	> 65	TOTAL
Male	0.11	0.20	0.19	0.12	0.16	0.78
Female	0.07	0.08	0.04	0.02	0.01	0.22
TOTAL	0.18	0.28	0.23	0.14	0.17	1.00

a) What is the probability that one randomly selected physician is 35–44 years old?

b) What is the probability that one randomly selected physician is both a woman and 45–54 years old?

c) What is the probability that one randomly selected physician is a man or is 35–44 years old?

d) What is the probability that one randomly selected physician is less than 35 years old or 55–64 years old?

e) What is the probability that one randomly selected physician is a woman if she is 45–54 years old?

f) What is the probability that a randomly selected physician is neither a woman nor 55–64 years old?

30) Purchasing Survey asked purchasing professionals what sales traits impressed them most in a sales representative. Seventy-eight percent selected "thoroughness." Forty percent responded "knowledge of your own product." The purchasing professionals were allowed to list more than one trait. Suppose 27% of the purchasing professionals listed both "thoroughness" and "knowledge of your own product" as sales traits that impressed them most. A purchasing professional is randomly sampled.

a) Make a probability table including the above information.

b) What is the probability that the professional selected "thoroughness" or "knowledge of your own product"?

c) What is the probability that the professional selected neither "thoroughness" nor "knowledge of your own product"?

d) If it is known that the professional selected "thoroughness," what is the probability that the professional selected "knowledge of your own product"?

e) What is the probability that the professional did not select "thoroughness" and did select "knowledge of your own product"?

31) From a previous midterm: The table below contains data from a sample of 200 people regarding opinion about the latest congressional plan to eliminate anti-trust exemptions for professional baseball (broken down by gender).

	OPINION ABOUT THE PLAN
	For	Neutral	Against	Totals
Female	38	54	12	104
Male	12	36	48	96
Totals	50	90	60	200

Please show your work for parts "a" through "e" or no credit will be given!

a) What is the probability that a person selected at random is for the plan?

b) If we know that the person is a female, what is the probability that the person is for the plan?

c) What is the probability that the person is male and is against the plan?

d) What is the probability that the person is male or is neutral about the plan?

e) Is opinion about the plan related to gender, or are opinion and gender independent? Please use statistical concepts and numerical calculations in your answer.

32) True or False: If two events are independent, the joint probability of the two events is always equal to the product of the marginal probabilities of two events.

33) True or False: If the conditional probability of an event A given another event B is same as the marginal probability of the event A, then events A and B are mutually exclusive.

34) If the occurrence or non-occurrence of one event does not affect the occurrence or non-occurrence of another event, the two events are ________________________.

35) A listing of all elementary outcomes (i.e. the outcomes which cannot be broken down into other events) of an experiment (i.e. a decision making situation under uncertainty) is called a __________.

36) How many different combinations of a 3-member debating team can be formed from a group of 16 qualified students?

[Figure for question 17: scatter plot of Selling Price ($1,000) versus Square Feet]

Name: XXXXXX

Analysis Assignment

6/6/2017

In this analysis assignment, we will use the Chi-Square to analyze whether a management

training program is related to the promotion of managers and use the ANOVA to analyze

whether the satisfaction rate of employees in four different offices is the same.

1. Chi-Square

In the Chi-square test, 40 employees in total were not promoted. If promotion and training were unrelated, 17.6 of these 40 employees would be expected not to have participated in the training program and 22.4 to have participated. In the observed data, however, 27 of the 40 (67.5% of the employees who were not promoted) did not participate in the training program, while 13 (32.5%) did.

In addition, 60 employees in total were promoted. Of these 60, 26.4 would be expected not to have participated in the program and 33.6 to have participated. In the observed data, 17 (28.3% of the promoted employees) did not participate in the program and 43 (71.7%) did.

Note: the Pearson Chi-square significance is < 0.05, so chance is not the only factor that causes the differences.

Therefore, more trained employees were promoted than would be expected if the training made no difference. Clearly, the management training program is related to the promotion of employees.
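The expected counts and row percentages quoted above can be reproduced from the four observed counts alone. The following is a minimal Python sketch, assuming the observed counts 27, 13, 17 and 43 from the crosstabulation; the variable names are illustrative.

```python
# Observed counts from the crosstabulation: rows = Promoted (No, Yes),
# columns = Management_Training_Program (No, Yes).
observed = [[27, 13],
            [17, 43]]

row_totals = [sum(row) for row in observed]        # not promoted, promoted
col_totals = [sum(col) for col in zip(*observed)]  # not trained, trained
n = sum(row_totals)

# Expected count if promotion and training were unrelated:
# row total * column total / grand total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Share of each promotion group in each training column ("% within Promoted").
row_pct = [[100 * cell / r for cell in row]
           for row, r in zip(observed, row_totals)]

for label, exp_row, pct_row in zip(["Not promoted", "Promoted"], expected, row_pct):
    print(label, [round(e, 1) for e in exp_row], [round(p, 1) for p in pct_row])
```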

2. ANOVA

In the ANOVA test, we analyze the satisfaction rate data from the different office locations to see whether the mean satisfaction rate is the same in the four offices. According to the output, the p-value is 0.034, which is less than 0.05. Therefore, we reject the null hypothesis that the samples come from populations with the same mean, and conclude that the satisfaction rates in the four offices are not all the same. Moreover, from the scattergram below, we can see that office 3 has the smallest variance (the data within the group differ the least) and contains the smallest data value, while office 2 has the largest variance (the data within the group differ the most) and contains the largest data value.

Note: the p-value in the first subset is > 0.05, so its offices do not differ distinctly; the p-value in the second subset is < 0.05, so its offices do differ distinctly.
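The F statistic behind a one-way ANOVA compares the variation between group means with the variation within groups. The sketch below is illustrative only: the function name `one_way_anova_f` is hypothetical, and the three small groups are made-up toy numbers, not the assignment's satisfaction data, which is not reproduced here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares: group size times squared distance
    # of each group mean from the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares: squared distance of each value
    # from its own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data (assumption, NOT the assignment's data): three groups of three values.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f_stat, df_b, df_w)
```

A large F relative to the F distribution with those degrees of freedom gives a small p-value, which is the quantity SPSS reports.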

[Figure: Scatter Gram of monthly satisfaction rating of different offices; axis ticks run from 0 to 60 and from 0 to 5]

XXXXXXXXXXX ARE 112

Analysis Assignment A

For this analysis assignment, we were introduced to the program SPSS, a statistical tool that helps us compute statistics for a given data set. For Data Set 1, we used the Chi-Square test to see whether there was a relationship between a management training program and promotion. The most important values in the output are the “count” and the “expected count.” The “count” shows how many employees actually fall into each combination of promotion and training. The “expected count” shows how many employees would fall into each combination if promotion and training were unrelated. The difference between the actual count and the expected count is the “residual.” All four cells have the same residual magnitude, 9.4, but two are positive and two are negative. The positive residuals belong to the employees who were not promoted and did not have management training and to those who were promoted and did have management training. The negative residuals belong to the employees who were not promoted but had management training and to those who were promoted without management training.
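The residuals can be checked by subtracting each expected count from the matching observed count. This is a small Python sketch using the counts and expected counts reported in the SPSS crosstabulation; the dictionary layout, keyed by (Promoted, Management_Training_Program), is an illustrative choice.

```python
# Keys are (Promoted, Management_Training_Program).
observed = {("No", "No"): 27, ("No", "Yes"): 13,
            ("Yes", "No"): 17, ("Yes", "Yes"): 43}
expected = {("No", "No"): 17.6, ("No", "Yes"): 22.4,
            ("Yes", "No"): 26.4, ("Yes", "Yes"): 33.6}

# Residual = observed count minus expected count, rounded as in the output.
residuals = {cell: round(observed[cell] - expected[cell], 1) for cell in observed}

# Cells holding more employees than independence would predict.
positive_cells = sorted(cell for cell, r in residuals.items() if r > 0)

print(residuals)
print(positive_cells)
```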

For Data Set 2, we used a different test, the ANOVA test, to compare monthly employee satisfaction rates at offices in different locations. The test gives a p-value of 0.034, which is below 0.05, so the mean satisfaction rates are not the same in all offices. The four office locations are placed into two “subsets.” Each subset is homogeneous, meaning that the offices within it do not differ significantly from one another. From the homogeneous subsets in the output, we were able to identify which offices had similar employee satisfaction rates and which had differing ones. Subset 1 contained offices 1, 3, and 4, while subset 2 contained offices 2, 3, and 4. This tells us that the only offices that differed in employee satisfaction were offices 1 and 2.

XXXXXXXXXX ARE 112 Analysis 6/7/17

Promoted * Management_Training_Program Crosstabulation

                                                        Management_Training_Program
Promoted                                                No        Yes       Total
No      Count                                           27        13        40
        Expected Count                                  17.6      22.4      40.0
        % within Promoted                               67.5%     32.5%     100.0%
        % within Management_Training_Program            61.4%     23.2%     40.0%
        % of Total                                      27.0%     13.0%     40.0%
        Residual                                        9.4       -9.4
Yes     Count                                           17        43        60
        Expected Count                                  26.4      33.6      60.0
        % within Promoted                               28.3%     71.7%     100.0%
        % within Management_Training_Program            38.6%     76.8%     60.0%
        % of Total                                      17.0%     43.0%     60.0%
        Residual                                        -9.4      9.4
Total   Count                                           44        56        100
        Expected Count                                  44.0      56.0      100.0
        % within Promoted                               44.0%     56.0%     100.0%
        % within Management_Training_Program            100.0%    100.0%    100.0%
        % of Total                                      44.0%     56.0%     100.0%

The Chi-squared analysis and the management training program crosstabulation show a statistically significant difference between the expected and observed promotions. The crosstabulation shows that 17.6 employees who did not complete the training program would be expected not to be promoted. The actual number of employees who were not trained and were not promoted was 27. This difference is significant because the Chi-squared test gave a significance value below .001: there is less than a one in a thousand chance of seeing data like this if training and promotion were unrelated. Among trained employees, more were promoted than expected (43 actual against 33.6 expected), and correspondingly fewer untrained employees were promoted than expected (17 against 26.4). These differences are statistically significant; in this data, going to the management training is associated with a higher chance of getting a promotion.

Chi-Square Tests

                            Value       df   Asymptotic Significance   Exact Sig.   Exact Sig.
                                             (2-sided)                 (2-sided)    (1-sided)
Pearson Chi-Square          14.942(a)   1    .000
Continuity Correction(b)    13.395      1    .000
Likelihood Ratio            15.211      1    .000
Fisher's Exact Test                                                    .000         .000
N of Valid Cases            100

a. 0 cells (0.0%) have expected count less than 5. The minimum expected count is 17.60.
b. Computed only for a 2x2 table
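The first three statistics in this table can be recomputed from the observed counts with the standard textbook formulas. The sketch below uses only Python's standard library. It assumes the 2x2 counts from the crosstabulation, and for the p-value it relies on the fact that with one degree of freedom the chi-square upper tail equals `erfc(sqrt(x / 2))`.

```python
import math

observed = [[27, 13],
            [17, 43]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Flatten to (observed, expected) pairs, one per cell.
cells = [(o, e)
         for o_row, e_row in zip(observed, expected)
         for o, e in zip(o_row, e_row)]

# Pearson chi-square: sum of (O - E)^2 / E.
pearson = sum((o - e) ** 2 / e for o, e in cells)
# Yates continuity correction (2x2 tables): shrink |O - E| by 0.5 first.
continuity = sum((abs(o - e) - 0.5) ** 2 / e for o, e in cells)
# Likelihood ratio statistic: 2 * sum of O * ln(O / E).
likelihood_ratio = 2 * sum(o * math.log(o / e) for o, e in cells)

# Upper-tail probability of the Pearson statistic with df = 1.
p_value = math.erfc(math.sqrt(pearson / 2))

print(round(pearson, 3), round(continuity, 3), round(likelihood_ratio, 3))
print(p_value < 0.001)
```

SPSS prints the significance as .000 because it rounds to three decimals; the recomputed p-value is small but not exactly zero.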


The ANOVA analysis is used to determine whether the level of employee satisfaction is the same in the four offices. About the same level of satisfaction is seen in offices 1, 3, and 4. Offices 2, 3, and 4 also have very similar levels of satisfaction. There is a noticeable difference between the satisfaction of offices 1 and 2: office 1 had a mean satisfaction of 39.25 and office 2 had a mean of 45.88. The ANOVA analysis gives the manager insight into this difference. It should prompt the manager to look into the two offices and determine why the satisfaction varies.

Monthly_Satisfaction_Rating
Tukey HSD(a)

                      Subset for alpha = 0.05
Office Code    N      1         2
1              8      39.25
3              8      39.38     39.38
4              8      41.38     41.38
2              8                45.88
Sig.                  .813      .053

Means for groups in homogeneous subsets are displayed.
a. Uses Harmonic Mean Sample Size = 8.000.
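One way to read the homogeneous subsets is that two offices are flagged as different only if no subset contains both of them. A small Python sketch of that reading follows, using the subset membership shown in the table; the variable names are illustrative.

```python
from itertools import combinations

# Office codes in each homogeneous subset, as listed in the Tukey HSD table.
subsets = [{1, 3, 4}, {3, 4, 2}]

offices = sorted(set().union(*subsets))

# A pair differs at alpha = 0.05 when the two offices never share a subset.
differing_pairs = [
    (a, b)
    for a, b in combinations(offices, 2)
    if not any(a in s and b in s for s in subsets)
]

print(differing_pairs)
```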

Analysis Assignment (1)
ARE 112 SPSS A
ARE112 AA

Employee_Ref   Promoted   Management_Training_Program
1              No         Yes
2              No         Yes
3              No         Yes
4              No         Yes
5              No         No
6              No         No
7              No         No
8              No         No
9              No         No
10             No         No
140            No         No       Here is where the "HIDE" command was used
141            No         No
142            No         No
143            No         No
144            Yes        No
145            Yes        No
146            Yes        No
147            Yes        Yes
148            Yes        Yes
149            Yes        Yes
150            Yes        Yes

We are testing the idea that Promotion = function (Going to the training program).

Formula: =COUNTIF(B2:B151,$A$154)

#        Promoted   Training Program
Yes      96         83
No       54         67
Total    150        150

Chi square test with Excel formula: =CHISQ.TEST(C154:C155,B154:B155)
p-value = 0.02701

In Per Cent
#        Promoted   Training Program
Yes      64.0%      55.3%
No       36.0%      44.7%
Total    100.0%     100.0%

ARE 122 - Spring 2020 Chi Square Test

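The spreadsheet's p-value can be reproduced outside Excel. The sketch below mirrors the CHISQ.TEST call with the Training Program counts as the actual range and the Promoted counts as the expected range, and assumes one degree of freedom for a two-row, one-column range. Note that this compares the two sets of Yes/No totals with each other, which is a different calculation from the 2x2 crosstabulation test in the SPSS output.

```python
import math

# Yes / No totals from the summary table.
promoted = [96, 54]   # expected range (B154:B155)
training = [83, 67]   # actual range (C154:C155)

# Percentages as shown in the "In Per Cent" table.
total = sum(promoted)
promoted_pct = [round(100 * x / total, 1) for x in promoted]
training_pct = [round(100 * x / total, 1) for x in training]

# Chi-square statistic: sum of (actual - expected)^2 / expected.
chi_sq = sum((a - e) ** 2 / e for a, e in zip(training, promoted))

# Upper-tail probability with df = 1 (assumption stated above).
p_value = math.erfc(math.sqrt(chi_sq / 2))

print(promoted_pct, training_pct)
print(round(chi_sq, 4), round(p_value, 5))
```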

blog10

Chi_Sq_DATA

ANOVA_DATA
