Dependent Samples t Test for Repeated Measures (data)

Participant   Pretest   Posttest
1             15        17
2             49        36
3             21        21
4             17        14
5             21        15
6             25        25
7             20        22
8             56        38
9             16        19
10            31        31
11            28        33
12            44        39
13            35        29
14            48        41
15            32        31
16            37        27
17            45        21
18            29        27
19            28        32
20            34        20

CHAPTER 7

Introduction to t Tests: Single Sample and Dependent Means

Chapter Outline

✪ The t Test for a Single Sample
✪ The t Test for Dependent Means
✪ Assumptions of the t Test for a Single Sample and the t Test for Dependent Means
✪ Effect Size and Power for the t Test for Dependent Means
✪ Controversy: Advantages and Disadvantages of Repeated-Measures Designs
✪ Single Sample t Tests and Dependent Means t Tests in Research Articles
✪ Summary
✪ Key Terms
✪ Example Worked-Out Problems
✪ Practice Problems
✪ Using SPSS
✪ Chapter Notes

At this point, you may think you know all about hypothesis testing. Here's a surprise: what you know will not help you much as a researcher. Why? The procedures for testing hypotheses described up to this point were, of course, absolutely necessary for what you will now learn. However, these procedures involved comparing a group of scores to a known population. In real research practice, you often compare two or more groups of scores to each other, without any direct information about populations. For example, you may have two scores for each person in a group of people, such as scores on an anxiety test before and after psychotherapy or number of familiar versus unfamiliar words recalled in a memory experiment. Or you might have one score per person for two groups of people, such as an experimental group and a control group in a study of the effect of sleep loss on problem solving, or comparing the self-esteem test scores of a group of 10-year-old girls to a group of 10-year-old boys.

t test: hypothesis-testing procedure in which the population variance is unknown; it compares t scores from a sample to a comparison distribution called a t distribution.

These kinds of research situations are among the most common in psychology, where usually the only information available is from samples. Nothing is known about the populations that the samples are supposed to come from. In particular, the researcher does not know the variance of the populations involved, which is a crucial ingredient in Step ❷ of the hypothesis-testing process (determining the characteristics of the comparison distribution).

In this chapter, we first look at the solution to the problem of not knowing the population variance by focusing on a special situation: comparing the mean of a single sample to a population with a known mean but an unknown variance. Then, after describing how to handle this problem of not knowing the population variance, we go on to consider the situation in which there is no known population at all—the situation in which all we have are two scores for each of a number of people.

The hypothesis-testing procedures you learn in this chapter, those in which the population variance is unknown, are examples of t tests. The t test is sometimes called “Student’s t ” because its main principles were originally developed by William S. Gosset, who published his research articles anonymously using the name “Student” (see Box 7–1).

The t Test for a Single Sample

Let's begin with an example. Suppose your college newspaper reports an informal survey showing that students at your college study an average of 17 hours per week. However, you think that the students in your dormitory study much more than that. You randomly pick 16 students from your dormitory and ask them how much they study each day. (We will assume that they are all honest and accurate.) Your result is that these 16 students study an average of 21 hours per week. Should you conclude that students in your dormitory study more than the college average? Or should you conclude that your results are close enough to the college average that the small difference of 4 hours might well be due to your having picked, purely by chance, 16 of the more studious residents in your dormitory?

In this example you have scores for a sample of individuals and you want to compare the mean of this sample to a population for which you know the mean but not the variance. Hypothesis testing in this situation is called a t test for a single sample. (It is also called a one-sample t test.) The t test for a single sample works basically the same way as the Z test you learned in Chapter 5. In the studies we considered in that chapter, you had scores for a sample of individuals (such as a group of 64 students rating the attractiveness of a person in a photograph after being told that the person has positive personality qualities) and you wanted to compare the mean of this sample to a population (in this case, a population of students not told about the person's personality qualities). However, in the studies we considered in Chapter 5, you knew both the mean and variance of the general population to which you were going to compare your sample. In the situations we are now going to consider, everything is the same, but you don't know the population variance. This presents two important new wrinkles affecting the details of how you carry out two of the steps of the hypothesis-testing process.

The first important new wrinkle is in Step ❷. Because the population variance is not known, you have to estimate it. So the first new wrinkle we consider is how to estimate an unknown population variance. The other important new wrinkle affects Steps ❷ and ❸. When the population variance has to be estimated, the shape of the comparison distribution is not quite a normal curve; so the second new wrinkle we consider is the shape of the comparison distribution (for Step ❷) and how to use a special table to find the cutoff (Step ❸) on what is a slightly differently shaped distribution.

t test for a single sample: hypothesis-testing procedure in which a sample mean is being compared to a known population mean and the population variance is unknown.

Let’s return to the amount of studying example. Step ❶ of the hypothesis-testing procedure is to restate the problem as hypotheses about populations. There are two populations:

Population 1: The kind of students who live in your dormitory.
Population 2: The kind of students in general at your college.

The research hypothesis is that Population 1 students study more than Population 2 students; the null hypothesis is that Population 1 students do not study more than Population 2 students. So far, the problem is no different from those in Chapter 5.

Step ❷ is to determine the characteristics of the comparison distribution. In this example, its mean will be 17, what the survey found for students at your college generally (Population 2).

BOX 7–1  William S. Gosset, Alias "Student": Not a Mathematician, But a Practical Man

William S. Gosset graduated from Oxford University in 1899 with degrees in mathematics and chemistry. It happened that in the same year the Guinness brewers in Dublin, Ireland, were seeking a few young scientists to take a first-ever scientific look at beer making. Gosset took one of these jobs and soon had immersed himself in barley, hops, and vats of brew.

The problem was how to make beer of a consistently high quality. Scientists such as Gosset wanted to make the quality of beer less variable, and they were especially interested in finding the cause of bad batches. A proper scientist would say, "Conduct experiments!" But a business such as a brewery could not afford to waste money on experiments involving large numbers of vats, some of which any brewer worth his hops knew would fail. So Gosset was forced to contemplate the probability of, say, a certain strain of barley producing terrible beer when the experiment could consist of only a few batches of each strain. Adding to the problem was that he had no idea of the variability of a given strain of barley—perhaps some fields planted with the same strain grew better barley. (Does this sound familiar? Poor Gosset, like today's psychologists, had no idea of his population's variance.)

Gosset was up to the task, although at the time only he knew that. To his colleagues at the brewery, he was a professor of mathematics and not a proper brewer at all. To his statistical colleagues, mainly at the Biometric Laboratory at University College in London, he was a mere brewer and not a proper mathematician.

So Gosset discovered the t distribution and invented the t test—simplicity itself (compared to most of statistics)—for situations when samples are small and the variability of the larger population is unknown. However, the Guinness brewery did not allow its scientists to publish papers, because one Guinness scientist had revealed brewery secrets. To this day, most statisticians call the t distribution "Student's t" because Gosset wrote under the anonymous name "Student." A few of his fellow statisticians knew who "Student" was, but apparently meetings with others involved the secrecy worthy of a spy novel. The brewery learned of his scientific fame only at his death, when colleagues wanted to honor him.

In spite of his great achievements, Gosset often wrote in letters that his own work provided "only a rough idea of the thing" or so-and-so "really worked out the complete mathematics." He was remembered as a thoughtful, kind, humble man, sensitive to others' feelings. Gosset's friendliness and generosity with his time and ideas also resulted in many students and younger colleagues making major breakthroughs based on his help.

To learn more about William Gosset, go to http://www-history.mcs.st-andrews.ac.uk/Biographies/Gosset.html.

Sources: Peters (1987); Salsburg (2001); Stigler (1986); Tankard (1984).

The next part of Step ❷ is finding the variance of the distribution of means. Now you face a problem. Up to now in this book, you have always known the variance of the population of individuals. Using that variance, you then figured the variance of the distribution of means. However, in the present example, the variance of the number of hours studied for students at your college (the Population 2 students) was not reported in the newspaper article. So you email the paper. Unfortunately, the reporter did not figure the variance, and the original survey results are no longer available. What to do?

Basic Principle of the t Test: Estimating the Population Variance from the Sample Scores

If you do not know the variance of the population of individuals, you can estimate it from what you do know—the scores of the people in your sample.

In the logic of hypothesis testing, the group of people you study is considered to be a random sample from a particular population. The variance of this sample ought to reflect the variance of that population. If the scores in the population have a lot of variation, then the scores in a sample randomly selected from that population should also have a lot of variation. If the population has very little variation, the scores in a sample from that population should also have very little variation. Thus, it should be possible to use the variation among the scores in the sample to make an informed guess about the spread of the scores in the population. That is, you could figure the variance of the sample’s scores, and that should be similar to the variance of the scores in the population. (See Figure 7–1.)

There is, however, one small hitch. The variance of a sample will generally be slightly smaller than the variance of the population from which it is taken. For this reason, the variance of the sample is a biased estimate of the population variance.1 It is a biased estimate because it consistently underestimates the actual variance of the population. (For example, if a population has a variance of 180, a typical sample of 20 scores might have a variance of only 171.) If we used a biased estimate of the population variance in our research studies, our results would not be accurate. Therefore, we need to identify an unbiased estimate of the population variance.

Figure 7–1  The variation in samples (as in each of the lower distributions) is similar to the variations in the populations they are taken from (each of the upper distributions).

biased estimate: estimate of a population parameter that is likely systematically to overestimate or underestimate the true value of the population parameter. For example, SD² would be a biased estimate of the population variance (it would systematically underestimate it).

unbiased estimate of the population variance (S²): estimate of the population variance, based on sample scores, which has been corrected so that it is equally likely to overestimate or underestimate the true population variance; the correction used is dividing the sum of squared deviations by the sample size minus 1, instead of the usual procedure of dividing by the sample size directly.

Fortunately, you can figure an unbiased estimate of the population variance by slightly changing the ordinary variance formula. The ordinary variance formula is the sum of the squared deviation scores divided by the number of scores. The changed formula still starts with the sum of the squared deviation scores, but divides this by the number of scores minus 1. Dividing by a slightly smaller number makes the result slightly larger. Dividing by the number of scores minus 1 makes the variance you get just enough larger to make it an unbiased estimate of the population variance. (This unbiased estimate is our best estimate of the population variance. However, it is still an estimate, so it is unlikely to be exactly the same as the true population variance. But we can be certain that our unbiased estimate of the population variance is equally likely to be too high as it is to be too low. This is what makes the estimate unbiased.)

The symbol we will use for the unbiased estimate of the population variance is S². The formula is the usual variance formula, but now dividing by N - 1:

S² = Σ(X - M)² / (N - 1) = SS / (N - 1)    (7–1)

The estimated population variance is the sum of the squared deviation scores divided by the number of scores minus 1.

S = √S²    (7–2)

The estimated population standard deviation is the square root of the estimated population variance.

Let's return again to the example of hours spent studying and figure the estimated population variance from the sample's 16 scores. First, you figure the sum of squared deviation scores. (Subtract the mean from each of the scores, square those deviation scores, and add them.) Presume in our example that this comes out to 694 (SS = 694). To get the estimated population variance, you divide this sum of squared deviation scores by the number of scores minus 1; that is, in this example, you divide 694 by 16 - 1; 694 divided by 15 comes out to 46.27. In terms of the formula,

S² = Σ(X - M)² / (N - 1) = SS / (N - 1) = 694 / (16 - 1) = 694 / 15 = 46.27

At this point, you have now seen several different types of standard deviation and variance (that is, for a sample, for a population, and unbiased estimates); and each of these types has used a different symbol. To help you keep them straight, a summary of the types of standard deviation and variance is shown in Table 7–1.

Degrees of Freedom

The number you divide by (the number of scores minus 1) to get the estimated population variance has a special name. It is called the degrees of freedom. It has this name because it is the number of scores in a sample that are "free to vary." The idea is that, when figuring the variance, you first have to know the mean. If you know the mean and all but one of the scores in the sample, you can figure out the one you don't know with a little arithmetic. Thus, once you know the mean, one of the scores in the sample is not free to have any possible value. So in this kind of situation the degrees of freedom are the number of scores minus 1. In terms of a formula,

df = N - 1    (7–3)

df is the degrees of freedom.

The degrees of freedom are the number of scores in the sample minus 1.

degrees of freedom (df): number of scores free to vary when estimating a population parameter; usually part of a formula for making that estimate—for example, in the formula for estimating the population variance from a single sample, the degrees of freedom is the number of scores minus 1.
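For readers who like to check this kind of figuring by computer, here is a minimal Python sketch of Formulas 7–1 and 7–2 (Python is not part of the book's procedures, and the helper name is just for illustration). Only the SS-based arithmetic from the hours-studied example is reproduced, since the 16 raw scores themselves are not listed in the text.

# Sketch of Formulas 7-1 and 7-2: the unbiased estimate divides SS by N - 1,
# not by N. Uses the chapter's numbers (SS = 694, N = 16).

def estimate_population_variance(sum_squared_deviations: float, n: int) -> float:
    """S^2 = SS / (N - 1): divide by the degrees of freedom, not the sample size."""
    return sum_squared_deviations / (n - 1)

SS, N = 694.0, 16
S2 = estimate_population_variance(SS, N)   # 694 / 15, about 46.27
S = S2 ** 0.5                              # estimated population standard deviation
print(round(S2, 2), round(S, 2))           # 46.27 6.8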

Table 7–1  Summary of Different Types of Standard Deviation and Variance

Statistical Term                           Symbol
Sample standard deviation                  SD
Population standard deviation              σ
Estimated population standard deviation    S
Sample variance                            SD²
Population variance                        σ²
Estimated population variance              S²


In our example, df = 16 - 1 = 15. (In some situations you learn about in later chapters, the degrees of freedom are figured a bit differently. This is because in those situations, the number of scores free to vary is different. For all the situations you learn about in this chapter, df = N - 1.)

The formula for the estimated population variance is often written using df instead of N - 1:

S² = Σ(X - M)² / df = SS / df    (7–4)

The estimated population variance is the sum of squared deviations divided by the degrees of freedom.

The Standard Deviation of the Distribution of Means

Once you have figured the estimated population variance, you can figure the standard deviation of the comparison distribution using the same procedures you learned in Chapter 5. Just as before, when you have a sample of more than one, the comparison distribution is a distribution of means, and the variance of a distribution of means is the variance of the population of individuals divided by the sample size. You have just estimated the variance of the population. Thus, you can estimate the variance of the distribution of means by dividing the estimated population variance by the sample size. The standard deviation of the distribution of means is the square root of its variance. Stated as formulas,

S²_M = S² / N    (7–5)

The variance of the distribution of means based on an estimated population variance is the estimated population variance divided by the number of scores in the sample.

S_M = √S²_M    (7–6)

The standard deviation of the distribution of means based on an estimated population variance is the square root of the variance of the distribution of means based on an estimated population variance.

Note that, with an estimated population variance, the symbols for the variance and standard deviation of the distribution of means use S instead of σ.

In our example, the sample size was 16 and we worked out the estimated population variance to be 46.27. The variance of the distribution of means, based on that estimate, will be 2.89. That is, 46.27 divided by 16 equals 2.89. The standard deviation is 1.70, the square root of 2.89. In terms of the formulas,

S²_M = S² / N = 46.27 / 16 = 2.89
S_M = √S²_M = √2.89 = 1.70

TIP FOR SUCCESS: Be sure that you fully understand the difference between S² and S²_M. These terms look quite similar, but they are quite different. S² is the estimated variance of the population of individuals. S²_M is the estimated variance of the distribution of means (based on the estimated variance of the population of individuals, S²).

The Shape of the Comparison Distribution When Using an Estimated Population Variance: The t Distribution

In Chapter 5 you learned that when the population distribution follows a normal curve, the shape of the distribution of means will also be a normal curve. However, this changes when you do hypothesis testing with an estimated population variance. When you are using an estimated population variance, you have less true information and more room for error. The mathematical effect is that there are likely to be slightly more extreme means than in an exact normal curve. Further, the smaller your sample size, the bigger this tendency. This is because, with a smaller sample size, your estimate of the population variance is based on less information.

The result of all this is that, when doing hypothesis testing using an estimated variance, your comparison distribution will not be a normal curve. Instead, the comparison distribution will be a slightly different curve called a t distribution.

Actually, there is a whole family of t distributions. They vary in shape according to the degrees of freedom you used to estimate the population variance. However, for any particular degrees of freedom, there is only one t distribution.

Generally, t distributions look to the eye like a normal curve—bell-shaped, symmetrical, and unimodal. A t distribution differs subtly in having heavier tails (that is, slightly more scores at the extremes). Figure 7–2 shows the shape of a t distribution compared to a normal curve.

This slight difference in shape affects how extreme a score you need to reject the null hypothesis. As always, to reject the null hypothesis, your sample mean has to be in an extreme section of the comparison distribution of means, such as the top 5%. However, if the comparison distribution has more of its means in the tails than a normal curve would have, then the point where the top 5% begins has to be farther out on this comparison distribution. The result is that it takes a slightly more extreme sample mean to get a significant result when using a t distribution than when using a normal curve.

Just how much the t distribution differs from the normal curve depends on the degrees of freedom, the amount of information used in estimating the population variance. The t distribution differs most from the normal curve when the degrees of freedom are low (because your estimate of the population variance is based on a very small sample). For example, using the normal curve, you may recall that 1.64 is the cutoff for a one-tailed test at the .05 level. On a t distribution with 7 degrees of freedom (that is, with a sample size of 8), the cutoff is 1.895 for a one-tailed test at the .05 level. If your estimate is based on a larger sample, say a sample of 25 (so that df = 24), the cutoff is 1.711, a cutoff much closer to that for the normal curve. If your sample size is infinite, the t distribution is the same as the normal curve. (Of course, if your sample size were infinite, it would include the entire population!) But even with sample sizes of 30 or more, the t distribution is nearly identical to the normal curve.
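The same comparison can be sketched with SciPy's t and normal distributions (an outside tool, not something the book uses); it reproduces the cutoffs mentioned in this paragraph.

# Sketch assuming SciPy is installed: one-tailed .05 cutoffs from t distributions
# approach the normal-curve cutoff (about 1.64) as degrees of freedom increase.
from scipy.stats import norm, t

print(round(norm.ppf(0.95), 2))             # 1.64, normal-curve cutoff
for df in (7, 15, 24, 100):
    print(df, round(t.ppf(0.95, df), 3))    # 1.895, 1.753, 1.711, 1.66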

Shortly, you will learn how to find the cutoff using a t distribution, but let’s first return briefly to the example of how much students in your dorm study each week. You finally have everything you need for Step ❷ about the characteristics of the comparison distribution. We have already seen that the distribution of means in this example has a mean of 17 hours and a standard deviation of 1.70. You can now add that the shape of the comparison distribution will be a t distribution with 15 degrees of freedom.2


Figure 7–2  A t distribution (dashed blue line) compared to the normal curve (solid black line).

t distribution: mathematically defined curve that is the comparison distribution used in a t test.


The Cutoff Sample Score for Rejecting the Null Hypothesis: Using the t Table

Step ❸ of hypothesis testing is determining the cutoff for rejecting the null hypothesis. There is a different t distribution for any particular degrees of freedom. However, to avoid taking up pages and pages with tables for each possible t distribution, you use a simplified table that gives only the crucial cutoff points. We have included such a t table in the Appendix (Table A–2). Just as with the normal curve table, the t table shows only positive t scores. If you have a one-tailed test, you need to decide whether your cutoff score is a positive t score or a negative t score. If your one-tailed test is testing whether the mean of Population 1 is greater than the mean of Population 2, the cutoff t score is positive. However, if your one-tailed test is testing whether the mean of Population 1 is less than the mean of Population 2, the cutoff t score is negative.

In the hours-studied example, you have a one-tailed test. (You want to know whether students in your dorm study more than students in general at your college study.) You will probably want to use the 5% significance level, because the cost of a Type I error (mistakenly rejecting the null hypothesis) is not great. You have 16 participants, making 15 degrees of freedom for your estimate of the population variance.

Table 7–2 shows a portion of the t table from Table A–2 in the Appendix. Find the column for the .05 significance level for one-tailed tests and move down to the row for 15 degrees of freedom. The crucial cutoff is 1.753.

Table 7–2 Cutoff Scores for t Distributions with 1 Through 17 Degrees of Freedom (Highlighting Cutoff for Hours-Studied Example)

One-Tailed Tests Two-Tailed Tests

df .10 .05 .01 .10 .05 .01

1 3.078 6.314 31.821 6.314 12.706 63.657

2 1.886 2.920 6.965 2.920 4.303 9.925

3 1.638 2.353 4.541 2.353 3.182 5.841

4 1.533 2.132 3.747 2.132 2.776 4.604

5 1.476 2.015 3.365 2.015 2.571 4.032

6 1.440 1.943 3.143 1.943 2.447 3.708

7 1.415 1.895 2.998 1.895 2.365 3.500

8 1.397 1.860 2.897 1.860 2.306 3.356

9 1.383 1.833 2.822 1.833 2.262 3.250

10 1.372 1.813 2.764 1.813 2.228 3.170

11 1.364 1.796 2.718 1.796 2.201 3.106

12 1.356 1.783 2.681 1.783 2.179 3.055

13 1.350 1.771 2.651 1.771 2.161 3.013

14 1.345 1.762 2.625 1.762 2.145 2.977

15 1.341 1.753 2.603 1.753 2.132 2.947

16 1.337 1.746 2.584 1.746 2.120 2.921

17 1.334 1.740 2.567 1.740 2.110 2.898

t table: table of cutoff scores on the t distribution for various degrees of freedom, significance levels, and one- and two-tailed tests.


In this example, you are testing whether students in your dormitory (Population 1) study more than students in general at your college (Population 2). In other words, you are testing whether students in your dormitory have a higher t score than students in general. This means that the cutoff t score is positive. Thus, you will reject the null hypothesis if your sample's mean is 1.753 or more standard deviations above the mean on the comparison distribution. (If you were using a known variance, you would have found your cutoff from a normal curve table. The Z score to reject the null hypothesis based on the normal curve would have been 1.645.)

One other point about using the t table: In the full t table in the Appendix, there are rows for each degree of freedom from 1 through 30, then for 35, 40, 45, and so on up to 100. Suppose your study has degrees of freedom between two of these higher values. To be safe, you should use the nearest degrees of freedom to yours given on the table that is less than yours. For example, in a study with 43 degrees of freedom, you would use the cutoff for df = 40.

The Sample Mean’s Score on the Comparison Distribution: The t Score Step ❹ of hypothesis testing is figuring your sample mean’s score on the comparison distribution. In Chapter 5, this meant finding the Z score on the comparison distribution—the number of standard deviations your sample’s mean is from the mean on the distribution. You do exactly the same thing when your comparison distri- bution is a t distribution. The only difference is that, instead of calling this a Z score, because it is from a t distribution, you call it a t score. In terms of a formula,

(7–7)

In the example, your sample’s mean of 21 is 4 hours from the mean of the distri- bution of means, which amounts to 2.35 standard deviations from the mean (4 hours divided by the standard deviation of 1.70 hours).3 That is, the t score in the example is 2.35. In terms of the formula,

Deciding Whether to Reject the Null Hypothesis

Step ➎ of hypothesis testing is deciding whether to reject the null hypothesis. This step is exactly the same with a t test as it was in the hypothesis-testing situations discussed in previous chapters. In the example, the cutoff t score was 1.753 and the actual t score for your sample was 2.35. Conclusion: reject the null hypothesis. The research hypothesis is supported that students in your dorm study more than students in the college overall.
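As a computational check on this example (not something the book itself provides), here is a minimal Python sketch of the whole test; the numbers SS = 694, N = 16, M = 21, the population mean of 17, and the cutoff of 1.753 all come from the text, and the variable names are just for illustration.

# Single-sample t test for the hours-studied example, following the chapter's steps.

SS, N = 694.0, 16          # sum of squared deviations and sample size
M, mu = 21.0, 17.0         # sample mean and known population mean

S2 = SS / (N - 1)          # estimated population variance, about 46.27 (df = 15)
S2_M = S2 / N              # variance of the distribution of means, about 2.89
S_M = S2_M ** 0.5          # standard deviation of the distribution of means, about 1.70

t_score = (M - mu) / S_M   # (21 - 17) / 1.70, about 2.35
cutoff = 1.753             # t table, df = 15, one-tailed, .05 level

print(round(t_score, 2), t_score >= cutoff)   # 2.35 True -> reject the null hypothesis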

Figure 7–3 shows the various distributions for this example.

Figure 7–3  Distribution for the hours-studied example.

t score: on a t distribution, the number of standard deviations from the mean (like a Z score, but on a t distribution).

Summary of Hypothesis Testing When the Population Variance Is Not Known

Table 7–3 compares the hypothesis-testing procedure we just considered (for a t test for a single sample) with the hypothesis-testing procedure for a Z test from Chapter 5. That is, we are comparing the current situation in which you know the population's mean but not its variance to the Chapter 5 situation, where you knew the population's mean and variance.

Table 7–3 Hypothesis Testing with a Single Sample Mean When Population Variance Is Unknown (t Test for a Single Sample) Compared to When Population Variance Is Known (Z Test)

Steps in Hypothesis Testing | Difference From When Population Variance Is Known

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. | No difference in method.
❷ Determine the characteristics of the comparison distribution:
    Population mean | No difference in method.
    Standard deviation of the distribution of sample means | No difference in method (but based on estimated population variance).
    Population variance | Estimate from the sample.
    Shape of the comparison distribution | Use the t distribution with df = N - 1.
❸ Determine the significance cutoff. | Use the t table.
❹ Determine your sample's score on the comparison distribution. | No difference in method (but called a t score).
❺ Decide whether to reject the null hypothesis. | No difference in method.

Another Example of a t Test for a Single Sample

Consider another fictional example. Suppose a researcher was studying the psychological effects of a devastating flood in a small rural community. Specifically, the researcher was interested in how hopeful (versus unhopeful) people felt after the flood.

The researcher randomly selected 10 people from this community to complete a short questionnaire. The key item on the questionnaire asked how hopeful they felt, using a 7-point scale from extremely unhopeful (1) to neutral (4) to extremely hopeful (7). The researcher wanted to know whether the ratings of hopefulness for people who had been through the flood would be consistently above or below the neutral point on the scale (4).

Table 7–4 shows the results and figuring for the t test for a single sample; Figure 7–4 shows the distributions involved. Here are the steps of hypothesis testing.

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People who experienced the flood.
Population 2: People who are neither hopeful nor unhopeful.

The research hypothesis is that the two populations will score differently. The null hypothesis is that they will score the same.

❷ Determine the characteristics of the comparison distribution. If the null hypothesis is true, the mean of both populations is 4. The variance of these populations is not known, so you have to estimate it from the sample. As shown in Table 7–4, the sum of the squared deviations of the sample's scores from the sample's mean is 32.10. Thus, the estimated population variance is 32.10 divided by 9 degrees of freedom (10 - 1), which comes out to 3.57.

The distribution of means has a mean of 4 (the same as the population mean). Its variance is the estimated population variance divided by the sample size (3.57 divided by 10 equals .36). The square root of this, the standard deviation of the distribution of means, is .60. Its shape will be a t distribution for df = 9.

Table 7–4  Results and Figuring for a Single-Sample t Test for a Study of 10 People's Ratings of Hopefulness Following a Devastating Flood (Fictional Data)

Rating (X)   Difference From the Mean (X - M)   Squared Difference From the Mean (X - M)²
5              .30                                .09
3            -1.70                               2.89
6             1.30                               1.69
2            -2.70                               7.29
7             2.30                               5.29
6             1.30                               1.69
7             2.30                               5.29
4             -.70                                .49
2            -2.70                               7.29
5              .30                                .09
Σ: 47                                           32.10

M = ΣX/N = 47/10 = 4.70.
df = N - 1 = 10 - 1 = 9.
μ = 4.00.
S² = SS/df = 32.10/(10 - 1) = 32.10/9 = 3.57.
S²_M = S²/N = 3.57/10 = .36.
S_M = √S²_M = √.36 = .60.
t with df = 9 needed for 1% significance level, two-tailed = ±3.250.
Actual sample t = (M - μ)/S_M = (4.70 - 4.00)/.60 = .70/.60 = 1.17.
Decision: Do not reject the null hypothesis.

TIP FOR SUCCESS: Be careful. To find the variance of a distribution of means, you always divide the population variance by the sample size. This is true whether the population's variance is known or only estimated. It is only when making the estimate of the population variance that you divide by the sample size minus 1. That is, the degrees of freedom are used only when estimating the variance of the population of individuals.


❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. The researcher wanted to be very cautious about mistakenly concluding that the flood made a difference. Thus, she decided to use the .01 significance level. The hypothesis was nondirectional (that is, no specific direction of difference from the mean of 4 was specified; either result would have been of interest); so the researcher used a two-tailed test. The researcher looked up the cutoff in Table 7–2 (or Table A–2 in the Appendix) for a two-tailed test and 9 degrees of freedom. The cutoff given in the table is 3.250. Thus, to reject the null hypothesis, the sample's score on the comparison distribution must be 3.250 or higher, or -3.250 or lower.

❹ Determine your sample's score on the comparison distribution. The sample's mean of 4.70 is .70 scale points from the null hypothesis mean of 4.00. That makes it 1.17 standard deviations on the comparison distribution from that distribution's mean (.70/.60 = 1.17); that is, t = 1.17.

➎ Decide whether to reject the null hypothesis. The t of 1.17 is not as extreme as the needed t of ±3.250. Therefore, the researcher cannot reject the null hypothesis. The study is inconclusive. (If the researcher had used a larger sample, giving more power, the result might have been quite different.)

Figure 7–4  Distributions for the example of how hopeful individuals felt following a devastating flood.
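If you want to reproduce the figuring in Table 7–4 by computer, here is a minimal Python sketch (again, not part of the book's procedures); the ten ratings and the two-tailed .01 cutoff of 3.250 come from the table and the steps above.

# Single-sample t test for the flood/hopefulness example (fictional data from Table 7-4).

ratings = [5, 3, 6, 2, 7, 6, 7, 4, 2, 5]
mu = 4.0                                   # neutral point on the 1-7 scale

N = len(ratings)
M = sum(ratings) / N                       # 4.70
SS = sum((x - M) ** 2 for x in ratings)    # 32.10
S2 = SS / (N - 1)                          # about 3.57, df = 9
S_M = (S2 / N) ** 0.5                      # about .60

t = (M - mu) / S_M                         # about 1.17
cutoff = 3.250                             # t table, df = 9, two-tailed, .01 level
print(round(t, 2), abs(t) >= cutoff)       # 1.17 False -> do not reject the null hypothesis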

Summary of Steps for a t Test for a Single Sample

Table 7–5 summarizes the steps of hypothesis testing when you have scores from a single sample and a population with a known mean but an unknown variance.4



Table 7–5 Steps for a t Test for a Single Sample

❶ Restate the question as a research hypothesis and a null hypothesis about the populations.

❷ Determine the characteristics of the comparison distribution.

a. The mean is the same as the known population mean.

b. The standard deviation is figured as follows:

●A Figure the estimated population variance: S² = SS/df.

●B Figure the variance of the distribution of means: S²_M = S²/N.

●C Figure the standard deviation of the distribution of means: S_M = √S²_M.

c. The shape will be a t distribution with N - 1 degrees of freedom.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.

a. Decide the significance level and whether to use a one-tailed or a two-tailed test.

b. Look up the appropriate cutoff in a t table.

❹ Determine your sample's score on the comparison distribution: t = (M - μ)/S_M.

❺ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.

How are you doing?

1. In what sense is a sample’s variance a biased estimate of the variance of the population the sample is taken from? That is, in what way does the sample’s variance typically differ from the population’s?

2. What is the difference between the usual formula for figuring the variance and the formula for estimating a population’s variance from the scores in a sample (that is, the formula for an unbiased estimate of the population variance)?

3. (a) What are degrees of freedom? (b) How do you figure the degrees of freedom in a t test for a single sample? (c) What do they have to do with estimating the population variance? (d) What do they have to do with the t distribution?

4. (a) How does a t distribution differ from a normal curve? (b) How do degrees of freedom affect this? (c) What is the effect of the difference on hypothesis testing?

5. List three differences in how you do hypothesis testing for a t test for a single sample versus for the Z test (you learned in Chapter 5).

6. A population has a mean of 23. A sample of 4 is given an experimental procedure and has scores of 20, 22, 22, and 20. Test the hypothesis that the procedure produces a lower score. Use the .05 significance level. (a) Use the steps of hypothesis testing and (b) make a sketch of the distributions involved.

Answers

1. The sample's variance will in general be smaller than the variance of the population the sample is taken from.

2. In the usual formula you divide by the number of participants (N); in the formula for estimating a population's variance from the scores in a sample, you divide by the number of participants in the sample minus 1 (that is, N - 1).

3. (a) Degrees of freedom consist of the number of scores free to vary. (b) The degrees of freedom in a t test for a single sample consist of the number of scores in the sample minus 1. (c) In estimating the population variance, the formula is the sum of squared deviations divided by the degrees of freedom. (d) t distributions differ slightly from each other according to the degrees of freedom.

4. (a) A t distribution differs from a normal curve in that it has heavier tails; that is, more scores at the extremes. (b) The more degrees of freedom, the closer the shape (including the tails) is to a normal curve. (c) The cutoffs for significance are more extreme for a t distribution than for a normal curve.

5. In the t test you (a) estimate the population variance from the sample (it is not known in advance); (b) you look up the cutoff on a t table in which you also have to take into account the degrees of freedom (you don't use a normal curve table); and (c) your sample's score on the comparison distribution, which is a t distribution (not a normal curve), is a t score (not a Z score).

6. (a) Steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People who are given the experimental procedure.
Population 2: The general population.

The research hypothesis is that Population 1 will score lower than Population 2. The null hypothesis is that Population 1 will not score lower than Population 2.

❷ Determine the characteristics of the comparison distribution.
a. The mean of the distribution of means is 23.
b. The standard deviation is figured as follows:
●A Figure the estimated population variance. You first need to figure the sample mean, which is (20 + 22 + 22 + 20)/4 = 84/4 = 21. The estimated population variance is S² = SS/(N - 1) = [(20 - 21)² + (22 - 21)² + (22 - 21)² + (20 - 21)²]/(4 - 1) = [(-1)² + 1² + 1² + (-1)²]/3 = (1 + 1 + 1 + 1)/3 = 4/3 = 1.33.
●B Figure the variance of the distribution of means: S²_M = S²/N = 1.33/4 = .33.
●C Figure the standard deviation of the distribution of means: S_M = √S²_M = √.33 = .57.
c. The shape of the comparison distribution will be a t distribution with df = 3.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. From Table A–2, the cutoff for a one-tailed t test at the .05 level for df = 3 is -2.353. The cutoff t score is negative, since the research hypothesis is that the procedure produces a lower score.

❹ Determine your sample's score on the comparison distribution. t = (M - μ)/S_M = (21 - 23)/.57 = -2/.57 = -3.51.

➎ Decide whether to reject the null hypothesis. The t of -3.51 is more extreme than the needed t of -2.353. Therefore, reject the null hypothesis; the research hypothesis is supported.

(b) Sketches of distributions are shown in Figure 7–5.

Figure 7–5  Distributions for answer to "How Are You Doing?" question 6b.
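For readers who want to verify answer 6 by computer, here is a minimal Python sketch (not part of the book). Note that the answer above rounds S_M to .57 and so reports t = -3.51; without intermediate rounding the t is about -3.46, which leads to the same decision.

# Check of question 6: scores 20, 22, 22, 20 against a population mean of 23,
# one-tailed .05 cutoff of -2.353 (df = 3).

scores = [20, 22, 22, 20]
mu = 23.0

N = len(scores)
M = sum(scores) / N                                 # 21
S2 = sum((x - M) ** 2 for x in scores) / (N - 1)    # 4/3, about 1.33
S_M = (S2 / N) ** 0.5                               # about .58 (the answer rounds to .57)

t = (M - mu) / S_M
cutoff = -2.353                                     # negative: predicting a lower score
print(round(t, 2), t <= cutoff)                     # -3.46 True -> reject the null hypothesis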

The t Test for Dependent Means

repeated-measures design: research strategy in which each person is tested more than once; same as within-subjects design.

t test for dependent means: hypothesis-testing procedure in which there are two scores for each person and the population variance is not known; it determines the significance of a hypothesis that is being tested using difference or change scores from a single group of people.

The situation you just learned about (the t test for a single sample) is for when you know the population mean but not its variance and you have a single sample of scores. It turns out that in most research you do not even know the population's mean; plus, in most research situations you usually have not one set, but two sets, of scores. These two things, not knowing the population mean and having two sets of scores, are very, very common.

The rest of this chapter focuses specifically on this important research situation in which you have two scores from each person in your sample. This kind of research situation is called a repeated-measures design (also known as a within-subjects design). A common example is when you measure the same people before and after some psychological or social intervention. For example, a psychologist might measure the quality of men's communication before and after receiving premarital counseling.

The hypothesis-testing procedure for the situation in which each person is measured twice (that is, for the situation in which we have a repeated-measures design) is a t test for dependent means. It has the name "dependent means" because the mean for each group of scores (for example, a group of before-scores and a group of after-scores) are dependent on each other in that they are both from the same people. (In Chapter 8, we consider the situation in which you compare scores from two different groups of people, a research situation you analyze using a t test for independent means.)

You do a t test for dependent means exactly the same way as a t test for a single sample, except that (a) you use something called difference scores, and (b) you assume that the population mean (of the difference scores) is 0. We will now consider each of these two new aspects.

Difference Scores

With a repeated-measures design, your sample includes two scores for each person instead of just one. The way you handle this is to make the two scores per person into one


score per person! You do this magic by creating difference scores: For each person, you subtract one score from the other. If the difference is before versus after, difference scores are also called change scores.

Consider the example of the quality of men's communication before and after receiving premarital counseling. The psychologist subtracts the communication quality score before the counseling from the communication quality score after the counseling. This gives an after-minus-before difference score for each man. When the two scores are a before-score and an after-score, we usually take the after-score minus the before-score to indicate the change.

Once you have the difference score for each person in the study, you do the rest of the hypothesis testing with difference scores. That is, you treat the study as if there were a single sample of scores (scores that in this situation happen to be difference scores).

Population of Difference Scores with a Mean of 0

So far in the research situations we have considered in this book, you have always known the mean of the population to which you compared your sample's mean. For example, in the college dormitory survey of hours studied, you knew the population mean was 17 hours. However, now we are using difference scores, and we usually don't know the mean of the population of difference scores.

Here is the solution. Ordinarily, the null hypothesis in a repeated-measures design is that on the average there is no difference between the two groups of scores. For example, the null hypothesis in a study of the quality of men's communication before and after receiving premarital counseling is that on the average there is no difference between communication quality before and after the counseling. What does no difference mean? Saying there is on the average no difference is the same as saying that the mean of the population of the difference scores is 0. Therefore, when working with difference scores, you are comparing the population of difference scores that your sample of difference scores comes from to a population of difference scores with a mean of 0. In other words, with a t test for dependent means, what we call Population 2 will ordinarily have a mean of 0 (that is, it is a population of difference scores that has a mean of 0).

Example of a t Test for Dependent Means

Olthoff (1989) tested the communication quality of couples three months before and again three months after marriage. One group studied was 19 couples who had received ordinary (very minimal) premarital counseling from the ministers who were going to marry them. (To keep the example simple, we will focus on just this one group and only on the husbands in the group. Scores for wives were similar, though somewhat more varied, making it a more complicated example for learning the t test procedure.)

The scores for the 19 husbands are listed in the "Before" and "After" columns in Table 7–6, followed by all the t test figuring. (The distributions involved are shown in Figure 7–6.) The crucial column for starting the analysis is the difference scores. For example, the first husband, whose communication quality was 126 before marriage and 115 after, had a difference of -11. (We figured after minus before, so that an increase is positive and a decrease, as for this husband, is negative.) The mean of the difference scores is -12.05. That is, on the average, these 19 husbands' communication quality decreased by about 12 points.

Is this decrease significant? In other words, how likely is it that this sample of difference scores is a random sample from a population of difference scores whose mean is 0?

difference scores: difference between a person's score on one testing and the same person's score on another testing; often an after-score minus a before-score, in which case it is also called a change score.


❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: Husbands who receive ordinary premarital counseling.
Population 2: Husbands whose communication quality does not change from before to after marriage. (In other words, it is a population of husbands whose mean difference in communication quality from before to after marriage is 0.)

The research hypothesis is that Population 1's mean difference score (communication quality after marriage minus communication quality before marriage) is different from Population 2's mean difference score (of zero). That is, the research hypothesis is that husbands who receive ordinary premarital counseling, like the husbands Olthoff studied, do change in communication quality from before to after marriage. The null hypothesis is that the populations are the same—that the husbands who receive ordinary premarital counseling do not change in their communication quality from before to after marriage.

Table 7–6  t Test for Communication Quality Scores Before and After Marriage for 19 Husbands Who Received Ordinary Premarital Counseling

Husband   Communication Quality      Difference         Deviation          Squared
          Before        After        (After - Before)   (Difference - M)   Deviation

A         126           115          -11                  1.05               1.10
B         133           125           -8                  4.05              16.40
C         126            96          -30                -17.95             322.20
D         115           115            0                 12.05             145.20
E         108           119           11                 23.05             531.30
F         109            82          -27                -14.95             223.50
G         124            93          -31                -18.95             359.10
H          98           109           11                 23.05             531.30
I          95            72          -23                -10.95             119.90
J         120           104          -16                 -3.95              15.60
K         118           107          -11                  1.05               1.10
L         126           118           -8                  4.05              16.40
M         121           102          -19                 -6.95              48.30
N         116           115           -1                 11.05             122.10
O          94            83          -11                  1.05               1.10
P         105            87          -18                 -5.95              35.40
Q         123           121           -2                 10.05             101.00
R         125           100          -25                -12.95             167.70
S         128           118          -10                  2.05               4.20
Σ:      2,210         1,981         -229                                 2,762.90

For difference scores:

M = -229/19 = -12.05.
μ = 0 (assumed as a no-change baseline of comparison).
S² = SS/df = 2,762.90/(19 - 1) = 153.49.
S²_M = S²/N = 153.49/19 = 8.08.
S_M = √S²_M = √8.08 = 2.84.
t with df = 18 needed for 5% level, two-tailed = ±2.101.
t = (M - μ)/S_M = (-12.05 - 0)/2.84 = -4.24.
Decision: Reject the null hypothesis.

Source: Data from Olthoff (1989).

TIP FOR SUCCESS: As in previous chapters, Population 2 is the population for when the null hypothesis is true.

Figure 7–6  Distributions for the Olthoff (1989) example of a t test for dependent means.


Notice that you have no actual information about Population 2 husbands. The husbands in the study are a sample of Population 1 husbands. For the purposes of hypothesis testing, you set up Population 2 as a kind of straw man comparison group. That is, for the purpose of the analysis, you set up a comparison group of husbands who, if measured before and after marriage, would on the average show no difference.

❷ Determine the characteristics of the comparison distribution. If the null hypothesis is true, the mean of the population of difference scores is 0. The variance of the population of difference scores can be estimated from the sample of difference scores. As shown in Table 7–6, the sum of squared deviations of the difference scores from the mean of the difference scores is 2,762.90. With 19 husbands in the study, there are 18 degrees of freedom. Dividing the sum of squared deviation scores by the degrees of freedom gives an estimated population variance of difference scores of 153.49.

The distribution of means (from this population of difference scores) has a mean of 0, the same as the mean of the population of difference scores. The variance of the distribution of means of difference scores is the estimated population variance of difference scores (153.49) divided by the sample size (19), which gives 8.08.


The standard deviation of the distribution of means of difference scores is 2.84, the square root of 8.08. Because Olthoff was using an estimated population variance, the comparison distribution is a t distribution. The estimate of the population variance of difference scores is based on 18 degrees of freedom; so this comparison distribution is a t distribution for 18 degrees of freedom.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. Olthoff used a two-tailed test to allow for either an increase or decrease in communication quality. Using the .05 significance level and 18 degrees of freedom, Table A–2 shows cutoff t scores of +2.101 and -2.101.

❹ Determine your sample's score on the comparison distribution. Olthoff's sample had a mean difference score of -12.05. That is, the mean was 12.05 points below the mean of 0 on the distribution of means of difference scores. The standard deviation of the distribution of means of difference scores is 2.84. Thus, the mean of the difference scores of -12.05 is 4.24 standard deviations below the mean of the distribution of means of difference scores. So Olthoff's sample of difference scores has a t score of -4.24.

❺ Decide whether to reject the null hypothesis. The t of -4.24 for the sample of difference scores is more extreme than the needed t of ±2.101. Thus, you can reject the null hypothesis: Olthoff's husbands are from a population in which husbands' communication quality is different after marriage from what it was before (it is lower).
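As a computational check (not something Olthoff or the book provides), here is a minimal Python sketch of the same t test for dependent means, using the before and after scores from Table 7–6; because the book rounds intermediate values, the result matches its -4.24 only after rounding.

# t test for dependent means on the Olthoff (1989) data from Table 7-6.

before = [126, 133, 126, 115, 108, 109, 124, 98, 95, 120,
          118, 126, 121, 116, 94, 105, 123, 125, 128]
after  = [115, 125, 96, 115, 119, 82, 93, 109, 72, 104,
          107, 118, 102, 115, 83, 87, 121, 100, 118]

diffs = [a - b for a, b in zip(after, before)]   # change scores (after minus before)
N = len(diffs)
M = sum(diffs) / N                               # mean difference, about -12.05

SS = sum((d - M) ** 2 for d in diffs)            # sum of squared deviations
S2 = SS / (N - 1)                                # estimated population variance, df = 18
S_M = (S2 / N) ** 0.5                            # SD of the distribution of means, about 2.84

t = (M - 0) / S_M                                # null hypothesis: mean difference is 0
cutoff = 2.101                                   # t table, df = 18, two-tailed, .05 level
print(round(t, 2), abs(t) >= cutoff)             # -4.24 True -> reject the null hypothesis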

Olthoff’s actual study was more complex. You may be interested to know that he found that the wives also showed this decrease in communication quality after mar- riage. But a group of similar engaged couples who were given special communication skills training by their ministers (much more than the usual short session) had no sig- nificant decline in marital communication quality after marriage. In fact, there is a great deal of research showing that on the average marital happiness declines steeply over time (VanLaningham et al., 2001). And many studies have now shown the value of a full course of premarital communications training. For example, a recent repre- sentative survey of 3,344 adults in the United States showed that those who had at- tended a premarital communication program had significantly greater marital satisfaction, had less marital conflict, and were 31% less likely to divorce (Stanley et al., 2006). Further, benefits were greatest for those with a college education!

Summary of Steps for a t Test for Dependent Means Table 7–7 summarizes the steps for a t test for dependent means.5

A Second Example of a t Test for Dependent Means Here is another example. A team of researchers examined the brain systems involved in human romantic love (Aron et al., 2005). One issue was whether romantic love en- gages a part of the brain called the caudate (a brain structure that is engaged when peo- ple win money, are given cocaine, and other such “rewards”). Thus, the researchers recruited people who had very recently fallen “madly in love.” (For example, to be in the study participants had to think about their partner at least 80% of their waking hours.) Participants brought a picture of their beloved with them, plus a picture of a fa- miliar, neutral person of the same age and sex as their beloved. Participants then went in to the functional magnetic resonance imaging (fMRI) machine and their brain was scanned while they looked at the two pictures—30 seconds at the neutral person’s pic- ture, 30 seconds at their beloved, 30 seconds at the neutral person, and so forth.

; 2.101 - 4.24

- 4.24

- 12.05

- 12.05

- 2.101+ 2.101

TIP FOR SUCCESS
Step ❷ of hypothesis testing for the t test for dependent means is more complex than previously. This can make it easy to lose track of the purpose of this step. Step ❷ of hypothesis testing determines the characteristics of the comparison distribution. In the case of the t test for dependent means, this comparison distribution is a distribution of means of difference scores. The key characteristics of this distribution are its mean (which is assumed to equal 0), its standard deviation (which is estimated as SM), and its shape (a t distribution with degrees of freedom equal to the sample size minus 1).

TIP FOR SUCCESS
You now have to deal with some rather complex terms, such as the standard deviation of the distribution of means of difference scores. Although these terms are complex, there is good logic behind them. The best way to understand such terms is to break them down into manageable pieces. For example, you will notice that these new terms are the same as the terms for the t test for a single sample, with the added phrase "of difference scores." This phrase has been added because all of the figuring for the t test for dependent means uses difference scores.

Table 7–8 shows average brain activations (mean fMRI scanner values) in the caudate area of interest during the two kinds of pictures. (We have simplified the example for teaching purposes, including using only 10 participants when the actual study had 17.) It also shows the figuring of the difference scores and all the other

Table 7–7 Steps for a t Test for Dependent Means

❶ Restate the question as a research hypothesis and a null hypothesis about the populations.

❷ Determine the characteristics of the comparison distribution.
a. Make each person's two scores into a difference score. Do all the remaining steps using these difference scores.
b. Figure the mean of the difference scores.
c. Assume a mean of the distribution of means of difference scores of 0: μ = 0.
d. The standard deviation of the distribution of means of difference scores is figured as follows:
●A Figure the estimated population variance of difference scores: S² = SS/df.
●B Figure the variance of the distribution of means of difference scores: S²M = S²/N.
●C Figure the standard deviation of the distribution of means of difference scores: SM = √S²M.
e. The shape is a t distribution with df = N - 1.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
a. Decide the significance level and whether to use a one-tailed or a two-tailed test.
b. Look up the appropriate cutoff in a t table.

❹ Determine your sample's score on the comparison distribution: t = (M - μ)/SM.

➎ Decide whether to reject the null hypothesis: Compare the scores from Steps ❸ and ❹.
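For readers who like to see the steps of Table 7–7 as computation, here is a minimal sketch in Python (the function and variable names are our own; the cutoff still comes from a t table such as Table A–2). Running it on the brain-activation scores that appear next in Table 7–8 reproduces the t of about 5.58 figured there:

    from math import sqrt

    def t_dependent_means(scores_1, scores_2):
        """t score for a t test for dependent means (Steps 2 and 4 of Table 7-7)."""
        diffs = [b - a for a, b in zip(scores_1, scores_2)]   # Step 2a: difference scores
        n = len(diffs)
        m = sum(diffs) / n                                    # Step 2b: mean of the difference scores
        ss = sum((d - m) ** 2 for d in diffs)                 # sum of squared deviations
        s2 = ss / (n - 1)                                     # S² = SS/df
        s_m = sqrt(s2 / n)                                    # SM = √(S²/N)
        return (m - 0) / s_m                                  # t = (M - μ)/SM, with μ = 0

    control = [1487.2, 1328.1, 1405.9, 1234.0, 1298.2, 1444.7, 1354.3, 1203.7, 1320.8, 1386.8]
    beloved = [1487.8, 1329.4, 1407.9, 1236.1, 1299.8, 1447.2, 1354.1, 1204.6, 1322.3, 1388.5]
    print(round(t_dependent_means(control, beloved), 2))      # about 5.58; compare to the cutoff of 1.833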

Table 7–8 t Test for a Study of Romantic Love and Brain Activation in Part of the Caudate

                         Brain Activation
             Beloved's     Control     Difference            Deviation          Squared
Student        photo        photo      (Beloved – Control)   (Difference – M)   Deviation
1             1487.8       1487.2           .6                   -.800             .640
2             1329.4       1328.1          1.3                   -.100             .010
3             1407.9       1405.9          2.0                    .600             .360
4             1236.1       1234.0          2.1                    .700             .490
5             1299.8       1298.2          1.6                    .200             .040
6             1447.2       1444.7          2.5                   1.100            1.210
7             1354.1       1354.3          -.2                  -1.600            2.560
8             1204.6       1203.7           .9                   -.500             .250
9             1322.3       1320.8          1.5                    .100             .010
10            1388.5       1386.8          1.7                    .300             .090
Σ:           13477.7      13463.7         14.0                                    5.660

For difference scores:
M = 14.0/10 = 1.400.
μ = 0 (assumed as a no-change baseline of comparison).
S² = SS/df = 5.660/(10 - 1) = 5.660/9 = .629.
S²M = S²/N = .629/10 = .063.
SM = √S²M = √.063 = .251.
t with df = 9 needed for 5% level, one-tailed = 1.833.
t = (M - μ)/SM = (1.400 - 0)/.251 = 5.58.
Decision: Reject the null hypothesis.

Source: Data based on Aron et al. (2005).

figuring for the t test for dependent means. Figure 7–7 shows the distributions involved. Here are the steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: Individuals like those tested in this study. Population 2: Individuals whose brain activation in the caudate area of interest is the same when looking at a picture of their beloved and a picture of a familiar, neutral person.

The research hypothesis is that Population 1's mean difference score (brain activation when viewing the beloved's picture minus brain activation when viewing the neutral person's picture) is greater than Population 2's mean difference score (of no difference). That is, the research hypothesis is that brain activation in the caudate area of interest is greater when viewing the beloved person's picture than when viewing the neutral person's picture. The null hypothesis is that Population 1's mean difference score is not greater than Population 2's. That is, the null hypothesis is that brain activation in the caudate area of interest is not greater when viewing the beloved person's picture than when viewing the neutral person's picture.

❷ Determine the characteristics of the comparison distribution. a. Make each person’s two scores into a difference score. This is shown in the

column labeled “Difference” in Table 7–8. You do all the remaining steps using these difference scores.

Figure 7–7 Distributions for the example of romantic love and brain activation in part of the caudate: the population of difference scores, the comparison distribution (t), and the sample.

b. Figure the mean of the difference scores. The sum of the difference scores (14.0) divided by the number of difference scores (10) gives a mean of the difference scores of 1.400. So, M = 1.400.

c. Assume a mean of the distribution of means of difference scores of 0: μ = 0.

d. The standard deviation of the distribution of means of difference scores is figured as follows:
●A Figure the estimated population variance of difference scores: S² = SS/df = 5.660/(10 - 1) = .629.
●B Figure the variance of the distribution of means of difference scores: S²M = S²/N = .629/10 = .063.
●C Figure the standard deviation of the distribution of means of difference scores: SM = √S²M = √.063 = .251.

e. The shape is a t distribution with df = N - 1. Therefore, the comparison distribution is a t distribution for 9 degrees of freedom. It is a t distribution because we figured its variance based on an estimated population variance. It has 9 degrees of freedom because there were 9 degrees of freedom in the estimate of the population variance.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected.
a. We will use the standard .05 significance level. This is a one-tailed test because the researchers were interested only in a specific direction of difference.
b. Using the .05 significance level with 9 degrees of freedom, Table A–2 shows a cutoff t of 1.833. In Table 7–8, the difference score is figured as brain activation when viewing the beloved's picture minus brain activation when viewing the neutral person's picture. Thus, the research hypothesis predicts a positive difference score, which means that our cutoff is +1.833.

❹ Determine your sample's score on the comparison distribution. t = (M - μ)/SM = (1.400 - 0)/.251 = 5.58. The sample's mean difference of 1.400 is 5.58 standard deviations (of .251 each) above the mean of 0 on the distribution of means of difference scores.

➎ Decide whether to reject the null hypothesis. The sample's t score of 5.58 is more extreme than the cutoff t of 1.833. You can reject the null hypothesis. Brain activation in the caudate area of interest is greater when viewing a beloved's picture than when viewing a neutral person's picture. The results of this study are not limited to North Americans. Recently, the study was replicated, with virtually identical results, in Beijing with Chinese students who were intensely in love (Xu et al., 2007).

t Test for Dependent Means with Scores from Pairs of Research Participants

The t test for dependent means is also called a paired-samples t test, t test for correlated means, t test for matched samples, and t test for matched pairs. Each of these names comes from the same idea that in this kind of t test you are comparing two sets of scores that are related to each other in a direct way. In the t test for dependent means examples in this chapter, the two sets of scores have been related because each individual had a score in both sets of scores (for example, a score before a procedure and a score after a procedure). However, you can also use a t test for dependent means with scores from pairs of research participants, considering each pair as if it were one person, and figuring the difference score for each pair. For example, suppose you have 30 married couples and want to test whether wives consistently do more housework than husbands.

You could figure for each couple a difference score of the wife's hours of housework per week minus her husband's number of hours of housework per week. There are also situations in which experimenters create pairs. For example, a researcher might put participants into pairs to do a puzzle task together and, for each pair, assign one to be a leader and one a follower. At the end of the study, participants privately fill out a questionnaire about how much they enjoyed the interaction. The procedure for analyzing this study would be to create a difference score for each pair by taking the enjoyment rating of the leader minus the enjoyment rating of the follower.
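As a small illustration of treating each pair as the "participant," here is a minimal sketch in Python; the housework numbers are invented for illustration and are not data from the text. Once each couple is reduced to a single difference score, the figuring is exactly the t test for dependent means of Table 7–7:

    from math import sqrt

    # Hypothetical hours of housework per week for five couples (made-up numbers, not from the text).
    husband = [10, 12, 14, 15, 9]
    wife = [21, 15, 30, 18, 25]

    # Each couple becomes one "participant" with a single difference score (wife minus husband).
    diffs = [w - h for w, h in zip(wife, husband)]
    m = sum(diffs) / len(diffs)
    s2 = sum((d - m) ** 2 for d in diffs) / (len(diffs) - 1)
    print(round(m / sqrt(s2 / len(diffs)), 2))   # t for these made-up couples; compare to a cutoff from Table A-2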

Review and Comparison of Z Test, t Test for a Single Sample, and t Test for Dependent Means

In Chapter 5 you learned about the Z test; in this chapter you have learned about the t test for a single sample and the t test for dependent means. Table 7–9 provides a review and comparison of the Z test, the t test for a single sample, and the t test for dependent means.

TIP FOR SUCCESS
We recommend that you spend some time carefully going through Table 7–9. Test your understanding of the different tests by covering up portions of the table and trying to recall the hidden information. Also, take a look at Chapter Note 3 (page 268) for a discussion of the terminology used in the formulas.

Table 7–9 Review of the Z Test, the t Test for a Single Sample, and the t Test for Dependent Means

                                                          Type of Test
Features                                   Z Test             t Test for a Single Sample    t Test for Dependent Means
Population variance is known               Yes                No                            No
Population mean is known                   Yes                Yes                           No
Number of scores for each participant      1                  1                             2
Shape of comparison distribution           Z distribution     t distribution                t distribution
Formula for degrees of freedom             Not applicable     df = N - 1                    df = N - 1
Formula                                    Z = (M - μM)/σM    t = (M - μ)/SM                t = (M - μ)/SM
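The formula row of Table 7–9 can be read as three tiny functions. The sketch below (our own naming, in Python) makes the relationship explicit: in particular, the t test for dependent means is just the single-sample t test run on difference scores against a population mean of 0.

    from math import sqrt

    def z_test(m, mu_m, sigma_m):
        """Z test: population mean and variance both known."""
        return (m - mu_m) / sigma_m

    def t_single_sample(scores, mu):
        """t test for a single sample: population mean known, variance estimated as S² = SS/df."""
        n = len(scores)
        m = sum(scores) / n
        s2 = sum((x - m) ** 2 for x in scores) / (n - 1)
        return (m - mu) / sqrt(s2 / n)

    def t_dependent(pre, post):
        """t test for dependent means: the single-sample t test on the difference scores, against 0."""
        return t_single_sample([b - a for a, b in zip(pre, post)], 0)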

How are you doing?

1. Describe the situation in which you would use a t test for dependent means.

2. When doing a t test for dependent means, what do you do with the two scores you have for each participant?

3. In a t test for dependent means, (a) what is usually considered to be the mean of the "known" population (Population 2)? (b) Why?

4. Five individuals are tested before and after an experimental procedure; their scores are given in the following table. Test the hypothesis that there is no change, using the .05 significance level. (a) Use the steps of hypothesis testing and (b) sketch the distributions involved.

Person    Before    After
1           20        30
2           30        50
3           20        10
4           40        30
5           30        40

5. What about the research situation makes the difference in whether you should carry out a Z test or a t test for a single sample?

6. What about the research situation makes the difference in whether you should carry out a t test for a single sample or a t test for dependent means?

Figure 7–8 Distributions for answer to "How Are You Doing?" question 4: the population of difference scores, the comparison distribution (t), and the sample.

Answers

1. A t test for dependent means is used when you are doing hypothesis testing and you have two scores for each participant (such as a before-score and an after-score) and the population variance is unknown. It is also used when a study compares participants who are organized into pairs.

2. Subtract one from the other to create a difference (or change) score for each person. The t test is then done with these difference (or change) scores.

3. (a) The mean of the "known" population (Population 2) is 0. (b) You are comparing your sample to a situation in which there is no difference—a population of difference scores in which the average difference is 0.

4. (a) Steps of hypothesis testing (all figuring is shown in Table 7–10):

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People like those tested before and after the experimental procedure.
Population 2: People whose scores are the same before and after the experimental procedure.

The research hypothesis is that Population 1's mean change score (after minus before) is different from Population 2's. The null hypothesis is that Population 1's mean change score is the same as Population 2's.

❷ Determine the characteristics of the comparison distribution. The mean of the distribution of means of difference scores (the comparison distribution) is 0; the standard deviation of the distribution of means of difference scores is 6; it is a t distribution with 4 degrees of freedom.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. For a two-tailed test at the .05 level, the cutoff sample scores are +2.776 and -2.776.

❹ Determine your sample's score on the comparison distribution. t = (4 - 0)/6 = .67.

➎ Decide whether to reject the null hypothesis. The sample's t score of .67 is not more extreme than the cutoff t of ±2.776. Therefore, do not reject the null hypothesis.

Table 7–10 Figuring for Answer to "How Are You Doing?" Question 4

               Score           Difference         Deviation          Squared
Person    Before    After    (After – Before)    (Difference – M)    Deviation
1           20        30           10                   6                36
2           30        50           20                  16               256
3           20        10          -10                 -14               196
4           40        30          -10                 -14               196
5           30        40           10                   6                36
Σ:         140       160           20                                   720

For difference scores:
M = 20/5 = 4.00.
μ = 0.
S² = SS/df = 720/(5 - 1) = 720/4 = 180.
S²M = S²/N = 180/5 = 36.
SM = √S²M = √36 = 6.
t for df = 4 needed for 5% significance level, two-tailed = ±2.776.
t = (M - μ)/SM = (4 - 0)/6 = .67.
Decision: Do not reject the null hypothesis.

(b) The distributions are shown in Figure 7–8.

5. As shown in Table 7–9, whether the population variance is known determines whether you should carry out a Z test or a t test for a single sample. You use a Z test when the population variance is known and you use the t test for a single sample when it is not known.

6. As shown in Table 7–9, whether the population mean is known and whether there are one or two scores for each participant determines whether you should carry out a t test for a single sample or a t test for dependent means. You use a t test for a single sample when you know the population mean and you have one score for each participant; you use the t test for dependent means when you do not know the population mean and there are two scores for each participant.
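As a quick machine check of the figuring in Table 7–10 (a minimal sketch in Python; the ±2.776 cutoff still comes from a t table such as Table A–2):

    from math import sqrt

    before = [20, 30, 20, 40, 30]
    after = [30, 50, 10, 30, 40]
    diffs = [post - pre for pre, post in zip(before, after)]
    m = sum(diffs) / len(diffs)                                   # mean change: 4.0
    s2 = sum((d - m) ** 2 for d in diffs) / (len(diffs) - 1)      # estimated population variance: 180
    print(round(m / sqrt(s2 / len(diffs)), 2))                    # t of about .67, well inside +/-2.776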

Assumptions of the t Test for a Single Sample and the t Test for Dependent Means

As we have seen, when you are using an estimated population variance, the comparison distribution is a t distribution. However, the comparison distribution will be exactly a t distribution only if the distribution of individuals follows a normal curve. Otherwise, the comparison distribution will follow some other (usually unknown) shape.

Thus, strictly speaking, a normal population is a requirement within the logic and mathematics of the t test. A requirement like this for a hypothesis-testing procedure is called an assumption. That is, a normal population distribution is one assumption of the t test. The effect of this assumption is that if the population distribution is not normal, the comparison distribution will be some indeterminate shape other than a t distribution—and thus the cutoffs on the t table will be incorrect.

Unfortunately, when you do a t test, you don't know whether the population is normal. This is because, when doing a t test, usually all you have to go on are the scores in your sample. Fortunately, however, as we saw in Chapter 3, distributions in psychology research quite often approximate a normal curve. (This also applies to distributions of difference scores.) Also, statisticians have found that, in practice, you get reasonably accurate results with t tests even when the population is rather far from normal. In other words, the t test is said to be robust over moderate violations of the assumption of a normal population distribution. How statisticians figure out the robustness of a test is an interesting topic, which is described in Box 8–1 in Chapter 8.

The only very common situation in which using a t test for dependent means is likely to give a seriously distorted result is when you are using a one-tailed test and the population is highly skewed (is very asymmetrical, with a much longer tail on one side than the other). Thus, you need to be cautious about your conclusions when doing a one-tailed test if the sample of difference scores is highly skewed, suggesting the population it comes from is also highly skewed.
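Since the sample is usually all the evidence you have, a rough way to check is to look at the distribution of the sample of difference scores. Here is a minimal sketch of one such check (our own, using the common Fisher–Pearson skewness index; the ±1 guideline in the comment is an informal rule of thumb, not a rule from the text):

    def sample_skewness(scores):
        """Rough skewness index: near 0 means roughly symmetric; large absolute values mean marked skew."""
        n = len(scores)
        m = sum(scores) / n
        sd = (sum((x - m) ** 2 for x in scores) / n) ** 0.5
        return sum(((x - m) / sd) ** 3 for x in scores) / n

    # Values well beyond about +/-1 suggest the kind of strong skew that makes a one-tailed
    # t test for dependent means least trustworthy.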

Effect Size and Power for the t Test for Dependent Means

Effect Size

You can figure the effect size for a study using a t test for dependent means the same way as in Chapter 6.6 It is the difference between the population means divided by the population standard deviation: d = (μ1 - μ2)/σ. When using this formula for a t test for dependent means, μ1 is for the predicted mean of the population of difference scores, μ2 (the "known" population mean) is almost always 0, and σ usually stands for the standard deviation of the population of difference scores. The conventions for effect size for a t test for dependent means are also the same as you learned for the situation we considered in Chapter 6: A small effect size is .20, a medium effect size is .50, and a large effect size is .80.

Consider an example. A sports psychologist plans a study on attitudes toward teammates before versus after a game. She will administer an attitude questionnaire twice, once before and once after a game. Suppose that the smallest before-after difference that would be of any importance is 4 points on the questionnaire. Also suppose that, based on related research, the researcher figures that the standard deviation of difference scores on this attitude questionnaire is about 8 points. Thus, μ1 = 4 and σ = 8. Applying the effect size formula, d = (μ1 - μ2)/σ = (4 - 0)/8 = .50. In terms of the effect size conventions, her planned study has a medium effect size.

assumption condition, such as a population's having a normal distribution, required for carrying out a particular hypothesis-testing procedure; a part of the mathematical foundation for the accuracy of the tables used in determining cutoff values.

robustness extent to which a particular hypothesis-testing procedure is reasonably accurate even when its assumptions are violated.

To estimate the effect size after a study, use the actual mean of your sample's difference scores as your estimate of μ1, and use S (for the population of difference scores) as your estimate of σ.

Consider our first example of a t test for dependent means, the study of husbands' change in communication quality. In that study, the mean of the difference scores was -12.05. The estimated population standard deviation of the difference scores would be 12.39. That is, we figured the estimated variance of the difference scores (S²) to be 153.49; √S² = 12.39. Therefore, the estimated effect size is d = (μ1 - μ2)/σ = (M - 0)/S = (-12.05 - 0)/12.39 = -.97. This is a very large effect size. (The negative sign for the effect size means that the large effect was a decrease.)
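A minimal sketch of this figuring in Python (the names are ours), using the numbers from the two examples above:

    def effect_size_d(mean_diff, sd_diff):
        """d = (mu1 - mu2)/sigma, with mu2 = 0 for a population of difference scores."""
        return (mean_diff - 0) / sd_diff

    print(effect_size_d(4, 8))                       # planned sports psychology study: .50, a medium effect size
    print(round(effect_size_d(-12.05, 12.39), 2))    # estimated for the Olthoff husbands: about -.97, very large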

Power

Power for a t test for dependent means can be determined using a power table, a power software package, or an Internet power calculator. Table 7–11 gives the approximate power at the .05 significance level for small, medium, and large effect sizes and one-tailed and two-tailed tests. In the sports psychology example, the researcher expected a medium effect size (d = .50). If she planned to conduct the study using the .05 level, two-tailed, with 20 participants, the study would have a power of .59. This means that, if the research hypothesis is true and has a medium effect size, there is a 59% chance that this study will come out significant.

The power table (Table 7–11) is also useful when you are reading about a nonsignificant result in a published study. Suppose that a study using a t test for dependent means has a nonsignificant result. The study tested significance at the .05 level, was two-tailed, and had 10 participants. Should you conclude that there is in fact no difference at all in the populations? Probably not. Even assuming a medium effect size, Table 7–11 shows that there is only a 32% chance of getting a significant result in this study.

TIP FOR SUCCESS
Recall from Chapter 6 that power can be expressed as a probability (such as .71) or as a percentage (such as 71%). Power is expressed as a probability in Table 7–11 (as well as in power tables in later chapters).

Table 7–11 Approximate Power for Studies Using the t Test for Dependent Means for Testing Hypotheses at the .05 Significance Level

                                       Effect Size
Difference Scores in Sample (N)    Small (d = .20)    Medium (d = .50)    Large (d = .80)

One-tailed test

10 .15 .46 .78

20 .22 .71 .96

30 .29 .86 *

40 .35 .93 *

50 .40 .97 *

100 .63 * *

Two-tailed test

10 .09 .32 .66

20 .14 .59 .93

30 .19 .77 .99

40 .24 .88 *

50 .29 .94 *

100 .55 * *

*Power is nearly 1.

Consider another study that was not significant. This study also used the .05 significance level, two-tailed. This study had 100 research participants. Table 7–11 tells you that there would be a 55% chance of the study's coming out significant if there were even a true small effect size in the population. If there were a medium effect size in the population, the table indicates that there is almost a 100% chance that this study would have come out significant. Thus, in this study with 100 participants, we could conclude from the results that in the population there is probably at most a small difference.

To keep Table 7–11 simple, we have given power figures for only a few different numbers of participants (10, 20, 30, 40, 50, and 100). This should be adequate for the kinds of rough evaluations you need to make when evaluating results of research articles.7

Planning Sample Size

Table 7–12 gives the approximate number of participants needed for 80% power for a planned study. (Eighty percent is a common figure used by researchers for the minimum power to make a study worth doing.) Suppose you plan a study in which you expect a large effect size and you use the .05 significance level, two-tailed. The table shows you would only need 14 participants to have 80% power. On the other hand, a study using the same significance level, also two-tailed, but in which you expect only a small effect size would need 196 participants for 80% power.8
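Power tables like Tables 7–11 and 7–12 can also be approximated directly. The sketch below is one way to do it in Python using SciPy's noncentral t distribution; it is our own sketch, not the method used to build the tables, and its results come out close to, but not always identical with, the rounded table entries.

    from scipy.stats import t as t_dist, nct

    def power_dependent_t(d, n, alpha=0.05, tails=2):
        """Approximate power of a t test for dependent means with effect size d and N difference scores."""
        df, delta = n - 1, d * n ** 0.5                # noncentrality: d times the square root of N
        cutoff = t_dist.ppf(1 - alpha / tails, df)
        return 1 - nct.cdf(cutoff, df, delta)          # ignores the far tail, which is negligible here

    def n_for_power(d, target=0.80, alpha=0.05, tails=2):
        n = 2
        while power_dependent_t(d, n, alpha, tails) < target:
            n += 1
        return n

    print(round(power_dependent_t(0.50, 20, tails=2), 2))   # compare with the .59 entry in Table 7-11
    print(n_for_power(0.50, tails=2))                       # compare with the 33 in Table 7-12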

How are you doing?

1. (a) What is an assumption in hypothesis testing? (b) Describe a specific assumption for a t test for dependent means. (c) What is the effect of violating this assumption? (d) What does it mean to say that the t test for dependent means is robust? (e) Describe a situation in which it is not robust.

2. How can you tell if you have violated the normal curve assumption?

3. (a) Write the formula for effect size; (b) describe each of its terms as they apply to a planned t test for dependent means; (c) describe what you use for each of its terms in figuring effect size for a completed study that used a t test for dependent means.

4. You are planning a study in which you predict the mean of the population of difference scores to be 40, and the population standard deviation is 80. You plan to test significance using a t test for dependent means, one-tailed, with an alpha of .05. (a) What is the predicted effect size? (b) What is the power of this study if you carry it out with 20 participants? (c) How many participants would you need to have 80% power?

Table 7–12 Approximate Number of Research Participants Needed for 80% Power for the t Test for Dependent Means in Testing Hypotheses at the .05 Significance Level

                  Effect Size
              Small (d = .20)    Medium (d = .50)    Large (d = .80)
One-tailed         156                 26                  12
Two-tailed         196                 33                  14

Answers

1. (a) An assumption is a requirement that you must meet for the results of the hypothesis testing procedure to be accurate. (b) The population of individuals' difference scores is assumed to be a normal distribution. (c) The significance level cutoff from the t table is not accurate. (d) Unless you very strongly violate the assumption (that is, unless the population distribution is very far from normal), the cutoff is fairly accurate. (e) The t test for dependent means is not robust when you are doing a one-tailed test and the population distribution is highly skewed.

2. You look at the distribution of the sample of difference scores to see if it is dramatically different from a normal curve.

3. (a) d = (μ1 - μ2)/σ. (b) d is the effect size; μ1 is for the predicted mean of the population of difference scores; μ2 is the mean of the known population, which for a population of difference scores is almost always 0; σ is for the standard deviation of the population of difference scores. (c) To estimate μ1, you use M, the actual mean of your sample's difference scores; μ2 remains as 0; and for σ, you use S, the estimated standard deviation of the population of difference scores.

4. (a) Predicted effect size: d = (μ1 - μ2)/σ = (40 - 0)/80 = .50. (b) Power of this study: .71. (c) Number of participants for 80% power: 26.

Controversy: Advantages and Disadvantages of Repeated-Measures Designs

The main controversies about t tests have to do with their relative advantages and disadvantages compared to various alternatives (alternatives we will discuss in Chapter 14). There is, however, one consideration that we want to comment on now. It is about all research designs in which the same participants are tested before and after some experimental intervention (the kind of situation the t test for dependent means is often used for).

Studies using difference scores (that is, studies using a repeated-measures design) often have much larger effect sizes for the same amount of expected difference between means than other kinds of research designs. That is, testing each of a group of participants twice (once under one condition and once under a different condition) usually produces a study with high power. In particular, this kind of study gives more power than dividing the participants up into two groups and testing each group once (one group tested under one condition and the other tested under another condition). In fact, studies using difference scores usually have even more power than those in which you have twice as many participants, but each is tested only once.

Why do repeated-measures designs have so much power? The reason is that the standard deviation of difference scores is usually quite low. (The standard deviation of difference scores is what you divide by to get the effect size when using difference scores.) This produces a large effect size, which increases the power. In a repeated-measures design, the only variation is in the difference scores. Variation among participants on each testing's scores is not part of the variation involved in the analysis. As an example, look back at Table 7–8 from our romantic love and brain imaging study. Notice that there were very great differences between the scores (fMRI scanner

activation values) for each participant. The first participant's scores were around 1,487, the second's around 1,328, and so forth. Each person has a quite different overall level of activation. But the differences between the two conditions were relatively small. What we see in this example is that, because difference scores are all comparing participants to themselves, the variation in them is much less (and does not include the variation between participants). William S. Gosset, who essentially invented the t test (see Box 7–1), made much of the higher power of repeated-measures studies in a historically interesting controversy over an experiment about milk, which is described in Box 7–2.
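You can see this concretely with the Table 7–8 data: the spread across people is huge, while the spread of the difference scores is tiny. A minimal check in Python, using the standard library's sample standard deviation:

    from statistics import stdev

    beloved = [1487.8, 1329.4, 1407.9, 1236.1, 1299.8, 1447.2, 1354.1, 1204.6, 1322.3, 1388.5]
    control = [1487.2, 1328.1, 1405.9, 1234.0, 1298.2, 1444.7, 1354.3, 1203.7, 1320.8, 1386.8]
    diffs = [b - c for b, c in zip(beloved, control)]

    print(round(stdev(beloved), 1))   # spread between people: roughly 90 scanner units
    print(round(stdev(diffs), 2))     # spread of the difference scores: under 1 unit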

On the other hand, testing a group of people before and after an experimental procedure, without any kind of control group that does not go through the procedure, is a weak research design (Cook & Campbell, 1979). Even if such a study produces a significant difference, it leaves many alternative explanations for that difference. For example, the research participants might have matured or improved during that period anyway, or perhaps other events happened between tests, or the participants not getting benefits may have dropped out. It is even possible that the initial test itself caused changes.

Note, however, that the difficulties of research that tests people before and after some intervention are shared only slightly with the kind of study in which participants are tested under two conditions, such as viewing a beloved person's picture and a neutral person's picture, with half tested first viewing the beloved's picture and half tested first viewing the neutral person's picture. Another example would be a study examining the hand-eye coordination of a group of surgeons under both quiet and noisy conditions (not while doing surgery, of course). Each surgeon would perform the test of hand-eye

BOX 7–2 The Power of Studies Using Difference Scores: How the Lanarkshire Milk Experiment Could Have Been Milked for More

In 1930, a major health experiment was conducted in Scotland involving 20,000 schoolchildren. Its main purpose was to compare the growth of a group of children who were assigned to drink milk regularly to those who were in a control group. The results were that those who drank milk showed more growth.

However, William Gosset, a contemporary statistician and inventor of the t test (see Box 7–1), was appalled at the way the experiment was conducted. It had cost about £7,500, which in 1930 was a huge amount of money, and was done wrong! Large studies such as this were very popular among statisticians in those days because they seemed to imitate the large numbers found in nature. Gosset, by contrast, being a brewer, was forced to use very small numbers in his studies—experimental batches of beer were too costly. And he was often chided by the "real statisticians" for his small sample sizes. But Gosset argued that no number of participants was large enough when strict random assignment was not followed. And in this study, teachers were permitted to switch children from group to group if they took pity on a child whom they felt would benefit from receiving milk!

However, even more interesting in light of the present chapter, Gosset demonstrated that the researchers could have obtained the same result with 50 pairs of identical twins, flipping a coin to determine which of each pair was in the milk group (and sticking to it). Of course, the statistic you would use is the t test as taught in this chapter—the t test for dependent means.

More recently, the development of power analysis, which we introduced in Chapter 6, has thoroughly vindicated Gosset. It is now clear just how surprisingly few participants are needed when a researcher can find a way to set up a repeated-measures design in which difference scores are the basic unit of analysis. (In this case, each pair of twins would be one "participant.") As Gosset could have told them, studies that use the t test for dependent means can be extremely sensitive.

Sources: Peters (1987); Tankard (1984).

coordination during quiet conditions and noisy conditions. Ideally, any effects of practice or fatigue from taking the test twice would be equalized by testing half of the surgeons under noisy conditions first, and half under quiet conditions first.

Single Sample t Tests and Dependent Means t Tests in Research Articles

Research articles usually describe t tests in a fairly standard format that includes the degrees of freedom, the t score, and the significance level. For example, "t(24) = 2.80, p < .05" tells you that the researcher used a t test with 24 degrees of freedom, found a t score of 2.80, and the result was significant at the .05 level. Whether a one-tailed or two-tailed test was used may also be noted. (If not, assume that it was two-tailed.) Usually the means, and sometimes the standard deviations, are given for each testing. Rarely does an article report the standard deviation of the difference scores.

Had our student in the dormitory example reported the results in a research article, she would have written something like this: "The sample from my dormitory studied a mean of 21 hours (SD = 6.80). Based on a t test for a single sample, this was significantly different from the known mean of 17 for the college as a whole, t(15) = 2.35, p < .05, one-tailed." The researchers in our fictional flood victims example might have written up their results as follows: "The reported hopefulness of our sample of flood victims (M = 4.70, SD = 1.89) was not significantly different from the midpoint of the scale, t(9) = 1.17."

As we noted earlier, psychologists only occasionally use the t test for a single sample. We introduced it mainly as a stepping-stone to the more widely used t test for dependent means. Nevertheless, one sometimes sees the t test for a single sample in research articles. For example, Soproni and colleagues (2001), as part of a larger study, had pet dogs respond to a series of eight trials in which the owner would look at one of two bowls of dog food and the researchers measured whether the dog went to the correct bowl. (The researchers called these "at trials" because the owner looked directly at the target.) For each dog, this produced an average percentage correct that was compared to chance, which would be 50% correct. Here is part of their results: "During the eight test trials for gesture, dogs performed significantly above chance on at target trials: one sample t test, t(13) = 5.3, p < .01 . . ." (p. 124).

As we have said, the t test for dependent means is much more commonly used. Olthoff (1989) might have reported the result of his study of husbands' communication quality as follows: "There was a significant decline in communication quality, dropping from a mean of 116.32 before marriage to a mean of 104.26 after marriage, t(18) = -4.24, p < .05."

As another example, Rashotte and Webster (2005) carried out a study about people's general expectations about the abilities of men and women. In the study, the researchers showed 174 college students photos of women and men (referred to as the female and male targets, respectively). The students rated the person in each photo in terms of that person's general abilities (e.g., in terms of the person's intelligence, abstract abilities, capability at most tasks, and so on). For each participant, these ratings were combined to create a measure of the perceived status of the female targets and of the male targets. The researchers then compared the status ratings given for the female targets and male targets. Since each participant in the study rated both the female and the male targets, the researchers compared the status ratings assigned to the female and male targets using a t test for dependent means. Table 7–13 shows the results. The row entitled "Whole sample (N = 174)" gives the

result of the t test for all 174 participants and shows that the status rating assigned to the male targets was significantly higher than the rating assigned to the female targets (t = 3.46, p < .001). As shown in the table, the researchers also conducted two additional t tests to see if this effect was the same among the female participants and the male participants. The results showed that both the female and the male participants assigned higher ratings to the male targets.

Table 7–13 Status Scale: Mean (and SE) General Expectations for Female and Male Targets

                                         Mean Score (SE)
Respondents                      Female Target    Male Target    M - F Target Difference    t (1-tailed p)
Whole sample (N = 174)            5.60 (.06)       5.85 (.07)             .25                3.46 (< .001)
Female respondents (N = 111)      5.62 (.07)       5.84 (.081)            .22                2.62 (< .05)
Male respondents (N = 63)         5.57 (.10)       5.86 (.11)             .29                2.26 (< .05)

Source: Rashotte, L. S., & Webster, M., Jr. (2005). Gender status beliefs. Social Science Research, 34, 618–633. Copyright © 2005 by Elsevier. Reprinted by permission of Elsevier.

Summary

1. You use the standard steps of hypothesis testing even when you don't know the population variance. However, in this situation you have to estimate the population variance from the scores in the sample, using a formula that divides the sum of squared deviation scores by the degrees of freedom (df = N - 1).

2. When the population variance is estimated, the comparison distribution of means is a t distribution (with cutoffs given in a t table). A t distribution has slightly heavier tails than a normal curve (just how much heavier depends on how few the degrees of freedom are). Also, in this situation, a sample’s number of standard deviations from the mean of the comparison distribution is called a t score.

3. You use a t test for a single sample when a sample mean is being compared to a known population mean and the population variance is unknown.

4. You use a t test for dependent means in studies where each participant has two scores, such as a before-score and an after-score or a score in each of two experimental conditions. A t test for dependent means is also used when you have scores from pairs of research participants. In this t test, you first figure a difference or change score for each participant, then go through the usual five steps of hypothesis testing with the modifications described in summary points 1 and 2 and making Population 2 a population of difference scores with a mean of 0 (no difference).

5. An assumption of the t test is that the population distribution is a normal curve. However, even when it is not, the t test is usually fairly accurate.

6. The effect size of a study using a t test for dependent means is the mean of the difference scores divided by the standard deviation of the difference scores. You can look up power and needed sample size for any particular level of power using power software packages, an Internet power calculator, or special tables.

7. The power of studies using difference scores is usually much higher than that of studies using other designs with the same number of participants. However, research using a single group tested before and after some intervening event, without a control group, allows for many alternative explanations of any observed changes.

8. t tests are reported in research articles using a standard format. For example, "t(24) = 2.80, p < .05."

Key Terms

t tests (p. 223)
t test for a single sample (p. 223)
biased estimate (p. 225)
unbiased estimate of the population variance (S²) (p. 226)
degrees of freedom (df) (p. 226)
t distribution (p. 228)
t table (p. 229)
t score (p. 230)
repeated-measures design (p. 236)
t test for dependent means (p. 236)
difference scores (p. 237)
assumption (p. 247)
robustness (p. 247)

Example Worked-Out Problems

t Test for a Single Sample

Eight participants are tested after being given an experimental procedure. Their scores are 14, 8, 6, 5, 13, 10, 10, and 6. The population of people not given this procedure is normally distributed with a mean of 6. Using the .05 level, two-tailed, does the experimental procedure make a difference? (a) Use the five steps of hypothesis testing and (b) sketch the distributions involved.

Answer

(a) Steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People who are given the experimental procedure.
Population 2: The general population.

The research hypothesis is that Population 1 will score differently than Population 2. The null hypothesis is that Population 1 will score the same as Population 2.

❷ Determine the characteristics of the comparison distribution. The mean of the distribution of means is 6 (the known population mean). To figure the estimated population variance, you first need to figure the sample mean, which is (14 + 8 + 6 + 5 + 13 + 10 + 10 + 6)/8 = 72/8 = 9. The estimated population variance is S² = SS/df = 78/7 = 11.14; the variance of the distribution of means is S²M = S²/N = 11.14/8 = 1.39. The standard deviation of the distribution of means is SM = √S²M = √1.39 = 1.18. Its shape will be a t distribution for df = 7.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. From Table A–2, the cutoffs for a two-tailed t test at the .05 level for df = 7 are +2.365 and -2.365.

❹ Determine your sample's score on the comparison distribution. t = (M - μ)/SM = (9 - 6)/1.18 = 3/1.18 = 2.54.

➎ Decide whether to reject the null hypothesis. The t of 2.54 is more extreme than the needed t of ±2.365. Therefore, reject the null hypothesis; the research hypothesis is supported. The experimental procedure does make a difference.

(b) Sketches of distributions are shown in Figure 7–9.
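A minimal check of this worked-out problem in Python (our own sketch; the ±2.365 cutoff still comes from Table A–2):

    from math import sqrt

    scores = [14, 8, 6, 5, 13, 10, 10, 6]
    mu = 6                                              # known population mean
    n = len(scores)
    m = sum(scores) / n                                 # sample mean: 9
    s2 = sum((x - m) ** 2 for x in scores) / (n - 1)    # estimated population variance: about 11.14
    s_m = sqrt(s2 / n)                                  # standard deviation of the distribution of means
    print(round((m - mu) / s_m, 2))                     # t of about 2.54; compare to +/-2.365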

Figure 7–9 Distributions for answer to Example Worked-Out Problem for t test for a single sample.

t Test for Dependent Means

A researcher tests 10 individuals before and after an experimental procedure. The results are as follows:

Participant Before After

1      10.4    10.8
2      12.6    12.1
3      11.2    12.1
4      10.9    11.4
5      14.3    13.9
6      13.2    13.5
7       9.7    10.9
8      11.5    11.5
9      10.8    10.4

10 13.1 12.5

Test the hypothesis that there is an increase in scores, using the .05 significance level. (a) Use the five steps of hypothesis testing and (b) sketch the distributions involved.

Answer (a) Table 7–14 shows the results, including the figuring of difference scores and

all the other figuring for the t test for dependent means. Here are the steps of hypothesis testing:

❶ Restate the question as a research hypothesis and a null hypothesis about the populations. There are two populations:

Population 1: People like those who are given the experimental procedure. Population 2: People who show no change from before to after.

The research hypothesis is that Population 1's mean difference score (figured using "after" scores minus "before" scores) is greater than Population 2's mean difference score. The null hypothesis is that Population 1's mean difference score is not greater than Population 2's.

❷ Determine the characteristics of the comparison distribution. Its population mean is 0 difference. The estimated population variance of difference scores, S², is shown in Table 7–14 to be .388. As shown in Table 7–14, the standard deviation of the distribution of means of difference scores, SM, is .197. Therefore, the comparison distribution has a mean of 0 and a standard deviation of .197. It will be a t distribution for df = 9.

❸ Determine the cutoff sample score on the comparison distribution at which the null hypothesis should be rejected. For a one-tailed test at the .05 level with df = 9, the cutoff is 1.833. (The cutoff is positive as the research hypothesis is that Population 1's mean difference score will be greater than Population 2's.)

Table 7–14 Figuring for Answer to Example Worked-Out Problem for t Test for Dependent Means

                   Score            Difference          Deviation           Squared
Participant    Before    After    (After – Before)    (Difference – M)     Deviation
1               10.4     10.8           .4                  .260             .068
2               12.6     12.1          -.5                 -.640             .410
3               11.2     12.1           .9                  .760             .578
4               10.9     11.4           .5                  .360             .130
5               14.3     13.9          -.4                 -.540             .292
6               13.2     13.5           .3                  .160             .026
7                9.7     10.9          1.2                 1.060            1.124
8               11.5     11.5          0.0                 -.140             .020
9               10.8     10.4          -.4                 -.540             .292
10              13.1     12.5          -.6                 -.740             .548
Σ:             117.7    119.1          1.4                                  3.488

For difference scores:
M = 1.4/10 = .140.
μ = 0.
S² = SS/df = 3.488/(10 - 1) = 3.488/9 = .388.
S²M = S²/N = .388/10 = .039.
SM = √S²M = √.039 = .197.
t for df = 9 needed for 5% significance level, one-tailed = 1.833.
t = (M - μ)/SM = (.140 - 0)/.197 = .71.
Decision: Do not reject the null hypothesis.

❹ Determine your sample's score on the comparison distribution. The sample's mean change of .140 is .71 standard deviations (of .197 each) on the distribution of means above that distribution's mean of 0. That is, t = (M - μ)/SM = (.140 - 0)/.197 = .71.

➎ Decide whether to reject the null hypothesis. The sample's t of .71 is less extreme than the needed t of 1.833. Thus, you cannot reject the null hypothesis. The study is inconclusive.

(b) Sketches of distributions are shown in Figure 7–10.

Figure 7–10 Distributions for answer to Example Worked-Out Problem for t test for dependent means.

Outline for Writing Essays for a t Test for a Single Sample

1. Describe the core logic of hypothesis testing in this situation. Be sure to mention that the t test for a single sample is used for hypothesis testing when you have scores for a sample of individuals and you want to compare the mean of this sample to a population for which the mean is known but the variance is unknown. Be sure to explain the meaning of the research hypothesis and the null hypothesis in this situation.

2. Outline the logic of estimating the population variance from the sample scores. Explain the idea of biased and unbiased estimates of the population variance, and describe the formula for estimating the population variance and why it is different from the ordinary variance formula.

3. Describe the comparison distribution (the t distribution) that is used with a t test for a single sample, noting how it is different from a normal curve and why.

Explain why a t distribution (as opposed to the normal curve) is used as the comparison distribution.

4. Describe the logic and process for determining the cutoff sample score(s) on the comparison distribution at which the null hypothesis should be rejected.

5. Describe why and how you figure the t score of the sample mean on the comparison distribution.

6. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing process are compared. Explain the meaning of the result of this comparison with regard to the specific research and null hypotheses being tested.

Outline for Writing Essays for a t Test for Dependent Means

1. Describe the core logic of hypothesis testing in this situation. Be sure to mention

that the t test for dependent means is used for hypothesis testing when you have two scores from each person in your sample. Be sure to explain the meaning of the research hypothesis and the null hypothesis in this situation. Explain the logic and procedure for creating difference scores.

2. Explain why you use 0 as the mean for the comparison distribution.

3. Outline the logic of estimating the population variance of difference scores from the sample scores. Explain the idea of biased and unbiased estimates of the population variance, and describe the formula for estimating the population variance. Describe how to figure the standard deviation of the distribution of means of difference scores.

4. Describe the comparison distribution (the t distribution) that is used with a t test for dependent means. Explain why a t distribution (as opposed to the normal curve) is used as the comparison distribution.

5. Describe the logic and process for determining the cutoff sample score(s) on the comparison distribution at which the null hypothesis should be rejected.

6. Describe why and how you figure the t score of the sample mean on the comparison distribution.

7. Explain how and why the scores from Steps ❸ and ❹ of the hypothesis-testing process are compared. Explain the meaning of the result of this comparison with regard to the specific research and null hypotheses being tested.

Practice Problems

These problems involve figuring. Most real-life statistics problems are done on a computer with special statistical software. Even if you have such software, do these problems by hand to ingrain the method in your mind. To learn how to use a computer to solve statistics problems like those in this chapter, refer to the Using SPSS section at the end of this chapter and the Study Guide and Computer Workbook that accompanies this text.

All data are fictional unless an actual citation is given.

Set I (for Answers to Set I Problems, see pp. 681–683)

1. In each of the following studies, a single sample's mean is being compared to a population with a known mean but an unknown variance. For each study, decide whether the result is significant. (Be sure to show all of your calculations.)

       Sample       Population    Estimated Population    Sample                            Significance
       Size (N)     Mean (μ)      Variance (S²)           Mean (M)    Tails                 Level (α)
(a)       64          12.40              9.00               11.00     1 (low predicted)        .05
(b)       49       1,006.35            317.91            1,009.72     2                        .01
(c)      400          52.00              7.02               52.41     1 (high predicted)       .01

2. Suppose a candidate running for sheriff in a rural community claims that she will reduce the average speed of emergency response to less than 30 minutes, which is thought to be the average response time with the current sheriff. There are no past records; so the actual standard deviation of such response times cannot be determined. Thanks to this campaign, she is elected sheriff, and careful records are now kept. The response times for the first month are 26, 30, 28, 29, 25, 28, 32, 35, 24, and 23 minutes.

Using the .05 level of significance, did she keep her promise? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who has never taken a course in statistics.

3. A researcher tests five individuals who have seen paid political ads about a particular issue. These individuals take a multiple-choice test about the issue in which people in general (who know nothing about the issue) usually get 40 questions correct. The number correct for these five individuals was 48, 41, 40, 51, and 50.

Using the .05 level of significance, two-tailed, do people who see the ads do better on this test? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who is familiar with the Z test (from Chapter 5) but is unfamiliar with t tests.

4. For each of the following studies using difference scores, test the significance using a t test for dependent means.

Number of Difference Scores in Sample   Mean of Difference Scores in Sample   Estimated Population Variance of Difference Scores   Tails                Significance Level
(a) 20                                  1.7                                   8.29                                                 1 (high predicted)   .05
(b) 164                                 2.3                                   414.53                                               2                    .05
(c) 15                                  -2.2                                  4.00                                                 1 (low predicted)    .01

5. A program to decrease littering was carried out in four cities in California's Central Valley starting in August 2007. The amount of litter in the streets (average pounds of litter collected per block per day) was measured during July before the program started and then the next July, after the program had been in effect for a year. The results were as follows:

City          July 2007   July 2008
Fresno        9           2
Merced        10          4
Bakersfield   8           9
Stockton      9           1


Using the .01 level of significance, was there a significant decrease in the amount of litter? (a) Use the five steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who understands mean, standard deviation, and variance, but knows nothing else about statistics.

6. A researcher assesses the level of a particular hormone in the blood in five patients before and after they begin taking a hormone treatment program. Results for the five are as follows:

Patient   Before   After
A         .20      .18
B         .16      .16
C         .24      .23
D         .22      .19
E         .17      .16

Using the .05 significance level, was there a significant change in the level of this hormone? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who understands the t test for a single sample but is unfamiliar with the t test for dependent means.

7. Figure the estimated effect size and indicate whether it is approximately small, medium, or large, for each of the following studies:

      Mean Change   S
(a)   20            32
(b)   5             10
(c)   .1            .4
(d)   100           500

8. What is the power of each of the following studies, using a t test for dependent means (based on the .05 significance level)?

      Effect Size   N    Tails
(a)   Small         20   One
(b)   Medium        20   One
(c)   Medium        30   One
(d)   Medium        30   Two
(e)   Large         30   Two

9. About how many participants are needed for 80% power in each of the following planned studies that will use a t test for dependent means with p < .05?

      Predicted Effect Size   Tails
(a)   Medium                  Two
(b)   Large                   One
(c)   Small                   One


10. Weller and Weller (1997) conducted a study of the tendency for the menstrual cycles of women who live together (such as sisters) to become synchronized. For their statistical analysis, they compared scores on a measure of synchronization of pairs of sisters living together versus the degree of synchronization that would be expected by chance (lower scores mean more synchronization). Their key results (reported in a table not reproduced here) were synchrony scores of 6.32 for the 30 roommate sister pairs in their sample compared to an expected synchrony score of 7.76; they then reported a t score of 2.27 and a p level of .011 for this difference. Explain this result to a person who is familiar with hypothesis testing with a known population variance, but not with the t test for a single sample.

11. A psychologist conducts a study of perceptual illusions under two different lighting conditions. Twenty participants were each tested under both of the two different conditions. The experimenter reported: “The mean number of effective illusions was 6.72 under the bright conditions and 6.85 under the dimly lit conditions, a difference that was not significant, t(19) = 1.62.” Explain this result to a person who has never had a course in statistics. Be sure to use sketches of the distributions in your answer.

12. A study was done of personality characteristics of 100 students who were tested at the beginning and end of their first year of college. The researchers reported the results in the following table:

                      Fall             Spring           Difference
Personality Scale     M       SD       M       SD       M        SD
Anxiety               16.82   4.21     15.32   3.84     1.50**   1.85
Depression            89.32   8.39     86.24   8.91     3.08**   4.23
Introversion          59.89   6.87     60.12   7.11     -.23     2.22
Neuroticism           38.11   5.39     37.22   6.02     .89*     4.21

*p < .05. **p < .01.

(a) Focusing on the difference scores, figure the t values for each personality scale. (Assume that SD in the table is for what we have called S, the unbiased estimate of the population standard deviation.) (b) Explain to a person who has never had a course in statistics what this table means.

Set II

13. In each of the following studies, a single sample's mean is being compared to a population with a known mean but an unknown variance. For each study, decide whether the result is significant.

Sample Size (N)   Population Mean (μ)   Estimated Population Standard Deviation (S)   Sample Mean (M)   Tails                Significance Level (α)
(a) 16            100.31                2.00                                          100.98            1 (high predicted)   .05
(b) 16            .47                   4.00                                          .00               2                    .05
(c) 16            68.90                 9.00                                          34.00             1 (low predicted)    .01


14. Evolutionary theories often emphasize that humans have adapted to their physical environment. One such theory hypothesizes that people should spontaneously follow a 24-hour cycle of sleeping and waking—even if they are not exposed to the usual pattern of sunlight. To test this notion, eight paid volunteers were placed (individually) in a room in which there was no light from the outside and no clocks or other indications of time. They could turn the lights on and off as they wished. After a month in the room, each individual tended to develop a steady cycle. Their cycles at the end of the study were as follows: 25, 27, 25, 23, 24, 25, 26, and 25.

Using the .05 level of significance, what should we conclude about the theory that 24 hours is the natural cycle? (That is, does the average cycle length under these conditions differ significantly from 24 hours?) (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who has never taken a course in statistics.

15. In a particular country, it is known that college seniors report falling in love an average of 2.20 times during their college years. A sample of five seniors, originally from that country but who have spent their entire college career in the United States, were asked how many times they had fallen in love during their college years. Their numbers were 2, 3, 5, 5, and 2. Using the .05 significance level, do students like these who go to college in the United States fall in love more often than those from their country who go to college in their own country? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who is familiar with the Z test (from Chapter 5) but is unfamiliar with the t test for a single sample.

16. For each of the following studies using difference scores, test the significance using a t test for dependent means.

Number of Difference Scores in Sample   Mean of Difference Scores   S² for Difference Scores   Tails        Significance Level
(a) 10                                  3.8                         50                         One (high)   .05
(b) 100                                 3.8                         50                         One (high)   .05
(c) 100                                 1.9                         50                         One (high)   .05
(d) 100                                 1.9                         50                         Two          .05
(e) 100                                 1.9                         25                         Two          .05

17. Four individuals with high levels of cholesterol went on a special crash diet, avoiding high-cholesterol foods and taking special supplements. Their total cholesterol levels before and after the diet were as follows:

Participant   Before   After
J. K.         287      255
L. M. M.      305      269
A. K.         243      245
R. O. S.      309      247

Using the .05 level of significance, was there a significant change in cholesterol level? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who has never taken a course in statistics.

18. Five people who were convicted of speeding were ordered by the court to attend a workshop. A special device put into their cars kept records of their speeds for two weeks before and after the workshop. The maximum speeds for each person during the two weeks before and the two weeks after the workshop follow.

Participant   Before   After
L. B.         65       58
J. K.         62       65
R. C.         60       56
R. T.         70       66
J. M.         68       60

Using the .05 significance level, should we conclude that people are likely to drive more slowly after such a workshop? (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who is familiar with hypothesis testing involving known populations, but has never learned anything about t tests.

19. The amount of oxygen consumption was measured in six individuals over two 10-minute periods while sitting with their eyes closed. During one period, they listened to an exciting adventure story; during the other, they heard restful music.

Participant   Story   Music
1             6.12    5.39
2             7.25    6.72
3             5.70    5.42
4             6.40    6.16
5             5.82    5.96
6             6.24    6.08

Based on the results shown, is oxygen consumption less when listening to the music? Use the .01 significance level. (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who understands mean, standard deviation, and variance but knows nothing else about statistics.

20. Five sophomores were given an English achievement test before and after receiving instruction in basic grammar. Their scores are shown below.

Student   Before   After
A         20       18
B         18       22
C         17       15
D         16       17
E         12       9


Is it reasonable to conclude that future students would show higher scores after instruction? Use the .05 significance level. (a) Use the steps of hypothesis testing. (b) Sketch the distributions involved. (c) Explain your answer to someone who understands mean, standard deviation, and variance but knows nothing else about statistics.

21. Figure the predicted effect size and indicate whether it is approximately small, medium, or large, for each of the following planned studies:

      Predicted Mean Change   S
(a)   8                       30
(b)   8                       10
(c)   16                      30
(d)   16                      10

22. What is the power of each of the following studies, using a t test for dependent means (based on the .05 significance level)?

      Effect Size   N     Tails
(a)   Small         50    Two
(b)   Medium        50    Two
(c)   Large         50    Two
(d)   Small         10    Two
(e)   Small         40    Two
(f)   Small         100   Two
(g)   Small         100   One

23. About how many participants are needed for 80% power in each of the following planned studies that will use a t test for dependent means with p < .05?

      Predicted Effect Size   Tails
(a)   Small                   Two
(b)   Medium                  One
(c)   Large                   Two

24. A study compared union activity of employees in 10 plants during two different decades. The researchers reported “a significant increase in union activity, t(9) = 3.28, p < .01.” Explain this result to a person who has never had a course in statistics. Be sure to use sketches of the distributions in your answer.

25. Holden and colleagues (1997) compared mothers' reported attitudes toward corporal punishment of their children from before to 3 years after having their first child. “The average change in the women's prior-to-current attitudes was significant, t(107) = 10.32, p < .001 . . .” (p. 485). (The change was that they felt more negatively about corporal punishment after having their child.) Explain this result to someone who is familiar with the t test for a single sample, but not with the t test for dependent means.

26. Table 7–15 (reproduced from Table 4 of Larson et al., 2001) shows ratings of various aspects of work and home life of 100 middle-class men in India who were fathers. Pick three rows of interest to you and explain the results to someone who is familiar with the mean, variance, and Z scores but knows nothing else about statistics.

Table 7–15 Comparison of Fathers' Mean Psychological States in the Job and Home Spheres (N = 100)

                                 Sphere
Scale             Range   Work     Home     Work vs. Home
Important         0–9     5.98     5.06     6.86***
Attention         0–9     6.15     5.13     7.96***
Challenge         0–9     4.11     2.41     11.49***
Choice            0–9     4.28     4.74     -3.38***
Wish doing else   0–9     1.50     1.44     0.61
Hurried           0–3     1.80     1.39     3.21**
Social anxiety    0–3     0.81     0.64     3.17**
Affect            1–7     4.84     4.98     -2.64**
Social climate    1–7     5.64     5.95     4.17***

Note: Values for column 3 are t scores; df = 90 for all t tests. **p < .01. ***p < .001.
Source: Larson, R., Dworkin, J., & Verma, S. (2001). Men's work and family lives in India: The daily organization of time and emotions. Journal of Family Psychology, 15, 206–224. Copyright © 2001 by the American Psychological Association.



Using SPSS

The U in the following steps indicates a mouse click. (We used SPSS version 15.0 for Windows to carry out these analyses. The steps and output may be slightly different for other versions of SPSS.)

t Test for a Single Sample

❶ Enter the scores from your distribution in one column of the data window.
❷ U Analyze.
❸ U Compare means.
❹ U One-sample T test (this is the name SPSS uses for a t test for a single sample).
❺ U on the variable for which you want to carry out the t test and then U the arrow.
❻ Enter the population mean in the “Test Value” box.
❼ U OK.

Practice these steps by carrying out a single sample t test for the example shown earlier in this chapter of 10 people's ratings of hopefulness after a flood. The sample scores, population mean, and figuring for that study are shown in Table 7–4 on page 232. Your SPSS output window should look like Figure 7–11. The first table provides information about the variable: the number of scores (“N”); the mean of the scores (“Mean”); the estimated population standard deviation, S (“Std. Deviation”); and the standard deviation of the distribution of means, S_M (“Std. Error Mean”). Check that the values in that table are consistent (allowing for rounding error) with the values in Table 7–4.

The second table in the SPSS output window gives the outcome of the t test. Compare the values of t and df in that table with the values shown in Table 7–4. The exact two-tailed significance level of the t test is given in the “Sig. (2-tailed)” column. In this study, the researcher was using the .01 significance level. The significance level given by SPSS (.271) is not more extreme than .01, which means that the researcher cannot reject the null hypothesis and the study is inconclusive.

Figure 7–11 Using SPSS to carry out a t test for a single sample for the example of 10 people's ratings of hopefulness after a flood.
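The same one-sample analysis can also be reproduced outside SPSS. The sketch below is an added Python illustration, not part of the original text: the ten ratings and the test value are hypothetical placeholders rather than the actual Table 7–4 flood scores, and scipy.stats.ttest_1samp stands in for SPSS's One-Sample T Test, reporting the same kind of t, df, and two-tailed significance values.

# Minimal sketch of a one-sample t test, mirroring the SPSS output described
# above. The ratings and test value are hypothetical, not the Table 7-4 data.
from math import sqrt
from scipy import stats

ratings = [5, 4, 6, 2, 7, 4, 5, 3, 6, 4]   # hypothetical scores
pop_mean = 4                               # hypothetical "Test Value"

n = len(ratings)
m = sum(ratings) / n
s = sqrt(sum((x - m) ** 2 for x in ratings) / (n - 1))  # "Std. Deviation" (S)
s_m = s / sqrt(n)                                       # "Std. Error Mean" (S_M)

result = stats.ttest_1samp(ratings, popmean=pop_mean)
print(f"N = {n}, Mean = {m:.3f}, S = {s:.3f}, S_M = {s_m:.3f}")
print(f"t({n - 1}) = {result.statistic:.3f}, Sig. (2-tailed) = {result.pvalue:.3f}")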




t Test for Dependent Means

❶ Enter one set of scores (for example, the “before” scores) in the first column of the data window. Then enter the second set of scores (for example, the “after” scores) in the second column of the data window. (Be sure to enter the scores in the order they are listed.) Since each row in the SPSS data window represents a separate person, it is important that you enter each person's scores in two separate columns (for example, a “before” column and an “after” column).
❷ U Analyze.
❸ U Compare means.
❹ U Paired-Samples T Test (this is the name SPSS uses for a t test for dependent means).



❺ U on the first variable (this will highlight the variable). U on the second variable (this will highlight the variable). U the arrow. The two variables will now appear in the “Paired Variables” box.

❻ U OK.

Practice these steps by carrying out a t test for dependent means for Olthoff's (1989) study of communication quality of 19 men who received ordinary premarital counseling. The scores and figuring for that study are shown in Table 7–6 on page 238. Your SPSS output window should look like Figure 7–12. The key information is contained in the third table (labeled “Paired Samples Test”). The final three columns of this table give the t score (4.240), the degrees of freedom (18), and the two-tailed significance level (.000 in this case) of the t test. The significance level is so small that, even after rounding to three decimal places, it is less than .001. Since the significance level is more extreme than the .05 significance level we set for this study, you can reject the null hypothesis. By looking at the means for the “before” variable and the “after” variable in the first table (labeled “Paired Samples Statistics”), you can see that the husbands' communication quality was lower after marriage (a mean of 104.2632) than before marriage (a mean of 116.3158). Don't worry that the t value figured in Table 7–6 was negative, whereas the t value in the SPSS output is positive. This happens because the difference score in Table 7–6 was figured as after minus before, but SPSS figured the difference scores as before minus after. Both ways of figuring the difference score are mathematically correct and the overall result is the same in each case.

Figure 7–12 Using SPSS to carry out a t test for dependent means for Olthoff's (1989) study of communication quality of 19 men who received ordinary premarital counseling.
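The paired analysis can likewise be run outside SPSS. The following Python sketch is an added illustration, not part of the original text; the before and after scores are hypothetical stand-ins (the Olthoff data in Table 7–6 are not reproduced here), and scipy.stats.ttest_rel plays the role of SPSS's Paired-Samples T Test. Note how reversing the order of the two arguments flips the sign of t but leaves the two-tailed significance level unchanged, exactly as described above.

# Minimal sketch: a paired-samples (dependent means) t test in Python.
# The data are hypothetical placeholders, not Olthoff's (1989) scores.
from scipy import stats

before = [116, 120, 108, 112, 125, 118, 110, 121, 115, 119]
after = [104, 111, 100, 107, 118, 109, 103, 112, 108, 110]

# SPSS's Paired-Samples T Test subtracts the first variable minus the second.
t_before_minus_after = stats.ttest_rel(before, after)
# Subtracting the other way flips the sign of t; the p value is identical.
t_after_minus_before = stats.ttest_rel(after, before)

print(t_before_minus_after)    # positive t
print(t_after_minus_before)    # same magnitude, negative t
print("degrees of freedom:", len(before) - 1)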

Chapter Notes

1. A sample's variance is slightly smaller than the population's because it is based on deviations from the sample's mean. A sample's mean is the optimal balance point for its scores. Thus, deviations of a sample's scores from its mean will be smaller than deviations from any other number. The mean of a sample generally is not exactly the same as the mean of the population it comes from. Thus, deviations of a sample's scores from its mean will generally be smaller than deviations of that sample's scores from the population mean.

2. Statisticians make a subtle distinction in this situation between the comparison distribution and the distribution of means. (We avoid this distinction to simplify your learning of what is already fairly difficult.) The general procedure of hypothesis testing, as we introduced it in Chapter 5, can be described as figuring a Z score for your sample's mean, where Z = (M - μ)/σ_M, and then comparing this Z score to a cutoff Z score from the normal curve table. We described this process as using the distribution of means as your comparison distribution. Statisticians would say that actually you are comparing the Z score you figured for your sample mean to a distribution of Z scores (which is simply a standard normal curve). Similarly, for a t test, statisticians think of the procedure as figuring a t score (like a Z score, but figured using an estimated standard deviation), where t = (M - μ)/S_M, and then comparing your computed t score to a cutoff t score from a t distribution table. Thus, according to the formal statistical logic, the comparison distribution is a distribution of t scores, not of means.

3. In line with the terminology we used in Chapter 5, the symbol μ in the formula should read μ_M, since it refers to the population mean of a distribution of means. In Chapter 5, we used the terminology μ_M to emphasize the conceptual difference between the mean of a population of individuals and the mean of a population of means. But μ and μ_M are always equal. Thus, to keep the terminology as straightforward as possible in this and subsequent chapters, we refer to the mean of a distribution of means as μ. (If we were even more formal, we might use μ₂ or even μ_M₂, since we are referring to the mean of Population 2.)

4. The steps of carrying out a t test for a single sample can be combined into a computational formula for t. For learning purposes in your class, you should use the steps as we have discussed them in this chapter. In a real research situation, the figuring is usually all done by computer (see this chapter's Using SPSS section). Should you ever have to do a t test for a single sample for an actual research study by hand (or just with a hand calculator), you may find the following formula useful:

t = (M - μ) / √[(ΣX² - ((ΣX)² / N)) / ((N - 1)(N))]

The t score for a t test for a single sample is the result of subtracting the population mean from the sample mean and dividing that difference by the square root of the following: the sum of the squared scores minus the result of taking the sum of all the scores, squaring this sum and dividing by the number of scores, then taking this whole difference and dividing it by the result of multiplying the number of scores minus 1 by the number of scores.


5. The steps of carrying out a t test for dependent means can be combined into a computational formula for t based on difference scores. For learning purposes in your class, you should use the steps as we have discussed them in this chapter. In a real research situation, the figuring is usually all done by computer (see the Using SPSS section at the end of this chapter). However, if you ever have to do a t test for dependent means for an actual research study by hand (or with just a hand calculator), you may find the formula useful (a numerical check of this formula and the one in Chapter Note 4 appears after these notes):

t = (ΣD / N) / √[(ΣD² - ((ΣD)² / N)) / ((N - 1)(N))]

The t score for a t test for dependent means is the result of dividing the sum of the difference scores by the number of difference scores and then dividing that result by the square root of the following: the sum of the squared difference scores minus the result of taking the sum of all the difference scores, squaring this sum and dividing by the number of difference scores, then taking this whole difference and dividing it by the result of multiplying the number of difference scores minus 1 by the number of difference scores.

6. Single sample t tests are quite rare in practice; so we didn't include a discussion of effect size or power for them in the main text. However, the effect size for a single sample t test can be figured using the same approach as in Chapter 6 (which is the same as the approach for figuring effect size for the t test for dependent means). It is the difference between the population means divided by the population standard deviation: d = (μ₁ - μ₂)/σ. When using this formula for a t test for a single sample, μ₁ is the predicted mean of Population 1 (the population from which you are studying a sample), μ₂ is the mean of the “known” population, and σ is the population standard deviation. The conventions for effect size for a t test for a single sample are the same as you learned for the situation we considered in Chapter 6: A small effect size is .20, a medium effect size is .50, and a large effect size is .80.

7. Cohen (1988, pp. 28–39) provides more detailed tables in terms of numbers of participants, levels of effect size, and significance levels. If you use his tables, note that the d referred to is actually based on a t test for independent means (the situation we consider in Chapter 8). To use these tables for a t test for dependent means, first multiply your effect size by 1.4. For example, if your effect size is .30, for purposes of using Cohen's tables, you would consider it to be .42 (that is, .30 × 1.4 = .42). The only other difference from our table is that Cohen describes the significance level by the letter a (for “alpha level”), with a subscript of either 1 or 2, referring to a one-tailed or two-tailed test. For example, a table that refers to “a₁ = .05” at the top means that this is the table for p < .05, one-tailed.

8. More detailed tables, giving the needed numbers of participants for levels of power other than 80% (and also for effect sizes other than .20, .50, and .80 and for other significance levels) are provided in Cohen (1988, pp. 54–55). However, see Chapter Note 7 about using Cohen's tables for a t test for dependent means.
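As a quick check on the computational formulas in Chapter Notes 4 and 5, the Python sketch below (an addition for illustration, not part of the original text) figures t both ways on made-up scores: once by the step-by-step route used in the chapter (unbiased variance estimate, then S_M, then t) and once by the computational formula. The two routes agree to rounding error.

# Sketch: verify that the computational formula for t matches the
# step-by-step figuring. The scores below are made up purely for illustration.
from math import sqrt

def t_stepwise(scores, pop_mean):
    """t via the chapter's steps: estimate S^2 with N - 1, then S_M, then t."""
    n = len(scores)
    m = sum(scores) / n
    s2 = sum((x - m) ** 2 for x in scores) / (n - 1)   # unbiased estimate
    s_m = sqrt(s2 / n)                                 # SD of distribution of means
    return (m - pop_mean) / s_m

def t_computational(scores, pop_mean):
    """Same t via the computational formula in Chapter Note 4."""
    n = len(scores)
    m = sum(scores) / n
    sum_x2 = sum(x ** 2 for x in scores)
    sum_x = sum(scores)
    return (m - pop_mean) / sqrt((sum_x2 - (sum_x ** 2) / n) / ((n - 1) * n))

scores = [7, 9, 6, 8, 10, 7, 9, 8]       # made-up single-sample scores
print(t_stepwise(scores, 7))
print(t_computational(scores, 7))

# For a t test for dependent means (Chapter Note 5), apply the same two
# functions to difference scores with a population mean of 0.
diffs = [2, -1, 3, 0, 4, 1]              # made-up difference scores
print(t_stepwise(diffs, 0))
print(t_computational(diffs, 0))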


Running Head: EMISSIONS OF AUTOMOBILES IN AMERICA


Emissions of Automobiles in America: A Controlled Investigation

Learning Team Z

Team Member Name Here

Team Member Name Here

Team Member Name Here

Etc.

Psych 315

October 8, 2059

Mr. Avery

Emissions of Automobiles in America: A Controlled Investigation

A growing body of research by leading industry experts shows an alarming rate of hydrocarbons in the atmosphere throughout the lower 48 contiguous United States and Canada. Harmful emissions from automobile tailpipes and engine compartments contribute to the rising rate of hydrocarbons released into the atmosphere each year. Preston (2005) indicates an average atmospheric increase of 3.8% in hydrocarbons each year since 1999. He estimates that a 50% reduction in automobile emissions will lower the overall level of harmful hydrocarbons in the air by 23%. Engineers from the Oppenheimer Group, a leading manufacturer of emissions control products, produced an emissions control device that can be retrofitted to any automobile exhaust system through the tailpipe, and modified to fit in the engine compartment of most cars and trucks. The Oppenheimer Group installed its emissions control device in the exhaust system of 36 randomly selected cars from across the country. Industry standards reveal that automobiles not equipped with the Oppenheimer Group's emissions control device emit an average of 100 pounds of pollutants per year into the atmosphere with a standard deviation of 15. The Oppenheimer Group predicts that cars equipped with their emissions control device will differ from cars not equipped with their emissions control device in the number of pounds of pollutants released into the air. H1: μ ≠ 100 and H0: μ = 100.

Method

Participants

Forty drivers and their respective cars were recruited to participate in the emissions control test from various cities across the country via a radio, newspaper, and billboard advertisement campaign which lasted 60 days prior to the beginning of the investigation. Four participants withdrew before the study began due to prior commitments that would interfere with their ability to complete all 30 days of the study. Drivers ranged in age from 18 to 59 years with a mean age of 34.12. Nineteen participants were female (52.78%); seventeen were male (47.22%). Twenty-nine participants were married (80.56%); the remaining seven were single or divorced (19.44%). The average education level for all participants was 11.63 years. Average income reported for all participants was $39,700 per year. For all 36 participants, the average number of miles driven per month was 1,140.18. Eight of the participants were unemployed at the beginning of the study (22.22%); all others had one or more jobs. A $25 inducement was offered to those participants who completed the investigation.

Apparatus

The Oppenheimer Group's emissions control device was retrofitted into the tailpipe of all 36 automobiles by Oppenheimer's technicians, who flew to each participant's city to install the device. The device was placed at both ends of the catalytic converter; that is, the exhaust fumes traveling from the engine first passed through the emissions control device before passing through the catalytic converter and then through a second emissions control device prior to being released through the tailpipe into the air.

Materials

All participants were given Truman's (1999) questionnaire of driving habits, which included questions about driving infractions and/or tickets over the past five years. Drivers who indicated 3 or more traffic tickets within the past year and/or 1 or more DUI or DWI infractions within the past 5 years were excluded from the study in the interest of maintaining a high degree of integrity within the investigation.

Procedure

Drivers were instructed to maintain their normal driving routine throughout the 30-day trial. Exhaust emissions were measured on 20 of the 30 days (Monday through Friday) by inserting an emissions meter into the tailpipe and also by taking emission readings from each automobile's engine compartment. Emissions per car were totaled at the end of the study, and yearly totals for each automobile were estimated from these monthly totals at the end of the 30-day trial.

Results

A Z test was the statistical procedure chosen to determine whether the hydrocarbons released into the air by cars fitted with the emissions control device differed significantly from the industry standard. The Oppenheimer Group tested their emissions control device using a two-tailed test at an alpha level of .05. The average amount of hydrocarbons released into the air by all cars not equipped with the emissions control device was 100.00 pounds per year with a standard deviation of 15.00.

All thirty-six automobiles in the test group released an average of 99.00 pounds of hydrocarbons into the air per year. The standard error of the mean was 2.50; that is, the population standard deviation of 15.00 divided by the square root of N, where N = 36, yielded a standard error of the mean of 2.50. The resulting Z test yielded an obtained value of -0.40 against two-tailed critical values of ±1.96. The obtained value, indicated below, was not significant. The mean pounds per year for the sample (M = 99.00, SD = 4.31) was slightly lower than the mean pounds per year for the population (M = 100.00, SD = 15.00).

Z = -0.40, p > .05.
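The Z-test arithmetic reported above can be reproduced directly from the summary values given in this Results section; the short Python sketch below is an added illustration, not part of the report.

# Sketch of the Z test reported above, using the summary values from the
# Results section (population mean 100, population SD 15, N = 36, M = 99).
from math import sqrt

pop_mean = 100.00
pop_sd = 15.00
n = 36
sample_mean = 99.00

standard_error = pop_sd / sqrt(n)                 # 15 / 6 = 2.50
z = (sample_mean - pop_mean) / standard_error     # -1 / 2.5 = -0.40

cutoff = 1.96                                     # two-tailed, .05 alpha
significant = abs(z) > cutoff
print(f"Z = {z:.2f}, significant: {significant}") # Z = -0.40, not significant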

Discussion

The present study attempted to demonstrate that cars fitted with the Oppenheimer Group's emissions control device release fewer pollutants into the air than automobiles without the device. After the 30-day trial, no significant difference was found between automobiles with the emissions control device installed and those without it.

Hydrocarbon pollutants will continue to be released into the air, increasing at no less than Preston's (2005) estimated rate of 3.8% per year, until effective measures are put in place to reduce the level of harmful emissions from automobile exhaust systems.

This study was limited in that only 36 participants and their automobiles took part. A broader sample of participants from a wider geographical area would be desirable. Also, global implications can scarcely be discussed, since the present trials were conducted entirely within the United States, where the level of atmospheric hydrocarbons is at least 3 times greater than in other industrialized nations (Jennings, 2004). Similar research in regions outside the U.S. may have achieved different results.

Future research should include more participants with a broader range of automobiles, including small, medium, and large privately owned trucks. Further, trials should be conducted in all of the various climatic regions of the country and at multiple elevations. Moreover, a longer study would produce more accurate data than estimating yearly emission totals from a single 30-day trial. Lastly, each emissions control device was installed new on each of the 36 automobiles in the study. Even if the results had been significant, the research team would want to know whether the emissions control device would continue to reduce the level of harmful pollutants released into the air, and for how long, before it needed to be replaced. Perhaps further research can answer these important questions.

References

Jennings, W. R. (2004). Filling our air with poisons: A case study of pollutants in our air. Hydrocarbon Quarterly, 34, 165–184.

Preston, H. G. (2005). Hydrocarbons and the air: The poisoning of America. New York: McGraw-Hill.

Truman, J. C. (1997). Driving habits: Integrity of the American driver. Journal of Automobile Engineering and Science, 15, 125–133.

