LEARNING OBJECTIVES
· Describe single-case experimental designs and discuss reasons to use this design.
· Describe the one-group posttest-only design.
· Describe the one-group pretest-posttest design and the associated threats to internal validity that may occur: history, maturation, testing, instrument decay, and regression toward the mean.
· Describe the nonequivalent control group design and nonequivalent control group pretest-posttest design, and discuss the advantages of having a control group.
· Distinguish between the interrupted time series design and control series design.
· Describe cross-sectional, longitudinal, and sequential research designs, including the advantages and disadvantages of each design.
· Define cohort effect.
IN THE CLASSIC EXPERIMENTAL DESIGN DESCRIBED IN CHAPTER 8, PARTICIPANTS ARE RANDOMLY ASSIGNED TO THE INDEPENDENT VARIABLE CONDITIONS, AND A DEPENDENT VARIABLE IS MEASURED. The responses on the dependent measure are then compared to determine whether the independent variable had an effect. Because all other variables are held constant, differences on the dependent variable must be due to the effect of the independent variable. This design has high internal validity—we are very confident that the independent variable caused the observed responses on the dependent variable. You will frequently encounter this experimental design when you explore research in the behavioral sciences. However, other research designs have been devised to address special research problems.
This chapter focuses on three types of special research situations. The first is the instance in which the effect of an independent variable must be inferred from an experiment with only one participant—single-case experimental designs. Second, we will describe pre-experimental and quasi-experimental designs that may be considered if it is not possible to use one of the true experimental designs described in Chapter 8. Third, we consider research designs for studying changes that occur with age.
SINGLE-CASE EXPERIMENTAL DESIGNS
Single-case experimental designs have traditionally been called single-subject designs; an equivalent term you may see is small N designs. Much of the early interest in single-case designs in psychology came from research on operant conditioning pioneered by B. F. Skinner (e.g., Skinner, 1953). Today, research using single-case designs is often seen in applied behavior analysis in which operant conditioning techniques are used in clinical, counseling, educational, medical, and other applied settings (Kazdin, 2011, 2013).
Single-case experiments were developed from a need to determine whether an experimental manipulation had an effect on a single research participant. In a single-case design, the subject's behavior is measured over time during a baseline control period. The manipulation is then introduced during a treatment period, and the subject's behavior continues to be observed. A change in the subject's behavior from baseline to treatment periods is evidence for the effectiveness of the manipulation. The problem, however, is that there could be many explanations for the change other than the experimental treatment (i.e., alternative explanations). For example, some other event may have coincided with the introduction of the treatment. The single-case designs described in the following sections address this problem.
Reversal Designs
As noted, the basic issue in single-case experiments is how to determine that the manipulation of the independent variable had an effect. One method is to demonstrate the reversibility of the manipulation. A simple reversal design takes the following form:
A (Baseline period) → B (Treatment period) → A (Baseline period)
This basic reversal design is called an ABA design; it requires observation of behavior during the baseline control (A) period, again during the treatment (B) period, and also during a second baseline (A) period after the experimental treatment has been removed. (Sometimes this is called a withdrawal design, in recognition of the fact that the treatment is removed or withdrawn.) For example, the effect of a reinforcement procedure on a child's academic performance could be assessed with an ABA design. The number of correct homework problems could be measured each day during the baseline. A reinforcement treatment procedure would then be introduced in which the child received stars for correct problems; the stars could be accumulated and exchanged for toys or candies. Later, this treatment would be discontinued during the second baseline (A) period. Hypothetical data from such an experiment are shown in Figure 11.1. The fact that behavior changed when the treatment was introduced and reversed when the treatment was withdrawn is evidence for its effectiveness.
Figure 11.1 depicts a treatment that had a relatively dramatic impact on behavior. Some treatments do produce an immediate change in behavior, but many other variables may require a longer time to show an impact.
The ABA design can be greatly improved by extending it to an ABAB design, in which the experimental treatment is introduced a second time, or even to an ABABAB design that allows the effect of the treatment to be tested a third time. This is done to address two problems with the ABA reversal design. First, a single reversal is not extremely powerful evidence for the effectiveness of the treatment. The observed reversal might have been due to a random fluctuation in the child's behavior; perhaps the treatment happened to coincide with some other event, such as the child's upcoming birthday, that caused the change (and the post-birthday reversal). These possibilities are much less likely if the treatment has been shown to have an effect two or more times; random or coincidental events are unlikely to be responsible for both reversals. The second problem is ethical. As Barlow, Nock, and Hersen (2009) point out, it does not seem right to end the design with the withdrawal of a treatment that may be very beneficial for the participant. Using an ABAB design provides the opportunity to observe a second reversal when the treatment is introduced again. The sequence ends with the treatment rather than the withdrawal of the treatment.
FIGURE 11.1
Hypothetical data from ABA reversal design
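The reversal logic can be sketched with a few lines of Python. The daily counts below are invented for illustration (they are not the data behind Figure 11.1); the point is simply that evidence for a treatment effect is a rise during B followed by a reversal in the second A:

```python
# Hypothetical daily counts of correct homework problems in an ABA design.
baseline_1 = [2, 3, 2, 4, 3]   # A: baseline control period
treatment  = [7, 8, 9, 8, 9]   # B: reinforcement (stars) period
baseline_2 = [3, 2, 3, 3, 2]   # A: treatment withdrawn

def mean(xs):
    return sum(xs) / len(xs)

# Evidence for effectiveness: behavior rises in B and reverses in the second A.
print(mean(baseline_1), mean(treatment), mean(baseline_2))
```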
The logic of the reversal design can also be applied to behaviors observed in a single setting. For example, Kazbour and Bailey (2010) examined the effectiveness of a procedure designed to increase use of designated drivers in a bar. The percentage of bar patrons either serving as or being with a designated driver was recorded over a baseline period of 2 weeks. A procedure to increase the use of designated drivers was then implemented during the treatment phase. Designated drivers received a $5 gas card, and the driver and passengers received free pizza on their way out of the bar. The pizza and gas incentive was discontinued during the final phase of the study. The percentage of bar patrons engaged in designated driver arrangements increased substantially during the treatment phase but returned to baseline levels when the incentive was withdrawn.
Multiple Baseline Designs
It may have occurred to you that a reversal of some behaviors may be impossible or unethical. For example, it would be unethical to reverse treatment that reduces dangerous or illegal behaviors, such as indecent exposure or alcoholism, even if the possibility exists that a second introduction of the treatment might be effective. Other treatments might produce a long-lasting change in behavior that is not reversible. In such cases, multiple measures over time can be made before and after the manipulation. If the manipulation is effective, a change in behavior will be immediately observed, and the change will continue to be reflected in further measures of the behavior. In a multiple baseline design, the effectiveness of the treatment is demonstrated when a behavior changes only after the manipulation is introduced. To demonstrate the effectiveness of the treatment, such a change must be observed under multiple circumstances to rule out the possibility that other events were responsible.
There are several variations of the multiple baseline design (Barlow et al., 2009). In the multiple baseline across subjects, the behavior of several subjects is measured over time; for each subject, though, the manipulation is introduced at a different point in time. Figure 11.2 shows data from a hypothetical smoking reduction experiment with three subjects. Note that introduction of the manipulation was followed by a change in behavior for each subject. However, because this change occurred across all individuals and the manipulation was introduced at a different time for each subject, we can rule out explanations based on chance, historical events, and so on.
FIGURE 11.2
Hypothetical data from multiple baseline design across three subjects (S1, S2, and S3)
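The across-subjects logic can also be sketched in code. The weekly cigarette counts and staggered start weeks below are hypothetical; what matters is that each subject's behavior changes only after that subject's own manipulation point:

```python
# Hypothetical weekly cigarette counts in a multiple baseline across subjects.
start = {"S1": 2, "S2": 4, "S3": 6}   # week the manipulation begins, per subject
counts = {
    "S1": [30, 29, 12, 11, 10, 11, 10, 11],
    "S2": [28, 29, 28, 27, 13, 12, 11, 12],
    "S3": [31, 30, 31, 29, 30, 31, 14, 13],
}

def mean(xs):
    return sum(xs) / len(xs)

# For each subject, smoking drops only after that subject's own start week,
# which rules out a single outside event as the explanation.
for s in counts:
    before = counts[s][:start[s]]
    after = counts[s][start[s]:]
    print(s, mean(before), mean(after))
```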
In a multiple baseline across behaviors, several different behaviors of a single subject are measured over time. At different times, the same manipulation is applied to each of the behaviors. For example, a reward system could be instituted to increase the socializing, grooming, and reading behaviors of a psychiatric patient. The reward system would be applied to each of these behaviors at different times. Demonstrating that each behavior increased when the reward system was applied would be evidence for the effectiveness of the manipulation.
The third variation is the multiple baseline across situations, in which the same behavior is measured in different settings, such as at home and at work. Again, a manipulation is introduced at a different time in each setting, with the expectation that a change in the behavior in each situation will occur only after the manipulation.
Replications in Single-Case Designs
The procedures for use with a single subject can, of course, be replicated with other subjects, greatly enhancing the generalizability of the results. Usually, reports of research that employs single-case experimental procedures do present the results from several subjects (and often in several settings). The tradition in single-case research has been to present the results from each subject individually rather than as group data with overall means. Sidman (1960), a leading spokesperson for this tradition, has pointed out that grouping the data from a number of subjects by using group means can sometimes give a misleading picture of individual responses to the manipulation. For example, the manipulation may be effective in changing the behavior of some subjects but not others. This was true in a study conducted by Ryan and Hemmes (2005) that investigated the impact of rewarding college students with course grade points for submitting homework. For half of the 10 chapters, students received points for submitting homework; however, there were no points given if they submitted homework for the other chapters (to control for chapter topic, some students had points for odd-numbered chapters only and others received points for the even-numbered chapters). Ryan and Hemmes found that on average students submitted more homework assignments and performed better on chapter-based quizzes that were directly associated with point rewards. However, some individual participants performed about the same regardless of condition. Because the emphasis of the study was on the individual subject, this pattern of results was quickly revealed.
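Sidman's point can be illustrated numerically. The quiz scores below are invented: the group mean shows a clear gain, yet two of the four hypothetical subjects barely respond at all, a pattern that only individual-level data reveal:

```python
# Hypothetical per-subject quiz scores without and with point rewards.
no_points = {"S1": 60, "S2": 70, "S3": 65, "S4": 80}
points    = {"S1": 75, "S2": 85, "S3": 66, "S4": 80}  # S3 and S4 barely change

def group_mean(d):
    return sum(d.values()) / len(d)

group_gain = group_mean(points) - group_mean(no_points)  # looks like a clear effect
individual_gains = {s: points[s] - no_points[s] for s in points}
print(group_gain, individual_gains)  # the mean hides the non-responders
```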
Single-case designs are useful for studying many research problems and should be considered a powerful alternative to more traditional research designs. They can be especially valuable for someone who is applying some change technique in a natural environment—for example, a teacher who is trying a new technique in the classroom. In addition, complex statistical analyses are not required for single-case designs.
QUASI-EXPERIMENTAL DESIGNS
Quasi-experimental designs address the need to study the effect of an independent variable in settings in which the control features of true experimental designs cannot be achieved. Thus, a quasi-experimental design allows us to examine the impact of an independent variable on a dependent variable, but causal inference is much more difficult because quasi-experiments lack important features of true experiments such as random assignment to conditions. In this chapter, we will examine several quasi-experimental designs that might be used in situations in which a true experiment is not possible. This is most likely to occur in applied settings when an independent variable is manipulated in a natural setting such as a school, business, hospital, or an entire city or state.
There are many types of quasi-experimental designs—see Campbell (1968, 1969), Campbell and Stanley (1966), Cook and Campbell (1979), Shadish, Cook, and Campbell (2002). Only six designs will be described. As you read about each design, compare the design features and problems with the randomized true experimental designs described in Chapter 8. We start out with the simplest and most problematic of the designs. In fact, the first three designs we describe are sometimes called “pre-experimental” to distinguish them from other quasi-experimental designs. This is because of the problems associated with these designs. Nevertheless, all may be used in different circumstances, and it is important to recognize the internal validity issues raised by each design.
One-Group Posttest-Only Design
Suppose you want to investigate whether sitting close to a stranger will cause the stranger to move away. You might try sitting next to a number of strangers and measure the number of seconds that elapse before they leave. Your design would look like this:
Participants → Treatment (sit next to stranger) → Posttest (seconds until the stranger leaves)
Now suppose that the average amount of time before the people leave is 9.6 seconds. Unfortunately, this finding is not interpretable. You do not know whether they would have stayed longer if you had not sat down or whether they would have stayed for 9.6 seconds anyway. It is even possible that they would have left sooner if you had not sat down—perhaps they liked you!
This one-group posttest-only design—called a “one-shot case study” by Campbell and Stanley (1966)—lacks a crucial element of a true experiment: a control or comparison group. There must be some sort of comparison condition to enable you to interpret your results. Without a comparison group, this design cannot yield an internally valid experiment that allows us to draw causal inferences about the effect of an independent variable on a dependent variable.
You might wonder whether this design is ever used. In fact, you may see this type of design used as evidence for the effectiveness of a program. For example, employees in a company might participate in a 4-hour information session on emergency procedures. At the conclusion of the program, they complete a knowledge test on which their average score is 90%. This result is then used to conclude that the program is successfully educating employees. Such studies lack internal validity—our ability to conclude that the independent variable had an effect on the dependent variable. With this design, we do not even know whether the score on the dependent variable would have been equal, lower, or even higher without the program. Results such as these are sometimes accepted because we have an implicit idea of how a control group would perform; unfortunately, an implicit idea is no substitute for actual comparison data.
One-Group Pretest-Posttest Design
One way to obtain a comparison is to measure participants before the manipulation (a pretest) and again afterward (a posttest). An index of change from the pretest to the posttest could then be computed. Although this one-group pretest-posttest design sounds fine, there are some major problems with it.
To illustrate, suppose you wanted to test the hypothesis that a relaxation training program will result in a reduction in cigarette smoking. Using the one-group pretest-posttest design, you would select a group of people who smoke, administer a measure of smoking, have them go through relaxation training, and then re-administer the smoking measure. Your design would look like this:
Participants → Pretest (smoking measure) → Treatment (relaxation training) → Posttest (smoking measure)
If you did find a reduction in smoking, you could not assume that the result was due to the relaxation training program. This design has failed to take into account several alternative explanations. These alternative explanations are threats to the internal validity of studies using this design and include history, maturation, testing, instrument decay, and regression toward the mean.
History History refers to any event that occurs between the first and second measurements but is not part of the manipulation. Any such event is confounded with the manipulation. For example, suppose that a famous person dies of lung cancer during the time between the first and second measures. This event, and not the relaxation training, could be responsible for a reduction in smoking. Admittedly, the celebrity death example is dramatic and perhaps unlikely. However, history effects can be caused by virtually any confounding event that occurs at the same time as the experimental manipulation.
Maturation People change over time. In a brief period they become bored, fatigued, perhaps wiser, and certainly hungrier; over a longer period, children become more coordinated and analytical. Any changes that occur systematically over time are called maturation effects. Maturation could be a problem in the smoking reduction example if people generally become more concerned about health as they get older. Any such time-related factor might result in a change from the pretest to the posttest. If this happens, you might mistakenly attribute the change to the treatment rather than to maturation.
Testing Testing becomes a problem if simply taking the pretest changes the participant's behavior—the problem of testing effects. For example, the smoking measure might require people to keep a diary in which they note every cigarette smoked during the day. Simply keeping track of smoking might be sufficient to cause a reduction in the number of cigarettes a person smokes. Thus, the reduction found on the posttest could be the result of taking the pretest rather than of the program itself. In other contexts, taking a pretest may sensitize people to the purpose of the experiment or make them more adept at a skill being tested. Again, the experiment would not have internal validity.
Instrument decay Sometimes, the basic characteristics of the measuring instrument change over time; this is called instrument decay. Consider sources of instrument decay when human observers are used to measure behavior: Over time, an observer may gain skill, become fatigued, or change the standards on which observations are based. In our example on smoking, participants might be highly motivated to record all cigarettes smoked during the pretest when the task is new and interesting, but by the time the posttest is given they may be tired of the task and sometimes forget to record a cigarette. Such instrument decay would lead to an apparent reduction in cigarette smoking.
Regression toward the mean Sometimes called statistical regression, regression toward the mean is likely to occur whenever participants are selected because they score extremely high or low on some variable. When they are tested again, their scores tend to change in the direction of the mean. Extremely high scores are likely to become lower (closer to the mean), and extremely low scores are likely to become higher (again, closer to the mean).
Regression toward the mean would be a problem in the smoking experiment if participants were selected because they were initially found to be extremely heavy smokers. By choosing people for the program who scored highest on the pretest, the researcher may have selected many participants who were, for whatever reason, smoking much more than usual at the particular time the measure was administered. Those people who were smoking much more than usual will likely be smoking less when their smoking is measured again. If we then compare the overall amount of smoking before and after the program, it will appear that people are smoking less. The alternative explanation is that the smoking reduction is due to statistical regression rather than the effect of the program.
Regression toward the mean will occur whenever you gather a set of extreme scores taken at one time and compare them with scores taken at another point in time. The problem is actually rooted in the reliability of the measure. Recall from Chapter 5 that any given measure reflects a true score plus measurement error. If there is perfect reliability, the two measures will be the same (if nothing happens to lower or raise the scores). If the measure of smoking is perfectly reliable, a person who reports smoking 20 cigarettes today will report smoking 20 cigarettes 2 weeks from now. However, if the two measures are not perfectly reliable and there is measurement error, most scores will be close to the true score but some will be higher and some will be lower. Thus, one smoker with a true score of 20 cigarettes per day might sometimes smoke 5 and sometimes 35; however, most of the time, the number is closer to 20 than the extremes. Another smoker might have a true score of 35 but on occasion smokes as few as 20 and as many as 50; again, most of the time, the number is closer to the true score than to the extremes. Now suppose that you select two people who said they smoked 35 cigarettes on the previous day, and that both of these people are included in the group—you picked the first person on a very unusual day and the second person on a very ordinary day. When you measure these people 2 weeks later, the first person is probably going to report smoking close to 20 cigarettes and the second person close to 35. If you average the two, it will appear that there is an overall reduction in smoking.
What if the measure were perfectly reliable? In this case, the person with a true score of 20 cigarettes would always report this amount and therefore would not be included in the heavy smoker (35+) group at all. Only people with true scores of 35 or more would be in the group, and any reduction in smoking would be due to the treatment program. The point here is that regression toward the mean is a problem if there is measurement error.
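This argument can be checked with a small simulation. The sketch below assumes a hypothetical population of smokers with stable true rates and an unreliable measure (the distribution parameters are invented). A "heavy smoker" group is selected from one noisy pretest and then simply remeasured, with no treatment at all:

```python
import random

random.seed(1)

# Observed score = true score + measurement error (no real change over time).
true_scores = [random.gauss(25, 5) for _ in range(5000)]  # stable smoking rates

def observe(true_score):
    return true_score + random.gauss(0, 8)  # unreliable measure

pretest = [observe(t) for t in true_scores]
selected = [i for i, s in enumerate(pretest) if s >= 35]  # extreme-score group
posttest = [observe(true_scores[i]) for i in selected]

pre_mean = sum(pretest[i] for i in selected) / len(selected)
post_mean = sum(posttest) / len(posttest)
# With no treatment, the posttest mean drifts back toward the population mean.
print(pre_mean, post_mean)
```

Because many of the selected scores were high partly by chance (error), the group's posttest mean falls even though nothing happened, which is exactly the alternative explanation that threatens the one-group pretest-posttest design.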
Statistical regression occurs when we try to explain events in the “real world” as well. Sports columnists often refer to the hex that awaits an athlete who appears on the cover of Sports Illustrated. The performances of a number of athletes have dropped considerably after they were the subjects of Sports Illustrated cover stories. Although these cover stories might cause the lower performance (perhaps the notoriety results in nervousness and reduced concentration), statistical regression is also a likely explanation. An athlete is selected for the cover of the magazine because he or she is performing at an exceptionally high level; the principle of regression toward the mean states that very high performance is likely to deteriorate. We would know this for sure if Sports Illustrated also did cover stories on athletes who were in a slump and this became a good omen for them!
All these problems can be eliminated by the use of an appropriate control group. A group that does not receive the experimental treatment provides an adequate control for the effects of history, statistical regression, and so on. For example, outside historical events would have the same effect on both the experimental and the control groups. If the experimental group differs from the control group on the dependent measure administered after the manipulation, the difference between the two groups can be attributed to the effect of the experimental manipulation.
Given these problems, is the one-group pretest-posttest design ever used? This design may in fact be used in many applied settings. Recall the example of the evaluation of a program to teach emergency procedures to employees. With a one-group pretest-posttest design, the knowledge test would be given before and after the training session. The ability to observe a change from the pretest to the posttest does represent an improvement over the posttest-only design, even with the threats to internal validity that we identified. In addition, the ability to use data from this design can be enhanced if the study is replicated at other times with other participants. However, formation of a control group is always the best way to strengthen this design.
In any design with a control group, the participants in the experimental condition and the control condition must be equivalent. If participants in the two groups differ before the manipulation, they will probably differ after the manipulation as well. The next design illustrates this problem.
Nonequivalent Control Group Design
The nonequivalent control group design employs a separate control group, but the participants in the two conditions—the experimental group and the control group—are not equivalent. In other words, the two groups are not the result of random assignment. The differences become a confounding variable that provides an alternative explanation for the results. This problem, called selection differences or selection bias, usually occurs when participants who form the two groups in the experiment are chosen from existing natural groups. If the relaxation training program is studied with the nonequivalent control group design, the design will look like this:
Experimental group: Treatment (relaxation training) → Posttest (smoking measure)
Control group: No treatment → Posttest (smoking measure)
The participants in the first group are given the smoking frequency measure after completing the relaxation training. The people in the second group do not participate in any program. In this design, the researcher does not have any control over which participants are in each group. Suppose, for example, that the study is conducted in a division of a large company. All of the employees who smoke are identified and recruited to participate in the training program. The people who volunteer for the program are in the experimental group, and the people in the control group are simply the smokers who did not sign up for the training. The problem of selection differences arises because smokers who choose to participate may differ in some important way from those who do not. For instance, they may already be light smokers compared with the others and more confident that a program can help them. If so, any difference between the groups on the smoking measure would reflect preexisting differences rather than the effect of the relaxation training. Such a preexisting difference is what we have previously described as a confound (see Chapter 4).
It is important to note that the problem of selection differences arises in this design even when the researcher apparently has successfully manipulated the independent variable using two similar groups. For example, a researcher might have all smokers in the engineering division of a company participate in the relaxation training program and smokers who work in the marketing division serve as a control group. The problem here, of course, is that the smokers in the two divisions may have differed in smoking patterns prior to the relaxation program.
Nonequivalent Control Group Pretest-Posttest Design
The nonequivalent control group posttest-only design can be greatly improved if a pretest is given. When this is done, we have a nonequivalent control group pretest-posttest design, one of the most useful quasi-experimental designs. It can be diagrammed as follows:
Experimental group: Pretest → Treatment → Posttest
Control group: Pretest → No treatment → Posttest
This design is similar to the pretest-posttest design described in Chapter 8. However, this is not a true experimental design because assignment to groups is not random; the two groups may not be equivalent. We have the advantage, however, of knowing the pretest scores. Thus, we can see whether the groups were the same on the pretest. Even if the groups are not equivalent, we can look at changes in scores from the pretest to the posttest. If the independent variable has an effect, the experimental group should show a greater change than the control group (see Kenny, 1979).
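The comparison of changes can be made concrete with hypothetical group means (the numbers below are invented, not taken from any study in this chapter). The treatment effect estimate is the difference between the two groups' pretest-to-posttest changes, not the raw posttest difference:

```python
# Hypothetical group means, e.g., drinks (or cigarettes) per week.
treat_pre, treat_post = 18.0, 11.0
control_pre, control_post = 17.5, 16.0

treat_change = treat_post - treat_pre        # -7.0
control_change = control_post - control_pre  # -1.5

# Comparing changes (a "difference in differences") partially controls for
# preexisting group differences that a posttest-only comparison would miss.
effect = treat_change - control_change
print(effect)  # -5.5: the treatment group declined 5.5 units more
```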
An evaluation of National Alcohol Screening Day (NASD) provides an example of the use of a nonequivalent control group pretest-posttest design (Aseltine, Schilling, James, Murray, & Jacobs, 2008). NASD is a community-based program that provides free access to alcohol screening, a private meeting with a health professional to review the results, educational materials, and referral information if necessary. For the evaluation, NASD attendees at five community locations completed a baseline (pretest) measure of their recent alcohol consumption. This measure was administered as a posttest 3 months later. A control group was formed 1 week following NASD at the same locations using displays that invited people to take part in a health survey. These individuals completed the same pretest measure and were contacted in 3 months for the posttest. The data analysis focused on participants identified as at-risk drinkers; the NASD participants showed a significant decrease in alcohol consumption from pretest to posttest when compared with similar individuals in the control group.
Propensity Score Matching of Nonequivalent Treatment and Control Groups
The nonequivalent control group designs lack random assignment to conditions and so the groups may in fact differ in important ways. For example, people who decide to attend an alcohol screening event may differ from those who are interested in a health screening. Perhaps the people at the health screening are in fact healthier than the alcohol screening participants.
One approach to making the groups equivalent on a variable such as health is to match participants in the conditions on a measure of health (this is similar to matched pairs designs, covered in Chapter 8). The health measure can be administered to everyone in the treatment condition and all individuals who are included in the control condition. Now, each person in the treatment condition would be matched with a control individual who possesses an identical or highly similar health score. Once this has been done, the analysis of the dependent measure can take place. This procedure is most effective when the measure used for the matching is highly reliable and the individuals in the two conditions are known to be very similar. Nonetheless, it is still possible that the two groups are different on other variables that were not measured.
Advances in statistical methods have made it possible to simultaneously match individuals on multiple variables. Instead of matching on just one variable such as health, the researcher can obtain measures of other variables thought to be important when comparing the groups. The scores on these variables are combined to produce what is called a propensity score (the statistical procedure is beyond the scope of the book). Individuals in the treatment and control groups can then be matched on propensity scores—this process is called propensity score matching (Guo & Fraser, 2010; Shadish, Cook, & Campbell, 2002).
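The matching step itself can be sketched as nearest-neighbor matching without replacement. In practice the propensity scores come from a statistical model (commonly logistic regression on the measured variables); the scores and participant labels below are hypothetical:

```python
# Hypothetical propensity scores (probability-of-treatment estimates in [0, 1]).
treated = {"T1": 0.81, "T2": 0.42, "T3": 0.65}
controls = {"C1": 0.10, "C2": 0.44, "C3": 0.79, "C4": 0.63, "C5": 0.30}

matches = {}
available = dict(controls)
for t_id, t_score in sorted(treated.items()):
    # Pick the still-unmatched control whose score is closest to this
    # treated individual's score.
    c_id = min(available, key=lambda c: abs(available[c] - t_score))
    matches[t_id] = c_id
    del available[c_id]  # match without replacement
print(matches)
```

After matching, the dependent measure is compared across the matched pairs; the unmatched controls (here C1 and C5) are set aside.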
Interrupted Time Series Design and Control Series Design
Campbell (1969) discusses at length the evaluation of one specific legal reform: the 1955 crackdown on speeding in Connecticut. Although seemingly an event in the distant past, the example is still a good illustration of an important methodological issue. The crackdown was instituted after a record high number of traffic fatalities occurred in 1955. The easiest way to evaluate this reform is to compare the number of traffic fatalities in 1955 (before the crackdown) with the number of fatalities in 1956 (after the crackdown). Indeed, the number of traffic deaths fell from 324 in 1955 to 284 in 1956. This single comparison is really a one-group pretest-posttest design with all of that design's problems of internal validity; there are many other reasons that traffic deaths might have declined. One alternative is to use an interrupted time series design that would examine the traffic fatality rates over an extended period of time, both before and after the reform was instituted. Figure 11.3 shows this information for the years 1951–1959. Campbell (1969) argues that the drop from 1955 to 1956 does not look particularly impressive, given the great fluctuations in previous years, but there is a steady downward trend in fatalities after the crackdown. Even here, however, Campbell sees a problem in interpretation. The drop could be due to statistical regression: Because 1955 was a record high year, the probability is that there would have been a drop anyway. Still, the data for the years extending before and after the crackdown allow for a less ambiguous interpretation than would be possible with data for only 1955 and 1956.
FIGURE 11.3
Connecticut traffic fatalities, 1951–1959
One way to improve the interrupted time series design is to find some kind of control group—a control series design. In the Connecticut speeding crackdown, this was possible because other states had not instituted the reform. Figure 11.4 shows the same data on traffic fatalities in Connecticut plus the fatality figures of four comparable states during the same years. The fact that the fatality rates in the control states remained relatively constant while those in Connecticut consistently declined led Campbell to conclude that the crackdown did indeed have some effect.
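The control series logic amounts to a difference-in-differences comparison: the change in the reform state minus the change in the comparable states. A minimal sketch with invented numbers (not the actual fatality figures):

```python
# Illustrative pre/post fatality figures (invented, not the actual data)
connecticut = {"pre": 300.0, "post": 265.0}  # reform state
comparison = {"pre": 295.0, "post": 290.0}   # average of control states

# Change in each series across the intervention
change_ct = connecticut["post"] - connecticut["pre"]   # -35.0
change_cmp = comparison["post"] - comparison["pre"]    # -5.0

# The control series estimate: how much more the reform state changed
# than comparable states that did not institute the crackdown
effect = change_ct - change_cmp
print(f"estimated effect of the crackdown: {effect:+.1f} fatalities")
```

If the control states had declined just as much as Connecticut, the estimated effect would be near zero, which is exactly the ambiguity the control series design is meant to resolve.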
FIGURE 11.4
Control series design comparing Connecticut traffic fatality rate (solid color line) with the fatality rate of four comparable states (dotted black line)
DEVELOPMENTAL RESEARCH DESIGNS
Developmental psychologists often study the ways that individuals change as a function of age. A researcher might test a theory concerning changes in reasoning ability as children grow older, the age at which self-awareness develops in young children, or the global values people have as they move from adolescence through old age. In all cases, the major variable is age. Developmental researchers face an interesting choice in designing their studies because there are two general methods for studying individuals of different ages: the cross-sectional method and the longitudinal method. You will see that the cross-sectional method shares similarities with the independent groups design whereas the longitudinal method is similar to the repeated measures design. We will also examine a hybrid approach called the sequential method. The three approaches are illustrated in Figure 11.5.
Cross-Sectional Method
In a study using the cross-sectional method, persons of different ages are studied at only one point in time. Suppose you are interested in examining how the ability to learn a computer application changes as people grow older. Using the cross-sectional method, you might study people who are currently 20, 30, 40, and 50 years of age. The participants in your study would be given the same computer learning task, and you would compare the groups on their performance.
FIGURE 11.5
Three designs for developmental research
In a recent study by Tymula, Belmaker, Ruderman, and Levy (2013), subjects in four age groups (12–17; 21–25; 30–50; 65–90) completed the same financial decision-making task. The task involved choosing among options with varying levels of risk and reward that led to an expected financial outcome for each subject. Individuals in the oldest age group made the poorest financial decisions, with more inconsistent choices and lower financial outcomes.
Longitudinal Method
In the longitudinal method, the same group of people is observed at different points in time as they grow older. Perhaps the most famous longitudinal study is the Terman Life Cycle Study that was begun by Stanford psychologist Lewis Terman in 1921. Terman studied 1,528 California schoolchildren who had intelligence test scores of at least 135. The participants, who called themselves “Termites,” were initially measured on numerous aspects of their cognitive and social development in 1921 and 1922. Terman and his colleagues continued studying the Termites during their childhood and adolescence and throughout their adult lives (cf. Terman, 1925; Terman & Oden, 1947, 1959).
Terman's successors at Stanford continue to track the Termites until each one dies. The study has provided a rich description of the lives of highly intelligent individuals and disconfirmed many negative stereotypes of high intelligence—for example, the Termites were very well adjusted both socially and emotionally. The data have now been archived for use by other researchers such as Friedman and Martin (2011), who used the Terman data to study whether personality and other factors are related to health and longevity. To complete their investigations, Friedman and Martin obtained death certificates of Terman participants to have precise data on both how long they lived and the causes of death. One strong pattern that emerged was that the personality dimension of “conscientiousness” (being self-disciplined, organized) that was measured in childhood was related to longevity. Of interest is that changes in personality qualities also affected longevity. Participants who had become less conscientious as adults had a reduction in longevity; those who became more conscientious as adults experienced longer lives. Another interesting finding concerned interacting with pets. Questions about animals were asked when participants were in their sixties; contrary to common beliefs, having or playing with pets was not related to longevity.
A unique longitudinal study on aging and Alzheimer's disease called the Nun Study illustrates a different approach (Snowden, 1997). In 1991, all members of a particular religious order born prior to 1917 were asked to participate by providing access to their archived records as well as various annual medical and psychological measures taken over the course of the study. The sample consisted of 678 women with a mean age of 83. One fascinating finding from this study was based on autobiographies that all sisters wrote in 1930 (Danner, Snowden, & Friesen, 2001). The researchers devised a coding system to measure positive emotional content in the autobiographies. Greater positive emotions were strongly related to actual survival rate during the course of the study. Other longitudinal studies may study individuals over only a few years. For example, a 9-year study of U.S. children found a variety of impacts—positive and negative—of early non-maternal child care (NICHD Early Child Care Research Network, 2005).
Comparison of Longitudinal and Cross-Sectional Methods
The cross-sectional method is much more common than the longitudinal method primarily because it is less expensive and immediately yields results. Note that, with a longitudinal design, it would take 30 years to study the same group of individuals from age 20 to 50, but with a cross-sectional design, comparisons of different age groups can be obtained relatively quickly.
There are, however, some disadvantages to cross-sectional designs. Most important, the researcher must infer that differences among age groups are due to the developmental variable of age. The developmental change is not observed directly among the same group of people, but rather is based on comparisons among different cohorts of individuals. You can think of a cohort as a group of people born at about the same time, exposed to the same events in a society, and influenced by the same demographic trends such as divorce rates and family size. If you think about the hairstyles of people you know who are in their 30s, 40s, 50s, and 60s, you will immediately recognize the importance of cohort effects! More crucially, differences among cohorts reflect different economic and political conditions in society, different music and arts, different educational systems, and different child-rearing practices. In a cross-sectional study, a difference among groups of different ages may reflect developmental age changes; however, the differences may result from cohort effects (Schaie, 1986).
To illustrate this issue, let's return to our hypothetical study on learning to use computers. Suppose you found that age is associated with a decrease in ability such that the people in the 50-year-old group score lower on the learning measure than the 40-year-olds, and so on. Should you conclude that the ability to learn to use a computer application decreases with age? That may be an accurate conclusion; alternatively, the differences could be due to a cohort effect: The older people had less experience with computers while growing up. The key point here is that the cross-sectional method confounds age and cohort effects. (Review the discussion of confounding and internal validity at the beginning of Chapter 8.) Finally, you should note that cohort effects are most likely to be a problem when the researcher is examining age effects across a wide range of ages (e.g., adolescents through older adults).
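The confound can be demonstrated with a short simulation in which age has no effect at all, yet a cross-sectional comparison still shows apparent decline because earlier-born cohorts had less computer exposure. All numbers here are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(7)

# Suppose true learning ability is IDENTICAL at every age, but earlier-born
# cohorts had less computer exposure growing up (a pure cohort effect).
ages = [20, 30, 40, 50]
cohort_exposure = {20: 10.0, 30: 7.0, 40: 4.0, 50: 1.0}  # hypothetical values

means = {}
for age in ages:
    # Observed score = common baseline + cohort exposure + noise; no age term
    scores = 50 + cohort_exposure[age] + rng.normal(0, 5, size=30)
    means[age] = scores.mean()

# A cross-sectional comparison shows "decline with age" even though age
# itself has no effect: age and cohort are confounded in this design.
for age in ages:
    print(f"age {age}: mean score {means[age]:.1f}")
```

Because each age group is also a different cohort, nothing in the observed means distinguishes the two explanations; only longitudinal or sequential data can begin to separate them.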
The only way to conclusively study changes that occur as people grow older is to use a longitudinal design. Also, longitudinal research is the best way to study how scores on a variable at one age are related to another variable at a later age. For example, researchers at the National Children's Study (http://www.nationalchildrensstudy.gov) began collecting data in 2009 at 105 study locations across the United States. In each of those study sites, participants (new parents) are being recruited to participate in the study that will run from the birth of their child until the child is 21 years of age. The goal of the study is to better understand the interactions of the environment and genetics and their effects on child health and well-being. The alternative in this case would be to study samples of children of various ages and ask them or their parents about the earlier home environment; this retrospective approach has its own problems when one considers the difficulty of remembering events in the distant past.
Thus, the longitudinal approach, despite being expensive and difficult, has definite advantages. However, there is one major problem: Over the course of a longitudinal study, people may move, die, or lose interest in the study. Researchers who conduct longitudinal studies become adept at convincing people to continue, often travel anywhere to collect more data, and compare test scores of people who drop out with those who stay to provide better analyses of their results. In sum, a researcher should not embark on a longitudinal study without considerable resources and a great deal of patience and energy!
Sequential Method
A compromise between the longitudinal and cross-sectional methods is to use the sequential method. This method, along with the cross-sectional and longitudinal methods, is illustrated in Figure 11.5. In the figure, the goal of the study is, at a minimum, to compare 55- and 65-year-olds. The first phase of the sequential method begins with the cross-sectional method; for example, you could study groups of 55- and 65-year-olds. These individuals are then studied using the longitudinal method, with each individual tested at least one more time.
Orth, Trzesniewski, and Robins (2010) studied the development of self-esteem over time using just such a sequential method. Using data from the Americans’ Changing Lives study, Orth and his colleagues identified six different age cohorts (25–34, 35–44, 45–54, 55–64, 65–74, 75+) and examined their self-esteem ratings from 1986, 1989, 1994, and 2002. Thus, they were interested in changes in self-esteem for participants at various ages, over time. Their findings provide an interesting picture of how self-esteem changes over time: They found that self-esteem gradually increases from age 25 to around age 60 and then declines in later years. If this were conducted as a full longitudinal study, it would require 100 years to complete!
Clearly, this method takes fewer years and less effort to complete than a longitudinal study, and the researcher reaps immediate rewards because data on the different age groups are available in the first year of the study. On the other hand, the participants are not followed over the entire time span as they would be in a full longitudinal investigation; that is, no one in the Orth study was followed from age 25 to 100.
We have now described most of the major approaches to designing research. In the next two chapters, we consider methods of analyzing research data.
ILLUSTRATIVE ARTICLE: A QUASI-EXPERIMENT
Sexual violence on college and university campuses has been and continues to be a widespread problem. Programs designed to prevent sexual violence on campuses have shown mixed results: Some evidence suggests that they can be effective, but other evidence shows that they are not.
Banyard, Moynihan, and Crossman (2009) implemented a prevention program that utilized specific subgroups of campus communities to “raise awareness about the problem of sexual violence and build skill that individuals can use to end it.” They exposed dormitory resident advisors to a program called “Bringing in the Bystander” and assessed change in attitudes as well as a set of six outcome measures (e.g., willingness to help).
First, acquire and read the article:
Banyard, V. L., Moynihan, M. M., & Crossman, M. T. (2009). Reducing sexual violence on campus: The role of student leaders as empowered bystanders. Journal of College Student Development, 50, 446–457. doi:10.1353/csd.0.0083
Then, after reading the article, consider the following:
1. This study was a quasi-experiment. What is the specific design?
2. What are the potential weaknesses of the design?
3. The discussion of this article begins with this statement: “The results of this study are promising.” Do you agree or disagree? Support your position.
4. How would you determine if there is a need to address the problem of sexual violence on your campus? If you discover that there is a need, would the program described here be appropriate? Why or why not?
Study Terms
Baseline (p. 221)
Cohort (p. 236)
Cohort effects (p. 236)
Control series design (p. 233)
Cross-sectional method (p. 234)
History effects (p. 227)
Instrument decay (p. 228)
Interrupted time series design (p. 232)
Longitudinal method (p. 235)
Maturation effects (p. 227)
Multiple baseline design (p. 223)
Nonequivalent control group design (p. 230)
Nonequivalent control group pretest-posttest design (p. 231)
One-group posttest-only design (p. 226)
One-group pretest-posttest design (p. 227)
Propensity score matching (p. 232)
Quasi-experimental design (p. 225)
Regression toward the mean (Statistical regression) (p. 228)
Reversal design (p. 222)
Selection differences (p. 230)
Sequential method (p. 237)
Single-case experimental design (p. 221)
Testing effects (p. 227)
Review Questions
1. What is a reversal design? Why is an ABAB design superior to an ABA design?
2. What is meant by baseline in a single-case design?
3. What is a multiple baseline design? Why is it used? Distinguish between multiple baseline designs across subjects, across behaviors, and across situations.
4. Why might a researcher use a quasi-experimental design rather than a true experimental design?
5. Why does having a control group eliminate the problems associated with the one-group pretest-posttest design?
6. Describe the threats to internal validity discussed in the text: history, maturation, testing, instrument decay, regression toward the mean, and selection differences.
7. Describe the nonequivalent control group pretest-posttest design. Why is this a quasi-experimental design rather than a true experiment?
8. Describe the interrupted time series and the control series designs. What are the strengths of the control series design as compared with the interrupted time series design?
9. Distinguish between longitudinal, cross-sectional, and sequential methods.
10. What is a cohort effect?
Activities
1. Your dog gets lonely while you are at work and consequently engages in destructive activities such as pulling down curtains or strewing wastebasket contents all over the floor. You decide that playing a radio while you are gone might help. How might you determine whether this “treatment” is effective?
2. Your best friend frequently suffers from severe headaches. You have noticed that your friend consumes a great deal of diet cola, and so you consider the hypothesis that the artificial sweetener in the cola is responsible for the headaches. Devise a way to test your hypothesis using a single-case design. What do you expect to find if your hypothesis is correct? If you obtain the expected results, what do you conclude about the effect of the artificial sweetener on headaches?
3. Dr. Smith learned that one sorority on campus had purchased several MacBooks and another sorority had purchased several Windows-based computers. Dr. Smith was interested in whether the type of computer affects the quality of students’ papers, so he went to each of the sorority houses to collect samples of papers from the members. Two graduate students in the English department then rated the quality of the papers. Dr. Smith found that the quality of the papers was higher in one sorority than in the other. What are the independent and dependent variables in this study? Identify the type of design that Dr. Smith used. What variables are confounded with the independent variable? Design a true experiment that would address Dr. Smith's original question.
4. Gilovich (1991) described an incident that he read about during a visit to Israel. A very large number of deaths had occurred during a brief time period in one region of the country. A group of rabbis attributed the deaths to a recent change in religious practice that allowed women to attend funerals. Women were immediately forbidden to attend funerals in that region, and the number of deaths subsequently decreased. How would you explain this phenomenon?
5. The captain of each precinct of a metropolitan police department selected two officers to participate in a program designed to reduce prejudice by increasing sensitivity to racial and ethnic group differences and community issues. The training program took place every Friday morning for 3 months. At the first and last meetings, the officers completed a measure of prejudice. To assess the effectiveness of the program, the average prejudice score at the first meeting was compared with the average score at the last meeting; it was found that the average score was in fact lower following the training program. What type of design is this? What specific problems arise if you try to conclude that the training program was responsible for the reduction in prejudice?
6. Many elementary schools have implemented a daily “sustained silent reading” period during which students, faculty, and staff spend 15–20 minutes silently reading a book of their choice. Advocates of this policy claim that the activity encourages pleasure reading outside the required silent reading time. Design a nonequivalent control group pretest-posttest quasi-experiment to test this claim. Include a well-reasoned dependent measure as well.
7. For the preceding situation, discuss the advantages and disadvantages of using a quasi-experimental design in contrast to conducting a true experiment.
8. Dr. Cardenas studied political attitudes among different groups of 20-, 40-, and 60-year-olds. Political attitudes were found to be most conservative in the age-60 group and least conservative in the age-20 group.
a. What type of method was used in this study?
b. Can you conclude that people become more politically conservative as they get older? Why or why not?
c. Propose alternative ways of studying this topic.
Chapter 11
True Experimental Designs: The Power of Between-Groups and Within-Subjects Designs
Many research questions in counseling relate to the very basic question of whether what a counselor is doing is effective: Is counseling really helping people with some aspect of their lives? Whether we work in independent practice, in counseling centers, in school settings, or in academic jobs, we are a profession that helps people in a variety of ways, and we want to know if what we are doing is really having a positive impact. Whether it is a psychoeducational group for teens with eating disorders, a high school classroom intervention aimed at bringing awareness to the discrimination faced by minority group members, or a specific treatment we are using with an individual client, the most basic question we want to answer is: Is the intervention effective? Some of the most rigorous designs we have at our disposal to address such questions are what are called between-groups and within-subjects designs, both of which are often referred to as true experimental designs.
In Chapters 5 and 6 we identify the goal of research as isolating relationships among constructs of interest and operationalizing constructs into the independent and dependent variables while simultaneously eliminating sources of bias, contamination, and error. Perhaps the most essential rules of research are expressed by Kerlinger's MAXMINCON principle, in which researchers try to maximize the systematic variance of the variables under study, minimize error variance, and control extraneous variables. Extraneous variables and error variance can mask or obscure the effects of the independent variable on the dependent variable.
In this chapter we discuss two designs—between-groups and within-subjects—that are often referred to as true experimental designs because of their emphasis on experimental control, minimizing extraneous variables, and internal validity. These emphases are achieved through random assignment and manipulation of the independent variable, two core elements that define true experimental designs. Even though students sometimes feel intimidated by true experimental designs because of the heavy meaning the words convey, the label is more ominous than the designs themselves, which are actually quite straightforward. True experimental designs are commonly categorized into between-groups designs and within-subjects designs.
The between-groups design often adheres to the MAXMINCON principle. Differences between treatments can be maximized by making the treatment (independent variable) stronger or even exaggerated. Thus, researchers will often examine the effects of extreme treatments, such as five counselor disclosures in 50 minutes, or three counselor influence attempts in 15 minutes. Moreover, the between-groups design can be arranged to control extraneous variables and minimize error variance through random assignment of treatment condition and manipulating the independent variable, while controlling for other factors.
The essential feature of between-groups design is the comparison of variables across two or more groups under tightly controlled experimental conditions. In early counseling research, a common comparison group was some type of control group, a group that did not receive one of the active treatments in the study. Over time, differences between or among experimental treatments have been compared. Making adequate comparisons across groups requires that the groups not differ in important ways before the experiment. Thus, initial differences between groups in terms of individual difference variables, demographics, and situational variables must be minimized prior to experimental manipulation to reduce threats to internal validity. Because of the emphasis on comparison and equivalent groups, assignment of participants to groups is a critical consideration in between-groups design. In fact, one of the major identifying features of between-groups design is the random assignment of participants to different treatment conditions. In short, the between-groups design is a powerful investigative tool, and often the most strongly favored design (Kazdin, 2003; Kerlinger, 1986; Shadish et al., 2002).
The hallmark of the within-subjects design is that it attempts to minimize error variance due to individual variation by having each participant serve as his or her own control because all participants are exposed to all of the treatment conditions. This design is another type of true experimental design because of the random assignment of treatment order. The random assignment that occurs in the within-subjects design is the assignment of the order in which the treatments are delivered, as opposed to the different treatment conditions in between-groups designs.
For example, perhaps a researcher has two videos that may be useful for increasing participants' empathy related to issues of poverty and social injustice. In a within-subjects design, all participants would view both videos, but not at the same time. For example, one group of participants would receive intervention Tx1 (video 1) before Tx2 (video 2), whereas the other group would receive the opposite sequence, Tx2 before Tx1. In contrast, using a between-groups design, one group of participants would receive intervention Tx1 (video 1) only, whereas the other group of participants would receive only intervention Tx2 (video 2). In both designs, each participant is assigned to either a sequence (within-subjects) or an intervention condition (between-groups) randomly, as a matter of chance. Hence, the comparison in a within-subjects design is between different time periods in which separate intervention conditions are in effect, whereas the comparison in a between-groups design is between the groups that received different intervention conditions.
PARTICIPANT ASSIGNMENT
Because a hallmark of true experiments is the random assignment of participants, we discuss this issue before introducing the different types of between-groups and within-subjects designs. It is critical that the people in the groups being compared do not differ in important ways before the experiment or measurement begins. The intended outcome of assigning people to groups is to eliminate systematic differences across groups before the experiment, so that if any changes are detected in one or more of the groups after the experiment, the change can be attributed to the independent variable. Participants therefore need to be assigned to groups in an unbiased fashion, free from the influence of extraneous variables.
The most effective way of ensuring comparable groups is to assign participants to groups randomly, or in such a way that each participant has the same probability of being assigned to each group. Such assignment tends to equalize both known and unknown sources of participant variation across groups, so that extraneous variables will not bias the study.
A number of procedures exist for randomly assigning participants to groups. This can be done by generating random numbers in Microsoft Excel (using the RANDBETWEEN function) or with online programs (e.g., www.random.org, www.psychicscience.org/randomlist.aspx) to determine the order of assigning participants to groups. Note that simple random assignment will most likely result in unequal numbers of participants in the groups. For statistical purposes it is better to have equal numbers across groups. To deal with this issue, Kazdin (2003) suggested assigning participants in blocks based on the number of groups. For example, in a study with three groups, within each block of three participants the experimenter would randomly assign one participant to each of the three groups. This procedure is particularly useful when participants begin the experiment periodically, or at different times.
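Kazdin's blocking suggestion can be sketched in a few lines of Python. The function below is our own illustration, not code from a published source: within each block of k participants, each of the k conditions is used exactly once, in random order.

```python
import random

def block_randomize(participants, conditions, seed=None):
    """Assign participants to conditions in blocks so group sizes stay equal.

    Within each block of len(conditions) participants, every condition
    appears exactly once, in a random order.
    """
    rng = random.Random(seed)
    assignment = {}
    for start in range(0, len(participants), len(conditions)):
        block = participants[start:start + len(conditions)]
        order = rng.sample(conditions, len(conditions))  # shuffle conditions
        for person, condition in zip(block, order):
            assignment[person] = condition
    return assignment

# 12 participants, 3 groups: each group ends up with exactly 4 participants
groups = block_randomize([f"P{i}" for i in range(1, 13)], ["A", "B", "C"], seed=42)
print(groups)
```

Because assignment happens one block at a time, the procedure also works when participants enter the study sequentially rather than all at once.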
In counseling research, a researcher will often have a sample identified and available at the beginning of an investigation. For example, a researcher might have 20 people who are available and have expressed an interest in some kind of treatment group, such as assertiveness training or group therapy. In this situation, the investigator knows the total number of participants, their names, and their general characteristics such as age and sex. Underwood (1966) has labeled this type of participant pool as captive. In this situation, random assignment is easily accomplished at one time via a list of random numbers, or even by drawing names from a hat. Quite often in counseling research, however, we do not have the entire sample at the outset, but rather must engage in sequential assignment (Underwood, 1966). For example, imagine that a researcher is investigating the effect of two types of precounseling information on client expectations of therapy. Most counseling centers have only a few clients beginning therapy each day, which would necessitate randomly assigning clients to the two types of precounseling information each day. In this case, clients can be assigned to either treatment as they enter counseling via some sort of randomization process.
BETWEEN-GROUPS DESIGNS
In this section we first discuss the strengths and weaknesses of two specific between-groups designs. Because the central focus of between-groups designs is to compare between different treatment groups and/or with control groups (please note that having a control group is not required), the second section explicitly discusses issues pertaining to control groups. The third section discusses more complex designs that contain two or more independent variables, which are called factorial designs. In the last section we discuss related issues of matching and dependent samples designs.
Two Common Experimental Between-Groups Designs
We now discuss the two most commonly identified experimental between-groups designs. To do so, we use the following symbols to represent various processes in the research design. Ob indicates an “observation,” or point where data are collected on the dependent variable; Tx indicates the exposure of a group to an experimental variable, often a treatment intervention of some kind. The purpose of Ob, in essence, is to measure the effects of Tx. The first subscript following Ob or Tx indicates the sequence of occurrence: Ob1 is the first observation, Ob2 is the second, and so on. The second subscript indicates the assigned condition: Ob1a is the first observation of the treatment A group, and Ob2ctrl is the second observation of the control group.
After describing each of these two designs, we then discuss advantages and disadvantages of each, referring particularly to validity issues (see Table 11.1). It is important to note that these two designs are most easily conceptualized by using one independent variable. For example, the independent variable may represent two treatment conditions, or contain two levels—treatment and no treatment (that is, control group).
TABLE 11.1 Pros and Cons of Different Between-Groups Designs

| Design | Pros | Cons |
| --- | --- | --- |
| Overall between-groups designs | Internal validity due to controlling for various threats (e.g., history, maturation, instrumentation, testing effects) | Limited generalizability to other populations or nonexperimental settings; withholding the study treatment from the control or waitlist group |
| Posttest only | Cost-efficient due to no pretest; eliminates pretest sensitization | Limited information on group equivalence prior to treatment |
| Pre-post design | Pre-scores can be controlled for; pre-scores can be used to select/remove cases and to describe participants; pre-post scores allow examination of individual performance | Sensitization due to pretest |
FIGURE 11.1 Posttest-Only Control Group Design
Posttest-Only Control Group Design Notationally, the posttest-only control group design is conceptualized as shown in Figure 11.1. In its most basic form, this design involves the random assignment of participants to two groups; one of the groups receives exposure to a treatment while the other group serves as a control group and thus receives no treatment. Both groups receive a posttest, but neither group receives a pretest. The basic purpose of the design is to test the effect of Tx, the independent variable, on observations of the dependent variable, vis-à-vis Ob2a and Ob2ctrl.
Strengths Although the posttest-only control group design is the most basic form of between-groups design, it controls for most of the threats to internal validity. The strengths in internal validity described here also apply to the other between-groups designs. For example, history would have affected each group equally because Ob2a and Ob2ctrl occurred at the same time. Likewise, maturation, instrumentation, testing effects, and regression are controlled in that they are expected to be equally manifested in both the experimental and control groups. For example, if extreme scores were used, the control group would be expected to regress as much as the experimental group.
In many ways the posttest-only control group design is the prototypical experimental design and most closely reflects the characteristics needed to attribute a causal relationship from the independent variable to the dependent variable (Shadish et al., 2002). The difference between Ob2a and Ob2ctrl reflects the degree to which treated participants are different from untreated participants at the end of the treatment period. Of course, the observed difference needs to be statistically significant (have statistical conclusion validity) to justify a claim that the treatment indeed is effective (also see Chapter 7 for more discussion on validity issues).
In spite of the simplicity of the posttest-only design, there are some concerns. The primary concern is that because the dependent variable is examined only at the end of treatment, statements about actual change cannot be made; put another way, there is no evidence that the treatment group improved relative to its level of functioning prior to treatment. In our view, however, the most important comparison is the level of functioning of treated individuals (at Ob2a) versus their level of functioning had they not been treated (Ob2ctrl), not the change from before treatment to after treatment, because such change may be due to other factors (e.g., depressed individuals generally becoming less depressed because depression is a cyclical disorder). The logic of experimentation does not require that pretreatment levels of functioning be assessed; thus, a pretest is not used.
One of the strengths of the posttest-only control group design, therefore, is that a pretest is unnecessary. Practically speaking, repeated testing is sometimes bothersome to participants and expensive for the researcher in terms of time and effort. Furthermore, the absence of a pretest removes the need to link each participant's pretest and posttest scores, and hence it may be easier to have participants respond anonymously, thereby protecting the confidentiality of responses. Another advantage of the posttest-only control group design is that it eliminates pretest sensitization (which is discussed more fully as a disadvantage of the pretest-posttest control group design).
Weaknesses The absence of a pretest in this design limits the information available to researchers, such as being able to check if group equivalence prior to the treatment was in fact established through random assignment or to know participants' level of functioning prior to treatment. More detailed arguments for using pretests are presented in the discussion of the pretest-posttest control group design.
Although the posttest-only control group design is generally considered an internally valid experimental design, like the other between-groups designs, there are issues pertaining to external validity, namely the interaction of selection and treatment (Shadish et al., 2002). From an internal validity perspective, selection of participants is not a threat because participants are randomly assigned across groups. However, from an external validity perspective, the generalizability of the results of the study to another population is unknown, like any other experimental design. For example, it is possible that a treatment (e.g., a career-planning workshop) is effective but only for the particular sample (e.g., returning adults who have a broader set of work experiences). Another threat to external validity pertains to reactivity to the experimental situation. That is, participants may react differently, perhaps in biased or socially desirable ways, because they are in an experiment, which again threatens the generalizability of the findings. Because counseling is an applied field, we are especially concerned with external validity, and these and other threats to external validity merit serious consideration (also refer to Chapter 7 and Chapter 8 for more discussion on external validity issues).
Finally, a practical issue pertaining to between-groups design is that of timing. To adequately control for history effects, the investigator must conduct the experimental and control sessions simultaneously. Sometimes this requirement places excessive time and energy constraints on the experimenter. Nonetheless, history effects may not be controlled for if the experimenter conducts the two sessions, say, one month apart. The greater the time differential between group administrations, the greater the likelihood of confounding history effects. For a more detailed discussion and an example of history effects, please refer to Chapter 7.
An Example A study aimed at understanding the effects of music therapy on self and experienced stigma in psychiatric patients used a posttest-only design. Silverman (2013) was interested in examining whether music therapy could reduce psychiatric patients' level of stigma. Silverman used a between-groups posttest-only design to compare the effectiveness of three conditions in reducing stigma: (a) music therapy, (b) education, and (c) a wait-list control group. Participants (N = 83) were randomly assigned through clusters to one of the three conditions. Results indicated that participants in the music therapy group reported lower posttest scores on measures of disclosure (self-stigma), discrimination (experienced stigma), and overall stigma (composite score) compared with the waitlist control group. However, the education group's stigma scores did not significantly differ from those of the music therapy group or the waitlist control group. In sum, music therapy was found to be an effective approach for reducing psychiatric patients' stigma in the forms of self-disclosure and perceived discrimination from others. The study is an example of using a posttest-only design to examine the effects of an intervention.
Pretest-Posttest Control Group Design Notationally, the pretest-posttest control group design is conceptualized as shown in Figure 11.2. This design involves the random assignment of participants to two (or more) groups, with one group receiving treatment while the other group receives no treatment and thus serves as a control group. Both groups receive a pretest and a posttest. The purpose of the design is to test the effect of the independent variable, Tx, which is reflected in the differences on the dependent variable, specifically between Ob2a and Ob2ctrl.
Strengths This design controls for most of the threats to internal validity discussed by Shadish et al. (2002), and in that way it is similar to the posttest-only control group design. The unique strength of this design pertains to the use of the pretest, which allows the researcher to perform various analyses that may be helpful in making valid inferences about the effects of the independent variable.
One of the most important reasons for giving a pretest is that pretest scores can be used to reduce variability in the dependent variable, thereby creating a more powerful statistical test. In essence, such a strategy attempts to minimize error variance in line with the MAXMINCON principle. Much of the variance in any dependent variable is due to individual differences among the participants. Knowledge of the pretest level of functioning allows the researcher to use statistical methods, such as the analysis of covariance, to remove the variance found in the pretest from the variance in the posttests. Such procedures can reduce drastically the number of participants needed to achieve a desired level of statistical power (Porter & Raudenbush, 1987). Of course, the pretest in this case need not be the same measure as the posttest; however, it must be correlated with the posttest to allow a covariance analysis.
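The variance-reducing role of the pretest can be sketched with a small stdlib-only simulation. All numbers below are invented for illustration; the regression step mimics what the analysis of covariance does when the pretest (or any correlated covariate) is partialed out of the posttest scores.

```python
import random
import statistics

random.seed(1)

# Hypothetical data: posttest scores depend strongly on pretest scores
# (stable individual differences) plus random noise.
n = 200
pretest = [random.gauss(50, 10) for _ in range(n)]
posttest = [0.8 * x + random.gauss(0, 5) for x in pretest]

# Raw variability in the posttest, ignoring the pretest
raw_sd = statistics.stdev(posttest)

# Regress posttest on pretest and examine the residual variability,
# mimicking the variance reduction achieved by analysis of covariance
mean_x = statistics.fmean(pretest)
mean_y = statistics.fmean(posttest)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(pretest, posttest))
sxx = sum((x - mean_x) ** 2 for x in pretest)
slope = sxy / sxx
residuals = [y - (mean_y + slope * (x - mean_x))
             for x, y in zip(pretest, posttest)]
resid_sd = statistics.stdev(residuals)

print(f"SD of posttest scores: {raw_sd:.1f}")
print(f"SD after removing pretest variance: {resid_sd:.1f}")
```

Because the error term shrinks, the same treatment effect can be detected with far fewer participants, which is the point Porter and Raudenbush (1987) make about statistical power.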
FIGURE 11.2 Pretest-Posttest Control Group Design
Another important reason to give a pretest is that it can be used to help eliminate post hoc threats to internal validity. In this regard, one strategic use of pretests is to compare participants who terminate or drop out to those participants who remain. If more participants terminate from the treatment group than from the control group, then differential attrition is a particularly troublesome threat; however, if pretest scores indicate that those participants who terminated did not differ significantly from those who remained, then concern about differential attrition is reduced.
Pretests can also be used to select or deselect participants. For example, in a study on depression, the researchers may wish to select only those participants who are in the moderately depressed range. If participants report very few symptoms of depression, they may not exhibit any change on the dependent variable even though the treatment would have been effective with moderately or even severely depressed participants.
Pretest scores can also be used to describe the participants of a study. For example, it would be important to describe the level of anxiety of undergraduate participants in a study of test anxiety to determine whether the participants were representative of clients who were really affected by test anxiety.
Finally, the pretest-posttest scores allow the researcher to examine the individual performance of specific participants. Kazdin (2003) suggested that in this way, researchers might examine participants who benefited the most versus those who benefited the least from the treatment intervention. Identifying participants in such a fashion, combined with any relevant anecdotal information, may suggest hypotheses for future research. In short, the pretest provides additional information to researchers, and perhaps some clues for future research directions.
Two often-stated advantages of pretests are controversial. The first pertains to comparing posttest scores to pretest scores to determine the degree to which the treatment was beneficial. The problem with making inferences from pretest measures to posttest measures is that there are too many rival hypotheses to infer the degree to which treatment was effective by comparing pretest scores to posttest scores. For this reason, “gain scores” (differences from pretest to posttest) are typically not recommended for statistical analyses. Instead, it is better for researchers to restrict themselves to making inferences only about differences at the posttest, because fewer threats are involved. Parenthetically, statisticians typically recommend using the pretest as a covariate in analyzing the posttest scores (see Huck & McLean, 1975). These techniques adjust or reduce error variance across individuals.
There is a second controversial use of pretest scores. Recall that random assignment was a means of distributing individual differences randomly across the two groups to remove any systematic bias due to selection or assignment. But the groups will not be exactly the same in all aspects; random error, if you will, will often result in some differences between groups. Often there is a tendency to check whether random assignment succeeded—that is, to see whether the groups were indeed comparable. To do so, a researcher might examine as a preliminary analysis the pretest scores to ascertain whether there are significant differences between the groups before treatment. However appealing this process is, there are some complex issues that make these comparisons far from straightforward (Wampold & Drew, 1990).
First, how big a difference is necessary to decide whether random assignment failed? For small sample sizes, statistically significant differences between two groups' pretest scores are unlikely to be obtained, but in large samples, it is much more likely that relatively small differences between samples will be statistically significant. Second, pretest scores represent only possible differences on the particular characteristics measured; what about differences in age, gender, intelligence, education, and a host of other variables that were not examined? Third, if a very large number of factors are compared before treatment, by chance some differences will be found.
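The third point, that chance alone will produce some "significant" baseline differences when many factors are compared, can be illustrated with a short stdlib-only simulation. All values are hypothetical, and the |t| > 2 cutoff is only a rough stand-in for the .05 significance level at this sample size.

```python
import random
import statistics

random.seed(42)

def two_sample_t(a, b):
    """Welch-style t statistic for two independent samples."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    se = (va / na + vb / nb) ** 0.5
    return (statistics.fmean(a) - statistics.fmean(b)) / se

# Two groups formed by random assignment, then compared on many
# baseline variables that have no true group differences at all.
n_per_group, n_variables = 30, 400
flagged = 0
for _ in range(n_variables):
    group_a = [random.gauss(0, 1) for _ in range(n_per_group)]
    group_b = [random.gauss(0, 1) for _ in range(n_per_group)]
    if abs(two_sample_t(group_a, group_b)) > 2:
        flagged += 1

proportion = flagged / n_variables
print(f"'Significant' baseline differences by chance alone: {proportion:.1%}")
```

Roughly 5% of the comparisons come out "significant" even though every group difference is pure random error, which is why finding a few pretest differences does not by itself mean random assignment failed.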
In short, it is important to note that however appealing it is to check the effectiveness of random assignment in eliminating differences between the groups before the independent variable was introduced, there are a number of complexities that make it difficult to conclude with absolute certainty that the two groups are “equal.” Suffice it to say that there are some problems and controversy with this procedure.
Parenthetically, if one wants to ensure that a nuisance factor is evenly distributed across the groups, another alternative is to use a matching procedure. For instance, to equate groups based on intelligence, participants in the treatment and control groups could be matched on intelligence. This process, and its advantages and disadvantages, are discussed in the section on dependent samples designs.
Weaknesses It is ironic that the unique strength of the pretest-posttest control group design, namely the pretest, is also the main weakness. It is often assumed that pretesting will not sensitize participants to a particular treatment. In the two-group pretest-posttest control group design, the effect of repeatedly administering a test to the treatment group (Ob1a to Ob2a) is the same for the control group (Ob1ctrl to Ob2ctrl). Therefore, the effect of repeated testing is not a threat to internal validity.
However, the pretest may have a sensitizing effect pertaining to external validity, and thus to generalizing the results from the study to other samples. It is unclear whether any changes found at posttest might be due to the groups being sensitized by the pretest; that is, it is unclear whether the same effect of Tx on Ob2a would be found without the sensitizing effect of Ob1a. For example, a pretest questionnaire on attitudes about rape might cue participants not only to reflect on this topic, but also to process information differently in the ensuing treatment, say, an awareness-enhancing workshop about date rape. Although the treatment may or may not have an effect by itself, the interactive effects of the pretest may result in substantially greater changes at posttest. A real problem could result if practitioners implemented the workshop without the pretest, and the treatment thus had a much weaker effect than they expected. When researchers use the pretest-posttest control group design, they need to be cautious in generalizing the results of the study, and they must discuss this sensitization issue explicitly.
An Example Shechtman and Pastor (2005) used a between-groups pretest-posttest design to assess the effectiveness of two types of group treatment offered to 200 elementary school children in a center for children with learning disabilities in Israel. More specifically, the authors were interested in whether students would evidence better academic and psychosocial outcomes in cognitive-behavioral treatment groups or humanistic therapy groups as compared to individual academic assistance alone. Their results suggested that both forms of group therapy resulted in greater academic (reading and math), psychological adjustment, and social adjustment gains than individual academic assistance alone. Moreover, they found that the humanistic therapy groups resulted in better outcomes than the cognitive-behavioral therapy groups. The gains were found on all measures from pretest to posttest, and most of the differences were also found at a three-month follow-up. The authors interpreted their findings as suggesting that addressing children's general concerns and emotions in and of themselves, without focusing on their academic failure, may be constructive (Shechtman & Pastor, 2005).
To examine other examples of the pretest-posttest control group design, see Cheng, Tsui, and Lam (2015), who employed this design to assess the effectiveness of a gratitude intervention for reducing stress of health care practitioners in Hong Kong. Participants were assigned to one of three conditions: (a) writing work-related gratitude diaries, (b) writing work-related hassle diaries, or (c) a control group. The results suggested that those in the gratitude intervention showed a significantly larger decrease in stress and depressive symptoms over time compared to those in the other two groups. Likewise, Lemberger and Clemens (2012) examined the effects of a small-group counseling intervention (versus a control group) with inner-city African American elementary school children; the results indicated that participants who received the intervention showed significantly higher metacognitive skills and feelings of connectedness to school.
RESEARCH IN ACTION 11.1
Identify a research topic of your interest. Propose how you would examine your research question utilizing the two between-groups designs: (a) posttest-only control group design and (b) pretest-posttest control group design. Discuss the conceptual and practical advantages and disadvantages of using each design. Conclude with the most appropriate design for your study.
Use of Control Groups
To this point, the designs discussed have included a control group. The purpose of this arrangement is to compare treated participants with nontreated participants. In this way, the effect of the treatment vis-à-vis no treatment can be determined. However, there are some cases where the use of control groups is not warranted. For instance, it is unethical to withhold treatment from participants who are in need of treatment and who have a condition for which a treatment is known to work. For example, it would be unethical to have a control group of suicidal clients in a study of a new crisis-intervention technique. Furthermore, the research question may not refer to the absence of a treatment. For a study comparing the relative effectiveness of two different types of treatment approaches, a control group is not needed; inclusion of a control group, however, would answer the additional question of whether either of these two treatments is more effective than no treatment.
Although some research questions do not call for control groups, the logic of much research dictates the use of a control group. Control group refers generically to a class of groups that do not receive any interventions that are designed to address the outcome in the study. For example, if an intervention designed for alcohol abuse is to receive personalized feedback about one's level of alcohol use (percentile compared to a norm group), a control group can either receive no additional information or unrelated information (e.g., percentile of height, weight). It should be realized that even though this implies that the researchers do not provide any intervention, participants in such groups could seek alternative interventions or information elsewhere (e.g., internet).
Often it is practically and ethically difficult to have a group that does not receive any treatment. However, a viable control condition can be obtained by using a waiting-list control group. Typically, participants are randomly assigned to either the treatment condition or the waiting-list control group; at the end of the treatment phase and the posttests, the treatment is made available to the participants in the waiting-list control group. In either the pretest-posttest control group design or the posttest-only control group design, the treatment given to the waiting-list participants can be analyzed to test the reliability of the results (Kazdin, 2003) or to rule out threats to the validity of quasi-experimental designs (Shadish et al., 2002). One disadvantage of the waiting-list control group is that long-term follow-up of the control participants is lost (because they have by then received treatment). Another disadvantage is that although ultimately the participants in the waiting-list control group receive treatment, the treatment is withheld for some time. (For more details on this topic, see Chapter 20.)
Another type of control group is the placebo control group. Participants in a placebo control group are led to believe that they are receiving a viable treatment, even though the services rendered them are nonspecific and supposedly ineffective. For example, in a group counseling outcome study, participants in the placebo condition may be in a discussion group with no active group counseling. The rationale for including a placebo control is that it enables the researcher to separate the specific effects of a treatment from effects due to client expectations, attention, and other nonspecific aspects. Some investigators contend that the major effects of the counseling process are due to nonspecific factors (Wampold, 2001); inclusion of a placebo control group allows determination of whether the effects of a treatment are greater than those obtained under conditions that appear to clients to be viable but do not contain the major aspects of the active treatments.
A final type of control group is the matched control group. Participants in a matched control group are paired in some way with participants in the treatment group. The primary purpose of this type of design is to reduce variance due to a matching factor (which is discussed later in this chapter under Dependent Samples Designs).
Factorial Designs
Factorial designs are used when two or more independent variables are employed simultaneously to study their independent and interactive effects on a dependent variable. Factorial designs extend the designs described earlier by adding independent variables. With factorial designs it is more useful to visualize the design by diagramming the levels of the independent variables into cells. For example, let's say a researcher were interested in testing the effectiveness of two interventions designed to enhance cross-cultural awareness in high school students. Two intervention groups (Tx1 and Tx2) and a no-treatment control are formed. In addition, the researcher is interested in examining whether male and female students might respond differently to the interventions. The study would be considered a 2 (Gender: male and female) × 3 (Tx1, Tx2, and control) posttest-only design containing six cells. To illustrate, please see Figure 11.3, which includes posttest cultural awareness scores in each cell.
In this hypothetical example, one intervention (Tx1 cultural awareness score = 20) was found to be more effective than the other (Tx2 cultural awareness score = 10) for boys, as reflected in the higher posttest cultural awareness scores. The pattern was reversed for girls: those who received the Tx2 intervention (Tx2 cultural awareness score = 15) had higher posttest cultural awareness scores than those who received the Tx1 intervention (Tx1 cultural awareness score = 8). Alternatively, if there were two independent variables that each had three levels or conditions, this would be considered a 3 × 3 design and have nine cells.
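The crossover in this hypothetical example can be expressed numerically as a "difference of differences" between the simple effects of treatment within each gender. The cell means below are the illustrative values from the example, not real data.

```python
# Posttest cultural awareness cell means from the hypothetical
# 2 (gender) x 3 (condition) example; numbers are illustrative only.
means = {
    ("boys", "Tx1"): 20, ("boys", "Tx2"): 10,
    ("girls", "Tx1"): 8, ("girls", "Tx2"): 15,
}

# Simple effect of Tx1 vs. Tx2 within each gender
boys_diff = means[("boys", "Tx1")] - means[("boys", "Tx2")]     # 20 - 10 = +10
girls_diff = means[("girls", "Tx1")] - means[("girls", "Tx2")]  # 8 - 15 = -7

# A nonzero "difference of differences" signals an interaction:
# the treatment effect depends on the level of the gender factor
interaction = boys_diff - girls_diff
print(f"Tx1 - Tx2 for boys:  {boys_diff:+d}")
print(f"Tx1 - Tx2 for girls: {girls_diff:+d}")
print(f"Difference of differences: {interaction:+d}")
```

The opposite signs of the two simple effects show the crossover pattern: averaging over gender would hide the fact that each intervention works well for one group and poorly for the other.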
Strengths The unique strength or advantage of the factorial design is that it tests the effects of two or more independent variables, and of their interaction with each other on the dependent variable. The factorial design provides more information than the single-independent-variable designs because it simultaneously tests two or more independent variables.
In our hypothetical study, the researcher could examine whether two interventions have an effect on the dependent variables, as well as whether a participant's gender has an effect on the dependent variables. The effect of an independent variable on a dependent variable is often referred to as a main effect. Because of the efficiency of such simultaneous tests in factorial designs, it is not uncommon for researchers to test two, three, or even four independent variables in one study. Usually these added independent variables are person (personality) variables.
FIGURE 11.3 Factorial Design Example
More important, factorial designs allow the investigator to examine the interaction of the independent variables. An interaction means that the effect of one independent variable depends on the levels of one or more other independent variables. In our hypothetical example, the researcher might find that a treatment does not have the same effect on all participants; instead, one intervention was more effective for boys whereas the other was more effective for girls. Thus, factorial designs not only result in more information because they examine the effects of more than one independent variable, but also result in more complex information about the combined effects of the independent variables.
Another advantage of factorial designs is that if the second independent variable added to the design is related to the dependent variable as expected, then the unexplained variance in the dependent variable is reduced. Reducing unexplained variance is again related to Kerlinger's MAXMINCON, which in essence increases the power of the statistical test for analyzing factorial design (for example, in the analysis of variance, the denominator of the F ratio is reduced).
In a way, our fictitious example indicates how the factorial design can provide important qualifications about relationships between variables. The factorial design provides some answers about the conditions under which a treatment may operate, such as the gender of participants, the type of intervention, the age of clients, or the problem-solving style of clients. Whereas the single-variable study most often investigates whether a variable (most notably some treatment) has any effect, the factorial design examines more complex questions that approximate the complexity of real life.
Weaknesses Although at first one might think the more information, the better, it is important to realize the costs involved as more variables are added to designs. With the addition of variables, the results of the study become more complex and sometimes too complex. In a 2 × 2 design, the researcher would typically examine the main effects of two levels of variable A, the main effects of two levels of variable B, and the interaction of A with B. In a 2 (A) × 2 (B) × 2 (C) design, the investigator would typically examine the main effects of A, B, and C; the two-way interactions of A with B, A with C, and B with C; and the three-way interaction among A, B, and C. Complex interactions between three, four, or more independent variables typically are difficult to interpret, and the results of the study may be unclear. Researchers should not add independent variables just to have more than one independent variable; instead, independent variables need to be carefully selected on theoretical and empirical grounds after thought is given to the research questions of interest.
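The growth in complexity can be counted directly: a design with k factors yields 2^k - 1 testable main effects and interactions. A short sketch (the factor names are arbitrary placeholders):

```python
from itertools import combinations

def factorial_effects(factors):
    """List every main effect and interaction tested in a factorial design."""
    effects = []
    for order in range(1, len(factors) + 1):
        for combo in combinations(factors, order):
            effects.append(" x ".join(combo))
    return effects

# A 2 x 2 design yields 3 effects; a 2 x 2 x 2 design yields 7
print(factorial_effects(["A", "B"]))
print(factorial_effects(["A", "B", "C"]))
```

With four factors the count jumps to 15 effects, including four-way interactions that are rarely interpretable, which is why independent variables should be added only on theoretical and empirical grounds.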
Another disadvantage of the factorial design is the flip side of an advantage: If additional independent variables are added to the design and these variables turn out to be unrelated to the dependent variable, then the power of some statistical test may be reduced. There are also complications regarding the conclusions that can be drawn when the independent variable is a status variable (e.g., counselor gender) and is not manipulated. (For more details, see the discussion in Chapter 8.)
An Example Merrill, Reid, Carey, and Carey (2014) examined the moderating effects of gender and depression level on the effectiveness of a brief motivational intervention in reducing alcohol use. They found a three-way interaction effect of the intervention. Women with low depression reduced their drinking more after the brief motivational intervention than the control groups, whereas women with high depression did not show any differential improvement compared to the control groups. In contrast, high-depression men showed significant reductions in weekly drinks following the brief motivational intervention, whereas low-depression men did not show differential improvement compared to the control groups. The results of the study indicated that the effectiveness of brief motivational interventions for alcohol abuse differs across participants by gender and depression level.
Dependent Samples Designs
Dependent samples designs are a type of between-groups design intended to address some of the previously mentioned problems with random assignment of participants. Dependent samples designs are based on the assumption that a particular extraneous variable, let's say intelligence or level of psychological functioning, is important to the outcome of the study. Importance in this context can be defined in two ways.
First, the variable may be theoretically important for understanding the phenomenon under investigation. In this case, the variable definitely should be examined for its own sake. For example, if intelligence is thought to be an important variable theoretically, then it should be included as an independent variable in a factorial design. In this way the effects of intelligence, as well as the effects of the interaction of intelligence with the treatments (or with other independent variables), can be determined.
Second, if the variable is not interesting for its own sake, it might best be labeled a nuisance variable. Although a nuisance factor is not examined explicitly (i.e., by inclusion as an independent variable in a factorial design), it remains an important consideration in the design of an experiment because it could affect the results in unknown ways. For example, pretest level of functioning may not be interesting to the researcher in the sense that the effectiveness of treatment for clients at different levels of psychological functioning is not a burning research question. Nevertheless, it is desirable to have the treatment and control groups comparable on psychological functioning so that psychological functioning does not confound the results. Sometimes a useful way to reduce the effects of a confounding variable is to match participants on pretest scores for the potentially confounding variable and then randomly assign one member of each matched pair to the treatment group and the other to the control group, as illustrated in Table 11.2. As a result, the two samples are dependent. In this way, the researcher can be relatively certain that levels of psychological functioning are comparable across the two groups. More important, if the nuisance factor is related to the dependent variable as expected, then the variance in the nuisance variable can be removed from the variance in the outcome variable, resulting in a more powerful statistical test (Wampold & Drew, 1990). The typical statistical test for this type of design is the dependent samples t test (sometimes called the paired t test or correlated t test).
TABLE 11.2 Assignment of Participants to Treatment and Control Groups in Dependent Samples Design

Pairs of Participants | Treatment | | Control
1 | S11 | is matched with | S12
2 | S21 | is matched with | S22
3 | S31 | is matched with | S32
… | … | … | …
N | Sn1 | is matched with | Sn2
Note: Paired participants have comparable scores on pretest.
Essentially, the dependent samples t test accomplishes the same purpose as the analysis of covariance: it reduces unexplained variance and yields a more powerful test. The analysis of covariance does not require that participants be matched; the reduction in unexplained variance is accomplished statistically rather than through the design of the experiment. The dependent samples design, in contrast, reduces uncertainty by matching comparable participants. Two participants who have high pretest scores are also likely to have high posttest scores; differences in posttest scores for these two matched participants are presumably due to the treatment (and other uncontrolled factors).
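The power advantage of matching can be sketched with a stdlib-only simulation in which the members of each pair share a nuisance factor; all numbers are invented. The paired t statistic operates on the within-pair differences, so the shared nuisance variance drops out of the error term.

```python
import random
import statistics

random.seed(7)

# Matched pairs share a nuisance factor (e.g., pretest level of
# functioning); the treatment adds a constant benefit. Illustrative only.
n_pairs = 25
effect = 3.0
nuisance = [random.gauss(50, 10) for _ in range(n_pairs)]
control = [x + random.gauss(0, 4) for x in nuisance]
treated = [x + effect + random.gauss(0, 4) for x in nuisance]

# Independent-samples t statistic ignores the pairing, so the large
# nuisance variance stays in the denominator
se_ind = (statistics.variance(treated) / n_pairs
          + statistics.variance(control) / n_pairs) ** 0.5
t_independent = (statistics.fmean(treated) - statistics.fmean(control)) / se_ind

# Dependent-samples (paired) t statistic works on the differences,
# so the shared nuisance variance cancels within each pair
diffs = [t - c for t, c in zip(treated, control)]
t_paired = statistics.fmean(diffs) / (statistics.stdev(diffs) / n_pairs ** 0.5)

print(f"Independent-samples t: {t_independent:.2f}")
print(f"Paired t:              {t_paired:.2f}")
```

The numerators of the two statistics are identical (the mean difference between groups), but the paired test's standard error is much smaller, which is the sense in which dependent samples designs achieve the same power with far fewer participants.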
Dependent samples can be accomplished in other ways as well. Often natural pairs, such as monozygotic twins, are used. Because monozygotic twins have identical genetic material, using such pairs holds all hereditary factors constant. Other natural pairs include litter mates (not often applicable to counseling researchers), marital partners, siblings, and so forth.
The idea of two dependent samples can be expanded to include more than two groups (e.g., two treatment groups and a control group). Typically, the dependency is created by matching or by repeated measures (it is a bit difficult to find enough monozygotic triplets for such a study!). For example, Fitzgerald, Chronister, Forrest, and Brown (2013) were interested in the effectiveness of an employment-focused group counseling intervention, OPTIONS, among male inmates. OPTIONS is aimed at enhancing inmates' career exploration, job-search skills, knowledge of career options, goal planning, and identification of resources. The researchers believed that age and release date would affect intervention outcomes. Thus, to ensure equivalency between the treatment and control groups on age and release date, participants were first matched on these two variables before being randomly assigned to the treatment or control group. When more than two participants are matched and assigned to conditions, the design is called a randomized block design. Each group of matched participants is called a block, and the participants within blocks are randomly assigned to conditions. The randomized block design is typically analyzed with a mixed model analysis of variance (see Wampold & Drew, 1990).
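The match-then-randomize procedure can be sketched as follows. This is a hypothetical illustration, not the procedure or code used by Fitzgerald et al.: participants are sorted on a single matching variable (here, an invented pretest score), adjacent participants form blocks of two, and a random draw decides which member of each block enters each condition.

```python
import random

def matched_random_assignment(participants, key, seed=42):
    """Pair participants with adjacent values on a matching variable,
    then randomly assign one member of each pair to each condition."""
    rng = random.Random(seed)  # fixed seed only so this example is reproducible
    ordered = sorted(participants, key=key)
    treatment, control = [], []
    for i in range(0, len(ordered) - 1, 2):
        pair = [ordered[i], ordered[i + 1]]
        rng.shuffle(pair)      # random assignment within the block
        treatment.append(pair[0])
        control.append(pair[1])
    return treatment, control

# Hypothetical participants with pretest scores on the nuisance variable.
people = [("P1", 55), ("P2", 72), ("P3", 58), ("P4", 70),
          ("P5", 61), ("P6", 63), ("P7", 80), ("P8", 77)]
tx, ctl = matched_random_assignment(people, key=lambda p: p[1])
print(tx, ctl)
```

Matching on two variables at once, as Fitzgerald et al. did with age and release date, would require a composite similarity measure rather than a single sort key, but the block-then-randomize logic is the same.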
In sum, matching is a way to control for a nuisance factor that is believed or known to have an effect on the dependent variable. Dependent sample designs are powerful tools for increasing the power of statistical tests. Properly used, these designs can enable the researcher to accomplish the same purpose with far fewer participants.
One final note: Many times in counseling research, randomly assigning participants to groups is not possible. For example, ethical problems would arise if a researcher tried to randomly assign clients to therapists with different levels of counseling experience, such as beginning practicum, advanced practicum, doctoral-level interns, and senior staff psychologists. If clients were assigned randomly to counselors it is quite likely that a client with complex psychological problems would be assigned to an inexperienced therapist who is ill-equipped to work therapeutically with such a client. In such applied situations, randomization may well introduce more practical problems than it solves experimentally. Sometimes researchers will attempt to show that clients are equivalent (matched) on several dimensions such as age, gender, presenting problem, and personality variables. Matching in such a post hoc fashion can rule out some dimensions in comparing clients, but it is important to realize that many variables, known or unknown, are simply left uncontrolled. Thus, a weakness of such field designs is that unknown variables may confound the relationships among the variables being investigated.
WITHIN-SUBJECTS DESIGNS
The remainder of this chapter examines within-subjects designs. The hallmark of the within-subjects design is that it attempts to minimize error variance due to individual variation by having each participant serve as his or her own control. Similar to the between-groups design, participants are randomly assigned to groups or treatments, and independent variables are manipulated. The unique feature of the within-subjects design is that all participants are exposed to all of the treatment conditions; random assignment involves assigning people to different sequences of treatment.
In this section we first provide an overview of two within-subjects designs: the crossover design and the counterbalanced crossover design. We then discuss the strengths and limitations of these within-subjects designs.
Crossover Designs
Suppose a researcher wanted to compare the effects of two treatments (independent variables)—test interpretation of the Strong Interest Inventory (SII) and work genograms—on a dependent variable, vocational clients' career maturity. The researcher could use the within-subjects design, as shown in Figure 11.4. Ob1, Ob2, and Ob3 represent different observations—in this case, administration of a career inventory (say, the Strengths Self-Efficacy Scale; Tsai, Chaichanasakul, Zhao, Flores, & Lopez, 2014). Tx1 represents the test interpretation treatment, and Tx2 represents the genogram treatment.
FIGURE 11.4 Crossover Designs
This is called a crossover design; all participants are switched (i.e., crossed over) to another experimental condition, usually halfway through the study. Suppose the researcher conducted this study with 20 vocationally undecided adults as diagrammed, and found a significantly greater change in career maturity between Ob2 and Ob3 than between Ob1 and Ob2 (p < .01); could he or she conclude that genograms are better at promoting career maturity than test interpretation? This conclusion would be quite tenuous because of the threats to internal validity embedded in this design, such as history (events may have happened to the participants between the administrations), maturation (normal development may have occurred), order effects (i.e., perhaps genogram treatments are more effective when presented as a second treatment), or sequence effects (i.e., perhaps the genogram treatment is effective only if it follows, and perhaps adds to, an SII test interpretation). In fact, a major difficulty in the within-subjects design is the possibility of confounding order or sequence effects. Order effects refer to the possibility that the order (i.e., the ordinal position, such as first or third) in which treatments were delivered, rather than the treatments per se, might account for any changes in the dependent variable. Sequence effects refer to the interaction of the treatments (or experimental conditions) due to their sequential order; that is, treatment Tx1 may have a different effect when it follows treatment Tx2 than when it precedes it.
Counterbalanced Crossover Designs How might the researcher control the sequential order threats to internal validity? One of the primary mechanisms used to control such threats is counterbalancing, which involves “balancing” the order of the conditions. Figure 11.5 is a diagram of a counterbalanced crossover design.
The participants are randomly assigned to two groups: [A] and [B]. Again, Tx1 and Tx2 in the diagram represent the two treatments, and the Ob's represent the different observation periods. Ob1a and Ob1b designate a pretesting assessment; Ob2a and Ob2b represent an assessment at the crossover point; and Ob3a and Ob3b indicate testing at the end of the experiment. Thus, the groups differ only in the order in which they receive the treatments. In this case, counterbalancing also controls for sequence effects: Tx1 precedes Tx2 for group [A], whereas Tx2 precedes Tx1 for group [B].
FIGURE 11.5 Counterbalanced Crossover Design
It is important to be aware of two issues with regard to counterbalancing. First, the researcher can now use some simple statistical procedures to determine whether the order of the treatment conditions made any difference vis-à-vis the dependent variables. For example, a simple t test can be conducted on Ob2a versus Ob3b to determine whether treatment Tx1 resulted in differential effects depending on whether the treatment was administered first or second. A similar t test can be conducted on Ob3a versus Ob2b for treatment Tx2. These analyses are important not only for the present research, but also so that future researchers can know about order or sequence effects. A second issue is that even if there is an order effect, it can be argued that these effects are “balanced” or equal for both treatments (given the preceding example), and that order effects are therefore controlled.
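As a sketch of the first analysis, the Ob2a versus Ob3b comparison is an ordinary independent samples t test, because the two observations come from different groups of participants. The career-maturity scores below are invented for illustration only.

```python
import math
from statistics import mean, stdev

def independent_t(x, y):
    """Pooled-variance independent samples t statistic."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * stdev(x) ** 2 +
                  (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / math.sqrt(pooled_var * (1 / nx + 1 / ny))

# Hypothetical scores (invented for illustration):
# ob2a = group A measured after receiving Tx1 first;
# ob3b = group B measured after receiving Tx1 second.
ob2a = [14, 16, 15, 17]
ob3b = [15, 17, 16, 18]
t = independent_t(ob2a, ob3b)
print(round(t, 2))
```

A t near zero would suggest that Tx1 worked about equally well in either ordinal position; applying the same function to Ob3a versus Ob2b performs the parallel check for Tx2.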
Piet, Hougaard, Hecksher, and Rosenberg (2010) used a counterbalanced crossover design to examine the effects of group mindfulness-based cognitive therapy and group cognitive-behavioral therapy for young adults with social phobia. Of the 26 participants, 14 were randomly assigned to receive eight sessions of group mindfulness-based cognitive therapy followed by 12 sessions of group cognitive-behavioral therapy. The remaining 12 clients first received group cognitive-behavioral therapy and then mindfulness-based cognitive therapy. Participants' levels of social anxiety, anxiety symptoms, and interpersonal problems were assessed (a) prior to treatment, (b) after the first series of treatments, (c) at completion of both treatments, and at (d) 6 months and (e) 12 months after treatment. Results revealed that mindfulness-based cognitive therapy achieved moderate to high pre-post effect sizes and did not differ significantly from cognitive-behavioral therapy. Participants receiving the treatments in both sequences continued to improve after the first and second treatments until the 6-month follow-up. The findings suggest that mindfulness-based cognitive therapy, like cognitive-behavioral therapy, is an effective treatment for young adults with social anxiety.
RESEARCH IN ACTION 11.2
Briefly describe the differences between the two major categories of true experiments: between-groups designs and within-subjects designs. Identify a general research question of your interest and discuss whether it would be more appropriate to use a between-groups design or a within-subjects design. Moreover, are there any differences in what these two designs would address/answer in regard to your research question?
Strengths and Limitations
We will discuss five issues that depict the strengths and weaknesses of within-subjects designs and that can affect the appropriateness of such a design for a particular research question: (a) experimental control, (b) statistical power, (c) time, (d) order effects, and (e) restriction of certain independent variables.
Experimental Control The traditional within-subjects design is potentially a powerful design because of its reliance on random assignment of treatments and manipulation of independent variables. The experimenter can often obtain a great deal of experimental control with this design, and the threats to internal validity tend to be low with a counterbalanced crossover design. Moreover, the within-subjects design tends to minimize error variance due to normal individual variability by using each participant as his or her own control. The reduction of individual error variance is a noteworthy advantage of the within-subjects design, which merits consideration when the researcher is especially concerned about such error.
Statistical Power Because each participant receives all levels of the independent variable, there are typically some advantages from a statistical perspective. In general, a researcher can use half the number of participants in a counterbalanced crossover design and still retain the same statistical power as in the between-subjects design (see Kerlinger, 1986, for a more complete statistical discussion of this matter).
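Why half the participants can suffice is easiest to see from the standard error of the treatment-effect estimate. The sketch below uses the standard variance formulas (the variance of a within-person difference is 2σ²(1 − ρ), where ρ is the correlation between a participant's two scores); the values of σ and ρ are illustrative assumptions, not figures from the text.

```python
import math

def se_between(sigma, n_per_group):
    """SE of the mean difference for two independent groups of size n_per_group."""
    return math.sqrt(2 * sigma ** 2 / n_per_group)

def se_within(sigma, n_participants, rho):
    """SE of the mean within-person difference when each participant's
    two scores correlate at rho."""
    return math.sqrt(2 * sigma ** 2 * (1 - rho) / n_participants)

sigma = 10
between = se_between(sigma, n_per_group=12)               # 24 participants in all
within = se_within(sigma, n_participants=12, rho=0.0)     # 12 participants in all
print(round(between, 3), round(within, 3))
```

Even with zero within-person correlation, 12 participants measured under both conditions match the precision of 24 participants split into two groups; any positive correlation (which is typical, since each participant serves as his or her own control) makes the within-subjects estimate strictly more precise, e.g., se_within(10, 12, 0.5) is roughly 2.89 versus 4.08.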
Time Although a within-subjects design can use fewer participants to obtain a level of statistical power similar to a between-groups design, the trade-off is that the within-subjects design takes longer to conduct. Consider a research team who want to compare interpersonal and cognitive-behavioral approaches to the treatment of depression. Suppose they recruit 24 depressed clients. If the team chooses to use a between-groups design, they can randomly assign 12 participants to 12 sessions of interpersonal treatment, and the remaining participants to 12 sessions of cognitive-behavioral treatment. In this design, at the end of 12 weeks the research team has implemented the interventions and has collected the data. If the research team instead uses a within-subjects design with only 12 clients—randomly assigning 6 clients to receive 12 sessions of interpersonal therapy followed by 12 sessions of cognitive therapy, and assigning the remaining 6 participants to receive treatment in the reverse order—the team would need 12 more weeks than for the between-groups design to implement the interventions and collect the data. Thus, sometimes an important consideration is the trade-off between the number of participants and the time required. We would encourage researchers, however, not to dismiss within-subjects designs solely because of the time factor.
Order Effects As we indicated earlier, a special problem of the within-subjects design is the effects of order. Order effects are threats to internal validity. Even when order effects are controlled, as in the counterbalanced crossover design, it is still important to check whether the order of the treatments affected the dependent variable. Sometimes it is assumed that because counterbalancing equalizes any effects due to order, the researcher can ignore such order effects. This strategy, however, does not provide any information about the basic question: Were there any order effects in a particular study? Such information can be useful to future researchers as they design their investigations on a similar topic. Likewise, practitioners may be interested in knowing if the order of treatments makes any difference as they plan to maximize their interventions.
Restriction of Variables A final consideration in the use of within-subjects designs involves the restriction of certain independent variables. It may not be possible to use certain independent variables in a within-subjects design. It is impossible, for example, to induce both the expectation that a given treatment will be effective and then the subsequent expectation that it will not be effective. Or two treatments may be too incompatible with each other. Kazdin (2003) offered as an example the conflicting approaches of systematic desensitization and flooding. It is important for the researcher considering a within-subjects design to closely examine the effects that multiple treatments may have on one another. Given that each participant receives all treatments, the experimenter must assess whether the combination of multiple treatments can be administered realistically and fairly. Finally, variables that involve some personality, demographic, and physical characteristics cannot vary within the same participant in a given experiment. For example, a participant cannot be both male and female, or from both a rural and an urban community; thus, these variables cannot be examined using within-subjects designs.
It is also important not to dismiss the utility of within-subjects designs if the limitations of these designs initially seem restrictive for a particular study. For example, based on the inherent differences between behavioral and psychodynamic therapy, it could easily be concluded that these two therapy orientations could not be compared within a particular set of participants. However, Stiles, Shapiro, and Firth-Cozens (1988) did use, quite successfully, a within-subjects design comparing eight sessions each of exploratory (interpersonal-psychodynamic) and prescriptive (cognitive-behavioral) therapy. Although worries about treatment contamination may be present or even pervasive among counseling researchers, this study challenges us to fairly evaluate the crossover effect and to be creative in our thinking about within-subjects designs.
SUMMARY AND CONCLUSIONS
There are two types of true experiments: between-groups and within-subjects designs. These are true experiments because in both cases there is random assignment of treatments and manipulation of an independent variable. In between-groups designs, the random assignment allocates participants to treatment conditions to create experimental and control groups. In contrast, in within-subjects designs, all participants are exposed to all treatment conditions. Thus, the overall goal of the within-subjects design is to compare the effects of different treatments on each participant. Both designs lend themselves to Kerlinger's MAXMINCON principle. Because randomization of participants is a defining characteristic of both between-groups and within-subjects designs, we discussed participant assignment and group equivalence.
In terms of between-groups designs, we discussed the posttest-only control group design and the pretest-posttest control group design. These experimental designs are clearly powerful designs, because they can rule out many rival hypotheses. Each design controls for all the common threats to internal validity. A key feature of these designs is the random assignment of participants; randomly assigning participants to groups is a major source of control with regard to internal validity. Because control groups are commonly used in these designs, we discussed issues pertaining to different types of control groups, such as no-treatment groups, waiting-list control groups, placebo groups, and matched control groups.
We also described two traditional within-subjects designs, the crossover and counterbalanced crossover designs. Both of these designs make comparisons between two or more groups of participants, but in a different way from the between-groups design. In the crossover design, all participants are switched to another experimental condition, usually halfway through the study. Counterbalancing was introduced within this design as a way of reducing bias due to order effects. We suggested that at least five issues specific to the traditional within-subjects design can affect its utility for examining a particular research question, namely (a) experimental control (particularly with regard to individual participant variation), (b) statistical power, (c) time, (d) order effects, and (e) restriction of certain independent variables. In particular, we encouraged researchers to be creative in the application of traditional within-subjects designs. In short, within-subjects designs offer a powerful means of identifying causal relationships. The advantages of these designs are their ability to reduce both error variance (by using each participant as his or her own control) and the fewer number of participants needed in a particular study.
Clearly, the between-groups and within-subjects designs are useful designs for examining research questions of interest to those in the counseling profession. These designs are flexible and can be made applicable to a wide variety of research problems. However, it is important for the researcher in counseling to evaluate the strengths and limitations of these designs relative to the type of research question being asked and type of participants needed. Given the applied nature of many of our research questions in counseling, the researcher needs to consider carefully a broad range of issues pertaining to external validity to evaluate the utility of the true experimental designs in providing the most-needed information. In addition, many times the random assignment of participants to groups cannot be done because of ethical constraints, such as in a study of the effects of different levels of sexual harassment. We think students should be encouraged to consider the strengths and weaknesses of various designs in relation to the nature of various research questions. In other words, the utility of the design needs to be evaluated in the context of the research question, the existing knowledge bases, and internal and external validity issues.
STIMULUS QUESTIONS
Between-Groups and Within-Subjects Designs
This exercise is designed to promote reflection on between-groups and within-subjects experimental designs. After reading this chapter, write your responses to the following questions. Then discuss your responses with a peer in your class.
1.Talk to faculty and peers about their perceptions of the usefulness of the between-groups and within-subjects designs. What advantages and disadvantages first come to mind for them? Is there a pattern in the responses that is reflected when others speak of the disadvantages of between-groups and within-subjects designs?
2.Compare the primary strengths and weaknesses of between-groups and within-subjects designs. Could you argue that one of these designs is better than the other?
3.What are the key elements that define true experimental designs?
4.Randomization is a component of true experimental designs. Describe how randomization is applied differently in between-groups and within-subjects designs.
5.Make a list of the pros and cons of using a control group. Of all the issues you list, can you pick one that you believe is the most important methodological issue when applying it to a research question of your interest?
6.In the early days of counseling, more between-groups and within-subjects designs were used. Now it is difficult to find good examples of them in our major journals. Why do you think this trend exists, and what do you think it means for the field of counseling?
Week 5 Homework Exercise CCMH/525 Version 3
University of Phoenix Material
Week 5 Homework Exercise
Answer the following questions in 25 to 50 words each (where applicable), covering material from Ch. 11 of Methods in Behavioral Research:
1. How may a researcher enhance the generalizability of the results of a single-case design?
2. What is program evaluation? Provide an example.
3. What are researchers examining in an efficacy assessment?
4. Unlike true experiments, quasi-experimental designs lack _____________.
5. Why might a researcher use a quasi-experimental design rather than a true experimental design?
6. A researcher is investigating the effects of life review group therapy on depression among older adult residents of assisted care facilities. The researcher identifies two facilities and recruits volunteer participants. In one of the facilities, life review therapy is conducted for eight weeks. In the other facility, a no-treatment support group takes place for eight weeks. Depression is measured after the program is administered. What type of research design is this? What is the greatest threat to internal validity in this type of design? Explain.
7. A researcher wants to investigate patriotic behavior across the lifespan. She samples people in the following age groups: 18–28, 29–39, 40–50, 51–60, and 61 and above. All participants are interviewed and asked to complete questionnaires and rating scales about patriotic behavior. This type of developmental research design is called ________________. What is the primary disadvantage of this type of design? Explain.
Copyright © 2017, 2015 by University of Phoenix. All rights reserved.