Between and Within Designs
Lecture 2: Which one?
Okay, so if we think we can influence something a whole lot, we can think of that as a big effect. Imagine that we have found a way of hypnotizing people that will give them an instant genius IQ! (Clearly a hypothetical idea.)
We know that the average IQ is 100 with a standard deviation of 15. So if we want to study the effects of our technique, we can randomly assign people to either the Experimental Group or the Control Group. Without any treatment, you can imagine that the IQs in each group will cluster around 100 and vary mostly between 85 and 115 (plus and minus one SD). So you might expect the results of the study to look something like this:
| Experimental Group (Hypnosis) | | Control Group (no treatment) | |
| --- | --- | --- | --- |
| Participant # | IQ Score | Participant # | IQ Score |
| 101 | 180 | 201 | 100 |
| 102 | 176 | 202 | 96 |
| 103 | 168 | 203 | 88 |
| 104 | 190 | 204 | 110 |
| 105 | 185 | 205 | 100 |
| 106 | 180 | 206 | 105 |
| 107 | 190 | 207 | 106 |
| 108 | 186 | 208 | 110 |
| 109 | 182 | 209 | 96 |
| 110 | 176 | 210 | 102 |
| Mean | 181.30 | Mean | 101.30 |
In other words, we can use a between groups design because the variability we expect to see in the population (and hence in our sample) is much smaller than the effect we are going to produce. Here, scores within each group span about 22 points, while the group means differ by 80 points.
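To see how lopsided this comparison is, here is a minimal Python sketch using the scores from the table above. It expresses the gap between the two group means in units of the pooled within-group standard deviation, a standardized effect size commonly called Cohen's d. The function and variable names are illustrative assumptions, not something the lecture defines.

```python
from math import sqrt
from statistics import mean, stdev

# IQ scores copied from the table above.
hypnosis = [180, 176, 168, 190, 185, 180, 190, 186, 182, 176]
control = [100, 96, 88, 110, 100, 105, 106, 110, 96, 102]


def cohens_d(group_a, group_b):
    """Mean difference expressed in pooled-standard-deviation units."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_variance = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_variance)


mean_difference = mean(hypnosis) - mean(control)
d = cohens_d(hypnosis, control)

print(f"Mean difference: {mean_difference:.2f} IQ points")
print(f"SD within hypnosis group: {stdev(hypnosis):.2f}")
print(f"SD within control group: {stdev(control):.2f}")
print(f"Cohen's d: {d:.2f}")
```

With these numbers the effect size comes out above 10 standard deviations, which is why the two groups do not overlap at all.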
On the other hand, what if you could hypnotize people into just a few points gain in IQ? You could think of this as a small effect. How will it show up in two different groups?
Can you tell which of these groups got the treatment and which was the control? Did your treatment work? Who knows! The differences among the individuals within each group (scores range from 88 to 114) are much larger than the 1-point difference between the group means.
| ????? | | ????? | |
| --- | --- | --- | --- |
| Participant # | IQ Score | Participant # | IQ Score |
| 101 | 101 | 201 | 100 |
| 102 | 90 | 202 | 96 |
| 103 | 88 | 203 | 88 |
| 104 | 108 | 204 | 110 |
| 105 | 112 | 205 | 100 |
| 106 | 100 | 206 | 105 |
| 107 | 99 | 207 | 106 |
| 108 | 114 | 208 | 110 |
| 109 | 102 | 209 | 96 |
| 110 | 109 | 210 | 102 |
| Mean | 102.30 | Mean | 101.30 |
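One hedged way to check that "who knows" feeling is to divide the difference between the two group means by its standard error. The sketch below does this with Welch's t statistic, which is one common choice for two independent groups and not necessarily the test this course will use. The group labels and function name are assumptions made for illustration.

```python
from math import sqrt
from statistics import mean, variance

# IQ scores copied from the two unlabeled groups in the table above.
group_1 = [101, 90, 88, 108, 112, 100, 99, 114, 102, 109]
group_2 = [100, 96, 88, 110, 100, 105, 106, 110, 96, 102]


def welch_t(a, b):
    """Welch's t statistic: the mean difference divided by its standard error."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


t = welch_t(group_1, group_2)

print(f"Difference between group means: {mean(group_1) - mean(group_2):.2f}")
print(f"Welch's t: {t:.2f}")
```

The t statistic here is well below 1, meaning the gap between the groups is smaller than the noise you would expect from individual differences alone.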
Okay, you all know that in order to detect such a subtle shift, you would have to know what each person’s IQ was to begin with!
So, what if you tried something like this:
| Participant # | IQ Score Before | IQ Score After | Change |
| --- | --- | --- | --- |
| 101 | 102 | 101 | -1 |
| 102 | 85 | 90 | 5 |
| 103 | 83 | 88 | 5 |
| 104 | 101 | 108 | 7 |
| 105 | 106 | 112 | 6 |
| 106 | 96 | 100 | 4 |
| 107 | 97 | 99 | 2 |
| 108 | 109 | 114 | 5 |
| 109 | 99 | 102 | 3 |
| 110 | 103 | 109 | 6 |
The important thing here is not the average before or the average after, but the average change. In other words, the difference within subjects is the most critical.
On average, the change here is 4.20 IQ points. Is it “significantly different” from 0? If there were no effect, the change score would be 0. Even if it weren’t 0 for everyone, it would average out to about 0, with a few people improving a bit and a few declining a bit. But when we look at these data, we can see that almost all of the participants improved (9 of 10), so even though the effect isn’t big, it seems to be non-trivial. Small, but significant.
Again, if you get the concept of “small but significant,” that is fabulous. The statistics will come later and build on this concept.
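As a preview of those statistics, here is a minimal sketch of how the change scores could be tested against 0. It computes a one-sample t statistic on the Change column, which is the usual form of a paired (within subjects) t test. The variable names are illustrative assumptions.

```python
from math import sqrt
from statistics import mean, stdev

# Change scores (after minus before) as listed in the Change column above.
changes = [-1, 5, 5, 7, 6, 4, 2, 5, 3, 6]

mean_change = mean(changes)
sd_change = stdev(changes)

# Standard error of the mean change, then t = mean change / standard error.
standard_error = sd_change / sqrt(len(changes))
t = mean_change / standard_error

improved = sum(1 for change in changes if change > 0)

print(f"Mean change: {mean_change:.2f}")
print(f"SD of change: {sd_change:.2f}")
print(f"t({len(changes) - 1}) = {t:.2f}")
print(f"Participants who improved: {improved} of {len(changes)}")
```

With 9 degrees of freedom, a standard t table gives a two-tailed .05 cutoff of about 2.26, and the t computed here is well above that. The same 4-point shift was invisible in the between groups comparison because the individual differences swamped it.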
Not so fast, you say??
Well, okay, you are too smart. Yes, sometimes you might assume that taking one test over and over might result in improvements to the score just because a person is getting so much practice at test taking!
So, in this case, to control for possible practice effects, which would threaten internal validity, we would want to get our control group back and test them before and after their “control” experience. This would be a mixed design, with both a between and a within component.
Can you predict what the results would look like if there were no real effects of your intervention? Then you would expect similar change from before to after (often called pre- to post-) for both groups. If the average 4.2 point change we found in the last table were due to practice effects, then the control group would show a similar change.
What if there were effects of your intervention? Then the pre- to post- change would be greater in the treatment group than in the control group.
Pretty nifty, eh?
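The two predictions can be sketched in a few lines of Python. The treatment change scores come from the table above, but the lecture gives no control group change scores, so both control lists below are hypothetical numbers invented purely for illustration.

```python
from statistics import mean

# Change scores for the treatment group, from the Change column above.
treatment_changes = [-1, 5, 5, 7, 6, 4, 2, 5, 3, 6]

# HYPOTHETICAL control data (not from the lecture), one list per scenario.
control_if_practice_only = [4, 5, 3, 6, 4, 2, 5, 4, 5, 3]
control_if_real_effect = [1, 0, 2, -1, 1, 0, 2, 1, -2, 1]


def difference_in_change(treatment, control):
    """Mean pre-to-post change in treatment minus mean change in control."""
    return mean(treatment) - mean(control)


practice_gap = difference_in_change(treatment_changes, control_if_practice_only)
effect_gap = difference_in_change(treatment_changes, control_if_real_effect)

print(f"Gap if the change is just practice: {practice_gap:.2f}")
print(f"Gap if the intervention works: {effect_gap:.2f}")
```

In the first scenario the control group improves about as much as the treatment group, so the gap is close to 0. In the second, the control group barely changes, so most of the treatment group's gain is left over after subtracting it.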
The between-within distinction is a fundamental principle of research design.
When to use between? When the groups can be made comparable and the effect will outweigh the variability within the groups.
Ensuring that the groups are maximally comparable, by limiting the selection criteria or matching groups on characteristics, is important in using between subjects designs.
When to use within? When there are individual differences (variability in aspects of the participants) that are relevant to the dependent measure and that might outweigh the size of the effect you believe you will find.
What makes a source of individual difference relevant to the dependent measure? Theory and empirical evidence. In other words, use the literature to inform your thinking about what factors have the potential to influence the variable of interest, and, in turn, the design of your study.
When to use a mixed design? Whenever the passage of time or the results of practice have the potential to account for changes found in a within subjects design.
Some additional considerations
The terms independent and dependent are used here to describe the data (or scores). It can be confusing because this is not the same usage as IV (independent variable) or DV (dependent variable).
We think of data as independent when one score in one condition has no link to a score in another condition. In most between subject designs, data are independent.
We think of data as dependent when one score in one condition is connected to a particular score in another condition. Within subject designs provide dependent data—each person’s pre- score is connected to that same person’s post- score. This connection is critical, and in this way, each participant serves as his or her own control.
There is a variant of the between subjects design that provides dependent data: matched pairs. When matched pairs are used, one participant is matched to another, and then the two members of the pair are randomly assigned, one to each of the two treatment conditions. Imagine a study comparing twins’ responses to some type of diet or drug. You would compare one twin to his or her co-twin: two people (between) but with linked data (within).
Note that matching can be used at the group level. You can match your groups for the proportion of males and females, for example, but this design does not link individual to individual.
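Here is a minimal sketch of the random assignment step in a matched pairs design, assuming the pairs have already been formed. The participant labels, condition names, and function name are all hypothetical.

```python
import random


def assign_matched_pairs(pairs, conditions=("treatment", "control"), seed=None):
    """Randomly give one member of each pair each of the two conditions."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        order = list(conditions)
        rng.shuffle(order)  # coin flip: which member gets which condition
        assignment[first] = order[0]
        assignment[second] = order[1]
    return assignment


# Hypothetical twin pairs; each tuple is one matched pair.
twin_pairs = [
    ("twin_1a", "twin_1b"),
    ("twin_2a", "twin_2b"),
    ("twin_3a", "twin_3b"),
]

assignment = assign_matched_pairs(twin_pairs, seed=42)
for first, second in twin_pairs:
    print(first, assignment[first], "|", second, assignment[second])
```

Because the coin flip happens inside each pair, every pair contributes one score to each condition, and the two scores stay linked for analysis.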
The importance of counterbalancing
Imagine you were interested in the influence that eye color has on the length of sentence that people judged “guilty” would receive in a mock courtroom scenario. So, you provide pictures of a “defendant” along with a description of the crime. Half of the defendants have blue eyes, half brown. So that participants have more than one opportunity to make a judgment, you have each one evaluate multiple defendants, say 4 (we will refer to them as D1, D2, D3, and D4).
What else do you need to think about? You don’t want any other factors to confound eye color. For example, you would not want to compare Black and White faces, since blue eyes are so rare among Black people. You would not want one eye color to have all attractive faces and the other to have all plain faces. What to do?
Thanks to technology, it is easy to alter the color of eyes in a picture. So you take each picture and create a blue version and a brown version.
· All of your participants get half blue and half brown faces to judge, so you can compare the average blue and brown score for each participant.
· Each particular stimulus (defendants D1, D2, D3, and D4) gets judged in both brown and blue eyes.
· Since you can’t have two versions of the same face being judged by a participant without your participant thinking it odd or figuring out exactly what you are doing, you need to split up the pairs of Blue-Brown versions among the participants.
Let’s say you have 12 participants. Here is how you might present the stimuli to them. In this way, each participant gets 2 blue and 2 brown, and each defendant face is blue half the time and brown half the time across the 12 participants.
| | Stimuli | | | |
| --- | --- | --- | --- | --- |
| Participant | D1 | D2 | D3 | D4 |
| 101 | Blue | Blue | Brown | Brown |
| 102 | Blue | Blue | Brown | Brown |
| 103 | Blue | Blue | Brown | Brown |
| 104 | Blue | Brown | Blue | Brown |
| 105 | Blue | Brown | Blue | Brown |
| 106 | Blue | Brown | Blue | Brown |
| 107 | Brown | Blue | Brown | Blue |
| 108 | Brown | Blue | Brown | Blue |
| 109 | Brown | Blue | Brown | Blue |
| 110 | Brown | Brown | Blue | Blue |
| 111 | Brown | Brown | Blue | Blue |
| 112 | Brown | Brown | Blue | Blue |
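A schedule like this is easy to get wrong by hand, so here is a minimal sketch that rebuilds the table above and checks both balance requirements. The function and variable names are assumptions for illustration.

```python
from collections import Counter

defendants = ("D1", "D2", "D3", "D4")

# The four eye-color patterns used in the table, in order D1..D4.
patterns = [
    ("Blue", "Blue", "Brown", "Brown"),
    ("Blue", "Brown", "Blue", "Brown"),
    ("Brown", "Blue", "Brown", "Blue"),
    ("Brown", "Brown", "Blue", "Blue"),
]

# Participants 101-112: three participants per pattern, as in the table.
schedule = {}
participant = 101
for pattern in patterns:
    for _ in range(3):
        schedule[participant] = dict(zip(defendants, pattern))
        participant += 1


def is_counterbalanced(schedule, defendants):
    """True if every participant and every defendant is half Blue, half Brown."""
    for colors in schedule.values():
        counts = Counter(colors.values())
        if counts["Blue"] != counts["Brown"]:
            return False
    for defendant in defendants:
        counts = Counter(colors[defendant] for colors in schedule.values())
        if counts["Blue"] != counts["Brown"]:
            return False
    return True


print("Counterbalanced:", is_counterbalanced(schedule, defendants))
```

Note that the table uses four of the six possible ways to give a participant two blue and two brown faces. That is enough to satisfy both requirements here, though a design using all six patterns would also balance which pairs of defendants share an eye color.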
Got it? If you can complete these two chapters with a solid understanding of
You are golden!