47.269 Research I:  Basics

Between and Within Designs

 

Lecture 2:  Which one?

Okay, so if we think we can influence something a whole lot, we can think of that as a big effect.  Imagine that we have found a way of hypnotizing people that will give them an instant genius IQ!  (Clearly a hypothetical idea.) 

 

We know that the average IQ is 100 with a standard deviation of 15.  So if we want to study the effects of our technique, we can randomly assign people either to the Experimental Group or the Control Group.  Before any treatment, you can imagine that the IQs in each group will cluster around 100 and vary mostly between 85 and 115 (plus and minus one SD).  So, you might expect the results of the study to look something like this:

 

Experimental Group (Hypnosis)        Control Group (no treatment)
Participant #     IQ Score           Participant #     IQ Score
101               180                201               100
102               176                202               96
103               168                203               88
104               190                204               110
105               185                205               100
106               180                206               105
107               190                207               106
108               186                208               110
109               182                209               96
110               176                210               102
Mean              181.30             Mean              101.30

 

 

In other words, we can use a between groups design, because the variability we expect to see in the population (and hence in our sample) is much smaller than the effect we expect to produce.
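As a rough check on that claim, the group means and spreads can be computed directly from the table above.  This is a minimal sketch in Python using the hypothetical scores from the lecture; the variable names, and the choice of the sample standard deviation as the measure of "variability," are assumptions made here for illustration.

```python
from statistics import mean, stdev

# IQ scores copied from the table above (hypothetical lecture data).
hypnosis = [180, 176, 168, 190, 185, 180, 190, 186, 182, 176]
control = [100, 96, 88, 110, 100, 105, 106, 110, 96, 102]

# Difference between the two group means: the size of the effect.
mean_difference = mean(hypnosis) - mean(control)

# Sample standard deviation within each group: variability among individuals.
spread_hypnosis = stdev(hypnosis)
spread_control = stdev(control)

print(f"Mean difference between groups: {mean_difference:.2f}")
print(f"Spread within hypnosis group:   {spread_hypnosis:.2f}")
print(f"Spread within control group:    {spread_control:.2f}")
```

The gap between the means (80 points) is many times larger than the spread within either group (about 7 points), which is why the two groups are so easy to tell apart.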

On the other hand, what if you could hypnotize people into just a few points gain in IQ?  You could think of this as a small effect.  How will it show up in two different groups?

 

Can you tell which of these groups got the treatment and which was the control?  Did your treatment work?  Who knows!!  The differences among the individuals within each group are larger than the difference between the groups.

 

?????                                ?????
Participant #     IQ Score           Participant #     IQ Score
101               101                201               100
102               90                 202               96
103               88                 203               88
104               108                204               110
105               112                205               100
106               100                206               105
107               99                 207               106
108               114                208               110
109               102                209               96
110               109                210               102
Mean              102.30             Mean              101.30

 

Okay, you all know that in order to detect such a subtle shift, you would have to know what each person’s IQ was to begin with!

 

So, what if you tried something like this:

 

Participant #     IQ Score Before     IQ Score After     Change
101               102                 101                -1
102               85                  90                 5
103               83                  88                 5
104               101                 108                7
105               106                 112                6
106               96                  100                4
107               97                  99                 2
108               109                 114                5
109               99                  102                3
110               103                 109                6

 

 

The important thing here is not the average before or the average after, but the average change.  In other words, the difference within subjects is the most critical. 

 

On average, the change here is 4.20 IQ points.  Is it “significantly different” from 0?  If there were no effect, the expected change score would be 0.  Even if it weren’t 0 for everyone, it would average out to about 0, with a few people improving a bit and a few declining a bit.  On average, there would be no change.  But when we look at these data, we can see that almost all of the participants improved (9 of 10), so even though the effect is not big, it appears to be non-trivial.  Small, but significant.
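To make "small but significant" a little more concrete, here is a minimal sketch in Python that works from the After and Change columns of the table above.  The paired t statistic is an assumption about which analysis would eventually be used (the lecture leaves the statistics for later), and the variable names are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

# After scores and change scores copied from the table above (hypothetical lecture data).
after = [101, 90, 88, 108, 112, 100, 99, 114, 102, 109]
change = [-1, 5, 5, 7, 6, 4, 2, 5, 3, 6]

n = len(change)
mean_change = mean(change)
improved = sum(1 for c in change if c > 0)

# Spread among people (raw scores) versus spread of each person's own change.
spread_raw = stdev(after)
spread_change = stdev(change)

# Paired t statistic: the mean change divided by its standard error.
# Assumption: a paired t test is the eventual analysis; the lecture does not say.
t_statistic = mean_change / (spread_change / sqrt(n))

print(f"Mean change: {mean_change:.2f} ({improved} of {n} improved)")
print(f"Spread of raw After scores: {spread_raw:.2f}")
print(f"Spread of change scores:    {spread_change:.2f}")
print(f"Paired t statistic:         {t_statistic:.2f}")
```

The change scores vary far less than the raw scores do, because each person's starting level has been subtracted out.  That is what lets a 4-point shift stand out here even though a 1-point difference between two separate groups could not.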

 

Again, if you get the concept of “small but significant,” that is fabulous. The statistics will come later and build on this concept.

 

Not so fast, you say??

 

 

 

Well, okay, you are too smart.  Yes, you might suspect that taking one test over and over could improve the score just because a person is getting so much practice at test taking!

 

So, in this case, to control for possible practice effects, which would threaten internal validity, we would want to bring our control group back and test them before and after their “control” experience.  This would be a mixed design, with both a between and a within component.

 

Can you predict what the results would look like if there were no real effects of your intervention?  Then you would expect similar change from before to after (often called pre- to post-) for both groups.  If the average 4.2 point change we found in the last table were due to practice effects, then the control group would show a similar change.

 

What if there were effects of your intervention?  Then the pre- to post- change would be greater in the treatment group than in the control group.
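The logic of the mixed design can be sketched in a few lines of Python.  The treatment change scores come from the earlier within-subjects table; the control change scores are HYPOTHETICAL numbers invented here purely for illustration, as is the helper name intervention_effect.

```python
from statistics import mean

# Change scores (post minus pre) for the treatment group, from the earlier table.
treatment_change = [-1, 5, 5, 7, 6, 4, 2, 5, 3, 6]

# HYPOTHETICAL control-group change scores, invented for this illustration only.
control_change = [1, 0, 2, -1, 1, 0, 2, 1, -1, 1]

def intervention_effect(treated, untreated):
    """Average change in the treated group beyond the change seen in the controls."""
    return mean(treated) - mean(untreated)

# The control group's average change estimates practice (or passage-of-time) effects.
practice_effect = mean(control_change)
effect = intervention_effect(treatment_change, control_change)

print(f"Average change, treatment group: {mean(treatment_change):.2f}")
print(f"Average change, control group:   {practice_effect:.2f}")
print(f"Change beyond practice effects:  {effect:.2f}")
```

If the intervention did nothing, the last number would be close to 0, because both groups would show about the same pre- to post- change.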

 

Pretty nifty, eh?

 

The between-within distinction is a fundamental principle of research design. 

 

When to use between?  When the groups can be made comparable and the expected effect will outweigh the variability within each group.

 

Ensuring that the groups are maximally comparable (by limiting the selection criteria or matching groups on characteristics) is important in using between subject designs.

 

When to use within?  When there are individual differences (variability in aspects of the participants) that are relevant to the dependent measure and that might outweigh the size of the effect you believe you will find.

 

What makes a source of individual difference relevant to the dependent measure?  Theory and empirical evidence.  In other words, use the literature to inform your thinking about what factors have the potential to influence the variable of interest, and, in turn, the design of your study.

 

When to use a mixed design?  Whenever the passage of time or the results of practice have the potential to account for changes found in a within subjects design.

 

 

Some additional considerations

 

The terms independent and dependent are used here to describe the data (or scores).  It can be confusing because this is not the same usage as IV (independent variable) or DV (dependent variable). 

 

We think of data as independent when one score in one condition has no link to a score in another condition.  In most between subject designs, data are independent. 

 

We think of data as dependent when one score in one condition is connected to a particular score in another condition.  Within subject designs provide dependent data—each person’s pre- score is connected to that same person’s post- score.  This connection is critical, and in this way, each participant serves as his or her own control. 

 

There is a variant of the between subjects design that provides dependent data:  matched pairs.  When matched pairs are used, one participant is matched to another and then the two in the pair are randomly assigned, one to each of two treatment conditions.  Imagine a study comparing twins’ responses to some type of diet or drug.  You would compare one twin to his or her co-twin:  two people (between) but with linked data (within).
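Here is a minimal sketch in Python of how matched-pairs assignment might be carried out.  Everything specific in it is an assumption for illustration: the function name assign_matched_pairs, the participant labels, and the choice of a baseline IQ score as the matching variable (the twin example above matches on family instead).

```python
import random

def assign_matched_pairs(baseline_scores, seed=None):
    """Pair participants with adjacent baseline scores, then randomly
    assign one member of each pair to each condition.

    baseline_scores: dict mapping participant id -> matching variable.
    Returns a list of (treatment_id, control_id) tuples.
    With an odd number of participants, the highest scorer is left unpaired.
    """
    rng = random.Random(seed)
    ranked = sorted(baseline_scores, key=baseline_scores.get)
    pairs = []
    for i in range(0, len(ranked) - 1, 2):
        pair = [ranked[i], ranked[i + 1]]
        rng.shuffle(pair)  # random assignment within the pair
        pairs.append((pair[0], pair[1]))
    return pairs

# HYPOTHETICAL participants and baseline IQ scores (not from the lecture).
baseline = {"P1": 102, "P2": 85, "P3": 101, "P4": 83, "P5": 109, "P6": 106}

pairs = assign_matched_pairs(baseline, seed=1)
for treated, untreated in pairs:
    print(f"treatment: {treated}   control: {untreated}")
```

Because each treated participant has a specific matched partner, the two scores in a pair are linked, and the data are analyzed as dependent even though two different people provided them.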

 

Note that matching can be used at the group level.  You can match your groups for the proportion of males and females, for example, but this design does not link individual to individual.

 

 

 

The importance of counterbalancing

 

 

Imagine you were interested in the influence that eye color has on the length of sentence people judged “guilty” would receive in a mock courtroom scenario.  So, you have pictures of a “defendant” provided along with a description of the crime.  Half of the defendants have blue eyes, half brown.  So that participants have more than one opportunity to make a judgment, you have each one evaluate multiple defendants, say 4 (and we will refer to them as D1, D2, D3, D4).

 

What else do you need to think about?  You don’t want any other factors to be confounded with eye color.  For example, you would not want to compare Black and White faces, since blue eyes are so rare among Black people.  You would not want one eye color to have all attractive faces and the other to have all plain faces.  What to do?

 

 

 

 

 

Thanks to technology, it is easy to alter the color of eyes in a picture.  So you take each picture and create a blue version and a brown version. 

 

·        All of your participants get half blue and half brown faces to judge, so you can compare the average blue and brown score for each participant. 

·        Each particular stimulus (defendants D1, D2, D3, and D4) gets judged in both brown and blue eyes.

·        Since you can’t have two versions of the same face being judged by a participant without your participant thinking it odd or figuring out exactly what you are doing, you need to split up the pairs of Blue-Brown versions among the participants.

 

Let’s say you have 12 participants.  Here is how you might present the stimuli to them.  In this way, each participant gets 2 blue and 2 brown, and each defendant face is blue half the time and brown half the time across the 12 participants.

 

 

                  Stimuli
Participant       D1        D2        D3        D4
101               Blue      Blue      Brown     Brown
102               Blue      Blue      Brown     Brown
103               Blue      Blue      Brown     Brown
104               Blue      Brown     Blue      Brown
105               Blue      Brown     Blue      Brown
106               Blue      Brown     Blue      Brown
107               Brown     Blue      Brown     Blue
108               Brown     Blue      Brown     Blue
109               Brown     Blue      Brown     Blue
110               Brown     Brown     Blue      Blue
111               Brown     Brown     Blue      Blue
112               Brown     Brown     Blue      Blue
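The schedule above can be generated and checked with a short Python sketch.  The four eye-color patterns are taken from the table; the function names build_schedule and is_counterbalanced, and the idea of handing out patterns in blocks of three participants, are assumptions made here to reproduce that layout.

```python
from collections import Counter

DEFENDANTS = ["D1", "D2", "D3", "D4"]

# The four eye-color patterns used in the table above, in the order D1..D4.
PATTERNS = [
    ("Blue", "Blue", "Brown", "Brown"),
    ("Blue", "Brown", "Blue", "Brown"),
    ("Brown", "Blue", "Brown", "Blue"),
    ("Brown", "Brown", "Blue", "Blue"),
]

def build_schedule(participants, per_pattern=3):
    """Give each consecutive block of `per_pattern` participants the same pattern."""
    schedule = {}
    for index, participant in enumerate(participants):
        pattern = PATTERNS[(index // per_pattern) % len(PATTERNS)]
        schedule[participant] = dict(zip(DEFENDANTS, pattern))
    return schedule

def is_counterbalanced(schedule):
    """True if every participant and every defendant gets Blue and Brown equally often."""
    for colors in schedule.values():
        counts = Counter(colors.values())
        if counts["Blue"] != counts["Brown"]:
            return False
    for defendant in DEFENDANTS:
        counts = Counter(colors[defendant] for colors in schedule.values())
        if counts["Blue"] != counts["Brown"]:
            return False
    return True

schedule = build_schedule(range(101, 113))  # participants 101 through 112
for participant, colors in schedule.items():
    print(participant, " ".join(f"{colors[d]:<6}" for d in DEFENDANTS))
print("Counterbalanced:", is_counterbalanced(schedule))
```

The check only passes when the number of participants fills every pattern equally, which is one reason a sample size such as 12 (a multiple of 4) is convenient for this design.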

 

 

Got it?  If you can complete these two chapters with a solid understanding of

 

You are golden!