Reading: Corbett, Chapter 11
I. When to use control variables
You have tested a bivariate relationship. It is statistically significant (p <= .05). Can you now conclude that it supports existing theory? Not quite. Anytime you have a reasonable bivariate relationship (which you decided to examine for some theoretical reason -- remember that theory comes first), you should introduce control variables. These control variables are of three kinds, and each is designed to investigate a different possibility. Which kind it is depends on what theory and past research tell you makes sense. Just as in the research process, you start with theory.
A. Confounding effects. Control variables that are related to both the independent and dependent variables are used to investigate possible confounding relationships. By confounding, we mean that the control variable changes or confounds what we originally thought to be true.
I ------------> D
/|\            /|\
 |              |
 \----- C -----/
A spurious relationship is an apparent yet false or misleading relationship that is caused by a third variable that is related to both the independent and dependent variable. Once you introduce the control variable, several possible things can happen.
1. The original relationship can remain at about the same level of strength and in the same direction for each value of the control variable. In that case we conclude that the original relationship was independent of the control variable.
2. The original relationship can disappear or become so weak that it is statistically insignificant for each value of the control variable. In that case we conclude that the original relationship was spurious, that it was only the result of the effect of the control variable.
3. The original relationship is strengthened for all values of the control variable in the controlled relationships. In this case we conclude that the control variable strengthens the original relationship.
4. The original relationship may be reversed for all values of the control variable. The control relationships will be significant and in the opposite direction of that in the original bivariate relationship. In this case we conclude that the control variable reversed the original relationship.
5. At least one of the controlled relationships differs from the original bivariate relationship, and the controlled relationships are not all the same as one another. The original relationship could disappear or be reversed in one or more of the controlled relationships. In this case we say that the control variable conditions the original relationship. You will note in the next section that this is what you may be looking for in the first place, even when you have no reason to think that the control variable is related to either the independent or dependent variable.
6. This case is a bit rare, but it can happen. It happens when you thought you would have a good strong bivariate relationship, but no significant bivariate relationship was found. You go ahead and introduce the control variable and then find that you now have relationships, as you originally expected from theory. In this case you conclude that the control variable disclosed the relationship.
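Outcome 2, the spurious relationship, can be illustrated with a short Python sketch. The counts are invented for illustration and do not come from Corbett or from any survey, and the names `counts`, `percent_yes`, and `percent_difference` are hypothetical. The sketch compares the percentage difference in the full table with the difference inside each value of the control variable; it does not run a significance test.

```python
# Hypothetical counts, invented for illustration only:
# (control value, independent value, dependent value) -> number of cases
counts = {
    ("low", "no", "no"): 72, ("low", "no", "yes"): 8,
    ("low", "yes", "no"): 18, ("low", "yes", "yes"): 2,
    ("high", "no", "no"): 8, ("high", "no", "yes"): 12,
    ("high", "yes", "no"): 32, ("high", "yes", "yes"): 48,
}


def percent_yes(i_value, c_value=None):
    """Percent of cases with D = 'yes' among cases with the given I value,
    optionally restricted to one value of the control variable C."""
    selected = {key: n for key, n in counts.items()
                if key[1] == i_value and (c_value is None or key[0] == c_value)}
    yes = sum(n for key, n in selected.items() if key[2] == "yes")
    return 100.0 * yes / sum(selected.values())


def percent_difference(c_value=None):
    """Percent 'yes' when I = 'yes' minus percent 'yes' when I = 'no'."""
    return percent_yes("yes", c_value) - percent_yes("no", c_value)


# Original bivariate relationship (control variable ignored).
bivariate = percent_difference()

# Controlled relationships, one per value of the control variable.
controlled = {c: percent_difference(c) for c in ("low", "high")}

print("bivariate difference:", bivariate)
for c_value, diff in controlled.items():
    print("difference when C =", c_value, ":", diff)
```

With these invented counts the bivariate table shows a 30-point difference, but the difference is zero within each value of C, which is the pattern described in outcome 2.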
B. Conditioning. Some control variables are not related to either the independent or dependent variable, or at most are related to only one of them. Take for example the gender gap, in which gender is related to party choice. But suppose you think that age will also make a difference in this bivariate relationship. You think that the gap will be much wider for young women who were part of the sexual revolution than it will be for older women who take their political cues from their spouses. So the independent variable is gender, the dependent variable is party, and the control variable is age. However, age certainly does not affect gender and probably does not affect party choice, so the relationship must be drawn differently.
gender ------------------> party
             /|\
              |
              |
             age
In this case we say that we are controlling for age because age can have a conditioning effect on the bivariate relationship. Conditioning variables can have about as many different effects as in the case of potential spurious variables. While any of the possibilities we saw above can take place, here is what is most likely.
1. You get exactly what you expected from a conditioning variable. At least one of the controlled relationships differs from the original bivariate relationship, and the controlled relationships are not all the same as one another. The original relationship could disappear or be reversed in one or more of the controlled relationships. In this case we say that the control variable conditions the original relationship. You will note that this is precisely the same pattern we saw for a potentially confounding control variable. The only difference is in the theoretical relationship between the control and the independent and dependent variables. In the case of our example above, we might find a strong gender gap for young women and a very much weaker one or even none for older women.
2. A conditioning variable could disclose a relationship while conditioning
it. Suppose you had no relationship between gender and party, but when
you controlled for age, you found a Democratic gender gap among young women
and a Republican gender gap among older women. In this case the control
variable conditions and discloses the relationship.
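The second possibility can be sketched in Python. The counts are invented to match the scenario described above (no overall gender gap, opposite gaps among young and older respondents) and are not real survey results; `counts`, `percent_democratic`, and `gender_gap` are hypothetical names.

```python
# Hypothetical counts, invented for illustration only:
# (age group, gender, party) -> number of respondents
counts = {
    ("young", "women", "Dem"): 70, ("young", "women", "Rep"): 30,
    ("young", "men", "Dem"): 50, ("young", "men", "Rep"): 50,
    ("old", "women", "Dem"): 30, ("old", "women", "Rep"): 70,
    ("old", "men", "Dem"): 50, ("old", "men", "Rep"): 50,
}


def percent_democratic(gender, age_group=None):
    """Percent Democratic for one gender, optionally within one age group."""
    selected = {key: n for key, n in counts.items()
                if key[1] == gender and (age_group is None or key[0] == age_group)}
    democrats = sum(n for key, n in selected.items() if key[2] == "Dem")
    return 100.0 * democrats / sum(selected.values())


def gender_gap(age_group=None):
    """Women's percent Democratic minus men's.
    Positive values mean a Democratic gender gap among women."""
    return percent_democratic("women", age_group) - percent_democratic("men", age_group)


print("all ages:", gender_gap())
for group in ("young", "old"):
    print(group, ":", gender_gap(group))
```

With these invented counts the gap is 0 points for all ages combined, +20 points among the young, and -20 points among the old, so controlling for age both discloses and conditions the relationship.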
C. Intervening. This is a different theoretical situation, but you still perform the same kind of statistical controlling procedures. In this case the nature of the original bivariate relationship is not in question. You have already controlled for possible confounding variables and for possible conditioning variables. So the point here is not whether the independent variable has a causal impact on the dependent variable, but how it has that impact. Intervening variables are variables that the independent variable works through in its impact on a dependent variable. You look at them to understand more about the exact theoretical relationship in question. It makes theory richer. For example, take the gender gap again. Exactly how does gender affect party choice? Perhaps it works through ideology. Perhaps women are more likely to see government not as the enemy who takes away profits and levies taxes, but as a defender of women’s rights and as a provider of family services such as education and day care. That is, because of different gender-related issue concerns, women are more likely to embrace a liberal ideology and because of that are more likely to be Democrats. You could make a similar argument for income as an intervening variable, basing the analysis on single heads of households. Sticking with ideology, here is how the path diagram would look.
gender ---------> ideology -----------> party
In setting up tables, you treat the intervening variable just as you would a control variable. Again, several things can happen in the controlled relationships.
1. The original relationship can remain at about the same level of strength and in the same direction. In that case we conclude that the original relationship remains and reject the potential intervening variable. There is no intervening variable.
2. The original relationship can disappear or become so weak that it is statistically insignificant. In that case we conclude that the control variable intervenes between the independent variable and the dependent variable. You will note that this is precisely what we saw when a relationship was spurious. The difference in our conclusion was based on theory, not statistical tests.
3. The original relationship is weakened for each value of the control variable, yet remains significant. In this case we conclude that some intervention takes place, so there is a direct effect as well as an indirect effect. We would redraw the path diagram as follows:
gender ----------> ideology ------------> party
   |                                       /|\
   |                                        |
   \----------------------------------------/
That covers the major possibilities. The main thing to remember is that you start by testing a bivariate relationship. Then you use theory to identify possible confounding variables and possible conditioning variables. Test to see what impact they have. Then you go on to look at possible intervening variables to get a richer theoretical understanding of how an independent variable works.
II. How to control
The statistical procedure for controlling depends on the level of measurement of the variables involved. Suppose your bivariate table was a crosstabs and your control variable was ordinal or nominal with just a few values. Then you just set up new crosstabs, one for each value of the control variable. Then you look at the relationship in each control table and compare it to the original bivariate table. A few examples we do in class should make this clear.
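As a rough sketch of this procedure in Python (not the MicroCase workflow used in class), the code below builds the original crosstab and then one control table per value of the control variable, and computes a chi-square statistic for each. The counts are invented for illustration, the function names are hypothetical, and the statistic is the plain Pearson chi-square with no continuity correction. For a 2 x 2 table (one degree of freedom) the critical value at the .05 level is 3.841.

```python
# Hypothetical counts, invented for illustration only:
# (gender, party, age group) -> number of respondents
counts = {
    ("women", "Dem", "young"): 70, ("women", "Rep", "young"): 30,
    ("men", "Dem", "young"): 50, ("men", "Rep", "young"): 50,
    ("women", "Dem", "old"): 55, ("women", "Rep", "old"): 45,
    ("men", "Dem", "old"): 50, ("men", "Rep", "old"): 50,
}

CRITICAL_VALUE = 3.841  # chi-square, 1 degree of freedom, .05 level


def crosstab(counts, control_value=None):
    """Return {(independent, dependent): n}, optionally for one control value."""
    table = {}
    for (ind, dep, ctrl), n in counts.items():
        if control_value is None or ctrl == control_value:
            table[(ind, dep)] = table.get((ind, dep), 0) + n
    return table


def chi_square(table):
    """Pearson chi-square statistic (assumes no row or column total is zero)."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    total = sum(table.values())
    row_total = {r: sum(table.get((r, c), 0) for c in cols) for r in rows}
    col_total = {c: sum(table.get((r, c), 0) for r in rows) for c in cols}
    statistic = 0.0
    for r in rows:
        for c in cols:
            expected = row_total[r] * col_total[c] / total
            statistic += (table.get((r, c), 0) - expected) ** 2 / expected
    return statistic


# Original bivariate table first, then one control table per age group.
for label in (None, "young", "old"):
    stat = chi_square(crosstab(counts, label))
    name = "all respondents" if label is None else label
    print(name, round(stat, 2), "significant" if stat > CRITICAL_VALUE else "not significant")
```

With these invented counts the relationship is significant in the original table and among the young but not among the old, so age would be said to condition the relationship.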
On the other hand, if the control variable is interval or ratio level, the statistics are much more complex. At this level, I would advise that you reduce it to an ordinal variable (use MicroCase to collapse it into groups) with no more than about three groups (e.g., young, middle-aged, and old). Then produce your control tables, and again, compare each one to the original bivariate table.
If you had to do a test other than crosstabs for the bivariate relationship because the independent and dependent variables were ordinal with many categories or interval variables, then you simply redo whatever tests you did (scatterplot with regression or analysis of variance in most cases) for each value of the control variable. Again, you just compare each controlled relationship with the original bivariate relationship.
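Collapsing is done in MicroCase for this course, but the idea can be sketched in Python. The function name `collapse_age` and the cut points (under 35, 35 to 59, 60 and over) are assumptions chosen only for illustration; in practice the cut points should come from theory or from the distribution of the variable.

```python
from collections import Counter


def collapse_age(age, cut_points=(35, 60)):
    """Collapse age in years into three ordinal groups.
    The default cut points are hypothetical, not a recommendation."""
    middle_start, old_start = cut_points
    if age < middle_start:
        return "young"
    if age < old_start:
        return "middle"
    return "old"


# Hypothetical ages, invented for illustration only.
ages = [19, 34, 35, 47, 59, 60, 82]
groups = [collapse_age(age) for age in ages]

# How many cases fall in each group; very small groups make weak control tables.
group_sizes = Counter(groups)
print(groups)
print(group_sizes)
```

Checking the group sizes before building control tables is worthwhile, since a group with very few cases will rarely produce a significant controlled relationship.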
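For the regression case, a minimal Python sketch of the same idea follows: fit the bivariate regression on all cases, then refit it within each value of the control variable and compare the slopes. The data points and the group labels "a" and "b" are invented for illustration, the helper `slope` is a hypothetical name, and only slopes are compared here; a full analysis would also check significance.

```python
def slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    return sxy / sxx


# Hypothetical cases, invented for illustration only: (control group, x, y)
cases = [
    ("a", 1, 2), ("a", 2, 3), ("a", 3, 4),
    ("b", 4, 8), ("b", 5, 9), ("b", 6, 10),
]

# Original bivariate regression, ignoring the control variable.
bivariate_slope = slope([(x, y) for _, x, y in cases])

# The same regression redone within each value of the control variable.
controlled_slopes = {
    group: slope([(x, y) for g, x, y in cases if g == group])
    for group in sorted({g for g, _, _ in cases})
}

print("bivariate slope:", round(bivariate_slope, 2))
for group, value in controlled_slopes.items():
    print("slope within group", group, ":", round(value, 2))
```

With these invented cases the slope within each group is 1.0, smaller than the bivariate slope of about 1.77, so the controlled relationships are weaker than the original but do not disappear.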
You will get some practice for this in the exercise in the workbook
section of chapter 11 in Corbett. We will also do some examples in class.
And of course, I expect you to do this in your research project, using
whatever variables we have measured that are theoretically relevant. Of
course, were you doing a project from scratch, you would need to anticipate
what control variables you would need (based on theory) before you planned
the survey and measure as many of them as you could.