I. When to use control variables
Suppose you have tested a bivariate relationship. Further suppose that you find that the relationship is statistically significant (p <= 0.05), so you reject the null hypothesis that no relationship exists. Can you now conclude that the relationship supports existing theory?
Not quite. Any time you have a reasonable bivariate relationship (which you decided to examine for some theoretical reason; remember that theory comes first), you should introduce control variables. These control variables are of three kinds, and each is designed to investigate a different possibility. Which kind you use depends on what theory and past research tell you makes sense. Just as in the rest of the research process, you start with theory.
A. Confounding effects.
Control variables that are related to both the independent and dependent variables are used to investigate possible confounding relationships. By confounding, we mean that the control variable changes or confounds what we originally thought to be true. If you have a good bivariate relationship and a possible confounding variable, testing that variable should be the first thing you do after examining the bivariate relationship. Why? Because it could show that the bivariate relationship is false. In the diagram below, C is the possible confounding variable.
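[Diagram: C, the possible confounding variable, with arrows to both A (the independent variable) and B (the dependent variable), and the path from A to B, marked with an X when the relationship is spurious.]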
A spurious relationship is one that appears to be true but is actually false or misleading, because it is caused by a third variable that is related to both the independent and dependent variables. That third variable causes the independent and dependent variables to vary together, so the independent variable only seems to cause the dependent variable to change. But introducing the control variable can have other effects as well. Here are the possibilities after you introduce a control variable.
1. The original relationship can remain at about the same level of strength and in the same direction for each value of the control variable. In that case we conclude that the original relationship was independent of the control variable.
2. The original relationship can disappear or become so weak that it is statistically insignificant for each value of the control variable. In that case we conclude that the original relationship was spurious, that it was only the result of the effect of the control variable. In the diagram above, this is indicated by the X through the path from A to B.
3. The original relationship is strengthened for all values of the control variable in the controlled relationships. In this case we conclude that the control variable strengthens the original relationship.
4. The original relationship may be reversed for all values of the control variable. The control relationships will be significant and in the opposite direction of that in the original bivariate relationship. In this case we conclude that the control variable reversed the original relationship. By reverse, I mean that the shift in percentages would go in the opposite direction from what they originally did, or if you have two ratio variables, a positive relationship would become negative. This is rare, but it can happen.
5. At least one of the controlled relationships is different from the original bivariate relationship, and the controlled relationships are not all the same across values of the control variable. The relationship could disappear or be reversed in one or more of the controlled relationships. In this case we say that the control variable conditions the original relationship. You will note in the next section that this is what you may be looking for in the first place, even when you have no reason to think that the control variable is related to either the independent or dependent variable. But if you find this when you are looking for a possible confounding effect, you can simply add another arrow, the same one you see in the diagram below.
6. This case is a bit rare, but it can happen. It happens when you thought you would have a good strong bivariate relationship, but no significant bivariate relationship was found. You go ahead and introduce the control variable and then find that you now have relationships, as you originally expected from theory. In this case you conclude that the control variable disclosed the relationship.
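The six possibilities above amount to a decision rule. The sketch below is a hypothetical Python helper (`classify_control_outcome` is an illustrative name, not a MicroCase or library function) that takes the bivariate relationship and each controlled relationship as a (signed strength, p-value) pair, such as a signed tau-b or r with its significance level. The 0.05 cutoff follows the text; the 0.05 tolerance for "about the same strength" is an assumption you would set by judgment.

```python
# Hypothetical helper that applies possibilities 1-6 to the results of
# controlling. Each relationship is a (signed_strength, p_value) pair,
# e.g. a signed tau-b or r together with its significance level.
ALPHA = 0.05      # significance cutoff used in the text (p <= 0.05)
TOLERANCE = 0.05  # assumed: how much |strength| may change and still be "about the same"


def is_significant(rel, alpha=ALPHA):
    return rel[1] <= alpha


def classify_control_outcome(bivariate, controlled, alpha=ALPHA, tol=TOLERANCE):
    b_strength = bivariate[0]
    sig = [is_significant(r, alpha) for r in controlled]

    # 6. No significant bivariate relationship, but one appears after controlling.
    if not is_significant(bivariate, alpha):
        return "disclosed" if any(sig) else "no relationship"

    # 2. The relationship vanishes for every value of the control variable.
    #    (Theory, not this function, decides whether that means spurious or,
    #    for an intervening variable, that the control variable intervenes.)
    if not any(sig):
        return "spurious"

    if all(sig):
        # 4. Every controlled relationship runs in the opposite direction.
        if all(s * b_strength < 0 for s, _ in controlled):
            return "reversed"
        if all(s * b_strength > 0 for s, _ in controlled):
            changes = [abs(s) - abs(b_strength) for s, _ in controlled]
            # 1. About the same strength for every value of the control variable.
            if all(abs(c) <= tol for c in changes):
                return "independent"
            # 3. Stronger for every value of the control variable.
            if all(c > tol for c in changes):
                return "strengthened"
            # Uniformly weaker but still significant: not one of the six
            # cases listed for confounding.
            if all(c < -tol for c in changes):
                return "weakened"

    # 5. Otherwise the controlled relationships differ from one another.
    return "conditioned"


print(classify_control_outcome((0.30, 0.001), [(0.02, 0.70), (-0.01, 0.80)]))  # spurious
print(classify_control_outcome((0.30, 0.001), [(0.45, 0.001), (0.02, 0.60)]))  # conditioned
```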
B. Conditioning
Some control variables are not related to either the independent or dependent variable, or at most are related to only one of them. Take for example the gender gap, in which gender is related to party choice. But suppose you think that age will also make a difference in this bivariate relationship. You think that the gap will be much wider for young women who were part of the sexual revolution than it will be for older women who take their political cues from their spouses. So the independent variable is gender, the dependent variable is party, and the control variable is age. However, age certainly does not affect gender and probably does not affect party choice, so the relationship must be drawn differently.
gender ------------------> party identification
                ^
                |
                |
               age
In this case we say that we are controlling for age because age can have a conditioning effect on the bivariate relationship. Conditioning variables can have nearly as many different effects as potential confounding variables. While any of the possibilities we saw above can occur, here are the most likely.
1. You get exactly what you expected from a conditioning variable. At least one of the controlled relationships is different from the original bivariate relationship, and the controlled relationships are not all the same. The relationship could disappear or be reversed in one or more of the controlled relationships. In this case we say that the control variable conditions the original relationship. You will note that this is precisely the same result as possibility 5 for a potential confounding variable. The only difference is in the theoretical relationship between the control variable and the independent and dependent variables. In our example above, we might find a strong gender gap for young women and a much weaker one, or even none, for older women.
2. A conditioning variable could disclose a relationship while conditioning it. Suppose you had no relationship between gender and party, but when you controlled for age, you found a Democratic gender gap among young women and a Republican gender gap among older women. In this case the control variable conditions and discloses the relationship.
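To make the age example concrete, the sketch below uses invented counts (hypothetical, not survey results) in which the overall gender gap is zero, yet young women are 20 points more Democratic than young men and older women are 20 points less Democratic than older men: the "conditions and discloses" case just described. The gap is measured as women's percent Democratic minus men's, in percentage points; that measure and the function names are illustrative assumptions.

```python
from collections import defaultdict

# Hypothetical counts, invented for illustration only (not survey data):
# (age group, gender) -> (number identifying as Democrats, number of respondents)
COUNTS = {
    ("young", "women"): (60, 100),
    ("young", "men"): (40, 100),
    ("older", "women"): (40, 100),
    ("older", "men"): (60, 100),
}


def gender_gap(cells):
    """Women's % Democratic minus men's % Democratic, in percentage points.

    `cells` maps gender -> (democrats, total).
    """
    pct = {g: 100.0 * dem / total for g, (dem, total) in cells.items()}
    return pct["women"] - pct["men"]


def bivariate_cells(counts):
    """Ignore age: pool the counts for each gender."""
    pooled = defaultdict(lambda: [0, 0])
    for (_, gender), (dem, total) in counts.items():
        pooled[gender][0] += dem
        pooled[gender][1] += total
    return {g: tuple(v) for g, v in pooled.items()}


def controlled_cells(counts, age):
    """Keep only one value of the control variable (age)."""
    return {g: v for (a, g), v in counts.items() if a == age}


print("bivariate gap:", gender_gap(bivariate_cells(COUNTS)))  # 0.0
for age in ("young", "older"):
    print(f"gap among {age} respondents:", gender_gap(controlled_cells(COUNTS, age)))
```

The two age-specific gaps (+20 and -20 points) cancel in the pooled table, which is why no bivariate relationship shows up until age is controlled.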
C. Intervening
This is a different theoretical situation, but you still perform the same kind of statistical controlling procedures. In this case the nature of the original bivariate relationship is not in question. You have already controlled for possible confounding variables (the first thing you should do) and for possible conditioning variables (the second thing you should do). So the point here is not whether the independent variable has a causal impact on the dependent variable, but how it has that impact. Intervening variables are variables that the independent variable works through in its impact on a dependent variable. You do this to understand more about the exact theoretical relationship in question. It makes theory richer.
For example, take the gender gap again. Exactly how does gender affect party choice? Perhaps it works through ideology. Perhaps women are more likely to see government not as the enemy that takes away profits and levies taxes, but as a defender of women’s rights and as a provider of family services such as education and day care. That is, because of different gender-related issue concerns, women are more likely to embrace a liberal ideology and, because of that, are more likely to be Democrats. You could make a similar argument for income as an intervening variable, basing the analysis on single heads of households. Sticking with ideology, here is how the path diagram would look.
gender ---------> ideology -----------> party identification
In setting up tables, you treat the intervening variable just as you would a control variable. Again, several things can happen in the controlled relationships when you are looking at a possible intervening variable.
1. The original relationship can remain at about the same level of strength and in the same direction. In that case we conclude that the original relationship remains and reject the potential intervening variable. There is no intervening variable.
2. The original relationship can disappear or become so weak that it is statistically insignificant. In that case we conclude that the control variable intervenes between the independent variable and the dependent variable. You will note that this is precisely what we saw when a relationship was spurious. The difference in our conclusion is based on theory, not statistical tests.
3. The original relationship is weakened for each value of the control variable, yet remains significant. In this case we conclude that some intervening takes place, but there is a direct effect as well as an indirect effect (through the intervening variable). We would redraw the path diagram as follows:
gender ---------> ideology -----------> party identification
  |                                          ^
  +------------------------------------------+
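The three intervening outcomes can be sketched as a similar decision rule. `intervening_outcome` below is a hypothetical helper, not part of any package; it takes (signed strength, p-value) pairs for the bivariate relationship and for each value of the proposed intervening variable, uses the text's 0.05 significance cutoff, and uses an assumed 0.05 tolerance for "about the same level of strength." As the text stresses, the statistics alone cannot tell an intervening variable from a confounding one; the labels only make sense once theory has placed the variable between the independent and dependent variables.

```python
# Hypothetical helper for the three intervening outcomes.
# Each relationship is a (signed_strength, p_value) pair.
ALPHA = 0.05      # significance cutoff used in the text
TOLERANCE = 0.05  # assumed: allowed change in |strength| for "about the same"


def intervening_outcome(bivariate, controlled, alpha=ALPHA, tol=TOLERANCE):
    b_strength, _ = bivariate
    significant = [p <= alpha for _, p in controlled]

    # 2. The relationship disappears for every value of the control variable.
    if not any(significant):
        return "intervenes"

    same_direction = all(s * b_strength > 0 for s, _ in controlled)
    if all(significant) and same_direction:
        drops = [abs(b_strength) - abs(s) for s, _ in controlled]
        # 1. About the same strength: reject the intervening variable.
        if all(d <= tol for d in drops):
            return "no intervening"
        # 3. Weaker but still significant: direct plus indirect effect.
        if all(d > tol for d in drops):
            return "partly intervenes"

    # Anything else matches none of the three cases (it may signal conditioning).
    return "none of the three cases"


print(intervening_outcome((0.30, 0.001), [(0.15, 0.02), (0.12, 0.03)]))  # partly intervenes
```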
That covers the major possibilities. The main thing to remember is that you start by testing a bivariate relationship. Then you use theory to identify 1) possible confounding variables and 2) possible conditioning variables, and you test to see what impact they have. Then you go on to look at 3) possible intervening variables to get a richer theoretical understanding of how an independent variable works.
II. How to control
Put simply, a control variable lets us examine the original bivariate relationship for each value of the control variable. So for example, suppose that our bivariate relationship is between gender and vote. We might think that marital status conditions the relationship, so we want to control for marital status. What we do is look at the relationship between gender and vote for single people and then between gender and vote for married people and see how these new relationships differ from the original relationship.
The statistical procedure for controlling depends on the level of measurement of the variables involved. Suppose your bivariate table was a crosstab and your control variable was ordinal or nominal with just a few values. Then you just set up new crosstabs, one for each value of the control variable. Then you look at the relationship in each control table and compare it to the original bivariate table, as noted in the previous paragraph. A few examples we do in class should make this clear.
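For readers working outside MicroCase, here is a minimal Python sketch of the same procedure, assuming the data are a list of records (dicts) with hypothetical keys `gender`, `vote`, and `marital` and invented values. It builds the bivariate crosstab and then one crosstab per value of the control variable, percentaging each column (each value of the independent variable) to 100 so the tables can be compared row by row.

```python
from collections import Counter


def column_percentages(pairs):
    """Crosstab of (independent, dependent) pairs as column percentages.

    Returns {independent value: {dependent value: percent}}; each column
    (value of the independent variable) sums to 100.
    """
    cells = Counter(pairs)
    col_totals = Counter(iv for iv, _ in pairs)
    table = {}
    for (iv, dv), n in cells.items():
        table.setdefault(iv, {})[dv] = 100.0 * n / col_totals[iv]
    return table


def control_tables(records, iv, dv, control):
    """One column-percentaged crosstab per value of the control variable."""
    by_value = {}
    for r in records:
        by_value.setdefault(r[control], []).append((r[iv], r[dv]))
    return {value: column_percentages(pairs) for value, pairs in by_value.items()}


# Hypothetical records, invented for illustration.
records = [
    {"gender": "F", "vote": "Dem", "marital": "single"},
    {"gender": "F", "vote": "Dem", "marital": "single"},
    {"gender": "M", "vote": "Rep", "marital": "single"},
    {"gender": "M", "vote": "Dem", "marital": "single"},
    {"gender": "F", "vote": "Rep", "marital": "married"},
    {"gender": "F", "vote": "Dem", "marital": "married"},
    {"gender": "M", "vote": "Rep", "marital": "married"},
    {"gender": "M", "vote": "Rep", "marital": "married"},
]

bivariate = column_percentages([(r["gender"], r["vote"]) for r in records])
controlled = control_tables(records, "gender", "vote", "marital")
print("bivariate:", bivariate)
for value, table in controlled.items():
    print(value, table)
```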
On the other hand, if the control variable is interval or ratio level, the statistics are much more complex. In that case, I would advise that you reduce it to an ordinal variable (use MicroCase to collapse it into groups) with no more than about three groups (e.g., young, middle-aged, and old). Then produce your control tables and, again, compare each one to the original bivariate table.
If you had to use a test other than crosstabs for the bivariate relationship because the independent and dependent variables were ordinal with many categories or were interval variables, then you simply redo whatever tests you did (scatterplot with regression, or analysis of variance, in most cases) for each value of the control variable. Again, you just compare each controlled relationship with the original bivariate relationship.
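One way to do this collapsing in Python (an alternative to MicroCase, not a description of how MicroCase does it) is to cut the variable at its tertiles so the three groups are about equal in size. The function name and the equal-size rule are assumptions; theory-based cut points, such as fixed age brackets, are often more meaningful. `statistics.quantiles` is in the standard library from Python 3.8.

```python
import bisect
import statistics


def collapse_into_groups(values, labels=("young", "middle", "old")):
    """Collapse an interval/ratio variable into len(labels) ordinal groups.

    Cut points are quantiles of the data (tertiles for three labels), so
    the groups come out roughly equal in size. A value equal to a cut
    point goes into the higher group.
    """
    cuts = statistics.quantiles(values, n=len(labels))
    # bisect_right gives 0, 1, ..., len(cuts): the index of the group.
    return [labels[bisect.bisect_right(cuts, v)] for v in values]


ages = [18, 22, 25, 30, 35, 40, 50, 60, 70]  # hypothetical ages
print(collapse_into_groups(ages))
# ['young', 'young', 'young', 'middle', 'middle', 'middle', 'old', 'old', 'old']
```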
You will get some practice with this in exercises we do in class. And, of course, I expect you to do this in your research project, using whatever theoretically relevant variables we have measured. If you were doing a project from scratch, you would need to anticipate which control variables you would need (based on theory) before you planned the survey, and measure as many of them as you could.
I have created a table to help you keep track of how to compare bivariate and control tables to see what impact the control variable has on the bivariate relationship. Here it is, along with instructions on how to use it. We will use it in class exercises.
Analysis Table for Bivariate and Control Tables

                              | Bivariate Table | Control Table 1 | Control Table 2 | Control Table 3
------------------------------+-----------------+-----------------+-----------------+-----------------
Row 1 % point shift           |                 |                 |                 |
Row 2 % point shift           |                 |                 |                 |
Row 3 % point shift           |                 |                 |                 |
Significance (p = ?)          |                 |                 |                 |
Strength (e.g., Cramér's V,   |                 |                 |                 |
  tau-b, tau-c, or r)         |                 |                 |                 |
Instructions:
1. Using the data and statistics from your bivariate and control tables, fill in the cells in the table. Depending on the number of values your control variable has, you may or may not need the last column. And depending on the number of values your dependent variable takes on, you may or may not need the “Row 3 % point shift” row. Tables are easier to interpret if they have no middle row. If you see no shift of any consequence in the middle row of a bivariate table, you may be wise to eliminate that row by making that value of the dependent variable missing data.
2. You can also use this table for ANOVA and regression, but then you do not need the “Row % point shift” rows.
3. The % point shifts and strengths let you see at a glance what impact the control variable had on the bivariate relationship.
4. The significance levels let you see whether the control variable made the bivariate relationship disappear so that it is no longer significant (i.e., confounded the bivariate relationship and showed it to be spurious), or, in a very rare case, whether the control variable revealed a relationship that was not significant in the bivariate table.
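For the common case of a 2x2 crosstab (a dichotomous dependent variable and a dichotomous independent variable), the cells of the analysis table can be computed directly. The sketch below is illustrative only: `analyze_2x2` is a hypothetical name, the counts are invented, the significance test is Pearson's chi-square without a continuity correction, and strength is reported as Cramér's V, which for a 2x2 table equals the absolute value of phi. Some statistical packages apply a continuity correction to 2x2 tables by default, so their p-values can differ slightly.

```python
import math


def analyze_2x2(table):
    """Summarize one 2x2 crosstab for the analysis table.

    `table` is [[a, b], [c, d]] in raw counts: rows are values of the
    dependent variable, columns are values of the independent variable.
    Every row and column total must be nonzero.
    Returns (row 1 % point shift, p-value, Cramer's V).
    """
    (a, b), (c, d) = table
    col1, col2 = a + c, b + d
    row1, row2 = a + b, c + d
    n = col1 + col2
    # Row 1 % point shift from column 1 to column 2
    # (with two rows, the row 2 shift is just its negative).
    shift = 100.0 * b / col2 - 100.0 * a / col1
    # Pearson chi-square for a 2x2 table, no continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    # For a 2x2 table, Cramer's V = |phi| = sqrt(chi2 / n).
    v = math.sqrt(chi2 / n)
    return shift, p, v


# Hypothetical counts: rows = (Democrat, Republican), columns = (men, women).
tables = {
    "Bivariate": [[30, 50], [70, 50]],
    "Control 1 (single)": [[10, 30], [40, 20]],
    "Control 2 (married)": [[20, 20], [30, 30]],
}
for name, t in tables.items():
    shift, p, v = analyze_2x2(t)
    print(f"{name:<20} shift={shift:+.1f}  p={p:.4f}  V={v:.3f}")
```

In this invented example the two control tables add up to the bivariate table; the shift grows among single respondents and vanishes among married ones, the pattern possibility 5 (conditioning) describes.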
Copyright, Robert E. Botsch, 2009-12