Significance Tests and Measures of Association
Reading: Corbett, Chapter 10.
Suppose you have tested a relationship using some kind of crosstab, an analysis of variance, or a regression. The purpose of this unit is to learn how to answer two questions about it.
1) How strong is that relationship?
2) What is the probability that this relationship is not real, that is, that it is the result of drawing a bad sample from a population in which no relationship exists?
We answer the first question by using statistics that are measures of association. Corbett will give you several of them. As in the past, what you use depends on the level of measurement of the variables in the hypothesis and the type of test used to view the relationship. For crosstabs with nominal measures we can use a PRE (proportional reduction in error) measure such as lambda, or we can use Cramer's V. These have in common that they range from 0 to 1, and the closer to 1, the stronger the relationship.
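To make the idea of proportional reduction in error concrete, here is a minimal sketch of lambda for a nominal crosstab. The function name goodman_kruskal_lambda and the counts in the table are made up for illustration; they are not from Corbett. The sketch assumes the row variable is the one being predicted.

```python
def goodman_kruskal_lambda(table):
    """Lambda with the row variable as dependent, column variable as independent.

    table is a list of rows; each row is a list of cell counts.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]

    # Errors made when guessing the most common row category for everyone.
    errors_without = n - max(row_totals)

    # Errors made when guessing the most common row category within each column.
    errors_with = sum(sum(col) - max(col) for col in zip(*table))

    return (errors_without - errors_with) / errors_without


# Hypothetical counts: rows = candidate preferred, columns = region.
related = [[30, 10],
           [10, 30]]
unrelated = [[20, 20],
             [20, 20]]

print(goodman_kruskal_lambda(related))
print(goodman_kruskal_lambda(unrelated))
```

In the first table, knowing the column cuts the guessing errors from 40 to 20, so lambda is 0.5. In the second table the column tells you nothing, so lambda is 0.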
When dealing with two ordinal measures that are related in a crosstab, the most appropriate measures of association are gamma, Kendall's tau (tau-b for square tables and tau-c for non-square tables), and Somers' D. These measures range from -1 to +1, with the sign telling the direction of the relationship. Minus means that as one increases the other decreases. Plus means that as one goes up so does the other. The closer to +1 or -1, the stronger the relationship.
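As a rough illustration of how the sign and size of an ordinal measure come about, here is a small sketch of gamma, which compares pairs of cases that are ordered the same way on both variables (concordant) with pairs ordered in opposite ways (discordant). The function name gamma and the counts are assumptions made up for this example, and the table is assumed to have its rows and columns arranged from low to high.

```python
def gamma(table):
    """Gamma for a crosstab whose rows and columns both run from low to high."""
    concordant = 0
    discordant = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            # Pair cell (i, j) with every cell in a lower row of the table.
            for k in range(i + 1, n_rows):
                for m in range(n_cols):
                    if m > j:
                        # Higher on both variables: concordant pairs.
                        concordant += table[i][j] * table[k][m]
                    elif m < j:
                        # Higher on one, lower on the other: discordant pairs.
                        discordant += table[i][j] * table[k][m]
    return (concordant - discordant) / (concordant + discordant)


# Hypothetical counts: rows = education (low, high), columns = interest (low, high).
positive = [[30, 10],
            [10, 30]]
negative = [[10, 30],
            [30, 10]]

print(gamma(positive))
print(gamma(negative))
```

The first table gives 900 concordant and 100 discordant pairs, so gamma is 0.8. Swapping the columns reverses the direction and gives -0.8.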
When dealing with interval/ratio measures, the most frequently used measure of association is the Pearson correlation, designated as r. This also ranges from +1 to -1. It tells you the extent to which the points of the two variables form a straight line on a scatterplot. The sign gives you the direction of the line.
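Here is a short sketch of how r can be computed from raw scores, using the usual deviation-from-the-mean formula. The function name pearson_r and the data are invented for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable.
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


# Made-up scores for five cases.
x = [1, 2, 3, 4, 5]
perfect_line = [2, 4, 6, 8, 10]   # points fall exactly on a rising line
falling_line = [10, 8, 6, 4, 2]   # points fall exactly on a falling line
scattered = [2, 1, 4, 3, 5]       # rising trend, but not a perfect line

print(pearson_r(x, perfect_line))
print(pearson_r(x, falling_line))
print(pearson_r(x, scattered))
```

The three results are 1, -1, and 0.8: the first two sets of points lie exactly on a line, and the third only roughly does.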
Significance tests are produced for each kind of association. What they tell you is whether the relationship you found in the sample is likely to really exist in the general population or is just an accident from sampling error. In social science we want the chances of accident to be low, so we conventionally insist that the significance be .05 or lower. That means that there is no more than a 5% chance that we could get a relationship like this in the sample if no relationship existed in the general population from which the sample was drawn. Just like in computing sample error, the sample must be such that every member of the population has an equal chance of being chosen. Like sample error, significance tests are very sensitive to sample size. Larger samples are more likely to produce significant associations. Just like you need a large sample to correctly detect a winner in a close election, you need a large sample to detect a weak relationship. Using small samples, only the strongest associations will come out as significant.
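The effect of sample size can be seen in a small sketch that uses the chi square statistic described later in this unit. Both tables below have exactly the same percentages (60% versus 40%), but one has ten times as many cases. The counts and the helper name chi_square are made up for illustration; 3.841 is the standard .05 cutoff for a table with 1 degree of freedom.

```python
def chi_square(table):
    """Chi square for a crosstab given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Frequency expected if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / n
            total += (observed - expected) ** 2 / expected
    return total


CRITICAL_05_DF1 = 3.841  # .05 cutoff for a 2 x 2 table (df = 1)

# Hypothetical 2 x 2 tables with identical percentages but different sizes.
small = [[12, 8],
         [8, 12]]                                        # 40 cases
large = [[cell * 10 for cell in row] for row in small]   # 400 cases

chi_small = chi_square(small)
chi_large = chi_square(large)

print(chi_small, chi_small >= CRITICAL_05_DF1)
print(chi_large, chi_large >= CRITICAL_05_DF1)
```

With 40 cases the chi square is about 1.6, which falls short of the cutoff. With 400 cases and the very same percentages it is about 16, which is well past it.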
The most common test for crosstabs is the chi square test. I will go over in class how to compute it by hand. But mostly in the real world we just let the computer calculate it. Other tests are too hard to compute by hand, at least in this course.
Let me just give you the formula here for the chi square test, which is the sum over all cells of (Fe-Fo) squared divided by Fe:
chi square = sum over all cells of (Fe-Fo)^2 / Fe
where Fo are the observed frequencies in the actual crosstab, and
Fe are the frequencies expected by chance (meaning that this is what the frequencies would be if there were no relationship between the two variables), which can be computed for each cell as follows:
Fe = (row total)(column total) / N
where N is the total number of cases in the table.
After computing the chi square you look up the value in the table I will provide and see which probability column it falls in. The bigger the chi square, the less likely it is that the relationship in the sample is an accident of sampling. If the chi square is at least as big as the value in the .05 (or 5%) column, then one concludes that there is no more than a 5% chance that we could find such a relationship in the sample when there is no relationship in the general population. We would then reject the null hypothesis (do you remember about the null?). The row you use in the chi square table is determined by the degrees of freedom (df) with
df = (# rows-1)(# columns-1).
I know all this sounds complicated, but in practice it is really quite simple. An example or two should sort it out for you. We will do this in class.
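In the meantime, here is a sketch of the whole chi square procedure on a made-up table, so you can see the steps in order: expected frequencies, the chi square itself, the degrees of freedom, and the comparison with the .05 column. The counts and the names chi_square_test and CRITICAL_05 are invented for this illustration. The cutoffs are the standard .05 values for 1 through 4 degrees of freedom.

```python
# Standard .05 cutoffs from a chi square table, keyed by degrees of freedom.
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_test(table):
    """Return (chi square, df, expected frequencies) for a crosstab."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Fe = (row total)(column total) / N for every cell.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Sum over all cells of (Fo - Fe) squared divided by Fe.
    chi = 0.0
    for obs_row, exp_row in zip(table, expected):
        for fo, fe in zip(obs_row, exp_row):
            chi += (fo - fe) ** 2 / fe

    # df = (# rows - 1)(# columns - 1)
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi, df, expected


# Hypothetical 2 x 3 crosstab: rows = two groups, columns = three answers.
observed = [[20, 30, 50],
            [40, 30, 30]]

chi, df, expected = chi_square_test(observed)
significant = chi >= CRITICAL_05[df]

print("expected:", expected)
print("chi square:", round(chi, 2), "df:", df)
print("significant at .05:", significant)
```

For this table every row has expected frequencies of 30, 30, and 40, the chi square comes to about 11.67 with 2 degrees of freedom, and since that is bigger than 5.991 we would reject the null hypothesis.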