Chapter 3. Scientific Methodology and Statistics
Last updated 9-21-2011
Copyright 2009-11 Robert E. Botsch
There was a man who drowned crossing a stream with an average depth of six inches.
Anonymous
OUTLINE
I. What we mean by "scientific"
II. The goals of science
A. Describe
B. Explain
C. Predict
D. Relationships between explanation and prediction
III. Steps of scientific research and underlying assumptions
A. Steps
1. Problem selection
2. Theory formulation
3. Hypothesis development
4. Operationalization
5. Data gathering
6. Analysis and hypothesis testing
7. Theory reformulation
B. Underlying assumptions
1. Human behavior occurs in regular patterns
2. Reason allows us to observe and discover these patterns
3. Patterns are governed by laws
4. Reason allows us to discover these laws
5. Scientific knowledge is "intersubjective"
6. The goal of science is to increase knowledge, not apply it
IV. Characteristics of scientific explanations
A. Conditional
B. Probabilistic
C. Partial
D. Open
V. Some key ideas in scientific research
A. Units of analysis
B. Properties and variables
C. Possible relationships among variables
1. Causal
2. Conditioning
3. Reciprocal
4. Symmetrical
5. Spurious
6. Controlling for a third variable
D. Measurement
1. Criteria of accuracy
a. Reliability
b. Validity
2. Levels of measurement
a. Nominal or qualitative
b. Ordinal or ranked
c. Interval
VI. Statistics--a few basics
A. Definition and purpose of statistics
B. Two general types of statistics
1. Descriptive
2. Inferential
C. Examples
1. Descriptive
a. Measures of central tendency
1) Mean
2) Median
3) Mode
4) Relationship to the frequency distribution--or why you should ask for all three
b. Measures of dispersion
1) range
2) percentile
3) standard deviation
c. Measures of relationships
2. Inferential: moving from a sample to a population
a. Sampling
b. Statistics that measure "significance"--or how likely is it that my sample statistics are close to being right
TEXT
I. What we mean by "scientific"
We have been talking a great deal about
political science in terms of its history, evolution, and kinds of theories
that make up the discipline. Exactly what makes political science a
"science?" What do we mean when we say "scientific?" From
what we have already said, you should have a pretty good idea. For example, you
already should know that the theories we labeled as "empirical" are
more scientific than "normative" theories. We can test empirical
theories based on observations. Normative theories can't ever be tested
completely. Empirical theories involve "fact" statements while
normative theories involve "value" statements. If you remember this
much, that's great. You're well on your way to understanding this material. If
not, go back and look at those ideas again.
Generally speaking, what we mean by scientific is being very precise
and self‑conscious in our methods of finding out about things. In
fact, we want to be so precise that someone else can duplicate what we have
done and see if they get the same results. This is called replication. To make our research replicable, we must describe
every detail of what we have done and how we have done it. If the research
can't be replicated so that the same conclusions can be reached, then whatever
those conclusions are (a theory, a hypothesis, a fact) cannot be considered
scientific.
II. The goals of science
The goals of all scientific research, whether in politics or in biology,
are to describe, explain, and predict. We mentioned this earlier in
discussing the definition of theory. Theories help us understand political
behavior by focusing our attention and questions on certain properties of that
behavior so that we are better able to describe, explain, and predict. In the
case of political science research and theory, our aim is to understand
political behavior.
A. Describe
This is usually the first step. We have to be able to describe what
we are observing before we can explain it or predict what will happen.
Quite frankly, this is the stage where much political science research is right
now and has been for some time. We are still struggling to identify all the
important variables in such areas as bureaucratic organizational behavior,
campaign contributions, political psychology, and pressure group organization.
In other areas such as voting behavior, we have pretty well established what
the important variables are: party identification, incumbency, name
recognition, ideological orientation, satisfaction with job performance,
personal image of the incumbent (trust, leadership, competence), expectations
for the future (is the country/state/whatever on the right track), and personal
background characteristics. Those who study voting behavior now spend most of their
time combining all these variables in increasingly sophisticated mathematical
models and fine tuning their measurement methods.
B. Explain
Once we figure out what we are describing and the terms, concepts, and
variables that describe this phenomenon, we begin to see relationships
among these things. Marx sought to explain why one stage of history followed
another after he developed the concept of dialectical materialism. Systems
theorists first developed the concepts of input, output, and feedback before
they could talk about the relationship between system persistence and feedback
and support. Explanations are more than just associations between things. They
are the LOGIC and REASONING behind the associations. They answer the question of
WHY something happened. The content of an explanation (that is, its logic and
reasoning, concepts, and variables) is another way of thinking about theory.
C. Predict
Once you can explain, you are well on your way to prediction. If you
know that event B followed event A in the past and you have a logical
explanation linking the two, then you can predict B whenever you observe A is
taking place. However, prediction is always more risky because situations are
rarely exactly the same as they were in the past. That is to say, we may
once again observe something that looks like A, but it is usually not quite
like A. In complex human relations, just knowing about the past changes the
present. Therefore, the conditions that define reality are constantly
changing--we are shooting at a moving target. Hopefully, the change is slight
enough so that we can still make reasonably correct predictions.
For example, we know that the powers of incumbency pose great difficulty
to anyone who wishes to deny an incumbent her or his party's re-nomination‑‑if
the incumbent desires re-nomination. So if you have event A, that is, an
incumbent who desires re-nomination, you can usually predict that she or he
will win it. Such predictions held for President Gerald Ford in 1976, President
Jimmy Carter in 1980, George H.W. Bush in 1992, Bill Clinton in 1996, and
George W. Bush in 2004. All except Clinton and George W. Bush had serious opposition
from within their own political party (Ronald Reagan, Ted Kennedy, and Pat
Buchanan, respectively). (Ironically, the Democrats' defeat in the 1994
congressional elections helped
The way the media describes presidential elections is another good
example. Elections are frequently seen in terms of their similarity to past
elections. Yet no two elections are ever exactly the same. Was the election of
1992 (incumbent Bush versus
However, trying to predict which explanation will fit an upcoming election
is much harder. After the Gulf War of 1990-91, who would have predicted any
explanation that pointed toward George H.W. Bush's defeat? Those who predicted
in 1995 that an unpopular Bill Clinton would follow in George H.W. Bush's footsteps as a
one-term president wished to take their prediction back a few months later.
D. Relationships between explanation and prediction
From the above discussion, you should have noted that there is a time
relationship between explanation and prediction. Explanation is post
hoc, that is, after the fact, while prediction takes place before the fact.
You may also have noted that our ability to predict seems to depend
on our ability to explain. Although that is usually correct, it's not quite
that simple. In fact, sometimes we can predict when we can't explain. It's
also possible that we can sometimes explain and cannot (or will not) use that
explanation to predict. Let's take these two special cases and look at each
separately.
1. Predictions that are not explained.
Sometimes we know the conditions that precede an event, but don't
know what the logical connections are. To put this another way, we
sometimes have a well-established relationship without any theory. We can't
really explain the relationship.
Natural science provides many examples of this, where certain situations
are strongly associated with particular results, but WHY is not clearly known.
For example, byssinosis or "brown lung" is a lung disorder that is
strongly associated with exposure to cotton dust. The relationship is so strong
that the government has issued regulations regarding cotton dust in textile
plants. However, we do not know for sure what substance in the cotton dust
causes the disorder. Nor do we know exactly how and why cells in the lungs react
as they do to that substance.
In political science we try to avoid this problem. We insist that
researchers only look at relationships that are suggested by some theory or explanation.
If we happen upon a relationship that lacks a theory (like the relationship
between the league of the team that wins the professional baseball World Series
and which party wins the presidential election in that year), we don't take it
seriously. We refuse to make serious predictions. Doing so would be criticized
as "unprofessional"--a negative label.
2. Explanations that cannot (or will not) be used for prediction.
This situation arises when we know the relationship between certain
conditions and an event, but we are either unable or are unwilling to measure
these conditions or make the prediction.
For example, many theories explain the outbreak of wars. Some of them
focus on the mental conditions and outlooks of political leaders. But, for
obvious reasons, leaders are unwilling to let us find out what is going on
inside of their heads. Moreover, even if they were willing, we don't have the
time or ability to monitor this kind of thing on an ongoing basis. So political
reality and resources limit us here.
We may also be limited for ethical or normative reasons. For
example, we have many theories that explain criminal behavior in terms of
background and attitudes. Researchers could develop tests that could identify
those who are likely to engage in violent criminal acts with a fairly high
degree of accuracy. Those tests could be used to identify these people at a
fairly early age. Then they could be given counseling or perhaps even isolated from society. Would you be willing
to do this‑‑even if we could be, say, 90% accurate? Probably not,
and the reason is that it violates very strong and basic values of justice. We
believe that people should not be singled out and "punished" for
something that they have not in fact actually done. This would go beyond even
"thought crimes." This would create a new category that could be
called "predisposition crimes." How accurate would such a test have
to be before you would be willing to condone its use? That's an interesting
question to think about. Even if it were 100% accurate, would you be willing to
create a set of laws that punish people for "criminally predisposed
personalities?" This is what the 2002 science fiction Tom Cruise movie "Minority
Report" was about.
Modern medicine allows us to test the genetic predisposition people have
for many diseases today. In some cases, insurance companies have refused to
give coverage to people having these predispositions. From the insurance
company's point of view, the refusal keeps costs down for them and others who
pay premiums. But the refusal strikes many others as extremely callous. The
company seems to care only about profits—that is the nature of corporations. Should
insurance only be for the healthy? Genetic testing could make that possible for
many costly diseases. This is why insurance reform is one of the most popular
parts of many proposed health care reforms.
III. Steps of scientific research and underlying assumptions
All scientific research generally follows a step by step process. We
might break down the process in many ways. Many researchers will combine some
of these steps or will break them down even further. First, I want to
give you a general idea, with examples, of the major ideas in these steps.
Second, I want to discuss two kinds of assumptions involved in doing
scientific research. The first kind consists of things we must assume to be true if the steps
of the research process are to make any sense. They are logically necessary.
The second kind consists of additional things that scientific researchers assume beyond
the logically necessary. These additional assumptions take what we have
called normative stances--value positions about what scientists
SHOULD do (remember?).
A. Steps
1. Problem selection
We have already talked about problem selection in the sense that this involves
taking a value position. In selecting a problem, you are saying what you
think is important for you to do. You cannot avoid this step. Even if you
abdicate the decision and do anything others are willing to pay you to do, you
are saying that money is important.
2. Theory formulation
Here you find out what is already known about the problem you
have selected. Generally you do what is called a "literature search."
Why should you spend a great deal of time rediscovering what is already known?
For example, if you are interested in why more Americans don't take the
time and trouble to vote, you would begin by looking at voting behavior
literature to find out about models of voting behavior. One interesting model
(the economic theory of voting) was
developed by an economist named Anthony Downs. He argued that given the time
and trouble and limited expected payoff for voting, voting is irrational. That
is, the likelihood of your vote being the deciding vote is practically zero.
And even if your vote were the deciding vote, it would do you no good. The
elected official would have no way of knowing that it was your vote.
Consequently, voting has no expected payoff, and therefore, most people act
irrationally in voting in most elections. This theory suggests that the proper
question is not why people DON'T vote, but why DO people vote at all!
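The logic of this argument is often summarized as a comparison between the expected benefit of voting and its cost. The sketch below is only an illustration of that arithmetic, not Downs's own notation. The function name and every number in it are assumptions I have made up for the example.

```python
def expected_net_payoff(p_decisive, benefit, cost):
    """Expected net payoff of voting: the chance of casting the deciding
    vote times the value of the preferred outcome, minus the cost
    (time and trouble) of voting. All inputs are hypothetical."""
    return p_decisive * benefit - cost


# Hypothetical values: a one-in-a-million chance of being decisive,
# a benefit worth 1,000 units, and a cost of 1 unit.
payoff = expected_net_payoff(p_decisive=1e-6, benefit=1000.0, cost=1.0)
print(payoff < 0)  # True: the cost outweighs the expected benefit
```

With a chance of being decisive this small, even a very large benefit cannot offset a modest cost, which is the sense in which the theory calls voting irrational.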
3. Hypothesis development
Once you have examined existing theories, you then develop hypotheses
that are suggested by the theories. You can design these hypotheses to
accomplish one or more of several goals. They might be used to test
the truth of the theory, especially if you don't believe the theory will
hold or if you think that previous tests were poorly done. Hypotheses might be
designed to refine the theory by showing where it holds and where it
does not hold. Hypotheses might expand the theory to see if the theory
holds for some new class of political behavior to which it has never been
applied. In any case, the hypothesis should be some specific statement about
a relationship that logically flows from the theory that can be tested by
empirical means.
For example, suppose you think that
4. Operationalization
Once you have a hypothesis, you have to turn it into something you can
actually measure--something you can observe. The process of taking variables
and describing how measurements are made of these variables is called operationalization.
For example, looking at our hypothesis about civic clubs and voting, you
could perform a survey in which you ask potentially eligible voters the number
of civic clubs to which they belong. Then you ask them whether or not they
voted in the last election. What you have done here is change the abstract
concepts of socialization and voting participation into concrete measures, in
this case two concrete questions.
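To make the idea concrete, here is a minimal sketch of what operationalizing these two concepts might look like once the survey answers come back. The class name, field names, and the sample answers are all hypothetical. They are one possible way to record the two questions, not a standard survey format.

```python
from dataclasses import dataclass


@dataclass
class Respondent:
    civic_clubs: int   # answer to "How many civic clubs do you belong to?"
    voted: bool        # answer to "Did you vote in the last election?"


def operationalize(raw_clubs, raw_voted):
    """Turn two raw survey answers (strings) into concrete measures."""
    clubs = int(raw_clubs.strip())
    voted = raw_voted.strip().lower() in {"yes", "y"}
    return Respondent(civic_clubs=clubs, voted=voted)


# Hypothetical answers from one respondent.
r = operationalize(" 2 ", "Yes")
print(r)
```

Notice that the abstract concepts never appear in the data. Only the answers to the two concrete questions do.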
5. Data gathering
This is where you actually go out and gather the data you need to
test your hypothesis. You may select a sample and perform a survey. You may
do in‑depth interviews with political elites. You may look up facts about
governments or public officials, e.g. voting records of congresspersons. What
you do depends on the hypothesis to be tested and the kinds of things to which
it applies.
For example, another way to test your hypothesis on political
socialization and voting participation is to compare nations with respect to
civic memberships and voting. You would have to compare nations with similar
types of government. It wouldn't make sense to compare a republic with a
totalitarian state because this hypothesis assumes that voting is a matter of
choice. Many totalitarian governments require citizens to vote as a sign of
support for the regime.
6. Analysis and hypothesis testing
Here is where you arrange your observations in such a way that allows
you to see if the data supports the hypothesis or not. You have to code
the data (we'll explain this later) and then display it in such a way that the
hypothesis can be tested. Usually some tables are used along with some statistics.
We'll do some of this as an exercise in the next chapter.
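As a preview, the sketch below shows the simplest version of this step for the civic club hypothesis: split the coded observations into two groups and compare their voting rates. The data are invented for illustration, and the function name is my own.

```python
def voting_rate(respondents):
    """Share of respondents who reported voting (0.0 if the group is empty)."""
    if not respondents:
        return 0.0
    return sum(1 for _, voted in respondents if voted) / len(respondents)


# Hypothetical coded data: (number of civic clubs, voted?) per respondent.
data = [(0, False), (0, True), (0, False), (1, True),
        (2, True), (3, True), (1, False), (0, False)]

members = [row for row in data if row[0] > 0]
non_members = [row for row in data if row[0] == 0]

print("members:", voting_rate(members))
print("non-members:", voting_rate(non_members))
```

If the hypothesis holds, the rate for members should be higher than the rate for non-members. A real test would also ask whether a difference of that size could have arisen by chance.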
7. Theory reformulation
Once you have found out whether or not your observations support your
original hypothesis, you must then think about what impact this finding has
on existing theory. In doing so, you contribute to the building and
refinement of scientific knowledge.
For example, if you find no relationship between civic club memberships
and likelihood of voting, you would conclude that either socialization theory
does not apply to voting behavior or that civic clubs do not effectively engage
in political socialization. These conclusions suggest other new problems that
can then become subjects for research. How do civic clubs deal with values of
political obligation? If they do attempt to promote the value of voting as a
civic duty, why is the effort ineffective? Maybe only some kinds of civic
organizations engage in political socialization. And so on. I hope you get the
idea. By the way, the evidence supports the hypothesis: membership is associated with higher
rates of voting.
B. Underlying assumptions
Logically speaking, we must assume a number of things to be true in
order to carry out the process of research.
1. Human behavior occurs in regular patterns
If people always acted in random ways, then observing their behaviors in
politics or in any other realm of life wouldn't tell us anything about them.
Some pattern must exist for us to observe if we are to learn anything.
2. Reason allows us to observe and discover these patterns
This involves having faith in our ability to figure out what these
patterns are. If we don't think we are smart enough, then why even try?
3. Patterns are governed by laws
We also assume that underlying causes create these patterns‑‑that
one behavior is stimulated by some other behavior or condition. If causes did
not exist, then we could not explain or predict, which are two of the three basic
goals of science.
4. Reason allows us to discover these laws
This is similar to the second assumption. If we don't have faith in our
ability to figure out what these causal laws are, then there is not much point in
even going this far.
5. Scientific knowledge is "intersubjective"
This is a matter of definition. Being intersubjective is what separates
scientific knowledge from other kinds of knowledge, like "faith"
knowledge, or intuition. It is a definition that most people who are called
scientists agree to use. However, equating science with intersubjective
knowledge does have some normative overtones. Use of the term
"scientifically based knowledge" tends to relegate all other kinds of
knowledge to an inferior position. This has political value in that many people
can be intimidated into believing something if it is called a "scientific
fact" by a generally recognized scientific source. But I haven't told you
what intersubjective actually means yet. "Intersubjective" means doing
research and explaining it in such a way that it can be repeated or replicated by another
person who then should be able to make the same kinds of observations and draw
the same conclusions. This requires great precision in explaining how
research is performed. So you see, I hope, that "intersubjective,"
or "between people" refers to agreement between people as to what
happened. As you should know, gaining agreement on what different people
observe is not always a simple matter.
6. The goal of science is to increase knowledge, not apply it
This is the second kind of assumption, one that is not logically
necessary. It has a great deal of normative content. It was the position taken
originally by the behavioralists who argued that the job of political
scientists was to increase rather than to apply knowledge. As you already know,
later political scientists rejected this notion as naive because the mere
existence of knowledge often results in its application.
Today, most political scientists would agree that this assumption is
naive. They would not accept it as an iron-clad rule for scientific research.
However, they would caution young activist political scientists that the more
outspoken and controversial they become, the less likely the public is to
continue to accept their findings as "scientific." That's another
hypothesis we could test!
IV. Characteristics of scientific explanations
A. Conditional
Scientific explanations are rarely true under all circumstances.
Therefore, one of the important jobs of scientific research is to determine
under what circumstances an explanation does hold true. As we learn more about
when an explanation works and when it does not work, the specification of what
these conditions are becomes part of the explanation.
Example: party identification and voting
For a long time political scientists have found that the best single
predictor of a person's vote is party identification. However, we have also found that this simple
explanation holds more in some circumstances than in others. The more a person
knows about the individual candidates (or thinks she knows) and the stronger
she feels about specific issues on which the candidates have differences, the
less power that party identification has as an explanation. On the other hand,
when a person is indifferent or has likes and dislikes that balance out between
two candidates, then party identification becomes the dominant factor. So the
power of party identification as a predictive variable depends on other
conditions, such as issue knowledge.
B. Probabilistic
Our explanations are probabilistic, the opposite of deterministic. They have
much error in them. We cannot determine exactly what people will do all the
time, nor can we ever fully understand why they act the way they do. Why?
This limitation is due to several factors. First, our methods and measurements
are terribly imprecise. We do not have the tools to measure things like party
loyalty or political trust and legitimacy nearly as precisely as a physicist
can measure speed or mass. Second, an infinity of conditions affect human
behavior. We cannot account for all of them. Third, human behavior probably
differs from the behavior of physical objects in qualitative ways. If we have
that quality known as "free will," the ability to sometimes freely
choose despite all the social forces around us, then no matter how good our
tools become, we could never predict exactly what people will do.
Do these three factors render a science of politics impossible? The
answer is "no." But we must account for these factors in how we
present our findings. We must use such terms as "likely to,"
"tend to," or "probably" rather than "shall" or
"will."
Example: Incumbency as an explanation of electoral success
One of the most powerful explanations of electoral success is whether
the candidate is an incumbent or a challenger (i.e. facing an incumbent).
This explanation takes into account all of the powers of an incumbent: things
like name recognition and the ability to raise campaign funds, engage in constituent
services, and act statesman-like rather than just make vague
promises. Therefore, incumbents are likely to win reelection, but
victory is far from certain. Occasionally, incumbents do stupid things and
challengers do brilliant things. Why? Free will or "fortuna"
(remember Machiavelli?) are possible explanations. Perhaps we simply don't know
the right things to measure yet in predicting when incumbents will lose. In any
case, all we can say is incumbents are "likely" to win, even in years
like 1994, 2006, 2008, and 2010, when so many citizens were dissatisfied with the performance
of their government. In fact, almost all congressional incumbents who ran for
reelection did win in 1994, 2006, and even 2010. The Republican landslide in 1994 came
in open seats and in a few seats where Democratic incumbents lost (no
Republican incumbents lost). 2010 was similar. In 2006 and 2008 it ran the other way, with most losses
being on the Republican side. All of that is a bit unusual, but the theory that
incumbency explains electoral success held true.
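A probabilistic claim like this one can be stated as a proportion rather than a certainty. The sketch below shows the arithmetic. The counts are hypothetical and are not actual election results.

```python
def reelection_rate(ran, won):
    """Proportion of incumbents who ran for reelection and won."""
    if ran <= 0:
        raise ValueError("at least one incumbent must have run")
    return won / ran


# Hypothetical counts, not actual election results.
rate = reelection_rate(ran=400, won=360)
print(f"Incumbents won {rate:.0%} of their races")
```

A rate below 100 percent is exactly why we say incumbents are "likely to" win rather than that they "will" win.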
C. Partial
Our explanations are partial, or not complete. They are partial as a
direct result of their being conditional. We don't have the time or expertise
to specify all possible conditions that affect the explanation. Hopefully, the
explanations are becoming more complete as research continues, but science
moves slowly. Complete explanations will certainly not be found in my lifetime.
Example: Explanations of voting behavior
Even the best scientific explanations of how people vote leave 5 to 10
percent of the vote unexplained (in statistical terminology, this is called
"unexplained variance").
That is to say, we don't really know why these people voted the way they did.
Over the years, we have developed better explanations that include more
conditions that have reduced the percentage that is unexplained, but a great
deal is still unexplained. Moreover, this unexplained variance is enough
to determine the outcome in most elections. Of course, some of this may never
be explained because of the nature of human behavior (back to free will). Sometimes
people mismark their ballots or vote randomly because they know so little about
any of the candidates; these few voters can never be predicted. In any case, virtually
all of our explanations are still partial.
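One common way to put a number on "unexplained variance" is to compare how far a model's predictions miss with how much the observations vary in the first place. The sketch below assumes that definition (one minus what statisticians call R-squared). The function name and the data are hypothetical.

```python
def unexplained_variance(actual, predicted):
    """Share of the variance in `actual` that `predicted` fails to
    account for (that is, 1 - R^2)."""
    mean = sum(actual) / len(actual)
    total = sum((a - mean) ** 2 for a in actual)
    residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return residual / total


# Hypothetical vote shares (percent) and a model's predictions for them.
actual = [40.0, 50.0, 60.0, 70.0]
predicted = [42.0, 49.0, 61.0, 68.0]
print(unexplained_variance(actual, predicted))
```

A result of 0 would mean a complete explanation. Anything above 0 is the partial part.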
D. Open
Because our explanations are partial, probabilistic,
and conditional, they must also be subject to change as we learn more.
We may learn new conditions. We may develop new measures or concepts (like
Marx's work alienation or class consciousness) that improve old explanations.
We may replicate work of others and find that mistakes were made in any of the
steps of the scientific process. Therefore, we say that all of our theories are
open to change. If they were not open, they would not be scientific
explanations. That is not to say that closed explanations are wrong. They simply cannot be
fully tested by the methods of science.
Counter-example: "Scientific Creationism"
The political, legal, social, religious, and educational controversy
about the creation of the universe and human life can be boiled down to the
question of the openness of the explanation. Those who advocate that the
Biblical version of creation be given equal time in public schools to the
explanation of evolution have argued that their version has scientific support
and is therefore just as viable as evolution. That depends on exactly how one
defines science. If we say that science means looking for facts that are
logically consistent with some faith‑based assumption, then the
creationists have a strong argument. Creationists talk about some "Great
Mover" or "Force" that set everything in motion. They use these
terms, rather than "God," in order to get around the charge that they
are supporting religion in public schools.
However, if by science we mean that our only assumptions are that we
will establish fact through observational means and the use of logic and that
the resulting explanations will be open and subject to change, then the
creationists can no longer be seen as dealing in the realm of science. Their
initial assumption is a matter of faith, not observation, and it is NOT open or
subject to change. The countercharge by creationists that traditional science
is also based on values is in a strict sense correct -- it is based on the
value of believing that scientific facts rest on observation and that all
theories are open to change. I see this as simply a matter of definition, but
one could argue that it is also a value.
By theological standards, the creationists (who are now calling
themselves supporters of "intelligent design," since they lost the political
battle over including creationism in public schools) may be right, but
scientific standards have no way of telling. And of course it could well be
that there is an all-powerful God behind all these scientific causes – as the
Catholic Church has said in the past, God is the cause of all causes. But that last
step is religion, not science. This has been the basic finding of several court
decisions that have rejected giving "scientific creationism" the
scientific status that evolution has. Creationists who argue that evolutionary
explanations do not explain all that exists, or that evolutionary theory has
constantly had to be modified, have in reality only supported the theory as a
scientific theory. And in the strict sense evolution is only a theory -- not a
fact. But it is a theory that has a heck of a lot of supporting evidence, so
most scientists accept it as a fact, albeit one that has details that are
constantly under revision. But that is exactly what science does! Scientific
explanations are partial and are open--BY DEFINITION!!
V. Some key ideas in scientific research
A. Units of analysis
This simply refers to whatever the individual "things" are
that we are studying. In political science the "things" are usually people.
However, we might be studying constitutions, as Aristotle did. Or we might be
studying and classifying elections, as was done in critical election theory
(remember?). Or we might be looking at nations as does balance of power theory.
B. Properties and variables
Whatever kind of units we are looking at, we observe them in order to
take measurements of some property that each unit has. Elections have winners,
parties, and money spent. Constitutions have powers assigned, prohibitions, and
rights distributed. Nations have resources, powers, demographic (that means
characteristics of the population like age, race, religion, wealth, and so on),
economic, and geographic properties. In most cases, these properties vary. By
definition, properties that vary are called variables. If a property
does not vary, we then call it a constant.
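The distinction is easy to check mechanically: look at the values a property takes across all the units and see whether more than one value shows up. The sketch below does this for three made-up elections. The property names and values are hypothetical.

```python
def classify_property(values):
    """Return 'variable' if the property takes more than one value
    across the units observed, otherwise 'constant'."""
    return "variable" if len(set(values)) > 1 else "constant"


# Hypothetical units of analysis: three elections.
elections = [
    {"winner_party": "D", "office": "governor", "spending": 1.2},
    {"winner_party": "R", "office": "governor", "spending": 3.4},
    {"winner_party": "R", "office": "governor", "spending": 0.9},
]

for prop in ("winner_party", "office", "spending"):
    print(prop, classify_property([e[prop] for e in elections]))
```

Whether a property is a variable or a constant depends on the units you happen to observe. Here "office" is a constant only because all three elections are for governor.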
C. Possible relationships among variables
Variables can be related to each other in many ways. Obviously, when
more than a few variables are involved, the patterns of relationship can become
quite complex. What I want to do here is talk about some of the simplest
relationships involving only two or three variables.
1. Causal
This is what we would always
like to find, some variable that causes change in some other variable as it
changes in some specified way. If the change in the first variable is enough
by itself to cause the change in the second variable, we can say that this
variable is sufficient to cause a change in the second variable. One thing that
social scientists look for as evidence of a causal relationship is the time
relationship between the changes that occur in the variables. If the change
in the second one consistently follows the change in the first, you have pretty
good evidence for a causal relationship.
A useful way to talk about
these relationships is to use diagrams, sometimes called "arrow
diagrams," or "path diagrams." A causal
relationship where a change in variable A results in a change in variable B is
shown below.
A
‑‑‑‑‑‑‑‑‑>
B
We use some special names
for the roles each of these variables plays in this relationship. A is called
the independent variable, and B is called the dependent variable.
One way to think about and remember this is to say that the dependent
variable depends on the independent variable in the relationship. Look at
the arrow diagram and say this a couple of times.
Many examples of this simple
yet most important relationship exist. Let's look at one that is of political
significance in the South and in
Birth Weight ‑‑‑‑‑‑‑‑‑‑‑>
Infant Mortality
Let's make the picture a
little more complicated and add a third variable that also has a causal
relationship with the other two. Suppose the arrow diagram looks as
follows:
A ‑‑‑‑‑‑‑‑‑> B ‑‑‑‑‑‑‑‑‑>
C
Now we have two causal
relationships, one between A and B, and a second one between B and C. In a
sense, there is also a third one between A and C, but it is mediated by
variable B. In this case, variable B is called an intervening variable.
In order to specify the role that each variable plays, we must talk about the
role it plays in relation to some other variable or variables. For example, A
is an independent variable with respect to B, and B is an independent variable
with respect to C, but with respect to A and C both, B is an intervening
variable. Got it?
Let's apply this terminology
to our infant mortality example and expand it a bit. So low birth weight causes
high infant mortality‑‑so what? If we think about what causes low
birth weight, we begin to see the public policy implications. One of the
principal causes of low birth weight is poor nutrition, and one of the
principal causes of poor nutrition is poverty. I've added two intervening
variables and now you see the southern connection. Here's what we have in arrow
diagram form.
poverty ‑‑> nutrition ‑‑> birth weight ‑‑>
infant mortality
A good exercise at this
point would be for you to describe the role that each of these variables plays
with respect to the other variables. Try it!
Although we may say a
relationship is causal, in reality what usually happens in the complex world of
human behavior is much weaker than that. The independent variable usually
does not cause a change in the dependent variable, but merely makes some change
more likely to happen. We might say that many of our causal relationships are
really about variables that make something more likely to happen, but
are neither necessary nor sufficient in making it happen. For example, low
education contributes to poverty by making poverty more likely. But low
education is neither necessary nor sufficient in creating poverty. And, until
we get to a certain level of really severe poverty, poverty only contributes to
poor nutrition. In political science we are almost always really talking about
how much some variable A contributes to some change in variable B. Rarely does
a change in A always cause B to change in a specified way.
2. Conditioning
Now we will start making
things a little more complicated. Sometimes a third variable weakens
or strengthens a relationship. Then we say that a variable plays a conditioning
role in the relationship, or to put it another way, the relationship only
exists (or is more likely to exist) under certain conditions.
To use our example about infant mortality, we might say that knowledge about
good nutrition will not by itself cause a person to have good nutrition.
Nutritional knowledge is more likely to lead to good nutrition under the
condition of financial means. So financial means conditions the relationship
between nutritional knowledge and the actual practice of good nutrition.
Knowing the conditioning variables is
politically important. How so? If we want to do something about infant
mortality, for example, we might want to create programs and/or policies that
create the right conditions for decreasing infant mortality or increasing good
nutritional practices, which we know helps reduce infant mortality.
In path diagram form, we
have a relationship between A and B that is affected, or conditioned, by the
value of a third variable C. So in terms of our example, A is nutritional
knowledge and B is the practice of nutrition, and C, the conditioning variable,
is financial means. You must have the money to buy and consume the things
you know are good for you. And just in case you did not know this, it turns out
that better food generally costs more, so knowledge by itself is not really
enough!
A --------------------------------> B
/\
|
|
|
C
3. Reciprocal
An easy way to think about a
reciprocal relationship is to think of it as a two-directional causal
relationship. Each variable simultaneously plays the role of both
independent and dependent variable. Each reinforces the other. Unlike a simple
causal relationship, no clear indication tells us which variable changes first
in time. Either variable could change first, or the changes may be so close in
time that telling which came first is impossible. You might think of
this as a kind of "chicken and egg" situation. A reciprocal
relationship is shown in an arrow diagram as follows.
A <‑‑‑‑‑‑‑‑> B
Although making the example
of infant mortality illustrate this kind of relationship is a bit more
difficult, we can make it fit if we stretch things a bit. Low concern over
nutrition tends to reduce nutrition. However, we might also make an equally
strong argument that poor nutrition leads one to be less concerned about good
nutrition. To the extent that poor general health causes low motivation in all
areas of life, poor nutrition may cause low nutritional motivation. Which comes
first? Logically, either could happen first.
4. Symmetrical
This is a rather simple
situation where one variable simultaneously causes changes in two other
variables. Or you might say that we have one independent variable and two
dependent variables. The arrow diagram is as follows.
A ---------> B
A ---------> C
The
significance of this simple relationship is in our next type of relationship,
or to be more precise, "non-relationship."
5. Spurious
This is an apparent, yet
false causal relationship that is the result of some unknown third variable
having a symmetrical relationship with the first two variables. Therefore,
a spurious relationship is an untrue relationship. This could be
diagramed in an arrow diagram as shown below.
C ---------> A
C ---------> B
This poses a great problem
for researchers. Every causal relationship is potentially spurious. We
never know for sure until we have tested all possible third variables.
The process of doing this is called controlling for third
variables. The fact that we can never control for all third variables
along with the fact that we may someday find some third variable that
renders what we thought to be a causal relationship to be spurious are additional
reasons why scientific explanations are open and always subject to
change.
We need to add some other
terminology here that is often used. When a third variable C causes a bivariate
causal relationship between A and B to disappear, that is, renders it spurious,
we say that the third variable has a confounding effect or is a
confounding variable.
For example, medical
researchers noticed that people with very low levels of cholesterol have
higher death rates. They wondered what was going on, because low
cholesterol is supposed to be good.
cholesterol (low) --------------> death rate (high)
As good researchers, they
looked at things that could cause this relationship to be spurious. Pretty
quickly they found some good candidates for that third variable: smoking and
alcohol. High levels of alcohol consumption and heavy smoking depress the
appetite and cause cholesterol to be low. Simultaneously, these activities
contribute to higher death rates.
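smoking and alcohol --------------> cholesterol (low)
smoking and alcohol --------------> death rate (high)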
6. Controlling
for a Third Variable: How to do It
When we are trying to see if
a relationship is spurious or if a third variable conditions a relationship, we
control for whatever third variables we think might be related to both the
independent and dependent variables. We do this by reexamining the relationship
for each value of the control variable. Using the example above about
cholesterol and death rates, we would control for smoking by looking at the
relationship for smokers and then look again for non-smokers. If the
relationship is spurious, then the relationship between cholesterol and death
rate would disappear when looking at each group alone. If conditioning were
taking place, we would see a different relationship between cholesterol and
death rate for smokers than for non-smokers (which is not the case).
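The logic of controlling can be sketched in a few lines of Python. The counts below are invented purely for illustration (they are not real medical data), and the function name death_rate is just a label chosen for this sketch. The numbers are set up so that smokers are mostly in the low cholesterol group and also die at a higher rate.

```python
# Hypothetical counts, made up for illustration only (not real medical data).
# Each entry: (smoking group, cholesterol level) -> (number of people, number of deaths)
counts = {
    ("smoker", "low"): (80, 40),
    ("smoker", "high"): (20, 10),
    ("non-smoker", "low"): (20, 2),
    ("non-smoker", "high"): (80, 8),
}

def death_rate(cholesterol, smoking=None):
    """Death rate for a cholesterol level, optionally within one smoking group."""
    people = deaths = 0
    for (group, level), (n, d) in counts.items():
        if level == cholesterol and (smoking is None or group == smoking):
            people += n
            deaths += d
    return deaths / people

# No control: compare low and high cholesterol for everyone together.
print("everyone:", death_rate("low"), death_rate("high"))

# Control for smoking: re-examine the relationship within each group.
for group in ("smoker", "non-smoker"):
    print(group + ":", death_rate("low", group), death_rate("high", group))
```

With these made-up counts, low cholesterol goes with a higher death rate when everyone is lumped together, but inside each smoking group the two rates are identical. That disappearance is the pattern we would expect from a spurious relationship.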
It turns out that the test
for spuriousness is the very same test for seeing if a third variable
intervenes between an independent and dependent variable. When you control for
the possible intervening variable, the relationship also disappears. So
the statistics cannot tell us whether a relationship is about spuriousness or
involves an intervening variable. We can only tell the difference from theory,
from whether we have reasons for the third variable to play a role between the
independent and dependent or whether it should have an effect on both.
Let's look back at our
example of infant mortality. Here is where it gets complicated and highly
controversial. Researchers have long noted a relationship between race and
infant mortality. Black mothers are more likely to have underweight infants
than are white mothers. Most observers regarded this apparent relationship
between race and birth weight as caused by poverty, where poverty plays an
intervening role and wipes out any direct relationship. Here is what that
relationship would look like in path diagram form. You will note that I drew
this path diagram so that it looks like a spurious relationship except for one
thing. Can you see what it is? The arrow between race and poverty is in the
other direction, so it is really the diagram for an intervening relationship.
Logically, it could not be in the other direction unless somehow a change in
poverty could cause a change in race! But if poverty does really intervene and
the relationship between race and birth weight does disappear when we look at
similar economic groups, we can still conclude that race has nothing to do with
birth weight directly.

Well that is what we
thought would happen. But it did not exactly turn out that way!
A number of researchers have
argued that the relationship between race and birth weight is real. They have
argued that some unknown genetic difference between blacks and whites makes
blacks more likely to have offspring with lower birth weights even after you
account for the impact of poverty. They have looked at the relationship between
race and birth weight while controlling for a number of third variables that
approximate poverty. For example, they have found that white mothers with low
education have larger infants than black mothers with low education. At the
other extreme, they have found that black mothers with high education have
smaller infants than white mothers with high education. Other researchers have
compared infant mortality rates of blacks in
Now this is highly
controversial for a number of reasons. The first is methodological. Critics of
this research argue that insufficient data on mothers are available to fully
measure the concept of poverty. Single variables like education only partially
capture the concept. To use the terminology we have been using, they are saying
that this new research is faulty at the "operationalization" stage,
that the measures used are not valid.
Recent research suggests that the
explanation may lie in the time between births, another variable. If for
cultural reasons blacks have births closer together than other racial groups, that
could be the intervening variable. The answer is still not certain--science
moves on. Until better data are available on mothers, we have no good reason to
think racial genetic differences exist.
The second reason that this
is controversial is that it has enormous public policy consequences. If science
tells policy makers that they can do little about infant mortality, then the
research may provide policy makers with an excuse to reduce their efforts to
combat infant mortality. Policy makers may conclude that health and educational
programs for pregnant women are a waste of valuable resources. Again, we see
that research has great political implications, regardless of the avowed
neutrality of those doing the research.
The third reason for controversy
is that even if in fact real racial differences exist, someone (probably many
someones) will put value connotations on the results. Whites who are racists
will presume that this is proof of what they have been sure of all along:
intrinsic white superiority. Some blacks will react to these outrageous and
unwarranted conclusions and argue that the research is a sign of white racism
in the scientific establishment.
D. Measurement
Measuring the variables in our hypotheses, as you can see in the
preceding example, is a very important part of the research process. If done
poorly, the results may not test what you intended to test and may even do
grave harm to someone. In this brief section, I want to talk about two factors
in measurement, accuracy and the precision (or levels) of the measurement.
1. Criteria of accuracy
Whenever we make a measurement, we worry about two criteria or standards
of accuracy. We must meet both tests in order to have any certainty about our
results.
a. Reliability
By reliability we simply mean consistency of results. A method of
measuring is considered to be reliable if different people applying it to the
same unit would consistently reach the same conclusion.
This may be easiest to understand by using an example of an unreliable
measurement. Suppose I gave you an elastic ruler and told you to measure the
length of this page. I would certainly get a variety of answers from different
members of the class. I'd find little consistency. Why? Two problems would
create inconsistency in your measurements. First, because the ruler is elastic,
it will stretch or contract in measuring any dimension. I could have helped you
here by telling you to lay it loosely without stretching or contracting.
Second, I didn't describe to you exactly what I meant by the
"length." Is it the longer
dimension of a single page or the shorter one? You can't tell if I don't
describe to you exactly what I mean.
Public opinion is one of the most problematic areas of political science
research in terms of reliability. Questions that are unclear or have multiple
interpretations cause reliability problems. Interviewers who influence the
people they are interviewing by appearance, tone of voice, or both cause reliability
problems. Interviewers who must interpret long and complex answers to
"open ended" questions cause reliability problems. "Closed
ended" type questions are much more reliable. In "closed ended"
questions, all the interviewer has to do is check a box corresponding to one of
several fixed answers from which the person being interviewed chooses.
b. Validity
By validity, we mean that the measurement actually measures what it
is supposed to measure as opposed to measuring something else. For example,
if we were to measure weight by using a ruler, we would have an invalid measure
of weight. As you can see, in the physical sciences validity is usually pretty
obvious.
However, in the social sciences the question of measurement validity is
a whole lot more subtle. In addition, as are so many other things in the social
sciences, validity can be politically controversial. The first thing to realize
is that if a measure is not reliable, it cannot be considered valid. If a measurement is unreliable, you don't
know what you are measuring, so how can it be valid (except by accident)?
However, it does not work the other way. You can consistently (reliably)
measure the wrong thing (make invalid measures).
After we take care of reliability problems, validity gets a whole lot
more complicated, because you can blow the measurement in an infinite number of
ways. For example, in the example we used above on infant mortality, the
controversy about the role of race rests on questions of the validity of using
measures like education to indirectly measure poverty and nutrition. As a rule
of thumb, you are more likely to be valid when you measure things as
directly as you can rather than using indirect measures.
A second example could be measuring racism or sexism by asking someone
whether or not they agree or disagree with negative stereotypes (e.g. men have
a hard time making up their minds on even the most simple matters or that men
are too stubborn to stop and ask directions). The problem with stereotype
measures is that they measure education as much as belief in stereotypes. An
educated person knows that disagreeing with such stereotypes is a sign of
education, regardless of how one really feels.
Political controversy enters the picture when these measures are used to
justify treating people differently. Every qualification test from college
boards to promotion exams for fire fighters and police involves questions that
presumably validly measure a person's relative ability to be a success in the
position for which she is applying. Are the tests valid measures of how well
you will succeed? When you say that the SAT was not fair, what you probably
meant was that it was not a valid test of your ability to succeed in college.
These are the kind of charges of which court cases are made.
2. Levels of measurement
Not only do we want to measure accurately, we also want to measure as
precisely as we can. We want to do better than the stereotypical ancient
peoples who accurately but imprecisely called anything above about a dozen
"many."
a. Nominal or qualitative
Sometimes the best we can do is to distinguish among different qualities
or categories. In doing so, we are not measuring more or less of any quality.
Because we are not measuring more or less of anything, the categories can be
listed in any order. For example measuring party identification can be purely
qualitative: Republican, Democrat, or no identification. A second example would
be voting choice: Bush, Clinton, Perot or other. Order does not matter, so we
could just as easily list them as Clinton, Perot, and Bush. Other examples
should be easy for you to think of: race, gender, voter registration status,
and so on.
b. Ordinal or ranked
The next level is when we are measuring more or less of some quality
so that order does matter, but we are not measuring exact amounts so that we do
not know the precise difference between the categories. This means that we
cannot perform routine mathematical operations on the measurements like
addition and subtraction, and we are not dealing in units like dollars, votes,
years, and so on.
For example, we can take the measurement of party identification that we
had above and turn it into an ordinal measurement by adding the strength as
well as direction: strong Democrat, moderate Democrat, weak Democrat, no
identification, weak Republican, moderate Republican, and strong Republican.
Here the order becomes important, although we could quibble about exactly where
the category of "no identification" belongs. Even though we are
measuring more or less of the quality of "Democraticness" (and "Republicanness"),
we do not know the distance between a strong and moderate identifier. It may be
more or less than the distance between moderate and weak identifier‑‑we
simply have no way of knowing.
Another frequent type of example is the question that asks someone to
say how strongly they agree or disagree with some statement. Again, you know
the order as signifying more or less agreement (or disagreement), but you do
not know precise distances between the categories.
c. Interval
We have interval measurement when we are measuring precise amounts of
some property. When you have units of measurement involved (e.g. dollars,
years, and so on), you can bet that you have interval level measurement. (Note:
this is sometimes called “ratio” data. The only difference is that ratio data
have a true zero, that is, having zero means having none of whatever it is you
are measuring. Except for temperature in degrees Fahrenheit or Celsius, where
zero still has some degree of warmth, almost anything else is really ratio. But
we will keep it simple and just call it all interval.) Having precise amounts is a desirable thing to
have, because you can then do things like add, subtract, and multiply and get
meaningful results. To put it another way, we can use more powerful
statistical tools when we have interval level measurement.
We can't use the example of party identification here because ordinal is
as precise as we are able to get on party identification. (Maybe someday
someone will come up with some kind of psychological units that can be
applied). But we can count votes, get family or hourly income in dollars (be
careful here, because income is often grouped‑‑like $10,000 to
$20,000 a year‑‑and then it becomes ordinal rather than interval),
get a precise age in years (if you think about it, age in years is also really
grouped data, but we'll not get too nit picking), or how many times one has
voted in the past four general elections. In all of these cases, we have
interval measurements.
VI. Statistics ‑‑ a few
basics
Statistics often intimidate people. Having struggled myself for a number
of years with statistics and having taught it to others, I am convinced that
this intimidation results from not knowing what one is trying to accomplish
with statistics. Students learn formulas and apply them to numbers without
really understanding WHY they are doing this in the first place. They get lost
in the trees and lose sight of the forest.
We're not going to go into much detail here (only a few trees), but you
are going to be faced with statistics all your life, so you should understand
what their purpose is. Statistics are also an important part of political
science. My purpose here is to give you an overview of this forest.
Cynics often say that you can make statistics say anything you want
them to. People say that statistics lie. In fact, statistics don't lie,
people do. People make unreliable and invalid measures and then produce
inappropriate statistics and present them to someone who doesn't know the right
questions to ask. I hope I can suggest a few of those right questions for you
as well as help you understand the purpose for using these complicated things
in the first place. If you ever take a statistics course (as I hope you will),
you should ask yourself "why am I using this statistic, where am I going
with this, what am I trying to show or find out" on at least a daily basis
(every five minutes would be better)!
A. Definition and purpose of statistics
Statistics are numbers that summarize some quality or characteristic
of data. Why do we want to do that? The answer is to make things more
simple, not complex, as unfortunately often seems the case. We put data (at all
levels of measurement) into numerical form in order to condense, summarize,
interpret, and analyze when we have to make decisions.
Let me illustrate with a simple example. I have asked you a lot of
questions in the course of the semester and I've kept records in many cases of
whether or not your answers were right or wrong--grades. Consider your grades
to be data. I want to somehow condense all of these data into one single
measure so that how much you learned can be quickly summarized in a condensed
form on a record (transcript) that future prospective employers can use to
evaluate you. I use statistics to do that. You get a grade on each test (a
statistic), grades are averaged together (another statistic), translated into a
letter grade (another statistic), and then letter grades are averaged together
in a weighted way (according to the number of hours per course) to produce a
grade point average (another statistic). Now all of this is so familiar to you
that you take it for granted. Nevertheless, it is an excellent example of
statistics that condense and summarize human behavior.
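The last step, the weighted average, can be sketched in Python. The three courses, their credit hours, and the 4.0 grade point scale below are assumptions made up for this illustration.

```python
# Hypothetical transcript: (grade points for the letter grade, credit hours).
# The 4.0 scale and the three courses are invented for this example.
courses = [(4.0, 3), (3.0, 4), (2.0, 3)]

def grade_point_average(courses):
    """Average of grade points, weighted by the credit hours of each course."""
    total_points = sum(points * hours for points, hours in courses)
    total_hours = sum(hours for _, hours in courses)
    return total_points / total_hours

gpa = grade_point_average(courses)
print(gpa)
```

Here the four hour course counts more heavily than either three hour course, which is all that "weighted" means.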
B. Two general types of statistics
1. Descriptive
Descriptive statistics are numbers that summarize some quality or
aspect of data. The key word here is summarize. The idea is to
simplify so that one number can be used to convey a lot of meaning about the
data. Your test average summarizes a lot of meaning about your performance on
all tests.
2. Inferential
Inferential statistics are measures that go beyond description and
allow us to infer something beyond the data. In inferential statistics, we
go from a few particular cases to larger more general conclusions.
All statistics about populations based on surveys involve inferential
statistics. Anytime you take a statistic for a sample, like the average income
of a sample of 1000 Americans, and infer the average income for all Americans
from that statistic, the inferred average is called an inferential statistic.
Therefore, descriptive statistics are often used along with the laws of
probability to create inferential statistics. The laws of probability can tell
us how likely our inferred average income is to be within some given distance
of the actual average income. Virtually all public opinion surveys do this kind
of thing. We infer what percentage of people will actually vote for a candidate
from a sample of people and then add an error factor called sampling error
(plus or minus some percentage depending
on the size of the sample). Remember, the point here is to INFER.
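To see how sampling error depends on sample size, here is a minimal Python sketch. It uses the usual textbook approximation for the sampling error of a percentage, which is not derived in this chapter; it assumes a simple random sample and the conventional 95 percent confidence level (the 1.96 in the code). The function name sampling_error is just a label for this sketch.

```python
import math

def sampling_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p based on n respondents.

    Assumes a simple random sample; z=1.96 corresponds to 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# A candidate supported by half the sample, at three different sample sizes.
for n in (100, 400, 1000):
    print(n, round(100 * sampling_error(0.5, n), 1))
```

Under these assumptions a sample of 1,000 gives an error of about plus or minus 3 percentage points, and smaller samples give larger errors.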
C. Examples
Let's start with some data and then use it to illustrate the different
kinds of statistics as we go along. Suppose that we have a small nation (very
small to make it simple) of 22 people. You can add as many zeros as you want to
make it larger and more realistic. Adding zeros does not change the math except
that zeros get added to the answers as well. Further suppose that of these 22
people, 5 had yearly incomes of $6,000 in American dollars; 4 incomes of $8,000;
3 at $10,000; 3 at $12,000; 3 at $14,000; 2 at $16,000; 1 at $18,000; and 1 at
$20,000.
Before we go on, I should note that I have already helped you a lot by
rearranging and condensing the data. What you would probably have to start with
is a list of individuals with their incomes in no particular order ($10,000,
$20,000, $6,000, $6,000, $12,000, ... $18,000). I have already rearranged the
data by counting the frequency of people at each income level.
1. Descriptive Statistics
a. Measures of central tendency
Suppose you wanted to tell what the typical income was for this nation.
Maybe you are interested in comparing its prosperity or standard of living to other
nations and want to use income as part of that measure. Or perhaps you work for
the nation's government and are producing a brochure to attract new people to
move here. In either case you want something to quickly tell about the income
of all the people who live here without having to list all the incomes. Another
way of saying this is that you want a statistic that tells about what the
center of the data is like.
We have invented some numbers (statistics) that tell us what the center
of the data is like. We call them measures of central tendency. Which ones we can
use depends on the level of measurement we have and exactly what we want to know
about the center of the data. Each measure tells us something a little
different about the center.
1) Mean
You are already familiar with the mean. You probably know it as the
"average." To be more precise, it is the arithmetic average.
You simply add up the measurements and divide by the number of measurements
that were made (one for each unit). In our example the units are people and the
measurements are incomes. If you compute the mean for our little nation, it is
about $10,900 ($240,000 in total income divided by 22 people), a very respectable
number on a per person (usually called "per capita") basis.
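If you want to check the arithmetic, the computation can be sketched in Python. The data are the 22 incomes from our example; the variable names are only labels chosen for this sketch.

```python
# Frequency distribution from the example: income -> number of people.
incomes = {6000: 5, 8000: 4, 10000: 3, 12000: 3,
           14000: 3, 16000: 2, 18000: 1, 20000: 1}

# Add up all the measurements (each income counted once per person).
total_income = sum(value * count for value, count in incomes.items())

# Divide by the number of measurements (one per person).
people = sum(incomes.values())
mean_income = total_income / people

print(total_income, people, round(mean_income))
```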
The mean gives us the mathematical center where all the units have some
influence. The problem is that extreme units have more influence than ones
close to the center. For example, which grade affects your final average the
most, the 87, the 93, the 79, the 83, or the 32 on that test you shouldn't have
taken because you were sick the night before?
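A quick way to see the pull of that one extreme grade is to compute the mean with and without it. This is only a sketch using the five grades just mentioned.

```python
from statistics import mean

grades = [87, 93, 79, 83, 32]

# The same grades with the one extreme score set aside.
without_low = [g for g in grades if g != 32]

print(mean(grades))       # mean with the 32 included
print(mean(without_low))  # mean of the other four grades
```

The single 32 drags the average down by more than ten points, even though four of the five grades sit between 79 and 93.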
You can only compute means for interval level measurements. Why?
Well, if you think about it, computing a mean for ordinal or nominal data
wouldn't make sense because we couldn't add the measurements together. Suppose
we had measured the income in three groups: low (under $10,000), medium
($10,000 to $15,000), and high (over $15,000). How do we add together 9 lows
with 9 mediums and 4 highs? The addition can't be done using everyday
arithmetic. If we had some other kind of measurement that was nominal in nature
(e.g. race, with, say, 10 blacks and 12 whites), we still could not compute any
mean. So we have other measures of central tendency we use for these other levels
of measurement.
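The grouping step can be sketched in Python as follows. The cutoffs are the ones given above; the function name income_group is just a label for this sketch. Notice that once incomes are collapsed into groups, all that is left to work with is counts of ordered categories.

```python
from collections import Counter

# Frequency distribution from the example: income -> number of people.
incomes = {6000: 5, 8000: 4, 10000: 3, 12000: 3,
           14000: 3, 16000: 2, 18000: 1, 20000: 1}

def income_group(income):
    """Collapse an exact income into the three ordered groups used in the text."""
    if income < 10000:
        return "low"
    if income <= 15000:
        return "medium"
    return "high"

# Count how many people fall into each group.
groups = Counter()
for income, count in incomes.items():
    groups[income_group(income)] += count

print(dict(groups))
```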
2) Median
The median is the value of the middle unit after all the units have
been arranged in order of magnitude (from lowest to highest or the other
way around). You might think of the median as the center in terms of what the
unit in the middle looks like.
This measure of central tendency can be used for either interval or
ordinal measurements. All we need to do is order the measurements and then
count to the middle one. If we have an even number of measurements, then we go
halfway between the two at the middle.
For example, suppose we want the median of the following party
identifications: strong Democrat, strong Democrat, weak Democrat, weak
Democrat, no identification, moderate Republican, strong Republican. We have 7
measurements. I listed them in order, so we merely count to the middle one (the
4th), and get weak Democrat as the middle one. Suppose we only had 6
measurements by striking out one of the weak Democrats. Then we would have to
go halfway between the 3rd and the 4th measurement (between the weak Democrat
and the non‑identifier). What we would have to say here is that the
median is "between weak Democrat and non‑identification."
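The counting procedure for ordinal data can be sketched in Python. The helper name ordinal_median is made up for this illustration; it returns one category when there is a single middle value and two categories when the median falls between them.

```python
# Ordered categories, from most Democratic to most Republican.
SCALE = ["strong Democrat", "moderate Democrat", "weak Democrat",
         "no identification",
         "weak Republican", "moderate Republican", "strong Republican"]

def ordinal_median(responses):
    """Return the middle category, or the two middle categories if they differ."""
    ordered = sorted(responses, key=SCALE.index)
    n = len(ordered)
    if n % 2 == 1:
        return (ordered[n // 2],)
    low, high = ordered[n // 2 - 1], ordered[n // 2]
    return (low,) if low == high else (low, high)

seven = ["strong Democrat", "strong Democrat", "weak Democrat", "weak Democrat",
         "no identification", "moderate Republican", "strong Republican"]
six = list(seven)
six.remove("weak Democrat")  # strike out one of the weak Democrats

print(ordinal_median(seven))
print(ordinal_median(six))
```

No arithmetic is done on the categories themselves. Only their order is used, which is why this works for ordinal data.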
Now let's use the example of incomes. If we ordered the 22 measurements
of income, the middle two (the 11th and 12th) are both $10,000. So the median
is simply $10,000 a year. You should verify to yourself that this is the
correct answer by going back to where I introduced this example and counting your
way up to the 11th and 12th measurements. (If you can't get it to work out, ASK
me to go over it for you. You'll be expected to do it on the next test on your
own!)
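One way to check your counting is with a short Python sketch that lays out all 22 incomes in order and looks at the two in the middle. The variable names are only labels for this sketch.

```python
from statistics import median

# Frequency distribution from the example: income -> number of people.
incomes = {6000: 5, 8000: 4, 10000: 3, 12000: 3,
           14000: 3, 16000: 2, 18000: 1, 20000: 1}

# Expand the frequency table into one ordered list of 22 incomes.
ordered = sorted(value for value, count in incomes.items() for _ in range(count))

# The 11th and 12th incomes (list positions start at 0, so these are 10 and 11).
print(ordered[10], ordered[11])

# The library function goes halfway between the two middle values.
print(median(ordered))
```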
Suppose we had measured income as low, medium, and high as shown in our
discussion of means. You saw above that we could not compute a mean. But since
this is ordinal measurement, we can compute a median. It again would be between the 11th and 12th
measures. Because there were 9 low incomes and 9 mediums (see above), both the
11th and the 12th would be in the medium category. Therefore, the median would
be "medium income" ($10,000 to $15,000).
3) Mode
The mode is the measurement that occurs most often. The mode tells
us what the most typical real unit looks like. All you have to do is group the
measurements and see how many occur for each value or category, and pick the
one with the most. That one defines the mode.
If you think about the way this is defined, the mode can be used for
all levels of measurement. If we were looking for the most typical race
(nominal level measurement) and there were 12 whites and 10 blacks in our
nation, our best guess would be white. White is the "modal" racial
category.
If we had measured income as
low, medium, and high as discussed above (ordinal level measurement), we would
have no unique mode as there are 9 measurements in both the low and medium
categories. So in this case we would have to say that two modes exist (or it is
"bimodal").
If we use our original measures of exact incomes (interval level
measurement), we can still compute a mode. It is $6,000 because that is the
measurement that appears most often.
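Finding the mode is just grouping and counting, which a short Python sketch can show for all three levels of measurement. The function name `modes` is a hypothetical helper invented for this illustration; the data are the race, income category, and exact income examples used above.

```python
from collections import Counter

def modes(measurements):
    """Return every value that ties for the highest frequency."""
    counts = Counter(measurements)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]

# Nominal: 12 whites and 10 blacks.
races = ["white"] * 12 + ["black"] * 10

# Ordinal: 9 low, 9 medium, and 4 high incomes.
categories = ["low"] * 9 + ["medium"] * 9 + ["high"] * 4

# Interval: the 22 exact incomes.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

print(modes(races))       # one modal category
print(modes(categories))  # two categories tie, so the data are bimodal
print(modes(incomes))     # one modal income
```

Returning a list rather than a single value is a deliberate choice here, so that a tie shows up as two modes instead of being hidden.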
4) Relationship to the frequency distribution‑‑or
why you should ask for all three
The frequency distribution is the frequency of units at each measurement or value. When I presented the data from the incomes example to you originally, I gave you the frequency distribution‑‑5 at the value of $6,000 and so on. Frequency distributions are often presented in tables and graphs. In tabular form our data would look as follows:
VALUE        FREQUENCY
$6,000           5
$8,000           4
$10,000          3
$12,000          3
$14,000          3
$16,000          2
$18,000          1
$20,000          1
               ____
TOTAL           22
In bar
graph form, the frequency distribution for these data would be shown as
follows:
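A rough text version of such a bar graph can be produced with a short Python sketch. This is only one possible way to draw it: the helper name `bar_graph_lines` and the use of `#` marks for the bars are assumptions made for this illustration, while the values and frequencies are those in the table above.

```python
from collections import Counter

# The 22 incomes from the example.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

# Count how many units fall at each value (the frequency distribution).
frequency = Counter(incomes)

def bar_graph_lines(freq):
    """Build one text bar per value, lowest value first."""
    return [f"${value:>6,} | {'#' * count} ({count})"
            for value, count in sorted(freq.items())]

for line in bar_graph_lines(frequency):
    print(line)
```

The longest bar sits at the lowest value, and the bars trail off toward the high incomes.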

If a picture is worth a thousand words,
then perhaps the three statistics that we have computed for these data (the
mean, median, and mode) are almost worth a thousand words as well. Why? Knowing
the three measures of central tendency tells us a great deal about what the
frequency distribution looks like‑‑IF we know how to interpret
them. Let's write them down and see.
mean: $10,900
median: $10,000
mode: $6,000
The first thing you should notice is that they are different. Why? They are different because of the shape of the distribution. If we had a perfect "bell shaped" distribution, all three measures would be the same. Suppose the mean, median, and mode were all at $13,000. That would mean that just as many people were above $13,000 as below it, and that the most frequent income was $13,000. You could still have extreme cases at the ends, but they would have to balance each other out.
What has happened in the actual distribution
is that having most of the values above the single most frequent one (the mode)
pulled the median above the mode and a few extreme high income values pulled
the mean over the median. If the median and mean had been below the mode, then
we would know that most of the values fell below the mode and a few extreme
values fell way below the mode and median.
So what? If you know the three measures of
central tendency, you know something about the shape of the distribution. If
they are the same, the distribution is most likely balanced around its center, as a "bell shaped" distribution is. If they are different, the distribution is
stretched (the formal term here is "skewed")
toward whichever extreme the mean is at‑‑in this case, skewed high
or to the right.
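Here is a minimal Python sketch that computes all three averages for the income data and reads the direction of the skew from them. It uses Python's standard `statistics` module. The function name `skew_direction` and its wording are hypothetical, and comparing the mean with the median is only the rule of thumb described above, not a formal test of skewness.

```python
import statistics

# The 22 incomes from the example.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

mean = statistics.mean(incomes)      # sum of incomes divided by 22
median = statistics.median(incomes)  # halfway between the 11th and 12th
mode = statistics.mode(incomes)      # the most frequent income

def skew_direction(mean, median):
    """Rule of thumb: the mean is pulled toward the stretched-out end."""
    if mean > median:
        return "skewed high (right)"
    if mean < median:
        return "skewed low (left)"
    return "no skew indicated"

print(round(mean), median, mode)
print(skew_direction(mean, median))
```

The exact mean is about $10,909, which rounds to the $10,900 used in the text.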
The second answer to "So what?" is that it
matters to know all three because ALL THREE CAN BE CALLED AVERAGES. The person
who is presenting the data may have a reason to make the average seem high or
low. If they wanted to attract more people to move to our little nation, obviously
they would use the mean. If they wanted to attract business looking for people
willing to work for low wages, they would use the mode.
The moral of this story: when
someone presents you with an "average" in order to prove something,
ask WHAT KIND OF AVERAGE and ASK FOR OTHER "AVERAGES" as well.
Knowing may prove valuable to you!
b. Measures of dispersion
By measures of dispersion we mean statistics
that summarize how spread‑out the data are. We will present three of
these below.
1) range
Range refers to the distance between
the extreme values of the measurements. Range is computed by simply
subtracting the low extreme from the high extreme. This does not make any sense
unless we have interval level measurement because at the two lower levels
(ordinal and nominal), we cannot even talk about distance.
In our income example, the range would be
$14,000. You get this by subtracting $6,000 from $20,000. A $14,000 difference
exists between the lowest unit and the highest unit.
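As a quick check, the same subtraction can be written as a small Python sketch. The variable names are chosen only for this illustration.

```python
# The 22 incomes from the example.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

# Range: the highest measurement minus the lowest measurement.
income_range = max(incomes) - min(incomes)
print(income_range)
```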
2) percentile
Percentile is defined as the value or
measurement below which some given percent of the scores fall. You've probably
seen percentiles in reporting the results of standardized tests like SAT
scores. Percentiles can be used for both interval and ordinal data, because all
we need is ordered data, not exact measurements.
Again, using the income data, the 50th
percentile would be the measurement below which 50% of the measurements fall.
That would be the bottom eleven (.5 x 22 = 11). Counting up, the 12th measurement
is $10,000, so the 50th percentile is
then $10,000. (If you think about it, the 50th percentile is always the same as
the median.) The 90th percentile would be the measurement below which 90% of the
scores fall. That's .9 x 22 = 19.8 or 20 scores (rounding up). Counting
up, $18,000 is the 21st score‑‑20 scores fall below it. So $18,000
is the 90th percentile, or as close as we can get to it with rounding. To be
more precise, $18,000 is actually the 90.9th percentile because 20 of 22 scores
fall below it (20/22 = .909).
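The counting procedure used here can be sketched in Python as follows. Both function names are hypothetical helpers written for this illustration, and the rule (multiply the percent by the number of scores, round up, then take the next score in order) is simply the one worked through above. Statistical packages use several different conventions for percentiles, especially for ties and in-between positions, so their answers may differ slightly.

```python
import math

def percentile_by_counting(ordered, percent):
    """Find a percentile by counting up through scores that are already in order."""
    n = len(ordered)
    below = math.ceil(percent * n / 100)  # how many scores must fall below (rounded up)
    below = min(below, n - 1)             # never count past the last score
    return ordered[below]                 # the next score after those counted

def percent_below(ordered, value):
    """Percent of scores that are strictly lower than the given value."""
    return 100 * sum(1 for score in ordered if score < value) / len(ordered)

# The 22 incomes from the example, lowest to highest.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

print(percentile_by_counting(incomes, 50))      # the 12th score
print(percentile_by_counting(incomes, 90))      # the 21st score
print(round(percent_below(incomes, 18000), 1))  # 20 of 22 scores, as a percent
```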
If you scored in the 65th percentile on
your SATs, that means that 65% of those who took the test scored below you and
35% scored at or above your level.
3) standard deviation
This is the most complicated measure of
dispersion. Mathematically, it is
defined as the square root of the average (mean) squared distance from the
mean.
I know that sounds complicated. But if you
think about how standard deviation is computed, it makes sense. What we want
here is an average distance from the mean. The bigger this average distance,
the more spread out the data. That much should make sense. But why the squaring
and then square root? The answer is to get rid of the minus signs. Because some
of the scores are above and some below the mean, simply adding up the distances would
cause them to cancel each other out. In fact, you would get zero every time
because the average signed distance from the mean is always zero. That follows from how the mean is
computed: the cases on one side exactly cancel out those on the other. So we
square to get rid of the minus signs and then take the square root after we
compute the average squared distance. (By the way, the average squared distance
is called the variance, and is used in building a lot of other statistical
formulas that need to account for how spread out the data are.)
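The steps in this definition can be followed one at a time in a short Python sketch. To keep the arithmetic easy to check by hand, it uses eight made-up scores rather than the income data. The scores and the function names are assumptions for this illustration only. Following the definition above, it divides by the number of scores. Many statistical packages divide by one less than that when working with samples, so their results can differ slightly.

```python
import math

def variance(scores):
    """Average squared distance from the mean."""
    mean = sum(scores) / len(scores)
    squared_distances = [(score - mean) ** 2 for score in scores]
    return sum(squared_distances) / len(scores)

def standard_deviation(scores):
    """Square root of the average squared distance from the mean."""
    return math.sqrt(variance(scores))

# Eight hypothetical scores (not the income data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]
mean = sum(scores) / len(scores)

# Signed distances cancel out, which is why we square them first.
signed_distances = [score - mean for score in scores]

print(mean)                        # 5.0
print(sum(signed_distances))       # 0.0
print(variance(scores))            # 4.0
print(standard_deviation(scores))  # 2.0
```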
If you really want to test yourself, try
to compute the standard deviation of our income data using the definition as a
guide. Show me what you did and I'll tell you if you did it right!
c. Measures of relationships
Often we are interested in seeing if a
relationship exists between changes in two variables. Does income change as
education changes? Does party identification change as income changes? These
are the kinds of questions that we must answer in order to test hypotheses and
build scientific explanations. (Remember the arrow diagrams?)
You can do this in many ways.
Unfortunately (or maybe fortunately?), most are beyond the scope of this
course. However, you will learn one way of doing this in the next chapter‑‑crosstabulations
(sometimes called two-way frequency distributions).
2. Inferential:
moving from a sample to a population
Anytime we move from a sample to make
generalizations about some larger population from which the sample was chosen,
we have entered the realm of inferential statistics. Therefore, all the descriptive
statistics we have discussed could be used as inferential statistics if they
were first computed for a sample and then we inferred from them the same
descriptive things about the general population from which the sample was
chosen. So you really have nothing new to learn here except the notion of
sampling, and the constraints on inferring, called significance. Here I merely
want to introduce the notions. You will learn more about them in the next
chapter, which concerns public opinion and surveying.
You might think about our little nation
and incomes that we have been using as an example up to this point. Suppose
that we are no longer talking about the whole nation, but rather a sample of 22
people from a much larger nation. Then we would use the descriptive statistics
that we calculated for the sample to infer descriptive things about the whole
nation.
a. Sampling
All we mean here is the process of
choosing a sample from some larger population. Ideally, we want to choose
the sample so that every unit in the larger population has an equal chance
of being chosen‑‑that's called a simple random sample. The goal is to choose a sample that is
representative of the general population, and getting a simple random sample is
one excellent way of doing that. How this is done in practice is discussed in
the next chapter.
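To make the idea concrete, here is a minimal Python sketch of drawing a simple random sample. The population of 1,000 labeled people, the sample size of 22, and the seed value are all hypothetical choices for this illustration. Real surveys need a list of the actual population, which is the hard part in practice.

```python
import random

# A hypothetical population of 1,000 people, identified only by label.
population = [f"person_{i}" for i in range(1, 1001)]

# A seeded generator so the illustration gives the same sample each time.
rng = random.Random(2011)

# Draw 22 people without replacement; every person has an equal chance.
sample = rng.sample(population, 22)

print(len(sample))
print(sample[:5])
```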
b. Statistics that measure
"significance"‑‑or how likely is it that my sample
statistics are close to being right for the larger population
In science generally and in survey
research particularly, we take a very conservative position on significance. We
are unwilling to accept any inferred fact unless there is at least a 95% chance
that we are correct (or no more than a 5% chance of being wrong). All of our formulas for
calculating sampling error are based on this position (the 5% cutoff is called the significance level). You will see this
again in the next chapter in the discussion of the expected error in public
opinion surveys. To put this a slightly different way, if the sampling error in
a survey is said to be plus or minus 3%, then a 5% chance exists that the truth
for the population is outside of this range.
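For readers who want to see where a figure like "plus or minus 3%" can come from, here is a hedged Python sketch. It is not taken from this chapter. It uses the common textbook approximation for a percentage estimated from a simple random sample, with 1.96 as the multiplier that corresponds to the 95% level. The function name, the default values, and the sample size of 1,067 are assumptions made for this illustration.

```python
import math

def margin_of_error(sample_size, proportion=0.5, z=1.96):
    """Approximate sampling error for a proportion at the 95% level (z = 1.96).

    Assumes a simple random sample; proportion=0.5 gives the widest margin.
    """
    return z * math.sqrt(proportion * (1 - proportion) / sample_size)

# A hypothetical survey of 1,067 people.
error = margin_of_error(1067)
print(round(100 * error, 1))  # sampling error in percentage points
```

Under these assumptions, a sample of roughly a thousand people gives a sampling error of about plus or minus 3 percentage points.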
Knowing the things we have discussed will
certainly not make you an expert in statistics. But at least you get an idea of
some of the kinds of questions you need to ask when someone attempts to
"prove" their point by statistics. If you don't know what else to
ask, just ask "Exactly how did you measure that?" or "How did
you compute that?"
KEY TERMS
scientific
replication
three goals of science
relationship between explanation and prediction
seven steps of scientific research
assumptions of scientific research
conditional explanations
probabilistic explanations
partial explanations
unexplained variance
open explanations
units of analysis
properties and variables
constants
causal relationships
arrow diagrams
independent variable
dependent variable
intervening variable
conditioning variables
reciprocal relationships
symmetrical relationships
spurious (non)relationships
controlling for a variable
reliability of measures
validity of measures
three levels of measurement: nominal, ordinal, & interval
statistics
descriptive statistics
inferential statistics
measures of central tendency
mean
median
mode
frequency distributions
bar graphs
bell shaped distributions
measures of dispersion
range
percentile
standard deviation
sampling
significance level