Chapter 3.  Scientific Methodology and Statistics

Last updated 9-20-2012

Copyright 2009-11 Robert E. Botsch

 

 

There was a man who drowned crossing a stream with an average depth of six inches.  Anonymous

 

OUTLINE

 

I. What we mean by "scientific"

 

II. The goals of science

    A. Describe

    B. Explain

    C. Predict

    D. Relationships between explanation and prediction

 

III. Steps of scientific research and underlying assumptions

    A. Steps

       1. Problem selection

       2. Theory formulation

       3. Hypothesis development

       4. Operationalization

       5. Data gathering

       6. Analysis and hypothesis testing

       7. Theory reformulation

    B. Underlying assumptions

       1. Human behavior occurs in regular patterns

       2. Reason allows us to observe and discover these patterns

       3. Patterns are governed by laws

       4. Reason allows us to discover these laws

       5. Scientific knowledge is "intersubjective"

       6. The goal of science is to increase knowledge, not apply it

 

IV. Characteristics of scientific explanations

    A. Conditional

    B. Probabilistic

    C. Partial

    D. Open

 

V. Some key ideas in scientific research

    A. Units of analysis

    B. Properties and variables

    C. Possible relationships among variables

       1. Causal

       2. Conditioning 

       3. Reciprocal

       4. Symmetrical

       5. Spurious

       6. Controlling for a third variable

    D. Measurement

       1. Criteria of accuracy

          a. Reliability

          b. Validity

       2. Levels of measurement

          a. Nominal or qualitative

          b. Ordinal or ranked

          c. Interval

 

VI. Statistics‑‑a few basics

    A. Definition and purpose of statistics

    B. Two general types of statistics

       1. Descriptive

       2. Inferential

    C. Examples

       1. Descriptive

          a. Measures of central tendency

             1) Mean

             2) Median

             3) Mode

             4) Relationship to the frequency distribution‑‑or why you should ask for all three

          b. Measures of dispersion

             1) range

             2) percentile

             3) standard deviation

          c. Measures of relationships

       2. Inferential: moving from a sample to a population

          a. Sampling

          b. Statistics that measure "significance"‑‑or how likely is it that my sample statistics are close to being right

 

 

 


TEXT

 

I. What we mean by "scientific"

 

     We have been talking a great deal about political science in terms of its history, evolution, and the kinds of theories that make up the discipline. Exactly what makes political science a "science"? What do we mean when we say "scientific"? From what we have already said, you should have a pretty good idea. For example, you already should know that the theories we labeled as "empirical" are more scientific than "normative" theories. We can test empirical theories based on observations. Normative theories can't ever be tested completely. Empirical theories involve "fact" statements while normative theories involve "value" statements. If you remember this much, that's great. You're well on your way to understanding this material. If not, go back and look at those ideas again.

 

     Generally speaking, what we mean by scientific is being very precise and self‑conscious in our methods of finding out about things. In fact, we want to be so precise that someone else can duplicate what we have done and see if they get the same results. This is called replication. To make our research replicable, we must describe every detail of what we have done and how we have done it. If the research can't be replicated so that the same conclusions can be reached, then whatever those conclusions are (a theory, a hypothesis, a fact) cannot be considered scientific.

 

 

II. The goals of science

 

     The goals of all scientific research, whether in politics or in biology, are to describe, explain, and predict. We mentioned this earlier in discussing the definition of theory. Theories help us understand political behavior by focusing our attention and questions on certain properties of that behavior so that we are better able to describe, explain, and predict. In the case of political science research and theory, our aim is to understand political behavior. 

 

    A. Describe

 

     This is usually the first step. We have to be able to describe what we are observing before we can explain it or predict what will happen. Quite frankly, this is the stage where much political science research is right now and has been for some time. We are still struggling to identify all the important variables in such areas as bureaucratic organizational behavior, campaign contributions, political psychology, and pressure group organization. In other areas such as voting behavior, we have pretty well established what the important variables are: party identification, incumbency, name recognition, ideological orientation, satisfaction with job performance, personal image of the incumbent (trust, leadership, competence), expectations for the future (is the country/state/whatever on the right track), and personal background characteristics. Those who study voting behavior now spend most of their time combining all these variables in increasingly sophisticated mathematical models and fine tuning their measurement methods.

 

    B. Explain

   

     Once we figure out what we are describing and the terms, concepts, and variables that describe this phenomenon, we begin to see relationships among these things. Marx sought to explain why one stage of history followed another after he developed the concept of dialectical materialism. Systems theorists first developed the concepts of input, output, and feedback before they could talk about the relationship between system persistence and feedback and support. Explanations are more than just associations between things. They are the LOGIC and REASONING behind the associations. They answer the question of WHY something happened. The content of an explanation, (that is, its logic and reasoning, concepts, and variables) is another way of thinking about theory.

 

    C. Predict

 

     Once you can explain, you are well on your way to prediction. If you know that event B followed event A in the past and you have a logical explanation linking the two, then you can predict B whenever you observe that A is taking place. However, prediction is always more risky because situations are rarely precisely the same as they were in the past. That is to say, we may once again observe something that looks like A, but it is usually not quite like A. In complex human relations, just knowing about the past changes the present. Therefore, the conditions that define reality are constantly changing--we are shooting at a moving target. Hopefully, the change is slight enough so that we can still make reasonably correct predictions. 


     For example, we know that the powers of incumbency pose great difficulty to anyone who wishes to deny an incumbent her or his party's re-nomination‑‑if the incumbent desires re-nomination. So if you have event A, that is, an incumbent who desires re-nomination, you can usually predict that she or he will win it. Such predictions held for President Gerald Ford in 1976, President Jimmy Carter in 1980, George H.W. Bush in 1992, Bill Clinton in 1996, George W. Bush in 2004, and Barack Obama in 2012. All except Clinton, George W. Bush, and Obama had serious opposition from within their own political party (Ronald Reagan, Ted Kennedy, and Pat Buchanan, respectively). (Ironically, the Democrats' defeat in the 1994 congressional elections helped Clinton. No other Democrat thought the nomination in '96 would be worth having because in 1995 it appeared that the Republicans would almost surely win. By late 1995, when Clinton's chances improved dramatically, it was too late to get a viable campaign started.) However, this explanation/prediction may not always hold because the relationship may be changing or because other related factors (that perhaps we don't fully know about in our theory) are not the same as before. For example, as a political party becomes weaker in terms of both its organization and its place in the minds of average citizens, the ability of the president to use his powers as head of his party to guarantee his re-nomination may be weakened.

 

     The way the media describes presidential elections is another good example. Elections are frequently seen in terms of their similarity to past elections. Yet no two elections are ever exactly the same. Was the election of 1992 (incumbent Bush versus Clinton) closest to 1948 when an embattled incumbent (Harry Truman) ran against what he called a "do nothing" Congress? Was it closer to 1960 when a new generation of leaders with new ideas wanted to replace an older generation? Was it closer to 1976 when voters were clearly dissatisfied with the incumbent but were a little afraid of the inexperienced challenger? Was it like the 1980 election in that the economy was the deciding factor? 1992 did share something in common with all of them. Which fit best? Choosing the best fit and best explanation is much easier once the election is over (though it is still not a simple matter).

 

     However, trying to predict which explanation will fit for an upcoming election is much harder. After the Gulf War of 1990-91, who would have predicted any explanation that pointed toward George H.W. Bush's defeat? Those who predicted in 1995 that an unpopular Bill Clinton would follow George Bush's steps as a one-term president wished to take their prediction back a few months later. Clinton won easily over Bob Dole.

 

     On the other hand, the unhappiness of the nation with the W. Bush presidency did make the prediction of Republican defeat in 2008 fairly easy. The winner was going to be the candidate least like W. Bush, and despite gallant efforts from John McCain to present himself as a different kind of Republican, a "maverick," Obama was less like Bush and more clearly associated with what the nation wanted--change.

 

     In the 2010 congressional elections one could safely predict that the Democrats would lose seats, because the party that picks up both seats and the White House almost always loses seats in the next non-presidential election, especially if the economy is doing poorly. But how many seats would the Democrats lose? Would it be more like 1994, when Republicans took over both houses in what was called a "Republican tsunami?" Or would it be closer to the usual number, so that Democrats maintained control of both houses, but by much closer margins? That was a much more difficult prediction, depending on last-minute events, on some "fortuna," and on the qualities and skills of particular candidates in their campaigns. It turned out to be more than a tsunami in the House, but the Democrats barely held the Senate. 2012 would normally look very bad for an incumbent president with an unemployment rate over 8% and an economy recovering very slowly, especially in new jobs. But other factors, like a possibly weak Republican nominee who was perceived to be selling ideas that contributed to the economic collapse in 2007 and who seemed out of touch with average people, could help the incumbent hang on. With a lot of variables and so many unknowns (including debate performances, late changes in the economy, or some unanticipated crisis at home or abroad), a prediction is hard.

 

    D. Relationships between explanation and prediction 

 

     From the above discussion, you should have noted that there is a time relationship between explanation and prediction. Explanation is post hoc, that is, after the fact, while prediction takes place before the fact. 

 

     You may also have noted that our ability to predict seems to depend on our ability to explain. Although that is usually correct, it's not quite all that simple. In fact, sometimes we can predict when we can't explain. It's also possible that we can sometimes explain and cannot (or will not) use that explanation to predict. Let's take these two special cases and look at each separately.

 

       1. Predictions that are not explained.

 

     Sometimes we know the conditions that precede an event, but don't know what the precise logical connections are. To put this another way, we sometimes have a well-established relationship without much theory. We can't fully explain the relationship.

 

     Natural science provides many examples of this, where certain situations are strongly associated with particular results, but WHY is not clearly known. For example, byssinosis or "brown lung" is a lung disorder that is strongly associated with exposure to cotton dust. This was almost certainly not just an accidental correlation, given the large number of studies that came to the same conclusion and the logic that breathing in dust over a long time increases the probability of lung problems. The relationship is so strong that the government issued regulations regarding cotton dust in textile plants. However, we did not know for sure what substance in the cotton dust caused the disorder. Nor did we know exactly how and why cells in the lungs reacted as they do to that substance.

 

     In political science we try to avoid this problem. We insist that researchers only look at relationships that are suggested by some theory or explanation. If we happen upon a relationship that lacks a theory (like the relationship between the league of the team that wins the professional baseball World Series and which party wins the presidential election in that year), we don't take it seriously. We refuse to make serious predictions. Doing so would be criticized as "unprofessional"--a negative label.

 

       2. Explanations that cannot (or will not) be used for prediction.

 

     This situation arises when we know the relationship between certain conditions and an event, but we are either unable or are unwilling to measure these conditions or make the prediction.

 

     For example, many theories explain the outbreak of wars. Some of them focus on the mental conditions and outlooks of political leaders. But, for obvious reasons, leaders are unwilling to let us find out what is going on inside of their heads. Moreover, even if they were willing, we don't have the time or ability to monitor this kind of thing on an ongoing basis. So political reality and resources limit us here.

 

     We may also be limited for ethical or normative reasons. For example, we have many theories that explain criminal behavior in terms of background and attitudes. Researchers could develop tests that could identify those who are likely to engage in violent criminal acts with a fairly high degree of accuracy. Those tests could be used to identify these people at a fairly early age. Then they could be given counseling or perhaps even isolated from society. Would you be willing to do this‑‑even if we could be, say, 90% accurate? Probably not, and the reason is that it violates very strong and basic values of justice. We believe that people should not be singled out and "punished" for something that they have not in fact actually done. This would go beyond even "thought crimes." This would create a new category that could be called "predisposition crimes." How accurate would such a test have to be before you would be willing to condone its use? That's an interesting question to think about. Even if it were 100% accurate, would you be willing to create a set of laws that punish people for "criminally predisposed personalities?" This is what the 2002 science fiction Tom Cruise movie "Minority Report" was about.

 

     Modern medicine allows us to test the genetic predisposition people have for many diseases today. In some cases, insurance companies have refused to give coverage to people having these predispositions. From the insurance company's point of view, the refusal keeps costs down for them and others who pay premiums. But the refusal strikes many others as extremely callous. The company seems to care only about profits--that is the nature of corporations. Should insurance only be for the healthy? Genetic testing could make that possible for many costly diseases. This is why insurance reform, like preventing insurance companies from turning people down for pre-existing conditions, is one of the most popular parts of President Obama's Affordable Care Act.

 

 

III. Steps of scientific research and underlying assumptions

 

     All scientific research generally follows a step by step process. We might break down the process in many ways. Many researchers will combine some of these steps or will break them down even further. What I want to do here is give you a general idea, with examples, of the major ideas in these steps.  


     Then I want to discuss two kinds of assumptions involved in doing scientific research. First, we must assume some things to be true if the steps of the research process are to make any sense. They are logically necessary. Second, scientific researchers assume some additional things that go beyond necessary logical assumptions. These additional assumptions take what we have called normative stances‑‑value positions about what scientists SHOULD do (remember?).

 

    A. Steps

 

       1. Problem selection

    

     We have already talked about problem selection in the sense that this involves taking a value position. In selecting a problem, you are saying what you think is important for you to do. You cannot avoid this step. Even if you abdicate the decision and do anything others are willing to pay you to do, you are saying that money is important. 

 

       2. Theory formulation

 

     Here you find out what is already known about the problem you have selected. Generally you do what is called a "literature search." Why should you spend a great deal of time rediscovering what is already known?

 

     For example, if you are interested in why more Americans don't take the time and trouble to vote, you would begin by looking at voting behavior literature to find out about models of voting behavior. One interesting model (the economic theory of voting) was developed by an economist named Anthony Downs. He argued that given the time and trouble and limited expected payoff for voting, voting is irrational. That is, the likelihood of your vote being the deciding vote is practically zero. And even if your vote was the deciding vote, it would do you no good. The elected official would have no way of knowing that it was your vote. Consequently, voting has no expected payoff, and therefore, most people act irrationally in voting in most elections. This theory suggests that the proper question is not why people DON'T vote, but why DO people vote at all!
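Downs's argument can be sketched as a simple expected-payoff calculation. The numbers below are hypothetical, chosen only to illustrate the shape of the logic, not drawn from any actual study:

```python
# A minimal sketch of Downs's expected-payoff logic of voting.
# All numbers are invented for illustration.

def expected_payoff(p_decisive, benefit, cost):
    """Expected payoff of voting: the chance of casting the deciding
    vote times the benefit of your side winning, minus the cost of
    voting (time, travel, gathering information)."""
    return p_decisive * benefit - cost

# Even with a large personal stake in the outcome, a tiny chance of
# being the deciding vote makes the expected payoff negative.
payoff = expected_payoff(p_decisive=1e-7, benefit=10_000.0, cost=5.0)
print(payoff)  # a negative number: by this logic, voting is "irrational"
```

The point of the sketch is that the cost term dominates unless the probability of being decisive is implausibly large, which is why Downs concluded the interesting question is why anyone votes at all.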

 

       3. Hypothesis development 

 

     Once you have examined existing theories, you then develop hypotheses that are suggested by the theories. You can design these hypotheses to accomplish one or more of several goals. They might be used to test the truth of the theory, especially if you don't believe the theory will hold or if you think that previous tests were poorly done. Hypotheses might be designed to refine the theory by showing where it holds and where it does not hold. Hypotheses might expand the theory to see if the theory holds for some new class of political behavior to which it has never been applied. In any case, the hypothesis should be some specific statement about a relationship that logically flows from the theory that can be tested by empirical means.

 

     For example, suppose you think that Downs was really on to something in his economic theory of voting. You want to improve his theory by combining it with political socialization theories that might explain why people so often do commit the "irrational" act of voting. Political socialization tells us that many of our behaviors are learned behaviors that have no payoffs in and of themselves. Rather, behaviors are reinforced by approval and peer group pressure. Perhaps you learned that from the "Money Game" we played at the beginning of the semester. Thinking about this a while, you might come up with the following hypothesis: the more civic clubs in which a person is a member, the more likely that person is to vote. Of course, the reasoning here is that civic clubs provide a lot of reinforcement and encouragement to vote.

 

       4. Operationalization

 

     Once you have a hypothesis, you have to turn it into something you can actually measure--something you can observe. The process of taking variables and describing how measurements are made of these variables is called operationalization. 

 

     For example, looking at our hypothesis about civic clubs and voting, you could perform a survey in which you ask potentially eligible voters the number of civic clubs to which they belong. Then you ask them whether or not they voted in the last election. What you have done here is change the abstract concepts of socialization and voting participation into concrete measures, in this case two concrete questions.
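As a sketch of what operationalization looks like in practice, here is one way the two survey answers might be coded into concrete variables. The variable names and coding scheme are my own illustration, not taken from any actual survey:

```python
# Hypothetical coding scheme for the civic-club hypothesis: turn raw
# survey answers (strings) into variables we can analyze (numbers).

def code_respondent(num_clubs_answer, voted_answer):
    """Operationalize one respondent's answers.

    num_clubs_answer: answer to "How many civic clubs do you belong to?"
    voted_answer:     answer to "Did you vote in the last election?"
    """
    memberships = int(num_clubs_answer)  # count of civic-club memberships
    voted = 1 if voted_answer.strip().lower() == "yes" else 0
    return {"memberships": memberships, "voted": voted}

print(code_respondent("3", "Yes"))  # {'memberships': 3, 'voted': 1}
```

Notice that the abstract concepts (socialization, participation) have become two concrete, repeatable measurements, which is exactly what makes the research replicable.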

 

       5. Data gathering    

 

     This is where you actually go out and gather the data you need to test your hypothesis. You may select a sample and perform a survey. You may do in‑depth interviews with political elites. You may look up facts about governments or public officials, e.g. voting records of congresspersons. What you do depends on the hypothesis to be tested and the kinds of things to which it applies.

 

     For example, another way to test your hypothesis on political socialization and voting participation is to compare nations with respect to civic memberships and voting. You would have to compare nations with similar types of government. It wouldn't make sense to compare a republic with a totalitarian state because this hypothesis assumes that voting is a matter of choice. Many totalitarian governments require citizens to vote as a sign of support for the regime.

 

       6. Analysis and hypothesis testing

 

     Here is where you arrange your observations in such a way that allows you to see if the data supports the hypothesis or not. You have to code the data (we'll explain this later) and then display it in such a way that the hypothesis can be tested. Usually some tables are used along with some statistics. We'll do some of this as an exercise in the next chapter.
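The analysis step might look something like the following with entirely made-up survey data: arrange the coded observations, then compare voting rates between civic-club members and non-members:

```python
# Toy analysis of the civic-club hypothesis. The data are invented
# solely to show how a hypothesis test is arranged.

respondents = [
    # (number of civic clubs, voted in last election?)
    (0, False), (0, False), (0, True), (1, True),
    (1, False), (2, True), (2, True), (3, True),
]

def vote_rate(rows):
    """Share of respondents in `rows` who voted."""
    return sum(1 for _, voted in rows if voted) / len(rows)

members = [r for r in respondents if r[0] > 0]
nonmembers = [r for r in respondents if r[0] == 0]

print(f"members: {vote_rate(members):.0%}, "
      f"non-members: {vote_rate(nonmembers):.0%}")
# With these invented numbers, members vote at a higher rate, which
# would count as support for the hypothesis (a real analysis would
# also apply a significance test before drawing any conclusion).
```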

 

       7. Theory reformulation

 

     Once you have found out whether or not your observations support your original hypothesis, you must then think about what impact this finding has on existing theory. In doing so, you contribute to the building and refinement of scientific knowledge.

 

     For example, if you find no relationship between civic club memberships and likelihood of voting, you would conclude that either socialization theory does not apply to voting behavior or that civic clubs do not effectively engage in political socialization. These conclusions suggest other new problems that can then become subjects for research. How do civic clubs deal with values of political obligation? If they do attempt to promote the value of voting as a civic duty, why is the effort ineffective? Maybe only some kinds of civic organizations engage in political socialization. And so on. I hope you get the idea. By the way, the hypothesis is true. Membership is associated with higher rates of voting.

 

    B. Underlying assumptions 

 

     Logically speaking, we must assume a number of things to be true in order to carry out the process of research.

 

       1. Human behavior occurs in regular patterns

 

     If people always acted in random ways, then observing their behaviors in politics or in any other realm of life wouldn't tell us anything about them. Some pattern must exist for us to observe if we are to learn anything.

 

       2. Reason allows us to observe and discover these patterns

 

     This involves having faith in our ability to figure out what these patterns are. If we don't think we are smart enough, then why even try?

 

       3. Patterns are governed by laws

 

     We also assume that underlying causes create these patterns‑‑that one behavior is stimulated by some other behavior or condition. If causes did not exist, then we could not explain or predict, which are two-thirds of the basic goals of science.

 

       4. Reason allows us to discover these laws

 

     This is similar to the second assumption. If we don't have faith in our ability to figure out what these causal laws are, then not much point exists in even going this far.

 

       5. Scientific knowledge is "intersubjective" 

 

     This is a matter of definition. Being intersubjective is what separates scientific knowledge from other kinds of knowledge, like "faith" knowledge, or intuition. It is a definition that most people who are called scientists agree to use. However, equating science with intersubjective knowledge does have some normative overtones. Use of the term "scientifically based knowledge" tends to relegate all other kinds of knowledge to an inferior position. This has political value in that many people can be intimidated into believing something if it is called a "scientific fact" by a generally recognized scientific source. But I haven't told you what intersubjective actually means yet. "Intersubjective" means doing research and explaining it in such a way that it can be repeated or replicated by another person, who then should be able to make the same kinds of observations and draw the same conclusions. This requires great precision in explaining how research is performed. So you see, I hope, that "intersubjective," or "between people," refers to agreement between people as to what happened. As you should know, gaining agreement on what different people observe is not always a simple matter.

 

       6. The goal of science is to increase knowledge, not apply it

 

     This is the second kind of assumption, one that is not logically necessary. It has a great deal of normative content. It was the position taken originally by the behavioralists and more recently by some of the rational choice oriented people in the field. They argued that the job of political scientists was to increase rather than to apply knowledge. As you already know, later political scientists rejected this notion as naive because the mere existence of knowledge often results in its application.

 

      Today, most political scientists would agree that this assumption is naive. They would not accept it as an iron-clad rule for scientific research. However, they would caution young activist political scientists that the more outspoken and controversial they become, the less likely the public is to continue to accept their findings as "scientific." That's another hypothesis we could test!

 

 

IV. Characteristics of scientific explanations

 

    A. Conditional

 

     Scientific explanations are rarely true under all circumstances. Therefore, one of the important jobs of scientific research is to determine under what circumstances an explanation does hold true. As we learn more about when an explanation works and when it does not work, the specification of what these conditions are becomes part of the explanation.

 

           Example: party identification and voting 


     For a long time political scientists have found that the best single predictor of a person's vote is party identification.  However, we have also found that this simple explanation holds more in some circumstances than in others. The more a person knows about the individual candidates (or thinks she knows) and the stronger she feels about specific issues on which the candidates have differences, the less power that party identification has as an explanation. On the other hand, when a person is indifferent or has likes and dislikes that balance out between two candidates, then party identification becomes the dominant factor. So the power of party identification as a predictive variable depends on other conditions, such as issue knowledge.

 

    B. Probabilistic

 

     The opposite of probabilistic is deterministic. Our explanations have much error in them. We cannot determine exactly what people will do all the time nor can we ever fully understand why they act the way they do. Why? This limitation is due to several factors. First, our methods and measurements are terribly imprecise. We do not have the tools to measure things like party loyalty or political trust and legitimacy nearly as precisely as a physicist can measure speed or mass. Second, an infinity of conditions affect human behavior. We cannot account for all of them. Third, human behavior probably differs from the behavior of physical objects in qualitative ways. If we have that quality known as "free will," the ability to sometimes freely choose despite all the social forces around us, then no matter how good our tools became, we could never predict exactly what people will do.

 

     Do these three factors render a science of politics impossible? The answer is "no." But we must account for these factors in how we present our findings. We must use such terms as "likely to," "tend to," or "probably" rather than "shall" or "will."

 

     Example: Incumbency as an explanation of electoral success 


     One of the most powerful explanations of electoral success is whether the candidate is an incumbent or a challenger (i.e., facing an incumbent). This explanation takes into account all of the powers of an incumbent: things like name recognition and the ability to raise campaign funds, provide constituent services, and act statesmanlike rather than just make vague promises. Therefore, incumbents are likely to win reelection, but victory is far from certain. Occasionally, incumbents do stupid things and challengers do brilliant things. Why? Free will or "fortuna" (remember Machiavelli?) are possible explanations. Perhaps we simply don't know the right things to measure yet in predicting when incumbents will lose. In any case, all we can say is incumbents are "likely" to win, even in years like 1994, 2006, 2008, and 2010, when so many citizens were dissatisfied with the performance of their government. Nevertheless, almost all congressional incumbents who ran for reelection did win in 1994 and 2006 and even 2010. The Republican landslide in 1994 was in open seats and in a few seats where Democratic incumbents lost (no Republican incumbents lost). 2010 was similar. And in 2006 and 2008 it ran the other way, with most losses being on the Republican side. All of that is a bit unusual, but the theory that incumbency explains electoral success held true.

 

    C. Partial

 

     Our explanations are partial, or not complete. They are partial as a direct result of their being conditional. We don't have the time or expertise to specify all possible conditions that affect the explanation. Hopefully, the explanations are becoming more complete as research continues, but science moves slowly. Complete explanations will certainly not be found in my lifetime.

 

         Example: Explanations of voting behavior

 

     Even the best scientific explanations of how people vote leave 5 to 10 percent of the vote unexplained (in statistical terminology, this is called "unexplained variance"). That is to say, we don't really know why these people voted the way they did. Over the years, we have developed better explanations that include more conditions that have reduced the percentage that is unexplained, but a great deal is still unexplained. However, this unexplained variance is usually enough to determine the outcome in most elections. Of course, some of this may never be explained because of the nature of human behavior (back to "free will"). Sometimes people mismark their ballots or vote randomly because they know so little about any candidates—these few voters can never be predicted. In any case, virtually all of our explanations are still partial.
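The idea of "unexplained variance" can be illustrated with a toy calculation (all numbers invented). R-squared is the share of the variation in an outcome that a model accounts for; the remainder is the unexplained variance:

```python
# Toy illustration of explained vs. unexplained variance.
# The "actual" votes and model "predictions" are invented.

actual    = [0, 1, 1, 0, 1, 1, 0, 1]                  # 1 = voted for candidate A
predicted = [0.1, 0.9, 0.8, 0.2, 0.9, 0.6, 0.3, 0.9]  # model's predicted probabilities

mean = sum(actual) / len(actual)
total_var = sum((y - mean) ** 2 for y in actual)            # all variation
resid_var = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # left over
r_squared = 1 - resid_var / total_var

# With these invented numbers, about 20% of the variation is unexplained.
print(f"unexplained share: {1 - r_squared:.0%}")
```

A real voting model with 5 to 10 percent unexplained variance would have an R-squared of .90 to .95, which is unusually high for social science; most relationships we study explain far less.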

 

    D. Open 

 

     Because our explanations are partial, probabilistic, and conditional, they must also be subject to change as we learn more. We may learn of new conditions. We may develop new measures or concepts (like Marx's work alienation or class consciousness) that improve old explanations. We may replicate the work of others and find that mistakes were made in any of the steps of the scientific process. Therefore, we say that all of our theories are open to change. If they were not open, they would not be scientific explanations. That is not to say that closed explanations are wrong; they simply cannot be fully tested by the methods of science.

 

           Counter‑example: "Scientific Creationism"

 

     The political, legal, social, religious, and educational controversy about the creation of the universe and human life can be boiled down to the question of the openness of the explanation. Those who advocate that the Biblical version of creation be given equal time in public schools to the explanation of evolution have argued that their version has scientific support and is therefore just as viable as evolution. That depends on exactly how one defines science. If we say that science means looking for facts that are logically consistent with some faith‑based assumption, then the creationists have a strong argument. Creationists talk about some "Great Mover" or "Force" that set everything in motion. They use these terms, rather than "God," in order to get around the charge that they are supporting religion in public schools.

 

     However, if by science we mean that our only assumptions are that we will establish fact through observational means and the use of logic and that the resulting explanations will be open and subject to change, then the creationists can no longer be seen as dealing in the realm of science. Their initial assumption is a matter of faith, not observation, and it is NOT open or subject to change. The countercharge by creationists that traditional science is also based on values is in a strict sense correct -- it is based on the value of believing that scientific facts rest on observation and that all theories are open to change. I see this as simply a matter of definition, but one could argue that it is also a value.

 

     By theological standards, the creationists (who now call themselves supporters of "intelligent design," since they lost the political battle over including creationism in public schools) may be right, but scientific standards have no way of telling. And of course it could well be that there is an all-powerful God behind all these scientific causes. As the Catholic Church has said in the past, God is the cause of all causes. But that last step is religion, not science. This has been the basic finding of several court decisions that have refused to give "scientific creationism" the scientific status that evolution has. Creationists who argue that evolutionary explanations do not explain all that exists, or that evolutionary theory has constantly had to be modified, have in reality only supported evolution's standing as a scientific theory. And in the strict sense evolution is only a theory, not a fact. But it is a theory with a heck of a lot of supporting evidence, so most scientists accept it as fact, albeit one whose details are constantly under revision. But that is exactly what science does! Scientific explanations are partial and are open--BY DEFINITION!!

 

 

V. Some key ideas in scientific research

 

    A. Units of analysis 


     This simply refers to whatever the individual "things" are that we are studying. In political science the "things" are usually people. However, we might be studying constitutions, as Aristotle did. Or we might be studying and classifying elections, as was done in critical election theory (remember?). Or we might be looking at nations as does balance of power theory. 

 

    B. Properties and variables

 

     Whatever kind of units we are looking at, we observe them in order to take measurements of some property that each unit has. Elections have winners, parties, and money spent. Constitutions have powers assigned, prohibitions, and rights distributed. Nations have resources, powers, demographic (that means characteristics of the population like age, race, religion, wealth, and so on), economic, and geographic properties. In most cases, these properties vary. By definition, properties that vary are called variables. If a property does not vary, we then call it a constant.

 

    C. Possible relationships among variables

 

     Variables can be related to each other in many ways. Obviously, when more than a few variables are involved, the patterns of relationship can become quite complex. What I want to do here is talk about some of the simplest relationships, involving only two or three variables.

 

       1. Causal  

 

     This is what we would always like to find: some variable that, as it changes in some specified way, causes change in some other variable. If the change in the first variable is enough by itself to cause the change in the second, we say that the first variable is sufficient to cause a change in the second. One thing social scientists look for as evidence of a causal relationship is the time order of the changes in the variables. If the change in the second variable consistently follows the change in the first, you have pretty good evidence for a causal relationship.

     A useful way to talk about these relationships is to use diagrams, sometimes called "arrow diagrams," or "path diagrams."  A causal relationship where a change in variable A results in a change in variable B is shown below.

 

                            A ‑‑‑‑‑‑‑‑‑> B  

 

     We use some special names for the roles each of these variables play in this relationship. A is called the independent variable, and B is called the dependent variable. One way to think about and remember this is to say that the dependent variable depends on the independent variable in the relationship. Look at the arrow diagram and say this a couple of times.   

     Many examples of this simple yet most important relationship exist. Let's look at one that is of political significance in the South and in South Carolina in particular. It is one that has great implications for social and medical policy in the region. It involves the question of infant mortality (measured by deaths per 1,000 births before the age of 1 year). The South has long had the highest rates of infant mortality in the nation, and South Carolina has about the highest of any state. Why? To answer that question we must look for causal variables. One that medical science has well established as a causal factor is birth weight. The relationship can be described as follows: the lower the birth weight, the more likely an infant is to die. In arrow diagram form: 

 

              Birth Weight ‑‑‑‑‑‑‑‑‑‑‑> Infant Mortality 

 

     Let's make the picture a little more complicated and add a third variable that also has a causal relationship with the other two. Suppose the arrow diagram looks as follows: 

 

                     A ‑‑‑‑‑‑‑‑‑> B ‑‑‑‑‑‑‑‑‑> C    

 

     Now we have two causal relationships, one between A and B, and a second one between B and C. In a sense, there is also a third one between A and C, but it is mediated by variable B. In this case, variable B is called an intervening variable. In order to specify the role that each variable plays, we must talk about the role it plays in relation to some other variable or variables. For example, A is an independent variable with respect to B, and B is an independent variable with respect to C, but with respect to A and C both, B is an intervening variable. Got it?

     Let's apply this terminology to our infant mortality example and expand it a bit. So low birth weight causes high infant mortality‑‑so what? If we think about what causes low birth weight, we begin to see the public policy implications. One of the principal causes of low birth weight is poor nutrition, and one of the principal causes of poor nutrition is poverty. I've added two intervening variables and now you see the southern connection. Here's what we have in arrow diagram form. 

 

                poverty ‑‑> nutrition ‑‑> birth weight ‑‑> infant mortality

 

     A good exercise at this point would be for you to describe the roles that each of these variables play with respect to the other variables. Try it!

 

     Although we may say a relationship is causal, in reality what usually happens in the complex world of human behavior is much weaker than that. The independent variable usually does not fully cause a change in the dependent variable, but merely makes some change more likely to happen. We might say that many of our causal relationships are really about variables that make something more likely to happen but are neither necessary nor sufficient in making it happen. For example, low education contributes to poverty by making poverty more likely, but low education is neither necessary nor sufficient in creating poverty. Likewise, until poverty becomes truly severe, poverty only contributes to poor nutrition. In political science we are almost always really talking about how much some variable A contributes to some change in variable B. Rarely does a change in A always cause B to change in a specified way.

 

       2. Conditioning 

 

     Now we will start making things a little more complicated.  Sometimes a third variable weakens or strengthens a relationship.  Then we say that a variable plays a conditioning role in the relationship, or to put it another way, the relationship only exists (or is more likely to exist) under certain conditions.  

 

     To use our example about infant mortality, we might say that knowledge about good nutrition will not by itself cause a person to have good nutrition. Nutritional knowledge is more likely to lead to good nutrition under the condition of financial means. So financial means conditions the relationship between nutritional knowledge and the actual practice of good nutrition.

 

     Knowing the conditioning variables is politically important. How so? If we want to do something about infant mortality, for example, we might want to create programs and/or policies that create the right conditions for decreasing infant mortality or increasing good nutritional practices, which we know helps reduce infant mortality.  

     In path diagram form, we have a relationship between A and B that is affected, or conditioned, by the value of a third variable C. So in terms of our example, A is nutritional knowledge and B is the practice of nutrition, and C, the conditioning variable, is financial means. You must  have the money to buy and consume the things you know are good for you. And just in case you did not know this, it turns out that better food generally costs more, so knowledge by itself is not really enough!  

 

                         A --------------------------------> B
                                          ^
                                          |
                                          |
                                          C

 

       3. Reciprocal 

 

     An easy way to think about a reciprocal relationship is to think of it as a two-directional causal relationship. Each variable simultaneously plays the role of both independent and dependent variable. Each reinforces the other. Unlike a simple causal relationship, no clear indication tells us which variable changes first in time. Either variable could change first, or the changes may be so close in time that telling which came first is impossible. You might think of this as a kind of "chicken and egg" situation. A reciprocal relationship is shown in an arrow diagram as follows. 

 

                                    A <‑‑‑‑‑‑‑‑> B 

 

     Although making the example of infant mortality illustrate this kind of relationship is a bit more difficult, we can make it fit if we stretch things a bit. Low concern over nutrition tends to reduce nutrition. However, we might also make an equally strong argument that poor nutrition leads one to be less concerned about good nutrition. To the extent that poor general health causes low motivation in all areas of life, poor nutrition may cause low nutritional motivation. Which comes first? Logically, either could happen first.  

  

       4. Symmetrical 

 

     This is a rather simple situation where one variable simultaneously causes changes in two other variables. Or you might say that we have one independent variable and two dependent variables. The arrow diagram is as follows.


                              /---------> B
                        A ---<
                              \---------> C

  

     The significance of this simple relationship is in our next type of relationship, or to be more precise, "non-relationship." 

 

       5. Spurious  

 

     This is an apparent, yet false causal relationship that is the result of some unknown third variable having a symmetrical relationship with the first two variables. Therefore, a spurious relationship is an untrue relationship. This could be diagramed in an arrow diagram as shown below. 


                 A <--------- C ---------> B

                 A - - - - - - - - - - - > B   (apparent, but false)


     This poses a great problem for researchers. Every causal relationship is potentially spurious. We never know for sure until we have tested all possible third variables. The process of doing this is called controlling for third variables. The fact that we can never control for all third variables along with the fact that we may someday find some third variable that renders what we thought to be a causal relationship to be spurious are additional reasons why scientific explanations are open and always subject to change. 

 

    We need to add some other terminology here that is often used. When a third variable C causes a bivariate causal relationship between A and B to disappear, that is, renders it spurious, we say that the third variable has a confounding effect or is a confounding variable.

 

     For example, medical researchers noticed that people with very low levels of cholesterol have higher death rates. They wondered what was going on, because low cholesterol is supposed to be good.        

    

                        cholesterol (low) --------------> death rate (high)

 

     As good researchers, they looked at things that could cause this relationship to be spurious. Pretty quickly they found some good candidates for that third variable: smoking and alcohol. High levels of alcohol consumption and heavy smoking depress the appetite and cause cholesterol to be low. Simultaneously, these activities contribute to higher death rates.


     cholesterol (low) <----- smoking and alcohol -----> death rate (high)

     cholesterol (low) - - - - - - - - - - - - - - - - > death rate (high)   (apparent, but spurious)


       6. Controlling for a Third Variable: How to do It

 

     When we are trying to see if a relationship is spurious or if a third variable conditions a relationship, we control for whatever third variables we think might be related to both the independent and dependent variables. We do this by reexamining the relationship for each value of the control variable. Using the example above about cholesterol and death rates, we would control for smoking by looking at the relationship for smokers and then look again for non-smokers. If the relationship is spurious, then the relationship between cholesterol and death rate would disappear when looking at each group alone. If conditioning were taking place, we would see a different relationship between cholesterol and smoking for smokers than for non-smokers (which is not the case).

     It turns out that the test for spuriousness is the very same test for seeing whether a third variable intervenes between an independent and dependent variable: when you control for the possible intervening variable, the relationship also disappears. So the statistics cannot tell us whether a relationship is spurious or involves an intervening variable. We can only tell the difference from theory: from whether we have reasons to think the third variable plays a role between the independent and dependent variables or whether it should have an effect on both.
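     The controlling procedure described above can be sketched in a few lines of Python. The records below are invented purely for illustration (they are not real medical figures); the numbers are chosen so that the apparent cholesterol-death relationship vanishes within each smoking group, the signature of a spurious relationship.

```python
# Each record is (smoking status, cholesterol level, died?).
# The data are hypothetical and constructed so that smoking causes BOTH
# low cholesterol and death, making the cholesterol-death link spurious.

def death_rate(records):
    """Fraction of the records in which the person died."""
    return sum(died for _, _, died in records) / len(records)

def rates_by_cholesterol(records):
    """Death rates for the low- and high-cholesterol groups."""
    low = [r for r in records if r[1] == "low"]
    high = [r for r in records if r[1] == "high"]
    return death_rate(low), death_rate(high)

data = (
    [("smoker", "low", 1)] * 4 + [("smoker", "low", 0)] * 4      # smokers, low cholesterol
    + [("smoker", "high", 1)] * 1 + [("smoker", "high", 0)] * 1  # smokers, high cholesterol
    + [("nonsmoker", "low", 0)] * 2                              # nonsmokers, low cholesterol
    + [("nonsmoker", "high", 0)] * 8                             # nonsmokers, high cholesterol
)

# Bivariate relationship: low cholesterol LOOKS deadly...
print(rates_by_cholesterol(data))        # (0.4, 0.1)

# ...but the difference disappears within each category of the control variable.
smokers = [r for r in data if r[0] == "smoker"]
nonsmokers = [r for r in data if r[0] == "nonsmoker"]
print(rates_by_cholesterol(smokers))     # (0.5, 0.5)
print(rates_by_cholesterol(nonsmokers))  # (0.0, 0.0)
```

     The same mechanical test, as noted, would also flag an intervening variable; only theory can tell the two apart.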

 

     Let's look back at our example of infant mortality. Here is where it gets complicated and highly controversial. Researchers have long noted a relationship between race and infant mortality. Black mothers are more likely to have underweight infants than are white mothers. Most observers regarded this apparent relationship between race and birth weight as caused by poverty, where poverty plays an intervening role and wipes out any direct relationship. Here is what that relationship would look like in path diagram form. You will note that I drew this path diagram so that it looks like a spurious relationship except for one thing. Can you see what it is? The arrow between race and poverty runs in the other direction, so it is really the diagram for an intervening relationship. Logically, it could not run the other way unless somehow a change in poverty could cause a change in race! But if poverty does really intervene, and the relationship between race and birth weight disappears when we look at similar economic groups, we can still conclude that race has nothing to do with birth weight directly.


                 race ---------> poverty ---------> birth weight

                 race - - - - - - - - - - - - - - > birth weight   (apparent direct link)


     Well that is what we thought would happen. But it did not exactly turn out that way!

 

    A number of researchers have argued that the relationship between race and birth weight is real. They have argued that some unknown genetic difference between blacks and whites makes blacks more likely to have offspring with lower birth weights even after you account for the impact of poverty. They have looked at the relationship between race and birth weight while controlling for a number of third variables that approximate poverty. For example, they have found that white mothers with low education have larger infants than black mothers with low education. At the other extreme, they have found that black mothers with high education have smaller infants than white mothers with high education. Other researchers have compared infant mortality rates of blacks in South Carolina with blacks elsewhere and found no significant difference. They also found that South Carolina whites are not significantly different from whites elsewhere. Looking at these facts together, they have tentatively concluded that the major reason for higher infant mortality rates in southern states is NOT poverty, but the simple fact that southern states have higher percentages of blacks in their populations. Mississippi and South Carolina, the two states with the highest percentages of blacks in their populations, should therefore be expected to have the highest infant mortality rates. Indeed, they consistently have. 

 

     Now this is highly controversial for a number of reasons. The first is methodological. Critics of this research argue that insufficient data on mothers are available to fully measure the concept of poverty. Single variables like education only partially capture the concept. To use the terminology we have been using, they are saying that this new research is faulty at the "operationalization" stage, that the measures used are not valid.

 

    Recent research suggests that the explanation may lie in the time between births, another variable. If for cultural reasons blacks have births closer together than other racial groups, that could be the intervening variable. The answer is still not certain--science moves on. Until better data are available on mothers, we have no good reason to think racial genetic differences exist. 

 

     The second reason that this is controversial is that it has enormous public policy consequences. If science tells policy makers that they can do little about infant mortality, then the research may provide policy makers with an excuse to reduce their efforts to combat infant mortality. Policy makers may conclude that health and educational programs for pregnant women are a waste of valuable resources. Again, we see that research has great political implications, regardless of the avowed neutrality of those doing the research. 

 

     The third reason for controversy is that even if in fact real racial differences exist, someone (probably many someones) will put value connotations on the results. Whites who are racists will presume that this is proof of what they have been sure of all along: intrinsic white superiority. Some blacks will react to these outrageous and unwarranted conclusions and argue that the research is a sign of white racism in the scientific establishment.

 

 

    D. Measurement 


     Measuring the variables in our hypotheses, as you can see in the preceding example, is a very important part of the research process. If done poorly, the results may not test what you intended to test and may even do grave harm to someone. In this brief section, I want to talk about two factors in measurement: accuracy and the precision (or level) of the measurement.

 

       1. Criteria of accuracy

 

     Whenever we make a measurement, we worry about two criteria or standards of accuracy. We must meet both tests in order to have any certainty about our results.

 

          a. Reliability

 

     By reliability we simply mean consistency of results. A method of measuring is considered to be reliable if different people applying it to the same unit would consistently reach the same conclusion. 

 

     This may be easiest to understand by using an example of an unreliable measurement. Suppose I gave you an elastic ruler and told you to measure the length of this page. I would certainly get a variety of answers from different members of the class. I'd find little consistency. Why? Two problems would create inconsistency in your measurements. First, because the ruler is elastic, it will stretch or contract in measuring any dimension. I could have helped you here by telling you to lay it loosely without stretching or contracting. Second, I didn't describe to you exactly what I meant by the "length."  Is it the longer dimension of a single page or the shorter one? You can't tell if I don't describe to you exactly what I mean.

 

    Public opinion is one of the most problematic areas of political science research in terms of reliability. Questions that are unclear or have multiple interpretations cause reliability problems. Interviewers who influence the people they are interviewing by appearance, tone of voice, or both cause reliability problems. Interviewers who must interpret long and complex answers to "open ended" questions cause reliability problems. "Closed ended" type questions are much more reliable. In "closed ended" questions, all the interviewer has to do is check a box corresponding to one of several fixed answers from which the person being interviewed chooses. 


          b. Validity

 

     By validity, we mean that the measurement actually measures what it is supposed to measure as opposed to measuring something else. For example, if we were to measure weight by using a ruler, we would have an invalid measure of weight. As you can see, in the physical sciences validity is usually pretty obvious.

 

     However, in the social sciences the question of measurement validity is a whole lot more subtle. In addition, like so many other things in the social sciences, validity can be politically controversial. The first thing to realize is that if a measure is not reliable, it cannot be considered valid. If a measurement is unreliable, you don't know what you are measuring, so how can it be valid (except by accident)? However, it does not work the other way: you can consistently (reliably) measure the wrong thing (make invalid measures). 

 

     After we take care of reliability problems, validity gets a whole lot more complicated, because you can blow the measurement in an infinite number of ways. For example, in the infant mortality example above, the controversy about the role of race rests on questions of the validity of using measures like education to indirectly measure poverty and nutrition. As a rule of thumb, your measure is more likely to be valid when you measure things as directly as you can rather than using indirect measures.

 

     A second example could be measuring racism or sexism by asking someone whether they agree or disagree with negative stereotypes (e.g., that men have a hard time making up their minds on even the simplest matters or that men are too stubborn to stop and ask directions). The problem with stereotype measures is that they measure education as much as belief in stereotypes. An educated person knows that disagreeing with such stereotypes is a sign of education, regardless of how one really feels.

 

     Political controversy enters the picture when these measures are used to justify treating people differently. Every qualification test, from college boards to promotion exams for fire fighters and police, involves questions that presumably validly measure a person's relative ability to be a success in the position for which she is applying. Are the tests valid measures of how well you will succeed? When you say that the SAT was not fair, what you probably mean is that it was not a valid test of your ability to succeed in college. These are the kinds of charges of which court cases are made. 


        2. Levels of measurement

 

     Not only do we want to measure accurately, we also want to measure as precisely as we can. We want to do better than the stereotypical ancient peoples who accurately but imprecisely called anything above about a dozen "many."

 

          a. Nominal or qualitative or categorical

 

     Sometimes the best we can do is to distinguish among different qualities or categories. In doing so, we are not measuring more or less of any quality. Because we are not measuring more or less of anything, the categories can be listed in any order. For example, measuring party identification can be purely qualitative: Republican, Democrat, or no identification. A second example would be voting choice: Bush, Clinton, Perot, or other. Order does not matter, so we could just as easily list them as Clinton, Perot, and Bush. Other examples should be easy for you to think of: race, gender, voter registration status, and so on.

 

          b. Ordinal or ranked

    

     The next level is when we are measuring more or less of some quality so that order does matter, but we are not measuring exact amounts so that we do not know the precise difference between the categories. This means that we cannot perform routine mathematical operations on the measurements like addition and subtraction, and we are not dealing in units like dollars, votes, years, and so on. 

 

     For example, we can take the measurement of party identification that we had above and turn it into an ordinal measurement by adding the strength as well as direction: strong Democrat, moderate Democrat, weak Democrat, no identification, weak Republican, moderate Republican, and strong Republican. Here the order becomes important, although we could quibble about exactly where the category of "no identification" belongs. Even though we are measuring more or less of the quality of "Democraticness" (and "Republicanness"), we do not know the distance between a strong and moderate identifier. It may be more or less than the distance between moderate and weak identifier‑‑we simply have no way of knowing.

 


     Another frequent type of example is the question that asks someone to say how strongly they agree or disagree with some statement. Again, you know the order as signifying more or less agreement (or disagreement), but you do not know precise distances between the categories.

 

          c. Interval

 

     We have interval measurement when we are measuring precise amounts of some property. When you have units of measurement involved (e.g. dollars, years, and so on), you can bet that you have interval level measurement. (Note: this is sometimes called “ratio” data. The only difference is that ratio data has a true zero – that is having zero means having none of whatever it is you are measuring. Except for temperature, where zero still has some degree of warmth, almost anything else is really ratio. But we will keep it simple and just call it all interval.) Having precise amounts is a desirable thing to have, because you can then do things like add, subtract, and multiply and get meaningful results. To put it another way, we can use more powerful statistical tools when we have interval level measurement.

 

     We can't use the example of party identification here because ordinal is as precise as we are able to get on party identification. (Maybe someday someone will come up with some kind of psychological units that can be applied.) But we can count votes, measure family or hourly income in dollars (be careful here, because income is often grouped--like $10,000 to $20,000 a year--and then it becomes ordinal rather than interval), record a precise age in years (if you think about it, age in years is also really grouped data, but we'll not get too nitpicky), or count how many times one has voted in the past four general elections. In all of these cases, we have interval measurements.
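     The warning about grouped income can be made concrete with a short Python sketch. The bracket boundaries below are invented for illustration; the point is that grouping a precise (interval) income leaves only ordered categories, on which arithmetic no longer makes sense.

```python
# Hypothetical sketch: grouping interval-level income into ordinal brackets.
def income_bracket(dollars):
    """Collapse a precise income into one of three ordered categories."""
    if dollars < 10_000:
        return "under $10,000"
    elif dollars < 20_000:
        return "$10,000 to $20,000"
    else:
        return "$20,000 and up"

# With the precise amounts we can do meaningful arithmetic...
print((12_500 + 8_200) / 2)      # 10350.0, a meaningful average

# ...but once grouped, only the ORDER survives; you cannot average labels.
print(income_bracket(12_500))    # $10,000 to $20,000
print(income_bracket(8_200))     # under $10,000
```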

 

 

VI. Statistics ‑‑ a few basics

 

     Statistics often intimidate people. Having struggled myself for a number of years with statistics and having taught it to others, I am convinced that this intimidation results from not knowing what one is trying to accomplish with statistics. Students learn formulas and apply them to numbers without really understanding WHY they are doing this in the first place. They get lost in the trees and lose sight of the forest.   


     We're not going to go into much detail here (only a few trees), but you are going to be faced with statistics all your life, so you should understand what their purpose is. Statistics are also an important part of political science. My purpose here is to give you an overview of this forest.

 

     Cynics often say that you can make statistics say anything you want them to. People say that statistics lie. In fact, statistics don't lie; people do. People make unreliable and invalid measures, produce inappropriate statistics, and present them to someone who doesn't know the right questions to ask. I hope I can suggest a few of those right questions for you as well as help you understand the purpose for using these complicated things in the first place. If you ever take a statistics course (as I hope you will), you should ask yourself "why am I using this statistic, where am I going with this, what am I trying to show or find out?" on at least a daily basis (every five minutes would be better)! 

 

    A. Definition and purpose of statistics

 

     Statistics are numbers that summarize some quality or characteristic of data. Why do we want to do that? The answer is to make things simpler, not more complex, as unfortunately often seems the case. We put data (at all levels of measurement) into numerical form in order to condense, summarize, interpret, and analyze when we have to make decisions.

 

      Let me illustrate with a simple example. I have asked you a lot of questions in the course of the semester and I've kept records in many cases of whether or not your answers were right or wrong--grades. Consider your grades to be data. I want to somehow condense all of these data into one single measure so that how much you learned can be quickly summarized in a condensed form on a record (transcript) that future prospective employers can use to evaluate you. I use statistics to do that. You get a grade on each test (a statistic), grades are averaged together (another statistic), translated into a letter grade (another statistic), and then letter grades are averaged together in a weighted way (according to the number of hours per course) to produce a grade point average (another statistic). Now all of this is so familiar to you that you take it for granted. Nevertheless, it is an excellent example of statistics that condense and summarize human behavior.
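     The grade averaging just described is a weighted mean: letter grades become points, and each course's points are weighted by its credit hours. Here is a minimal Python sketch; the point scale and the sample transcript are hypothetical.

```python
# A conventional 4-point scale (hypothetical; schools vary).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def gpa(courses):
    """Weighted grade point average for a list of (letter_grade, credit_hours) pairs."""
    total_points = sum(GRADE_POINTS[grade] * hours for grade, hours in courses)
    total_hours = sum(hours for _, hours in courses)
    return total_points / total_hours

# A made-up transcript: four courses with differing credit hours.
transcript = [("A", 3), ("B", 3), ("B", 4), ("C", 2)]
print(round(gpa(transcript), 2))   # 3.08
```

     Note that the 4-credit B pulls the average up more than the 2-credit C pulls it down; that weighting is exactly why a GPA summarizes performance better than a simple average of letter grades.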

 

    B. Two general types of statistics

 

       1. Descriptive 


     Descriptive statistics are numbers that summarize some quality or aspect of data. The key word here is summarize. The idea is to simplify so that one number can be used to convey a lot of meaning about the data. Your test average summarizes a lot of meaning about your performance on all tests.

 

       2. Inferential

 

     Inferential statistics are measures that go beyond description and allow us to infer something beyond the data. In inferential statistics, we go from a few particular cases to larger, more general conclusions.

 

     All statistics about populations based on surveys involve inferential statistics. Anytime you take a statistic for a sample, like the average income of a sample of 1000 Americans, and infer from it the average income for all Americans, the inferred average is called an inferential statistic. Descriptive statistics are thus often used along with the laws of probability to create inferential statistics. The laws of probability can tell us how likely our inferred average income is to be within some given distance of the actual average income. Virtually all public opinion surveys do this kind of thing. We infer what percentage of people will actually vote for a candidate from a sample of people and then add an error factor called sampling error (plus or minus some percentage depending on the size of the sample). Remember, the point here is to INFER.

 

    C. Examples

 

     Let's start with some data and then use it to illustrate the different kinds of statistics as we go along. Suppose that we have a small nation (VERY small to make it simple) of 22 people. You can add as many zeros as you want to make it larger and more realistic. Adding zeros does not change the math except that zeros get added to the answers as well. Further suppose that of these 22 people, 5 had yearly incomes of $6,000 in American dollars; 4 incomes of $8,000; 3 at $10,000; 3 at $12,000; 3 at $14,000; 2 at $16,000; 1 at $18,000; and 1 at $20,000.  


     Before we go on, I should note that I have already helped you a lot by rearranging and condensing the data. What you would probably have to start with is a list of individuals with their incomes in no particular order ($10,000, $20,000, $6,000, $6,000, $12,000, ... $18,000). I have already rearranged the data by counting the frequency of people at each income level.
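     If you want to follow along on a computer, you can rebuild the full list of 22 incomes from the frequency distribution. This Python sketch (the variable names are just illustrative) does that counting in reverse:

```python
# Frequency distribution from the example: income value -> number of people.
freq = {6000: 5, 8000: 4, 10000: 3, 12000: 3,
        14000: 3, 16000: 2, 18000: 1, 20000: 1}

# Expand it back into one measurement per person.
incomes = [value for value, count in freq.items() for _ in range(count)]

print(len(incomes))   # 22 people in our little nation
print(sum(incomes))   # 240000 -- total national income
```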

 

       1. Descriptive Statistics

 

          a. Measures of central tendency

 

     Suppose you wanted to tell what the typical income was for this nation. Maybe you are interested in comparing its prosperity or standard of living to that of other nations and want to use income as part of that measure. Or perhaps you work for the nation's government and are producing a brochure to attract new people to move there. In either case you want something that quickly tells about the income of all the people who live there without having to list all the incomes. Another way of saying this is that you want a statistic that tells what the center of the data is like.

 

     We have invented some numbers (statistics) that tell us what the center of the data is like. We call them measures of central tendency. Which ones we can use depends on the level of measurement we have and on exactly what we want to know about the center of the data. Each measure tells us something a little different about the center.

 

             1) Mean

    

     You are already familiar with the mean. You probably know it as the "average." To be more precise, it is the arithmetic average. You simply add up the measurements and divide by the number of measurements that were made (one for each unit). In our example the units are people and the measurements are incomes. If you compute the mean for our little nation, it is $240,000 divided by 22, or about $10,909, a very respectable number on a per person (usually called "per capita") basis.
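     The definition translates directly into code. A quick Python sketch of the mean for our income data:

```python
# The 22 incomes from the example nation.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

# Mean: add up the measurements, divide by the number of units.
mean = sum(incomes) / len(incomes)

print(round(mean, 2))   # 240000 / 22 = 10909.09
```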

 

     The mean gives us the mathematical center where all the units have some influence. The problem is that extreme units have more influence than ones close to the center. For example, which grade affects your final average the most, the 87, the 93, the 79, the 83, or the 32 on that test you shouldn't have taken because you were sick the night before?  


     You can only compute means for interval level measurements. Why? Well, if you think about it, computing a mean for ordinal or nominal data wouldn't make sense because we couldn't add the measurements together. Suppose we had measured the income in three groups: low (under $10,000), medium ($10,000 to $15,000), and high (over $15,000). How do we add together 9 lows with 9 mediums and 4 highs? The addition can't be done using everyday arithmetic. If we had some other kind of measurement that was nominal in nature (e.g. race, with, say, 10 blacks and 12 whites), we still could not compute any mean. So we have other measures of central tendency we use for these other levels of measurement.

 

             2) Median

 

     The median is the value of the middle unit after all the units have been arranged in order of magnitude (from lowest to highest or the other way around). You might think of the median as the center in terms of what the unit in the middle looks like.

 

     This measure of central tendency can be used for either interval or ordinal measurements. All we need to do is order the measurements and then count to the middle one. If we have an even number of measurements, then we go halfway between the two at the middle. 

 

     For example, suppose we want the median of the following party identifications: strong Democrat, strong Democrat, weak Democrat, weak Democrat, no identification, moderate Republican, strong Republican. We have 7 measurements. I listed them in order, so we merely count to the middle one (the 4th), and get weak Democrat as the middle one. Suppose we only had 6 measurements by striking out one of the weak Democrats. Then we would have to go halfway between the 3rd and the 4th measurement (between the weak Democrat and the non‑identifier). What we would have to say here is that the median is "between weak Democrat and non‑identification."

 

     Now let's use the example of incomes. If we ordered the 22 measurements of income, the middle two (the 11th and 12th) are both $10,000. So the median is simply $10,000 a year. You should verify to yourself that this is the correct answer by going back to where I introduced this example and counting your way up to the 11th and 12th measurements. (If you can't get it to work out, ASK me to go over it for you. You'll be expected to do it on the next test on your own!) 
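     A computer can do the same counting for you. This Python sketch sorts the incomes and goes halfway between the two middle measurements, just as the definition says:

```python
# The 22 incomes from the example nation.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

ordered = sorted(incomes)
n = len(ordered)

# Even number of units, so average the two middle measurements
# (the 11th and 12th values -- list indices 10 and 11).
median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median)   # 10000.0
```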


     Suppose we had measured income as low, medium, and high as shown in our discussion of means. You saw above that we could not compute a mean. But since this is ordinal measurement, we can compute a median.  It again would be between the 11th and 12th measures. Because there were 9 low incomes and 9 mediums (see above), both the 11th and the 12th would be in the medium category. Therefore, the median would be "medium income" ($10,000 to $15,000).

 

             3) Mode

 

     The mode is the measurement that occurs most often. The mode tells us what the most typical real unit looks like. All you have to do is group the measurements and see how many occur for each value or category, and pick the one with the most. That one defines the mode.

 

      If you think about the way this is defined, the mode can be used for all levels of measurement. If we were looking for the most typical race (nominal level measurement) and there were 12 whites and 10 blacks in our nation, our best guess would be white. White is the "modal" racial category. 

 

     If we had measured income as low, medium, and high as discussed above (ordinal level measurement), we would have no unique mode as there are 9 measurements in both the low and medium categories. So in this case we would have to say that two modes exist (or it is "bimodal").

 

     If we use our original measures of exact incomes (interval level measurement), we can still compute a mode. It is $6,000 because that is the measurement that appears most often.
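     Finding the mode is just a matter of counting, which Python's standard library can do for us. A small sketch:

```python
from collections import Counter

# The 22 incomes from the example nation.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

# Count how many units fall at each value; the mode is the value
# with the highest count.
counts = Counter(incomes)
mode, frequency = counts.most_common(1)[0]

print(mode, frequency)   # 6000 occurs 5 times
```

Because counting categories works at any level of measurement, the same approach would find the modal race or the modal income group just as easily.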

 

             4) Relationship to the frequency distribution‑‑or why you should ask for all three

 

     The frequency distribution is the frequency of units at each measurement or value. When I presented the data from the incomes example to you originally, I gave you the frequency distribution‑‑5 at the value of $6,000 and so on. Frequency distributions are often presented in tables and graphs. In tabular form our data would look as follows:

 

 

 

 

                  VALUE     FREQUENCY

                 $6,000             5

                 $8,000             4

                $10,000             3

                $12,000             3

                $14,000             3

                $16,000             2

                $18,000             1

                $20,000             1

                                 ____

                  TOTAL            22

 

     In bar graph form, the frequency distribution for these data would be shown as follows:
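     The original bar graph is not reproduced in this text version, but you can sketch a rough character-based version yourself. This Python snippet prints one bar per income value, with one mark per person:

```python
# Frequency distribution: income value -> number of people.
freq = {6000: 5, 8000: 4, 10000: 3, 12000: 3,
        14000: 3, 16000: 2, 18000: 1, 20000: 1}

# One row per value; one '#' per person at that income level.
rows = [f"${value:>6,} " + "#" * count for value, count in freq.items()]

for row in rows:
    print(row)
```

The tallest bar sits at $6,000 (the mode), and the bars shrink as income rises, which is exactly the lopsided shape discussed below.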

 


     If a picture is worth a thousand words, then perhaps the three statistics that we have computed for these data (the mean, median, and mode) are almost worth a thousand words as well. Why? Knowing the three measures of central tendency tells us a great deal about what the frequency distribution looks like‑‑IF we know how to interpret them. Let's write them down and see. 

 

                           mean: $10,909

                         median: $10,000

                           mode:  $6,000

 

     The first thing you should notice is that they are different. Why? They differ because of the shape of the distribution. If we had a perfect "bell shaped" distribution, all three measures would be the same. Suppose the mean, median, and mode were all at $13,000. That would mean that just as many people were above $13,000 as below it, and that the most frequent income was $13,000. You could still have extreme cases at the ends, but they would have to balance each other out.

 

 

     What has happened in the actual distribution is that having most of the values above the single most frequent one (the mode) pulled the median above the mode and a few extreme high income values pulled the mean over the median. If the median and mean had been below the mode, then we would know that most of the values fell below the mode and a few extreme values fell way below the mode and median. 


     So what? If you know the three measures of central tendency, you know something about the shape of the distribution. If they are the same, the distribution is "bell shaped."  If they are different, the distribution is stretched (the formal term here is "skewed") toward whichever extreme the mean is at‑‑in this case, skewed high or to the right.
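     The skew check described above can be written as a few lines of Python. This sketch compares the mean and median for our income data and reports which way the distribution is stretched:

```python
# The 22 incomes from the example nation.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

mean = sum(incomes) / len(incomes)
ordered = sorted(incomes)
median = (ordered[10] + ordered[11]) / 2   # 11th and 12th measurements

# If the mean sits above the median, a few extreme high values are
# stretching (skewing) the distribution toward the high end.
if mean > median:
    skew = "skewed high (right)"
elif mean < median:
    skew = "skewed low (left)"
else:
    skew = "symmetric"

print(skew)   # skewed high (right)
```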

 

     The second answer to "so what?" is that it matters to know all three because ALL THREE CAN BE CALLED AVERAGES. The person who is presenting the data may have a reason to make the average seem high or low. If they wanted to attract more people to move to our little nation, obviously they would use the mean. If they wanted to attract business looking for people willing to work for low wages, they would use the mode.

 

     The moral of this story: when someone presents you with an "average" in order to prove something, ask WHAT KIND OF AVERAGE and ASK FOR OTHER "AVERAGES" as well. Knowing may prove valuable to you!

 

          b. Measures of dispersion

 

     By measures of dispersion we mean statistics that summarize how spread out the data are. We will present three of these below.

 

             1) range

 

     Range refers to the distance between the extreme values of the measurements. Range is computed by simply subtracting the low extreme from the high extreme. This does not make any sense unless we have interval level measurement because at the two lower levels (ordinal and nominal), we cannot even talk about distance.

 

     In our income example, the range would be $14,000. You get this by subtracting $6,000 from $20,000. A $14,000 difference exists between the lowest unit and the highest unit.
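     In code, the range is a one-line subtraction. A Python sketch for our income data:

```python
# The 22 incomes from the example nation.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

# Range: highest measurement minus lowest measurement.
income_range = max(incomes) - min(incomes)   # $20,000 - $6,000

print(income_range)   # 14000
```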

 

             2) percentile 


     Percentile is defined as the value or measurement below which some given percent of the scores fall. You've probably seen percentiles in the reporting of results on standardized tests like the SAT. Percentiles can be used for both interval and ordinal data, because all we need is ordered data, not exact measurements.

 

     Again, using the income data, the 50th percentile would be the measurement below which 50% of the measurements fall. That would be the bottom eleven. Counting up, the 50th percentile is then $10,000. (If you think about it, the 50th percentile is always the same as the median.) The 90th percentile would be the measurement below which 90% of the scores fall. That's .9 x 22 = 19.8, or 20 scores (rounding up). Counting up, $18,000 is the 21st score--20 scores fall below it. So $18,000 is the 90th percentile, or as close as we can get to it with rounding. To be more precise, $18,000 is actually the 90.9th percentile because 20 of 22 scores fall below it (20/22 = .909).
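     The chapter's counting definition of a percentile is easy to code. This Python sketch (the function name is just illustrative) computes the percent of scores falling below a given value:

```python
# The 22 incomes from the example nation, in order.
incomes = sorted([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
                 [14000] * 3 + [16000] * 2 + [18000] + [20000])

def percentile_of(value, data):
    """Percent of scores that fall below `value` (the counting
    definition used in this chapter)."""
    below = sum(1 for x in data if x < value)
    return 100 * below / len(data)

print(round(percentile_of(18000, incomes), 1))   # 90.9 -- 20 of 22 fall below
```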

 

     When you scored in the 65th percentile on your SATs, that means that 65% of those who took the test scored below you and 35% scored at or above your score.

 

             3) standard deviation

 

     This is the most complicated measure of dispersion.  Mathematically, it is defined as the square root of the average (mean) squared distance from the mean.

 

     I know that sounds complicated. But if you think about how standard deviation is computed, it makes sense. What we want here is an average distance from the mean. The bigger this average distance, the more spread out the data. That much should make sense. But why the squaring and then square root? The answer is to get rid of the minus signs. Because some of the scores are above and some below the mean, simply adding them up would cause them to cancel each other out. In fact, you would get zero every time because the average distance from the mean is zero. That's how the mean is computed. The cases on one side would cancel out those on the other. So we square to get rid of the minus signs and then take the square root after we compute the average squared distance. (By the way, the average squared distance is called the variance, and is used in building a lot of other statistical formulas that need to account for how spread out the data are.)
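     Written as code, the definition comes out almost word for word: square the distances from the mean, average them (that's the variance), then take the square root. Here is a Python sketch for our income data (my computed result, not a figure from the chapter):

```python
import math

# The 22 incomes from the example nation.
incomes = ([6000] * 5 + [8000] * 4 + [10000] * 3 + [12000] * 3 +
           [14000] * 3 + [16000] * 2 + [18000] + [20000])

mean = sum(incomes) / len(incomes)

# Variance: the average (mean) squared distance from the mean.
# Squaring removes the minus signs so deviations can't cancel out.
variance = sum((x - mean) ** 2 for x in incomes) / len(incomes)

# Standard deviation: the square root of the variance.
sd = math.sqrt(variance)

print(round(sd))   # about $4,122 for these data
```

You can use something like this to check your hand computation, but do try working it out from the definition first.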

 

     If you really want to test yourself, try and compute the standard deviation of our income data using the definition as a guide. Show me what you did and I'll tell you if you did it right!

 

          c. Measures of relationships 


     Often we are interested in seeing if a relationship exists between changes in two variables. Does income change as education changes? Does party identification change as income changes? These are the kinds of questions that we must answer in order to test hypotheses and build scientific explanations. (Remember the arrow diagrams?)

 

     You can do this in many ways. Unfortunately (or maybe fortunately?), most are beyond the scope of this course. However, you will learn one way of doing this in the next chapter‑‑crosstabulations (sometimes called two-way frequency distributions).

 

        2. Inferential: moving from a sample to a population

 

     Anytime we move from a sample to make generalizations about some larger population from which the sample was chosen, we have entered the realm of inferential statistics. Therefore, all the descriptive statistics we have discussed could be used as inferential statistics if they were first computed for a sample and we then inferred the same descriptive things about the general population from which the sample was chosen. So you really have nothing new to learn here except the notion of sampling and the constraints on inferring, called significance. Here I merely want to introduce these notions. You will learn more about them in the next chapter, which concerns public opinion and surveying.

 

     You might think about our little nation and incomes that we have been using as an example up to this point. Suppose that we are no longer talking about the whole nation, but rather a sample of 22 people from a much larger nation. Then we would use the descriptive statistics that we calculated for the sample to infer descriptive things about the whole nation.

 

          a. Sampling

 

     All we mean here is the process of choosing a sample from some larger population. Ideally, we want to choose the sample so that every unit in the larger population has an equal chance of being chosen‑‑that's called a simple random sample. The goal is to choose a sample that is representative of the general population, and getting a simple random sample is one excellent way of doing that. How this is done in practice is discussed in the next chapter. 
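     Python's standard library can draw a simple random sample directly, since random.sample gives every unit an equal chance of being chosen. The population below is entirely made up for illustration:

```python
import random

# A hypothetical population of 10,000 incomes (invented data).
population = [random.randint(5000, 50000) for _ in range(10000)]

# A simple random sample of 1,000: every unit in the population
# has an equal chance of being chosen, with no repeats.
sample = random.sample(population, 1000)

print(len(sample))   # 1000
```

Descriptive statistics computed on `sample` could then be used to infer the same things about `population`.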


          b. Statistics that measure "significance"‑‑or how likely is it that my sample statistics are close to being right for the larger population

 

     In science generally and in survey research particularly, we take a very conservative position on significance. We are unwilling to accept any inferred fact unless there is at least a 95% chance that we are correct (that is, at most a 5% chance of being wrong). All of our formulas for calculating sampling error are based on this position (which is called a significance level). You will see this again in the next chapter in the discussion of the expected error in public opinion surveys. To put this a slightly different way, if the sampling error in a survey is said to be plus or minus 3%, then a 5% chance exists that the truth for the population lies outside of this range.
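     To see where a figure like "plus or minus 3%" comes from, here is a Python sketch using the standard 95% margin-of-error formula for a sample proportion. The formula itself (1.96 times the square root of p(1-p)/n) is a standard result not derived in this chapter:

```python
import math

def sampling_error(p, n):
    """95% margin of error for a sample proportion p with sample
    size n (standard formula; not derived in this chapter)."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

# A sample of 1,000 with an even 50/50 split gives roughly +/- 3%.
error_pct = 100 * sampling_error(0.5, 1000)

print(round(error_pct, 1))   # 3.1
```

Notice that the error depends on the size of the sample, not the size of the population, which is why a sample of about 1,000 works for a whole nation.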

 

     Knowing the things we have discussed will certainly not make you an expert in statistics. But at least you get an idea of some of the kinds of questions you need to ask when someone attempts to "prove" their point with statistics. If you don't know what else to ask, just ask "Exactly how did you measure that?" or "How did you compute that?"

 


KEY TERMS

 

scientific

replication

three goals of science

relationship between

   explanation and prediction

seven steps of scientific

   research

assumptions of scientific

   research

conditional explanations

probabilistic explanations

partial explanations

unexplained variance

open explanations

units of analysis

properties and variables

constants

causal relationships

arrow diagrams

independent variable

dependent variable

intervening variable

conditioning variables

reciprocal relationships

symmetrical relationships

spurious (non)relationships

controlling for a variable

reliability of measures

validity of measures

three levels of measurement:

nominal, ordinal, & interval

statistics

descriptive statistics

inferential statistics

measures of central tendency

mean

median

mode

frequency distributions

bar graphs

bell shaped distributions

measures of dispersion

range

percentile

standard deviation

sampling

significance level