Chapter 5. Operationalization, Units of Analysis,
and Levels of Measurement
We have already introduced the concept of operationalization and talked
about the reliability and validity of the operationalized concepts, which
we call variables if the properties of those concepts vary when the measurement
rules are applied to them. The material in this section is really a continuation
of the last section. We will talk about exactly what kinds of things we
apply the rules of operationalization to. These things are called "cases" or "units of analysis."
We will also discuss where we might find these cases (surveys, etc.). Finally, we will discuss levels of measurement
(nominal, ordinal, interval, ratio, and dichotomous). It turns out that levels
of measurement are critical in determining the kinds of statistical tests we can
do in the analysis stage of research.
Cases--Units of Analysis and Population
A case, or unit of analysis, is the thing to which we apply the rules of measurement. In social science it is often a person, because social science studies people. However, it could be something other than a person: a city, constitution, group, country, law, and so on. In short, it is the item for which we have data. Each thing that we perform measurements on is a case or unit of analysis. If you are not sure, just ask "what is the ______'s measurement," where "measurement" stands for what you are trying to measure, like party identification, or level of democracy, or freedom of the press, or whatever. Whatever you use to fill in the blank is the unit of analysis. So if we are trying to measure the candidate's party identification, the unit of analysis is each candidate. Another term that you will sometimes see in computer science or business is "record." It took me a while to figure this out, but the terms record, unit of analysis, and case all refer to the same thing--they are interchangeable--simply jargon from different fields.
All of the units to which the measurement could be applied (and later we will see that these are all the units to which the hypothesis applies) are called the population. It is the group of units in which we are ultimately interested.
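For readers who think in terms of data files, the vocabulary above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the candidates and party labels are invented, and the names `Candidate` and `is_variable` are hypothetical, not anything taken from MicroCase.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """One case (unit of analysis, or record): a single candidate."""
    name: str
    party_id: str  # the measurement applied to each case


# Invented cases, for illustration only.
cases = [
    Candidate("Candidate A", "Democrat"),
    Candidate("Candidate B", "Republican"),
    Candidate("Candidate C", "Republican"),
]


def is_variable(values):
    """A measured property is a variable only if it takes more than one value."""
    return len(set(values)) > 1


party_values = [c.party_id for c in cases]
print(len(cases))                 # number of cases in the data
print(is_variable(party_values))  # True, because party_id differs across cases
```

Each `Candidate` object is one case, and `party_id` counts as a variable here only because its value differs from one case to another.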
The data we collect on units of analysis can be classified as either individual or aggregate data. Individual data are what we have when we measure something about a single entity, such as a person or a constitution or an international treaty. Aggregate data are what we have when we measure something about a collective group of entities, such as a county or state or nation, using measurements of all the sub-entities in the collective. For example, we might compare nations with respect to quality of health care based on the longevity of people in those nations, say average life expectancy at birth. We might compare the influence of Baptists across South Carolina counties by comparing the percentage of Baptists in each county. That percentage is aggregate data. Of course, it ultimately rests on individual data, the religion of individual people in a county, but it is aggregate because we sum it up in some way and apply a single measure to the entire county.
This can get a little confusing when we have a unit of
analysis that could have both kinds of data associated with it. For example,
consider states. If we sum up qualities of the people in a state and come up
with a single measure of something for all the people in that state, we are
dealing with aggregate data. But if we take some quality that fits the state as
a whole but is not based on the individual people in the state, it is individual
data. Thinking about it this way, average years of education completed is
aggregate, but whether a state has a cabinet form of government or not is
individual. Just think of what is measured and how it
is put together and you should be ok.
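The difference between the two kinds of data can be sketched in Python. This is a minimal illustration, not a real data set: the county names, the people, and the `has_county_administrator` attribute are all invented, and `percent_baptist_by_county` is a hypothetical helper name.

```python
from collections import defaultdict

# Invented individual data: one record per person (county, religion).
people = [
    ("County A", "Baptist"),
    ("County A", "Methodist"),
    ("County A", "Baptist"),
    ("County A", "Catholic"),
    ("County B", "Baptist"),
    ("County B", "Catholic"),
    ("County B", "Catholic"),
    ("County B", "Catholic"),
]


def percent_baptist_by_county(records):
    """Sum individual records into one aggregate measure per county."""
    totals = defaultdict(int)
    baptists = defaultdict(int)
    for county, religion in records:
        totals[county] += 1
        if religion == "Baptist":
            baptists[county] += 1
    return {county: 100.0 * baptists[county] / totals[county] for county in totals}


# Aggregate data: each county's value is built from the people who live in it.
pct_baptist = percent_baptist_by_county(people)

# Individual data about the county as a whole: not built from people at all.
# (Invented attribute, used only to show the contrast.)
has_county_administrator = {"County A": True, "County B": False}

print(pct_baptist)
```

In both dictionaries the unit of analysis is the county. What differs is how the measure is put together: the percentage is summed up from individual people, while the administrator attribute describes the county itself.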
Sources for Data
This is pretty simple. If you use data that someone else collected, as we do in using the data sets provided by MicroCase, it is called secondary data. If you collect the data yourself, it is called primary data. In using secondary data, you cannot always get measurements of all the variables in which you are interested, so sometimes you have to make substitutions, such as substituting education for income, two things that you know should be at least moderately associated with each other. Such substitutes may not be as valid as we would like. Whether you use surveys, experiments, direct observation, content analysis, or data from public records or other archives depends on the problem you are interested in studying.
As a rule, you should use data that are closest to the units of analysis in your problem. So if you are interested in citizen knowledge and participation, your data should come from surveys of citizens. However, you could also compare groups of citizens using aggregate data to see whether things like educational spending in a state are associated with higher rates of voting in that state.
Suppose I were interested in seeing how religion, specifically Baptist affiliation, affected how people voted on amending the South Carolina Constitution in the 2000 referendum to allow a state lottery. I could survey voters across the state and see how religious affiliation related to vote. That may not be possible because of time (we are years after that election) or cost. On the other hand, I could use previous surveys that were done at the time, if they asked the right questions. That would be secondary data. Or I could use secondary data that are also aggregate data on counties, including the percentage of Baptists and the percentage voting for or against the amendment.
Levels of Measurement
Although we will cover five levels of measurement, for practical purposes we will only use four because one of them is rarely if ever found in political science and, even if it were, we can treat it the same as another of the levels. Why should we care about these different levels? The level of measurement is important because it dictates the kind of statistical analysis that we can do. If you have a choice, higher is better. Let's start with the lowest and proceed to the highest level. The fifth level is really a special case that has unique characteristics that allow it to be treated as several of the other levels.
1. Nominal or categorical. These are just categories that fall in no particular order, e.g. religious affiliation. This level does not measure more or less of anything.
2. Ordinal. Here we are measuring more or less of something, but not in exact amounts, e.g. slightly, moderately, or strongly agreeing with some statement. You know that strongly agreeing is more agreement than moderate agreement, but you are not sure exactly how much more.
3. Interval. This level measures more or less in exact amounts but without an absolute zero. For example, the year in which some event took place, like a birth, can be considered interval, because unless we date things back to the big bang or whatever, there is no absolute year zero. Most things we can think of do have a real zero, like 0 income, 0 vetoes cast by a president, 0 years of education, and so on. So this one we can practically ignore. I would warn you here that people will often use this term interchangeably with the next level of measurement (as I do in my APLS 110 course to keep things simple).
4. Ratio. The same as interval, measuring in exact amounts, but here it makes sense to say that a unit has zero of whatever it is that we are measuring. For example, 0 years of age is when one is born. Or 0 education means that no years of school were completed, or 0 income, and so on. You can do a lot of math with this level of measurement. One caveat exists, however. Usually we do not measure things exactly. We do not measure exact age, as it changes every second. Instead, we often group ratio measurements to the nearest whole unit, whether it be years of age or thousands of dollars of income. So strictly speaking, most ratio measurements are really ordinal. However, as a rough guideline, we can treat them as ratio if we have in the neighborhood of 7 or more groups. If you group the ratio data into really broad groups (usually fewer than 7), they should be treated like ordinal data. So for example, if we took years of education and grouped it into less than high school, high school degree, some college, and college degree or more (4 groups), it should be treated like ordinal data.
5. Dichotomous. If you have only two groups so that the data
is either one thing or not that thing (male and not male is the same thing
as male or female for this purpose), then it is dichotomous. All yes/no
questions are dichotomous. What is neat about dichotomous measurements
is that we can treat them like ANY of the other levels statistically. You
will see this later when we start doing some statistics.
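The five levels and the grouping guideline can be summarized as a small decision rule. The Python sketch below is only one way to write it down, under stated assumptions: the function names `working_level` and `group_education` are hypothetical, the cutoff of 7 groups is the rough guideline from item 4 rather than a hard rule, and the 12-year and 16-year education cutoffs are assumed for illustration, not given in the text.

```python
def working_level(ordered, exact_amounts=False, true_zero=False, n_groups=None):
    """Return the level of measurement to use in analysis.

    ordered:       the values measure more or less of something
    exact_amounts: differences between values are exact amounts
    true_zero:     a value of zero means none of the thing measured
    n_groups:      number of groups, if the data were recorded in groups
    """
    if n_groups == 2:
        return "dichotomous"
    if not ordered:
        return "nominal"
    if not exact_amounts:
        return "ordinal"
    # Rough guideline from the text: broadly grouped data
    # (fewer than about 7 groups) are treated like ordinal data.
    if n_groups is not None and n_groups < 7:
        return "ordinal"
    return "ratio" if true_zero else "interval"


def group_education(years):
    """Collapse years of education into the four broad groups from item 4.

    The 12- and 16-year cutoffs are assumptions made for this sketch.
    """
    if years < 12:
        return "less than high school"
    if years == 12:
        return "high school degree"
    if years < 16:
        return "some college"
    return "college degree or more"


print(working_level(ordered=False))                     # religious affiliation
print(working_level(ordered=True))                      # strength of agreement
print(working_level(True, True, False))                 # year of an event
print(working_level(True, True, True))                  # age in years
print(working_level(True, True, True, n_groups=4))      # grouped education
print(working_level(False, n_groups=2))                 # any yes/no question
print(group_education(14))
```

Checking for two groups first reflects the point in item 5 that dichotomous measures are a special case, whatever the underlying property happens to be.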
Assignment:
I. Again go to some journal articles--anything in social science will do this time. Find TWO articles (different ones than you have used before) and answer the following:
1. What are the units of analysis?
2. What is the population?
3. Is it primary or secondary data? Is it individual or aggregate?
4. What is the source of the data?
5. What are the variables and what is the level of measurement of each?
Make sure that you find real research articles, not book reviews or something else!
II. Answer the following questions.
1. What is the unit of analysis in each of the following? a) a telephone survey. b) an exit poll. c) a comparison of health care systems in nations around the world. d) the type of governments cities have: strong mayor or council manager. e) cost of presidential elections since 1952. f) veto success of presidents: percentage upheld
2. Are the following aggregate or individual data? a) per capita weekly take-home pay in each nation across the world converted to 2000 U.S. dollars. b) average number of people checking books out each month per 1000 population in each county across the nation. c) public library budgets in each county across the nation. d) type of constitutional system, presidential or parliamentary in democratic republics around the world. e) voting choice of voters in an exit poll
3. Are the following primary or secondary data? a) when our class uses data from last year's class to test hypotheses. b) census data. c) testing hypotheses using the survey we perform this year. d) the National Election Survey (NES) that is done every election and available through MicroCase
4. What is the level of measurement of each of the following? a) age measured in years. b) class year in school: freshman, sophomore, and so on. c) hours completed towards degree. d) student athlete or non-student athlete. e) on or off campus residence. f) major field of study: poli sci, nursing, etc. g) code for major field of study: 100 for no major, 157 for poli sci, 160 for psychology, 175 for sociology, 961 for nursing, and so on. h) whether or not a respondent considers herself or himself to be a supporter of the Tea Party Movement. i) grade I give you for this homework assignment (S, M, U)
Copyright, Robert Botsch, 2009-10