PROFESSOR JOHN K. WILDGEN
To study politics, Harold Lasswell (1936) wrote, one must focus on the exercise of influence and the influentials who exist within societies and who shape societal value patterns by their determination of who gets what, when and how.
The involvement of the military in determining who gets what, when and how in Africa south of the Sahara is Robert W. Jackman’s focus in “The Predictability of Coups d’Etat: A Model with African Data.” Coups d’etat, their causes, and their consequences have long been of intense interest to both policymakers and scholars, but explaining and predicting military coups is not an easy task, because coups result from a complex mix of historical, political, personal, economic, military, social, ethnic, and cultural factors.
Jackman used six different news chronologies to collect his data in order to measure the incidence of coups d’etat for 30 Black African States from 1960 through 1975 (Jackman, 1978: 1264-1265).
He gave each state a score equal to the sum of one point for each reported coup plot, three points for each attempted coup, and five points for each successful coup. This index, which includes all reported coup-related events, became Jackman’s dependent variable.
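The scoring rule can be written out as a short sketch. The weights come from the description above; the function name and the example counts are hypothetical and serve only as illustration.

```python
# Weights taken from the scoring rule described in the text.
PLOT_WEIGHT, ATTEMPT_WEIGHT, SUCCESS_WEIGHT = 1, 3, 5

def coup_index(plots, attempts, successes):
    """Weighted count of reported coup-related events for one state."""
    return (PLOT_WEIGHT * plots
            + ATTEMPT_WEIGHT * attempts
            + SUCCESS_WEIGHT * successes)

# Hypothetical state: 2 reported plots, 1 attempted coup, 1 successful coup.
example_score = coup_index(plots=2, attempts=1, successes=1)
print(example_score)  # 2*1 + 1*3 + 1*5 = 10
```

Note that a score of 5 can mean one successful coup or five reported plots, which is one reason the weights are later described as arbitrary.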
Jackman’s statistical model was quite robust. With only four substantive independent variables and three interaction terms, he was able to explain 84 percent of the variation in coups d’etat across 29 Black African states from 1960 through 1975. The four independent variables are social mobilization (M), ethnic dominance (C), party dominance (D) and turnout (P), and the three interaction terms are D*P, C*D and C*P.
An R2 of .84 recalls a warning Lewis-Beck gives in “Applied Regression: An Introduction”: one rather sure symptom of high multicollinearity is a substantial R2 for the equation together with statistically insignificant coefficients. A weaker signal is regression coefficients that change greatly in value when independent variables are dropped from or added to the equation.
A frequent practice is to examine the bivariate correlations among the independent variables, looking for coefficients of about .8 or larger. The preferred method of assessing multicollinearity, however, is to regress each independent variable on all the other independent variables (Lewis-Beck, 1990:26).
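A minimal sketch of the two checks follows, using invented data rather than Jackman’s. The variables x1, x2 and x3 are hypothetical, with x3 constructed as almost exactly x1 + x2; all function names are illustrative. The point of the construction is that the bivariate screen and the auxiliary regressions can disagree.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [p - f * q for p, q in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, xs):
    """R2 from an OLS regression of y on the columns in xs plus an intercept."""
    n = len(y)
    cols = [[1.0] * n] + xs
    xtx = [[sum(p * q for p, q in zip(c1, c2)) for c2 in cols] for c1 in cols]
    xty = [sum(p * q for p, q in zip(c, y)) for c in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bk * c[i] for bk, c in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def auxiliary_r2(xs):
    """Regress each independent variable on all the others; one R2 each."""
    return [r_squared(xs[j], xs[:j] + xs[j + 1:]) for j in range(len(xs))]

def correlation(u, v):
    """Bivariate (Pearson) correlation between two variables."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / (suu * svv) ** 0.5

# Hypothetical data: x3 is almost exactly x1 + x2.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [5.0, 1.0, 7.0, 3.0, 8.0, 2.0, 6.0, 4.0]
noise = [0.1, -0.1, 0.2, -0.2, 0.1, -0.1, 0.2, -0.2]
x3 = [p + q + e for p, q, e in zip(x1, x2, noise)]
variables = [x1, x2, x3]

pairwise = [correlation(variables[i], variables[j])
            for i in range(3) for j in range(i + 1, 3)]
aux = auxiliary_r2(variables)
print([round(r, 2) for r in pairwise])  # bivariate screen
print([round(r, 3) for r in aux])       # auxiliary-regression screen
```

In this constructed example every bivariate correlation stays below .8, yet each auxiliary regression has an R2 close to 1, which is why the regression-based check is the preferred one.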
This doubt is confirmed by Johnson, Slater and McGowan’s replication. They note that Jackman (1978:1269) offers only six bivariate correlations among his four substantive independent variables as evidence that multicollinearity “is not a problem with this model.” Clearly, six bivariate correlations cannot demonstrate that all possible combinations of Jackman’s seven independent variables are free of collinearity in his data set. Computing Farrar and Glauber’s F test, they found that only social mobilization is not collinear with the other explanatory variables.
When they tried to fit Jackman’s model to their expanded data set, problems immediately became evident. Among Jackman’s seven variables, only electoral turnout remains significant, and the entire regression accounts for only 43 percent of the variance, or only 30 percent when adjusted for degrees of freedom. Jackman’s results suffer a nearly complete breakdown with the inclusion of six new cases that Jackman could have studied but did not; the external validity of his original results is therefore open to serious criticism.
Multicollinearity “is used to denote the presence of linear relationships among explanatory variables” (Koutsoyiannis, 1973:225).
Multicollinearity will increase the standard error of each regression coefficient associated with a collinear explanatory variable while at the same time the estimated coefficients remain unbiased. Multicollinearity, therefore, can make it difficult to determine the independent impact of collinear variables in a multiple regression. As a consequence of these inflated standard errors, theoretically important explanatory variables may appear to be insignificant and hence may be dropped from the equation, thereby introducing the theoretically far more serious problem of misspecification bias (Tufte, 1974:148-155).
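The size of this inflation can be illustrated with the variance inflation factor, 1 / (1 - R2_j), where R2_j is the R2 from regressing explanatory variable j on the other explanatory variables. The term and the function names below are not used in the sources cited here; the sketch is offered only as an illustration of the standard result.

```python
from math import sqrt

def variance_inflation_factor(aux_r2):
    """Factor by which collinearity multiplies a coefficient's variance."""
    return 1.0 / (1.0 - aux_r2)

def standard_error_multiplier(aux_r2):
    """Factor by which collinearity multiplies a coefficient's standard error."""
    return sqrt(variance_inflation_factor(aux_r2))

# Illustrative auxiliary R2 values, not taken from Jackman's data.
for aux_r2 in (0.0, 0.5, 0.75, 0.9, 0.99):
    print(aux_r2, round(standard_error_multiplier(aux_r2), 2))
```

An auxiliary R2 of .75 already doubles the standard error, so a coefficient would need to be twice as large to reach the same t-ratio.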
In addition, Johnson, Slater and McGowan found that the plot of the residuals shows a heteroskedastic pattern: African states with low values on their index tend to have negative residuals (more military intervention predicted than observed), while states with high scores on their index have quite large positive residuals (considerably less predicted intervention by the military than was actually recorded). The plot strongly suggests that the poor fit is the consequence of misspecification. In terms of Lewis-Beck’s arguments, Jackman’s “The Predictability of Coups d’Etat: A Model with African Data” violates four assumptions of the regression model; three of them concern specification error, heteroskedasticity and multicollinearity (Lewis-Beck, 1990:26).
Efforts to obtain a high R2 are called into question by Gary King’s “How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science.” R2 is often called the “coefficient of determination.” The R2 statistic is sometimes interpreted as a measure of the influence of X on Y. Others consider it to be a measure of the fit between the statistical model and the true model. A high R2 is also sometimes considered to be proof that the correct model has been specified or that the theory being tested is correct.
According to King, all these interpretations are wrong. R2 is a measure of the spread of points around a regression line, and it is a poor measure of even that (Achen, 1982). Worse, there is no statistical theory behind the R2 statistic.
Thus, R2 is not an estimator, because there exists no relevant population parameter. All calculated values of R2 refer only to the particular sample from which they come, King argues. A high R2 is generally good news, but most of the useful information in R2 is already available in other commonly reported statistics (King, 1986:575-579).
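One way to see why R2 is tied to the particular sample is to hold the relationship and the errors fixed and change only the spread of X. The sketch below uses invented numbers: both samples follow y = 2 + 1*x with the same four errors, and only the range of x differs. The function and variable names are illustrative.

```python
def fit_line(x, y):
    """Return the OLS intercept, slope and R2 for a bivariate regression."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot

# Hypothetical samples: same intercept, slope and errors; different spread in x.
errors = [1, -1, -1, 1]
x_narrow = [1, 2, 3, 4]
x_wide = [1, 4, 7, 10]
y_narrow = [2 + xi + e for xi, e in zip(x_narrow, errors)]
y_wide = [2 + xi + e for xi, e in zip(x_wide, errors)]

narrow = fit_line(x_narrow, y_narrow)
wide = fit_line(x_wide, y_wide)
print(narrow)  # slope 1, R2 = 5/9
print(wide)    # slope 1, R2 = 45/49
```

The estimated effect of X on Y is identical in the two samples, but R2 rises from about .56 to about .92 simply because X varies more in the second sample.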
Failure to trace carefully the correspondence between a technique and its theoretical assumptions can lead to incorrect conclusions, just as surely as applying a technique to an unsuitable level of measurement. The latter is Jackman’s fourth violation of the regression model’s assumptions: the choice of statistical technique.
Regression has been extensively analyzed and formally developed, and it alone is suitable for estimating more complex relationships and multiple-equation models, e.g., simultaneous equations, causal modeling, etc. Jackman’s dependent variable, however, is built from three categories of events (reported coup plot, attempted coup and successful coup), and it cannot be turned into an interval-level measure simply by giving one point for the first event, three points for the second, and five points for the third.
There are a variety of powerful techniques, all grounded on a solid base of statistical theory, and one or more of them is suitable for whatever level of measurement is used for the dependent variable. If one’s data are measured on less than an interval scale, one is not limited to the choice of using regression erroneously or using an appropriate but weaker method of analysis. Alternative techniques exist.
Employing a procedure that assumes a different level of measurement can seriously affect the estimates and lead to incorrect inferences and hypothesis tests. Thus, Jackman’s application of OLS to a non-interval variable seriously underestimated the overall fit of the model to the data.
It is clear that the level-of-measurement problem is real. Yet it is but one important factor in choosing an appropriate statistical technique. Other aspects must also be considered, namely the differences in the assumptions of the techniques.
Probit necessarily assumes that the stochastic term is normally distributed, an assumption which may not be necessary in all instances in using OLS regression. The discriminant procedure assumes that observations on the independent variables are normally distributed within each category of the dependent variable. This assumption is much stronger than any comparable assumption made in probit and OLS. Therefore, the plausibility of these and other assumptions must be weighed along with the level of measurement.
According to Aldrich and Cnudde (1975), the level of measurement of the dependent variable crucially affects the selection of statistical techniques. The three techniques are ordinary least squares regression for intervally measured, probit for ordinally measured, and discriminant analysis for nominally measured dependent variables.
Ordinary least squares (OLS) is one criterion for selecting the best-fitting line: the line that makes the sum of the squared residuals (the e values) smallest. The values of a and b for this line are the OLS estimates of the regression coefficients. If the model were true, then a plot of the predictions against the actual scores would form a particular type of linear function.
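The least-squares criterion can be checked directly on a small invented data set. The sketch below computes a and b from the usual closed-form expressions and then compares the sum of squared residuals at the OLS estimates with the sum at a few nearby lines; the data and the function names are hypothetical.

```python
def ols(x, y):
    """Closed-form OLS estimates (a, b) for the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    a = mean_y - b * mean_x
    return a, b

def sum_squared_residuals(a, b, x, y):
    """Sum of the squared e values for the line y = a + b*x."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a_hat, b_hat = ols(x, y)
best = sum_squared_residuals(a_hat, b_hat, x, y)
print(a_hat, b_hat, best)

# Any other line does worse on the least-squares criterion.
for da, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]:
    print(sum_squared_residuals(a_hat + da, b_hat + db, x, y) > best)
```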
The choice of a statistical method inappropriate for the substantive and theoretical concern leads just as surely to worthless research. For nominal, dichotomous and other categorically measured dependent variables such as Jackman’s, alternative procedures should be employed.
The as-yet unaddressed problem of how to score the dependent variable remains, in that coups, plots, and attempted coups are irregular events, and a time series of such events would consist mainly of zeros recording no events in most time periods. The weights used here are arbitrary, and this dependent variable should be analyzed with other statistical techniques. First of all, if the dependent variable is categorically measured, then the estimating technique should yield equivalent results whether the arbitrary order is a,b,c or c,b,a, or any other permutation. Neither OLS nor probit meets this condition. The technique we consider for the nominal-level variable is discriminant analysis, following Aldrich and Cnudde (1975).
In discriminant analysis the data cases are members of two or more mutually exclusive groups. One objective of discriminant analysis, classification, is to predict accurately the outcome of future incidents; future incidents can thus be considered “ungrouped” or “unclassified” cases. The other objective, “interpretation,” is to study the ways in which the groups can be discriminated on the basis of some set of characteristics. The characteristics used to distinguish among the groups are called “discriminatory variables.”
In short, discriminant analysis is used to study the differences between two or more groups with respect to a set of discriminatory variables (Klecka, 1980:8-11). Our prediction of the category of Y depends upon the values of the observation on each independent variable. These values can be written as a vector of observations X = (x1, x2, ..., xn), which can equivalently be shown as a point in a geometrical space of n dimensions.
In discriminant analysis, we must assume that the independent variables are normally distributed, given a particular value of Y. One way of conceptualizing the purpose of discriminant analysis is to define regions of classification (separated by their boundaries) which maximize the variation between the predicted classes as a proportion of the total variance. By changing the value(s) of the constant(s), the boundaries between regions are changed, but are only changed by defining a new set of lines “parallel” to the old.
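For two groups and two discriminating variables, the idea can be sketched with Fisher’s linear discriminant. This is one common formulation, chosen here for brevity; it is not necessarily the exact procedure used in the sources cited. The data points are invented, and the cutoff assumes equal prior probabilities for the two groups.

```python
def mean_vector(points):
    """Mean of a list of (x1, x2) observations."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def pooled_scatter(groups):
    """Pooled within-group scatter matrix for two variables."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for points in groups:
        m = mean_vector(points)
        for p in points:
            d = (p[0] - m[0], p[1] - m[1])
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    return s

def fisher_discriminant(group0, group1):
    """Return the weights w and the cutoff c of the boundary w.x = c."""
    m0, m1 = mean_vector(group0), mean_vector(group1)
    s = pooled_scatter([group0, group1])
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    d = (m1[0] - m0[0], m1[1] - m0[1])
    # w = inverse(pooled scatter) * (difference in group means)
    w = ((s[1][1] * d[0] - s[0][1] * d[1]) / det,
         (s[0][0] * d[1] - s[1][0] * d[0]) / det)
    # Cutoff halfway between the projected group means (equal priors assumed).
    c = 0.5 * (w[0] * (m0[0] + m1[0]) + w[1] * (m0[1] + m1[1]))
    return w, c

def classify(point, w, c):
    """Assign an ungrouped case to group 1 if it falls beyond the cutoff."""
    return 1 if w[0] * point[0] + w[1] * point[1] > c else 0

# Hypothetical cases in two mutually exclusive groups.
group0 = [(1, 2), (2, 1), (2, 3), (3, 2)]
group1 = [(5, 6), (6, 5), (6, 7), (7, 6)]

w, c = fisher_discriminant(group0, group1)
print(w, c)
print(classify((3, 3), w, c), classify((5.5, 5), w, c))
```

Changing the cutoff c while keeping w fixed moves the boundary to a parallel line, which is the sense in which changing the constant only shifts the regions of classification.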
Secondly, if we can place these events on a stability-coup scale according to their seriousness, this dependent variable can be measured at the ordinal level:
stability < plot < attempted coup < coup
Since so many variables are ordinally measured, an alternative model and associated estimating technique have been designed to substitute for OLS without losing its power: the probit model. Probit is short for “probability unit.” Estimates for the probit model are developed by the method of maximum likelihood, which capitalizes on the assumed normality of the error term. The maximum likelihood criterion selects those parameter values that have associated with them the highest probability of having produced the observed sample data. The fact that only an ordinal dependent variable is observed limits and weakens the interpretation of the coefficients: the slope, or b coefficient, cannot be interpreted (as in OLS regression) as the amount of change in the dependent variable for a one-unit change in an independent variable. OLS regression yields a strictly linear prediction, while probit leads to quite nonlinear predictions. At the two bounds of probability, the probit model curves or flattens to approach 0 and 1 only in the limit, while OLS can exceed the limits by large amounts.
If Y is trichotomous, we could assign the numbers 1, 2, 3 to the three categories in order. However, equally appropriate would be the assignment of -100000, 999, 1000. Probit is insensitive to such shifts, but OLS regression could lead to dramatically different estimates in the two cases.
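The sensitivity of OLS to the choice of codes can be shown on a small invented sample. The six cases, their categories and the function name below are hypothetical; only the two sets of codes come from the text. Both codings preserve the same ordering of the three categories.

```python
def slope_and_r2(x, y):
    """OLS slope and R2 from regressing y on x with an intercept."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sxx, (sxy * sxy) / (sxx * syy)

# Hypothetical cases: one independent variable, three ordered categories.
x = [1, 2, 3, 4, 5, 6]
category = [0, 0, 1, 1, 2, 2]

codes_a = (1, 2, 3)
codes_b = (-100000, 999, 1000)
y_a = [codes_a[k] for k in category]
y_b = [codes_b[k] for k in category]

slope_a, r2_a = slope_and_r2(x, y_a)
slope_b, r2_b = slope_and_r2(x, y_b)
print(round(slope_a, 3), round(r2_a, 3))
print(round(slope_b, 1), round(r2_b, 3))
```

The ordinal information is identical in the two codings, yet both the slope and the R2 change, so the OLS results reflect the arbitrary numbers as much as the data.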
Probit uses another criterion: it selects the values of a and b that maximize the likelihood that the observed variances and covariances of the X and Y variables could arise as sampling fluctuations if a and b were taken as the true population coefficients. This is maximum likelihood estimation (MLE). Thus, the conceptual difference between OLS and MLE is that OLS is concerned with picking parameter estimates that yield the smallest sum of squared errors in the fit between the model and data, while MLE is concerned with picking parameter estimates that imply the highest probability or likelihood of having obtained the observed sample Y (Aldrich & Nelson, 1984:50-51).
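A minimal sketch of maximum likelihood estimation for a dichotomous probit follows. The data are invented, and the iterative scheme (Fisher scoring) is simply one convenient way to climb the likelihood; the sources cited here do not prescribe a particular algorithm. All names are illustrative.

```python
from math import log
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution

def probability(a, b, xi):
    """P(Y = 1 | x) under the probit model, kept strictly inside (0, 1)."""
    return min(max(STD_NORMAL.cdf(a + b * xi), 1e-12), 1 - 1e-12)

def log_likelihood(a, b, x, y):
    """Log of the probability of observing the sample y given a and b."""
    return sum(log(probability(a, b, xi)) if yi == 1
               else log(1 - probability(a, b, xi))
               for xi, yi in zip(x, y))

def fit_probit(x, y, iterations=25):
    """Maximize the log-likelihood over (a, b) by Fisher scoring."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = probability(a, b, xi)
            d = STD_NORMAL.pdf(a + b * xi)
            score = (yi - p) * d / (p * (1 - p))
            weight = d * d / (p * (1 - p))
            g0 += score
            g1 += score * xi
            h00 += weight
            h01 += weight * xi
            h11 += weight * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical dichotomous outcomes that overlap across the range of x.
x = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1]

a_hat, b_hat = fit_probit(x, y)
print(a_hat, b_hat, log_likelihood(a_hat, b_hat, x, y))
```

Where OLS would report the line with the smallest sum of squared errors, this procedure reports the pair (a, b) under which the observed pattern of zeros and ones is most probable.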
Thirdly, if Jackman’s major focus was coups d’etat themselves, plots and attempted coups would be excluded as events, and his dependent variable should be classified as a dichotomy. In this case, Jackman’s mistake is using a dichotomous dependent variable in regression, a linear model. Doing this can yield predicted probabilities greater than one or less than zero, heteroskedasticity, inefficient estimates, biased standard errors, and useless test statistics (King, 1986).
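The out-of-range predictions are easy to reproduce. The sketch below fits a straight line to an invented dichotomous outcome (1 = coup, 0 = no coup) and lists the predicted “probabilities”; the data and the names are hypothetical.

```python
def ols(x, y):
    """Closed-form OLS estimates (a, b) for the line y = a + b*x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    b = (sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
         / sum((xi - mean_x) ** 2 for xi in x))
    return mean_y - b * mean_x, b

# Hypothetical data: the outcome switches from 0 to 1 as x increases.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]

a_hat, b_hat = ols(x, y)
predicted = [a_hat + b_hat * xi for xi in x]
print([round(p, 3) for p in predicted])
print(min(predicted) < 0, max(predicted) > 1)
```

The fitted line predicts a negative probability at the low end of x and a probability above one at the high end, values that have no meaning for a dichotomous outcome.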
We often run into cases in the real world when the dependent variable is dichotomous. Researchers often deal with such situations by using discriminant analysis, weighted least squares regression, or ordinary least squares regression. These methods can lead to misinterpretation of the results. Logit regression allows the researcher to evaluate the impact of a set of predictor variables on a dichotomous dependent variable without these problems.
Logit is well suited to many kinds of data frequently found in sociological research. So many of the dependent variables of interest in our discipline are dichotomous in nature, while many of the independent variables impacting on them are measured at other levels.
As Kachigan (1987:375) put it, “if ever there is a cardinal sin in statistical analysis it is to use a weaker analysis when a more powerful and efficient analysis is readily available.”
Logit circumvents the problems caused by categorizing continuous variables, with the accompanying loss of a large amount of information, and by transforming data to force one’s model to conform to a straight line. It should be used in many cases where researchers have used OLS, WLS, or discriminant analysis.
No other technique will allow the researcher to analyze the effects of a set of independent variables on a dichotomous dependent variable with such minimal statistical bias and loss of information, Anthony Walsh (1987:178) emphasizes.
OLS regression fits a straight line that minimizes the sum of squares in the vertical dimension, and the computed beta assumes a constancy of change: a one-unit increment in the independent variable has the same effect on Y regardless of the position of the independent variable on its scale. However, actual coups do not conform to this linear constancy-of-change assumption. There appears to be a “threshold” level on the coup seriousness scale at which the addition of another point radically affects the chance of coup versus non-coup. In the middle of the range, a change of one point has a much larger effect on the probability of a coup than at the low and high ends of seriousness, because the curve is S-shaped. With logit, unlike OLS, the effect on Y of an additional unit increase in X is not constant over the range of X values. This “variable effect” is represented as a nonlinear logit function.
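The “variable effect” can be made concrete with the logistic function P = 1 / (1 + exp(-(a + b*x))), whose slope at any point is b * P * (1 - P). The coefficients below are hypothetical values chosen so that the threshold falls at x = 5; they are not estimates from any coup data.

```python
from math import exp

# Hypothetical logit coefficients; the midpoint of the curve is at x = 5.
A, B = -5.0, 1.0

def probability(x):
    """Probability of the event under the logit model."""
    return 1.0 / (1.0 + exp(-(A + B * x)))

def marginal_effect(x):
    """Change in probability per unit of x at the point x: b * P * (1 - P)."""
    p = probability(x)
    return B * p * (1.0 - p)

for x in (1, 3, 5, 7, 9):
    print(x, round(probability(x), 3), round(marginal_effect(x), 3))
```

Under these assumed coefficients a one-unit change near x = 5 shifts the probability by about .25, while the same change near x = 1 or x = 9 shifts it by less than .02, whereas the OLS slope would be the same everywhere.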
In addition to choosing an appropriate statistical technique for his dependent variable, Jackman needs to consider problems caused by the redundancy of independent variables. Having too many independent variables that measure the same thing is not a good way to enhance explanatory power. To improve the analysis, factor analysis should be used to sort the variables into different factors and then into indexes.
In the factor analysis model, there are many observed variables from which the goal is to derive underlying (unobserved) factors. Factor analysis assumes that the observed variables are linear combinations of some underlying (hypothetical or unobservable) factors. Some of these factors are assumed to be common to two or more variables and some are assumed to be unique to each variable. The unique factors are then assumed to be orthogonal to each other. Hence, the unique factors do not contribute to the covariance between variables. Only common factors contribute to the covariation among the observed variables.
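A small numerical sketch may help fix these assumptions. It supposes three standardized observed variables driven by a single common factor, with hypothetical loadings of .9, .8 and .7, and computes the correlations the model implies. The loadings and names are invented for illustration.

```python
# Hypothetical loadings of three standardized variables on one common factor.
LOADINGS = [0.9, 0.8, 0.7]

def implied_correlations(loadings):
    """Correlation matrix implied by one common factor and orthogonal unique factors."""
    k = len(loadings)
    matrix = []
    for i in range(k):
        row = []
        for j in range(k):
            common = loadings[i] * loadings[j]                # shared part
            unique = 1.0 - loadings[i] ** 2 if i == j else 0.0  # unique variance
            row.append(common + unique)
        matrix.append(row)
    return matrix

sigma = implied_correlations(LOADINGS)
for row in sigma:
    print([round(v, 2) for v in row])
```

The unique factors appear only on the diagonal. Every off-diagonal entry is the product of two loadings, so all of the covariation among the observed variables is carried by the common factor.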
There are three steps a researcher usually employs in obtaining solutions to exploratory factor analysis: the preparation of an appropriate covariance matrix, extraction of initial (orthogonal) factors, and rotation to a terminal solution (Kim & Mueller, 1989:8-10).
In conclusion, the specific research question addressed should be the basic determinant of the sample used in any cross-national study. Building on Jackman’s study of African military coups d’etat, researchers can apply appropriate techniques and methods to study coups d’etat in Latin American, Middle Eastern, and Asian Third World states, which make up the universe of cases examined in such studies. Statistical techniques (OLS, discriminant analysis, probit, logit) therefore have a great impact on theoretical and technical thinking about IR.
Achen, Christopher H. Interpreting and Using Regression. (California: Sage Publications) 1982.
Aldrich, John and Cnudde, Charles F. “Probing the Bounds of Conventional Wisdom: A Comparison of Regression, Probit and Discriminant Analysis.” American Journal of Political Science, vol. XIX, no. 3, August 1975:591-608.
Aldrich, John H. and Nelson, Forrest D. Linear Probability, Logit and Probit Models. (California: Sage Publications) 1984:50-51.
Jackman, R.W. “The Predictability of Coups d’Etat: A Model with African Data.” American Political Science Review, February 1978:1262-1275.
Kim, Jae-On and Mueller, Charles. Factor Analysis: Statistical Methods and Practical Issues. (California: Sage Publications) 1989.
King, Gary. “How Not to Lie with Statistics: Avoiding Common Mistakes in Quantitative Political Science.” American Journal of Political Science, vol. 30, March 1986:675-678.
Klecka, William. Discriminant Analysis. (California: Sage Publications) 1980.
Koutsoyiannis, A. Theory of Econometrics: An Introductory Exposition of Econometric Methods. (New York: Barnes and Noble) 1973.
Lasswell, Harold. World Politics and Personal Insecurity. (New York: The Free Press) 1965.
Lewis-Beck, Michael. Applied Regression: An Introduction. (California: Sage Publications) 1990.
Tufte, E.R. Data Analysis for Politics and Policy. (New Jersey: Prentice-Hall) 1974.