Abstract
Social scientists and statisticians often use aggregate data to predict individual-level behavior because the latter are not always available. Various statistical techniques have been developed to make inferences from one level (e.g., precinct) to another level (e.g., individual voter) that minimize errors associated with ecological inference. While ecological inference has been shown to be highly problematic in a wide array of scientific fields, many political scientists and analysis employ the techniques when studying voting patterns. Indeed, federal voting rights lawsuits now require such an analysis, yet expert reports are not consistent in which type of ecological inference is used. This is especially the case in the analysis of racially polarized voting when there are multiple candidates and multiple racial groups. The eiCompare package was developed to easily assess two of the more common ecological inference methods: EI and EI:R\(\times\)C. The package facilitates a seamless comparison between these methods so that scholars and legal practitioners can easily assess the two methods and whether they produce similar or disparate findings.Ecological inference is a widely debated methodology for attempting to understand individual, or micro behavior from aggregate data. Ecological inference has come under fire for being unreliable, especially in the fields of biological sciences, ecology, epidemiology, public health and many social sciences. For example, Freedman (1999) explains that when confronted with individual level data, many ecological aggregate estimates in epidemiology have been proven to be wrong. In the field of ecology Martin, B. A. Wintle, J. R. Rhodes, P. M. Kuhnert, S. A. Field, S. J. Low-Choy, A. J. Tyre, and H. P. Possingham (2005) expose the problem of zero-inflation in studies of the presence or absence of specific species of different animals and note that ecological techniques can lead to incorrect inference. Greenland (2001) describes the many pitfalls of ecological inference in public health due to the nonrandomization of social context across ecological units of analysis. Elsewhere, Greenland and J. Robins (1994) have argued that the problem of ecological confounder control leads to biased estimates of risk in epidemiology. Related, Frair, J. Fieberg, M. Hebblewhite, F. Cagnacci, N. J. DeCesare, and L. Pedrotti (2010) argue that while some ecological analysis can be informative when studying animal habitat preference, existing methods of ecological inference provide imprecise information on variation in the outcome variables and that considerable improvements are necessary. Wakefield (2004) provides a nice comparison of how ecological inference performs across epidemiological versus social scientific research. He concludes that in epidemiological applications individual-level data are required for consistently accurate statistical inference.
However, within the narrow subfield of racial voting patterns in American elections ecological inference is regularly used. This is especially common in scholarly research on the voting rights act where the United States Supreme Courts directly recommended ecological inference analysis as the main statistical method to estimate voting preference by racial group (e.g. Thornburg v. Gingles 478 U.S. 30, 1986). Because Courts in the U.S. have so heavily relied on ecological inference, it has gained prominence in political science research. The American Constitution Society for Law and Policy explains that ecological inference is one of the three statistical analyses that must be performed in voting rights research on racial voting patterns.1 As ecological inference evolved a group of scholars developed the eiPack package and published an article in R News announcing the new package (Lau, R. T. Moore, and M. Kellermann 2006).
This article does not conclude that ecological inference is appropriate or reliable outside the specific domain of American elections. Indeed, scholars in the fields of epidemiology and public health have correctly pointed out the limitations of individual level inference from aggregate date. However, its application to voting data in the United States represents one area where it may have utility, if model assumptions are met (Tam Cho and B. J. Gaines 2004). Indeed, the main point of our article is not to settle the debate on the accuracy of ecological inference in the sciences writ large, but rather to assess the degree of similarity or difference with respect to two heavily used R packages within the field of political science, ei and eiPack. Our package, eiCompare offers scholars who regularly use ecological inference in analyses of voting patterns the ability to easily compare, contrast and diagnose estimates across two different ecological methods that are recommended statistical techniques in voting rights litigation.
Today, although there is continued debate among social scientists (Greiner 2007, 2011; Cho 1998) — the courts generally rely on two statistical approaches to ecological data. The first, ecological inference (EI), developed by King (1997), is said to be preferred when there are only two racial or ethnic groups, and ideally only two candidates contesting office. However, Wakefield (2004) notes that EI methods can be improved with the use of survey data as Bayesian priors. The second, ecological inference R\(\times\)C (R\(\times\)C) developed by Rosen, W. Jiang, G. King, and M. A. Tanner (2001), is said to be preferred when there are multiple racial or ethnic groups, or multiple candidates contesting office. However, it is not clear that when faced with the exact same dataset, they would produce different results. In one case, analysis of the same dataset across multiple ecological approaches found they tend to produce the same conclusion (Grofman and M. A. Barreto 2009). However, others have argued that using King’s EI iterative approach with multiple racial groups or multiple candidates will fail and should not be relied on (Ferree 2004). Still others have gone further and stated that EI cannot be used to analyze multiple racial group or multiple candidate elections, stating that “it biases the analysis for finding racially polarized voting,” going on to call this approach “problematic” and stating that “no valid statistical inferences can be drawn” (Katz 2014).
As with any methodological advancement, there is a healthy and rigorous debate in the literature. However, very little real election data has been brought to bear in this debate. Ferree (2004) offers a simulation of Black, White, and Latino turnout and voting patterns, and then examines real data from a parliamentary election in South Africa using a proportional representation system. (Grofman and M. A. Barreto 2009) compare an exit poll to precinct election data in Los Angeles, but only compare Goodman’s ecological regression against King’s EI, using the single-equation versus double-equation approach, and do not examine the R\(\times\)C approach at all.
The challenges surrounding ecological inference are well documented. Robinson (2009) pointed out that relying on aggregate data to infer the behavior of individuals can result in the ecological fallacy, and since then scholars have applied different methods to discern more accurately individual correlations from aggregate data. Goodman (1953, 1959) advanced the idea of ecological regression where individual patterns can be drawn from ecological data under certain conditions. However Goodman’s logic assumed that group patterns were consistent across each ecological unit, and in reality that may not be the case.
Eventually, systematic analysis revealed that these early methods could be unreliable (King 1997). Ecological inference is King’s (1997) solution to the ecological fallacy problem inherent in aggregate data, and since the late 1990s has been the benchmark method courts use in evaluating racial polarization in voting rights lawsuits, and has been used widely in comparative politics research on group and ethnic voting patterns. Critics claim that King’s EI model was designed primarily for situations with just two groups (e.g., blacks and whites; Hispanics and Anglos, etc.). While many geographic areas (e.g., Mississippi, Alabama) still contain essentially two groups and hence pose no threat to traditional EI estimation procedures, the growth of racial groups such as Latinos and Asians have challenged the historical biracial focus on race in the United States (thereby challenging traditional EI model assumptions). Rosen, W. Jiang, G. King, and M. A. Tanner (2001) suggest a rows by columns (R\(\times\)C) approach which allows for multiple racial groups, and multiple candidates; however, their Bayesian approach suffered computational difficulties and was not employed at a mass level. Since then, computing power has steadily improved, making R\(\times\)C a realistic solution for many scenarios and accessible packages now exist in R that are widely used. These two methodological approaches are now both regularly used in political science; however, there is no consistent evidence how they perform side-by-side, and are different.
Ferree (2004) critiques King’s EI model, arguing that the conditions for iterative estimation (e.g., black vs. non black, white vs. non-white, Hispanic vs. non-Hispanic) can be considerably biased due to aggregation bias and multimodality in the data. In a hypothetical simulation dataset, Ferree shows that combining blacks and whites into a single “non-Hispanic” group in order to estimate Hispanic turnout can vastly overestimate Hispanic turnout, for example. However, the analysis did not provide any clues as to the specific conditions when and how R\(\times\)C is significantly better or preferred to EI. For example, if there are three racial groups in equal thirds of the electorate, does aggregation bias create more error in EI than a scenario in which two dominant groups comprise 90% and a small group is just 10%? Likewise, is EI’s iterative approach to candidates more stable when analyzing three candidates and far less stable when eight candidates contest the election? These questions have not been considered empirically. Instead, the existing scholarship uses simulation data to prove theoretically that EI might create bias and that R\(\times\)C is preferred. We argue that real election data should be considered in a side-by-side comparison.
Despite some critiques, other political scientists have defended ecological inference and even ecological regression using both simulations and real data. Owen and B. Grofman (1997) assess whether or not ecological fallacy in ecological regression is a theoretical problem only, a real problem for empirical analysis. In an extensive review, Owen and Grofman conclude that despite the valid theoretical concerns, linear ecological regression still holds up and provides meaningful and accurate estimates of racially polarized voting. A decade later, Grofman and M. A. Barreto (2009) again take up the question of how ecological models compare to one another using a combination of simulation, actual election precinct data, and an accompanying individual-level exit poll. Their analysis argues that there is general consistency across all ecological models and that once voter turnout rates are accounted for, ecological regression and King’s EI lead scholars to the same results. However, Grofman and Barreto did not consider R\(\times\)C in their comparison.
D. J. Greiner and K. M. Quinn (2010) combine R\(\times\)C methods with individual level exit poll data, and argue that this hybrid model can be preferable to a straight aggregation model. However, using exit poll data is not always available to all researchers and practitioners. Indeed, in most county or city elections, exit poll data does not exist which is why scholars often attempt to infer voting patterns through aggregate data. Herron and K. W. Shotts (2003) also criticize EI estimates when used for second-stage regression - given that error is baked into the second-level regression estimation. However Adolph and G. King (2003) respond by adjusting the EI procedure to reduce inconsistencies when estimating second-stage regressions. Nevertheless, these issues with EI do not speak specifically to R\(\times\)C methods.
J. D. Greiner and K. M. Quinn (2009) extend the 2\(\times\)2 EI contingency problem to 3\(\times\)3 and estimate voting preferences simultaneously for three candidates across three racial groups (but using counts instead of percentages). We extend this work by analyzing real-world datasets with sizes greater than 3\(\times\)3 (multiple candidates and at least three racial groups). In all of this, our main goal is to assess whether using iterative EI or simultaneous R\(\times\)C approaches change the conclusions social scientists can make from the data.
Finally, some have gone even further in arguing that EI is ill-equipped to handle complex datasets with multiple candidates and multiple racial groups, and that only R\(\times\)C can produce reliable results (Katz 2014). In explaining the theoretical reasons why EI cannot accurately process such elections Katz argues “adding additional groups and vote choices to King’s (1997) EI is not straightforward,” and also adds “given the estimation uncertainty, it may not be possible to infer which candidate is preferred by members of the group.” The argument against EI in multiple racial group, or especially multiple candidate elections is that EI takes an iterative approach pitting candidate A versus all others who are not candidate A. If the election features four candidates (A, B, C, D) critics state that you cannot accurately estimate vote choice quantities if you compare the vote for candidate A against the combined vote for B, C, D. The iterative approach would then move on to estimate the vote share for candidate B against the combined vote for A, C, D and so on, so that four separate equations are run. Katz (2014) claims that EI biases the findings in favor of bloc-voting stating “this jerry rigged approach to dealing with more than two vote choices stacks the deck in favor of finding statistical evidence for racially polarized.” Given these debates, our package allows scholars to quite easily make side-by-side comparisons and evaluate these competing claims.
While important advancements have been made in ecological inference techniques by King (1997) and Rosen, W. Jiang, G. King, and M. A. Tanner (2001) there is no consistency in which technique is used and how results are presented. What’s more, legal experts and social scientists often argue during voting rights lawsuits that one technique is superior to the other, or that their results are more accurate. There is no question that both social scientists and legal experts would greatly benefit from a standardized software package that presents both ecological inference results (EI and R\(\times\)C) simultaneously and metrics to compare each set of results. Thus, eiCompare was designed to compare the most commonly used methods today, EI and R\(\times\)C, but also incorporates Goodman methods. The package lets analysts seamlessly assess whether EI and R\(\times\)C estimates are similar (see King (1997) and Rosen, W. Jiang, G. King, and M. A. Tanner (2001) for a methodological description of the techniques). It incorporates functions from ei (King and M. Roberts 2013) and eiPack (Lau, R. T. Moore, and M. Kellermann 2012) into a new package that relatively quickly compares ecological inference estimates across the two routines.
The package includes several functions that ultimately produce tables
of results from the different ecological inference methods. Thus, in the
case of racially polarized voting, analysts can quickly assess whether
different racial groups preferred different candidates, according to the
EI, R\(\times\)C, and Goodman
approaches. The eiCompare package wraps the ei()
procedure (King and M. Roberts 2012) into
a generalized function, has a variety of table-making functions, and a
plotting method that graphically depicts the difference between
estimates for the two main EI methods (EI and R\(\times\)C). Below, we use a working example
of a voter precinct dataset in Corona, CA. To use the package, the
process is simple: 1) Load the package, the appropriate data, run the EI
generalized function, and create an EI table of results, 2) Run the
R\(\times\)C function (from
eiPack) and create a table of results, 3) Run the Goodman
regression generalized function if the user chooses, 4) Combine the
results of all the algorithms together into a comparison table, and 5)
Plot the comparison results. Before we conclude, we also compare EI and
R\(\times\)C findings against exit poll
data from a 2005 Los Angeles mayoral run-off election. The rest of the
paper follows this aforementioned outline.
To begin, we install (install.packages("eiCompare")) and
load the eiCompare package (library(eiCompare))
from the CRAN repository. First, we load the aggregate-level dataset
(data(cor_06)) into R, in this case a precinct (voting
district) dataset from a 2006 election in the city of Corona, CA.
Table 1 below displays the first five rows and
column headers of the dataset. This dataset includes all the necessary
variables to run the code in the eiCompare package. The
first column is "precinct", which essentially operates as a
unique identifier. The second column, "totvote", is the
total number of votes cast within the precinct. Columns three and four
are the two racial groups of whom we seek to determine their mean voting
preference. The rest of the columns are the percent of the total vote
for each respective candidate.
| precinct | totvote | pct_latino | pct_other | pct_breitenbucher | pct_montanez | pct_spiegel | pct_skipworth | |
|---|---|---|---|---|---|---|---|---|
| 1 | 22000 | 942 | 0.21 | 0.79 | 0.20 | 0.21 | 0.29 | 0.30 |
| 2 | 22002 | 1240 | 0.16 | 0.84 | 0.22 | 0.22 | 0.29 | 0.27 |
| 3 | 22003 | 1060 | 0.21 | 0.79 | 0.22 | 0.22 | 0.30 | 0.26 |
| 4 | 22004 | 1280 | 0.45 | 0.55 | 0.18 | 0.27 | 0.30 | 0.24 |
| 5 | 22008 | 1172 | 0.31 | 0.69 | 0.23 | 0.25 | 0.30 | 0.22 |
| 6 | 22012 | 1093 | 0.21 | 0.79 | 0.20 | 0.24 | 0.32 | 0.24 |
We are interested in how the four candidates (Breitenbucher,
Montanez, Spiegel, Skipworth) performed with Latino voters and
non-Latino voters (mostly non-Hispanic white), so we can asses whether
racially polarized voting exists. The process begins with the
ei_est_gen() function, which is a generalized version of
the ei() function from the ei package. Instead of
having to estimate EI results for each candidate and each racial group
separately, ei_est_gen() automates this process.
The ei_est_gen() function takes a vector of candidate
names, a character vector of tilde-prefixed racial group names, the name
of the column representing the total number of people in the
jurisdiction (e.g., registered voters, ballots cast), the
"data.frame" object holding the data, and the table names
used to display the results. The function also has four optional
arguments, rho, sample, tomog,
and density_plot. The former two can be used to adjust the
parameters of the ei() algorithm. These are especially
useful when the initial run does not compile or warnings are produced.
The latter two plot out tomography and density plots, respectively, into
the working directory but are set to off by default. These plots can be
used to assess the stability – and thus veracity – of the EI procedure
(see King and M. Roberts (2012) and (King 1997) for details). Finally, the
... argument passes additional arguments onto the
ei() function from the ei package.
One final note, given its iterative nature, the
ei_est_gen() function can take a while to execute. This
typically depends on features unique to the dataset, including the
number of candidates and groups, the amount of racial/ethnic segregation
within the city/area, as well as the number of precincts. This
particular example does not take especially long, executing in about a
minute on a standard Macbook Pro.
# LOAD DATA
data(cor_06)
# SET SEED FOR REPRODUCIBILITY
set.seed(294271)
# CREATE CHARACTER VECTORS REQUIRED FOR FUNCTION
cands <- c("pct_breitenbucher","pct_montanez","pct_spiegel", "pct_skipworth")
race_group2 <- c("~ pct_latino", "~ pct_other")
table_names <- c("EI: Pct Lat", "EI: Pct Other")
# RUN EI GENERALIZED FUNCTION
results <- ei_est_gen(cand_vector=cands, race_group = race_group2,
total = "totvote", data = cor_06, table_names = table_names)
# LOOK AT TABLE OF RESULTS
results
The call to the results object produces a table of results indicating the mean estimated voting preferences for Latinos and non-Latinos within the city of Corona (see Table 2). The results strongly suggest the presence of racially polarized voting, as Latinos prefer Montanez as their number one choice, whereas non-Latinos do not.
| Candidate | EI: Pct Lat | EI: Pct Other |
|---|---|---|
| pct_breitenbucher | 19.68 | 21.12 |
| se | 0.75 | 0.13 |
| pct_montanez | 35.95 | 20.13 |
| se | 0.03 | 0.08 |
| pct_spiegel | 28.43 | 31.01 |
| se | 0.57 | 0.23 |
| pct_skipworth | 18.64 | 26.84 |
| se | 0.71 | 0.23 |
| Total | 102.69 | 99.10 |
The R\(\times\)C builds off of code
from the eiPack package, where eiCompare simply takes
the former’s results and puts them into a similar
"data.frame"/"table" object similar to the
results from the ei_est_gen() function. First, the user
follows the code from the eiPack package (here we use the
ei.reg.bayes() function), and creates a formula object
including all candidates and all groups. The user must ensure that the
percentages on both signs of the \(\sim\) symbol add to 1. Thus, the initial
table() code is a simple data check to ensure that this
rule is followed. The R\(\times\)C
model is then run using the ei.reg.bayes() model. Users can
read the eiPack documentation to familiarize themselves with
this procedure. Depending on the nature of one’s data, the R\(\times\)C code can take a while to run.
Finally, the results are passed onto the bayes_table_make()
function, along with a vector of candidate names, and a vector of table
names, similar to what was passed to ei_est_gen().
# CHECK TO MAKE SURE DATA SUMS TO 1 FOR EACH PRECINCT
with(cor_06, pct_latino + pct_other)
with(cor_06, pct_breitenbucher + pct_montanez + pct_spiegel + pct_skipworth)
# SET SEED FOR REPRODUCIBILITY
set.seed(124271)
#RxC GENERATE FORMULA
form <- formula(cbind(pct_breitenbucher,pct_montanez,
pct_spiegel, pct_skipworth) ~ cbind(pct_latino, pct_other))
# RUN EI:RxC MODEL
ei_bayes <- ei.reg.bayes(form, data = cor_06, sample = 10000, truncate = TRUE)
# CREATE TABLE NAMES
table_names <- c("RxC: Pct Lat", "RxC: Pct Other")
# TABLE CREATION
ei_bayes_res <- bayes_table_make(ei_bayes, cand_vector = cands, table_names = table_names)
# LOOK AT TABLE OF RESULTS
ei_bayes_res
| Candidate | RxC: Pct Lat | RxC: Pct Other |
|---|---|---|
| pct_breitenbucher | 18.22 | 21.58 |
| se | 1.62 | 0.53 |
| pct_montanez | 34.96 | 20.44 |
| se | 1.72 | 0.56 |
| pct_spiegel | 28.24 | 31.05 |
| se | 1.08 | 0.35 |
| pct_skipworth | 18.61 | 26.91 |
| se | 1.73 | 0.56 |
| Total | 100.03 | 99.99 |
The results are presented in Table 3, and look remarkably similar to those presented in Table 2. Indeed, the exact same conclusions would be drawn from an analysis of both tables: Latinos prefer Montanez as their first choice and non-latinos prefer Spiegel as their top choice.
While many users will skip over the Goodman regression when
conducting ecological inference, given the documented issues with the
method (Shively 1969; King 1997),
eiCompare nevertheless has a Goodman regression generalized
function, similar to the ei_est_gen() function. This
function takes a character vector of candidate names, a character vector
of racial groups, the name of the column, a data object, and a character
vector of table names. Because Goodman is simply a linear regression,
the execution is very fast.
table_names <- c("Good: Pct Lat", "Good: Pct Other")
good <- goodman_generalize(cands, race_group2, "totvote", cor_06, table_names)
good
Table 4 shows the Goodman regression results. In this particular case, these results align quite closely with results from the two EI models. All three approaches essentially tell us the same thing.
| Candidate | Good: Pct Lat | Good: Pct Other |
|---|---|---|
| pct_breitenbucher | 17.51 | 20.34 |
| se | 3.18 | 3.74 |
| pct_montanez | 35.00 | 20.48 |
| se | 3.41 | 4.01 |
| pct_spiegel | 28.52 | 31.61 |
| se | 2.16 | 2.54 |
| pct_skipworth | 18.97 | 27.57 |
| se | 3.45 | 4.05 |
| Total | 100.00 | 100.00 |
The last two sections address the comparison component of the
package. The function, ei_rc_good_table(), takes the
objects from the EI, R\(\times\)C, and
Goodman regression, and puts them into a
"data.frame"``"table" object. To simplify comparison, the
table adds an EI-R\(\times\)C column
differential for each racial group. This format lets the user quickly
assess how the EI and R\(\times\)C
methods stack up against one another. The function takes the following
arguments: EI results object (e.g., results), an R\(\times\)C object (e.g.,
ei_bayes_res), and a character vector groups
(e.g., c("Latino", "Other")) argument. The
good argument for the Goodman regression is set to
NULL, and the include_good argument defaults
to FALSE. If the user wants to include a Goodman regression
in the comparison of results they need to change the latter to
TRUE and specify the the good argument as the
object name from the goodman_generalize() call.
| Candidate | EI: Pct Lat | RxC: Pct Lat | EI_Diff | EI: Pct Other | RxC: Pct Other | EI_Diff |
|---|---|---|---|---|---|---|
| pct_breitenbucher | 19.68 | 18.22 | -1.46 | 21.12 | 21.58 | 0.46 |
| se | 0.75 | 1.62 | 0.13 | 0.53 | ||
| pct_montanez | 35.95 | 34.96 | -0.99 | 20.13 | 20.44 | 0.31 |
| se | 0.03 | 1.72 | 0.08 | 0.56 | ||
| pct_spiegel | 28.43 | 28.24 | -0.19 | 31.01 | 31.05 | 0.04 |
| se | 0.57 | 1.08 | 0.23 | 0.35 | ||
| pct_skipworth | 18.64 | 18.61 | -0.02 | 26.84 | 26.91 | 0.07 |
| se | 0.71 | 1.73 | 0.23 | 0.56 | ||
| Total | 102.69 | 100.03 | -2.66 | 99.10 | 99.99 | 0.88 |
The results of ei_rc_good_table() is a new class
"ei_compare", which includes a "data.frame"
and groups character vector. This output is ultimately passed to
plot().
ei_rc_combine <- ei_rc_good_table(results, ei_bayes_res,
groups = c("Latino", "Other"))
ei_rc_combine@data
ei_rc_g_combine <- ei_rc_good_table(results, ei_bayes_res, good,
groups = c("Latino", "Other"), include_good = TRUE)
ei_rc_g_combine
Table 5 displays the output of a call to
the ei_rc_good_table() function for the first line of code
above. The user must include the code @data onto the
outputted table name to extract just the table. This table basically
summarizes the results of the EI and R\(\times\)C analyses. Clearly, very little
difference emerges between the two methods in this particular
instance.
Finally, users can plot the results of the EI, and R\(\times\)C comparison to more visually
determine whether the two methods are similar. Plotting is simple, as
plot methods have been developed for the "ei_compare"
class. The code below produces the plot depicted in Figure 1.
# PLOT COMPARISON -- adjust the axes labels slightly
plot(ei_rc_combine, cex.axis = .5, cex.lab = .7)
One possible question remains, whether or not ecological estimates line up with individual level estimates. Many studies have pointed out that ecological fallacy and aggregation bias can produce ecological inference results that are highly questionable. In this section we implement the eiCompare package for a mayoral election in a multiethnic setting in which an individual-level exit poll survey was also administered. The eiCompare package provides EI and R\(\times\)C results for the 2005 Los Angeles mayoral runoff election between Antonio Villaraigosa and James Hahn, and we also add results for the Los Angeles Times exit poll. Results are displayed in Table 6.
| EI: AV | EI: JH | RxC: AV | RxC: JH | Exit: AV | Exit: JH | MOE | |
|---|---|---|---|---|---|---|---|
| White | 45 | 54 | 48 | 52 | 50 | 50 | +/- 2.5 |
| Black | 58 | 40 | 50 | 50 | 48 | 52 | +/-4.2 |
| Latino | 82 | 17 | 81 | 19 | 84 | 16 | +/-3.6 |
| Asian | 48 | 51 | 47 | 53 | 44 | 56 | +/-6.1 |
The results presented in Table 6 demonstrate that not only do EI and R\(\times\)C produce remarkably consistent results, but they very closely match the individual level estimates for the Los Angeles Times. The EI R\(\times\)C estimates are all with the confidence range of the individual level data reported by the exit poll.
eiCompare is a new package that builds on the work of King and others that attempts to address the ecological inference problem of making individual-level assessments based on aggregate-level data. As we have reviewed above, there is considerable debate in the sciences about the utility and accuracy of ecological techniques. Despite these well documented questions, ecological inference is widely used in political science and will continue to grow in importance when the constitutionally mandated redistricting in 2021 occurs. The redistricting cycle will bring with it extensive academic, legislative, and legal research using ecological inference to assess racial voting patterns across all 50 states.
While this new package does not develop a new method, per se, it improves analysts’ ability to quickly compare different commonly used EI algorithms to assess the veracity of the methods and also produce tables of their findings. While R\(\times\)C has been touted as the method necessary in situations with multiple groups and multiple candidates, the results do not always demonstrate face validity. In these scenarios – and others – analysts may want to incorporate original EI methods so they can compare how the two approaches stack up. Ultimately, this approach provides a needed assessment between two commonly used methods in voting behavior research.