While the Republican Party presidential primary has largely degenerated into an unpredictable, messy, nasty free-for-all, the Democratic Party primary has proceeded mostly along predictable lines. The race has an overwhelming establishment-backed favourite (Hillary Clinton) and an insurgent (Bernie Sanders) waging an improbable issue-based campaign against her from the left. As often happens in such contests, the underdog has won a few races and caused a few flutters here and there, but the establishment candidate has been able to lock up the critical contests and looks on course to register an easy victory in the end.
Political contests dominated by two major candidates are often predictable along demographic lines, i.e. each candidate remains the strongly preferred choice of particular demographic groups. It is no different in this year’s Democratic primary. Clinton has been heavily favoured by African American voters and, to a lesser extent, by Latino voters, whereas Sanders has drawn his votes mostly from white voters.
The following chart, which plots Sanders’s vote share against the black population share in each of the fifteen states that have voted so far in the primaries, sums up the relationship.
For every one percentage point increase in the black share of a state’s population, Sanders’s vote share comes down by roughly 1.5 percentage points.
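To make the idea of a slope concrete, here is a minimal sketch of a one-variable least-squares fit. The data points are made up (they are not the actual state results) and are deliberately constructed to lie on a line with a slope of -1.5, so the fit simply recovers that figure. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares with a single predictor: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up illustration only: black population (%) and Sanders vote share (%).
black_pct = [1.0, 5.0, 10.0, 20.0, 30.0]
sanders_pct = [60.0 - 1.5 * x for x in black_pct]

slope, intercept = fit_line(black_pct, sanders_pct)
print(f"slope = {slope:.2f} points of vote share per point of black population")
```

With real data the points would scatter around the line, and the slope would be an estimate with a margin of error rather than an exact value.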
Sanders also does worse among Hispanics, although the relationship is very weak. Further, from the limited evidence we have so far, he has done better in the New England states and in states that have held caucuses. Thus, we can build a slightly more complex model by incorporating the percentage of Hispanic population in each state, a dummy variable for the location of the state (1 for Vermont, the home state of Sanders, 0.5 for other New England states and 0 for other states) and a dummy variable indicating whether the state votes through a primary or a caucus (0 for primary and 1 for caucus). We can then carry out a multiple regression with Sanders’s vote percentage in each state as the dependent variable and the above-mentioned variables as the independent variables.
The results of the regression are shown in the following table:
It throws up the following simple equation:
Y = 0.49-0.83*X1+0.32*X2+0.12*X3-0.12*X4
Y = Percentage of Democratic voters likely to support Sanders in a particular state
X1 = Percentage of Black Population in the state
X2 = Dummy Variable indicating if the state is in New England (1 for Vermont, 0.5 for other New England states, 0 otherwise)
X3 = Dummy Variable indicating whether the state holds a primary (0) or a caucus (1)
X4 = Percentage of Hispanic Population in the state
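As a rough sketch, the equation can be written as a small function. One assumption is made here that the text does not state outright: since the intercept is 0.49, the population shares and the predicted vote share are taken to be fractions (0.10 for 10%) rather than whole percentages. The function name and the example state are hypothetical.

```python
def predict_sanders_share(black, new_england, caucus, hispanic):
    """Apply the fitted equation Y = 0.49 - 0.83*X1 + 0.32*X2 + 0.12*X3 - 0.12*X4.

    black, hispanic: population shares as fractions (assumed scale).
    new_england: 1 for Vermont, 0.5 for other New England states, 0 otherwise.
    caucus: 1 for a caucus, 0 for a primary.
    """
    return 0.49 - 0.83 * black + 0.32 * new_england + 0.12 * caucus - 0.12 * hispanic


# Hypothetical caucus state outside New England: 10% black, 5% Hispanic.
example = predict_sanders_share(black=0.10, new_england=0.0, caucus=1.0, hispanic=0.05)
print(f"Predicted Sanders share: {example:.1%}")
```

For this hypothetical state, the caucus bonus outweighs the demographic penalties and the prediction comes out slightly above 50%.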
We can say with 90% confidence that each of these variables has a significant relationship with the dependent variable, apart from X4, which at best has a tenuous relationship with the vote share of Bernie Sanders as of now. The variable has nevertheless been kept in the model, as many of the states scheduled to vote later in the calendar have very high percentages of Latino voters.
The plot of Sanders’s actual vote shares against the vote shares predicted by the model is shown below:
The regression shows a strong relationship between the independent variables and the dependent variable. Around 89% of the variation in Bernie Sanders’s vote share across the various states may be explained by this simple model.
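The "share of variation explained" is the R-squared statistic. A minimal sketch of how it is computed from actual and predicted values follows; the four data points are hypothetical and are not the state results used in the model.

```python
def r_squared(actual, predicted):
    """R-squared = 1 - (residual sum of squares / total sum of squares)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical vote shares (fractions), for illustration only.
actual = [0.30, 0.45, 0.60, 0.85]
predicted = [0.35, 0.40, 0.65, 0.80]

r2 = r_squared(actual, predicted)
print(f"R-squared = {r2:.2f}")
```

A value of 1 means the predictions match the actual results exactly, while 0 means the model does no better than predicting the average vote share for every state.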
If we use the equation thrown up by the model to predict the vote share of Sanders in the later voting states in the calendar, we get the following forecasts:
|Name of the State|Scheduled to Vote on|Delegates Offered|Predicted Vote Share|
|---|---|---|---|
|District of Columbia|14-06-2016|20|7%|
Now, before we go any further, let us be clear about the problems with this model. First, the number of data points, only 15, is woefully inadequate for making predictions with a high degree of confidence. Secondly, there is no guarantee that the relationship the regression analysis has come up with will hold in the future. Thirdly, the sample is not random: it features a disproportionate share of states from the North-east and the South and very few states from the Midwest or the West. Fourthly, because of the low number of data points, the standard errors of the coefficients are pretty high, i.e. the 95% confidence intervals of the predictions are pretty wide.
Even with these caveats in place, it makes sense to gather some insight from the vote shares being predicted here. The table above shows that Sanders is strongest in the states of the West and in New England, especially in places like Kansas, Nebraska, Maine, Idaho, Utah, Alaska, Hawaii, Washington, Wyoming, Rhode Island and Connecticut, almost all of which are also holding caucuses instead of primaries. He is also expected to be competitive in a handful of states in the Midwest, Appalachia and the West Coast.
It may be worthwhile to mention here that the Democratic primary has two types of delegates: pledged and unpledged. The pledged delegates are elected through the state-wise primaries and caucuses and are bound to support a particular candidate at the convention, depending on the results of the primary or caucus of the state they represent. The unpledged delegates (also called super delegates), on the other hand, are members of the party establishment who are not bound to support any candidate at the convention. There are a total of 4051 pledged delegates and 712 unpledged delegates. Thus, in order to win the Democratic nomination, a candidate has to win the support of at least 2382 of the total of 4763 delegates on offer.
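The arithmetic behind the 2382 figure is a simple majority of all delegates. A short sketch, using only the delegate counts quoted above (the variable names are illustrative):

```python
PLEDGED = 4051
UNPLEDGED = 712  # super delegates

total_delegates = PLEDGED + UNPLEDGED
# A simple majority: more than half of the total.
needed_to_win = total_delegates // 2 + 1

print(total_delegates, needed_to_win)
```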
Clinton already has a lead of around 439 among the 479 super delegates who have committed to support one of the two candidates. She also has a lead of around 160 among pledged delegates, on the basis of her performance in the states that have voted so far. If we assume that the remaining uncommitted super delegates and the US territories (like the Virgin Islands, Northern Mariana Islands, etc.) will split their support equally between the two candidates (which is generous to Sanders, considering that super delegates have so far backed Clinton overwhelmingly), Sanders will need to win 1798 of the delegates in the states which are left to vote.
However, the problem for Sanders is that he is not very competitive in delegate-rich states like California, New York, Florida, Pennsylvania, Ohio, Michigan, etc. The Democratic primary system largely allots delegates on a proportional basis. If we assume that Sanders performs exactly as the model suggests, both in the various states and in the congressional districts within each state, he will end up with around 1154 delegates from these states as opposed to 1843 delegates for Hillary Clinton.
So, how much does Sanders need to overperform relative to his performance till now? In the table below, I have calculated the number of delegates that Sanders may hope to win from the remaining states under the base case scenario (i.e. as predicted by the model) and under increases in his vote percentage across states in slabs of 5%.
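A simplified sketch of proportional allocation in a two-candidate race is given below. It is an assumption-laden illustration, not the party's actual procedure: it splits a state's delegates by statewide vote share only and ignores district-level allocation and any other party rules. The function name is hypothetical; the example uses the District of Columbia figures from the forecast table (20 delegates, 7% predicted share).

```python
def split_delegates(delegates, sanders_share):
    """Split a state's delegates in proportion to vote share (two candidates).

    Simplification: statewide share only, rounded half up; real allocation
    also works at the district level and follows further party rules.
    """
    sanders = int(delegates * sanders_share + 0.5)  # round half up
    clinton = delegates - sanders
    return sanders, clinton


dc_sanders, dc_clinton = split_delegates(delegates=20, sanders_share=0.07)
print(dc_sanders, dc_clinton)
```

Summing such splits over all remaining states is the kind of calculation that yields delegate totals like those quoted above, although the exact figures depend on the finer rules this sketch leaves out.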
|Scenario|Delegates won by Sanders|Delegates won by Clinton|
|---|---|---|
|Base case (as predicted by the model)|1154|1843|
|5% increase across states|1304|1693|
|10% increase across states|1454|1543|
|15% increase across states|1604|1393|
|20% increase across states|1754|1243|
|25% increase across states|1903|1094|
In order to reach the magic number of 1798 delegates from the remaining states, Sanders will need a vote swing of around 23%, which is almost impossible. Even if we ignore super delegates, in order to overcome his deficit of around 160 pledged delegates, Sanders will need to win 1579 delegates from the remaining states, i.e. he will need a vote swing of almost 15% across states, which is no easy task.
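One rough way to check the required swing is to interpolate linearly between the scenario rows. The sketch below uses the base case of 1154 delegates from the text and the delegate counts from the scenario table; the function name is hypothetical. Because it is only a straight-line reading of the scenarios, its answers can differ by a point or two from figures obtained through a state-by-state calculation.

```python
# (swing in percentage points, Sanders delegates) taken from the scenarios.
SCENARIOS = [(0, 1154), (5, 1304), (10, 1454), (15, 1604), (20, 1754), (25, 1903)]


def swing_needed(target_delegates):
    """Linearly interpolate the swing at which Sanders reaches the target."""
    for (lo_swing, lo_del), (hi_swing, hi_del) in zip(SCENARIOS, SCENARIOS[1:]):
        if lo_del <= target_delegates <= hi_del:
            fraction = (target_delegates - lo_del) / (hi_del - lo_del)
            return lo_swing + fraction * (hi_swing - lo_swing)
    raise ValueError("target lies outside the range covered by the scenarios")


swing_with_supers = swing_needed(1798)    # counting super delegates
swing_pledged_only = swing_needed(1579)   # pledged delegates only
print(f"{swing_with_supers:.1f} points, {swing_pledged_only:.1f} points")
```

On this reading the first target sits between the 20% and 25% scenarios and the second just under the 15% scenario, which is the same territory as the swings discussed above.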
In short, Sanders is facing an uphill climb. He has to dramatically increase his appeal to demographics which have not been so favourable to him till now and he will need to do it in a very short span of time. Otherwise, Hillary Clinton will easily become the Democratic Party nominee.