Beyond “Soda, Pop, or Coke”

Regional Dialect Variation in the Continental US

Click here to view the interactive dialect maps.

Using data from the Harvard Dialect Survey, by Bert Vaux and Scott Golder, we examine regional dialect variation in the continental United States. Each observation can be thought of as a realization of a categorical random variable whose parameter vector is a function of location. Our goal is to interpolate among these points to estimate the parameter vector at any given location, using a combination of kernel density estimation and non-parametric smoothing techniques. The result is a smooth field of parameter estimates over the prediction region. Using these results, we develop a method for mapping aggregate dialect distance.
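To make the interpolation idea concrete, here is a minimal sketch in Python of one way to estimate answer probabilities at a location from nearby survey responses. It is not the R code behind the maps. The Gaussian kernel, the k-nearest-neighbor cutoff, the planar coordinates, and the names smooth_proportions, k, and bandwidth are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def smooth_proportions(observations, query, k=3, bandwidth=1.0):
    """Estimate answer probabilities at `query` from nearby responses.

    observations: list of ((x, y), answer) pairs. Coordinates are treated
    as planar, which is a simplification; real data would need a projection.
    k and bandwidth are hypothetical tuning parameters.
    """
    if not observations:
        raise ValueError("need at least one observation")

    # Keep the k responses closest to the query point (Euclidean distance).
    nearest = sorted(
        ((math.dist(xy, query), answer) for xy, answer in observations),
        key=lambda pair: pair[0],
    )[:k]

    # Gaussian kernel: closer respondents contribute more weight.
    weights = defaultdict(float)
    for distance, answer in nearest:
        weights[answer] += math.exp(-0.5 * (distance / bandwidth) ** 2)

    total = sum(weights.values())
    if total == 0.0:
        # All weights underflowed; fall back to plain neighbor counts.
        weights = defaultdict(float)
        for _, answer in nearest:
            weights[answer] += 1.0
        total = sum(weights.values())

    # Normalize so the estimates form a probability vector.
    return {answer: w / total for answer, w in weights.items()}
```

Evaluating a function like this over a grid of query points would give one smooth probability surface per answer choice, which is the kind of field the individual heat maps display.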

Special thanks to Bert Vaux for his original survey data, on which these maps rely, and to the entire team over at RStudio for working tirelessly to accommodate the insane traffic demands.

Dialect Map FAQ

Where are Alaska/Hawaii?

Unfortunately, including Alaska and Hawaii makes both the modeling process and the design of the mapping algorithms infinitely more complicated, because those two states are so far removed from everywhere else in the country. Sorry guys, maybe next time (I know you all must be feeling a bit left out).

Why are some answer choices listed that don't appear in the map?

The composite maps only show the most common answer at every location. The more clearly one answer dominates, the darker the color. But this doesn't tell the whole story. For that, you have to look at the individual maps, where you can see how common each answer is from place to place. Take, for example, #44, on the pronunciation of “cream cheese”: cream CHEESE isn't estimated to be the most common answer anywhere, but the individual heat maps show that it's relatively more common in West Virginia (36% in Charleston) than it is in New Jersey (11.5% in Parsippany).
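As a rough illustration of how a composite map cell could be derived from the per-answer estimates, here is a small Python sketch. The dominance measure (top probability minus the runner-up) is an assumption; the actual rule used to darken the colors isn't stated here. The function name composite_cell and the answer labels "A", "B", "C" are hypothetical, and the probabilities are made-up values, not survey estimates.

```python
def composite_cell(probabilities):
    """Return (top_answer, dominance) for one map location.

    probabilities: dict mapping answer -> estimated probability.
    dominance is the top probability minus the runner-up (an assumed
    measure); a larger value would be drawn in a darker color.
    """
    if not probabilities:
        raise ValueError("need at least one answer")

    # Rank answers from most to least common at this location.
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    top_answer, top_p = ranked[0]
    runner_up_p = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_answer, top_p - runner_up_p


# Hypothetical estimates for a single location.
answer, dominance = composite_cell({"A": 0.5, "B": 0.3, "C": 0.2})
```

Answers "B" and "C" never show up in the composite at this location even though together they account for half of the estimated responses, which is why the individual maps are worth a look.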

What R packages did you use?

fields, maps, mapproj, plyr, RANN, RColorBrewer, scales, zipcode. And, of course, shiny.