Census Linking for Historical Political Economy
Between Juneteenth and Father’s Day, the past few days have been echoing thoughts and questions about generations and ancestors and family history. The impulse to figure out who your great-uncle was rambling about on the last family Zoom holiday meal is probably not the most compelling way to pitch a research paper. But there is a lot that scholars of historical political economy can learn from genealogy, at least when we attempt it at scale. Yes, patient readers, I’m finally going to talk about census linking (well, more than I did here or here).
What is Historical Record Linking?
Historical record linking is, well, what it sounds like it is: linking records about individuals across data sources. Often, this comes in the form of census linking and we connect records from census to census. But anything with names (and a bit more identifying/disambiguating information) could be linked. The US federal censuses, whether from 1790 or 1940, are pretty amazing data sources on their own. But record linking can turn a soggy old cross-sectional dataset into a brand new and spiffy longitudinal dataset that gives us the ability to track people across decades, children from childhood to adulthood, and families across generations.
I’m not going to cover the (ever-evolving and somewhat contentious) details of *how* to link historical records here. And I’m certainly not going to rehash the linking wars (trust me, you’re better off not knowing what I mean if you don’t already). For more details, see my forthcoming JEL piece with Ran Abramitzky, Leah Boustan, Katherine Eriksson, and Santi Perez (and the papers we cite there). But the basic idea is simple. Imagine I want to link children from the 1910 census ahead to their adult selves in 1940, 30 years later. What do I know about the kids in 1910? Their names, their ages, their states of birth, and whatever else is recorded in the 1910 census. So I go looking (well, a record linking algorithm goes looking) for people with the right name, the right age (+30 from the 1910 age), and the right state of birth. This has to be approximate because names and ages can change (nicknames to proper names or inconsistent initials) and wobble (enumeration or digitization errors and much more). As a fun example, I’m confident I spelled at least one of my coauthors’ names wrong at some point in this post and there’s a no-prize for anyone who finds it. The “right” or “best” or “acceptable” way to deal with the noise in linking is unsettled and probably more art than science but, eventually, you have individual-level historical longitudinal data. Now all the X’s we can see in the census in 1910 (where you live, who your parents or neighbors were, etc.) are connected to the Y’s we can see in the census in 1940 (years of education, earnings, where you live, marital status, fertility, etc.).
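To make the basic idea concrete, here is a toy sketch in Python. It is not the procedure from any of the papers mentioned here: the field names, the two-year age band, the 0.85 similarity cutoff, and the use of difflib as the string comparison are all assumptions I picked for illustration, and the records are made up. It keeps a link only when exactly one 1940 record has the same birth state, an age close to the 1910 age plus 30, and a similar enough name.

```python
from difflib import SequenceMatcher


def name_similarity(a, b):
    """Rough 0-1 similarity between two names (a stand-in for a real string comparator)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def link_forward(children_1910, adults_1940, years=30, age_band=2, min_sim=0.85):
    """Return {1910 id: 1940 id} for children with exactly one plausible adult match."""
    links = {}
    for child in children_1910:
        candidates = [
            adult for adult in adults_1940
            if adult["birth_state"] == child["birth_state"]
            and abs(adult["age"] - (child["age"] + years)) <= age_band
            and name_similarity(adult["name"], child["name"]) >= min_sim
        ]
        # Keep only unambiguous links; zero or several candidates means no link.
        if len(candidates) == 1:
            links[child["id"]] = candidates[0]["id"]
    return links


# Made-up records: a spelling wobble, an age wobble, and a common name.
children_1910 = [
    {"id": "c1", "name": "Elmer Petersen", "age": 8, "birth_state": "MN"},
    {"id": "c2", "name": "John Smith", "age": 5, "birth_state": "NY"},
]
adults_1940 = [
    {"id": "a1", "name": "Elmer Peterson", "age": 39, "birth_state": "MN"},
    {"id": "a2", "name": "John Smith", "age": 35, "birth_state": "NY"},
    {"id": "a3", "name": "John Smith", "age": 36, "birth_state": "NY"},
    {"id": "a4", "name": "Elmer Peterson", "age": 38, "birth_state": "WI"},
]

links = link_forward(children_1910, adults_1940)
print(links)
```

In this toy example Elmer is linked despite the misspelling and the off-by-one age, while John Smith is dropped because two 1940 records fit equally well. How strict to be about cases like that is exactly the kind of choice that remains unsettled.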
Economic historians of all stripes (and other hangers-on, I suppose) have made use of this data to learn about intergenerational mobility, immigration, migration, the long-run effects of policies or natural disasters, and much, much more. But I want to make a pitch about why scholars interested in historical political economy questions should be excited about historical record linkage. And how better to do that than to flag a few recent papers (heavy on my own work and people wise enough to cite me because, well, because)?
Very Long Run Effects of Cross-Racial Childhood Exposure
Two weeks ago, I had a new paper with Jacob Brown, Ryan Enos, and Shom Mazumder published in Science Advances. We wanted to study the (very) long-run effects of cross-race childhood exposure on attitudes. To do this, we linked the 1940 census to early 21st-century voter files. The match here was a bit different than others I’ve done in the past because we’re going over six or seven decades. Of course, we can see names in both the 1940 census and the voter file. But while voter files have precise birthdates, the 1940 census simply asks “how old are you?” To make our links plausible, we need a bit more information. Here, we got a bit lucky. Three contemporary voter files (North Carolina, Nebraska, and all 22M registered voters in California) record voters’ states of birth. Linking on name, age, and birth state isn’t perfect (requiring names not to change forces us to focus only on men and there are still bound to be false positives and false negatives), but those are the three key variables much of the historical literature has used as well.
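One wrinkle in that match is that a voter file gives a birthdate while the census gives an age. Here is a minimal sketch of reconciling the two, assuming the recorded census age is age at last birthday on the official census day of April 1, 1940. The field names, the one-year age band, and the exact-name rule are hypothetical choices for illustration, not the rules we used in the paper.

```python
from datetime import date

# Assumption: ages in the 1940 census refer to the official census day.
CENSUS_DAY_1940 = date(1940, 4, 1)


def age_on(birthdate, reference=CENSUS_DAY_1940):
    """Age at last birthday on the reference date."""
    had_birthday = (reference.month, reference.day) >= (birthdate.month, birthdate.day)
    return reference.year - birthdate.year - (0 if had_birthday else 1)


def is_plausible_link(voter, census_row, age_band=1):
    """True if name and birth state agree and the implied 1940 age is close."""
    same_name = voter["name"].lower() == census_row["name"].lower()
    same_state = voter["birth_state"] == census_row["birth_state"]
    implied_age = age_on(voter["birthdate"])
    return same_name and same_state and abs(implied_age - census_row["age"]) <= age_band


# Made-up records for illustration.
voter = {"name": "Elmer Peterson", "birthdate": date(1932, 9, 15), "birth_state": "NE"}
census_row = {"name": "Elmer Peterson", "age": 7, "birth_state": "NE"}

print(age_on(voter["birthdate"]))            # implied age on census day
print(is_plausible_link(voter, census_row))  # whether the pair is a candidate link
```

Allowing a small band around the implied age is one simple way to absorb the age wobble that enumerators and respondents introduce.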
Other than the cool factor of linking people so many decades later, what did we get from the matching? Well, if you are in the 1940 census, we can see you and your family. But we can also see all of your neighbors and their neighbors and their neighbors because census enumerators—more or less—wrote households down on manuscript pages in the order they walked neighborhoods. Trevon Logan made use of this feature of the census to study segregation with John Parman. So our big question about cross-race exposure now becomes a lot more specific: how does having a black neighbor when growing up affect a white kid? What we learn—zooming in on enumeration districts or even smaller geographies to try and control for selection bias—is that when we see these white kids (boys) from the 1940 census 70+ years later in the voter files, they are more likely to be Democrats if a black family lived next door to them in 1940.
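The neighbor idea can be sketched in a few lines of Python. This assumes households are stored in the order the enumerator wrote them down within one enumeration district, and it treats the households immediately before and after as "next door". The field names and records are made up, and this is a simplified proxy rather than the exact measure from either paper.

```python
def has_black_next_door(households):
    """For each white household, flag whether an adjacent household
    in enumeration order is black.

    households: list of dicts in enumeration order within one district.
    """
    flags = {}
    for i, household in enumerate(households):
        if household["race"] != "white":
            continue
        # The household written just before and just after this one, if any.
        neighbors = households[max(i - 1, 0):i] + households[i + 1:i + 2]
        flags[household["id"]] = any(n["race"] == "black" for n in neighbors)
    return flags


# Made-up manuscript page, in the order the enumerator walked the street.
page = [
    {"id": "h1", "race": "white"},
    {"id": "h2", "race": "white"},
    {"id": "h3", "race": "black"},
    {"id": "h4", "race": "white"},
    {"id": "h5", "race": "white"},
]

flags = has_black_next_door(page)
print(flags)
```

Here h2 and h4 are flagged and h1 and h5 are not, even though all four sit on the same page. That within-neighborhood contrast is what makes comparisons inside small geographies possible.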
Politicians and Bureaucrats in the Census
Voters are fine but what about politicians? Yes, census record linkage can help us learn more about politicians too. In work with Dan Thompson, Andrew Hall, and Jesse Yoder, we link politicians to the 1940 census (I promise, I work with other censuses as well). We see that politicians are much better educated and have higher-status occupations than the general public before they run for office. They are also better off (positively selected) compared to their brothers. And when we compare candidates who run and win with candidates who run and lose elections for the House, again, the politicians are better off than the attempted politicians they beat.
There are also important questions about why elected officials vote the way they do on legislation. The political economy literature here is deep and wide, but I think census linking can still help us shed new light. To understand a politician’s motivation, it helps to better understand his or her background. In work with Max Palmer and Benjamin Schneer, we investigate the role that family background plays in shaping immigration legislation. By census linking members of Congress who served in office from the 1910s to the 1960s, we can see the immigrant story of the men and women (though mostly men) who shaped US immigration policy, from closing the border in the 1920s to opening and reshaping immigration in 1965. For a “nation of immigrants”, the US has gone through repeated cycles of anti-immigrant and anti-immigration panics. Still, the US Congress has often had many children and grandchildren of immigrants serving. As we show in that paper, these family backgrounds (and family stories?) matter. Immigrants, the children of immigrants, and the grandchildren of immigrants are more likely to vote for more liberal immigration policy. This holds true even when we control for district composition (districts with more immigrants do elect more immigrant-descended MCs and their MCs tend to have more liberal immigration voting records) or when we compare close-winners with close-losers who have sharply different family immigration histories.
Beyond politicians, two new papers take census linking to government bureaucracy. Diana Moreira and Santi Perez study the Pendleton Act and Abhay Aneja and Guo Xu follow the African American federal government workers who were resegregated by Woodrow Wilson. They are both super cool papers (the kind of cool that makes you wish you’d written them first… but you rationalize that you’ve been busy and they did it better than you could have).
Immigration in the Census
Some of the classic works in economic history, many from Abramitzky, Boustan, and Eriksson, apply census linking to questions about immigration. While many tackle labor questions—the economic selection of who immigrates or returns or the labor market returns to immigration—others have a more PE flavor. Census linking can help us estimate the effects of time in the US on culture, as measured by the names given to the children of immigrants. Census linking can also trace the effects of assimilation schemes—like the Industrial Removal Office that tried to move Jews out of the Lower East Side or the Galveston Movement that tried to get Jews to the west by emigrating them through the port of Galveston—on the affected immigrants. In a very new working paper, Costanza Biavaschi, Corrado Giulietti, and Yves Zenou link immigrants from 1930 to 1940 to understand the role social networks play in naturalization decisions.

This was a quick and incomplete round-up of some new and brand new scholarship that makes use of census linking to try and answer political economy questions. But even if I had included all the great papers I missed, we would only be scratching the surface. Clever census linkers are putting together matched data outside the US and for other time periods (this won’t be the Broadstreet post of mine that bucks the trend, as everything I’m talking about is US-based), and other good souls are doing the linking and sharing the linked data widely so you can use it without becoming a linking Jedi. Censuses are restricted for 72 years after they are taken, so we’re only a year away from the release of the 1950 census. Linked data has become a huge part of economic history and it should be more and more a part of historical political economy.