Hullo & hi!
Sorry for the wait, imagined audience, but it’s finally time for the next story of the data regarding my stories—AKA a story data story. This one covers my collection, organization, and analysis of data regarding Black names. It’s been bigger than expected so I’m going to break it up into parts.
Learning data scraping was harder than I thought, but now I have the increased ability to work with datasets so big that my Google Sheets froze and crashed.
Once again, we'll start by looking at the spreadsheets I already have, and then recreate a more robust version of the process through this blog entry. The starting spreadsheets are my lists of thousands of possible character names developed with a select-all/copy/paste/clean workflow, but the result will be tens of thousands of possible names, with the potential for more.
We’re going to keep using the Google framework for data analysis because I love a good structure. (Fun fact! I often use the Snowflake method to plot my way into a new story, though I start pantsing once I get into a groove.)
All the same, let's get into it!
Ask:
I originally started these lists with the question of “What are the characteristics of diasporic African American names and how do they compare with African names versus colonizer/European-American names?”
Or in short,
“What makes a Black name Black?”
I’m a Black Muslim(ish) kid with an Arabic name, and I know the history and culture behind the names in my family. My parents and their relatives were mostly given generic European American names like “Allen” and “Kim”, and I mostly understand the history and culture behind those kinds of names even if they don’t fully resonate with me. I also understand when people take on names from the cultures of African nations, even when unsure which tribes they have genetic ties to. I wouldn’t do that (afraid of appropriation) but I get it.
Personally, I’m the kind of middle-class-coded Black person who would give their kids meaningful names after Black cultural figures, so that’s another category of name I understand. But there’s a whole class of Black names I don’t understand, names like “DeAndre” or “Shameka” that I know through cultural osmosis but have unclear meanings or etymologies.
With this in mind, I had already started to collect Black names as a resource for developing my writing.
Again, many of these lists were created 10 or more years ago, before I decided to make this pivot to data analytics, so I wasn’t the best at tracking sources.
But here’s a look at a table of Black surnames I got from the US Census and used to inform character last names when I was stuck.
The calculations are mine, and they compare how common each last name is in the overall population with how common it is among Black Americans. And yeah, these last names sound pretty Black to me! I can think of at least one Black friend or public figure bearing each of the top twenty.
My surname is pretty far down there, though: 997th place in 2010 with only 4,000 Black Becks to be found. They all cousins though.
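For anyone who wants to redo the surname math outside of a spreadsheet, here is a minimal Python sketch of the kind of calculation described above. It assumes each row has a surname, a total count of bearers, and a percent-Black figure, which is how I remember the Census surname file being laid out. The three rows are placeholders I made up for illustration (not real Census numbers), and the function name `black_surname_table` is hypothetical.

```python
# Placeholder rows, NOT real Census figures: (surname, total bearers, percent Black)
rows = [
    ("Alpha", 1000, 50.0),
    ("Bravo", 4000, 10.0),
    ("Charlie", 500, 60.0),
]


def black_surname_table(rows):
    """Estimate Black bearers per surname and each surname's share of the totals."""
    total_all = sum(count for _, count, _ in rows)
    # Estimated Black bearers = total bearers * percent Black
    estimates = [(name, count, count * pct / 100) for name, count, pct in rows]
    total_black = sum(black for _, _, black in estimates)

    table = []
    for name, count, black in estimates:
        table.append({
            "surname": name,
            "black_bearers": black,
            # How common the surname is among everyone in the list
            "share_of_everyone": count / total_all,
            # How common the surname is among Black bearers in the list
            "share_of_black_bearers": black / total_black,
        })

    # Most common among Black bearers first
    table.sort(key=lambda r: r["black_bearers"], reverse=True)
    return table


table = black_surname_table(rows)
for rank, row in enumerate(table, start=1):
    print(rank, row["surname"], round(row["black_bearers"]),
          f'{row["share_of_black_bearers"]:.1%}')
```

With real data you would swap the placeholder rows for the rows of the Census file. The rank printed on each line is the same kind of rank I quoted for my own surname.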
Here are the Census numbers from which I pulled the Black surname data. I can use this to develop characters of other ethnicities as well.
(I feel like I am so afraid of accidentally doing a racism and getting a cancellation that I over-research character backgrounds. You’re never going to catch me with names like “Cho Chang”.)
Here’s the result of my manually scraping Wikipedia for the names of notable Black American figures. The sources are tracked here, under Sources.
The two gendered lists were data sources for me to construct my own names.
I know the rules of Black constructed names well enough to do that, but you know me. I wanted to use machines to surprise me. I’m like Salvador Dali, nodding off with a key in my hand so that its clatter awakens me during hypnagogia.
(Side note: given that a lot of fantasy stories have constructed names and a significant part of Black culture is constructing names, there should be more crossover between Black linguistics/onomastics and second-world fantasy linguistics/onomastics. I’m behind on my speculative novel TBR pile right now, though, and recently added three more middle grade books with Black writers and protagonists, so hopefully there’s more than I thought!)
Lastly, here are some of the constructed names that came about from the process above.
I wanted names that were both familiar and surprising, and I think these hit that sweet spot.
Some of these names turned out to belong to real people, which leads me to my character naming pro-tip:
If you're unsure about a name you're using, put it into Google image search and it should come up with a couple people whose images indicate the demographic that name belongs to.
If you don't see any human faces in the results, that makes it a new name, a unique one, or a fake one--you decide! What fits your character better?
If you see people of an unexpected ethnicity, you must have stumbled upon a name in their culture. Do more research before you use it.
If you only see one person, it's their name and you probably shouldn't take it. If said person is a celebrity, there is a chance you can write a character named for the celebrity, but you would still need to do quite a bit of culturally-specific research.
So that’s what we’re starting with. What are we working towards?
If I had the time, funding, and background, I would use Word2Vec (shout-out to Prof Myerston for introducing me to these technologies) to chart African American names versus African and Arabic names versus European-American names, but it looks like others have already been doing that work.
I got pulled into a whole ‘nother rabbit hole just reading those other academics.
Some of the standout conclusions include:
“…African and African American cultures share similar rhetorical strategies in verbal exposition in creating new personal names reflective of their sociopolitical environment. […] the coining of new names from old morphological roots is an element of syncretism, which is very characteristic of both African and African American cultures. […] African Americans have retained in their speech African linguistic roots used in naming, as well as the ability to fundamentally manipulate the base name-stem of a language to construct new names and encode them with the relevant semantic import through the affixation process,” by Lupenga Mphande in “Naming and Linguistic Africanisms in African-American Culture,” a paper that discusses how the coinage of African-American names follows similar rules to African names.
“There are no commonly used prefixes in traditional US names, most prefixes that occur are isolated names taken from surnames with attached prefixes that mean of, from, or son of. […] The use of these specialized prefixes and suffixes, which will be denoted as freefixes from this point forward, has transformed Afro-American naming practices beyond simple etymology. […] The data above shows that constructive names have, since the early 1970’s, been used widely by the Afro-American population in the United States, and for the most part used exclusively by that population,” by Clara Senif in “LaKesha and KuShawn: A cultural-linguistic approach to Afro-American onomastics,” a paper that further identifies and examines the rules of Black name structures and practices.
“… [In] the 18th but also still in the 19th century, Anglo-Americans reserved names for themselves and chose names from a different name pool for their slaves. This name pool indicated in an onomastic way who was enslaved and who was free. […] Therefore, it is not surprising that free AfroAmericans in an onomastic way also separate themselves from their slave contemporaries by avoiding typical slave names and orienting themselves on the naming of the ‘Whites’,” by Anna-Maria Balbach in “Caesar, Jack, and Cuffee: African-American fugitive slave names in the 17th to the 19th century,” which does some admirable analysis of the differences between the names of free, fugitive, and enslaved Black folks and how there was already some pushback against Euro-American naming practices. The Antebellum Roots of Distinctively Black Names by Trevon Logan, Lisa Cook, and John Parman does similar work by looking at post-war versus pre-war Black names.
In addition to those readily accessible theses and conference papers, there are also their source/foundational texts: Black Names by J. L. Dillard, Black Names in America: Origins and Usage by Newbell Puckett, Africanisms in American Culture by Joseph Holloway, “The Causes and Consequences of Distinctively Black Names” by Roland Fryer Jr and Steven Levitt, “Onomastic divergence: A study of given-name trends among African Americans” by Pauline Pharr, and “Distinctive African American names: An experimental, historical, and linguistic analysis of innovation” by Stanley Lieberson and Kelly Mikelson.
Lastly, I’m inspired by the analysis presented by the BI company Sisense in What Baby Names Tell Us About Ethnic and Gender Trends.
Apologies for pulling you down the rabbit hole I have dug, imagined reader, but these folks have done work that I want to finish reading through rather than retread. Instead, we’ll do a database comparison between my favorite baby name sources and see what aspects of Black naming culture are represented in my fave/selected sources.
Those sources are:
Five sets of names scraped from my favorite character-naming resource Behind the Name:
Every name categorized as “African” (782 names),
Every name categorized as “Arabic” (1,068 names),
Every name categorized as “African-American” (177 names),
Every user-submitted name categorized as “African” (6,800 names),
Every user-submitted name categorized as “African-American” (3,274 names),
Every name scraped from the Black-name-focused website The Black Names Project (3,939 names),
The US Census’ top 1000 baby names by gender for the years 2020, 2010, 2000, and so on through 1920; courtesy of GitHub user aruljohn (21,000 entries with many dupes), and
New York City’s Most Popular Baby Names by Sex and Ethnic Group from the years 2011-2019, courtesy of NYC Open Data (18,054 entries, including dupes).
Two naming sources that are still in progress include the data from The Trans-Atlantic and Intra-American slave trade databases (which I leaned on quite a lot when naming characters in my thesis, hoping to reclaim and adapt the names of enslaved people) and the PDF copy of Proud Heritage: 11001 Names for Your African-American Baby by Elza Dinwiddie-Boyd that I’m attempting to scrape. I want to figure out how to use both these datasets because I’m having too much fun, but I am ultimately doing this for a work portfolio and need to balance that fun with making sure I find a job before I run out of money.
Some questions I’m particularly curious about:
Which user-submitted names are absent from Behind the Name but are represented in the top names on the US census?
We can already see that there are 18 times as many unofficial as official Black names on this key naming resource website (3,274 user-submitted versus 177 official). I think it’s emblematic of how Black names are considered less valid even when they are quite popular.
Side note: I’ve long been fascinated by the international popularity listings of this site, and I’ve been particularly drawn to the nations whose data is missing and the popular names not listed on the site. There’s a lot of clear bias on this website, however, including in the user rankings of African-American names.
What percentage of Black names are just African or Arabic?
This one is pretty easy to track, but I’m curious about the percentage of constructed names that are deemed Black versus loan-names that are Black. I’m already seeing some patterns where some Arabic names like Jamal are also listed as African American names.
What percentage of constructed Black names have African roots versus Arabic roots versus European roots?
This one might be harder to track, but Behind the Name and The Black Names Project both track name origin and whether names are variants. So yes, Jamal may be both Black and Arabic, but are Jamar or Kamal? Does DeAngelo count as Italian?
What can we learn about Black naming trends as a whole by using NYC as a representative sample?
The newness of the sample may affect its usefulness, but it’s still the best source we’ve got for tracking names by ethnicity, since the US Census/SSA doesn’t track that, sadly. Similar to the work done by Sisense, we can extrapolate from the NYC data to study the commonness of Black names in the US population.
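The first two questions above boil down to set math, so here is a rough Python sketch of how I might check them once the lists are loaded. Everything in it is an assumption for illustration. The helper names (`normalize`, `name_set`) are hypothetical, and the tiny lists borrow names mentioned in this post but their memberships are invented, so the printed results are not findings. The only real numbers are the two Behind the Name counts quoted earlier.

```python
def normalize(name):
    """Lowercase and trim so 'Jamal ' and 'jamal' count as the same name."""
    return name.strip().lower()


def name_set(names):
    """Build a de-duplicated set of normalized names, skipping blanks."""
    return {normalize(n) for n in names if n.strip()}


# Tiny stand-in lists with INVENTED memberships; the real lists have thousands of entries.
official_black = name_set(["Jamal", "DeAndre"])
submitted_black = name_set(["Jamar", "Shameka", "DeAndre", "Kamal"])
arabic = name_set(["Jamal", "Kamal"])
top_names = name_set(["Jamar", "Allen", "Kim"])

# Question 1: user-submitted names missing from the official list
# that still show up among the top baby names.
missing_but_popular = (submitted_black - official_black) & top_names

# Question 2: what percentage of Black-categorized names also sit on the Arabic list?
all_black = official_black | submitted_black
shared_with_arabic = all_black & arabic
percent_shared = 100 * len(shared_with_arabic) / len(all_black)

# The unofficial-to-official ratio, using the counts quoted in this post.
ratio = 3274 / 177

print(sorted(missing_but_popular))
print(f"{percent_shared:.1f}% of the stand-in Black names are also on the Arabic list")
print(f"{ratio:.1f} user-submitted names per official name")
```

Normalizing before comparing matters more than it looks: capitalization differences like DeAndre versus Deandre would otherwise count as separate names. Whether they should count as separate is its own question.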
I’m going to call it a day after all that.
I already have the spreadsheet up and running with a cool little VLOOKUP matrix tracking the overlaps between sheets. Maybe tomorrow I’ll start running Pivot tables and crafting charts. Maybe I should already switch to doing all this in Tableau?
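If the spreadsheet ever freezes on me again, the overlap matrix could be rebuilt in a few lines of Python. This is only a sketch of the idea, not a copy of my actual sheet: the sheet labels, the function name `overlap_matrix`, and the handful of names in each set are all stand-ins I made up for illustration.

```python
from itertools import product


def overlap_matrix(sheets):
    """Count shared names for every pair of sheets (the diagonal is each sheet's size)."""
    return {
        (a, b): len(sheets[a] & sheets[b])
        for a, b in product(sheets, repeat=2)
    }


# Stand-in sheets with invented contents; real ones would be loaded from the scraped files.
sheets = {
    "btn_african_american": {"jamal", "deandre"},
    "btn_arabic": {"jamal", "kamal"},
    "black_names_project": {"deandre", "shameka", "jamal"},
}

matrix = overlap_matrix(sheets)

# Print the matrix as a simple grid, one row per sheet.
labels = list(sheets)
width = max(len(label) for label in labels)
print(" " * width, *(f"{label:>{width}}" for label in labels))
for a in labels:
    print(f"{a:>{width}}", *(f"{matrix[a, b]:>{width}}" for b in labels))
```

Each cell plays the role of a column of VLOOKUP hits added up: how many names from the row's sheet also appear in the column's sheet.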
I gotta go help support some beloved protestors and call some officials but see you next time!