Santa, Gender and Bad Data
Today's entry in the "that's not what those numbers mean" category is the outrage at the possibility of a gender-neutral or female Santa. You may have seen something like this "article" in recent days and thought, "hmm, that is interesting," or perhaps you were offended in some way, shape, or form by the possibility of changing the identity of a make-believe character that you haven't taken as real since you were a child. If you took the headlines at face value, then regardless of your reaction, you are probably wrong in your conclusions. As it turns out, the latest deadly strike in the bitter War on Christmas is little more than shitty, non-representative data that has been widely misinterpreted by unreliable rumor mills (such as the Breitbart link unfortunately shared above).
At this point you may find yourself saying, "but they have percentages, and the reporters talked about a sample, so how could these numbers not mean that '27% of Americans want a gender-neutral or female Santa'?" However, if we look closely at the actual study, we can see a few problems that tank the reliability and validity of these data. For instance, the initial data were collected from only "400 respondents from both the US and UK". I don't know if this means that there were 400 respondents in the US and another 400 in the UK, or if there were only 400 split between the two countries, but it doesn't really matter, as either way we are looking at a woefully insufficient sample size for making a generalizable conclusion across any substantial population. It isn't necessary to survey millions of people if we want to get the pulse of a population on any specific issue. Indeed, many nationwide polls are done in a pretty valid fashion with only a few thousand respondents. However, there are certain thresholds that must be met for us to be confident that the results of a study come close to being representative of the larger population, and the study upon which our discussion focuses here does not meet those criteria. Having a sample of more than 400 people is one of those criteria. The larger the sample size, the higher our confidence in the results, though there tend to be diminishing returns after a few thousand respondents, which leads most researchers to accept a defined potential for error in the results (you have seen these reported as margins of error such as +/- 3%).
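For anyone who wants to see those diminishing returns in numbers, here is a rough sketch in Python of the textbook margin-of-error approximation for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the worst-case split of 50/50. The function name `margin_of_error` is my own, and the sample sizes are simply the ones mentioned in this post plus one larger value for comparison.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion.

    Assumes a simple random sample of size n. p=0.5 is the worst case,
    and z=1.96 corresponds to a 95% confidence level.
    """
    return z * math.sqrt(p * (1 - p) / n)

# Sample sizes from this post, plus a larger one for comparison.
for n in (400, 1015, 4000, 16000):
    print(f"n={n:>6}: +/- {margin_of_error(n) * 100:.1f} percentage points")
```

Each fourfold increase in the sample only cuts the margin in half, which is why pollsters rarely bother going far past a few thousand respondents. Keep in mind that this formula only describes random samples. It says nothing about the error in a survey where respondents pick themselves.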
Another important criterion for ensuring a representative sample, and generalizable results, is random selection of respondents. A good survey methodologist takes all of the known members of a population (such as people with a phone number in the US, or students with an email address on a college campus) and randomly selects from this a sample of the requisite size. The logic of sampling is based on the principle of randomization; without it, we have increased chances of a biased result. For instance, if instead of selecting people from a list of telephone numbers or email addresses, you just called everyone in your phone (or sent an email to everyone in your list), your responses would come from a group of people who may be more likely than the general population to be similar to you (in favorite sports teams, say, or political ideology). We can see from the study discussed here that the respondents took part in an "open survey". In other words, a link to an online survey was created and parked somewhere for people to find, or perhaps mailed out to clients or to those in the email lists of the marketing firm. This means that there appears to be a huge chance of bias in the original survey, which was used to generate "suggestions on how to change Santa".
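To make the difference between random selection and an open survey concrete, here is a small simulation sketch in Python. Everything in it is made up for illustration: the 10% share of the population holding an opinion, the response rates of 5% versus 1%, and the function name `simulate` are all assumptions of mine, not numbers from the study.

```python
import random

def simulate(seed=0, population_size=100_000, sample_size=1_000):
    """Compare a random sample with a self-selected 'open survey'.

    All rates below are hypothetical, chosen only to illustrate the bias.
    """
    rng = random.Random(seed)

    # Hypothetical population: about 10% hold some opinion (True).
    population = [rng.random() < 0.10 for _ in range(population_size)]
    true_share = sum(population) / population_size

    # Random selection: every member is equally likely to be chosen.
    random_sample = rng.sample(population, sample_size)
    random_share = sum(random_sample) / sample_size

    # Open survey: people who hold the opinion are assumed to be five
    # times as likely to bother responding (5% versus 1%).
    volunteers = [
        holds for holds in population
        if rng.random() < (0.05 if holds else 0.01)
    ]
    open_share = sum(volunteers) / len(volunteers)

    return true_share, random_share, open_share

true_share, random_share, open_share = simulate()
print(f"population:    {true_share:.1%}")
print(f"random sample: {random_share:.1%}")
print(f"open survey:   {open_share:.1%}")
```

Under these assumed rates, the random sample lands close to the true share, while the open survey reports a share several times larger, even though it ends up with more respondents. More respondents do not fix a sample that selected itself.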
You may also notice in the fine print that the numbers reported in the company's infographic, and the data presented above, do not come from that original collection of suggestions from 400 (?) self-selected respondents. Rather, the numbers that are being reported on the news come from a follow-up "study" in which some 4,000 people across the US and UK "voted" on their favorite Santa re-branding ideas that were generated in the initial study. Thus we see that the original small sample was merely an exploratory study (which tend not to use random sampling because they are just looking for ideas), and it is the larger sample that we are being told to take as a representation of the US population. However, with only flimsy documentation of the method used in this larger "vote", we can only infer that its respondents were as non-rigorously chosen and self-selected as those in the initial study. Indeed, it appears that the question generating all of the controversy on the news was based on the responses of 1,015 respondents (not 4,000). Thus, instead of 27% of the US population feeling a certain way about re-branding Santa, what we more likely have is 27% (n=150) of people who were interested enough in re-branding Santa to take a survey about it feeling that a modern conception of Santa should be a different gender. It may not sound as good as a headline, but that would appear to be the other part of the problem.
It is worth noting that there are plenty of very good scientific studies, with valid findings, that are not based on the use of random sampling or large sample sizes (indeed, most qualitative work). For instance, Howard Becker's work on marijuana use (which has informed decades of further research and understanding) was carried out with a series of fifty interviews. The important thing to note, however, is that very seldom does any researcher use limited data (which in many cases is all we can get a hold of) to make broad-reaching statements of fact regarding the entire US population. These things are often speculation or theorization as part of a discussion of results, but you will be hard-pressed to find a social scientist who will make such bold claims on incredibly flimsy data as we see in this case (and if they do in a formal fashion, like a journal publication, there are ways to make sure the claims are verified).
And thus we come to the other half of the problem: the misinterpretation of the results by just about everyone. In the modern day, with the ability to fact-check a claim such as this with relative ease, we seem to have created a trap for ourselves in assuming that someone else must have checked the data or sources. As we clearly see here, that is not always the case. And with the speed at which something like this can spread on social media and across the Internet, being taken up as "truth" by reporters, bloggers and everyday people re-posting the story, all it takes is one sensational headline to throw us into a sad saga of mass-reproduced bullshit without a valid basis of fact. The other big factor to consider here is the ease with which we take a headline like this and turn it into a "fact" in our view of the world. For instance, while we have established that this "story" is bogus, there are apparently plenty of people who have taken it as reinforcement of the view that millions of liberals are out to destroy everything "traditional" in the world, and/or that this constitutes a continued persecution of Christians via the War on Christmas (if you need proof of this, and can stomach it, just read through the nearly 2,000 comments on the original Breitbart story linked to above). In reality, it is entirely possible that (just like with the lyrics to Christmas songs, or depictions in the movies) most people couldn't care less about changing this stuff (as is noted in the flawed data, which finds that 70%, including supposedly the majority of liberals, would keep a male Santa).
Perhaps most importantly, with all of the scientific issues put aside (because chances are most won't care about those for too long), we should consider the ridiculousness of being offended by the possibility of a re-branded Santa who could be female, gender-neutral, trans, non-binary, or any of the other dozens of gender identities from around the world. Is it so hard to imagine an imaginary character taking on a different identity? If you travel the world you will see, for instance, Black Jesus, White Jesus, Asian Jesus, Hispanic Jesus, and many other depictions of a mythical figure taking on the image of those who worship him (if you believe the stories, Jesus was from the Middle East, so he probably didn't look much like the pale man depicted in many European and North American churches). Or perhaps the struggle is with the idea that an imaginary character could be anything other than male, or the far more terrifying possibility that this character you don't believe in anymore could fall somewhere else on the gender spectrum, outside of the female-male dichotomy. If that is the problem, then perhaps it is not liberals who are the demons here...
And so, as the War on Christmas continues to roil in the imaginary realm of Santa, Rudolph and Frosty (it should be noted that the vast majority of our Christmas imagery is male, which is ironic if it is really supposed to be a celebration of the birth of a child, which last I checked couldn't happen as per biblical portrayals without a mother), we are left to scratch our heads and wonder where our information is coming from. Keep asking questions about your data, my friends, and Happy Holidays!
*Important Note: none of this appears to be the fault of the marketing firm GraphicSprings, as I very much doubt they intended their marketing study to be misinterpreted. However, it does point quite clearly to the importance of using proper scientific rigor when conducting any kind of research (and, if you run a company, of making sure you employ at least one person with survey methods training), as you never can tell when the results of a study will go viral.