The internet was aghast last week to learn that Facebook manipulated the news feeds of nearly 700,000 users for the purposes of a scientific research study on “massive-scale emotional contagion.” The resulting paper by Kramer, Guillory, and Hancock, “Experimental evidence of massive-scale emotional contagion through social networks”, was published in PNAS and is freely available to read. In short, the study authors report that by manipulating whether positive or negative posts were filtered through users’ feeds, the users were subsequently more likely to write correspondingly positive or negative posts themselves. The interpretive claim is that the emotional valence of one’s social network is “contagious” — if you see a lot of negative posts you start to feel negatively, and vice versa.
People are upset about this for two primary reasons: 1) objections over the ethicality of Facebook altering users’ experience unbeknownst to them, and 2) the broader implication that Facebook and other social networks have the power to change how we feel. Many news sources, from small tech blogs to major publications, immediately reported the study’s findings as fact and began to castigate Facebook for their fiendish, Milgram-esque toying with human emotions. Not surprisingly, few actually paused to look at the study’s statistics or discuss its methodology. Let’s take the time to do so, shall we? Then we can make an informed judgment on whether 1) the researchers and Facebook acted unethically, and 2) we are descending into a dystopian hellscape in which computer algorithms dictate our thoughts and feelings.
That old stick-in-the-mud, validity
One important fact should be stated right off the bat: The Kramer et al. study does not include any measurement of participants’ emotions. Data is based on linguistic analysis of Facebook posts, in which words are classified as “positive” or “negative” based on a preexisting body of empirical research. This is an important point in assessing the study’s validity — that is, determining if the study actually measures what it claims to be measuring.
In psychological research we can rarely measure something directly (has anyone seen my “happiness” yardstick lying around?), so we use proxies. A self-report questionnaire about your recent thoughts and behaviors may be one method of assessing your level of depression, for instance. The meaning of our research is only as good as how closely our measurements approximate the psychological phenomena we actually care about. Does someone’s use of positive or negative words represent their current emotional state? I don’t know, and the Kramer article doesn’t spend any time arguing that it does. So the subsequent news articles with headlines like “Facebook is manipulating your emotions” would be a lot more accurate if they read, “Facebook is manipulating your word choice, which some researchers seem to be assuming is a reasonable proxy for your emotions.” Not as catchy?
Mind your ps and ds
Statistics are not sexy. Which is probably why they don’t find their way into many popular news stories — even if they are the basis for everything that goes into the provocative news story headline. For the Kramer et al. article, the eye-catching claim is that “emotions expressed by others on Facebook influence our own emotions…” But what exactly does it mean to say that one thing has “influence” on another? This study featured a clean experimental design with a large sample size consisting of experimental groups (in which news feed behavior was manipulated) and control groups (in which Facebook behaved normally). When judging “influence” we should be interested in three things: the difference between experimental and control groups, the statistical significance of that difference, and the practical significance of that difference.
Before even getting into what “significance” signifies, let’s look at the objective group differences in this study. The most dramatic finding was written up thusly: “When positive posts were reduced in the News Feed, the percentage of positive words in people’s status updates decreased….” Sounds troubling. Fewer positive posts on users’ screens led those users to write fewer positive posts themselves. How steep was the decrease in positive word use between the control and experimental groups?
One-tenth of one percent. The other key findings reported were fractions of that.
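To put that fraction in perspective, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption of mine (posting frequency, update length, baseline positivity rate), not a figure from the paper:

```python
# Back-of-the-envelope: what a 0.1-percentage-point drop in positive
# word use might look like for one hypothetical user. All numbers here
# are made-up assumptions for illustration, not data from the study.

words_per_update = 10            # assumed typical status-update length
updates_per_week = 20            # assumed posting frequency
baseline_positive_rate = 0.05    # assume ~5% of words are "positive"

weekly_words = words_per_update * updates_per_week
baseline_positive = weekly_words * baseline_positive_rate
shifted_positive = weekly_words * (baseline_positive_rate - 0.001)

# Difference in positive words written per week under the manipulation
print(baseline_positive - shifted_positive)
```

Under these (made-up) assumptions, the manipulation amounts to roughly one fewer positive word every few weeks.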
Suddenly this study seems a lot less monumental. But let’s keep going to see if such a small shift in word choice is actually worth being concerned about. After all, I don’t have a good sense off the top of my head of what it would mean for my word selection to change by 0.1%. I talk a lot; maybe it’s a huge deal. First we’re interested in statistical significance, which essentially tells us how confident we are that this group difference reflects an actual difference between two populations (as represented by our experimental and control sample groups) rather than being an artifact of some kind. This notion is represented in the study (and most others) by the p value. The Kramer study reports p values ranging from .001 to .007 for the major statistics. Most of the time you will hear this translated as “there is only a 0.1% to 0.7% chance that the findings were due to random chance.” This is not actually how p values work. The more accurate phrase would be (thanks to my old stats professor Brian Wallace for the following sentence): “If the null hypothesis is true, and all assumptions [of the statistical procedure being used] are met, the p-value represents the probability, when taking a random sample (of a given size) from our population, of observing an inferential statistic of this magnitude (or something even more extreme).” Again, not too sexy. But in general p values from .001 to .007 are considered adequate in psychological research for a finding to be regarded as likely “real.”
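That definition — the probability, assuming no real difference, of seeing a statistic at least this extreme — can be made concrete with a toy permutation test. The data below are fabricated for illustration and have nothing to do with the study itself:

```python
import random

# Toy permutation test to make the p-value definition concrete.
# Fabricated data: two groups whose true means differ slightly.
random.seed(42)
control      = [random.gauss(5.00, 1.0) for _ in range(500)]
experimental = [random.gauss(4.95, 1.0) for _ in range(500)]

def mean(xs):
    return sum(xs) / len(xs)

observed = mean(control) - mean(experimental)

# If the null hypothesis is true (no group difference), the group labels
# are arbitrary. So: shuffle the labels many times and count how often a
# difference at least as extreme as the observed one shows up by chance.
pooled = control + experimental
n = len(control)
trials = 2000
extreme = 0
for _ in range(trials):
    random.shuffle(pooled)
    diff = mean(pooled[:n]) - mean(pooled[n:])
    if abs(diff) >= abs(observed):
        extreme += 1

p_value = extreme / trials
print(p_value)  # proportion of label-shuffles as extreme as the observed difference
```

Note what the p value is a statement about: the behavior of the statistic under the null hypothesis, not the probability that the finding itself is true or false.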
Now we move on to practical significance, which has gained much more prominence in research in recent years. To recap so far: There is a fraction of a percent change in word use between sample groups, and that change likely represents an actual difference between the populations those sample groups represent. What we really want to know now is, how big is that difference, in practical terms? Just because two groups are statistically different from one another does not mean that the difference is in any way impactful or even worth noting. Practical significance — also known as effect size — is especially important for a study like Kramer et al.’s, as statistical significance can be manipulated by having a large sample size. And 689,003 seems like a pretty large sample to me.
Effect size is commonly measured by a statistic called Cohen’s d, which represents a standardized mean difference between groups. Interpreting d values is not a hard-and-fast task, but in general 0.2 is considered a small effect, while 0.8 is considered large. More dramatic effects are represented by much larger numbers. So, what was the effect size of the statistically significant differences in word use (which may or may not be a valid representation of emotion) between groups in the Kramer study?
The highest d value was 0.02. The lowest was 0.001. The authors speak to this toward the end of the paper: “Although these data provide, to our knowledge, some of the first experimental evidence to support the controversial claims that emotions can spread throughout a network, the effect sizes from the manipulations are small….” This is an understatement. The effect sizes are minuscule. If the subject matter of the study were not so controversial and — dare I say again — sexy, this is the kind of practical insignificance that ought to be severely criticized during the peer review process of a rigorous academic journal like PNAS.
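The interplay between a huge sample and a minuscule effect can be simulated directly. The sketch below uses fabricated data (not the study’s) with a true group difference of 0.02 standard deviations — the largest d the paper reports — at roughly the study’s scale, and shows that even so tiny an effect comes out highly statistically significant:

```python
import math
import random

# Fabricated illustration: two groups whose true means differ by 0.02
# standard deviations, at roughly the study's sample size. Not real data.
random.seed(0)
n = 340_000  # assumed per-group size, on the order of the study's groups
a = [random.gauss(0.00, 1.0) for _ in range(n)]
b = [random.gauss(0.02, 1.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Cohen's d: standardized mean difference (pooled standard deviation)
pooled_sd = math.sqrt((var(a) + var(b)) / 2)
d = (mean(b) - mean(a)) / pooled_sd

# Two-sample z-test (normal approximation is fine at this sample size)
se = math.sqrt(var(a) / n + var(b) / n)
z = (mean(b) - mean(a)) / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"d = {d:.3f}, p = {p:.2g}")
```

The p value comes out far below .05 while d stays around 0.02 — "significant" and trivial at the same time, which is exactly why effect size has to be reported alongside p.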
Was it ethical?
We’re not sure the researchers were actually affecting emotions, and even if they were the effect of their manipulations was tiny. Does this excuse the deception involved, in which they parsed, altered, and collected data on hundreds of thousands of users’ Facebook experiences without informed consent? Facebook’s initial reaction to the expressed outrage was to point out that embedded deep in the agreement that all users sign off on in order to use the service is a clause that implicitly grants informed consent for any research purposes using Facebook data. It was then quickly pointed out that that clause did not exist in the end user license agreement when the Kramer et al. study began. Then Facebook kind of apologized.
Certainly something smells fishy here, but again let’s try to keep our heads about us. In the APA Ethics Code there are provisions made for studies that require deception as part of their design, when giving informed consent at the outset would interfere with the purpose of the research. This clearly applies to the Kramer study — if users knew their news feeds were going to be changed ahead of time it would have completely invalidated the whole endeavor. In a case such as this in which deception was necessary, APA provides two criteria to maintain ethicality:
1) The study procedure cannot be “reasonably expected to cause physical pain or severe emotional distress.”
2) Participants must be fully debriefed about the study after their participation has concluded, including an explanation of the need for deception.
The latter, as far as I could tell in researching this article, did not happen, or happened very clumsily and later than it should have. The former — that the procedure is unethical if it could reasonably be expected to cause pain or severe distress — has ironically been called into greater question due to the study authors’ own bombastic claims of “massive-scale contagion” which have been picked up so readily by the news media. As we saw above, the impact on participants was actually quite minimal, even imperceptible, and there is no reason why the researchers would or should have anticipated otherwise. But in order to make the headline-grabbing claim that online social networks can alter how you feel, Facebook and the study authors open themselves up to (further) accusations of unethical behavior.
The bigger picture
The Kramer et al. study and consequent hubbub is just the latest in countless examples of scientific research being disseminated to the public without being properly vetted or explained. A large part of the problem, as I have intimated, is due to the ravenous internet news cycle that picks up and explodes a story before properly researching it or considering that the most sensationalist and scandalous slant may not actually be the truest. But an equal offender can be found in the scientific community. Putting aside methodological flaws for a moment, authors and publishers should be taken to task for the bloodless and overwrought style that is now endemic to academic writing. The Kramer article may be freely available online for anyone to read, but I would be impressed if a typical reader could willingly make it through and be both awake and clear about what they had read. (I had difficulty trudging through with six years of professional scientific reading experience.)
Alongside the call for Facebook and similar corporations to be more open and transparent about their practices, I would issue a similar call to psychological researchers to be more open and transparent about their work, which perhaps above all would involve writing in a more expressive, less jargon-y fashion. Such changes would go a long way toward preventing the kind of “social contagion” that occurs when dry research with a controversial bottom line is filtered through a reactionary news media system.