social psychology’s crisis of confidence

A recent NYT Magazine article has prompted colleagues and friends alike to ask me: what’s going on in your discipline? Perhaps you’ve heard that there’s a “crisis” in social psychology. It’s been covered prominently (e.g., in the NYT, the Atlantic, Slate, and Wikipedia). This essay is my attempt at explaining.

Introduction

The present crisis in social psychology can be traced to two highly publicized events in 2010 and 2011—publication of impossible findings using accepted methods of rigorous psychological science (Bem, 2011; Simmons, Nelson, & Simonsohn, 2011), and cases of fraud, notably that of Diederik Stapel (Finkel, Eastwick, & Reis, 2015; Yong, 2012). These events prompted numerous special issues on methodological rigor, replication, and transparency (e.g., Ledgerwood, 2016; Stangor & Lemay, 2016), large-scale efforts to replicate findings in flagship journals (Open Science Collaboration, 2015), and ominous commentaries from leaders of the field (e.g., Kahneman (2012), “I see a train wreck looming”). The current crisis echoes that of prior decades (Elms, 1975; Gergen, 1973; McGuire, 1973), but has notable differences (Hales, 2016; Spellman, 2014). First, I discuss how common research practices undermine our ability to make valid inferences. Second, I elaborate on why the field is grappling with these issues, and how the current crisis differs from those of the past. I conclude with recommendations for moving forward.

Common (and “Questionable”) Practices

Many research practices in social psychology (e.g., selectively reporting a subset of measures used) have long been recognized as “questionable” because they increase false inferences (e.g., Greenwald, 1975; Rosenthal, 1979). Yet these practices remain surprisingly common (John, Loewenstein, & Prelec, 2012), due to perverse incentives, norms, or lack of awareness (Nosek, Spies, & Motyl, 2012). Many questionable practices are sometimes justifiable (particularly when reported transparently), though all of them increase the likelihood of false inferences (see Nosek et al., 2012, for a review). Here, I focus on the practice I see as most central to the current crisis.

The practice most central to the present crisis is opaque and misleading reporting of researcher degrees of freedom (Simmons et al., 2011). Researcher degrees of freedom are the set of possible methodological and statistical decisions in the research process. For example, should outliers be excluded? Which items should be used? It is rare, and sometimes impractical, to have a priori predictions about how to make all, or even most, of these decisions. Thus, it is common practice to explore alternatives after seeing data. In a given dataset, slightly different alternatives can lead to vastly different conclusions, and there may be no objective justification for taking one alternative over another (Gelman & Loken, 2013). For example, imagine a test that is non-significant when data are log-transformed, and significant when they are truncated. These two approaches may be equally justified for skewed data. However, we often rationalize in favor of alternatives that meet our expectations, in this case, statistical confirmation of our hypothesis (John et al., 2012). There are many other biases that lead us to favor positive alternatives (e.g., motivated reasoning or hindsight bias). Recall Richard Feynman’s advice to Caltech’s class of 1974: in science, “the first principle is that you must not fool yourself – and you are the easiest person to fool.”

Furthermore, bias-prone decisions compound to exacerbate false inferences, even when each decision is seemingly bias-free. By way of analogy, imagine the research process as a garden of forking paths. Each fork in the path represents a decision (e.g., truncating data), which eventually leads to an outlet (representing the conclusion). The long and winding path taken through this labyrinth may be justified by scientific logic at each juncture. However, because there are so many junctures, it is improbable that any two scientists (or even the same scientist a year from now) would take the same path through the garden. Deviation at a single fork can lead to disparate outlets, because new decisions are informed by data that were altered by previous decisions (Gelman & Loken, 2013). This is how 29 research teams can examine the same dataset with the same hypothesis and come to 29 different conclusions (Silberzahn et al., 2017). When decisions are not determined a priori, they are inevitably guided by data and biases that influence the validity of inferences.
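To make the forking-paths problem concrete, here is a minimal simulation sketch (in Python; the scenario and numbers are my own illustration, not drawn from the papers cited above). When even a handful of equally defensible analytic forks are tried and the best-looking one is reported, the false-positive rate climbs well above the nominal 5%:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
N_SIMS = 5_000   # simulated "studies," each with NO true effect
n = 30           # participants per group
alpha = 0.05

false_positives = 0
for _ in range(N_SIMS):
    # Two groups drawn from the SAME skewed distribution (the null is true).
    a = rng.lognormal(mean=0.0, sigma=1.0, size=n)
    b = rng.lognormal(mean=0.0, sigma=1.0, size=n)

    p_values = []
    # Fork 1: log-transform the skewed data.
    p_values.append(stats.ttest_ind(np.log(a), np.log(b)).pvalue)
    # Fork 2: truncate extreme values at the 95th percentile instead.
    cut = np.percentile(np.concatenate([a, b]), 95)
    p_values.append(stats.ttest_ind(np.minimum(a, cut), np.minimum(b, cut)).pvalue)
    # Fork 3: drop "outliers" more than 2 SD from the pooled mean.
    pooled = np.concatenate([a, b])
    lo, hi = pooled.mean() - 2 * pooled.std(), pooled.mean() + 2 * pooled.std()
    p_values.append(stats.ttest_ind(a[(a > lo) & (a < hi)],
                                    b[(b > lo) & (b < hi)]).pvalue)

    # A motivated analyst reports whichever fork "worked."
    if min(p_values) < alpha:
        false_positives += 1

print(f"Nominal alpha: {alpha:.2f}")
print(f"Actual false-positive rate: {false_positives / N_SIMS:.3f}")
# With three correlated forks the rate typically lands well above .05,
# and it grows as more forks (covariates, subgroups, cutoffs) are added.
```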

Researcher degrees of freedom increase the likelihood of false inferences; however, they do not intrinsically undermine scientific progress. Nonetheless, it is not only common practice to maintain flexibility in design and analysis (Gardner, Lidz, & Hartwig, 2005; Ioannidis, 2005), it is also common to publish results as if only a single path had been explored, or even as if a single path had been predetermined (Begley & Ellis, 2012; Bem, 2003; Giner-Sorolla, 2012). Such presentation makes it challenging to distinguish between confirmatory (more reliable) and exploratory (more tentative) research. Without reliable representation of the current evidence, it is difficult to determine the degree to which an effect is understood and valid, as well as where to place future research efforts. This combination of pervasive researcher degrees of freedom and opaque or misleading reporting is central to the current crisis.

Why we are Reeling

Social psychology is grappling with a crisis (again) because formerly theoretical concerns about replicability (Elms, 1975; Gergen, 1973; McGuire, 1973) have been made tangible by empirical findings (Bem, 2011; Simmons et al., 2011) and fraud (e.g., Stapel)—both of which received considerable attention beyond ivory towers. A Google News search of “replication crisis and social psychology” reveals over 7,000 articles in the last few years, including prominent outlets such as the NYT, BBC, and WSJ. Scholars agree that outright fraud is a problem, but a rare one, and thus not a primary concern. In contrast, questionable research practices are concerning because they are so common (John et al., 2012) and can result in impossible findings (Simmons et al., 2011). Many point to Daryl Bem’s (2011) paper on “precognition” as the catalyst of the present crisis. The paper, published in JPSP, appears to show that people have extrasensory perception. The distinguished Lee Ross, who served as a peer reviewer, said of it, “clearly by the normal rules that we [used] in evaluating research, we would accept this paper… The level of proof here was ordinary. I mean that positively as well as negatively. I mean it was exactly the kind of conventional psychology analysis that [one often sees], with the same failings and concerns that most research has” (Engber, May 2017). Bem empirically arrived at an improbable conclusion (ESP exists) using practices common enough to gain entry into our flagship journal. This prompted Simmons and colleagues (2011) to use the same common practices to conduct an experiment that came to an impossible conclusion (that listening to certain songs can change listeners’ age). These events led many social psychologists to question common practices and revisit theoretical concerns of the past.

This Time is Different

The current crisis echoes that of prior decades (Gergen, 1973; McGuire, 1973), even centuries (Allport, 1968; Schlenker, 1974), in that it is concerned with replicability (Stangor & Lemay, 2016)—and rightfully so. The transparent communication of methods that enables scientific knowledge to be reproduced is the defining principle of the scientific method, and perhaps the only quality separating scientific belief from other beliefs (Nosek et al., 2012; Kuhn, 1962; Lakatos, 1978; Popper, 1934). Just as replicability is a sign of a functioning science, so too may be the perpetual self-conscious grappling with claims for scientific status. Psychologists and philosophers of science have long debated the scientific status of social psychology (Schlenker, 1974). In fact, such self-critical angst can be traced to the historical origin of the discipline when we differentiated ourselves from philosophy (Danziger, 1990). Yet, there are notable differences between the “crisis of confidence” in the 1970s (Elms, 1975), and that of today.

First, the former crisis was largely characterized by concerns about external validity, whereas today’s crisis is primarily concerned with threats to statistical conclusion validity (Hales, 2016). For example, McGuire (1967, 1973) worried that our focus on the “ingenious stage manager” of the laboratory produces conditions that render null results meaningless and positive results banal, while at the same time being unlikely to replicate outside the laboratory. Another example is found in Gergen (1973), who argued that social psychological effects are hopelessly dependent on the historical and cultural context in which they are tested, and thus impossible to generalize to principles in a traditional scientific sense.

In contrast, today’s crisis is concerned with the validity of statistical conclusions drawn from an experiment (Hales, 2016). Instead of asking, “does the effect generalize?” we are now asking, “does the effect exist at all?” In the previous crisis, Mook (1983) famously argued in defense of external validity: laboratory experimentation need only concern itself with “what can happen” (as opposed to “what does happen”); it is the theory tested by a particular experiment that generalizes, not the experiment itself. This is a compelling defense; however, it rests on the validity of statistical conclusions. The contemporary crisis is grappling with the realization that common practices not only demonstrate “what can happen,” but can be used to show that “anything can happen.” If anything can happen in our laboratories, what differentiates our science from science fiction?

A second way in which the current crisis is different is related to changes in technology and demographics (Spellman, 2014). Technological changes are eliminating concerns about journal space and increasing the speed and transparency of communication. One consequence is that researchers who fail to replicate a finding can more readily share that information and see that they are not alone. Thus, it is easier to be critical of the finding itself rather than assume a methodological mistake was made (McGuire, 1973). Similarly, increases in the diversity of the field have precipitated more critical questioning of the status quo. In brief, today’s crisis has elements of a social revolution that were missing from prior crises (Spellman, 2014). These factors will fuel a more persistent push for change this time around.

Recommendations

I conclude with recommended changes to improve confidence in our science. For fear of presumption, I follow McGuire (1973) in submitting my suggestions as koans—full of paradox and caveat; they are intended to be at once provocative and banal.

Koan 1: “Does a person who practices with great devotion still fall into cause and effect?…No, such a person doesn’t.”

Preregister

In 2000, the National Heart, Lung, and Blood Institute (NHLBI) initiated a policy requiring all funded pharmaceutical trials to prospectively register outcomes in an uneditable database, ClinicalTrials.gov. After the policy went into effect, the prevalence of positive results reported in NHLBI-funded trials dropped from 57% to 8% (Kaplan & Irvin, 2015). Preregistration improves confidence in published findings because it reduces selective reporting. More broadly, preregistration makes researcher degrees of freedom more apparent, reduces opaque and misleading reporting (Nosek, Ebersole, DeHaven, & Mellor, 2017), and allows us to better distinguish between confirmatory and exploratory research (Nosek et al., 2012).

Koan 2: “Having our cake and eating it too.”

Explore Small, Confirm Big

There is growing recognition that “small sample sizes hurt the field in many ways” (Stangor & Lemay, 2016), because they undermine both statistical confidence and the perception of rigor (Button et al., 2013). However, there is a trade-off to reckon with—it is resource expensive and unreasonable to test all hypotheses with large samples (Baumeister, 2016). We can have our cake and eat it too if we instead explore new questions with small samples to determine which are worth putting to larger confirmatory tests (Sakaluk, 2016). True, so long as we call a spade a spade: small-N studies should leave the reader with the impression that the effect is tentative and exploratory, and researchers should then attempt to confirm “big” (Baumeister, 2016; Dovidio, 2016). Still, there is disagreement over implementation. Should there be separate journals for small-exploratory and large-confirmatory studies (Baumeister, 2016)? Should those studies appear in sequence in the same paper (Stangor & Lemay, 2016), or in different sections of the same journal (Dovidio, 2016)? My contention is that any of these approaches will be better than the status quo, so long as “truth in advertising” is maintained.
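To give a rough sense of what “confirming big” demands, here is a small power-analysis sketch using statsmodels (the effect sizes are hypothetical; since exploratory estimates are typically inflated, the more conservative row matters most):

```python
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Suppose a small exploratory study (say, 20 per cell) hints at d = 0.5.
# Plan the confirmatory test around a more conservative estimate as well,
# because small-sample effect sizes tend to be inflated.
for d in (0.5, 0.3):
    n_per_group = analysis.solve_power(effect_size=d, alpha=0.05,
                                       power=0.80, alternative="two-sided")
    print(f"d = {d}: ~{n_per_group:.0f} participants per group for 80% power")

# d = 0.5 -> ~64 per group; d = 0.3 -> ~176 per group.
```

The asymmetry is the point: an exploratory study of 20 per cell is cheap, but an honest confirmatory test of the same effect can require several times that investment, which is exactly why the two stages should be labeled as what they are.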

Koan 3: “He who pays the piper calls the tune.”

Gatekeepers and Replicators

Editors and reviewers tacitly agree that replicability is foundational to confidence and scientific progress, yet few journals incentivize replication. A recent study found that, of 1,151 psychology journals reviewed, only 3% explicitly stated that they accept replications (4.3% of 93 social psychology journals; Martin & Clarke, 2017). If researchers could be assured that replications get published, more would be conducted. However, what makes for a constructive replication is widely debated. A promising approach is to test hypotheses as exactly as possible, while simultaneously testing new conditions that refine and generalize (Hüffmeier, 2016). Publishers must provide carrots for replicating, preregistering, increasing sample sizes, and so on. Or, as Nosek and colleagues (2012) suggest, we could do away with gatekeeping altogether: make publishing trivial and engage in post-publication peer review. This allows researchers to decide when content is worth publishing and shifts the priority of evaluators to methodological, theoretical, and practical significance, and away from apparent statistical significance. Registered reports prompt a similar shift by enabling results-blind peer review (Munafò et al., 2017). Publishers could act as managers of peer review, focusing solely on bolstering confidence and rigor in the process, instead of also engaging in dissemination, marketing, and archiving. This is a worthy and feasible objective in the internet age (Nosek et al., 2012).

Koan 4: “What is the way? …An open-eyed man falling into the well.”

Transparency

The ultimate solution to our confidence dilemma is openness (Nosek et al., 2012): make more information from our studies available. Preregistration helps make the research plan transparent, but the field would also benefit from changing norms around sharing and archiving data, materials, and workflows (Simonsohn, 2013; Wicherts, Bakker, & Molenaar, 2011; Wicherts, Borsboom, Kats, & Molenaar, 2006). More transparency not only addresses fabrication, it also enables verification, correction, and aggregation of knowledge—all of which bolster confidence in (and progress of) science. There is concern that greater transparency unveils the messy complexity and conflicting evidence of our science, and that it enables science deniers and other malevolent critics in their efforts to mislead the public. To this I say, “fools believe and liars lie,” regardless of truth or access. In my admittedly optimistic view, earnestly open presentation wins confidence in the long run. For example, scientists who concede failures, explore reasons for failure, or are transparent in their publication of failures (as opposed to denying their validity, hiding them, or not acting) are perceived as more able and ethical (Ebersole, Axt, & Nosek, 2016). Indeed, scientists overestimate the negative consequences of failed replications and transparent reporting (Fetterman & Sassenberg, 2015).

Conclusion

The present crisis is not entirely new, but it has critical differences. If we can use common research practices to find the impossible, where does that leave our science? I venture that these koans may move us to embrace our science not as history entirely (Gergen, 1973), but perhaps as evidence-based history. So too, in the style of Rozin (2001), may we start to embrace the exploratory and narrative nature of our present science. Perhaps then, we will again find our confidence.



Meehl on theory testing, never gets old.

The position of Popper and the neo-Popperians is that we do not “induce” scientific theories by some kind of straightforward upward seepage from the clearly observed facts, nor do we “confirm” theories as the Vienna positivists supposed. All we can do is to subject theories—including the wildest and “unsupported” armchair conjectures (for a Popperian, completely kosher)—to grave danger of refutation…

A theory is corroborated to the extent that we have subjected it to such risky tests; the more dangerous tests it has survived, the better corroborated it is. If I tell you that Meehl’s theory of climate predicts that it will rain sometime next April, and this turns out to be the case, you will not be much impressed with my “predictive success.” Nor will you be impressed if I predict more rain in April than in May, even showing three asterisks (for p < .001) in my t-test table! If I predict from my theory that it will rain on 7 of the 30 days of April, and it rains on exactly 7, you might perk up your ears a bit, but still you would be inclined to think of this as a “lucky coincidence.” But suppose that I specify which 7 days in April it will rain and ring the bell; then you will start getting seriously interested in Meehl’s meteorological conjectures. Finally, if I tell you that on April 4th it will rain 1.7 inches (.66 cm), and on April 9th, 2.3 inches (.90 cm) and so forth, and get seven of these correct within reasonable tolerance, you will begin to think that Meehl’s theory must have a lot going for it. You may believe that Meehl’s theory of the weather, like all theories, is, when taken literally, false, since probably all theories are false in the eyes of God, but you will at least say, to use Popper’s language, that it is beginning to look as if Meehl’s theory has considerable verisimilitude, that is, “truth-like-ness.”

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: The slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806–834. doi:10.1037/0022-006X.46.4.806

Can Theory Change What it is a Theory About?

In Beyond Freedom and Dignity, B. F. Skinner writes, “no theory changes what it is a theory about; man remains what he has always been.” By this Skinner means that the underlying rules or processes that guide human behavior are constant, and that knowledge of these processes does not change their nature. However, throughout the social psychological literature we see suggestions of just the opposite—knowledge of a psychological process can change the psychological process. For example, Schmader (2010) provides evidence that simply teaching people about stereotype threat may “inoculate them against its effects.” The theory of social identity threat postulates that people are sensitive to contexts that threaten their identity, and when such a situation is detected, people engage in ruminative conflict that can distract them enough to undermine their performance in that setting. Schmader is claiming that giving people knowledge of psychological processes predicted by theory changes the processes that unfold. This point raises several important questions: What is a psychological theory? Does psychological theory describe stable processes in the Skinnerian sense? Can we think of psychological theory in the same way that we think about theories of, say, physics or biology? If we believe theory must have some element of stability (e.g., if we believe light traveled at the same speed in the middle ages as it does today), and that theories exist outside of and are independent from our knowledge of their existence (e.g., the theories of special and general relativity existed before Einstein identified them, and his discovery did not change their quality), then can we classify social psychological theories as theories? My sense is no. Or maybe we need to modify our definition of what qualifies as a theory. Or perhaps our definition of stability in the processes that underlie phenomena, and our belief that observation is independent from underlying processes, needs modification.

References

Schmader, T. (2010). Stereotype threat deconstructed. Current Directions in Psychological Science, 19, 14–18. doi:10.1177/0963721409359292

a thought on personal record keeping

The weight of what has gone undocumented can be burdensome. If you are like me, you may find it taxing to strike a balance between experiencing life and recording it. However, I believe the record-keeping process holds the potential to enrich what has been lived and release the pressure of experience, the pressure to hold onto memories. Writing relieves the mental strain required to remember, clearing the way for fuller experience of the current moment. But too much record keeping is like watching the sun set over the Taj Mahal through a video camera: you’re so busy recording what’s happening that you fail to truly experience the happening. Writing about past experiences is never an act of transcription, nor is it an act independent of the present. The act recreates the memory; it attempts to reflect what was felt, and in doing so it reshapes your present moment.

 

impression management and open science

I love this Charles H. Cooley (1902, p. 320) quote on how self-presentational concerns have institutional and professional forms (including in science, gasp!):

If we never tried to seem a little better than we are, how could we improve or “train ourselves from the outside inward?” And the same impulse to show the world a better or idealized aspect of ourselves finds an organized expression in the various professions and classes, each of which has to some extent a cant or pose, which its members assume unconsciously, for the most part, but which has the effect of a conspiracy to work upon the credulity of the rest of the world. There is a cant not only of theology and of philanthropy, but also of law, medicine, teaching, even of science—perhaps especially of science, just now, since the more a particular kind of merit is recognized and admired, the more it is likely to be assumed by the unworthy.

The unveiling of fraudulent research among highly acclaimed scientists, along with the advent of new computing and archiving technologies, has driven a recent (depending on how you measure it) push from within the scientific community for more “open” practices. The debate around open science and the reluctance to adopt its practices are rarely discussed in terms of interpersonal processes. However, discussions of open science are discussions about the presentation of scientific research to other scientists and the public. I think the relevance of impression management processes to calls for more openness in science is an area worth exploring in more detail. I’d like to write more on this; please post in the comments if you know of anyone who has written on this topic.

References

Cooley, C. H. (1902). Human nature and the social order. New York, NY: C. Scribner’s Sons.

thoughts on impression management, feminism, pronoun use, and social justice

Deegan (2013) critiques Goffman’s (1959) The Presentation of Self in Everyday Life as a model of patriarchal use of language and irony, “to perpetuate gender inequality” (p. 79). In Deegan’s conversations with the sociological giant, Goffman laments that he struggles to find a good alternative to the pronoun “he” and that feminists have missed the irony in his use of examples that stereotypically depict women as subordinate. In turn, Deegan suggests using “he/she” or gender-neutral nouns such as “congressperson.” She also makes a strong case for how irony in a patriarchal world perpetuates oppression. Such “joking around” fails to challenge repressive behavior and places the repressed in the awkward position of feeling obligated to laugh at a joke at their own expense, which validates repressive social structures. Irony used in this way is an impression management strategy used by the patriarchy to explicitly acknowledge social injustice while simultaneously reinforcing the power structure.

The feminist movement has made substantial strides since the 1970s due, in no small part, to rethinking how to use language in a way that questions established social hierarchy. This feminist approach to social change is paralleled in other social movements with similar results. Take, for example, the fight for marriage equality. Whether it was strategic or a consequence of the legal and language structures of the time, the homosexual community adopted terms such as “partner” or “life-partner” to describe what they viewed to be a relationship of equivalent status to that of “husband” or “wife.” Similar to “congressperson,” “partner” is a more general noun that neutralizes the (often fast and strong) urge to code gender. This shift has three consequences for the role of impression management in social change. First, when conversations about relationship status are unavoidable, it helps the homosexual actor retain control and power over the observers’ impression of their relationship and sexual orientation. Generally, it is rude to pry further into their relationship; thus, using “partner” to answer relationship questions maintains the “line” of conversation and the actor’s power. Second, the use of more general nouns subtly cues the audience to question the status quo. By neutralizing gender, the noun “partner” introduces a state of uncertainty in the mind of the observer, which naturally leads to questions such as “Am I using the right noun?” or “Why did they say it that way?” These questions break the automaticity of oppressive assumptions about the relationship between sexual orientation, language, and status. The third consequence of adopting a more general noun is that it enables subtle displays of solidarity with the movement. Many heterosexual couples have started using the noun “partner” to describe their wife or husband. Couples that use “partner” synonymously with “husband/wife” are both reshaping the meaning associated with the word and signaling that they endorse the movement. These couples are also normalizing this meaning of “partner,” which blurs the social order repressing the homosexual community.

This analysis sheds light on how strategies of impression management have social justice implications. How can other contemporary groups facing social repression, such as the transgender community, manipulate language and gesture to effectively manage and reshape the impressions of others? In a recent conversation, a friend expressed annoyance with what he called “political correctness” training. His employer required a discussion about changing perspectives on the use of gendered identifiers, such as “he/she,” in the direction of a third, more neutral term such as “they.” To my surprise, he lamented that this would completely change how we use language. The transgender community faces a different (and perhaps more difficult) set of challenges than those faced by feminists and marriage-equality advocates, but I suspect that the use of “they” or similar terms will follow a trend similar to that of “partner” and “congressperson.”

References

Deegan, M. J. (2013). Goffman on gender, sexism, and feminism: A summary of notes on a conversation with Erving Goffman and my reflections then and now. Symbolic Interaction, 37, 71–86. doi:10.1002/symb.85

Goffman, E. (1959). The presentation of self in everyday life. Garden City, NY: Anchor Books.

Where are data on gun violence?

Much of the recent coverage of gun violence in this country points to a lack of data available on the topic. The absence of these data, or at least their inaccessibility, points to inherent prejudice. In an age where we collect data on literally everything and use it daily to help explain phenomena and change our world, it is telling that it is hard to find good data on gun violence, particularly gun violence as it relates to race, sex, age, and mental health.

There are some projects working to remedy this. I’d like to see the Gun Violence Archive project expanded. The project started in 2014 as an offshoot of a crowdsourced initiative by Slate, which documented incidents of gun violence after Newtown. We need a tool on this website to visualize the data they collect. Maps of incidents that can be tabulated by different variables would help bring to light the normality of gun violence and the prevalence of racially charged incidents. In light of recent events, it is noteworthy that this project collects data on “officer involved shootings.” However, the project fails to capture officer-involved shootings of unarmed person(s). Instead, the project counts the following categories under “officer involved shootings”:

  1. Officer shot
  2. Officer killed
  3. Perpetrator shot
  4. Perpetrator killed
  5. Perpetrator suicide at standoff

This is problematic because the method of collection presumes that someone shot or killed by an officer is a perpetrator (someone who has committed a crime). While the project has an “armed” category described in its glossary, it doesn’t collect data on “unarmed” incidents. Further, race/ethnicity, age, sex, and mental health status are conspicuously absent from the glossary for this project. These data should be collected!
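To sketch the kind of tabulation I have in mind (in Python; the file and column names below are hypothetical, precisely because the Gun Violence Archive does not currently publish these fields), a few lines of pandas would suffice once the data were collected:

```python
import pandas as pd

# Hypothetical incident-level export; these columns do not yet exist in
# the Gun Violence Archive's data, which is the point of this post.
incidents = pd.read_csv("gun_violence_incidents.csv")

# Cross-tabulate officer-involved shootings by race and armed status.
officer_involved = incidents[incidents["officer_involved"] == True]
print(pd.crosstab(officer_involved["race_ethnicity"],
                  officer_involved["armed_status"]))

# Incident counts by age group and sex.
print(incidents.groupby(["age_group", "sex"]).size().unstack(fill_value=0))
```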

The data we collect and how we collect it tells us a lot about what we value.

We need to value data on gun violence with an eye toward race, sex, age, and mental health. We need to translate data into graphics and stories to help explain what the heck is going on. And we need to use data and story to inform how we change. Otherwise, I’m afraid outrage will fade, and the status quo will resume until the next everyday tragedy goes viral.