Data Science and Innovative Research Methods: Where are the social scientists?

Lately I’ve been thinking about innovative, novel and alternative research methodology. With the advent of new technologies and an abundance of raw information available through them, an up-and-coming area of social science will most likely be some mixture of social methodology and the emerging field that people are calling data science.

A multidisciplinary field, data science has taken hold in the private sector as companies seek data-driven approaches to improve how they conduct business. Generally, data scientists use statistics and computer science to turn large amounts of unstructured data into orderly pieces of knowledge or advice that a non-expert can understand. I appreciate the mission of these scientists. Too often you hear glorified rhetoric about the availability of data, or “big data”, as if having lots of data at your fingertips is valuable in itself. The mistake here is to equate data with knowledge. I’d argue that while access to vast amounts of data is opening up new possibilities, we don’t truly understand what those possibilities are until the data are turned into knowledge. So how are these data turned into knowledge? The answer is most often, “well, it’s very technical, but here is what we found”. This is where the storytelling comes in. A successful data scientist has to be a good storyteller; this is their value added. Usually companies don’t care how the scientist got to the answer. They care about what the answer says and how it is described to them. The issue is that with a large enough dataset, and without a careful and critical eye toward methodology, it is easy for the data scientist to tell whatever story (i.e. create whatever knowledge) they please. In many ways the method chosen to analyze the dataset will determine the discovery, rather than the “data mining” uncovering some underlying property of the dataset. In reality the method is the story.

This is where social science comes in. While data science is about data, these data are, more often than not, about people: how they behave and interact, what they want and feel. If you buy this logic, data science should perhaps fall under the social/behavioral umbrella. If you take a moment to think about it, you can see how social science methodology and theory would add considerable value to the stories data scientists tell and the methods they employ. Further, I believe the social science community has a lot to gain by tapping the resources and methodology of the data scientist. For example, social scientists rarely work with real-time or crowd-sourced data. Crowd-sourcing models are, however, in use among some scientists (e.g. Galaxy Zoo; Foldit; Study at Washington University; Lucien Engelen: Crowdsource your health; and curetogether), and these provide promising examples of not only how to collect large quantities of meaningful data but also how to analyze them (see Figure 1). There is still work to be done to explore how these methods can be employed to advance social study. I believe it will be fruitful for social scientists to thoroughly vet the available strategies, take some lessons from the private sector and data science, rework our current methods, and brainstorm some new ones.

Capitalizing on the “quantified self” movement (see the Wikipedia definition) may be one strategy for social researchers to consider. In light of new technologies, a rapidly growing number of individuals are quantifying, tracking, and analyzing data relevant to their lives. This trend toward the quantified self provides an opportunity for scientists to create applications or develop projects that allow communities to track their own progress on any number of measures, from mood to bites of food. There are even some apps ambitious enough to try to track and quantify all areas of life. For social scientists to tap these new data sources, we may need to be more willing to develop private-public partnerships, which would enable the development of the technologies required for this kind of study. Another reason to partner has to do with participant/consumer engagement. While individuals sometimes contribute their data, time and money to existing crowd-sourced platforms with no expectation of compensation, there is often an exchange agreement (in writing or otherwise) in crowd-sourced technologies that trades information for service or entertainment. It is becoming clear that one of the largest barriers to collecting useful crowd-sourced information is engaging the crowd. There are organizations, companies, and online communities that work to engage audiences in self-tracking and explore its benefits (see Quantified Self, Massolutions, and crowdsourcing.org), and social scientists should look to develop these partnerships and assess their potential value. These new methods for collecting data require refinement, and the questions of what to measure and how to create controls also need to be thought through.

This is all to say that I think there are great possibilities in exploring innovative and alternative social methodology that draws on lessons and strategies from data science, private enterprise and new technologies. If the field wants to keep up with the times and continue to guide how our government and society make decisions, we had better start expanding and evolving our methods.

Figure 1: Crowd-Sourced Research Methodologies