When it comes to social media statistics, impressive-sounding metrics often turn out, on closer scrutiny, to be padded or unimportant. Similarly, scientific studies of social media users often appear to rest on weak methodology, which can leave their data unsound and their conclusions unscientific.
Sampling the population via social media is problematic because of its ubiquity and lack of standards. “[M]ounting evidence suggests that many of the forecasts and analyses being produced misrepresent the real world,” according to Juergen Pfeffer and Derek Ruths, authors of a recent paper titled “Social media for large studies of behavior,” which appeared in Science.
One of the main problems identified in the paper was sampling bias. Sampling bias occurs when users are polled or observed without accounting for how the demographics of the sample differ from those of the population being studied. Not all social networks are patronized equally: LinkedIn users tend to be more educated, Twitter’s base is starting to swing toward more male users, Facebook skews significantly female and Pinterest users are generally wealthier.
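One common statistical response to this kind of skew is post-stratification: reweighting each respondent so that the sample's demographic mix matches the target population's. The sketch below is only an illustration and is not taken from the Pfeffer and Ruths paper. The function name, the 70/30 sample split, the 50/50 population shares and the survey answers are all made-up assumptions.

```python
from collections import Counter

def poststratified_mean(sample, population_share):
    """Reweight (group, value) pairs so each group counts in proportion
    to its share of the target population.

    Hypothetical helper for illustration; not from the paper.
    """
    counts = Counter(group for group, _ in sample)
    n = len(sample)
    # Weight = population share / sample share for each group.
    weights = {g: population_share[g] / (counts[g] / n) for g in counts}
    total_weight = sum(weights[g] for g, _ in sample)
    return sum(weights[g] * value for g, value in sample) / total_weight

# Made-up sample from a network that skews female: 70 women, 30 men.
# Value 1 means the user agreed with some survey statement, 0 means not.
sample = (
    [("female", 1)] * 56 + [("female", 0)] * 14
    + [("male", 1)] * 12 + [("male", 0)] * 18
)

# Assumed shares in the population we actually want to describe.
population_share = {"female": 0.5, "male": 0.5}

raw_mean = sum(value for _, value in sample) / len(sample)
weighted_mean = poststratified_mean(sample, population_share)

print(f"raw: {raw_mean:.2f}, reweighted: {weighted_mean:.2f}")
```

With these invented numbers the raw agreement rate is 0.68, while the reweighted rate is 0.60, because women are overrepresented in the sample and agree more often. Reweighting only corrects for traits the researcher can observe and has population figures for, so it cannot fix skews such as bots or proprietary filtering.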
Another problem is how the data is collected. In certain circumstances, the data is passed through proprietary filters put in place by the network before it reaches researchers. The ubiquity of bots, PR accounts, fake followers and other non-genuine accounts can also skew results. This problem may be partially solved by working from a network’s complete data stream, such as Twitter’s “firehose,” rather than a filtered sample, but eliminating bias and false positives is still difficult.
The issue with collecting data without applying proper scientific rigor is that we end up with bad data and false positives. Selection alone can serve up a desired outcome that is unlikely to be true, as we saw when Princeton and Facebook dueled with data.
Another issue with large-scale data analysis like this is the question of ethics. Facebook’s mood study manipulated users’ feeds in an attempt to determine whether moods can be changed. Not only was this widely regarded as unethical, but the changes also altered user behavior by only about 0.07 percent.
Big data seems like a great idea, but without the rigors of the scientific method, social media studies can distort our perceptions of network use and human behavior. The research-for-profit model may remain the dominant force in the market, but perhaps it’s time to give scientific researchers better access to data if we want clearer and scientifically sound results.