Proceedings

EPJ Data Science Highlight - Social media trending: real or manufactured?

Pixabay, CC0 Public Domain.

The era of "fake news" is upon us. Navigating social media is a constant exercise of judgement, but data science can help distinguish real from fabricated trending topics. In EPJ Data Science, Emilio Ferrara and team set out to determine, very early on, whether information is being disseminated organically or artificially on social media.

Guest post by Emilio Ferrara, originally published on the SpringerOpen blog

Every day, billions of individuals participate in online social media platforms. These digital ecosystems expose their users to tailored information based on individual interests, friendship networks, and news from the offline world. Each "story", which together with related ones forms a "meme" or information campaign, can emerge organically from grassroots activity or, in some cases, be sustained by advertising or other coordinated efforts.

Most information campaigns are genuine and benign; however, we have recently witnessed the emergence of "bad actors" who exploit social media to alter public opinion, with the intent to deceive or simply to create chaos. For example, our research showed that in the run-up to the 2016 US presidential election, fake news became a vehicle to spread disinformation, attack candidates, and generate confusion online. Similarly, we demonstrated how ISIS and other extremist groups exploited Twitter for terrorist propaganda and recruitment.

It is therefore of paramount importance to be able to detect, at an early stage, memes and information campaigns that are artificially sustained, and to separate them from organic ones. This problem has important social implications and poses numerous technical challenges, in part due to the scarcity of large-scale annotated datasets containing examples of both types of information campaigns.

In EPJ Data Science, we make progress toward discriminating between trending memes that are organic and those promoted through advertising. This classification proves very challenging: ads usually cause bursts of collective attention that can easily be mistaken for those produced by organic trends. Fortunately, we can rely on Twitter for labeled examples: when a hashtag is promoted by an advertiser, Twitter clearly states so. This feature allowed us to collect a dataset of millions of tweets belonging to promoted information campaigns, as well as millions of tweets belonging to organic trends.

We propose a machine-learning framework and new techniques to classify such memes. Our algorithm exploits hundreds of time-varying features to capture changing network and diffusion patterns, content and sentiment information, timing signals, and user metadata.
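The post does not list the individual features, so what follows is only a minimal Python sketch of how tweet-level records might be turned into time-varying feature series. The `Tweet` record, the `feature_series` function, the hourly binning, and the three example features (volume, unique users, mean sentiment) are hypothetical illustrations chosen for this sketch; they are not the framework described in the paper, which uses hundreds of features.

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    # Hypothetical minimal record; real tweets carry far more metadata.
    hour: float       # hours since the start of the observation period
    user_id: str
    sentiment: float  # assumed score in [-1, 1] from some sentiment classifier


def feature_series(tweets, n_bins, bin_hours=1.0):
    """Aggregate tweets into one time series per feature (length n_bins each).

    Tweets falling outside [0, n_bins * bin_hours) are ignored.
    """
    volume = [0] * n_bins
    users = [set() for _ in range(n_bins)]
    sentiment_sum = [0.0] * n_bins
    for t in tweets:
        b = int(t.hour // bin_hours)  # index of the time bin this tweet falls in
        if 0 <= b < n_bins:
            volume[b] += 1
            users[b].add(t.user_id)
            sentiment_sum[b] += t.sentiment
    return {
        "volume": volume,                          # tweets per bin
        "unique_users": [len(u) for u in users],   # distinct authors per bin
        "mean_sentiment": [s / v if v else 0.0     # 0.0 for empty bins
                           for s, v in zip(sentiment_sum, volume)],
    }
```

For example, three toy tweets at hours 0.2, 0.7 and 1.1 binned into three hourly bins give a volume series of `[2, 1, 0]`. A fuller version would add network, timing, and user-metadata series alongside these content and sentiment ones.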

We consider two different prediction problems. Early detection of promoted information campaigns right at trending time poses significant challenges, because only a minimal volume of activity data is available before the hashtag trends. Detection after trending is easier, because the many users who join the conversation generate a large volume of activity data.
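To make the difference between the two settings concrete, here is a hypothetical helper, `observation_window`, that cuts a per-hour feature series around the hour at which a hashtag starts trending. It assumes hourly bins and an integer trending index; the window lengths are arbitrary parameters for illustration, not values taken from the paper.

```python
def observation_window(series, trend_idx, hours_before, hours_after=0):
    """Slice a per-hour series around the trending hour `trend_idx`.

    hours_after=0 mimics early detection (only activity before trending);
    hours_after>0 mimics detection after the trend has gathered activity.
    Bins falling outside the series are padded with zeros, so every window
    has length hours_before + hours_after.
    """
    start = trend_idx - hours_before
    end = trend_idx + hours_after
    left_pad = max(0, -start)                        # bins before the series begins
    window = list(series[max(0, start):max(0, end)])
    right_pad = (end - start) - left_pad - len(window)  # bins past the series end
    return [0] * left_pad + window + [0] * right_pad
```

With `series = [1, 2, 3, 4, 5]` and trending at index 3, `observation_window(series, 3, 2)` keeps only `[2, 3]`, while `observation_window(series, 3, 2, 2)` also sees the post-trending bins `[2, 3, 4, 5]`. The early-detection classifier simply has less signal to work with.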

Our framework achieves 75% accuracy for early detection, increasing to above 95% after trending. We evaluate the robustness of the algorithm by introducing several perturbations, such as random temporal shifts of the trend time series, that reproduce situations that may occur in the real world. Finally, we explore which features best predict promoted campaigns, finding that content cues provide consistently useful signals; user features are more informative for early detection, while network and timing features are more helpful once more data is available.
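The robustness test can be sketched in a similar spirit. The helpers below are hypothetical: `shift_series` moves a series by a given number of bins while keeping its length, `random_shift` draws the offset uniformly from a symmetric range, and `accuracy` is the fraction of correct predictions. The size of the shifts used in the actual study is not stated here, so `max_shift` is an assumption.

```python
import random


def shift_series(series, offset):
    """Shift a series by `offset` bins, keeping its length; vacated bins become 0.

    offset > 0 moves activity later in time, offset < 0 moves it earlier.
    """
    n = len(series)
    if offset >= 0:
        return [0] * min(offset, n) + list(series[:max(0, n - offset)])
    k = min(-offset, n)
    return list(series[k:]) + [0] * k


def random_shift(series, max_shift, rng):
    # Offset drawn uniformly from [-max_shift, max_shift] (randint is inclusive).
    return shift_series(series, rng.randint(-max_shift, max_shift))


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels (e.g. 1 = promoted)."""
    if not y_true:
        raise ValueError("accuracy is undefined for an empty label list")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```

One would shift each test series with `random_shift(series, max_shift, random.Random(seed))`, re-run the classifier, and compare `accuracy` before and after. With toy labels, `accuracy([1, 0, 1, 1], [1, 0, 0, 1])` returns 0.75 (three of four correct); these labels are illustrative only.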

In the future, we will extend this framework to monitor social media and detect coordinated information efforts such as fake news, conspiracy theories, and anti-vaccination campaigns.

This was our first experience of publishing with EPJ Web of Conferences. We contacted the publisher in mid-September, just one month before the Conference, but everything went smoothly. We have published MNPS Proceedings with other publishers in the past, and would like to say that the EPJ Web of Conferences team was probably the best: very quick, helpful and interactive. Typically, we received responses from the EPJ Web of Conferences team in less than an hour and had help at every production stage.
We are very thankful to Solange Guenot, Web of Conferences Publishing Editor, and Isabelle Houlbert, Web of Conferences Production Editor, for their support. These ladies are top-level professionals who made a great contribution to the success of this issue. We are fully satisfied with the publication of the Conference Proceedings and look forward to further cooperation. The publication was very fast, easy and of high quality. My colleagues and I strongly recommend EPJ Web of Conferences to anyone who is interested in quick, high-quality publication of conference proceedings.

On behalf of the Organizing and Program Committees and Editorial Team of MNPS-2019, Dr. Alexey B. Nadykto, Moscow State Technological University “STANKIN”, Moscow, Russia. EPJ Web of Conferences vol. 224 (2019)

ISSN: 2100-014X (Electronic Edition)

© EDP Sciences