Fact-checking Effect on Viral Hoaxes: A Model of Misinformation Spread in Social Networks

Marcella Tambuscio¹*, Giancarlo Ruffo¹, Alessandro Flammini², Filippo Menczer²
¹ Computer Science Department, University of Turin, Italy
² School of Informatics and Computing, Indiana University, Bloomington, USA

ABSTRACT
The Internet and online social networks have greatly facilitated and accelerated information diffusion processes, but at the same time they provide fertile ground for the spread of misinformation, rumors, and hoaxes. The goal of this work is to introduce a simple modeling framework to study the diffusion of hoaxes and, in particular, how the availability of debunking information may contain their diffusion. As traditionally done in the mathematical modeling of information diffusion processes, we regard hoaxes as viruses: users can become infected if they are exposed to them, and turn into spreaders as a consequence. Upon verification, users can also turn into non-believers and spread the same attitude with a mechanism analogous to that of the hoax spreaders. Both believers and non-believers, as time passes, can return to a susceptible state. Our model is characterized by four parameters: spreading rate, gullibility, the probability to verify a hoax, and the probability to forget one's current belief. Simulations on homogeneous, heterogeneous, and real networks for a wide range of parameter values reveal a threshold for the fact-checking probability that guarantees the complete removal of the hoax from the network. Via a mean-field approximation, we establish that the threshold value does not depend on the spreading rate but only on the gullibility and forgetting probability. Our approach allows us to quantitatively gauge the minimal reaction necessary to eradicate a hoax.

Categories and Subject Descriptors
[Human-centered computing]: Collaborative and social computing—Social networks; [Computing methodologies]: Modeling and simulation

Keywords
Misinformation spread; fact-checking; viral hoaxes; epidemiology; information diffusion models

* Contact author. Email: tambuscio@di.unito.it

Copyright is held by the International World Wide Web Conference Committee (IW3C2). IW3C2 reserves the right to provide a hyperlink to the author's site if the Material is used in electronic media. WWW 2015 Companion, May 18–22, 2015, Florence, Italy. ACM 978-1-4503-3473-0/15/05. http://dx.doi.org/10.1145/2740908.2742572

1. INTRODUCTION
In a hyper-connected world where information spreads increasingly fast, online social networks and new media play a crucial role in the diffusion of misinformation, i.e., false claims that are mostly spread unintentionally. Few people seem to check the reliability of news before sharing it with their friends and potentially with millions of others. This is mainly because the Internet, and social networking services in particular, provide a complete decentralization of information on a large scale: every user is potentially a news source, and it is often not trivial to establish the truth. In the last few years there has been growing interest in this topic, with different approaches and techniques. Several publications have focused on studying the characteristics of rumor propagation, analyzing features [5, 11, 18, 7] and proposing diffusion models [4, 12, 21, 20, 1]. Great effort has been devoted to the creation of effective classifiers to detect false content or fake accounts, highlighting recurrent patterns [10, 6, 14, 18].
On the other hand, several strategies have been proposed to limit the diffusion of hoaxes, identifying the most influential users or working, from a psychological point of view, on prejudices and personal beliefs [3, 9]. Here we focus on analyzing how long a hoax survives in a network, i.e., the duration of time in which there are users who believe it to be true. Intuitively, hoaxes are very similar to common viruses: people, as nodes in social networks, can become "infected" and believe fake news upon coming into contact with other infected nodes, or "recover" with a simple fact-checking action. The virus spread problem has been studied extensively, and many epidemic models have been proposed since the 1920s. Later, scientists realized that those mathematical models could describe a large range of other phenomena, such as social contagion, information spread, and computer virus attacks [7, 21, 13]. We consider compartmental epidemic models, like SIR (Susceptible-Infected-Recovered) and SIS (Susceptible-Infected-Susceptible) [2], in which nodes are characterized by different behaviors represented by states, and the dynamical evolution of the system is ruled by transition rates among the states.

We propose a stochastic epidemic model to describe the simultaneous diffusion of a hoax and its debunking: it can be seen as an SIS model in which the Infected state is split into two sub-compartments, believers (B) and fact checkers (F), and the transition I → S can be interpreted as a forgetting process. Moreover, we have the transition B → F with a fixed probability p_verify that indicates the fraction of infected users who check the reliability of the information received, revealing the hoax. Empirical observations suggest that some hoaxes seem to become endemic in social networks, since periodically the same false news re-emerges and "infects" other users, even many years after its first appearance. We are interested in comparing our model with SIS with respect to the absence of an epidemic threshold (the infection can become endemic) in scale-free networks [17]. Furthermore, we analyze the role of fact-checking activity: can we find a threshold for p_verify, i.e., a value that assures the complete removal of the hoax from the network?

2. THE MODEL
We want to simulate the spread of a hoax and its debunking at the same time, assuming that some users believe the fake news and some others do not, either because they decide to verify the information or because they already know it is not true. Therefore we build upon a model for the competitive spread of two rumors [19] to describe the competition between believers and fact checkers. We extend the model by introducing verifying and forgetting processes.

We consider a network represented by a graph G = (V, E). Each node i is associated with a triple of binary indicators representing its state at time t, which can assume one of three possible values:

    ∀i ∈ V:  s_i(t) = [s_i^B(t), s_i^F(t), s_i^S(t)] ∈ { [1, 0, 0], [0, 1, 0], [0, 0, 1] }    (1)

corresponding to the three possible behaviors for agent i:

• s_i^B(t) = 1 (Believer): i believes the fact is true,
• s_i^F(t) = 1 (Fact checker): i believes the fact is false,
• s_i^S(t) = 1 (Susceptible): i is neutral.
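In an implementation, the indicator triples above map directly onto per-node state constants; a minimal sketch (Python assumed, purely illustrative and not from the paper):

    # One-hot indicator triples, matching s_i(t) in Eq. (1).
    BELIEVER     = (1, 0, 0)   # s_i^B = 1: node i believes the hoax
    FACT_CHECKER = (0, 1, 0)   # s_i^F = 1: node i believes the debunking
    SUSCEPTIBLE  = (0, 0, 1)   # s_i^S = 1: node i is neutral

    # Example: all N nodes start out susceptible.
    N = 1000
    state = {i: SUSCEPTIBLE for i in range(N)}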
We represent the following three phenomena:

• spreading [S → B, S → F]: each agent modifies its state with some probability, considering the points of view (states) of its neighbors;
• verifying [B → F]: each agent can fact-check the hoax with a fixed probability p_verify;
• forgetting [B → S, F → S]: each agent, regardless of belief state, forgets the news with a fixed probability p_forget.

Let p_i(t) be the probability mass function of i at time t:

    p_i(t) = [p_i^B(t), p_i^F(t), p_i^S(t)]    (2)

representing the probability of assuming each of the possible behaviors (B, F, S) at time t. The dynamics of the system are given by a random realization for p_i(t + 1):

    s_i(t + 1) = MultiRealize[p_i(t + 1)].    (3)

The probability mass function p_i(t + 1) is defined by:

    p_i^B(t + 1) = f_i s_i^S(t) + (1 − p_forget − p_verify) s_i^B(t)
    p_i^F(t + 1) = g_i s_i^S(t) + p_verify s_i^B(t) + (1 − p_forget) s_i^F(t)    (4)
    p_i^S(t + 1) = p_forget (s_i^B(t) + s_i^F(t)) + (1 − f_i − g_i) s_i^S(t)

where p_forget and p_verify are constant probabilities and f_i, g_i are the "spreading" functions that provide the network effect, describing how the hoax (f_i) and its debunking (g_i) disseminate in the immediate neighborhood of a vertex. These spreading functions are given by:

    f_i(t) = β n_i^B(t)(1 + α) / [n_i^B(t)(1 + α) + n_i^F(t)(1 − α)]
    g_i(t) = β n_i^F(t)(1 − α) / [n_i^B(t)(1 + α) + n_i^F(t)(1 − α)]    (5)

where β ∈ [0, 1] is a constant parameter for the spreading rate, α ∈ [0, 1) is a constant parameter for the credibility of the hoax (or the agents' gullibility), and n_i^B(t), n_i^F(t) are the number of neighbors of i that are believers or fact checkers at time t, respectively. Notice that f_i(t) + g_i(t) = β, the infection rate, as in the SIS model: indeed, if we consider the two states Believer and Fact checker as a single Infected state, we recover the SIS model exactly, where p_forget is the recovery probability usually denoted by µ.

In summary, we have four parameters: the spreading rate β, the gullibility α, the probability p_verify to fact-check a hoax, and the probability p_forget to forget one's current belief. We consider here values of p_forget and p_verify such that p_forget + p_verify < 1. The model is illustrated in Figure 1.

[Figure 1: States and transitions of the model.]

3. RESULTS
To explore whether the network topology plays a role in the persistence of infections, as happens in the SIR/SIS case [17, 16], we tested our model on different types of networks: random, scale-free, and a real social network from Facebook. In the rest of the paper we denote by B_∞, F_∞, and S_∞ the densities of believers, fact checkers, and susceptible nodes in the infinite-time (equilibrium) limit.

3.1 Scale-free and random networks
Let us consider Barabási-Albert (BA) and Erdős-Rényi (ER) networks with the same size (N = 1000) and the same mean degree (⟨k⟩ = 6). To understand the influence of fact-checking activity, in these simulations we fix the values of the spreading rate (β = 0.5) and the forgetting probability (p_forget = 0.1), varying only p_verify and the gullibility parameter α. In Figure 2 we show results obtained with α = 0.3 and α = 0.8, p_verify = 0.05, in BA and ER networks. The obtained behaviors are similar to those of known epidemic models; a minimal simulation sketch is given below.

[Figure 2: Model behavior in BA and ER networks with N = 1000 and ⟨k⟩ = 6. Each line shows the number of nodes in one compartment (believer, fact checker, susceptible), averaged over 30 runs, with β = 0.5 and p_forget = 0.1.]
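The simulations above come from the authors' own implementation, which is not included in the paper. As a hedged illustration only, one synchronous update of Eqs. (4)-(5) could be sketched as follows (Python with networkx assumed; all names are illustrative, and the state constants from the earlier sketch are redefined so the fragment is self-contained):

    import random
    import networkx as nx

    # One-hot states as in Eq. (1): believer, fact checker, susceptible.
    BELIEVER, FACT_CHECKER, SUSCEPTIBLE = (1, 0, 0), (0, 1, 0), (0, 0, 1)

    def step(G, state, beta, alpha, p_verify, p_forget):
        """One synchronous update of all nodes according to Eqs. (4)-(5)."""
        new_state = {}
        for i in G.nodes():
            nB = sum(1 for j in G.neighbors(i) if state[j] == BELIEVER)
            nF = sum(1 for j in G.neighbors(i) if state[j] == FACT_CHECKER)
            denom = nB * (1 + alpha) + nF * (1 - alpha)
            f = beta * nB * (1 + alpha) / denom if denom > 0 else 0.0   # Eq. (5)
            g = beta * nF * (1 - alpha) / denom if denom > 0 else 0.0
            r = random.random()
            if state[i] == SUSCEPTIBLE:
                # S -> B with probability f, S -> F with probability g, otherwise stay S
                if r < f:
                    new_state[i] = BELIEVER
                elif r < f + g:
                    new_state[i] = FACT_CHECKER
                else:
                    new_state[i] = SUSCEPTIBLE
            elif state[i] == BELIEVER:
                # B -> F with probability p_verify, B -> S with probability p_forget
                if r < p_verify:
                    new_state[i] = FACT_CHECKER
                elif r < p_verify + p_forget:
                    new_state[i] = SUSCEPTIBLE
                else:
                    new_state[i] = BELIEVER
            else:
                # F -> S with probability p_forget, otherwise stay F
                new_state[i] = SUSCEPTIBLE if r < p_forget else FACT_CHECKER
        return new_state

    # Example run on a BA network with N = 1000 and <k> ~ 6 (m = 3), roughly as in Figure 2.
    G = nx.barabasi_albert_graph(1000, 3)
    state = {i: SUSCEPTIBLE for i in G.nodes()}
    for i in random.sample(list(G.nodes()), 10):   # seed a few believers
        state[i] = BELIEVER
    for _ in range(500):
        state = step(G, state, beta=0.5, alpha=0.8, p_verify=0.05, p_forget=0.1)

Running the same sketch on nx.erdos_renyi_graph(1000, 6/999), which roughly matches ⟨k⟩ = 6, gives the ER counterpart of the comparison.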
We find that:

• S_∞ does not depend on the topology, on p_verify, or on the gullibility α, as we will prove with the mean-field equations;
• α and p_verify rule the dynamics of believers and fact checkers, determining the victory of one of the two behaviors;
• increased fact-checking (p_verify) has the power to remove the hoax (B_∞ = 0); however, the total infection (intended as the sum B_∞ + F_∞) remains active in the network.

Let us focus on the last point: even with high gullibility (α ≈ 1), it seems always possible to find a value of p_verify such that B_∞ = 0. Let us build a phase diagram in which we vary only α ∈ [0, 1) and p_verify ∈ [0, 0.3] (the other parameters remain fixed as before). For each point (α, p_verify) in the diagram we indicate the value of B_∞, using a color palette to show the density of believers. Figure 4 illustrates two phase diagrams for BA and ER networks. They confirm that a relatively small fact-checking activity can cancel the hoax, even when users tend to believe it with high probability. We will derive analytically a theoretical threshold for p_verify that guarantees the disappearance of the fake content.

We also tested the model on real networks, and the results confirm the behaviors obtained with the synthetic networks. In Figure 3 we show results of simulations on a real Facebook network with N ≈ 4000 [8] and BA/ER networks of the same size, with the parameters fixed as in Figure 2. When the hoax is more credible, or the agents are more gullible, the population of believers is larger, as expected.

[Figure 3: Comparison of simulation results on FB, BA, and ER networks of the same size. Parameters: β = 0.5, p_forget = 0.1, p_verify = 0.05, α = 0.3 (top) and 0.8 (bottom).]

[Figure 4: Phase diagrams showing B_∞ in BA and ER networks, varying α (hoax credibility or agent gullibility) and p_verify. Each point is the averaged result of 30 simulations.]

3.2 Comparison with SIS/SIR
As explained earlier, our model can be interpreted as an SIS model in which the Infected state is split into two sub-compartments, so we can investigate analogies between the two models. In Figure 2 and Figure 4 it is immediately evident that the model has basically the same behavior in BA and ER networks. In particular, even when the hoax is removed, its debunking keeps spreading in the network: this means that the "infection" (believers and fact checkers) is still active. This is not surprising: a classic result in epidemic theory [16] says that there is no difference in behavior between heterogeneous and homogeneous networks when the reproduction number R_0 = β⟨k⟩/p_forget is greater than 1, as in our configuration (R_0 = 30 ≫ 1). But when R_0 < 1, traditional SIR and SIS models behave differently depending on the network topology: in a random network the virus dies out; in a scale-free network, under the right assumptions, the epidemic threshold goes to zero and the infection can reach an endemic level, although with a very small number of infected individuals. In our model we confirm the absence of an epidemic threshold in scale-free networks, as shown by the example in Figure 5. We set the parameters so that R_0 = 0.85 < 1.

[Figure 5: As in SIS/SIR models, under the right assumptions, our model behaves differently in BA and ER networks. In this example the parameter values are β = 0.1, p_forget = 0.7, α = 0.8, and p_verify = 0.05. Considering the infection as the sum of believers and fact checkers, we observe two different behaviors: in the ER network the infection is removed, while in the BA network the infection reaches an endemic level (with only fact checkers).]
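The two reproduction numbers quoted above follow directly from R_0 = β⟨k⟩/p_forget; a quick illustrative check (Python, the function name is ours, not the paper's):

    def reproduction_number(beta, mean_degree, p_forget):
        """SIS-like reproduction number R0 = beta * <k> / p_forget."""
        return beta * mean_degree / p_forget

    print(reproduction_number(0.5, 6, 0.1))   # ~30, the endemic regime of Figures 2-4
    print(reproduction_number(0.1, 6, 0.7))   # ~0.86 < 1, the sub-threshold regime of Figure 5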
While only susceptible users survive in ER networks, a few fact checkers survive in BA networks.

3.3 Analytical results
In the previous sections we showed several simulation results; now we want to prove some of them with a mathematical analysis of the model. We derive mean-field equations for our model, under the hypothesis that all vertices have the same number of neighbors k and that these neighbors are chosen randomly, so that the values p_i(t) do not depend on i. With some calculations, we obtain:

    p^S(∞) = p_forget / (β + p_forget)    (6)

    p^B(∞) = 0   or   p^B(∞) = β(2α·p_forget − (1 − α)·p_verify) / [2α(β + p_forget)(p_forget + p_verify)]    (7)

    p^F(∞) = 1 − p^B(∞) − p^S(∞).    (8)

We can read these three values as S_∞, B_∞, and F_∞; they match the simulation results of the previous section. First, the density of susceptible individuals in the infinite-time limit depends only on the spreading rate β and the forgetting probability: Eq. (6) fits the numerical results very well. Second, from Eq. (7) we can obtain a sufficient condition for the hoax to be removed from the network:

    p^B(∞) = β(2α·p_forget − (1 − α)·p_verify) / [2α(β + p_forget)(p_forget + p_verify)] > 0  ⇒  p_verify < [2α/(1 − α)]·p_forget

and therefore

    p_verify ≥ [2α/(1 − α)]·p_forget  ⇒  p^B(∞) = 0.    (9)

The threshold is plotted in Figure 6 as a function of α, for p_forget = 0.1. This result is consistent with the simulations and the phase diagrams illustrated in Figure 4.

[Figure 6: Analytical values of p_verify, as a function of α (with p_forget = 0.1), such that the hoax is completely removed from the network.]

4. CONCLUSIONS
In this work we proposed a stochastic epidemic model to describe the propagation of a hoax in a social network. The model can be interpreted as an SIS model in which the compartment I of infected users has two sub-compartments: believers and fact checkers. We implemented and tested the model on heterogeneous (scale-free), homogeneous (random), and real networks, varying parameters and topology. We focused on analyzing the crucial role of fact-checking activity, ruled by a verifying probability. Analytically, we found a threshold for this probability, a sufficient condition that assures the hoax will be removed. This is interesting because it provides an idea of how many fact checkers would be sufficient to guarantee the complete removal of fake news. We analyzed the results of several simulations, discussing the role of each parameter in the dynamic evolution of the system, and confirming similarities of behavior with the traditional SIS and SIR epidemic models.

On the basis of the results presented here, work is continuing to explore whether there are regions of the network in which it is easier for the hoax infection to become endemic. Future work could also involve some extensions of the model: we could insert a "memory effect" or delay the diffusion of the debunking, since realistically the two propagations may not be simultaneous. Moreover, this model does not take into account the heterogeneity of agents: all have the same gullibility and verification probability, for example. In real life, people may have different tendencies to believe claims that are consistent with their world views and to selectively discard factual evidence that is not consistent [15].
Additionally, we began to explore how the dynamics respond to a periodical re-injection of the hoax: a spiky behavior appears, with periodic bursts of spreading. Recent work on the detection of rumors and fake content identified such bursts as features with high predictive power [6, 10], so they deserve further study. We are studying a more sophisticated model in which we consider different gullibility values in two communities (skeptic and gullible), observing the role of the segregation level between the groups and the polarization of the hoax in the gullible group, a fact that has been empirically observed in [11]. Finally, the model needs to be validated with empirical data, analyzing the diffusion of real hoaxes in social networks.

Acknowledgments
We would like to thank Prof. James Kauffman (University of Virginia), who was the target of a misinformation phenomenon some years ago. He debunked the fake story that involved him by creating a letter (http://drlauraletter.com), but occasionally that rumor starts spreading again through social networks without any apparent control. He provided us with data about his experience, helping us get some insights at a very early stage of our study. This work was supported in part by NSF (grant CCF-1101743), DoD (grant W911NF-12-1-0037), and the James S. McDonnell Foundation (grant 220020274).

5. REFERENCES
[1] D. Acemoglu, A. Ozdaglar, and A. ParandehGheibi. Spread of (mis)information in social networks. Games and Economic Behavior, 70(2):194–227, 2010.
[2] N. Bailey. The Mathematical Theory of Infectious Diseases and its Applications. Griffin, London, 2nd edition, 1975.
[3] C. Budak, D. Agrawal, and A. El Abbadi. Limiting the spread of misinformation in social networks. In Proceedings of the 20th International Conference on World Wide Web, pages 665–674. ACM, 2011.
[4] F. Chierichetti, S. Lattanzi, and A. Panconesi. Rumor spreading in social networks. In Automata, Languages and Programming, pages 375–386. Springer, 2009.
[5] A. Friggeri, L. A. Adamic, D. Eckles, and J. Cheng. Rumor cascades. In Proc. Eighth Intl. AAAI Conf. on Weblogs and Social Media (ICWSM), 2014.
[6] S. Kwon, M. Cha, K. Jung, W. Chen, and Y. Wang. Prominent features of rumor propagation in online social media. In Data Mining (ICDM), 2013 IEEE 13th International Conference on, pages 1103–1108. IEEE, 2013.
[7] K. Lerman and R. Ghosh. Information contagion: An empirical study of the spread of news on Digg and Twitter social networks. ICWSM, 10:90–97, 2010.
[8] J. Leskovec and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.
[9] S. Lewandowsky, U. K. Ecker, C. M. Seifert, N. Schwarz, and J. Cook. Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest, 13(3):106–131, 2012.
[10] Y. Matsubara, Y. Sakurai, B. A. Prakash, L. Li, and C. Faloutsos. Rise and fall patterns of information diffusion: Model and implications. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 6–14. ACM, 2012.
[11] D. Mocanu, L. Rossi, Q. Zhang, M. Karsai, and W. Quattrociocchi. Collective attention in the age of (mis)information. arXiv preprint arXiv:1403.3344, 2014.
[12] Y. Moreno, M. Nekovee, and A. F. Pacheco. Dynamics of rumor spreading in complex networks. Physical Review E, 69(6):066130, 2004.
[13] M. Nekovee, Y. Moreno, G. Bianconi, and M. Marsili. Theory of rumour spreading in complex social networks. Physica A: Statistical Mechanics and its Applications, 374(1):457–470, 2007.
[14] N. P. Nguyen, G. Yan, M. T. Thai, and S. Eidenbenz. Containment of misinformation spread in online social networks. In Proceedings of the 3rd Annual ACM Web Science Conference, pages 213–222. ACM, 2012.
[15] B. Nyhan, J. Reifler, and P. A. Ubel. The hazards of correcting myths about health care reform. Medical Care, 51(2):127–132, 2013.
[16] R. Pastor-Satorras, C. Castellano, P. Van Mieghem, and A. Vespignani. Epidemic processes in complex networks. arXiv preprint arXiv:1408.2701, 2014.
[17] R. Pastor-Satorras and A. Vespignani. Epidemic spreading in scale-free networks. Physical Review Letters, 86(14):3200, 2001.
[18] J. Ratkiewicz, M. Conover, M. Meiss, B. Gonçalves, A. Flammini, and F. Menczer. Detecting and tracking political abuse in social media. In Proc. Fifth Intl. AAAI Conf. on Weblogs and Social Media (ICWSM), 2011.
[19] D. Trpevski, W. K. S. Tang, and L. Kocarev. Model for rumor spreading over networks. Physical Review E, 81:056102, May 2010.
[20] L. Weng, A. Flammini, A. Vespignani, and F. Menczer. Competition among memes in a world with limited attention. Scientific Reports, 2, 2012.
[21] L. Zhao, X. Qiu, X. Wang, and J. Wang. Rumor spreading model considering forgetting and remembering mechanisms in inhomogeneous networks. Physica A: Statistical Mechanics and its Applications, 392(4):987–994, 2013.

APPENDIX
A. MEAN-FIELD ANALYSIS
We can approximate the infinite-time behavior of the system with mean-field theory. First we simplify Eq. (4) by substituting s_i(t) with p_i(t) and considering that p_i(t + 1) = p_i(t) when t → ∞. Then, under the hypothesis that all vertices have the same number of neighbors k and these neighbors are chosen randomly, we substitute n_i^B(t) with p^B(∞) and n_i^F(t) with p^F(∞) in the spreading functions, Eq. (5). We obtain equations only in terms of p^B(∞), p^F(∞), and p^S(∞):

    p^B(∞) = f · p^S(∞) + (1 − p_forget − p_verify) · p^B(∞)    (10)
    p^F(∞) = g · p^S(∞) + p_verify · p^B(∞) + (1 − p_forget) · p^F(∞)    (11)
    p^S(∞) = p_forget · (p^B(∞) + p^F(∞)) + (1 − f − g) · p^S(∞).    (12)

Moreover, we know that p_i^S(t) = 1 − p_i^B(t) − p_i^F(t), therefore we can substitute p^B(∞) + p^F(∞) = 1 − p^S(∞) and 1 − f − g = 1 − β into (12) and trivially obtain the solution for the susceptible rate at the stationary state:

    p^S(∞) = p_forget / (β + p_forget).    (13)

Similarly, we can rewrite

    p^F(∞) = 1 − p^B(∞) − p^S(∞) = 1 − p^B(∞) − p_forget / (β + p_forget)    (14)

and, substituting it into (10), we obtain an equation only in terms of p^B(∞):

    p^B(∞) = f · p_forget / (β + p_forget) + (1 − p_forget − p_verify) · p^B(∞)    (15)

where

    f = β · p^B(∞)(1 + α) / [ p^B(∞)(1 + α) + (1 − p^B(∞) − p_forget / (β + p_forget))(1 − α) ].

We can observe that p^B(∞) = 0 is a solution of (15), i.e., the situation in which the hoax is defeated. To find the rate of believers at the stationary state when the hoax survives, with a little algebra on (15) it is easy to obtain

    p^B(∞) = β(2α · p_forget − (1 − α) · p_verify) / [2α(β + p_forget)(p_forget + p_verify)].    (16)

For fact checkers we also have two solutions (hoax survives or not), and we can trivially find them by substituting the values for believers and susceptible users in p^F(∞) = 1 − p^B(∞) − p^S(∞).
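As an illustrative numerical cross-check (not part of the original paper), one can iterate the mean-field map of Eqs. (10)-(12) and compare the limit with the closed-form expressions (13) and (16); a minimal Python sketch under the same assumptions, with parameter values chosen by us:

    def mean_field_fixed_point(beta, alpha, p_verify, p_forget, steps=5000):
        """Iterate the mean-field update of Eqs. (10)-(12) from a small initial infection."""
        pB, pF = 0.01, 0.0
        for _ in range(steps):
            pS = 1.0 - pB - pF
            denom = pB * (1 + alpha) + pF * (1 - alpha)
            f = beta * pB * (1 + alpha) / denom if denom > 0 else 0.0
            g = beta * pF * (1 - alpha) / denom if denom > 0 else 0.0
            pB, pF = (f * pS + (1 - p_forget - p_verify) * pB,
                      g * pS + p_verify * pB + (1 - p_forget) * pF)
        return pB, pF, 1.0 - pB - pF

    beta, alpha, p_forget = 0.5, 0.3, 0.1

    # Below the threshold p_verify < 2*alpha/(1 - alpha) * p_forget the hoax survives:
    p_verify = 0.05
    pB, pF, pS = mean_field_fixed_point(beta, alpha, p_verify, p_forget)
    pB_closed = beta * (2*alpha*p_forget - (1 - alpha)*p_verify) / (
        2*alpha*(beta + p_forget)*(p_forget + p_verify))            # Eq. (16)
    pS_closed = p_forget / (beta + p_forget)                        # Eq. (13)
    print(pB, pB_closed)   # the two values should (approximately) coincide
    print(pS, pS_closed)

    # Above the threshold (Eq. 9) the believer density decays to zero:
    print(mean_field_fixed_point(beta, alpha, 0.2, p_forget)[0])    # ~0

Below the threshold of Eq. (9) the iteration settles on the positive solution of Eq. (16); above it, the believer density goes to zero while fact checkers remain endemic, consistent with the phase diagrams of Figure 4.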