Moderation Techniques for Social Media Content

Andreas Veglis, professor
Media Informatics Lab, School of Journalism & MC
Aristotle University of Thessaloniki
Thessaloniki 54006, Greece
e-mail: veglis@jour.auth.gr, web page: http://blogs.auth.gr/veglis

Abstract: Social media are perhaps the most popular services of cyberspace today. Their main characteristic is that they offer every internet user the ability to add content and thus contribute to participatory journalism. The problem is that this content must be checked, both for quality and in order to avoid legal issues. This can be accomplished with the help of moderation, but moderation is a complex process that in many cases requires substantial human resources. This paper studies the moderation process and proposes a moderation model that can guarantee the quality of the content while keeping cost at an affordable level. The model includes various moderation stages, which determine the applied moderation technique depending on the publication record of the user who submits the content.

Keywords: Social Media, moderation, hybrid moderation, pre-moderation, post-moderation, distributed moderation

1 Introduction

Since the invention of the WWW more than 20 years ago, we have witnessed a tremendous growth in tools and services. Although at the beginning the internet user was considered to be a passive content consumer, nowadays he has the ability to produce or reproduce and disseminate content. This change took place due to the introduction of social media, which are perhaps the most popular internet services today. Social media can be defined as Internet-based applications that belong to Web 2.0 and support the creation and exchange of user generated content. They include web-based and mobile-based technologies which can facilitate interactive dialogue between organizations, communities, and individuals.
Social media technologies take on many different forms, including magazines, Internet forums, weblogs, social blogs, microblogging, wikis, podcasts, photographs or pictures, video, rating and social bookmarking [1]-[3].

Supported by the evolution of social media, internet users are now generating great amounts of user generated content. This content varies from blog comments and participation in online polls to citizen stories that are usually published in media web sites [4]. The problem is that traditional web sites exercise quality control over their content: in the case of media web sites, journalists act as gatekeepers, ensuring the quality of the news content. Thus the authorities of a web site that publishes user generated content are also responsible for users’ contributions and attempt to check the validity of the content in order to prevent legal issues that may arise from it. The methods that can be employed to deal with the above issues can be summarized as user identification and moderation, or other oversight of user material, that can guarantee a certain degree of quality. Although user identification is a quite straightforward automatic process, moderation is a complex, costly and time-consuming process.

This paper studies techniques for checking the quality of user generated content in social media, with emphasis on moderation. More precisely, by combining existing moderation techniques (pre-moderation, post-moderation, distributed and automated), hybrid moderation is proposed and discussed in detail. This type of moderation exploits the various types of moderation in order to achieve small publication latency as well as high quality content. It includes various stages, which determine the applied moderation technique depending on the publication record of the user who submits the content. User generated content is subjected to multiple moderation cycles that guarantee the success of the moderation process.
The technique is subject to customization depending on the characteristics of the web site that adopts it.

The rest of the paper is organized as follows: Section 2 discusses social media as well as user generated content. The types of user generated content are presented in Section 3. Section 4 deals with the existing mechanisms that ensure the quality of user generated content. The proposed moderation model is presented and discussed in Section 5. Conclusions and future extensions of this work are included in the last section.

2 The evolution of social media

There is a growing trend of people shifting from the traditional media (newspaper, TV, radio) to social media in order to stay informed. Social media have often scooped traditional media in reporting current events. Although the majority of original reporting is still generated by traditional journalists, social media make it increasingly possible for an attentive audience to tap into breaking news [1].

One classification scheme distinguishes six types of social media: collaborative projects, blogs and microblogs, content communities, social networking sites, virtual game worlds, and virtual social worlds [3]. One of the most widely used types of social media is social networking. A social networking service is a web site that facilitates the building of social networks or social relations among internet users who share similar interests, activities, backgrounds, or real-life connections (http://en.wikipedia.org/wiki/Social_networking_service). Such sites are web-based services that allow individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection, and view and traverse their list of connections and those made by others within the system [5].
Many companies have established a presence in the most popular social networks (for example Facebook) in order to publish their news and attract other members of the social network to their web site. They have also integrated social media links in their web articles so that users can link to them through their social network profiles. Users also have the ability to interact with the media companies by leaving comments [6].

The best known and most widely used social network is Facebook. The latest data indicate that the number of Facebook users is above 1.19 billion and that 728 million users log in to the system every day (http://thenextweb.com/facebook/2013/10/30/facebook-passes-1-19-billion-monthly-active-users-874-million-mobile-users-728-million-daily-users/#!ubaXH).

Although it appeared later than Facebook, Twitter is another example of social media that quickly became very popular among users [1]. Twitter is a social networking and micro-blogging service that enables its users to send and read other users' updates, known as tweets. Twitter is often described as the "SMS of the Internet", in that the site provides the back-end functionality to other desktop and web-based applications to send and receive short text messages, often obscuring the actual web site itself. Tweets are text-based posts of up to 140 characters in length. Updates are displayed on the user's profile page and delivered to other users who have signed up to receive them. Users can send and receive updates via the Twitter web site, SMS, RSS (receive only), or through applications. The service is free to use over the web, but using SMS may incur phone service provider fees. Many media companies are using Twitter in order to alert their readers about breaking news [6].

The evolution of social media created participatory (or citizen) journalism.
This concept derives from public citizens playing an active role in the process of collecting, reporting, analyzing, and disseminating news and information [7]. Another term used is user generated content [8]. Information and communication technologies (social networking, media-sharing web sites and smartphones) have made citizen journalism more accessible to people all over the world, thus enabling them to often report breaking news much faster than professional journalists. Notable examples are the Arab Spring and the Occupy movement. But it is also worth noting that the unregulated nature of participatory journalism has drawn criticism from professional journalists for being too subjective, amateurish, and haphazard in quality and coverage (http://en.wikipedia.org/wiki/Citizen_journalism).

Bowman and Willis [7] characterize participatory journalism as “a bottom-up, emergent phenomenon in which there is little or no editorial oversight or formal journalistic workflow dictating the decisions of a staff”. As a substitute there are various concurrent conversations on social networks, as depicted in figure 1.

The problem is that in the traditional media journalists are responsible for the news. They decide which stories to cover and which sources to use, they write the text and they choose the appropriate photographs. Thus they act as gatekeepers, deciding what the public shall receive [9]. But being gatekeepers makes them responsible for the quality of the news content. The new media give journalists the possibility to provide vast quantities of information in various formats. But journalists are responsible not only for how much information they include in the news stories and in what form, but also for how truthful that information is [8].

[Figure 1: Participatory journalism [7]. Diagram labels: Community, Advertisers, Reporters, Editors, Publisher.]

In the case of participatory journalism journalists contribute only part of a news story.
Thus they feel responsible for users’ contributions and they attempt to check the validity of the user generated content. But that is not an easy task, especially when they receive a substantial volume of information from users [8].

3 Types of User Generated Content

Participatory journalism can be achieved with a variety of tools and services, namely discussion groups, user generated content, weblogs, collaborative publishing, peer-to-peer, and XML syndication [7]. The format of user participation may vary, and in the majority of cases it is under some kind of moderation by professional journalists [10]. Next we present and briefly discuss the types of user generated content.

─ User blog: Users’ blogs hosted on the media web site.
─ User multimedia material: Photos, videos and other multimedia material submitted by users (usually checked by the web site's administrators).
─ User stories: Users' written submissions on topical issues and suggestions for news stories (selected and/or edited by journalists and published on the media web site).
─ Collective interviews: Chats or interviews conducted by journalists, with questions submitted by users (after moderation).
─ Comments: Views on a story submitted by users (by filling in a form at the bottom of the web page).
─ Content ranking: News stories ranked by users (for example the most read or the most emailed news story).
─ Forums: a) Discussions controlled by journalists, with topical questions posed by the newsroom and submissions either fully or reactively moderated (usually available for a limited number of days); b) Forums where users are able to engage in threaded online conversations or debates (usually available for long periods, weeks or even months). The users are given the freedom to initiate these forum topics.
─ Journalists' blogs: Also known as j-blogs, these include journalists’ posts on specific topics and are open to user comments.
─ Polls: Topical questions related to major issues, with users asked to give a multiple choice or binary response. They are able to provide instant and quantifiable results to users.
─ Social networking: Distribution of links to stories through social platforms, for example Facebook and Twitter.

4 Mechanisms for ensuring the quality of the content

The introduction of participatory journalism in media organizations has resulted in a cost, related to the need for moderation that can guarantee the quality of the content. If we try to outline the basic areas from which problems may arise concerning user generated content, we can identify defamation, hate speech, and intellectual property. The methods that can be employed in order to deal with these issues can be summarized as user identification and moderation or other oversight of user material [11].

4.1 User registration

User registration is the procedure in which the user provides his credentials, effectively proving his identity upon accessing a web site. Every user can become a registered user by providing some credentials, usually in the form of a username (or email) and password. After registration, the user can access information and privileges unavailable to non-registered users, usually referred to simply as guests. The action of providing the proper credentials for a web site is called logging in, or signing in (http://en.wikipedia.org/wiki/Registered_user).

Although user registration is a very common procedure that internet users are familiar with, there is a growing trend of social login or social sign-in. This is a form of single sign-on using existing login information from a social networking service (Facebook, Google+ or Twitter). In this way logins are simplified for the users, and the network administrators are able to acquire reliable demographic information [12].
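The registration and logging-in procedure described in this subsection can be illustrated with a minimal sketch. It is not taken from any particular media web site: the in-memory store `_users`, the function names `register`, `log_in` and `role`, and the choice of salted PBKDF2 hashing with 100,000 iterations are all assumptions made for illustration, using only the Python standard library.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory user store: username -> (salt, password hash).
_users = {}


def _hash(password: str, salt: bytes) -> bytes:
    # Salted PBKDF2 from the standard library; the iteration count is an
    # arbitrary illustrative choice.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def register(username: str, password: str) -> bool:
    """Create a registered user; return False if the username is taken."""
    if username in _users:
        return False
    salt = secrets.token_bytes(16)
    _users[username] = (salt, _hash(password, salt))
    return True


def log_in(username: str, password: str) -> bool:
    """Return True only when the credentials match a registered user."""
    record = _users.get(username)
    if record is None:
        return False
    salt, stored = record
    # Constant-time comparison of the stored and recomputed hashes.
    return hmac.compare_digest(stored, _hash(password, salt))


def role(username: str, password: str) -> str:
    """Visitors who do not log in successfully are treated as guests."""
    return "registered" if log_in(username, password) else "guest"
```

A web site would consult something like `role` before granting the privileges that are unavailable to guests, such as submitting content for moderation.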
4.2 CAPTCHA

Another mechanism applied for ensuring the quality of user generated content is CAPTCHA. It is an acronym based on the word "capture" and standing for "Completely Automated Public Turing test to tell Computers and Humans Apart" [13]. It is a type of challenge-response test used in computing as an attempt to ensure that the response is generated by a person. The process usually involves a computer asking a user to complete a simple test which the computer is able to grade. These tests are designed to be easy for a computer to generate but difficult for a computer to solve, so that if a correct solution is received, it can be presumed to have been entered by a human. A common type of CAPTCHA requires the user to type letters or digits from a distorted image that appears on the screen, and such tests are commonly used to prevent unwanted internet bots from accessing web sites (http://en.wikipedia.org/wiki/CAPTCHA; http://www.captcha.net). This is especially useful in the case of comments from unregistered users on blogs, forums, etc.

CAPTCHA technology is widely used in media web sites, but sometimes the images that the user is asked to identify are so heavily distorted that they cause frustration on the part of the user. CAPTCHA is usually employed in the process of user registration and in cases where unregistered users are allowed to post comments or upload user generated content to the media web site (see figure 2).

Figure 2: CAPTCHA identification procedure (taken from the Facebook registration process) (http://www.register-facebook.com)

4.3 Moderation

A moderation mechanism is the method by which the webmaster of a media web site separates contributions that are irrelevant, obscene, illegal, or insulting from those that are useful or informative. In other words he decides whether the user generated content is appropriate for publishing or not [14].
Depending on the site's content and intended audience, the webmaster will decide what kind of user content is appropriate, and then delegate the responsibility of sifting through content to lesser moderators. The purpose of the moderation mechanism is to attempt to eliminate trolling, spamming, or flaming, although this varies widely from site to site (http://en.wikipedia.org/wiki/Moderation_system). There are four types of moderation, namely pre-moderation, post-moderation, automated moderation, and distributed moderation [15].

Pre-moderation: In this type of moderation all content is checked before publishing. Pre-moderation provides high control of the content that is published on the web site. But it can result in a substantial reduction (40% to 50%) of the amount of user generated content. It also creates a lack of instant gratification on the part of the participant, who is left waiting for the submission to be cleared by a moderator. This latency might not create a problem in some cases (for example in the case of a citizen story), but it is disruptive in the case of a blog post or a forum, where users interact with each other in almost real time. Another disadvantage of pre-moderation is the high cost involved, especially if the user generated content is of high volume [15].

Post-moderation: This method involves publishing the content immediately and moderating it within the next 24 hours. All user generated content is replicated in a queue for a moderator to pass or remove afterwards. The main advantage of this moderation type is that conversations may occur in real time, thanks to the immediacy offered by the direct publication of the content. Of course this advantage may also cause many problems, since there is no initial screening of the user generated content, which may include inappropriate material.

Automated moderation: This type of moderation differs from the previous types since it does not involve human intervention.
It consists of deploying various technical tools (mainly filters) to process user generated content and apply pre-defined rules in order to reject or approve submissions. One of the most typical tools used is the word filter, in which a list of banned words is entered and the tool either stars the word out, replaces it with a defined alternative, or blocks or rejects the content altogether. A similar tool is the IP ban list, which deletes inappropriate external links or deletes content that comes from banned IPs. Of course there are other, more sophisticated filters. Overall, automated moderation is a valuable tool that involves an initial cost but no operational cost [15].

Distributed moderation: One other type of moderation is distributed moderation. This is a form of comment moderation that allows the users who take part in participatory journalism to moderate each other. Distributed moderation can be divided into two types: user moderation and spontaneous (or reactive) moderation [15], [16].

User moderation allows any user to moderate any other user's contributions. This method works well in web sites with a large active population (for example Slashdot). More precisely, each moderator is given a limited number of "mod points", each of which can be used to moderate an individual comment up or down by one point. Comments thus accumulate a score, which is additionally bounded to the range of -1 to 5 points. When viewing the site, a threshold can be chosen from the same scale, and only posts meeting or exceeding that threshold will be displayed (http://en.wikipedia.org/wiki/Moderation_system).

In the case of spontaneous moderation no official moderation scheme exists. Users spontaneously moderate their peers by posting their own comments about others' comments. One variation of spontaneous moderation is meta-moderation. This method enables any user to judge (moderate) the evaluation (voting) of another user [17].
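Returning to user moderation, the mod-point scheme described above can be sketched in a few lines. This is only an illustration under stated assumptions, not Slashdot's actual implementation: the class and function names are hypothetical, the starting score of 1 is assumed, and the sketch assumes that a mod point is spent even when the score is already at a bound. Only the one-point steps, the -1 to 5 range and the viewing threshold come from the text.

```python
from dataclasses import dataclass

MIN_SCORE, MAX_SCORE = -1, 5  # score range quoted in the text


@dataclass
class Comment:
    text: str
    score: int = 1  # assumed starting score


@dataclass
class Moderator:
    name: str
    mod_points: int  # limited budget of moderation actions


def moderate(moderator: Moderator, comment: Comment, delta: int) -> bool:
    """Spend one mod point to move a comment up or down by one point."""
    if delta not in (1, -1):
        raise ValueError("a mod point moves a comment by exactly one point")
    if moderator.mod_points <= 0:
        return False  # budget exhausted, nothing changes
    moderator.mod_points -= 1
    # Keep the accumulated score inside the allowed range.
    comment.score = max(MIN_SCORE, min(MAX_SCORE, comment.score + delta))
    return True


def visible(comments, threshold: int):
    """Return the comments whose score meets or exceeds the reader's threshold."""
    return [c for c in comments if c.score >= threshold]
```

A reader who sets the threshold to 3, for instance, would see only comments that other users have moderated up to a score of 3 or more.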
Meta-moderation can be considered as a second layer of moderation. It attempts to increase fairness by letting users "rate the rating" of randomly selected comment posts.

Many media companies use pre- and post-moderation, while others outsource moderation, enlisting journalists to moderate the vast amount of content that users post on the various services (blogs etc.) offered by the media companies. In many cases the approach is to over-moderate the user generated content in order to avoid being criticized for trying to manipulate the conversation on various subjects [11].

It is obvious that moderation is a complicated issue. Media companies usually employ various types of moderation depending on the type of user participation. Automated moderation should be employed for every kind of user generated content. Table I lists the types of user generated content against the moderation type that can be employed. It is worth noting that for certain types of user generated content in which the probability of legal issues arising is high, pre-moderation is the ideal type of moderation. On the other hand, for types of user generated content that do not usually raise legal issues, distributed moderation can be applied. In any case, all types of distributed moderation can be applied only when the media web site has a large active population of users [17].

Table I: Types of user generated content versus type of moderation.

Type of user generated content    Type of moderation
User blog                         Distributed moderation or post-moderation
User multimedia material          Pre-moderation
User stories                      Pre-moderation
Collective interviews             Pre-moderation
Comments                          Distributed moderation
Content ranking                   Spontaneous moderation
Forums                            Pre-moderation
Journalists' blogs                Pre-moderation
Polls                             Spontaneous moderation
Social networking                 Not applicable*

*Any comments that may accompany a link to a news article can be moderated only by the social network.
Usually social networks moderate user content only after a user’s complaint.

5 Hybrid moderation

Based on the types of moderation previously presented, we propose a mixed approach. This hybrid moderation method involves all moderation types. Next we briefly describe the proposed method.

Users who are interested in contributing content will be obliged to register with the web site. When a registered user adds content, the content is submitted immediately to automated moderation. Subsequently the moderation process is determined by the user’s record. More precisely, if the user has a record of good quality content, his contributed content can be assigned to post-moderation, since there is a high probability that it is of adequate quality. Thus the content is published immediately. The post-moderation process is based on distributed moderation. On the other hand, if the user has no prior history of good quality user generated content, or has submitted poor quality content in the past, his contribution is published only after it has passed the moderation process (pre-moderation). This means that the user is not able to see his content published immediately, but this can act as a motive for the user to establish a good publication record that will guarantee the immediate publication of his content. Finally, all the published material is subject to meta-moderation. In all cases content is subject to three levels of moderation in order to ensure its quality. The proposed hybrid moderation process is depicted in figure 3.

[Figure 3: Hybrid moderation procedure. Flowchart labels: User signing in; Content upload; Automated moderation (accept/reject); User with a prior record of quality contributions (Yes/No); Publication of content; Withhold until pre-moderation; Pre-moderation (accept/reject); Distributed moderation (accept/reject); Meta-moderation.]

The above model can be adapted to the different characteristics of each web site.
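The routing logic of the hybrid procedure described in this section can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the names (`User`, `submit`, `pre_moderate`), the placeholder ban list, and the rule that a "good record" means at least three accepted contributions and no rejections are all assumptions, since the paper does not quantify the publication record. Distributed moderation and meta-moderation are only indicated by the returned status, not implemented.

```python
import re
from dataclasses import dataclass

BANNED_WORDS = {"badword"}  # placeholder ban list (assumption)
TRUST_THRESHOLD = 3         # accepted items needed for a "good record" (assumption)

PUBLISHED_POST = "published, pending distributed moderation"
PUBLISHED = "published"
WITHHELD = "withheld for pre-moderation"
REJECTED = "rejected"


@dataclass
class User:
    name: str
    accepted: int = 0  # contributions approved by a human moderator
    rejected: int = 0  # contributions rejected by a human moderator

    def has_good_record(self) -> bool:
        # Hypothetical rule: enough accepted items and no rejections.
        return self.accepted >= TRUST_THRESHOLD and self.rejected == 0


def automated_moderation(text: str) -> bool:
    """Word filter: accept only text that contains no banned word."""
    words = re.findall(r"[a-z']+", text.lower())
    return not any(word in BANNED_WORDS for word in words)


def submit(user: User, text: str) -> str:
    """Route a registered user's submission through the first two stages."""
    if not automated_moderation(text):
        return REJECTED
    if user.has_good_record():
        # Published immediately; distributed moderation follows.
        return PUBLISHED_POST
    return WITHHELD


def pre_moderate(user: User, accept: bool) -> str:
    """Record a human moderator's decision on withheld content."""
    if accept:
        user.accepted += 1
        return PUBLISHED
    user.rejected += 1
    return REJECTED
```

Under these assumptions a new user's first submissions are withheld; once three of them have been accepted in pre-moderation, later submissions are published immediately, which mirrors the incentive to build a good publication record. In either branch, published items would then pass to meta-moderation.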
For example, in the initial period of a new web site that accepts user generated content, when registered users are few and most of them have no history of content contributions, all submitted content will be subject to pre-moderation by the authorities of the web site. As time passes and the number of registered users grows, distributed moderation will be initiated, as well as meta-moderation. Thus the hybrid model can be adapted to the requirements of each stage in the evolution of a web site.

It is worth noting that different kinds of contributed content may require different moderation processes. For example, text contributions can be easily checked by automated moderation, but this is not easy in the case of multimedia content. Content heterogeneity is a difficult parameter for the moderation process. This is an issue that needs further investigation.

6 Conclusions and future extensions

Modern ICTs have changed journalism considerably. Participatory journalism is one of the most profound changes that have occurred. Every user now has the ability to become a content producer. There is a great variety of tools that can be employed in participatory journalism. Of course this new type of journalism has many negative aspects that raise concerns (defamation, hate speech, intellectual property). The solution to these problems is the control of the user generated material. This can be achieved with the registration of the users who contribute material and with the moderation of the user generated material. Registration is a process well known to users, since it has been employed for many years in many internet services (for example e-mail services, social networks, etc.). On the other hand, moderation can be very time consuming, and the media company may have to dedicate many human resources to this task.
Of course there are many different types of moderation (post-moderation, distributed moderation, or even the proposed hybrid moderation) that may alleviate this problem to some extent.

The proposed hybrid moderation model combines all existing moderation techniques and applies them based on the publication record of the user. Thus in many cases it is able to avoid the latency that is otherwise required for the user generated content to be checked. The model also guarantees that all content is subject to three moderation stages.

There is no doubt that participative journalism is an issue that no media company can choose to adopt or disregard without great consideration. As usual, the solution to this problem is a compromise. The media company chooses to implement some type of citizen participation, usually gradually, by imposing strict moderation in order to prevent legal issues. Of course this means that a great deal of the user generated material that is rejected will be of good quality, but will be rejected just in case it might produce legal problems for the media company, thus resulting in a negative effect on its credibility. One solution to this problem is the training of the users who contribute to participative journalism, so that they act as responsible e-citizens. Another proposal involves the careful selection of the issues that are developed with user generated content.

Future extensions of this work will involve the detailed study of the moderation mechanisms employed in participative journalism in order to locate steps in the process that may be improved. One other issue that demands further study is the automatic moderation of multimedia material. Applicable video indexing can be deployed taking advantage of motion and/or color features, while the interaction with audio parameters is very powerful for multimodal event detection and summarization [18].
This is also fuelled by the evolution of machine learning algorithms and hybrid expert systems that facilitate many interdisciplinary research topics and knowledge management application areas [19]. However, there are many difficulties in such content recognition and semantic analysis scenarios, which are related to content massiveness and heterogeneity, especially in user contributed content [20]. Nevertheless, focused approaches in this direction have already been initiated and look promising [21].

7 References

1. An, J., Cha, M., Gummadi, K., and Crowcroft, J. (2011). Media landscape in Twitter: A world of new conventions and political diversity. Artificial Intelligence, Volume 6, Issue 1, AAAI, pp. 18-25.
2. Spyridou, L.P., and Veglis, A. (2011). Political Parties and Web 2.0 tools: A Shift in Power or a New Digital Bandwagon? International Journal of Electronic Governance, Vol. 4, No. 1/2, pp. 136-155.
3. Kaplan, A.M., and Haenlein, M. (2010). Users of the world, unite! The challenges and opportunities of Social Media. Business Horizons 53(1): 59-68.
4. Veglis, A., and Pomportsis, A. (2013). The e-citizen in the cyberspace: a journalism aspect. In texts and articles from the 5th International Conference on Information Law (ICIL 2012).
5. Boyd, D.M., and Ellison, N.B. (2008). Social Network Sites: Definition, History, and Scholarship. Journal of Computer-Mediated Communication, Volume 13, Issue 1, pp. 210-230.
6. Veglis, A. (2012). Journalism and Cross Media Publishing: The case of Greece. Chapter in The Wiley-Blackwell Handbook of Online Journalism, edited by Eugenia Siapera and Andreas Veglis, Blackwell Publishing.
7. Bowman, S., and Willis, C. (2003). We Media: How Audiences are Shaping the Future of News and Information. The Media Center at the American Press Institute. Available at http://www.hypergene.net/wemedia/download/we_media.pdf.
8. Singer, J.B., Hermida, A., Domingo, D., Heinonen, A., Paulussen, S., Quandt, T., Reich, Z., and Vujnovic, M. (2011). Participatory Journalism: Guarding Open Gates at Online Newspapers. Wiley-Blackwell.
9. White, D.M. (1950). The gatekeeper: A case study in the selection of news. Journalism Quarterly 27: 383-96.
10. Hermida, A., and Thurman, N. (2008). A clash of cultures: the integration of user generated content within professional journalistic frameworks at British newspaper web sites. Journalism Practice 2(3): 342-356.
11. Singer, J.B. (2011). Taking Responsibility: Legal and ethical issues in participatory journalism. Chapter in Singer, J.B., Hermida, A., Domingo, D., Heinonen, A., Paulussen, S., Quandt, T., Reich, Z., and Vujnovic, M. (2011). Participatory Journalism: Guarding Open Gates at Online Newspapers. Wiley-Blackwell.
12. Prescott, B. (2011). Social Sign-On: What is it and How Does It Benefit Your Web Site? Social Technology Review, January 10. Available at http://www.socialtechnologyreview.com/articles/social-sign-what-it-and-how-does-it-benefit-your-web-site
13. Grossman, L. (2008). Computer Literacy Tests: Are You Human? Time (magazine). Available at http://www.time.com/time/magazine/article/0,9171,1812084,00.html
14. ABC (2011). Moderating User Generated Content, Guidance Note. Available at http://www.abc.net.au/corp/pubs/documents/GNModerationINS.pdf
15. Grimes-Viort, B. (2010). 6 types of content moderation you need to know about. Blaise Grimes-Viort, Online Communities & Social Media, December 6th. Available at http://blaisegv.com/community-management/6-types-of-content-moderation-you-need-to-know-about/
16. Lampe, C., and Resnick, P. (2004). Slash(dot) and Burn: Distributed Moderation in a Large Online Conversation Space. In Proc. of ACM Computer Human Interaction Conference 2004, Vienna, Austria.
17. Momeni, E. (2012). Semi-Automatic Semantic Moderation of Web Annotations. WWW 2012 Companion, April 16-20, 2012, Lyon, France. ACM 978-1-4503-1230-1/12/04.
18. Dimoulas, C., Avdelidis, A., Kalliris, G., and Papanikolaou, G. (2008). Joint Wavelet video denoising and motion activity detection in multi-modal human activity analysis: Application to video-assisted bioacoustic/psycho-physiological monitoring. EURASIP Journal on Advances in Signal Processing. doi:10.1155/2008/792028.
19. Dimoulas, C., Papanikolaou, G., and Petridis, V. (2011). Pattern classification and audiovisual content management techniques using hybrid expert systems: a video-assisted bioacoustics application in abdominal sounds pattern analysis. Expert Systems with Applications 38(10), 13082-13093.
20. Kotsakis, R., Kalliris, G., and Dimoulas, C. (2012). Investigation of broadcast-audio semantic analysis scenarios employing radio-programme-adaptive pattern classification. Speech Communication, 54(6), 743-762.
21. Chen, T.M., and Wang, V. (2010). Web filtering and censoring. Computer, 43(3), 94-97.