Statistical Analysis for Twitter Spam Detection

Authors

  • Ganesh Udge, Department of Computer Engineering, VIIT, Pune, Maharashtra, India
  • Mahesh Mohite, Department of Computer Engineering, VIIT, Pune, Maharashtra, India
  • Shubhankar Bendre, Department of Computer Engineering, VIIT, Pune, Maharashtra, India
  • Yogeshwar Birnagal, Department of Computer Engineering, VIIT, Pune, Maharashtra, India
  • Mrs. Disha Wankhede, Department of Computer Engineering, VIIT, Pune, Maharashtra, India

DOI:

https://doi.org/10.32628/IJSRSET1962170

Keywords:

Machine Learning, Parallel Computing, Spam Detection, Scalability, Twitter

Abstract

Online social networks make it easy to spread and learn about new discoveries and information. Recently, however, some of the content posted in reply has been irrelevant to the actual topic; in layman's terms such posts are attacks, and the accounts that carry them out on Twitter are called Twitter spammers. Spammers compromise the quality of data by injecting malicious and harmful information through URLs, bios, emoticons, audio, images/videos, and hashtags, spread from different accounts via tweets, direct messages, and retweets. These malicious links may lead to misleading sites that harm users and interfere with their decision-making. To protect the user experience from such attacks, the system is trained on a Twitter dataset from which 12 lightweight features, such as the user's account age, number of followers, and the counts of tweets and retweets, are extracted and used to distinguish spam from non-spam. Discretizing these features is important for improving performance and for transferring spam detection across tweets. The system builds a binary classification model for spam detection using machine learning algorithms, namely a Naïve Bayes classifier or a Support Vector Machine (SVM) classifier, which learn the behaviour of spam and non-spam tweets from the training data. It classifies the tweets in a dataset as spam or non-spam and fills the user's feed with only the relevant tweets. The system also reports how data-related factors, such as the ratio of spam to non-spam tweets, the size of the training dataset, and data sampling, affect detection performance. In short, the proposed system detects and analyses both simple Twitter spam and Twitter spam whose characteristics vary over time.
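The pipeline described above (discretize lightweight features, then train a Naïve Bayes classifier) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: only four of the 12 features are used, and the feature names (`account_age_days`, `followers`, `tweets`, `retweets`), the bin edges in `BIN_EDGES`, and the toy training rows are all hypothetical. The classifier is a categorical Naïve Bayes with Laplace (add-one) smoothing over the discretized bins; an SVM is the alternative the abstract mentions.

```python
import math
from collections import Counter, defaultdict

# Hypothetical subset of the 12 lightweight features (names are assumptions).
FEATURES = ["account_age_days", "followers", "tweets", "retweets"]

# Hypothetical bin edges: a value v falls in bin k = number of edges <= v.
BIN_EDGES = {
    "account_age_days": [30, 365],
    "followers": [50, 1000],
    "tweets": [100, 5000],
    "retweets": [10, 500],
}


def discretize(value, edges):
    """Map a numeric value to a bin index in 0..len(edges)."""
    return sum(value >= edge for edge in edges)


def to_bins(sample):
    """Discretize every feature of one account/tweet record."""
    return tuple(discretize(sample[f], BIN_EDGES[f]) for f in FEATURES)


class DiscreteNaiveBayes:
    """Categorical Naive Bayes over binned features, with Laplace smoothing."""

    def fit(self, samples, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature_index][bin] -> number of training rows
        self.counts = defaultdict(lambda: defaultdict(Counter))
        for sample, label in zip(samples, labels):
            for i, b in enumerate(to_bins(sample)):
                self.counts[label][i][b] += 1
        return self

    def predict(self, sample):
        bins = to_bins(sample)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / self.total)  # log prior
            for i, b in enumerate(bins):
                n_bins = len(BIN_EDGES[FEATURES[i]]) + 1
                # add-one smoothing so unseen bins never give log(0)
                score += math.log((self.counts[label][i][b] + 1) / (n + n_bins))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy rows, invented for illustration only.
train = [
    ({"account_age_days": 5, "followers": 3, "tweets": 8000, "retweets": 900}, "spam"),
    ({"account_age_days": 10, "followers": 12, "tweets": 6000, "retweets": 700}, "spam"),
    ({"account_age_days": 900, "followers": 400, "tweets": 1200, "retweets": 40}, "non-spam"),
    ({"account_age_days": 1500, "followers": 2500, "tweets": 3000, "retweets": 80}, "non-spam"),
]
model = DiscreteNaiveBayes().fit([s for s, _ in train], [y for _, y in train])
new_account = {"account_age_days": 7, "followers": 5, "tweets": 7000, "retweets": 800}
print(model.predict(new_account))  # spam
```

Fixed thresholds keep the sketch short; in practice the bin edges could instead be learned from the training data (for example, with equal-frequency or entropy-based discretization), and a real feature vector would contain all 12 features.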
Spam detection is the central challenge the system addresses. It aims to narrow the gap in performance evaluation by focusing primarily on data, features, and patterns in order to identify real users and inform them about spam tweets, along with performance statistics. The goal is to detect spam tweets in real time; because new tweets may show new patterns, they can also be used to retrain the model and to update the training dataset and knowledge base.
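The abstract says the system reports detection performance statistics and studies data-related factors such as the balance between spam and non-spam tweets, training-set size, and data sampling. A minimal sketch of both steps follows, assuming the statistics are the true positive rate, false positive rate, precision, and F-measure (the abstract does not name them) and that sampling is uniform random sampling at a chosen spam to non-spam ratio; `sample_with_ratio` and `detection_stats` are hypothetical helper names, and the tweet IDs and predictions are placeholders.

```python
import random


def sample_with_ratio(spam, non_spam, ratio, size, seed=0):
    """Draw `size` tweets with roughly `ratio` non-spam tweets per spam tweet.

    Hypothetical parameterization: the abstract only says the spam/non-spam
    relationship and data sampling are varied, not how.
    """
    rng = random.Random(seed)  # fixed seed so experiments are repeatable
    n_spam = round(size / (1 + ratio))
    n_non_spam = size - n_spam
    # random.sample draws without replacement; raises ValueError if a pool is too small
    return rng.sample(spam, n_spam) + rng.sample(non_spam, n_non_spam)


def detection_stats(y_true, y_pred, positive="spam"):
    """Return TPR, FPR, precision and F-measure, treating `positive` as spam."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    tpr = tp / (tp + fn) if tp + fn else 0.0  # recall / detection rate
    fpr = fp / (fp + tn) if fp + tn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * tpr / (precision + tpr) if precision + tpr else 0.0
    return {"tpr": tpr, "fpr": fpr, "precision": precision, "f1": f1}


# Toy usage with placeholder tweet IDs and hand-written predictions.
spam_pool = [f"s{i}" for i in range(100)]
non_spam_pool = [f"n{i}" for i in range(100)]
batch = sample_with_ratio(spam_pool, non_spam_pool, ratio=3, size=40)  # 10 spam, 30 non-spam

y_true = ["spam", "spam", "non-spam", "non-spam", "non-spam"]
y_pred = ["spam", "non-spam", "spam", "non-spam", "non-spam"]
stats = detection_stats(y_true, y_pred)
print(stats)  # {'tpr': 0.5, 'fpr': 0.3333333333333333, 'precision': 0.5, 'f1': 0.5}
```

Repeating a train-and-evaluate loop while varying `ratio` and `size` is one way to measure how each data-related factor affects detection performance.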

Published

2019-04-30

Issue

Volume 6, Issue 2, March-April 2019

Section

Research Articles

How to Cite

[1]
Ganesh Udge, Mahesh Mohite, Shubhankar Bendre, Yogeshwar Birnagal, Mrs. Disha Wankhede, "Statistical Analysis for Twitter Spam Detection," International Journal of Scientific Research in Science, Engineering and Technology (IJSRSET), Print ISSN: 2395-1990, Online ISSN: 2394-4099, Volume 6, Issue 2, pp. 624-629, March-April 2019. Available at DOI: https://doi.org/10.32628/IJSRSET1962170