Statistical Analysis for Twitter Spam Detection

Ganesh Udge; Mahesh Mohite; Shubhankar Bendre; Yogeshwar Birnagal; Mrs. Disha Wankhede

doi:10.32628/IJSRSET1962170

Authors

Ganesh Udge, Department of Computer Engineering, VIIT, Pune, Maharashtra, India
Mahesh Mohite, Department of Computer Engineering, VIIT, Pune, Maharashtra, India
Shubhankar Bendre, Department of Computer Engineering, VIIT, Pune, Maharashtra, India
Yogeshwar Birnagal, Department of Computer Engineering, VIIT, Pune, Maharashtra, India
Mrs. Disha Wankhede, Department of Computer Engineering, VIIT, Pune, Maharashtra, India

DOI:

https://doi.org/10.32628/IJSRSET1962170

Keywords:

Machine Learning, Parallel Computing, Spam Detection, Scalability, Twitter

Abstract

Online social networks have made it easy to spread and learn about new discoveries and information. Recently, however, posted content is often irrelevant to the actual topic. In layman's terms these are attacks; they are carried out on Twitter as well, where the accounts behind them are called Twitter spammers. Data quality is compromised when malicious and harmful information is added through URLs, bios, emoticons, audio, images/videos, and hashtags by different accounts exchanging tweets, personal messages (Direct Messages), and retweets. Malicious links may lead to misleading sites, which can adversely affect users and interfere with their decision-making. To protect the user experience from spammer attacks, a Twitter training dataset is used: 12 lightweight features, such as the user's account age, number of followers, and counts of tweets and retweets, are extracted and used to distinguish spam from non-spam. To enhance performance, discretization of the feature values is important for spam detection across tweets. Our system builds a classification model for spam detection based on binary classification and machine learning algorithms, namely a Naïve Bayes classifier or a Support Vector Machine classifier, which learn the model's behaviour. The system will categorize the tweets from the datasets into spam and non-spam classes and populate the user's feed with only the relevant information. The system will also report the impact of data-related factors, such as the relationship between spam and non-spam tweets, the size of the training dataset, and data sampling, on detection performance. The function of the proposed system is to detect and analyse both simple Twitter spam and spam that varies over time.
Spam detection is a major challenge for the system. The system narrows the gap between performance evaluations and focuses primarily on data, features, and patterns to identify real users and inform them about spam tweets, along with the performance statistics. The goal is to detect spam tweets in real time; because new tweets may exhibit new patterns, they also help in training and in updating the dataset and the knowledge base.
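
The pipeline described in the abstract (lightweight features, discretization, then a Naïve Bayes classifier) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the three features, the bin edges in `EDGES`, the toy training rows, and the names `discretize`, `to_bins`, and `DiscreteNaiveBayes` are all assumptions chosen only for illustration, and the real system uses 12 features and a real Twitter dataset.

```python
import math

# Hypothetical bin edges for three illustrative features (assumed values):
# account age in days, number of followers, number of tweets.
EDGES = [(30, 365), (50, 1000), (100, 5000)]
N_BINS = 3  # two edges per feature give three bins


def discretize(value, edges):
    """Return the index of the bin that `value` falls into."""
    return sum(value >= edge for edge in edges)


def to_bins(features):
    """Discretize one raw feature vector using EDGES."""
    return [discretize(v, e) for v, e in zip(features, EDGES)]


class DiscreteNaiveBayes:
    """Naive Bayes over binned features with Laplace smoothing."""

    def __init__(self, n_bins, alpha=1.0):
        self.n_bins = n_bins
        self.alpha = alpha

    def fit(self, rows, labels):
        self.classes = sorted(set(labels))
        self.total = len(labels)
        self.class_count = {c: 0 for c in self.classes}
        n_features = len(rows[0])
        # counts[c][j][b]: class-c rows whose feature j lies in bin b
        self.counts = {
            c: [[0] * self.n_bins for _ in range(n_features)]
            for c in self.classes
        }
        for row, label in zip(rows, labels):
            self.class_count[label] += 1
            for j, b in enumerate(row):
                self.counts[label][j][b] += 1
        return self

    def log_posterior(self, row, c):
        """Unnormalized log P(c | row) under the independence assumption."""
        lp = math.log(self.class_count[c] / self.total)
        for j, b in enumerate(row):
            num = self.counts[c][j][b] + self.alpha
            den = self.class_count[c] + self.alpha * self.n_bins
            lp += math.log(num / den)
        return lp

    def predict(self, row):
        return max(self.classes, key=lambda c: self.log_posterior(row, c))


# Toy training data (fabricated for illustration only, not from the paper).
RAW = [
    ((5, 10, 8000), "spam"), ((12, 3, 6000), "spam"),
    ((20, 40, 9000), "spam"), ((40, 20, 7000), "spam"),
    ((800, 300, 1200), "non-spam"), ((1500, 2500, 400), "non-spam"),
    ((400, 150, 90), "non-spam"), ((2000, 5000, 3000), "non-spam"),
]
model = DiscreteNaiveBayes(N_BINS).fit(
    [to_bins(f) for f, _ in RAW], [label for _, label in RAW]
)

print(model.predict(to_bins((10, 5, 7500))))     # new account, few followers
print(model.predict(to_bins((1000, 800, 500))))  # older, established account
```

Binning the features first lets the classifier work with simple per-bin counts instead of assuming a particular distribution for each raw value, which is one plausible reason discretization helps here.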
