The Persian SMS Corpus: A Data Resource for Intelligent SMS Processing

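Authors: Mohsen Rahmanian (محسن رحمانیان), Omid Erfanmanesh (امید عرفان‌منش)
Conference: 6th Iranian Joint Conference on Fuzzy and Intelligent Systems
Conference date: 1396-12-09 (Solar Hijri; 28 February 2018)
Conference venue: Kerman
Presentation type: Oral presentation
Conference level: International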

Abstract

Receiving a high volume of unwanted SMS messages (spam) is one of the problems that, despite all the benefits of mobile technology, can lead to user dissatisfaction. Most mobile operators offer anti-spam solutions, but most of these are limited to filtering advertising SMS messages from specific numbers. The scientific literature describes various ways to filter spam messages, and in most cases the best-performing methods have been based on statistical analysis. Statistical methods for spam filtering require a suitable corpus of text data. The standard corpora used in most scientific papers are in English. In their research, the authors could not find a suitable, standard, publicly available collection of Persian text messages. In this project, therefore, the first version of a Persian text message corpus, called PSMS, is presented, and its performance is evaluated with several popular SMS processing algorithms. The experimental results show that this corpus can be used with high reliability to develop intelligent methods for filtering Persian spam messages.

Keywords: Spam SMS, Machine Learning, Persian Corpus, Artificial Intelligence
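
To illustrate why statistical spam filters depend on a labelled corpus, the following is a minimal sketch of one common statistical approach, a multinomial Naive Bayes classifier with add-one smoothing. It is not the authors' method, and the abstract does not name the algorithms that were evaluated. The names `TOY_MESSAGES`, `tokenize`, `train`, and `classify` are hypothetical, and the four training messages are invented English examples, not taken from PSMS. Whitespace tokenization is a simplifying assumption; real Persian text would likely need proper normalization and tokenization.

```python
import math
from collections import Counter

# Invented toy data for illustration only; NOT drawn from the PSMS corpus.
TOY_MESSAGES = [
    ("win a free prize now", "spam"),
    ("free discount offer call now", "spam"),
    ("are you coming to dinner tonight", "ham"),
    ("see you at the meeting tomorrow", "ham"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a simplifying assumption)."""
    return text.lower().split()


def train(messages):
    """Count words per label and messages per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    doc_counts = Counter()
    for text, label in messages:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, doc_counts, vocab


def classify(text, model):
    """Return the label with the highest log-posterior score."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        # Log prior: fraction of training messages carrying this label.
        score = math.log(doc_counts[label] / total_docs)
        for token in tokenize(text):
            if token not in vocab:
                continue  # ignore words never seen during training
            # Add-one (Laplace) smoothed log likelihood of the token.
            score += math.log((counts[token] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train(TOY_MESSAGES)
print(classify("free prize offer", model))
print(classify("see you tomorrow", model))
```

Every probability in this sketch is estimated from labelled messages, so the classifier can only be as good as the corpus it is trained on. A filter for Persian spam therefore needs labelled Persian messages, which is the gap a corpus such as PSMS is meant to fill.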