The massive increase of spam is posing a very serious threat to email and SMS, which have become an important means of communication. Not only do spams annoy users, but they also become a security threat. Machine learning techniques have been widely used for spam detection. In this paper, we propose another form of deep learning, a linguistic attribute hierarchy, embedded with linguistic decision trees, for spam detection, and examine the effect of semantic attributes on the spam detection, represented by the linguistic attribute hierarchy. A case study on the SMS message database from the UCI machine learning repository has shown that a linguistic attribute hierarchy embedded with linguistic decision trees provides a transparent approach to in-depth analysing attribute impact on spam detection. This approach can not only efﬁciently tackle ‘curse of dimensionality’ in spam detection with massive attributes, but also improve the performance of spam detection when the semantic attributes are constructed to a proper hierarchy.
- deep learning
- spam detection