This corpus has been collected from sources on the Internet that are free or free for research use:
[1] Gómez Hidalgo, J.M., Cajigas Bringas, G., Puertas Sanz, E., Carrero García, F. Content Based SMS Spam Filtering. Proceedings of the 2006 ACM Symposium on Document Engineering (ACM DOCENG'06), Amsterdam, The Netherlands, 10-13, 2006.
[2] Cormack, G. V., Gómez Hidalgo, J. M., and Puertas Sánz, E. Feature engineering for mobile (SMS) spam filtering. Proceedings of the 30th Annual international ACM Conference on Research and Development in information Retrieval (ACM SIGIR'07), New York, NY, 871-872, 2007.
[3] Cormack, G. V., Gómez Hidalgo, J. M., and Puertas Sánz, E. Spam filtering for short messages. Proceedings of the 16th ACM Conference on Information and Knowledge Management (ACM CIKM'07), Lisbon, Portugal, 313-320, 2007.
Please read the included readme file carefully.
We would appreciate a reference to the following paper:
Almeida, T.A., Gómez Hidalgo, J.M., Yamakami, A. Contributions to the Study of SMS Spam Filtering: New Collection and Results. Proceedings of the 2011 ACM Symposium on Document Engineering (DOCENG'11), Mountain View, CA, USA, 2011. (preprint)
The SMS Spam Collection has been created by Tiago A. Almeida and José María Gómez Hidalgo.
About
We would like to thank Min-Yen Kan and his team for making the NUS SMS Corpus available.
(c) Tiago A. Almeida and José María Gómez Hidalgo, 2011.